
Why is it difficult to understand statistical inference? Reflections on the opposing directions of construction and application of inference framework



Fulya Kula and Rüya Gökhan Koçer

Department of Research Methods, Measurement and Data Analysis, University of Twente, Drienerlolaan 5, 7522 NB Enschede, The Netherlands

Corresponding author. Email: f.wassink@utwente.nl

[Received June 2018; accepted October 2019]

Abstract

Difficulties in learning (and thus teaching) statistical inference are well reported in the literature. We argue that the problem emanates not only from the way in which statistical inference is taught but also from what exactly is taught as statistical inference. What makes statistical inference difficult to understand is that it contains two logics that operate in opposite directions. There is a certain logic in the construction of the inference framework, and there is another in its application. The logic of construction commences from the population, reaches the sample through some steps and then comes back to the population by building and using the sampling distribution. The logic of application, on the other hand, starts from the sample and reaches the population by making use of the sampling distribution. The main problem in teaching statistical inference, in our view, is that students are taught the logic of application while the fundamental steps in the direction of construction are often overlooked. In this study, we examine and compare these two logics and argue that introductory statistics courses would benefit from following the direction of construction rather than that of application, since it ensures that students internalize how the inference framework makes sense.

1. Introduction

One of the crucial issues in statistics education is explaining the notion of inference to students, that is, the process of making probabilistic statements about the entire population under scrutiny by looking at only a small part of it. This seemingly simple process, however, remains one of the most surprisingly complicated topics for students in both bachelor's and graduate-level courses (Ferguson, 1996). Quite often, the outcome of introductory statistics courses is that students merely memorize the 'ritual of finding significance' (Franzosi, 2004) without comprehending or appreciating the underlying logic.

© The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.


In fact, at least three challenging abstractions are involved in statistical inference: (1) comprehending the distinction between sample and population (which involves the ideas of sampling, sample size and the difference between statistic and parameter¹); (2) grasping the meaning of the distribution of a set of numbers over a range of the real line and making probabilistic interpretations of this distribution; and (3) understanding the asymptotic behaviour of such a distribution when it is used repeatedly to collect and display the estimates of a statistic.

Teaching inference essentially requires connecting these underlying abstractions in a logical sequence that can be converted into a coherent narrative. We argue that it is the presence of two essential but different logics that makes statistical inference difficult to understand. These two logics are implicitly embedded in the inference framework and both connect the underlying abstractions. Namely, there is a certain logic in constructing the entire inference framework and another in the application phase. These two logics (and the narratives based on them) begin their 'story' from opposite directions and follow different paths while connecting the abstractions involved in the inference framework. The starting point of the logic of construction is the population with its parameter of interest, and this logic reaches the sample through some steps. Then, after building and using the sampling distribution, it returns to the population parameter. The process of application, on the other hand, is simpler: it starts from the sample and, by making (often rather vague) use of the sampling distribution, reaches the population parameter.

The main problem in the teaching of statistical inference, in our view, is that students are taught the logic of application and thus follow its direction, starting from the sample and reaching the population. The direction of construction, by contrast, starts from the population, follows some fundamental steps and returns to the population again. In this respect, the two logics follow opposite directions, and within the logic of application the fundamental steps needed to comprehend inference are quite often skipped. The literature, while acknowledging the difficulties involved in teaching statistical inference, seeks their root cause in the way in which inference is taught. Therefore, the main emphasis in the literature has been the scrutiny and comparison of the relative merits of various pedagogical techniques that may facilitate the comprehension of inference within the logic of application. In our view, to reiterate our point, the difficulties emanate from the use of the logic of application in teaching rather than that of construction. Thus, the answer to the question 'Why is it difficult to understand statistical inference?' is to be found not only by contemplating 'How should inference be taught?' but also by focusing on 'What should be taught as inference?'. Against this background, the aim of the current study is to introduce a model for teaching statistical inference and to unfold the contradictions between the directions of the two logics by means of this new model.

In the following pages we first outline the literature and then explain how inference could (or perhaps should) be taught by using the logic and direction of construction. The next section focuses on the logic of application and examines the steps involved. We try to expose the ambiguities and gaps that this particular logic, designed for implementing the inference framework, may leave in students' minds. We provide an illustrative example to clarify our model and examine in detail how the two directions would operate in this instance. The last part of the paper reiterates the differences between the logics of construction and application from a more generic perspective and reflects upon the possible consequences of using something without understanding how it works.

¹ The term 'statistic' in this context refers to a quantitative characteristic of a sample, and 'parameter' refers to the same characteristic in the population. One of the key issues in statistical science is to examine the extent to which a statistic can substitute for a parameter.

2. Literature review

Inference has been widely studied, and students' difficulties in understanding the procedures of inference and reasoning with inferential ideas are well documented (e.g. Falk & Greenbaum, 1995; Sotos et al., 2007). Students find statistical inference abstract and challenging. The difficulty of the process of statistical inference stems from the underlying complex and abstract concepts such as sample, population and sampling distribution (Garfield & Ben-Zvi, 2008). Incomplete understanding of key concepts such as sampling (e.g. Saldanha & Thompson, 2002, 2006; Watson, 2004), distribution (e.g. Bakker & Gravemeijer, 2004), variation (e.g. Cobb et al., 2003; Moritz, 2000; Shaughnessy & Noll, 2006; Wild & Pfannkuch, 1999) and sampling distributions (e.g. delMas et al., 1999; Lipson, 2003; Yu & Behrens, 1995) is also well documented as a source of difficulties with inference. Studies show that even when students are competent in the arithmetic (such as computing with formulas), they still cannot interpret these concepts (Bright & Friel, 1998; Groth, 2003; Callingham, 1997). These stubborn difficulties are persistent and similar among students and adults.

The components of inference have also been shown to remain difficult for learners. For instance, it was found that while students have a good intuitive understanding of a sample, they have trouble making the transition to the formal meaning of the term (Moritz, 2000; Watson & Moritz, 2000). The variability of samples remains a problematic issue at various age levels (Saldanha & Thompson, 2002; Well et al., 1990). Saldanha & Thompson (2002) point out that a sample can be viewed either as a subset of a population or as a quasi-proportional small-scale version of the population. Given that students' knowledge of the sampling distribution is closely related to their understanding of statistical inference (Lipson, 2003), it is crucial to emphasize that even most high-achieving students see the sample as a quasi-proportional small-scale version of the population (Saldanha & Thompson, 2002), which interferes with their ideas of the sampling distribution.

Moreover, Saldanha & Thompson (2002) showed the importance of seeing sampling as part of a repeated process with variability from sample to sample. Common student misconceptions include failing to see that a sampling distribution is a distribution of sample statistics, confusing one sample with all possible samples, and confusing the Law of Large Numbers with the Central Limit Theorem (Chance et al., 2004b; Smith, 2004). Studies have found that students' understanding of the sampling distribution can be supported by showing them different sampling distributions (delMas et al., 1999; Lunsford et al., 2006).

Studies on particular topics offer some valuable implications for the teaching of statistics. However, further research is still needed (Garfield & Ben-Zvi, 2007): studies providing statistically significant evidence that specific teaching methods improve student learning are not yet available. These studies do, however, have some practical teaching implications, such as the recognition that developing a deep understanding of statistical concepts is challenging and time-consuming. Providing students with repeated opportunities to compare and reason about multiple representations of the same data set is another useful suggestion (Bakker & Hoffmann, 2005). Moreover, students learn better if they work actively and cooperatively in small groups on carefully designed learning activities (Giraud, 1997; Magel, 1998; Chick & Watson, 2002; Perkins & Saris, 2001).

Researchers have widely used technology to illustrate abstract statistical concepts and enhance students' understanding of them. Software for teaching statistics, such as Fathom and TinkerPlots, has been developed and introduced with some success for basic statistical concepts.


Because it is directly involved in the practice of statistics, technology appears to help students develop a positive attitude towards basic statistical concepts (e.g. Zetterqvist, 2017). Technology is also used to provide students with many populations and to let them observe the distributions of statistics computed from samples drawn from these populations (e.g. Ben-Zvi, 2000). However, research also calls for a judicious and careful use of technology. Significant results are rarely observed: for instance, studies of technology designed to promote learning of the sampling distribution, and of online statistics education, found no significant effects (e.g. Aberson et al., 2000; Bakker et al., 2009; Gunnarsson, 2001; Hong et al., 2003; Utts et al., 2003; Ward, 2004; Björnsdóttir et al., 2015). It is worth noting that even carefully designed technological tools do not guarantee students' understanding of and reasoning about abstract statistical concepts (delMas et al., 1999; Chance et al., 2004a, 2007). For instance, Frischemeier & Biehler (2015) pointed out that the transition between the statistical and the software levels can itself be difficult, even for undergraduate students. Teaching statistical inference should start by combining all of its components into a unified narrative. Rather than the use of technology, it is the proper combination of the components of inference, and thus the coherent narrative offered, that would be the main solution for learning inference. Far from solving this problem, technology might even lay the ground for some misunderstandings.

Garfield & Ben-Zvi (2007) note in their review that teachers and researchers of statistics tend to overestimate students' understanding of basic concepts and to underestimate students' difficulties with the same concepts. This pedagogical observation is consistent with the ontological structure of statistical concepts such as inference, which involves uncertainty. Statistical inference and its underlying concepts are abstract, which makes them difficult for the learner in an introductory statistics course. This contributes to the underestimation of student difficulties in the case of statistical inference: once these concepts are grasped, it is difficult to recall why they were difficult at all. Once this direction from the abstract to the concrete has been followed, the process may no longer be obvious to the learner (Aleksandrov et al., 1999). The pedagogical findings reviewed by Garfield & Ben-Zvi (2005) also point to this fact. The abstract structure of inference should be made more concrete to students. Students' difficulty in using data as evidence for conclusions (Andriessen, 2006; Berland & Reiser, 2009) reflects the difficulty of starting from the parts (data from the sample) and drawing abstract conclusions about the whole (inference to the population).

To foster a better understanding of inference, it has been suggested to model real-world situations (Zetterqvist, 2017) and to simulate models by drawing many random samples (Garfield et al., 2012). Few studies in the literature introduce the population to students first and then shift to samples (e.g. Ben-Zvi, 2000). Ben-Zvi (2000) showed that students develop an understanding of inference by constructing various populations and observing the distributions of statistics computed from samples drawn from these populations. In our view, introducing the sample first and then making inferences based on it has important deficiencies for teaching inference. First, students have no feel for sampling distributions and, moreover, cannot grasp the idea of inference. We therefore think it is a good approach to start teaching inference with a population and to look at many random samples drawn from this whole.

In the current study we propose a model to represent the whole idea of statistical inference in an introductory statistics course. Our model follows some steps that are in line with the suggestions in the literature, such as presenting different samples (delMas et al., 1999; Lunsford et al., 2006), drawing many different samples from the population (Garfield et al., 2012), providing repeated opportunities for multiple representations of the same data for the population (Bakker & Hoffmann, 2005) and using data as evidence for conclusions (Andriessen, 2006; Berland & Reiser, 2009). However, we believe our model goes beyond these suggestions and carries the teaching and learning of inference further. In our view, such a model might be a powerful tool in educational settings for making the mentioned concepts more concrete to the learner. Another gain for the learner would be making sense of the whole idea of statistical inference in a single line of reasoning. Moreover, with the help of our model we represent the two different logics of inference and show how their directions differ from, and oppose, each other. Hence, our model might also be seen as demonstrating the roots of students' difficulties. With the help of such a model, we believe the abstract process of inference can be made more concrete to learners. In the next section we describe our model in detail and make explicit the two logics of inference and their contradictory directions: the one on which inference is built (logic of construction) and the one that is used not only in research settings but also in traditional pedagogical settings (logic of application).

Fig. 1. Comparing the steps of construction and application of statistical inference.

3. Opposing directions of construction and application

In this section we examine the logics of construction and application in order to clarify our argument.

3.1. The logic and direction of construction

The construction of inference starts with the whole population, shown as a distribution at the top of the left panel in Fig. 1, with its central value as (in this case) the parameter of interest (see also the left column of Table 1). Thus, in the first step in the construction direction, the learner engages with the meaning of 'population' while visualizing it as a distribution that encapsulates the parameter of interest. Step 2 consists of two stages: understanding the meaning of sampling (step 2a) and then grasping that many random samples of the same size are to be taken from the population (step 2b). Step 3 requires realizing both that these 'many' samples are identical in size (3a) and that they are distinct in terms of the elements they contain (3b). This means that these 'many' samples may coincidentally contain some common elements, while each of them is sufficiently different to generate its own distribution. The distinct 'sample distributions' below each box depict the discrepancy in distribution shapes.
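Steps 1 to 3 can be made tangible with a small simulation. The sketch below is illustrative only: the bimodal population (1,000 hypothetical ages clustered around 30 and 75), the sample size of 30 and the five samples are assumptions chosen for the example, not values from the text. It checks that all samples share the same size (3a) while differing in content, so that each has its own sample distribution (3b).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Step 1 (assumption): a hypothetical bimodal, non-normal population of ages,
# 400 people around age 30 and 600 around age 75.
population = ([random.gauss(30, 5) for _ in range(400)]
              + [random.gauss(75, 5) for _ in range(600)])
mu = statistics.fmean(population)  # the parameter of interest

SAMPLE_SIZE = 30
NUM_SAMPLES = 5

# Step 2: draw many random samples (without replacement) of the same size.
samples = [random.sample(population, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# Step 3a: every sample has exactly the same number of observations.
sizes = {len(s) for s in samples}

# Step 3b: the samples differ in the observations they contain,
# so each one has its own 'sample distribution'.
distinct = len({tuple(sorted(s)) for s in samples})

print(f"population mean (parameter): {mu:.1f}")
print(f"sample sizes: {sizes}, distinct samples: {distinct} of {NUM_SAMPLES}")
for i, s in enumerate(samples, start=1):
    print(f"sample {i}: mean = {statistics.fmean(s):.1f}, "
          f"sd = {statistics.stdev(s):.1f}")
```

Printing each sample's own mean and standard deviation makes the point of step 3b visible: identical size, different contents, different sample distributions.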


The fourth step visualizes the sampling distribution as distinct from the distributions of individual samples, that is, the sample distributions. A particular value from each sample (i.e. a statistic, in this case the mean) is estimated and placed on the real line (step 4a). As these statistics accumulate, they pile up over the real line, clustering around a central value. Step 4b points out that the distribution of these values would be approximately normal. Here it is crucial to link the pace of this approximation to the (identical) size of the samples and to the number of samples: the larger the identical number of items each sample contains, the faster the distribution approaches normality; and the more samples of that size are drawn, the more closely the accumulated values trace this shape. Finally, the last phase (step 4c) introduces the parametric features of the resulting (normal) distribution, which at this point must be defined as the 'sampling distribution'. The sampling distribution would approach normality and its mean would be identical with that of the population distribution. Its variance, on the other hand, would be equal to the population variance divided by the sample size (when sampling with replacement or from a population much larger than the sample). Through these three phases, the fourth step constructs the premise on which the actual inference is performed. The name of the emerging entity, 'sampling distribution', may itself be confusing, because the samples from which it is derived have their own distinctive distributions, each of which may intuitively be called the 'distribution of "a" sample' or the 'sample distribution'. It is important to be aware of this possible source of confusion. After building the sampling distribution, in step 5a we first point out that we focus on only one of these many samples. We call this specific sample 'the' sample and estimate its mean value (in this case).
Next, we demonstrate how this particular value would land somewhere on the real line over which probabilities are massed by the sampling distribution (step 5b). Step 6 shows that this value would be separated from the center of the sampling distribution by a certain distance, depicted by d in Fig. 1. Because the center of the sampling distribution is identical with that of the population distribution, the distance d is informative about the population. Step 7 shows the area under the distribution curve to the right of the dotted vertical grey line marking the sample mean. This area represents the probability amassed over the range that lies at least the distance d from the center. Strictly speaking, in a one-sided test this area would be the p-value associated with 'the' sample. Finally, the last step in the construction process (step 8) clarifies the meaning of this amassed probability: this is the probability of 'the' sample that we chose belonging to (or actually having been taken from) the population.
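Steps 4 to 7 can likewise be followed in a simulation, sketched here under stated assumptions: a hypothetical bimodal population of 1,000 ages, samples of size 30, and 10,000 repetitions. Samples are drawn with replacement (`random.choices`) so that the variance of the sample means should match the population variance divided by the sample size; without replacement from a small finite population a correction factor would apply. The right-tail area of step 7 is computed twice: empirically, as the share of simulated sample means at or beyond 'the' sample's mean, and through the normal approximation.

```python
import random
import statistics
from statistics import NormalDist

random.seed(2)

# Hypothetical bimodal (non-normal) population of 1,000 ages (an assumption).
population = ([random.gauss(30, 5) for _ in range(400)]
              + [random.gauss(75, 5) for _ in range(600)])
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 30               # identical size of every sample
num_samples = 10_000

# Step 4a: estimate the mean of each sample and pile the values up.
# Sampling WITH replacement, so Var(sample mean) = sigma**2 / n holds exactly.
means = [statistics.fmean(random.choices(population, k=n))
         for _ in range(num_samples)]

# Step 4c: compare the features of the accumulated values with theory.
print(f"mean of sample means {statistics.fmean(means):.2f} vs mu {mu:.2f}")
print(f"variance of sample means {statistics.pvariance(means):.2f} "
      f"vs sigma^2/n {sigma ** 2 / n:.2f}")

# Step 5: focus on one sample ('the' sample) and estimate its mean.
the_sample_mean = statistics.fmean(random.choices(population, k=n))

# Step 6: its distance d from the center of the sampling distribution.
d = the_sample_mean - mu

# Step 7: the area to the right of 'the' sample mean, computed two ways.
empirical_area = sum(m >= the_sample_mean for m in means) / num_samples
normal_area = 1 - NormalDist(mu, sigma / n ** 0.5).cdf(the_sample_mean)

print(f"d = {d:.2f}; right-tail area: empirical {empirical_area:.3f}, "
      f"normal approximation {normal_area:.3f}")
```

Even though the population is bimodal, the two areas should agree closely, which is the practical payoff of step 4b.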

Once this underlying logic and direction (from population to sample, then to the sampling distribution and finally back to the population) is understood, it is easy to take the next mental step: 'Imagine we know that we have a random sample taken from a population, but we don't know the parameter value of this population'. If we have a guess about this value, we can test this guess by looking at the probability attributed to our sample by the inference machine. If this probability is very low, we can reason in two possible ways: (i) this very small probability has actually 'happened' and the sample belongs to the population that we identified by our guessed parameter value, in which case we have no reason to doubt our guess about the population parameter; or (ii) it is safer to assume that such a small probability would not really have 'happened' (because it is 'small'), and thus the sample does not belong to the population we identified by our guessed parameter value, so our parameter guess is likely to be inaccurate. Once these choices are clarified, students may be introduced to the lexicon of statistical inference: our guess about the parameter value is our null hypothesis, decision (i) means not rejecting it, and decision (ii) means rejecting it. Once this point is reached in students' mental formation, one can explain the meaning of type I and type II errors: respectively, choosing (ii) and being wrong, and opting for (i) but being wrong. Similarly, the controversial nature of conventional significance levels (what counts as a small 'p') may also be discussed easily and substantively against this background.
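The meaning of a type I error (rejecting a null hypothesis that is in fact true) can be demonstrated with a short simulation. In this sketch, the hypothesized mean of 100, the known population standard deviation of 15, the sample size of 30 and α = 0.05 are all hypothetical values. Every sample really does come from the null population, so each rejection is an error, and the rule 'reject when p < α' should commit it in roughly a share α of samples.

```python
import random
import statistics
from statistics import NormalDist

random.seed(3)

# Hypothetical values: null mean, known population SD, sample size, alpha.
MU0, SIGMA, N, ALPHA = 100.0, 15.0, 30, 0.05
standard_error = SIGMA / N ** 0.5
trials = 20_000

rejections = 0
for _ in range(trials):
    # The null hypothesis is TRUE here: every sample comes from mean MU0.
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (statistics.fmean(sample) - MU0) / standard_error
    p = 1 - NormalDist().cdf(z)   # one-sided p-value: area to the right of z
    if p < ALPHA:                 # reject the guessed parameter value
        rejections += 1

type_i_rate = rejections / trials
print(f"share of true-null samples rejected: {type_i_rate:.3f}")
```

Changing `ALPHA` and rerunning shows directly how the chosen significance level sets the long-run rate of this particular mistake.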


Table 1. The explanations of the steps of the two logics

| Step (construction) | Direction of construction: pedagogical content | Direction of application: pedagogical content | Step (application) |
|---|---|---|---|
| 1 | Grasping the meaning of 'population' and understanding how a distribution, as a visualization instrument, encapsulates the parameter of interest. | Understanding the population of interest and the population parameter. | II |
| 2a | Understanding the meaning of (random) sampling. | Understanding (random) sampling and the idea of taking many random samples of the same size. | III |
| 2b | Grasping the idea of taking many random samples of the same size. | | |
| 3a | Comprehending that each sample contains exactly the same number of observations. | | |
| 3b | Comprehending that samples of the same size all differ in the exact observations they contain. | | |
| 4a | Introducing the 'sampling distribution': from each sample a particular value is estimated and piled up on the real line. | | |
| 4b | Introducing the 'sampling distribution': a normal distribution emerges as the estimated values accumulate. | | |
| 4c | Sampling distribution: the resulting distribution is called the 'sampling distribution'; it would approach normality, its central value (its mean) would be identical with that of the population distribution, and its variance would be equal to the population variance divided by the sample size. | | |
| 5a | Understanding that, after building the sampling distribution, the only focus is one of the many samples (called (the) Sample in Fig. 1), and estimating the statistic for (the) Sample. | Selecting a sample and estimating the statistic for this sample. | I |
| 5b | Understanding that the value estimated in step 5a lands on the real line over which probabilities are massed by the sampling distribution. | | |
| 6 | Grasping that the estimate of (the) Sample would be separated from the center of the sampling distribution by a certain distance 'd'. | | |
| 7 | Realizing the meaning of the area under the distribution curve to the right of the vertical (dotted) line drawn at the point where the sample has landed. This area represents the probability amassed over the range equal to or greater than the distance d from the center. | | |
| 8 | Grasping the meaning of the amassed probability (mentioned in step 7): this is the probability that 'the' sample that we 'chose' belongs to (or was actually taken from) the population that we scrutinize. | Locating the sample on the sampling distribution in number form, attaining the associated 'p-value', and deciding whether this number is smaller than alpha. | III, IVa, IVb |


3.2. The logic and direction of application (and related difficulties)

The direction of application refers to the procedure that a researcher follows when she uses statistical inference in practice. However, this is not an adequate framework for teaching statistical inference, because it skips many crucial steps (of the construction direction) that collectively make the reasoning in inference possible and comprehensible. To clarify this argument, let us now elaborate on the right column of Table 1 (and the right panel of Fig. 1).

The application process starts from 'the' sample (imagine that 'we have a sample of IQ measurements taken from employees of a company'), often with a vague reference to a population or two populations ('the director claims that his employees are more intelligent than the average person'). This first step of application contains three sources of confusion. Firstly, 'starting from the sample' leaves students completely unaware that we can make inferences about this (i.e. 'the') sample only if we can place it among many other samples with the same characteristics (identical size and random selection). Secondly, the idea of population remains ambiguous, because the application process typically aims to determine whether the given sample belongs to a projected population (in the above example, 'people with higher than usual IQ') or to the usual population ('people with normal IQ'). It thus already 'assumes' that students know one can make a probabilistic statement about the origin of a given sample in terms of the population from which it might have been drawn. Thirdly, starting from the sample prevents students from grasping the most crucial feature of sampling: choosing a set of items 'randomly' so that they may represent the population. Without a clear picture of the population, and thus without seeing the sample inside the population, the learner cannot visualize or internalize the relationship between randomness and representation. Indeed, all these ambiguities stem from the fact that the first step of application actually corresponds to the fifth step of construction. Thus, all the insights encapsulated in the previous four construction steps are skipped. The following steps in the application process typically focus on the population (step II) and link the population to the sample, but only indirectly, by articulating the null and alternative hypotheses (step III). These steps also involve identifying the parameter of interest.
These steps often imply that there may be two different populations, one of which is the source of the sample; but since the first four steps of the construction direction are skipped, how a sample may reveal its source through its location in the sampling distribution remains entirely beyond the students' mental horizon. For instance, if we ask whether a particular firm employs people with a higher than usual level of IQ, the null hypothesis would probably be stated as 'the average IQ of the entire population from which the sample is taken equals 100 (the average level of "usual" people)', with the alternative being that it is larger than 100. Here there is 'the' sample and there are two alternative population definitions, but neither population is shown in a tangible form. Thus, once steps II and III are completed, the empirical situation is supposedly translated into an analytical problem, but in fact the students are detached from the empirical reality and brought before an abstract formulation without understanding its constituent elements: the empirical circumstance {there is a firm claiming to employ clever people} is now expressed as an analytical problem {we have a sample of unknown origin, there are two candidate populations, and we want to know from which one this sample might have been taken}. The tools 'null' and 'alternative' hypotheses are used to identify these two candidate populations, respectively: null, usual people; alternative, clever people. The idea, of course, is that by finding the source of the sample (i.e. the population from which it is taken) we would be evaluating the claim that the firm employs 'clever' people.
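Written out formally, the two candidate populations of the IQ example might look as follows. This is a sketch assuming a large random sample of employees, so that the sampling distribution of the mean is approximately normal; $\bar{x}$, $s$ and $n$ denote the sample mean, standard deviation and size, none of which the example specifies.

```latex
\begin{aligned}
H_0 &: \mu = 100 \quad \text{(the sample comes from the population of ``usual'' people)} \\
H_1 &: \mu > 100 \quad \text{(the sample comes from a population of ``clever'' people)} \\
z &= \frac{\bar{x} - 100}{s / \sqrt{n}}, \qquad
p = P(Z \ge z \mid H_0), \qquad
\text{reject } H_0 \text{ if } p < \alpha
\end{aligned}
```

Seen this way, the 'two populations' are simply two candidate values of $\mu$, and the sample is judged by where $\bar{x}$ falls on the sampling distribution implied by $H_0$.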

This entire edifice is riddled with ambiguities for students. They might ask: given that we 'know' the sample comes from the firm, why is the firm not 'the' population? Or they might question why we have two populations while having a single sample. Moreover, the idea of evaluating a claim through a sample by identifying its source population would remain obscure to a student who does not know (not having witnessed it) that random samples of equal size would generate a distribution with quite tractable properties (sharing the same central value as the population, having a normal shape, etc.), that any one of these many samples (i.e. 'the' sample we have) can be located on this distribution, and that this very location allows us to identify the source population, but only in a probabilistic manner. As mentioned above, this entire background, which is established in the first four steps of the construction process, is entirely 'assumed' and thus skipped in the first phase of the application process.

Finally, we arrive at the last steps (IVa, IVb) of the application process. In this final phase, 'the' sample is first located on the sampling distribution, but this action is presented without any reference to its geometric meaning and without reiterating the actual shape and features of the sampling distribution². The usual practice is to find this 'location' in the form of a number (represented by the dot itself in Fig. 1) in a table (or to delegate the task to an automated algorithm and read it from the computer output), thereby divorcing the inference process from its geometric/visual (and thus more intuitive) content. Once the point is located, or more accurately, once the entire sample is replaced by a number found in a table that is obscure from the student's perspective, the associated 'p-value' is articulated, again without any reference to its geometric meaning³, and the 'only' remaining task is to see whether this number is smaller than alpha (usually taken as 0.05) in a one-sided test (in this case). Then follows the judgement regarding the null hypothesis (fail to reject or reject), which needs to be understood as a measure of the credibility of the claim under scrutiny.
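The application steps compress naturally into a few lines of code, which is precisely why they hide the construction behind them. The sketch below assumes an upper-tailed z-test on a large random sample; the function name `one_sided_z_test` is hypothetical, and the normal CDF from Python's standard `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist


def one_sided_z_test(sample_mean: float, mu0: float, sd: float, n: int,
                     alpha: float = 0.05) -> tuple[float, float, bool]:
    """Upper-tailed z-test returning (z, p_value, reject_null).

    Assumes a random sample large enough for the sampling distribution
    of the mean to be approximately normal.
    """
    standard_error = sd / n ** 0.5
    # The whole sample is reduced to one number: its distance from mu0
    # measured in standard errors (the 'location' read from a table).
    z = (sample_mean - mu0) / standard_error
    # The p-value is the area under the standard normal curve right of z.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


z, p, reject = one_sided_z_test(73, 67, 15, 30)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

With the figures from the village example in Section 4 (sample mean 73, hypothesized national average 67, sample standard deviation 15, n = 30), this gives z of about 2.19 and a one-sided p of about 0.014, below α = 0.05. A sample mean exactly equal to `mu0` gives z = 0 and p = 0.5, so the null hypothesis is not rejected. Like the variance formula in step 4c, the sketch ignores the finite-population correction that would apply when 30 of only 100 villagers are sampled without replacement.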

4. An illustrative example: average age of people living in a village

In this section we demonstrate our argument using an illustrative example: ‘A researcher estimates the average age of 30 people who are randomly chosen from a village of 100 people whose average age is 70. We assume that the ages in the village population have a bimodal distribution, that is, they are not normally distributed.

The average estimated from the sample of 30 people is 73 and the standard deviation (again estimated from the sample) is 15. The researcher would like to make an inference about the average age in the entire village, which is unknown to her. What she is really interested in is whether or not the average age in this village is significantly higher than the average age in the country, which is approximately 67.’

It is important to point out several things about this highly stylized example. Firstly, the size of the sample (30) is quite large relative to that of the population from which it is drawn (the village of 100 people). As is well known, when the sample size is quite small, the variance estimated from the sample does not reflect the variation in the population accurately enough, and the sampling distribution then needs to be defined as a t-distribution with the appropriate degrees of freedom. In our illustrative example, however, for the sake of brevity and clarity, we take the sample (of size 30 from a population of 100) to be large enough for both the ‘construction’ and the ‘application’ logics to lead to a normally distributed sampling distribution. A second assumption is that the population in our example (i.e. the ages of the entire village community) is not normally distributed. We make this assumption because we want to emphasize that normality, at least approximately, emerges as a characteristic of the sampling distribution even when the population itself is not normal. Finally, the aim of the example is to test whether or not the parameter of the target population (i.e. the average age in the entire village, 70) complies with an expected/hypothesized value, rather than to find an interval that is expected to contain the population value. This enables us to show how our argument functions within the context of hypothesis testing instead of confidence interval construction, because in our view, despite its limitations, hypothesis testing is easier for students at the introductory level to understand than confidence intervals.

² That is, finding the location on the real line corresponding to the sample estimate, whose distance from the center of the sampling distribution is the entity that represents the sample on the sampling distribution.

³ In all fairness, one usually finds illustrations above statistical tables that show the relationship between the actual value of the statistic to be read from the table and the probability amassed over the marked line segment.

Against this background, let us examine our illustrative case by following first the logic and direction of application and then those of construction.

4.1. Application direction

I. We have a sample of 30 people. The mean of their ages is 73, with a standard deviation of 15.

II. They were randomly chosen from a village of 100 people, and we want to know whether the average age in this village is significantly higher than the average age in the country, which is 67.

III. Therefore, our null hypothesis is that the average age in the village equals that of the entire country, which is 67 ($H_0: \mu = 67$, against the one-sided alternative $H_1: \mu > 67$). Combining the sample mean and standard deviation (assuming that the sample variance is a good substitute for the population variance) with this hypothesized population value yields the following z-score:

$$z = \frac{73 - 67}{15/\sqrt{30}} = \frac{6}{2.74} \approx 2.19,$$

on the basis of the formula

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

IV. The threshold that marks significance at the 0.05 level in a one-sided test is 1.645 (1.96 is the corresponding two-sided threshold). Since the z-score we obtained exceeds this threshold (indeed it exceeds both), we reject the null hypothesis and conclude that the village population is significantly older than the country population.
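The arithmetic of steps III and IV can be reproduced in a few lines. The sketch below is a minimal illustration, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the function name `z_score` is hypothetical, and, as in step III, the sample standard deviation of 15 stands in for the population value.

```python
import math
from statistics import NormalDist

def z_score(sample_mean, hypothesized_mean, sd, n):
    """Standardize a sample mean: (X-bar - mu) / (sigma / sqrt(n))."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

z = z_score(sample_mean=73, hypothesized_mean=67, sd=15, n=30)

# Critical values of the standard normal distribution at alpha = 0.05.
alpha = 0.05
one_sided_critical = NormalDist().inv_cdf(1 - alpha)       # about 1.645
two_sided_critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.960

print(f"z = {z:.2f}")
print(f"one-sided critical value = {one_sided_critical:.3f}")
print(f"two-sided critical value = {two_sided_critical:.3f}")
print("reject H0" if z > one_sided_critical else "fail to reject H0")
```

Because 2.19 exceeds both critical values, the decision in step IV does not depend on which one is used here; it would matter only for a z-score between 1.645 and 1.96.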

4.2. Construction direction

1. We are interested in the ages of the 100 individuals living in a village (Fig. 2). Their average age is 70. Each individual is represented by a circle containing his/her age, and the overall distribution of these ages is shown.

2. One can take many random samples of size 30 from this population. We show just 20 of these samples and their average values in Fig. 3; in each sample, the selected individuals are marked. The purpose is to reveal ‘the randomness’: any single individual could have been part of the sample, and there are many different ways in which 30 of them could be selected to form a sample. We have only one sample (‘the’ sample), and it could be any one of the possible samples of size 30. As depicted in Fig. 3, it turns out that ‘the’ sample we have, which has an average of 73, is the 16th sample.
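Step 2 can be mimicked with Python's standard library. The sketch below is hypothetical: it labels the 100 villagers by index rather than by their actual ages (which appear only in Fig. 2), draws 20 random samples of size 30 without replacement, and counts how many distinct samples of that size exist.

```python
import math
import random

N, n = 100, 30                      # village size and sample size
villagers = list(range(N))          # hypothetical labels 0..99 for the villagers

rng = random.Random(2018)           # fixed seed so the draws are reproducible
samples = [rng.sample(villagers, n) for _ in range(20)]  # 20 samples, as in Fig. 3

# Each sample contains 30 distinct villagers (sampling without replacement).
assert all(len(set(s)) == n for s in samples)

# Number of different ways to choose 30 of the 100 villagers.
possible_samples = math.comb(N, n)
print(f"{possible_samples:.3e} possible samples of size {n}")
```

The count is of the order of 10^25, which is why ‘the’ sample is best thought of as one random draw among a practically unlimited set; the same villager may also appear in several of the 20 samples.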

3. Each of these samples has its own mean and standard deviation, and thus its own distribution (Fig. 4). These sample distributions need not follow any particular pattern, although one would cautiously expect them to display the general contours of the population distribution.


Fig. 2. The visualization of the population of the illustrative example.

4. Now we collect the mean values of all 20 samples: {67, 66, 67, 65, 66, 69, 65, 67, 64, 66, 67, 70, 72, 67, 66, 73, 69, 66, 68, 69}. We can enlarge this set by taking many more samples, each of size 30, and collecting their mean values. The distribution of the resulting set would be as given in Fig. 5.

This is the sampling distribution generated by all possible random samples of size 30 from the village population. Unlike the sample distributions in Fig. 4, the sampling distribution is approximately normal, even though the distribution of the village population is not; its mean (i.e. its center) equals the mean of the entire village population, which is 70; and its standard deviation is approximately equal to the standard deviation of the village population divided by the square root of the sample size, which here is 15/√30 ≈ 2.74. The mean of every possible sample of 30 people taken from the village population would appear as a point on the horizontal axis of the sampling distribution.
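The properties listed in step 4 can be checked by simulation. The sketch below is hypothetical throughout: it does not use the ages shown in Fig. 2 but invents a bimodal village of 100 people (50 aged 51–60 and 50 aged 80–89, so the mean is exactly 70), draws 20,000 random samples of 30 without replacement and summarizes their means (Python 3.8+ for `statistics.fmean`). One caveat the sketch makes visible: because 30 of only 100 villagers are drawn without replacement, the spread of the sample means comes out noticeably below σ/√30; the gap is the finite-population correction √((N − n)/(N − 1)), which the approximation σ/√n leaves out.

```python
import math
import random
import statistics

# Hypothetical bimodal village (NOT the data behind Fig. 2): 50 people aged
# 51-60 and 50 people aged 80-89, so the population mean is exactly 70.
population = (
    [age for age in range(51, 61) for _ in range(5)]
    + [age for age in range(80, 90) for _ in range(5)]
)
N, n = len(population), 30

mu = statistics.fmean(population)       # population mean: 70.0
sigma = statistics.pstdev(population)   # population SD, about 14.78

rng = random.Random(42)
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(20_000)]

center = statistics.fmean(sample_means)   # center of the simulated sampling distribution
spread = statistics.pstdev(sample_means)  # its standard deviation

naive_se = sigma / math.sqrt(n)                   # sigma / sqrt(n)
fpc_se = naive_se * math.sqrt((N - n) / (N - 1))  # with finite-population correction

# Share of sample means within one SD of the center (about 0.68 for a normal curve).
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)

print(f"population mean {mu:.2f}, population SD {sigma:.2f}")
print(f"center of simulated sampling distribution {center:.2f}")
print(f"simulated SD {spread:.2f}, sigma/sqrt(n) {naive_se:.2f}, with FPC {fpc_se:.2f}")
print(f"share within one SD: {within_one_sd:.2f}")
```

Even though no villager in this invented population is between 61 and 79 years old, the simulated sample means pile up around 70 in a roughly bell-shaped heap.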

5. ‘The’ sample we have, with a mean of 73, is just one of these possible samples, and its location in the sampling distribution is depicted by a dot in Fig. 5.

6. As one can see, ‘the’ sample we have lies at a distance d = 3 (simply 73 − 70) from the center of the sampling distribution.

7. It is only with the help of the sampling distribution that we can attribute a probability to the event of drawing, just by chance, a sample with a mean of at least 73 from the entire village (i.e. a sample like the one we have): the area under the distribution to the right of the vertical line drawn from the point that marks ‘the’ sample is the visual representation of the probability of drawing, by pure luck, a sample of size 30 with a mean of 73 or larger from the village population. Measuring this area, we would find that this probability is approximately 0.14.
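The area in step 7 is the upper (right-hand) tail of the sampling distribution beyond 73. A minimal sketch, assuming, as the text does, a normal sampling distribution centered at 70 with standard deviation 15/√30, and using Python's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

se = 15 / math.sqrt(30)                        # standard deviation of the sampling distribution
sampling_dist = NormalDist(mu=70, sigma=se)    # centered on the village mean

observed_mean = 73                             # 'the' sample
d = observed_mean - sampling_dist.mean         # distance from the center

upper_tail = 1 - sampling_dist.cdf(observed_mean)  # P(sample mean >= 73)
lower_tail = sampling_dist.cdf(observed_mean)      # the area on the other side of the line

print(f"d = {d}, upper tail = {upper_tail:.2f}, lower tail = {lower_tail:.2f}")
```

The upper tail comes out at about 0.14; the area on the other side of the line, about 0.86, answers a different question (how likely a mean of 73 or less is).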

8. This estimate allows us to make the following statement: the probability of drawing a sample of 30 people with a mean of 73 or higher from our village population, whose mean is 70, is 0.14; equivalently, there is a 14% chance that a random selection of 30 people from our village yields an average age of 73 or higher.

Fig. 3. Representation of the selection of 20 random samples of size 30 of the illustrative example.

Usually, we would not know the average of the village population, but if we did, the entire picture drawn above would emerge. Keeping this picture in mind, let us examine the puzzle we need to solve: on the basis of a single sample of size 30 taken from the village, with a mean of 73, what can we say about the difference between the average age of the village population, which we usually would not know, and the average age in the country, which is 67?

What we need to do is actually simple: just imagine how the sampling distribution would look if there were no difference between the village and the country in terms of average age. Under this assumption the sampling distribution would be centered on 67 and not on 70, as depicted in Fig. 6. Once again, ‘the’ sample we have, with a mean of 73, would appear as a point on this new sampling distribution, now at a distance d = 6 (simply 73 − 67) from the center. The probability of drawing a sample with a mean of 73 or higher would then equal the area to the right of the vertical line drawn from the point that marks the location of the sample in Fig. 6. Measuring this area, we would obtain approximately 0.014. We could then make the following statement: the probability of drawing a sample of 30 people with a mean of 73 or higher from our village population, if its mean were 67, is 0.014; equivalently, there is only a 1.4% chance that a random selection of 30 people from a village with an average age of 67 would yield a sample with an average of 73 or higher.

Fig. 4. The distributions for the random samples of the illustrative example.
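Shifting the center of the sampling distribution to the hypothesized 67 and reading off the same right-hand tail gives the 1.4% figure. The sketch below, again a minimal illustration with `statistics.NormalDist` and the standard deviation 15/√30 used in the text, also checks that this is the same number one obtains from the z-score of the application direction, so the two routes meet at the same p-value.

```python
import math
from statistics import NormalDist

se = 15 / math.sqrt(30)
observed_mean, hypothesized_mean = 73, 67

# Construction route: a sampling distribution centered on the hypothesized value.
p_construction = 1 - NormalDist(mu=hypothesized_mean, sigma=se).cdf(observed_mean)

# Application route: standardize first, then use the standard normal distribution.
z = (observed_mean - hypothesized_mean) / se
p_application = 1 - NormalDist().cdf(z)

print(f"p (shifted distribution) = {p_construction:.3f}")
print(f"p (from z = {z:.2f})      = {p_application:.3f}")
```

Standardizing is simply the shifted picture rescaled to center 0 and standard deviation 1, which is why a single z-table serves every hypothesized center.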

Now:

i. We may choose to believe that the average age in the village really is 67 and that we happened to make a very unlikely random selection from this population, one that occurs with only a 1.4% chance.

ii. We may instead decide that 1.4% is too small a chance for us to have been that ‘lucky’. But we still know that the random sample was indeed drawn from the village. Combining these insights, the only thing we could argue is that the average age in the village cannot be 67. Usually, one would take the second option, because the convention is to classify as ‘suspicious’ any situation in which the chance of drawing such a random sample falls below 5%. Of course, the only thing one could be suspicious about is the average value for the village, which is merely an expected number, i.e. a hypothesis, whereas the random drawing of the sample from the village and the mean estimated from the sample are facts. It is also possible, however, that an event with a very small chance (for instance, one that occurs only 1.4% of the time) really does take place, in which case the hypothesized average for the village would in fact be accurate. Opting for (ii) would then make us commit what is called a type I error: rejecting the null hypothesis while it is true.

Fig. 5. The sampling distribution of the illustrative example.

Fig. 6. The sampling distribution that would emerge if the hypothesized population value were true.

There is more to be gained from this exercise by asking what would happen if our hypothesis were that the true population value is 71. In this case we would generate a new sampling distribution centered on 71 and find the probability of drawing a random sample with an average of 73 or higher from this particular population. This probability would be approximately 0.23, and on the basis of the logic given above we would not reject the hypothesis that the true village average is 71. But of course this would be wrong, because the true value is actually 70. This kind of mistake is called a type II error: not rejecting the null hypothesis while it is false.
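The two kinds of error can be tabulated with the same right-tail calculation. In the sketch below (the helper name `p_value` is hypothetical), the true village mean is taken to be 70, as in the example, and each hypothesized center is tested one-sidedly against the 5% convention; the error labels are only possible because, unlike in practice, the example lets us know the true value.

```python
import math
from statistics import NormalDist

SE = 15 / math.sqrt(30)   # standard deviation of the sampling distribution, as in the text
OBSERVED = 73             # mean of 'the' sample
TRUE_MEAN = 70            # known here only because the example is constructed
ALPHA = 0.05              # conventional significance level

def p_value(hypothesized_mean):
    """One-sided p-value: area to the right of the observed mean."""
    return 1 - NormalDist(mu=hypothesized_mean, sigma=SE).cdf(OBSERVED)

for h0 in (67, 70, 71):
    p = p_value(h0)
    reject = p < ALPHA
    h0_true = (h0 == TRUE_MEAN)
    if reject and h0_true:
        verdict = "type I error"
    elif not reject and not h0_true:
        verdict = "type II error"
    else:
        verdict = "correct decision"
    action = "reject" if reject else "do not reject"
    print(f"H0: mu = {h0}: p = {p:.3f} -> {action} ({verdict})")
```

With this single sample, the hypothesis 67 is correctly rejected, the true value 70 is correctly retained, and the false hypothesis 71 is retained as well, which is the type II error described above.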

Viewing this entire example with the logic of construction in mind, one can see that the application logic begins by describing a sample of size 30 taken from ‘the’ village without clarifying the concept of population or the meaning of taking random samples (of the same size from the same population). Moreover, it fails to explain properly which population we are actually interested in: the expected value 67 is derived from the country population, but the population we are concerned with is that of the village. Another pedagogically confusing aspect of the application narrative is that it neither clarifies nor distinguishes the three distinct distributions that emerge in the course of the inference process: the distribution of the population, the distribution of each individual sample (and thus of ‘the’ sample) and the sampling distribution.

5. Conclusion: can we use an instrument without understanding how it works?

Within the context of statistical inference there are two processes that operate in opposing directions: the process of application, which starts from the sample and makes use of the sampling distribution to reach (a judgement about) the population, and the process of construction, which starts from the population and builds and reveals all the links along the way: from population to sample, from sample(s) to sampling distribution, between ‘the’ sample and the sampling distribution, and from the sampling distribution back to the population. As we conclude our argument, it is useful to present the steps of the construction and application processes once again, this time in a simpler fashion, to clarify the distinction between the two logics (see Fig. 7). Here, one can see that the application process commences by skipping the first three steps in the construction circle, then briefly follows construction steps 2 and 3, but then bypasses the remaining construction steps and reaches the final point. This abstract rendering provides a succinct answer to the question we pose in this article: why is it difficult to understand statistical inference? There is a basic ‘contradiction’ in the way the construction and application processes operate. They start from opposing points, and the application process assumes almost all of the steps built in the construction process. Presumably, researchers follow the application phases of statistical inference only after fully grasping the construction phases. In those circumstances the entire application scheme is legitimate and works well. If, on the other hand, one tries to teach inference through the application process, the scheme fails at the task and generates a sense that the entire idea of making inference is incomprehensible. Unfortunately, the teaching of statistical inference very often follows the application steps, and this is the root cause of the difficulties involved in understanding statistical inference.

Fig. 7. Abstract comparison between construction and application steps.

Now, as a last word in this article, let us detach ourselves from the content of the construction and application processes and consider them only as two ways of approaching an object. The construction process refers to ‘making it’ and the application process to ‘using it’. In real life we are often aware only of the application process for many objects (such as a cellphone or a television). However, this consumer mentality, when exported into the realm of teaching science, generates problems, because we use an instrument in science in order to understand something. Only after understanding how an instrument works, and what exactly it does to the ‘thing’ under scrutiny, is it possible to work this ‘thing’ through. Using a microscope to study the anatomy of bacteria is a good example. In this case one should at least be aware that a microscope magnifies whatever it is directed at, and that magnification essentially changes the apparent size of the objects relative to the rest of reality. A 4-year-old child who sees bacteria for the first time through a microscope, without knowing how a microscope works, might form a rather distorted idea of the actual size and anatomy of these organisms. We think it is appropriate to carry this analogy over into the teaching of statistics. If the learner sees and uses the instrument(s) of inference without understanding how they are constructed and how they work, then she will have a rather distorted picture of the things she examines by means of statistical inference.

REFERENCES

[1] Aberson, C. L., Berger, D. E., Healy, M. R., Kyle, D. & Romero, V. L. (2000) Evaluation of an interactive tutorial for teaching the central limit theorem. Teach. Psychol., 27, 289–291.

[2] Aleksandrov, A. D., Kolmogorov, A. N. & Lavrent’ev, M. A. (1999) Mathematics: Its Content, Methods and Meaning. New York: Dover Publications.

[3] Andriessen, J. (2006) Arguing to learn. The Cambridge Handbook of the Learning Sciences (R. K. Sawyer ed). New York: Cambridge University Press.

[4] Bakker, A. & Gravemeijer, K. P. E. (2004) Learning to reason about distributions. The Challenge of Developing Statistical Literacy, Reasoning, and Thinking (D. Ben-Zvi & J. Garfield eds). Dordrecht, The Netherlands: Kluwer Academic Publishers.


[5] Bakker, A. & Hoffmann, M. H. G. (2005) Diagrammatic reasoning as the basis for developing concepts: a semiotic analysis of students’ learning about statistical distribution. Educ. Stud. Math., 60, 333–358.

[6] Bakker, A., Kent, P., Noss, R. & Hoyles, C. (2009) Alternative representations of statistical measures in computer tools to promote communication between employees in automotive manufacturing. Technol. Innov. Stat. Educ., 3, 1–29.

[7] Ben-Zvi, D. (2000) Toward understanding the role of technological tools in statistical learning. Math. Think. Learn., 2, 127–155.

[8] Berland, L. K. & Reiser, B. J. (2009) Making sense of argumentation and explanation. Sci. Educ., 93, 26–55.

[9] Björnsdóttir, A., Garfield, J. & Everson, M. (2015) Evaluating two models of collaborative tests in an online introductory statistics course. Stat. Educ. Res. J., 14, 36–59.

[10] Bright, G. W. & Friel, S. N. (1998) Graphical representations: helping students interpret data. Reflections on Statistics: Agendas for Learning, Teaching, and Assessment in K-12 (S. P. Lajoie ed). Mahwah, NJ: Lawrence Erlbaum Associates.

[11] Callingham, R. A. (1997) Teachers’ multimodal functioning in relation to the concept of average. Math. Educ. Res. J., 9, 205–224.

[12] Chance, B., Ben-Zvi, D., Garfield, J. & Medina, E. (2007) The role of technology in improving student learning of statistics. Technol. Innov. Stat. Educ., 1.

[13] Chance, B., Delmas, R. & Garfield, J. (2004a) Reasoning about sampling distributions. The Challenge of Developing Statistical Literacy, Reasoning, and Thinking (D. Ben-Zvi & J. Garfield eds). Dordrecht: Kluwer Academic.

[14] Chance, B. L., Delmas, R. & Garfield, J. (2004b) Reasoning about sampling distributions. The Challenge of Developing Statistical Literacy, Reasoning, and Thinking (D. Ben-Zvi & J. Garfield eds). Dordrecht, The Netherlands: Kluwer Academic Publishers.

[15] Chick, H. L. & Watson, J. M. (2002) Collaborative influences on emergent statistical thinking—a case study. J. Math. Behav., 21, 371–400.

[16] Cobb, P., Mcclain, K. & Gravemeijer, K. P. E. (2003) Learning about statistical covariation. Cogn. Instruc., 21, 1–78.

[17] Delmas, R. C., Garfield, J. & Chance, B. L. (1999) A model of classroom research in action: developing simulation activities to improve students’ statistical reasoning. J. Stat. Educ., 7.

[18] Falk, R. & Greenbaum, C. W. (1995) Significance tests die hard. Theory Psychol., 5, 75–98.

[19] Ferguson, T. S. (1996) A Course in Large Sample Theory. London, UK: Chapman & Hall.

[20] Franzosi, R. (2004) From Words to Numbers. Cambridge: Cambridge University Press.

[21] Frischemeier, D. & Biehler, R. (2015) Preservice teachers’ statistical reasoning when comparing groups facilitated by software. Ninth Congress of the European Society for Research in Mathematics Education (CERME9). Prague: Czech Republic.

[22] Garfield, J. & Ben-Zvi, D. (2005) A framework for teaching and assessing reasoning about variability. Stat. Educ. Res. J., 4, 92–99.

[23] Garfield, J. & Ben-Zvi, D. (2007) How students learn statistics revisited: a current review of research on teaching and learning statistics. Int. Stat. Rev., 75, 372–396.

[24] Garfield, J., Delmas, R. & Zieffler, A. (2012) Developing statistical modelers and thinkers in an introductory, tertiary-level statistics course. ZDM, 44, 883–898.

[25] Garfield, J. B. & Ben-Zvi, D. (2008) Developing Students’ Statistical Reasoning. New York: Springer.

[26] Giraud, G. (1997) Cooperative learning and statistics instruction. J. Stat. Educ., 5, 1–12.

[27] Groth, R. (2003) High school students’ levels of thinking in regard to statistical study design. Math. Educ. Res. J., 15, 252–268.

[28] Gunnarsson, C. L. (2001) Student attitude and achievement in an online graduate statistics course. University of Cincinnati.

[29] Hong, K., Lai, K. & Holton, D. (2003) Students’ satisfaction and perceived learning with a web-based course. J. Educ. Technol. Soc., 6, 116–124.


[30] Lipson, K. (2003) The role of the sampling distribution in understanding statistical inference. Math. Educ. Res. J., 15, 270–287.

[31] Lunsford, M. L., Rowell, G. H. & Goodson-Espy, T. (2006) Classroom research: assessment of student understanding of sampling distributions of means and the central limit theorem in post-calculus probability and statistics classes. J. Stat. Educ., 14, 1–28.

[32] Magel, R. C. (1998) Using cooperative learning in a large introductory statistics class. J. Stat. Educ., 6.

[33] Moritz, J. B. (2000) Graphical representations of statistical associations by upper primary students. Twenty-third Annual Conference of the Mathematics Education Research Group of Australasia Incorporated. Western Australia: Fremantle.

[34] Perkins, D. V. & Saris, R. N. (2001) A “jigsaw classroom” technique for undergraduate statistics courses. Teach. Psychol., 28, 111–113.

[35] Saldanha, L. & Thompson, P. W. (2006) Investigating statistical unusualness in the context of resampling. Proceedings of the Seventh International Congress on Teaching Statistics, pp. 1–6.

[36] Saldanha, L. A. & Thompson, P. W. (2002) Conceptions of sample and their relationship to statistical inference. Educ. Stud. Math., 51, 257–270.

[37] Shaughnessy, M. & Noll, J. (2006) School mathematics students’ reasoning about variability in scatter-plots. Proceedings of the Twenty Eighth Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education, 269.

[38] Smith, M. H. (2004) A sample/population size activity: is it the sample size of the sample as a fraction of the population that matters? J. Stat. Educ., 12, 1–10.

[39] Sotos, A. E. C., Vanhoof, S., Van Den Noortgate, W. & Onghena, P. (2007) Students’ misconceptions of statistical inference: a review of the empirical evidence from research on statistics education. Educ. Res. Rev., 2, 98–113.

[40] Utts, J., Sommer, B., Acredolo, C., Maher, M. W. & Matthews, H. R. (2003) A study comparing traditional and hybrid internet-based instruction in introductory statistics classes. J. Stat. Educ., 11.

[41] Ward, B. (2004) The best of both worlds: a hybrid statistics course. J. Stat. Educ., 12, 1–12.

[42] Watson, J. M. (2004) Developing reasoning about samples. The Challenge of Developing Statistical Literacy, Reasoning and Thinking (D. Ben-Zvi & J. Garfield eds). Dordrecht, The Netherlands: Kluwer Academic Publishers.

[43] Watson, J. M. & Moritz, J. B. Developing concepts of sampling. J. Res. Math. Educ., 31, 44–70.

[44] Well, A. D., Pollatsek, A. & Boyce, S. J. (1990) Understanding the effects of sample size on the variability of the mean. Organ. Behav. Hum. Decis. Process., 47, 289–312.

[45] Wild, C. J. & Pfannkuch, M. (1999) Statistical thinking in empirical enquiry. Int. Stat. Rev., 67, 263–265.

[46] Yu, C. H. & Behrens, J. T. (1995) Applications of multivariate visualization to behavioral sciences. Behav. Res. Methods Instrum. Comput., 27, 264–271.

[47] Zetterqvist, L. (2017) Applied problems and use of technology in an aligned way in basic courses in probability and statistics for engineering students–a way to enhance understanding and increase motivation. Teach. Math. Appl.: An International Journal of the IMA, 36, 108–122.

Fulya Kula is an assistant professor at the Department of Research Methods, Measurement and Data Analysis at the University of Twente, the Netherlands. She studied Mathematics and Mathematics Education for her undergraduate and graduate studies, respectively. Her research interests lie in statistics, statistics education and mathematics education.

Rüya Gökhan Koçer is an assistant professor at the Department of Research Methods, Measurement and Data Analysis at the University of Twente, the Netherlands. His research in the field of political economy focuses on the legitimation of inequalities in advanced capitalist countries. He is also involved in medical research and health analytics.
