Spaced Learning in MOOCs: An Online A/B Testing Experiment

Final Version 2.0 05/07/2016

Mink Rohmer
University of Amsterdam
Student number: 10578633
Email: mink.rohmer@student.uva.nl
Word count: 4503
Word count abstract: 143

Abstract
1. Introduction
   1.1 Free and Open-access Online Learning: MOOCs
   1.2 A/B Testing in Online Environments
   1.3 Spaced Learning
2. Methods
   2.1 Participants
   2.2 Materials
   2.3 Procedure
3. Results
   3.1 Standardisation Checks
   3.2 Manipulation Check
   3.3 Main Analysis
   3.4 Further Exploratory Analysis
4. Discussion
5. References

Abstract

Psychological research in education over the past decades has shown that studying material in multiple sessions spread over time, known as spaced learning, is superior to studying the same material in one longer session, known as massed learning. In this article we investigate how the technique of spaced learning translates from traditional learning environments to the environment of the Massive Open Online Course (MOOC). A total of 61 participants were recruited from a course offered by the University of Amsterdam on the Coursera MOOC platform. These participants were split into three groups: a spaced learning group, a massed learning group and a control group. The three groups were compared on their final exam grades, which showed no significant difference between groups. There were, however, some methodological and technical problems that may have confounded these findings; these problems are also discussed in this paper.

1. Introduction

1.1 Free and Open-access Online Learning: MOOCs

In the current study we attempt to find a way to improve the effectiveness of a specific type of online learning: the Massive Open Online Course (MOOC). Even though MOOCs are increasingly popular, little research has been done to improve the extent to which a learner actually learns something from the course. In this study we attempt to do just that by applying the technique of spaced learning to a MOOC. The MOOC is a fairly new phenomenon in online learning. The term, which was coined by Stephen Downes and George Siemens in 2008 (McAuley et al., 2010), refers to courses that are offered in an online environment openly accessible to anyone with an internet connection. The ‘M’ in the acronym stands for massive: one of the things that characterizes MOOCs is the vast number of participants, or learners as they are referred to in the MOOC literature. The very first MOOC by Downes and Siemens already attracted 2,300 learners, far more than a traditional university course. Even though Downes and Siemens’ MOOC was quite different from the one we discuss in this paper, the numbers are still striking. Since the first MOOC, the number of participants has only risen, with the most popular MOOC to date attracting 119,297 participants (Online Course Report, 2015). Across the different platforms such as edX, Coursera and others, about 25 million learners have enrolled in the past three years (Zhenghao et al., 2015).

MOOCs consist of a combination of any of the following elements: video lectures, reading material, quizzes, writing assignments and other online learning activities. Most MOOCs have anywhere between 4 and 16 weeks’ worth of content. An example of the structure of an 8-week MOOC is as follows: there are video lectures and reading materials throughout weeks 1 to 7, with a quiz at the end of each week that covers that week’s subject matter. Every other week there is a writing assignment in which students write an essay about that week’s topic; students review and grade each other’s work. In the final week there is a quiz on all topics covered in the course. To pass the MOOC, learners have to complete all quizzes and writing assignments with satisfactory grades.

All MOOCs are accessible to anyone who wants to learn more about the topic being covered; all that is required is an internet connection. Aside from a few exceptions, the vast majority of MOOCs are offered for free; it only costs money if the learner wants an official certificate. The open and free nature of the MOOC explains why these courses are able to attract so many participants.

Because of the high number of participants, research data can accumulate quite quickly with MOOCs. Even though the different MOOC platforms have terabytes of data available, little experimental research is being done on MOOCs (Reich, 2015). The majority of the research on MOOCs so far has been correlational, using readily available data from a completed course and doing post-hoc analysis on that data (Murphy et al., 2014; Reich et al., 2014). Previous studies have found correlations between items completed within the course and overall course grade (Reich et al., 2014) and between total time spent on the MOOC and overall course grade (Murphy et al., 2014). While these are interesting results, we believe that the MOOC holds much more power as a scientific research platform than merely finding post-hoc correlations between sets of actions and final grades. The current study attempts to go beyond the correlational design and seeks a way to improve online learning in MOOCs by implementing different learning techniques in an experimental design.

1.2 A/B Testing in Online Environments

Online environments lend themselves very well to research, mainly because it is relatively easy to track the actions a user takes in an online environment and because internet users are so numerous. These characteristics of online traffic make it possible to generate a lot of data quickly and accurately, which is very useful when doing research. This has not gone unnoticed by big internet companies such as Amazon, Google and Microsoft, which have all been conducting controlled experiments online since as early as 2000 (Kohavi, Longbotham, Sommerfield, & Henne, 2008). The most popular form of online experiment is the A/B test: a new online feature is tested by dividing the visitors of a website into two groups, where group A gets the standard webpage and group B gets a different version of the page with the new feature. The experiment runs for as long as is required to get enough data, and then the two groups are compared on a relevant metric (Kohavi, Longbotham, Sommerfield, & Henne, 2008). The version of the website that performs better becomes the new default version.

The idea of A/B testing translates quite well from the environment of big company websites such as Amazon to the online learning environment of the MOOC. Instead of choosing the best version of a website, one can choose the best version of a course. Even though MOOCs lend themselves very well to an experimental approach, little research has been done on how to improve learning effectiveness in MOOCs.
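The A/B comparison described in this section can be illustrated with a minimal R sketch on simulated data. The number of visitors, the simulated metric and all variable names are assumptions made for illustration only; none of them come from the study or from any platform.

```r
# Hypothetical A/B test on simulated data (illustrative values only).
set.seed(1)
n_visitors <- 200

# Randomly assign each visitor to version A (default) or version B (new feature),
# keeping the two groups the same size.
version <- sample(rep(c("A", "B"), length.out = n_visitors))

# Simulated outcome metric, e.g. a score per visitor; not data from the study.
metric <- rnorm(n_visitors, mean = ifelse(version == "B", 7.3, 7.0), sd = 1)
ab_data <- data.frame(version = factor(version), metric = metric)

# Compare the two versions on the metric.
group_means <- tapply(ab_data$metric, ab_data$version, mean)
ab_test <- t.test(metric ~ version, data = ab_data)

print(group_means)
print(ab_test$p.value)
```

With three versions of a course instead of two versions of a web page, the same logic applies, with the two-group comparison replaced by a one-way ANOVA.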

1.3 Spaced Learning

Unlike MOOC research, quite a lot of research has been done on ‘traditional’ offline learning. Several effective learning interventions have been found in previous research (Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). One such learning intervention is spaced learning, also known as distributed practice. Spaced learning refers to revising the same material for shorter periods at different points in time. It is the opposite of massed learning, which refers to studying the same material for an extended period of time at once. A very robust finding in educational research is that spaced learning outperforms massed learning (Bloom & Shuell, 1981; Fishman, Keller, & Atkinson, 1968; Grote, 1995). According to Bellezza and Young (1989), spaced learning works well because being exposed to similar material elicits retrieval of previously encountered material, which strengthens memory of that material. Another explanation for the effectiveness of spaced learning is that the increased effort required to retrieve information after a delay has a more beneficial effect on retention than when the same material is presented sooner after the initial exposure (Jacoby, 1978). Both of these theories make it plausible that spaced learning would translate well to the world of MOOCs.

As of yet, very little research has been done on how these findings translate to the online classroom. A study by Kerfoot and colleagues (2010) on the effects of spaced learning in an online learning environment found that spaced learning improved long-term retention over regular online learning. A correlational study carried out through the HarvardX MOOC platform showed that students who spaced their sessions more had a higher chance of earning a certificate (Miyamoto et al., 2015). The Miyamoto study shows that spaced learning might be effective in the MOOC environment; however, this study was only correlational and thus did not explore the effect of incorporating spaced learning into a course’s design.

The aim of the current study is to further investigate whether spaced learning can improve learning in MOOCs, using an experimental design similar to online A/B testing. Based on previous research we expect that online learners will benefit from the effects of spaced learning. We assess the effects of spaced learning in the MOOC environment using an experimental between-subjects design with three conditions: a control condition, a spaced learning condition and a massed learning condition. Participants in the spaced learning condition complete the same course as the control condition, with additional content spaced out over the duration of the course. To make sure that differences found between the spaced learning and control conditions are not solely due to the exposure to additional content, there is also a massed learning condition. The massed learning condition receives the same content as the spaced learning condition, but not spaced out over multiple weeks. As such, the massed condition does not revisit the content from earlier weeks in later weeks, while the spaced condition does. Learners are evaluated on their result on a final exam covering all the subjects they studied during the course. We expect learners in the spaced condition to outperform the learners in the massed condition, who in turn outperform those in the control condition.

Besides planned hypothesis testing, we will also look at the data generated during our experiment in an exploratory fashion. Since MOOC research is still in a very early stage, we might come across other interesting findings during the analysis of the collected data. All exploratory efforts will be marked as being exploratory in the results section.

2. Methods

2.1 Participants

Participants for this experiment were recruited through a course offered by the University of Amsterdam (UvA) on the Coursera MOOC platform. All learners registered through the Coursera platform out of their own interest in the course; no participants were actively recruited by the UvA. At the beginning of the course all learners were asked for their consent to use their data in the current study. A technical error caused learners who denied consent to automatically fail the course; this issue was later solved so that no non-consenting learners were penalized. Learners who did not give consent for their data to be used were excluded from the data analysis. Participants were randomly assigned to three conditions.

The course used for the current study has been running since September 2014 and starts with a new cohort of learners every month. Course registration opens two weeks prior to the start of the course, and all course material from week 1 is accessible during this period. From the moment the course starts, learners have access to all eight weeks of course content. Learners are encouraged to complete the course in the recommended 8-week time span, but all learners can go through the course at their own pace.

2.2 Materials

The course used in the current study, called ‘Quantitative Methods’, focuses on the principles of scientific methods in the behavioral and social sciences. The course was developed and taught by staff from the University of Amsterdam. The course consisted of weekly video lectures, writing assignments and a quiz. All course material is available for free online on Coursera (https://www.coursera.org/learn/quantitativemethods). The course, which ran for 8 weeks, was structured as follows: the first 6 weeks consisted of theory, quizzes and writing assignments on a new topic every week; week 7 was dedicated to self-study as exam preparation and contained a practice exam; the final exam took place in week 8. The recommended time for learners to spend on the course was four to five hours per week.

During the first six weeks, learners were given video lectures on that week’s topic, consisting of about ten educational videos, each about three to five minutes long. In addition to the videos there was written content elaborating on the subjects discussed in the videos. At the end of the week learners took a quiz consisting of ten multiple choice questions. An example of a question from such a quiz: “What is the primary difference between a scientific theory and a scientific law?” For each question three answer options were given. Besides the weekly quizzes, learners also completed weekly writing assignments and graded each other’s work on these assignments. An example of such a writing assignment is one where learners were asked to create their own psychological questionnaire items. In the seventh week learners were given time to prepare for the final exam in week 8 and they took a practice exam.

The final week saw a final exam on all the topics discussed in weeks one through six. Both the final exam and the practice exam consisted of 30 multiple choice questions with three answer options. To evaluate learners’ performance during the course and test our hypotheses, we looked at learner grades on the final exam. The final exam grade was calculated as the proportion of correct answers out of 30 questions. To pass the final exam, a grade of at least 70% was required.

Participants were randomly assigned to three groups of equal size: a control group, a spaced learning group and a massed learning group. The control group was given the standard course with no extra content. The spaced learning and massed learning groups received additional content in the form of extra questions on the weekly quizzes in weeks one through six. The massed learning group was given ten extra questions every week on that week’s topic. The spaced learning group was given the same extra questions, but distributed over the remaining weeks. For a full overview of the number of extra questions given to each group, see Table 1. All extra questions are available in Appendix B. Due to technical errors, no extra questions were given to either the spaced or the massed learning group for week one.

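Week                          1    2    3    4    5    6
Control                       0    0    0    0    0    0
Massed                        0   10   10   10   10    0
Spaced, topic of week 1       0    0    0    0    0    0
Spaced, topic of week 2            4    3    2    1
Spaced, topic of week 3                 5    3    2
Spaced, topic of week 4                      5    5
Spaced, topic of week 5                          10
Spaced, topic of week 6                                0

Table 1: Schedule of extra questions on weekly quiz, for each group.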
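The schedule in Table 1 can be written down as a small matrix to check that both groups received the same number of extra questions per topic. The sketch below rests on an assumption about how the table is read: the spaced rows are taken to be the week whose topic the questions cover, and the columns the week of the quiz in which they appeared. The object names are illustrative and do not come from the study's scripts.

```r
# Assumed reading of Table 1: rows = week whose topic is covered,
# columns = week of the quiz that contained the extra questions.
weeks <- as.character(1:6)
spaced <- matrix(0, nrow = 6, ncol = 6,
                 dimnames = list(topic_week = weeks, quiz_week = weeks))
spaced["2", c("2", "3", "4", "5")] <- c(4, 3, 2, 1)
spaced["3", c("3", "4", "5")]      <- c(5, 3, 2)
spaced["4", c("4", "5")]           <- c(5, 5)
spaced["5", "5"]                   <- 10

# Massed group: all ten extra questions in the week of the topic itself.
massed <- c(0, 10, 10, 10, 10, 0)

questions_per_topic <- rowSums(spaced)  # extra questions per topic (spaced group)
questions_per_quiz  <- colSums(spaced)  # extra questions per weekly quiz (spaced group)

print(questions_per_topic)
print(questions_per_quiz)
```

Under this reading both groups answer the same number of extra questions on each topic; only the timing differs, with the spaced group's weekly quizzes growing longer towards week 5.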

2.3 Procedure

During the first week of the course, all participants followed the same curriculum. After the first week learners were split up into the three different conditions. It was not possible to divide learners into separate groups from the start of the course, since the course content from week 1 was visible before enrolling. Random assignment to the conditions was done through the Coursera platform. Every group of learners had a course environment separate from the other groups, so that learners could not communicate with learners from other groups through the forums or writing assignments.

All data for the current study was collected through the Coursera platform. The data was anonymized before data analysis to protect learner privacy. All data analyses were done using the R programming language and the RStudio integrated development environment. The R scripts used for the current study can be found in Appendix A and are also available digitally upon request. All anonymized data is also available upon request.

3. Results

3.1 Standardisation Checks

Of the 61 participants included in the data analysis, gender and age information was known for 45. Of these 45 participants, 23 were male. The mean age of participants was 41.20 years, with a standard deviation of 12.88. To check for systematic differences between groups, several standardisation checks were performed. We looked at the distribution of gender and the distribution of age between groups. Because it is not mandatory to fill in demographic data on Coursera, many participants did not fill in all possible demographic data. Because of the missing data it was not possible to conduct a statistical analysis of the difference in level of education between groups.

To check for a difference in the distribution of gender between groups we performed a chi-square test. The null hypothesis that there was no difference in the distribution of gender between groups could not be rejected, with χ²(2) = 3.2918, p = .193. For the distribution of gender for each condition, see Table 2. To check for a difference in the distribution of age between groups a one-way ANOVA was performed. There was no significant difference in the distribution of age between groups, with F(2) = .717, p = .494. For the means and standard deviations of age for each group, see Table 2.

Group (n)               Mean Age (SD)    Male / Female
Control (6)             43.71 (13.19)    1 / 5
Massed Learning (25)    41.00 (13.00)    14 / 11
Spaced Learning (14)    36.17 (12.29)    8 / 6

Table 2: Descriptive statistics for age and gender, for each group.
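The chi-square test on gender can be recomputed from the counts in Table 2. The sketch below is a minimal check and not the script used in the study; the object names are illustrative. Because some expected counts in the control group are below 5, R is expected to warn that the chi-square approximation may be inaccurate.

```r
# Male / female counts per group, taken from Table 2.
gender_counts <- matrix(c( 1,  5,
                          14, 11,
                           8,  6),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(group  = c("Control", "Massed", "Spaced"),
                                        gender = c("Male", "Female")))

# Pearson chi-square test of independence (no continuity correction is
# applied for tables larger than 2 x 2).
gender_test <- chisq.test(gender_counts)

print(gender_test$statistic)  # chi-square statistic
print(gender_test$parameter)  # degrees of freedom
print(gender_test$p.value)
```

The statistic and p-value obtained this way agree with the values reported in the text.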

3.2 Manipulation Check

Since learners could go through the course at their own pace, it was also possible for learners to complete the whole course within a very short time span. The effect of spaced learning depends on the distribution of study sessions over an extended period of time; it is therefore undesirable for learners to complete the course too quickly or to leave too little time between their study sessions. To make sure that learners could actually benefit from the effects of spaced learning, we checked whether participants on average completed the course within a reasonable time frame and left enough time between quizzes. The mean completion time for each group was greater than 4 weeks, which we deemed long enough for the current study’s purposes. The mean time between quizzes was roughly 4 days in each group (between 3.69 and 4.35 days; see Table 3). Previous research has shown the spacing effect to exist with as little as 1 and as much as 7 days between sessions (Kornell, 2009; Sobel, Cepeda, & Kapler, 2010); we therefore deemed a mean of roughly 4 days between quizzes to be sufficient for the current study’s purposes.

To check for standardisation of course completion time we performed a one-way ANOVA. The null hypothesis that there is no difference in the population means between groups could not be rejected, with F(2) = 1.504, p = .231. To check for standardisation of time between quizzes we also performed a one-way ANOVA. The null hypothesis that there is no difference in the population means between groups could not be rejected, with F(2) = 1.027, p = .364. For the means and standard deviations of course completion time and time between quizzes for each group, see Table 3.

Group (n)               Mean Course Completion Time in Weeks (SD)    Mean Time Between Quizzes in Days (SD)
Control (18)            4.77 (2.18)                                  3.69 (1.71)
Massed Learning (23)    5.62 (1.99)                                  4.26 (1.45)
Spaced Learning (20)    5.85 (2.18)                                  4.35 (1.44)

Table 3: Descriptive statistics for course completion time and time between quizzes, for each group.

3.3 Main Analysis

To test the hypothesis that the spaced learning condition outperforms the massed learning condition, which in turn outperforms the control condition, we used a one-way ANOVA. The assumptions necessary for a one-way ANOVA were all met. A Shapiro-Wilk test for normality showed that the null hypothesis that the data came from a normally distributed population could not be rejected, with p = .617, p = .262 and p = .553 for the control, massed learning and spaced learning conditions respectively. Levene’s test showed that the null hypothesis that the variances of the groups were equal could not be rejected, with F(2) = .110, p = .897.

The main one-way ANOVA was performed on the final exam grades between groups. The null hypothesis that there is no difference in the population means between groups could not be rejected, with F(2) = .253, p = .778. For the means and standard deviations of final exam grades for each group, see Table 4. For a visual presentation of the distribution of grades among the different groups, see Figure 1.

Group (n)               Mean Final Exam Grade (SD)
Control (18)            8.13 (1.18)
Massed Learning (23)    8.32 (1.10)
Spaced Learning (20)    8.10 (1.02)

Table 4: Descriptive statistics for final exam grades, for each group.

Figure 1: Distribution of grades on final exam for each group.
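As a rough check on the main analysis, the one-way ANOVA F statistic can be recomputed from the group sizes, means and standard deviations in the table of final exam grades. This is a sketch from rounded summary statistics rather than the raw grades, so small deviations from the reported F and p are to be expected; all variable names are illustrative.

```r
# Summary statistics for final exam grades: control, massed, spaced.
n <- c(Control = 18, Massed = 23, Spaced = 20)
m <- c(8.13, 8.32, 8.10)   # group means
s <- c(1.18, 1.10, 1.02)   # group standard deviations

grand_mean <- sum(n * m) / sum(n)

# Between-group and within-group sums of squares.
ss_between <- sum(n * (m - grand_mean)^2)
ss_within  <- sum((n - 1) * s^2)

df_between <- length(n) - 1        # 2
df_within  <- sum(n) - length(n)   # 58

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(c(F = f_value, p = p_value))
```

The result is close to the reported F(2) = .253, p = .778, which suggests the table and the test statistic describe the same data.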

3.4 Further Exploratory Analysis

In the analysis we saw that final exam grades were quite high for all participants. This could be in part because learners can easily look up answers to questions in another browser tab during the final exam. This makes it harder to detect differences between groups, since participants in all groups have instant access to all course material and as such can get very high grades regardless of their studying techniques. Because of this we decided to also look at the grades on the practice exam, which was taken the week before the final exam. The practice exam is not a mandatory part of the course and does not have to be completed with a satisfactory grade in order to pass the course. For this reason it is plausible that learners would not ‘cheat’ on the practice exam, so a difference between groups might be more likely to show up in the practice exam grades. There were two practice exams available in the course; we only looked at the first. More learners took the practice exam than the final exam, hence the discrepancy in the number of participants between this analysis and the main analysis discussed previously. We conducted a one-way ANOVA on practice exam grades. The null hypothesis that there is no difference in the population means between groups could not be rejected, with F(2) = 0.076, p = 0.927. For the means and standard deviations of the practice exam grades for each group, see Table 4.

Group (n)               Mean Practice Exam Grade (SD)
Control (20)            8.63 (1.59)
Massed Learning (25)    8.51 (1.10)
Spaced Learning (20)    8.63 (1.25)

Table 4: Descriptive statistics for practice exam grades, for each group.

4. Discussion

The aim of this study was to investigate whether spaced learning could improve learning in MOOCs. There was no significant difference in the mean final exam grades between the spaced learning, massed learning and control groups. The exploratory analysis of practice exam grades did not show a significant difference between groups either. The results do not indicate a clear benefit of applying spaced learning in MOOCs. That the expected results were not found might be partly due to aspects of the methodology of the current study and the nature of the collected data.

The first point of discussion is the low number of participants in the current study. One of the reasons that the MOOC should be an interesting research platform is the M in the acronym, Massive, but the course used in the current study failed to attract a massive number of learners. The low sample size means that the study has lower power, which means that it is less likely to detect an effect if one exists. A meta-analysis of 63 studies on spaced learning found a mean weighted effect size of .46 (Donovan & Radosevich, 1999), which is a small to medium effect size according to Cohen (1992). To detect a small to medium effect with a three-group one-way ANOVA at an adequate power of .80, at the very least 52 participants are needed in every group, according to Cohen (1992). This means that the current study might have been lacking in power, which could be a reason that no apparent effects of spaced learning were found.

Another point of discussion is that spaced learning is typically most effective for long-term retention (Karpicke & Roediger III, 2007). The current study only looked at quiz grades immediately after a final studying session and not at possible long-term retention benefits of spaced learning in MOOCs. A follow-up quiz some weeks after the course has ended might show the same beneficial effects of spaced learning that it is renowned for in the traditional classroom. For further research it might be very interesting to look at possible long-term effects of spaced learning in MOOCs with the use of a follow-up quiz. A problem with this approach is the possibility of a very low response rate on such a follow-up quiz, since learners might not be interested in taking more quizzes after they have already finished the course.

A third point of discussion is the apparent ceiling effect for the final exam grades. In total, 6 of the 61 participants included in the data analysis had the maximum final exam grade of 10 out of 10, and 16 of the 61 had a grade of 9.0 or higher. This could mean that the final exam was not difficult enough, or that learners looked up the relevant course content while taking the exam. The absence of a difference in exam grades between groups may have been due to the fact that it was too easy to get a high grade on the exam, regardless of the group participants were in. Another factor that could be related to the high final exam grades is that learners can retake the final exam once every 48 hours. Learners could take the exam the first time, fail it, and then retake the same exam with the very same questions 48 hours later. This would make it quite easy to get high grades on the exam, not only due to the testing effect but also because students could go back and study the relevant material after seeing the questions. Problems like these might be inherent to conducting research in organic research environments such as the MOOC and are difficult to solve. In a future study it could be interesting to look at the grades learners got on their very first final exam attempt; unfortunately that data was not available to us in the current study.

A final point of discussion is that, due to a technical error, it was only possible to split learners into the different groups after the first week had already been completed. Therefore it was only possible to apply the spaced learning techniques over weeks 2 through 5. Instead of 5 weeks of spaced content, there were only 4 weeks, which means that 20% of the spaced content was lost. With the reported effect size of spaced learning already being small to medium (Donovan & Radosevich, 1999), losing 20% of the spaced content could have had a considerable impact on the results found in this study. In a future study it could be beneficial to split learners into groups from the very start of the course.

As of yet, no clear benefit of using spaced learning in MOOCs has been found. This could be in part because of the discussion points raised in this section; further research is therefore still necessary to better understand how spaced learning works in the online classroom. Hopefully researchers across MOOC platforms will study the way spaced learning works in the online classroom and share their findings.
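The power argument made in this discussion can be explored with `power.anova.test()` from R's built-in stats package. The sketch below makes two assumptions that are not stated in the text: that the figure of 52 participants per group corresponds to Cohen's (1992) medium effect size for ANOVA, f = 0.25, and that a group size of 20 is a fair stand-in for the 18 to 23 learners per group in the main analysis.

```r
k       <- 3      # number of groups
cohen_f <- 0.25   # assumed effect size (Cohen's "medium" f)

# power.anova.test() expects the variance of the group means (computed with a
# k - 1 denominator). With the within-group variance fixed at 1, Cohen's f
# corresponds to between.var = k * f^2 / (k - 1).
between_var <- k * cohen_f^2 / (k - 1)

# Group size needed for a power of .80 at alpha = .05.
needed <- power.anova.test(groups = k, between.var = between_var,
                           within.var = 1, sig.level = 0.05, power = 0.80)

# Power achieved with roughly 20 learners per group.
achieved <- power.anova.test(groups = k, n = 20, between.var = between_var,
                             within.var = 1, sig.level = 0.05)

print(ceiling(needed$n))
print(achieved$power)
```

Under these assumptions the required group size comes out close to the figure cited from Cohen (1992), while the power achieved with about 20 learners per group falls well short of .80.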

5. References

Bellezza, F., & Young, D. (1989). Chunking of repeated events in memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(5), 990–997.
Bloom, K. C., & Shuell, T. J. (1981). Effects of Massed and Distributed Practice on the Learning and Retention of Second-Language Vocabulary. The Journal of Educational Research, 74(4), 245–248.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Deming, D., Yuchtman, N., Abulafi, A., Goldin, C., & Katz, L. (2016). The Value of Postsecondary Credentials in the Labor Market: An Experimental Study. American Economic Review, 106(3), 778–806.
Donovan, J., & Radosevich, D. (1999). A meta-analytic review of the distribution of practice effect: Now you see it, now you don't. Journal of Applied Psychology, 84(5), 795–805.
Dunlosky, J., Rawson, K., Marsh, E., Nathan, M., & Willingham, D. (2013). Improving Students' Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology. Psychological Science in the Public Interest, 14(1), 4–58.
Fishman, E. J., Keller, L., & Atkinson, R. C. (1968). Massed versus distributed practice in computerized spelling drills. Journal of Educational Psychology, 59(4), 290.
Grote, M. G. (1995). Distributed versus massed practice in high school physics. School Science and Mathematics, 95(2), 97–101.
Jacoby, L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17(6), 649–667.
Karpicke, J. D., & Roediger III, H. L. (2007). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57(2), 151–162.
Kohavi, R., Longbotham, R., Sommerfield, D., & Henne, R. (2008). Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1), 140–181.
Kornell, N. (2009). Optimising learning using flashcards: Spacing is more effective than cramming. Applied Cognitive Psychology, 23(9), 1297–1317.
McAuley, A., Stewart, B., Siemens, G., & Cormier, D. (2010). The MOOC model for digital practice. University of Prince Edward Island.
Miyamoto, Y., Coleman, C., Williams, J., Whitehill, J., Nesterko, S., & Reich, J. (2015). Beyond time-on-task: The relationship between spaced study and certification in MOOCs. Journal of Learning Analytics, 2(2), 47–69.
Murphy, R., Gallagher, L., Krumm, A. E., Mislevy, J., & Hafter, A. (2014). Research on the Use of Khan Academy in Schools: Research Brief. Menlo Park: SRI International.
Online Course Report (2015). The 50 Most Popular MOOCs of All Time. Retrieved 22 April 2016, from http://www.onlinecoursereport.com/the50mostpopularmoocsofalltime/
Reich, J. (2015). Rebooting MOOC Research. Science, 347, 34–35.
Reich, J., Emanuel, J., Nesterko, S., Seaton, D., Mullaney, T., & Waldo (2014). HeroesX: The Ancient Greek Hero: Spring 2013 Course Report. SSRN Electronic Journal.
Sobel, H., Cepeda, N., & Kapler, I. (2010). Spacing effects in real-world classroom vocabulary learning. Applied Cognitive Psychology, 25(5), 763–767.
Zhenghao, C., Alcorn, B., Christensen, G., Eriksson, N., Koller, D., & Emanuel, E. J. (2015). Who’s Benefiting from MOOCs, and Why. Retrieved 22 April 2016, from https://hbr.org/2015/09/whosbenefitingfrommoocsandwhy

Appendix A

R scripts

A.1 Main analysis library(dplyr) # Final Exam ID users_ids < as.data.frame(read.csv2('public.course_branch_items.csv', header = TRUE)) qi < as.matrix(users_ids[which(users_ids[,6] == 'Final Exam'), 2])[1] # Informed Consent course_items < as.data.frame(read.csv2('public.course_branch_items.csv', header = TRUE)) itemid < as.matrix(course_items[which(course_items[,6] == 'Informed Consent Form'), c(1,2)]) # Assign condition IDs c1 < itemid[1,1] c2 < itemid[2,1] c3 < itemid[3,1] item_answers < as.data.frame(read.csv2('public.course_formative_quiz_grades.csv', header = TRUE)) infans1 < item_answers[which(item_answers[,2] == itemid[1,2]), 5] # blijkbaar mensen die goedkeuring gaven al gefilterd... infans2 < item_answers[which(item_answers[,2] == itemid[2,2]), 5] infans3 < item_answers[which(item_answers[,2] == itemid[3,2]), 5] # eigenlijk wel logisch # Final exam dependent variable feg < as.data.frame(read.csv2('public.course_item_grades.csv', header = TRUE)) depraw < item_answers[which(item_answers[,2] == qi), c(3,5,6)] depgrad < feg[which(feg[,2] == qi), c(3,5,6)] depgradpas < depgrad[which(depgrad[,2] != 0), c(1,3)] as.matrix(depraw[,1]) %in% as.matrix(depgradpas[,1]) # Seperate IDs on conditions connec < as.data.frame(read.csv2('public.course_branch_grades.csv', header = TRUE))[,c(1,2)] ids1 < connec[which(connec[,1] == c1), 2] # user ids per conditie ids2 < connec[which(connec[,1] == c2), 2] ids3 < connec[which(connec[,1] == c3), 2] # Get indices of users in depraw i1 < na.omit(match(depraw[,1], ids1)) i2 < na.omit(match(depraw[,1], ids2)) i3 < na.omit(match(depraw[,1], ids3)) # Indicies of users in seperate conditions in depraw I1 < match(ids1[i1], depraw[,1])


# Grades for each condition
cijfers_con1 <- depraw[I1, 2]/30 * 10
cijfers_con2 <- depraw[I2, 2]/30 * 10
cijfers_con3 <- depraw[I3, 2]/30 * 10

# Means for each condition
m1 <- mean(cijfers_con1)
m2 <- mean(cijfers_con2)
m3 <- mean(cijfers_con3)
c(m1,m2,m3)

# ANOVA (each group is padded with NA up to the size of the largest group)
ccom1 <- c(cijfers_con1, rep(NA, times = max(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)) - length(cijfers_con1)))
ccom2 <- c(cijfers_con2, rep(NA, times = max(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)) - length(cijfers_con2)))
ccom3 <- c(cijfers_con3, rep(NA, times = max(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)) - length(cijfers_con3)))
cijfers <- c(ccom1, ccom2, ccom3)
length(na.omit(cijfers)) # total number of grades
condities <- c(rep('a',length(ccom1)), rep('b', length(ccom1)), rep('c',length(ccom1)))
data <- data.frame(cijfers, condities)
res <- aov(cijfers ~ condities, data = data)
summary(res)

# Gives same results
Data <- data.frame(cijfers = c(cijfers_con1, cijfers_con2, cijfers_con3),
                   condities = factor(rep(c("1", "2", "3"), times = c(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)))))
# aov() expects the contrasts as a named list
res2 <- aov(cijfers ~ condities, data = Data, contrasts = list(condities = 'contr.helmert'))
summary(res2)

# Planned contrasts: condition 1 vs. conditions 2 and 3, then condition 2 vs. 3
fcon <- factor(condities)
res3 <- anova(lm(cijfers ~ condities))
contrasts(fcon) <- cbind(c(1, -1/2, -1/2), c(0, 1, -1))
A <- aov(cijfers ~ fcon) # use fcon so that the contrasts set above apply
summary.lm(A)

## Assumption checks
# Assumption of normality
shapiro.test(cijfers_con1)
shapiro.test(cijfers_con2)
shapiro.test(cijfers_con3)

# Assumption of homogeneity of variance
library(Rcmdr) # loads car, which provides leveneTest()
leveneTest(Data[,1], Data[,2])


# Histograms of final exam grades, one panel per condition
layout(matrix(c(1, 2, 3), nrow = 3, ncol = 1, byrow = T));
hist(cijfers_con3, main = "Distribution of Grades on Final Exam \n Control",
     ylim = c(0,8), lwd = 4, xlim = c(5,10), xlab = "Grade on Final Exam",
     col = rgb(0, 0.8, 0, 0.5), breaks = 10)
hist(cijfers_con2, main = "Distribution of Grades on Final Exam \n Massed",
     ylim = c(0,8), xlim = c(5,10), col = rgb(0.8, 0, 0, 0.5), lwd = 4,
     breaks = 10, xlab = "Grade on Final Exam")
hist(cijfers_con1, main = "Distribution of Grades on Final Exam \n Spaced",
     ylim = c(0,8), xlim = c(5,10), col = rgb(0, 0, 0.8, 0.5), lwd = 4,
     breaks = 10, xlab = "Grade on Final Exam")
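
The planned comparison in the script above (one condition against the two others, then the remaining two against each other) can be illustrated with a minimal sketch. It assumes that the minus signs in the contrast matrix were lost in typesetting, and it runs on simulated grades: the data frame `grades`, the group sizes, the ordering of the conditions and the contrast labels are hypothetical and are not the study data.

```r
# Simulated grades on a 1-10 scale; hypothetical values, not the study data.
set.seed(1)
grades <- data.frame(
  grade = c(rnorm(20, mean = 8.0, sd = 0.8),
            rnorm(20, mean = 7.9, sd = 0.8),
            rnorm(20, mean = 8.1, sd = 0.8)),
  condition = factor(rep(c("spaced", "massed", "control"), each = 20),
                     levels = c("spaced", "massed", "control"))
)

# Planned contrasts (assumed): spaced vs. the other two, then massed vs. control.
contrasts(grades$condition) <- cbind(
  spaced_vs_rest    = c(1, -1/2, -1/2),
  massed_vs_control = c(0, 1, -1)
)

fit <- aov(grade ~ condition, data = grades)
summary(fit)     # omnibus F test
summary.lm(fit)  # t tests for the two planned contrasts
```

Attaching the contrasts to the factor that actually appears in the model formula is what makes `summary.lm()` report the planned comparisons rather than the default treatment contrasts.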


A.2 Standardization checks

# Conditions
course_items <- as.data.frame(read.csv2('public.course_branch_items.csv', header = TRUE))
itemid <- as.matrix(course_items[which(course_items[,6] == 'Informed Consent Form'), c(1,2)])

# Condition IDs
c1 <- itemid[1,1] # Spaced
c2 <- itemid[2,1] # Massed
c3 <- itemid[3,1] # Control

# Answers on gender question
demans <- as.data.frame(read.csv2('public.demographics_answers.csv', header = TRUE))
gans <- demans[which(demans[,1] == 11), c(2,4)]

# Answers on age question
lans <- demans[which(demans[,1] == 12), c(2,5)]

# Separate user ids by condition
connec <- as.data.frame(read.csv2('public.course_branch_grades.csv', header = TRUE))[,c(1,2)]
ids1 <- connec[which(connec[,1] == c1), 2] # user ids per condition
ids2 <- connec[which(connec[,1] == c2), 2]
ids3 <- connec[which(connec[,1] == c3), 2]

# Indices of user ids
i1 <- na.omit(match(gans[,1], ids1))
i2 <- na.omit(match(gans[,1], ids2))
i3 <- na.omit(match(gans[,1], ids3))

# Indices of user ids
I1 <- match(ids1[i1], gans[,1])
I2 <- match(ids2[i2], gans[,1])
I3 <- match(ids3[i3], gans[,1])

# Gender for each condition
gender_con1 <- gans[I1, 2]
gender_con2 <- gans[I2, 2]
gender_con3 <- gans[I3, 2]

# Age for each condition
age_con1 <- na.omit((lans[I1, 2] - 2016) * -1)
age_con2 <- na.omit((lans[I2, 2] - 2016) * -1)
age_con3 <- na.omit((lans[I3, 2] - 2016) * -1)

# Sum per condition of number of males
male1 <- sum(gender_con1 == 1)
male2 <- sum(gender_con2 == 1)
male3 <- sum(gender_con3 == 1)

# Sum per condition of number of females
female1 <- sum(gender_con1 == 0)
female2 <- sum(gender_con2 == 0)
female3 <- sum(gender_con3 == 0)

# Make table for chi square


# Chi square test
chisq.test(mavr)

# Make vectors for barplots
# NOTE: the labels here (1 = "Female") do not match the counting step (1 = male)
mf1 <- replace(gender_con1, gender_con1 == 1, "Female")
mf1 <- replace(mf1, mf1 == 0, "Male")
mf2 <- replace(gender_con2, gender_con2 == 1, "Female")
mf2 <- replace(mf2, mf2 == 0, "Male")
mf3 <- replace(gender_con3, gender_con3 == 1, "Female")
mf3 <- replace(mf3, mf3 == 0, "Male")

# Plot male vs female proportions
layout(matrix(c(1, 2, 3), nrow = 1, ncol = 3, byrow = T));
barplot(prop.table(table(mf1)), main = "Spaced")
barplot(prop.table(table(mf2)), main = "Massed")
barplot(prop.table(table(mf3)), main = "Control")

# Anova for age
cijfers <- c(age_con1, age_con2, age_con3)
condities <- c(rep('a',length(age_con1)), rep('b', length(age_con2)), rep('c',length(age_con3)))
data <- data.frame(cijfers, condities)
res <- aov(cijfers ~ condities, data = data)
summary(res)

# Plots
layout(matrix(c(1), nrow = 1, ncol = 1, byrow = T));
plot(density(age_con1), main = "Distribution of age", lwd = 4, xlim = c(15,75),
     ylim = c(0,0.1), xlab = "Age in years", col = rgb(0, 0.8, 0, 0.9))
lines(density(age_con2), col = rgb(0.8, 0, 0, 0.9), lwd = 4)
lines(density(age_con3), col = rgb(0, 0, 0.8, 0.9), lwd = 4)
polygon(density(age_con1), col = rgb(0, 0.8, 0, 0.2), border = rgb(0, 0.8, 0, 0.2))
polygon(density(age_con2), col = rgb(0.8, 0, 0, 0.2), border = rgb(0.8, 0, 0, 0.2))
polygon(density(age_con3), col = rgb(0, 0, 0.8, 0.2), border = rgb(0, 0, 0.8, 0.2))
legend(20, 0.09, c("Spaced", "Massed", "Control"), lty = c(1, 1, 1), lwd = c(3,3,3),
       col = c(rgb(0, 0.8, 0, 0.9), rgb(0.8, 0, 0, 0.9), rgb(0, 0, 0.8, 0.9)), bty = 'n')
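
The step that builds the contingency table passed to `chisq.test()` does not appear in the listing. A minimal sketch of how such a table could be constructed and tested is given below. The object name `gender_table`, the helper `count_by_gender()`, the coding (1 = male, 0 = female) and all counts are assumptions made for illustration; they are not taken from the study data.

```r
# Hypothetical gender codes per condition (assumed coding: 1 = male, 0 = female).
gender_spaced  <- rep(c(1, 0), times = c(12, 9))
gender_massed  <- rep(c(1, 0), times = c(10, 10))
gender_control <- rep(c(1, 0), times = c(11, 8))

# Count males and females within one condition.
count_by_gender <- function(g) c(male = sum(g == 1), female = sum(g == 0))

# 2 x 3 table: rows are genders, columns are conditions.
gender_table <- cbind(
  spaced  = count_by_gender(gender_spaced),
  massed  = count_by_gender(gender_massed),
  control = count_by_gender(gender_control)
)

test <- chisq.test(gender_table)
test$expected  # check that the expected counts are not too small
test
```

With very small cells the chi-square approximation becomes unreliable; `fisher.test(gender_table)` is a common alternative in that case.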


A.3 Manipulation Checks

usePackage = function(p) {
    if (!is.element(p, installed.packages()[, 1]))
        install.packages(p, dep = TRUE);
    require(p, character.only = TRUE);
}
usePackage('dplyr');

# connect to db
con = src_postgres(dbname = "coursera", host = "rens.amsterdam", user = "coursera", password = "Coursera123");

# get the course_id
#var_course_id <- con %>% tbl("courses") %>% filter(course_slug == "classicalsociologicaltheory") %>% select(course_id) %>% collect
#var_course_id <- as.matrix(var_course_id)[1];

# get the course_item_type_id's for 'quiz' and 'exam'
var_type_ids <- con %>% tbl("course_item_types") %>%
    filter(course_item_type_desc == "quiz" | course_item_type_desc == "exam") %>%
    select(course_item_type_id) %>% collect
var_type_ids <- as.matrix(var_type_ids);

# check how many users completed the final exam
var_completed_final_exam = con %>% tbl(sql(paste('
    SELECT DISTINCT(progress.amsterdam_user_id)
    FROM course_formative_quiz_grades AS progress
    INNER JOIN course_branch_items AS items ON items.course_item_id = progress.course_item_id
    WHERE items.course_branch_item_name = \'Final Exam\'
    AND items.course_branch_id != \'rTTFFgb8EeWJMSIAC7Jl0w\' -- Exclude unbranched items.
', sep = ''))) %>% collect

# get the grade timestamps for all users
var_timestamps_old = con %>% tbl(sql(paste('


    SELECT progress.amsterdam_user_id,
        MIN(progress.course_progress_ts) AS timestamp,
        grades.course_branch_id AS condition,
        0 AS difference,
        0 AS average,
        items.course_item_id,
        types.course_item_type_desc,
        progress.course_progress_state_type_id AS state
    FROM course_progress AS progress
    INNER JOIN course_branch_items AS items ON items.course_item_id = progress.course_item_id
    INNER JOIN course_item_types AS types ON types.course_item_type_id = items.course_item_type_id
    -- INNER JOIN course_grades AS grades ON grades.amsterdam_user_id = progress.amsterdam_user_id
    INNER JOIN course_branch_grades AS grades ON grades.amsterdam_user_id = progress.amsterdam_user_id
    WHERE items.course_item_type_id IN (', var_type_ids[1], ', ', var_type_ids[2], ')
    AND progress.course_progress_state_type_id = 2 -- Item has been completed.
    AND progress.course_progress_ts > \'20160410\' -- ... after the experiment has started.
    AND grades.course_passing_state_id > 0 -- Participants have completed the course.
    AND items.course_branch_id != \'rTTFFgb8EeWJMSIAC7Jl0w\' -- Exclude unbranched items.
    AND grades.course_branch_id != \'rTTFFgb8EeWJMSIAC7Jl0w\' -- Exclude before-experiment participants.
    AND progress.amsterdam_user_id IN (
        SELECT DISTINCT(progress.amsterdam_user_id)
        FROM course_formative_quiz_grades AS progress
        INNER JOIN course_branch_items AS items ON items.course_item_id = progress.course_item_id
        WHERE items.course_branch_item_name = \'Final Exam\'
        AND items.course_branch_id != \'rTTFFgb8EeWJMSIAC7Jl0w\' -- Exclude unbranched items.
    )
    GROUP BY progress.amsterdam_user_id, grades.course_branch_id, items.course_item_id,
        types.course_item_type_desc, progress.course_progress_state_type_id
    ORDER BY progress.amsterdam_user_id ASC,


        timestamp ASC
', sep = ''))) %>% collect

var_timestamps = con %>% tbl(sql(paste('
    SELECT progress.amsterdam_user_id,
        MIN(progress.course_quiz_grade_ts) AS timestamp,
        grades.course_branch_id AS condition,
        0 AS difference,
        0 AS average,
        items.course_item_id,
        types.course_item_type_desc
    FROM course_formative_quiz_grades AS progress
    INNER JOIN course_branch_items AS items ON items.course_item_id = progress.course_item_id
    INNER JOIN course_item_types AS types ON types.course_item_type_id = items.course_item_type_id
    -- INNER JOIN course_grades AS grades ON grades.amsterdam_user_id = progress.amsterdam_user_id
    INNER JOIN course_branch_grades AS grades ON grades.amsterdam_user_id = progress.amsterdam_user_id
    WHERE items.course_item_type_id IN (', var_type_ids[1], ', ', var_type_ids[2], ')
    AND progress.course_quiz_grade_ts > \'20160411\' -- ... after the experiment has started.
    AND grades.course_passing_state_id > 0 -- Participants have completed the course.
    AND items.course_branch_id != \'rTTFFgb8EeWJMSIAC7Jl0w\' -- Exclude unbranched items.
    AND grades.course_branch_id != \'rTTFFgb8EeWJMSIAC7Jl0w\' -- Exclude before-experiment participants.
    AND progress.amsterdam_user_id IN (
        SELECT DISTINCT(progress.amsterdam_user_id)
        FROM course_formative_quiz_grades AS progress
        INNER JOIN course_branch_items AS items ON items.course_item_id = progress.course_item_id
        WHERE items.course_branch_item_name = \'Final Exam\'
        AND items.course_branch_id != \'rTTFFgb8EeWJMSIAC7Jl0w\' -- Exclude unbranched items.
    )


    AND progress.amsterdam_user_id != \'ff069d3156f5001b96b0bff4f24810f7\'
    AND progress.amsterdam_user_id != \'c5dc966de13c5b4079d91280e755ca99\'
    GROUP BY progress.amsterdam_user_id, grades.course_branch_id, items.course_item_id,
        types.course_item_type_desc
    ORDER BY progress.amsterdam_user_id ASC, timestamp ASC
', sep = ''))) %>% collect

# Total number of unique user ids:
length(unique(var_timestamps$amsterdam_user_id));
users1 = unique(var_completed_final_exam$amsterdam_user_id);
users2 = unique(var_timestamps$amsterdam_user_id);
setdiff(users1, users2); # This should be empty.

# Replace the branch ids with readable condition names.
for(i in 1:nrow(var_timestamps)) {
    row = var_timestamps[i, ];
    if (row["condition"] == "branch~ip5_pQDmEeazAhK2M8bWmQ")
        var_timestamps[i, "condition"] = "control"
    else if (row["condition"] == "branch~iqDxTgDmEeapgRKsUGvS7w")
        var_timestamps[i, "condition"] = "spaced"
    else if (row["condition"] == "branch~iqkt6QDmEeaXKAohPoPYww")
        var_timestamps[i, "condition"] = "massed"
}

# Time in seconds between each quiz and the previous quiz of the same user.
for(i in 2:nrow(var_timestamps)) {
    row = var_timestamps[i, ];
    prev = var_timestamps[i - 1, ];
    # Check if user id's match.
    if (row["amsterdam_user_id"] == prev["amsterdam_user_id"]) {
        time1 = as.POSIXct(as.matrix(row["timestamp"][1]));
        time2 = as.POSIXct(as.matrix(prev["timestamp"][1]));
        var_timestamps[i, "difference"] = difftime(time1, time2, units = "secs");
    }
}

aggr = aggregate(var_timestamps$difference, list(var_timestamps$amsterdam_user_id), mean);
aggr_cond = aggregate(var_timestamps$condition, list(var_timestamps$amsterdam_user_id), function(values) {
    return(values[1]);
});


var_summary = data.frame(
    user = character(nrow(aggr)),
    condition = numeric(nrow(aggr)),
    avg_sec = numeric(nrow(aggr)),
    started = numeric(nrow(aggr)),
    ended = numeric(nrow(aggr))
);
var_summary$user = aggr[, 1];
var_summary$condition = aggr_cond[, 2];
var_summary$avg_sec = aggr[, 2] / (60 * 60); # Seconds to hours.
var_summary$started = aggr_start[, 2];
var_summary$ended = aggr_end[, 2];
var_summary$duration = as.double(difftime(var_summary$ended, var_summary$started, units = "weeks"));
var_summary;

# N per condition:
nrow(var_summary[var_summary$condition == "control", ]);
nrow(var_summary[var_summary$condition == "massed", ]);
nrow(var_summary[var_summary$condition == "spaced", ]);

# Histograms:
def.par = par(no.readonly = T);
xmax = max(var_summary$avg_sec);
layout(matrix(c(1), nrow = 1, ncol = 1, byrow = T));
hist(var_summary$avg_sec, prob = T, xlim = c(0, xmax), breaks = 100,
     main = "Average time between quizzes (overall)", xlab = "Time in hours");
lines(density(var_summary$avg_sec));

layout(matrix(c(1, 2, 3), nrow = 3, ncol = 1, byrow = T));
hist(var_summary[var_summary$condition == "control", ]$avg_sec, prob = T, xlim = c(0, xmax),
     breaks = 100, main = "Control", xlab = "Time in hours");
lines(density(var_summary[var_summary$condition == "control", ]$avg_sec));
hist(var_summary[var_summary$condition == "massed", ]$avg_sec, prob = T, xlim = c(0, xmax),
     breaks = 100, main = "Massed", xlab = "Time in hours");
lines(density(var_summary[var_summary$condition == "massed", ]$avg_sec));
hist(var_summary[var_summary$condition == "spaced", ]$avg_sec, prob = T, xlim = c(0, xmax),
     breaks = 100, main = "Spaced", xlab = "Time in hours");
lines(density(var_summary[var_summary$condition == "spaced", ]$avg_sec));

# The 2 outliers in the spaced condition completed only 2 quizzes, about a month apart:
var_timestamps[var_timestamps$amsterdam_user_id == 'ff069d3156f5001b96b0bff4f24810f7', ];
var_timestamps[var_timestamps$amsterdam_user_id == 'c5dc966de13c5b4079d91280e755ca99', ];

xmax = max(var_summary$duration);


hist(var_summary$duration, prob = T, xlim = c(0, xmax), breaks = 100,
     main = "Course completion duration (overall)", xlab = "Time in weeks");
lines(density(var_summary$duration));

layout(matrix(c(1, 2, 3), nrow = 3, ncol = 1, byrow = T));
hist(var_summary[var_summary$condition == "control", ]$duration, prob = T, xlim = c(0, xmax),
     breaks = 100, main = "Control", xlab = "Time in weeks");
lines(density(var_summary[var_summary$condition == "control", ]$duration));
hist(var_summary[var_summary$condition == "massed", ]$duration, prob = T, xlim = c(0, xmax),
     breaks = 100, main = "Massed", xlab = "Time in weeks");
lines(density(var_summary[var_summary$condition == "massed", ]$duration));
hist(var_summary[var_summary$condition == "spaced", ]$duration, prob = T, xlim = c(0, xmax),
     breaks = 100, main = "Spaced", xlab = "Time in weeks");
lines(density(var_summary[var_summary$condition == "spaced", ]$duration));

# ANOVA average time
times = c(
    var_summary[var_summary$condition == "control", ]$avg_sec,
    var_summary[var_summary$condition == "massed", ]$avg_sec,
    var_summary[var_summary$condition == "spaced", ]$avg_sec
);
groups = c(
    rep('control', length(var_summary[var_summary$condition == "control", ]$avg_sec)),
    rep('massed', length(var_summary[var_summary$condition == "massed", ]$avg_sec)),
    rep('spaced', length(var_summary[var_summary$condition == "spaced", ]$avg_sec))
);
data = data.frame(times, groups);
res = aov(times ~ groups, data = data);
summary(res);

# ANOVA course duration
times = c(
    var_summary[var_summary$condition == "control", ]$duration,
    var_summary[var_summary$condition == "massed", ]$duration,
    var_summary[var_summary$condition == "spaced", ]$duration
);
groups = c(
    rep('control', length(var_summary[var_summary$condition == "control", ]$duration)),
    rep('massed', length(var_summary[var_summary$condition == "massed", ]$duration)),
    rep('spaced', length(var_summary[var_summary$condition == "spaced", ]$duration))
);
data = data.frame(times, groups);
res = aov(times ~ groups, data = data);
summary(res);
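
The manipulation check rests on two per-user quantities: the average time between consecutive quizzes and the total time between the first and the last quiz. A minimal vectorised sketch of both is given below. The data frame `quiz_log`, its user ids and its timestamps are hypothetical. Unlike the loop in the listing, this sketch leaves the first quiz of each user out of the average instead of counting it as a gap of zero, which is an assumption about the intended measure rather than a reproduction of the original computation.

```r
# Hypothetical quiz completion log; user ids and timestamps are made up.
quiz_log <- data.frame(
  user = c("u1", "u1", "u1", "u2", "u2"),
  timestamp = as.POSIXct(c("2016-04-11 10:00:00", "2016-04-12 10:00:00",
                           "2016-04-14 10:00:00", "2016-04-11 09:00:00",
                           "2016-04-11 15:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
quiz_log <- quiz_log[order(quiz_log$user, quiz_log$timestamp), ]
secs <- as.numeric(quiz_log$timestamp)

# Gap in hours to the previous quiz of the same user (NA for the first quiz).
quiz_log$gap_hours <- ave(secs, quiz_log$user,
                          FUN = function(t) c(NA, diff(t))) / 3600

# Per-user average gap, and course duration from first to last quiz.
avg_gap <- tapply(quiz_log$gap_hours, quiz_log$user, mean, na.rm = TRUE)
started <- tapply(secs, quiz_log$user, min)
ended   <- tapply(secs, quiz_log$user, max)

per_user <- data.frame(
  user           = names(avg_gap),
  avg_gap_hours  = as.vector(avg_gap),
  duration_weeks = as.vector(ended - started) / (60 * 60 * 24 * 7)
)
per_user
```

The resulting `avg_gap_hours` and `duration_weeks` columns play the role of `avg_sec` and `duration` in the listing and could be passed to `aov()` in the same way.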



A.4 Exploratory Analysis

library(dplyr)

# Practice exam ID
users_ids <- as.data.frame(read.csv2('public.course_branch_items.csv', header = TRUE))
qi <- as.matrix(users_ids[which(users_ids[,6] == 'Practice Exam 1 immediate feedback'), 2])[1]

# Informed Consent
course_items <- as.data.frame(read.csv2('public.course_branch_items.csv', header = TRUE))
itemid <- as.matrix(course_items[which(course_items[,6] == 'Informed Consent Form'), c(1,2)])

# Assign condition IDs
c1 <- itemid[1,1]
c2 <- itemid[2,1]
c3 <- itemid[3,1]

item_answers <- as.data.frame(read.csv2('public.course_formative_quiz_grades.csv', header = TRUE))
infans1 <- item_answers[which(item_answers[,2] == itemid[1,2]), 5] # apparently people who gave consent were already filtered...
infans2 <- item_answers[which(item_answers[,2] == itemid[2,2]), 5]
infans3 <- item_answers[which(item_answers[,2] == itemid[3,2]), 5] # which actually makes sense

# Practice exam dependent variable
feg <- as.data.frame(read.csv2('public.course_item_grades.csv', header = TRUE))
depraw <- item_answers[which(item_answers[,2] == qi), c(3,5,6)]
depgrad <- feg[which(feg[,2] == qi), c(3,5,6)]
depgradpas <- depgrad[which(depgrad[,2] != 0), c(1,3)]
as.matrix(depraw[,1]) %in% as.matrix(depgradpas[,1])

# Separate IDs by condition
connec <- as.data.frame(read.csv2('public.course_branch_grades.csv', header = TRUE))[,c(1,2)]
ids1 <- connec[which(connec[,1] == c1), 2] # user ids per condition
ids2 <- connec[which(connec[,1] == c2), 2]
ids3 <- connec[which(connec[,1] == c3), 2]

# Get indices of users in depraw
i1 <- na.omit(match(depraw[,1], ids1))
i2 <- na.omit(match(depraw[,1], ids2))
i3 <- na.omit(match(depraw[,1], ids3))

# Indices of users in separate conditions in depraw
I1 <- match(ids1[i1], depraw[,1])
I2 <- match(ids2[i2], depraw[,1])
I3 <- match(ids3[i3], depraw[,1])

# Grades for each condition
cijfers_con1 <- depraw[I1, 2]/30 * 10
cijfers_con2 <- depraw[I2, 2]/30 * 10
cijfers_con3 <- depraw[I3, 2]/30 * 10


# Means for each condition
m1 <- mean(cijfers_con1)
m2 <- mean(cijfers_con2)
m3 <- mean(cijfers_con3)
c(m1,m2,m3)

# ANOVA (each group is padded with NA up to the size of the largest group)
ccom1 <- c(cijfers_con1, rep(NA, times = max(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)) - length(cijfers_con1)))
ccom2 <- c(cijfers_con2, rep(NA, times = max(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)) - length(cijfers_con2)))
ccom3 <- c(cijfers_con3, rep(NA, times = max(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)) - length(cijfers_con3)))
cijfers <- c(ccom1, ccom2, ccom3)
length(na.omit(cijfers)) # total number of grades
condities <- c(rep('a',length(ccom1)), rep('b', length(ccom1)), rep('c',length(ccom1)))
data <- data.frame(cijfers, condities)
res <- aov(cijfers ~ condities, data = data)
summary(res)

# Gives same results
Data <- data.frame(cijfers = c(cijfers_con1, cijfers_con2, cijfers_con3),
                   condities = factor(rep(c("1", "2", "3"), times = c(length(cijfers_con1), length(cijfers_con2), length(cijfers_con3)))))
# aov() expects the contrasts as a named list
res2 <- aov(cijfers ~ condities, data = Data, contrasts = list(condities = 'contr.helmert'))
summary(res2)

# Planned contrasts: condition 1 vs. conditions 2 and 3, then condition 2 vs. 3
fcon <- factor(condities)
res3 <- anova(lm(cijfers ~ condities))
contrasts(fcon) <- cbind(c(1, -1/2, -1/2), c(0, 1, -1))
A <- aov(cijfers ~ fcon) # use fcon so that the contrasts set above apply
summary.lm(A)

## Assumption checks
# Assumption of normality
shapiro.test(cijfers_con1)
shapiro.test(cijfers_con2)
shapiro.test(cijfers_con3)

# Assumption of homogeneity of variance
library(Rcmdr) # loads car, which provides leveneTest()
leveneTest(Data[,1], Data[,2])


Appendix B

Extra Quiz Material Week 2

1. What is a possible solution to maturation?
   a. Select participants at random from a population.
   b. Control for extraneous influences.
   c. Create a separate group of participants, which is not manipulated. x

2. When external validity is low
   a. It is unlikely that the manipulation caused the observed effect.
   b. No strong generalizations can be made to other contexts. x
   c. The observed effect can be explained by flaws in the research design.

3. What should you do when the data you found are not in accordance with the predictions you made on the basis of your hypothesis?
   a. Start collecting new data to test your hypothesis again, so you can be more certain it was correctly rejected.
   b. Reject your hypothesis and start thinking about new hypotheses.
   c. Evaluate whether the hypothesis can be adjusted on the basis of your results; if so, adjust your hypothesis and collect new data to test your adjusted hypothesis. x

A researcher wants to investigate whether people who take notes on a laptop during lectures learn more than people who take notes on paper. She does this by keeping a record of which people use a laptop and which use paper to take notes. At the end of the course she compares the grades on the final test of students who used a laptop with those of students who used paper. Students who took notes on paper scored significantly higher on the final test than students who used a laptop.

4. What conclusion can be drawn from the above example?
   a. Taking notes on paper improves learning.
   b. Taking notes on paper makes students score higher on the final test.
   c. Students who took notes on paper during lectures scored higher than students who took notes on a laptop. x

5. A colleague of the researcher in the above example questions whether grades on the final test are a good measure of learning. About what kind of validity is her colleague doubtful?
   a. External validity
   b. Construct validity x
   c. Internal validity

6. Another colleague thinks this relation can be explained by the fact that people who take notes on paper get less distracted by irrelevant stimuli than people who take notes on a laptop. He replicates the original study and measures the amount of distraction students in the two groups experience. Distraction in this study is a
   a. Background variable
   b. Confounding variable
   c. Control variable x

7. When different independent researchers, at different points in time, find the same results, this means
   a. The results have been shown to be reliable x
   b. The results have been shown to be valid


Suppose you are interested in differences in happiness between men and women. The hypothesis you came up with predicts that men are on average happier than women. You think it is important to observe the subjects in their natural setting, so you go to the park and rate men and women on how happy they look. The results show that men look on average happier than women.

8. What is the dependent variable and what is the independent variable?
   a. The dependent variable is happiness and the independent variable is gender. x
   b. The independent variable is amount of happiness and the dependent variable is gender.
   c. The dependent variable is happiness, but there is no independent variable in the above example.

9. What could be an alternative explanation for the results found in the above example?
   a. Demand characteristics
   b. Experimenter expectancy effect x
   c. Background variables

10. The research in the above example is
   a. Experimenter-blind
   b. Double-blind
   c. Neither experimenter-blind nor double-blind x

Extra questions for the exam:

1. Which statement is true?
   a. Through induction, predictions are formed on the basis of a hypothesis.
   b. On the basis of predictions, a hypothesis is formed through induction.
   c. Predictions are formed through deduction from a hypothesis. x

2. An experimenter-blind experiment avoids problems with
   a. Experimenter expectancy effects x
   b. Lurking variables
   c. Demand characteristics

3. Choose the right answer.
   I. Empirical statements can never be conclusively rejected.
   II. Empirical statements can never be proven.
   a. I is true and II is false.
   b. I is false and II is true. x
   c. I and II are both true.

4. What is true about causality?
   a. When a correlation exists between two variables, this means a causal relation is not possible between the two variables.
   b. When a causal relation exists between two variables, this means these two variables necessarily correlate. x
   c. Causality does not imply correlation.
