Academic year: 2021

What’s in a Game? Game Based Learning and Rehearsal

Strategies: Two Experiments on Addition Problems

E.J.S.S. Tan

Student number: 6065090

Master’s thesis Psychological Methods

Mentor: dr. R.J. Zwitser


Abstract

In this thesis two experiments on rehearsal strategies within game based learning (GBL) are presented. A set of math problems was administered on Squla, a web-based GBL platform. Experiment 1 was marked by severe dropout, which resulted in the omission of one of the three rehearsal strategies. In experiment 2 a memory rehearsal (MR) and an operation rehearsal (OR) intervention are compared. Rehearsal of incorrect items without feedback (OR, n = 208) was expected to increase learning efficiency compared to rehearsal of all items with feedback (MR, n = 189). However, no difference was found. Explorative research did show that success rates are positively related to engagement. It is proposed that future research on the learning efficiency of GBL should include the influence of difficulty on engagement within the research design.


1. Introduction

Educational games are generally accepted as a tool for educational purposes. Game based products range from single app games to interactive e-learning environments (Vonderwell & Boboc, 2013; Slade & Prinsloo, 2013). An assumption underlying game based learning (GBL) is that it facilitates learning through the experience of flow, a mental state in which one is completely immersed in a task (Beard, 2014; Kiili, 2005). As a consequence, engagement is maintained by both enjoyment and intrinsic motivation (Jabbar, Iliya, & Felicia, 2015). A great deal of research within GBL focuses on attitudes towards GBL (Huang, Huang, & Tschopp, 2010; Jabbar et al., 2015), often measured by a motivation or experience questionnaire. Understanding the psychological mechanisms underlying the effects of GBL is undoubtedly important, especially as a feedback mechanism for design and engagement related hypotheses (Boyle, Connolly, Hainey, & Boyle, 2012). However, relatively little research has been done on whether GBL promotes learning (Jabbar et al., 2015), the main focus of the following article.

1.1 GBL Attitude and Learning Efficiency

Little research was found in which both motivational aspects and learning efficiency of GBL are included. Moreover, as the three studies described below illustrate, results appear to be mixed between studies. In a study by Boeker, Andel, Werner, and Frankenschmidt (2013), a group of 145 medical students was randomly assigned to one of two instructional methods aimed at learning about urine pathologies. Students in the GBL condition were trained by an educational adventure game; students in the control condition got a written script. Scores on the posttest were higher for the students in the GBL group. Results also showed that students in the GBL condition rated higher on positive learning experience. The researchers were mildly enthusiastic regarding the usage of GBL within the medical domain, since former study results were predominantly unconvincing (Boeker et al., 2013).

Wang, Tsai, Chou, and Hung (2010) investigated the effect of GBL on motivation and reasoning skills, the ability to solve complex problems, in 124 elementary school children. The experimental group got a GBL intervention of three strategic thinking games during a six-week trial; the control group got regular teaching without GBL. The experimental group scored higher on educational motivation than the control group, although no difference was found for reasoning skills.

In the experiment of Kebritchi, Hirumi, and Bai (2010) the effects of instructional computer games on students’ motivation and mathematics achievement were examined. Ten algebra teachers were randomly assigned to intervention and control groups, which resulted in a total sample of 193 high school students. The intervention consisted of several games, including a single player and a team player game in which mathematical concepts were learned through the execution of virtual missions. The control group did not receive a game based intervention. The treatment group scored higher on math achievement than the control group. However, no difference was found between groups concerning math motivation.

The three studies above illustrate that the effect of GBL on both motivation and learning achievement is mixed. When interested in GBL design, it is difficult to pinpoint which factors in the games could be the reason for those mixed results. Clearly the learning subjects differ between studies, but perhaps more important is that few words are dedicated to game elements that could have influenced the results substantially. An example is the usage of three distinctive games instead of one game to facilitate learning in the study of Kebritchi et al. (2010). It is not clear why the researchers thought including multiple games was necessary. Moreover, the intervention included a multiple player game next to a single player game, while nothing is mentioned on how these different types of games are expected to affect learning. Another example is the study of Wang et al. (2010), in which the teacher is present as both instructor and monitor during the game. Again, nothing is mentioned on how this could affect the results, although it seems likely that a teacher is a crucial factor for both motivation and learning. These examples show that distinctive design choices are made between studies, but the studies lack an explanation of why those choices were made. This can be considered a flaw in research methodology, since the consequences of these unexplained choices could influence motivation and learning substantially or may even be confounding factors. As a consequence, interpretation of results is difficult and therefore uninformative as a starting point for effective game design. In the present research it is proposed that an effect of GBL on learning achievement is only meaningful when founded on an articulated vision on learning, which in turn paves the way for a clear methodology for both present and future research on GBL.

Because the current research was framed by an internship at Squla, its scope is delimited to test-like GBL design, characterized by a multiple choice item format, in which rehearsal of items is used as a learning strategy. Research on the relation between memory and learning will be discussed hereafter, followed by studies that investigated rehearsal strategies in relation to math problems, the learning subject of the present study. The section will be concluded by a description of the present research.


In the quest for a clear methodology in GBL design, principles from learning theory were chosen as a starting point for research. A substantial body of literature has been written on the testing effect, the observation that repeated test trials of educational material combined with reading and repeated studying is a more effective learning strategy than reading and repeated studying alone (Carpenter, 2012; Jönsson, Hedner, & Olsson, 2012; van Gog & Sweller, 2015; Soderstrom & Bjork, 2014). Besides Squla there are more web-based learning platforms, such as Snappet and Oefenweb, that base their game design on the assumption that children learn by means of answering short multiple choice questions in an attractive GBL environment. Nevertheless, the design of such gamy multiple choice questions can still differ greatly, for example in the way rehearsal is applied. In order to determine what kind of rehearsal is effective for learning, studies on memory based rehearsal strategies are discussed first, followed by the proposition of a non memory based rehearsal strategy for math problems.

1.3 Memory Based Rehearsal Strategies

Next to the earlier mentioned testing effect, another well investigated memory based learning concept is the distribution of rehearsal. This area of research was initiated by Ebbinghaus in 1885 (Mulligan & Peterson, 2014; Schutte, Duhon, Solomon, Poncy, Moore, & Story, 2015). Extensively studied distributions of rehearsal are spaced rehearsal (SR), the distribution of information over several sessions (Schutte et al., 2015), and, as opposed to SR, massed rehearsal (MR), high frequency rehearsal within one session (Smith, 2012). In general, learning results are in favor of SR strategies when compared to MR strategies, assumedly because in a SR strategy forgetting occurs between the rehearsals while in a MR strategy it does not (Carpenter & DeLosh, 2005; Gerbier & Toppino, 2015; Schutte et al., 2015). Interestingly, most studies on distributed rehearsal are directed at syllable or nonsense syllable retrieval. When searching for insights on how math practice is affected by rehearsal, much less information is at hand, which is the subject of the next paragraph.

1.4 Rehearsal Strategies and Math

Although little research has been done, some evidence can be found in support of rehearsal strategies focused on improving math fact fluency, which is the quick and correct response to basic math operations in the domains of addition, subtraction, multiplication and division (Bailey, Littlefield, & Geary, 2012; Musti-Rao, Lynch, & Plati, 2015). In this section two research designs on the effectiveness of rehearsal for math fact fluency will be described in detail. The first study serves as an example of memory based rehearsal. The second study contributes to the idea of another kind of rehearsal strategy. This strategy is defined as operation rehearsal (OR), in which repetition can be seen as an exercise of skill acquisition, not as an exercise of memory.

In a study of Nelson, Burns, Kanive, and Ysseldyke (2013) the effect of a computer program intervention on the application of single-digit multiplication facts was compared to a control group. Participants were around 60 third- and fourth grade students with math difficulties. The computer practice intervention consisted of a set of math problems that had to be solved. Feedback was given after both correct and incorrect answers, which included providing the right answer. Immediately afterwards, students were allowed to repeat the item. Results on a posttest indicated that the intervention led to higher mean digits correct per minute (DCPM) scores on retention of new math facts compared to the control group.

Schutte et al. (2015) investigated whether a one minute practice intervention led to increased performance on math facts. In this small scale study 48 students were placed in one of three conditions. Over a period of 19 days each group received, four times a day, a one minute intervention consisting of solving basic math problems. Distribution was either massed in one session, spaced over the whole day, or spaced over a double session in the morning and a double session later that day. Results favoured the SR conditions, which both showed a higher mean number of DCPM than the MR condition.

The studies above both support rehearsal practice for learning simple math facts. The difference in their approaches to rehearsal is implicit but becomes obvious when taking a closer look at the rehearsal designs. The approach of Nelson et al. (2013) can be considered memory based, since correct answers were given as feedback in order to improve performance on a second item response (Karpicke & Roediger, 2008). The second approach can be considered an operation based rehearsal strategy, since Schutte et al. (2015) focussed on the repeated practice of an ability, not the actual repetition of specific content. Seemingly, the assumption underlying the design of Schutte et al. (2015) is that math skill acquisition is accomplished through repeated practice of the operation, rather than through memory retrieval as proposed by Nelson et al. (2013).

What to think of the memory based approach, in which correct answers are given as feedback upon rehearsal? According to the famous learning theorist Edwin Guthrie (1942), making an error strengthens undesirable response patterns even when one is conscious of this process (Kang, Pashler, Cepeda, Rohrer, Carpenter, & Mozer, 2011). Hence, it seems legitimate to add the correct answer as feedback, including a direct rehearsal trial as in the study by Nelson et al. (2013), because it provides the opportunity to instantly overwrite an error with a successful response and thus strengthen the desired behavior. On the other hand, one could doubt how overwriting the error with a successful response retrieved from memory would lead to improved math ability. The operations of multi-digit problems, the interest of the current research, can be considered complex since they require a carry in the addition and are not solely based on retrieval processes (van der Ven, Straatemeier, Jansen, Klinkenberg, & van der Maas, 2015). Therefore, remembering an answer to overwrite an earlier error does not seem an effective intervention to promote learning for complex math problems.

The approach to learning of complex math tasks in the present research shares some commonalities with the theory of intelligence of Sternberg (1985; Sternberg, 1989; cited in Fein & Day, 2004). This theory of intelligence distinguishes three components, of which two, the knowledge-acquisition components and the performance components, reflect the two rehearsal approaches at hand. The knowledge-acquisition components are referred to as processes that support learning of new information and storing it in memory, which are analogous to the MR strategy. The performance components, described as mental processes that support stimuli encoding, inferring relations between stimuli, and applying previously learned relations to new situations, are analogous to the OR strategy (Fein & Day, 2004). By taking rehearsal as a possible learning strategy, an effort is made to clarify in what way learning can be promoted by GBL with regards to complex math tasks. To our knowledge, no study has ever compared the effects of memory based rehearsal versus operation based rehearsal on complex math problems before. Implications for the GBL design will be explained in more detail in the methods section.


1.5 The Present Research

This study was performed as part of an internship at the GBL platform Squla, and therefore certain game characteristics were the departure point of this research. This Dutch online learning platform offers missions: gamy multiple choice questions in a rehearsal format closely resembling testing. Answers to multiple choice questions are given through a catapult throw at a can, as shown in Figure 1, or by a click on an answer bar, see Figure 2.


Figure 2. Squla interface with multiple choice addition problem.

The assumption underlying this GBL design is that learning outcomes improve through a testing effect. A second assumption of Squla’s game design is that repetition with correct answer feedback in the same game session leads to improved learning achievement. A last, yet important, characteristic of the player logic is that children can leave and re-enter the missions any time they choose, resulting in self-regulated distribution of learning.

The purpose of this research was to study GBL rehearsal designs based on formal theory and empirical research findings while staying close to the existing GBL design in Squla. The effect of different rehearsal strategies on complex addition tasks was tested within an existing online game environment. In the first experiment, three rehearsal strategies are examined for learning efficiency concerning complex math problems. The SR strategy is expected to promote learning efficiency because in SR forgetting can occur between the rehearsal of items. The OR strategy is believed to enhance learning as well, since it focusses on rehearsal of the operation and, as proposed by Guthrie, problems are practiced until all responses are correct while memory effects are reduced as much as possible. The MR strategy is not expected to promote learning since it addresses short term memory. Therefore, in experiment 1, through different GBL interventions, both SR and OR are expected to increase learning efficiency with regards to complex math problems while MR is not. In experiment 2 adjustments in the research design are made based on results from the first experiment. Furthermore, descriptives and explorative research on dropout will be provided. Due to engagement problems that were encountered in experiment 1, it was decided to test only MR strategy versus OR strategy in experiment 2. Further, motivated by results of experiment 2, explorative research on condition mission data was performed to investigate whether engagement was influenced by success rate and rehearsal strategy. Finally, results will be discussed together with recommendations for further research on effects of GBL.


2. Experiment 1 Method

The independent variable of this randomized experiment is rehearsal strategy. Rehearsal strategy is operationalized through three different GBL rehearsal condition missions, of which MR and SR are both memory based interventions and OR is an operation based intervention. The dependent variable, learning efficiency with regards to complex math tasks, is operationalized with a posttest mission that is the same for all conditions. In a pilot study it was found that administration of more than 15 items resulted in substantial dropout. Since the interest of the research was the effect of rehearsal strategy, it was decided to limit the number of unique items for both the condition and posttest missions to five, to ensure that rehearsal of items would actually take place. The selection of items was based on the difficulty level index (DI), the (expected) number of correct answers on a given item divided by the total number of given answers (Cohen & Swerdlik, 2010). Success rate is a term related to DI, namely DI multiplied by a hundred. In this research DI is used in relation to information on item level and success rate is used in relation to results on group level. To promote measurement precision while taking engagement into account (Eggen & Verschoor, 2006; Klinkenberg et al., 2011), items with DI’s between 0.5 and 0.7 were chosen for the condition and posttest missions. Descriptions of the design of the condition missions and posttest mission follow next.
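As a minimal illustration of this selection rule (with made-up item counts, not the actual Squla data), the DI and success-rate computations can be sketched as:

```python
def difficulty_index(correct, total):
    """DI = number of correct answers / total number of given answers."""
    return correct / total

# Hypothetical item statistics: (number correct, total answers) per item.
items = {"item_a": (62, 100), "item_b": (45, 100), "item_c": (71, 100)}

# Items selected for the missions: DI between 0.5 and 0.7.
selected = [name for name, (c, t) in items.items()
            if 0.5 <= difficulty_index(c, t) <= 0.7]

# Success rate is simply DI expressed as a percentage.
success_rate = difficulty_index(62, 100) * 100  # 62.0
```

Under this rule, only item_a (DI = 0.62) would survive the filter: item_b is too hard (DI = 0.45) and item_c slightly too easy (DI = 0.71).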

Massed rehearsal strategy. This condition was designed after the study of Nelson et al. (2013). Participants in the MR condition received five unique multiple choice questions in Dutch, see the translated items in appendix 1. These five items were repeated in three sets, independent of performance, resulting in 15 items in the mission. Feedback showed the correct and, when applicable, the incorrect answer. After completion of the 15 items the mission disappeared from the platform, making additional rehearsal impossible.


Spaced rehearsal strategy. This condition mission contained the same unique items and amount of rehearsal, but distributed in a spaced way in order to promote forgetting between sets. This condition contained the same five items as MR and OR; however, rehearsal of the items was done on three different days spread over the experiment week. After completion of the mission it would disappear from the platform.

Operation rehearsal strategy. This condition was designed after the study of Schutte et al. (2015). Participants in the OR condition received the same five items as MR and SR, except that items were repeated only when answered incorrectly, and the feedback only showed which answer was incorrect, not which answer was correct, in order to minimise a memory effect. Again, participants had no more access to the mission after completion.

Procedure. All participants were randomly assigned to one of three rehearsal conditions. The condition missions were activated within a six-day timeframe in order to standardize time intervals within and between conditions, see Table 1. In the implementation, funneling would occur, meaning a participant would only be allowed to proceed to the posttest mission after finishing the rehearsal condition.

Table 1

Timetable of Activations per Condition for Experiment 1.

Day   1     2   3     4   5   6     7   8
MR    -     -   -     -   -   MR    -   posttest
SR    SR-1  -   SR-2  -   -   SR-3  -   posttest
OR    -     -   -     -   -   OR    -   posttest

Learning efficiency. Learning efficiency was measured through a posttest mission consisting of five items comparable to the condition items, see appendix 1. The items in this mission were administered in a random order to participants in all conditions.


Power analyses in G*Power (Faul, Erdfelder, Lang, & Buchner, 2007). Because differences between condition designs were distinct but the impact of the operationalisation was expected to be subtle, a small positive effect was predicted for SR compared to MR and for OR compared to MR. Using Cohen’s prescriptions, a small effect of q = 0.15 was expected. Setting the power to 0.8 and α = 0.05, a final sample size of 1659 participants was required. Considering the pilot study, condition dropout was expected to be around 30% but, due to the standardized rehearsal format, more dropout was expected, especially for condition SR. Therefore the distribution over conditions was slightly uneven, with SR : MR : OR = 40 : 30 : 30.
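Such a required sample size can also be approximated outside G*Power. The sketch below (plain Python, using the standard normal-approximation formula) computes the n per group for a one-tailed test at the stated effect size; note that this minimal formula will not reproduce the reported total of 1659 exactly, since that figure also reflects design decisions such as the three-group allocation and expected dropout:

```python
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8, one_sided=True):
    """Normal-approximation sample size per group:
    n = ((z_alpha + z_beta) / effect_size) ** 2."""
    tail = 1 - alpha if one_sided else 1 - alpha / 2
    z_alpha = NormalDist().inv_cdf(tail)   # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)   # quantile for desired power
    return ((z_alpha + z_beta) / effect_size) ** 2

# Small effect of 0.15, one-tailed, alpha = .05, power = .8:
n = n_per_group(0.15)  # roughly 275 participants per group
```

Halving the effect size roughly quadruples the required n, which is why the smaller expected effect in experiment 2 demanded a much larger sample.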

Participants. From Squla, 6573 participants were selected, restricted to home users and excluding school users, to reduce possible effects on learning caused by a scholarly setting. All selected participants were playing at the 6th grade level and had a birth date matching the 6th grade level, to reduce the chance of inappropriate sampling of participants. In order to be able to repeat the experiment, about half of the selected sample was kept aside for experiment 2. For experiment 1 a sample of 2600 participants was distributed as follows: SR, n = 1000; MR, n = 800; OR, n = 800. All participants had access from their home computer. The participants were not informed about the research, and neither were their parents.


3. Results Experiment 1

From the data it could be concluded that implementation difficulties had resulted in a poor dataset. It was found that despite the predetermined distribution over conditions of 40:30:30 (SR, n = 1000; MR, n = 800; OR, n = 800), the distribution of participants was disproportionate, being approximately 50:35:15 (SR, n = 175; MR, n = 112; OR, n = 41). Secondly, funneling of participants from condition to posttest mission had failed, which resulted in non-valid participation at the posttest mission. Finally, next to the implementation problems of the research design, high dropout rates made hypothesis testing impossible. A total of five participants finished the whole experiment according to the intended procedure, see Table 2.

Table 2

Distribution of participants over conditions by click rate, dropout and finish rate for experiment 1.

                           SR    MR    OR    Total
Condition
  Clicked                 175   112    41     328
  Dropout and/or unusable 171    55    12     238
  Finished condition        4    57    29      90
Posttest
  Clicked                  11     9     6      26
  Dropout and/or unusable   9     8     4      21
  Finished posttest         2     1     2       5


4. Discussion Experiment 1

The high amount of dropout was a major setback for experiment 1. To prepare for experiment 2, general login frequencies from the Squla database were obtained, based on the expectation that the experimental rehearsal schedule did not match the actual login behavior of participants. The average login frequency was found to be two times a week and the median login frequency three times a week. Based on this information it was concluded that the chance of capturing children playing on the specified days was fairly small and that, to minimise the chance of dropout of similar magnitude in experiment 2, a redesign was necessary. The redesign aimed at reduction of dropout consisted of five changes, which are discussed next.

The first change was to allow for self-distributed practice of learning: exiting and re-entering the experiment would be possible throughout the whole research period. Changing the research administration to a self-distributed practice design was assumed to reduce the risk of dropout. It would also be more ecologically valid, resembling the authentic experience of GBL within Squla. Based on the dropout rates of the first experiment and the argument of ecological validity, standardized time frames for the administration of rehearsal strategies were dismissed for MR and OR. The second change was to take out the SR condition as a whole. Based on its dropout rate and the Squla login frequencies, this condition seemed too demanding to test within the Squla platform. The third change within the design was to extend the experiment and make access possible for a two week period instead of a one week period. Fourthly, to minimise dropout, completion of the condition mission would be followed up by immediate unlocking of the posttest mission. The final change was an implementation push for the experiment as a whole. This implied that the selected participants would have the experiment mission appear on the screen before and after every mission they played on the Squla platform, until they had finished both the condition and posttest missions. As a consequence of these adjustments, in experiment 2 hypothesis testing was done only for MR strategy versus OR strategy, entailing that higher learning efficiency was expected for participants in OR compared to participants in MR.

5. Experiment 2 Method

As mentioned in the discussion of experiment 1, several adjustments in the research design had to be made for experiment 2. The independent variable rehearsal strategy was operationalised through only two conditions, massed and operation rehearsal strategy, to increase the chance of high power of the study. Besides learning efficiency, engagement was also included as a dependent variable for explorative research purposes. Descriptions of the operationalisation of the variables follow next.

Massed rehearsal strategy. No adjustments were made.

Operation rehearsal strategy. A small adjustment in feedback was made to reduce the effect of memory feedback. Instead of highlighting the incorrect answer, the only feedback present would be audible, for both correct and incorrect responses.

Learning efficiency. No changes in posttest were made.

Engagement. For explorative research, engagement was chosen as a binary dependent variable. Participants who finished the mission were considered engaged, while participants who dropped out of the mission were considered not engaged. Participants were considered dropouts when at least one but not all questions of the condition mission were answered.

Power analyses in G*Power (Faul et al., 2007). Because differences between condition designs were distinct but the impact of the operationalisation was expected to be subtle, a small positive effect was predicted for OR compared to MR. Using a one tailed Z-test for the difference of proportions between independent samples, a small effect of q = 0.1 was expected. Setting the power to 0.8 and α = 0.05, a sample size of 2480 participants finishing the experiment was required.

Participants and design. Due to miscommunication, the sample of participants that was reserved for the second experiment had been added to the first experiment; therefore these participants were no longer regarded as valid participants for experiment 2. To assure random sampling and increase the chance of a sufficiently powered study, a selection of 31,905 new participants, with either a grade 6 player level or a birth date matching grade 6, was made from Squla. Different from experiment 1, not only home users but also classroom users were incorporated in the selected sample. Since the research was randomized and teachers were not informed about the research, it was not expected that teacher involvement would bias the results.

Procedure. All participants were randomly assigned to one of two rehearsal conditions upon login to the Squla website. The posttest unlocked immediately after the condition mission was finished. Exiting and re-entering the mission was allowed during a 14 day time span, except when a participant had finished the mission.


6. Results Experiment 2

Hypothesis testing

A total of 2648 participants (MR, n = 1319; OR, n = 1329) started the experiment. Two categories of participant data were not included in hypothesis testing. Firstly, there was unusable data of 1120 participants (MR, n = 501; OR, n = 619), defined as data of participants that only clicked on the missions but did not proceed further into the experiment, or invalid data due to bugs that had resulted in either bonus rehearsal or skipping of missions and items. Secondly, there was a considerable amount of dropout throughout the experiment, occurring at three moments. Dropout 1 contains participants from the condition missions that did not finish the condition but did answer at least one item. Between dropout is defined as the participants that did finish the condition mission but did not proceed further into the posttest. Dropout 2 contains participants that finished the condition missions and answered at least one item at posttest but did not finish the posttest. Dropout and sample sizes for condition and posttest are shown in Table 3. The final sample, consisting of participants that finished both the condition missions and the posttest mission as intended, was N = 397 (MR, n = 189; OR, n = 208).


Table 3

Number of participants and drop out in conditions and posttest.

Condition          MR     OR    Total
Total              818    710   1528
Dropout 1         -447   -302   -749
Finished 1         371    408    784

Posttest           MR     OR    Total
Between dropout   -131   -142   -273
Subtotal           240    266    506
Dropout 2          -51    -58   -109
Finished 2         189    208    397

To measure the scale reliability of the posttest, Cronbach’s alpha was computed with the cronbach() function in R (R Core Team, 2016), resulting in α = 0.56, which can be considered acceptable for a small scale (Field, 2009). The frequency distribution of the posttest data appeared to be approximately normal for both MR and OR, see Figure 3. To test for differences in proportions correct between conditions, a Z-test was carried out in R (R Core Team, 2016) with the z.prop() function. Although participants in OR (M = 0.57) performed slightly better than participants in MR (M = 0.56), the difference was not significant, z = 0.73, p = 0.46 (N = 397).


Figure 3. Distribution of success rate on posttest by condition.

A post hoc power analysis in G*Power (Faul et al., 2007) indicated that the final sample size of N = 397 had resulted in a low power of 0.11. Thus, although the null hypothesis could not be rejected in the present research, the chance of rejecting the null hypothesis had it been false was in any case fairly small.


7. Explorative Analysis

7.1 Descriptives and Hypothesis Testing

Because dropout rates were unexpectedly high in both experiments, it was decided to investigate further how the engagement of participants could have been affected, by looking into the data of the condition missions of experiment 2. The exploration dataset therefore consists of data of participants that dropped out after answering at least the first item of the condition mission and data of participants that finished the condition mission. The first paragraph presents descriptives of MR on success rates and dropout rates. These findings were the departure point for explorative hypothesis testing on the data of both MR and OR. By means of a logistic regression it was investigated whether the engagement of participants was affected by success rate and condition.
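The thesis does not show the model code, but a logistic regression of this shape would typically be fitted in R with glm(engaged ~ success_rate + condition, family = binomial). As a self-contained illustration of the model itself, a plain gradient-descent fit on hypothetical toy data (not the Squla data) might look like this:

```python
from math import exp

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression: engagement (0/1)
    regressed on success rate and a condition dummy. Illustrative only."""
    w = [0.0] * (len(X[0]) + 1)          # intercept + one weight per predictor
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            pred = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = pred - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Hypothetical rows: [success_rate (0-1), condition (0 = MR, 1 = OR)]
X = [[0.2, 0], [0.4, 0], [0.8, 0], [0.3, 1], [0.7, 1], [0.9, 1]]
y = [0, 0, 1, 0, 1, 1]                   # engaged = finished the mission
w = fit_logistic(X, y)
```

In this toy data engagement rises with success rate, so the fitted success-rate coefficient comes out positive, mirroring the direction of the explorative finding reported below.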

7.2 Descriptives for Memory Rehearsal Condition

Four descriptives concerning the MR data are relevant. First, the success rates in the experiment were unexpectedly low compared to those of the pilot studies; see Itemset Pilot and Itemset 1 in Table 5.

Success rates in percentages by attempt on item for condition MR (N = 818)

Itemset   Pilot    1    2    3
Item 1      59    47   72   85
Item 2      63    40   71   84
Item 3      65    45   73   87
Item 4      68    56   71   84
Item 5      53    49   77   89

Second, within the condition mission success rates increased by about 30%; compare Itemset 1 with Itemset 3 in Table 5. A rise in success rates was expected, since we assumed that within the MR condition short-term memory would lead to inflated success rates. Third, in the MR condition the highest dropout was observed in the first itemset (38%) and was much lower for the second and third itemsets (11% and 5%, respectively); see Figure 4. Fourth, although both dropped-out and finishing participants show an increase in success rates, participants that dropped out (n = 447) had a lower average success rate (M = 48.33) than the group that finished (n = 371, M = 73.33); see Figure 5.

Figure 4. Dropout rates in percentage and absolute numbers per item set for condition MR.


Figure 5. Success rates for finished and dropped out participants in MR (N = 818). F = participants that finished the mission; D = participants that dropped out.

Based on the former descriptives it was decided to explore all the condition data, thus including the OR data, for effects of success rate on engagement. It was expected that the experience of item difficulty could explain the high dropout in the present research.

7.3 Hypothesis Testing and Condition Data

Motivated by the descriptives in the former paragraph, an additional analysis was performed to see how success rate and condition group could have influenced engagement. The dataset included participants of both MR and OR that dropped out after one question or later in the condition mission, as well as participants that finished the mission. The condition groups were coded as 0 = MR and 1 = OR. This sample consisted of 1528 participants, of which 779 participants (51%) finished the mission, coded as F, and 749 participants (49%) dropped out during the mission, coded as D.


It was expected that participants with a higher success rate would be more engaged, increasing the chance of finishing the condition mission. Moreover, an interaction was expected due to differences in rehearsal logic between MR and OR. Since the MR rehearsal logic is assumed to inflate success rates through memory effects while OR is not, a larger effect of success rate was expected for MR compared to OR. Also, eyeballing the dropout descriptives per condition, OR seemed less prone to dropout than MR (OR: n = 302, 42%; MR: n = 447, 55%). To test the hypotheses, the predictors success rate (X1 = Success rate), condition group (X2 = Group) and their interaction (X1X2 = Success rate × Group) were regressed on the variable finished.

Condition groups were nearly evenly distributed, with OR = 46.5% (n = 710) and MR = 53.5% (n = 818). The success rate ranged from 0 to 100%, with M = 50.25, SD = 32.35 for the total sample. For descriptives by condition group, see Table 4.

Table 4

Sample sizes and success rate descriptives for the logistic regression.

Finished the mission?   Total sample    MR    OR   Success rate M     SD
Yes                              779   371   408            67.35  22.88
No                               749   447   302            32.47  31.15
Summary                         1528   818   710            50.25  32.35

A three-predictor logistic model was fitted to the data to test the research hypothesis regarding the relationship between the likelihood that a child finished the mission and her success rate and condition group. The analysis was carried out in R (R Core Team, 2016) using the glm() function with a binomial family. The fitted model was:

Predicted logit of (FINISHED) = -3.156736 + 1.757960 × Group + 0.053984 × Success rate - 0.017337 × Success rate × Group

Further information on the individual predictors is shown in Table 5.

Table 5

Logistic regression analysis on 1528 participants that did or did not finish the condition mission (glm in R)

Predictor                            β        SE        z       df   p         Odds ratio
Constant                          -3.1567   0.2350  -13.439     1    < .0001   NA
Success rate                       0.0540   0.0037    4.5131    1    < .0001   1.0555
Condition group (1 = OR, 0 = MR)   1.7580   0.2880    6.104     1    < .0001   5.8006
Success rate × Condition group    -0.0173   0.0049   -3.527     1    < .001    0.9828
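The odds ratios in Table 5 are simply the exponentiated regression coefficients. A quick check with the reported estimates (Python used here purely for illustration; the original analysis was in R):

```python
import math

# Coefficient estimates as reported for the fitted model
coefs = {
    "Success rate": 0.053984,
    "Condition group": 1.757960,
    "Success rate x Condition group": -0.017337,
}

# An odds ratio is the exponentiated coefficient: exp(beta)
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
```

Rounded to four decimals this reproduces the odds ratios 1.0555, 5.8006 and 0.9828 in the table.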

As expected, the logistic regression indicated an interaction between success rate and condition, β = -0.017, p < .001. The interaction indicates that the effect of success rate is indeed different for MR compared to OR, see Figure 6.

Figure 6. Logistic regression curves for MR and OR.

For participants in MR (Group = 0) the effect of success rate is 0.053984: a one-unit increase in success rate yields a change in log odds of approximately 0.054, which corresponds to an increase in the odds of finishing the mission of roughly 5% for every one-point increase in success rate. For participants in OR (Group = 1) the effect of success rate is 0.053984 + (-0.017337 × 1) = 0.036647, so a one-unit increase in success rate yields a change in log odds of about 0.037, corresponding to an increase in the odds of finishing of almost 4% per point.

The increase in the odds of finishing the condition mission per one-point increase in success rate is therefore larger for MR (about 5%) than for OR (about 4%), although the overall chance of finishing is higher for participants in OR than for participants in MR. For both OR and MR, higher success rates were related to a higher chance of finishing the mission.
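The reported coefficients can also be converted into predicted probabilities of finishing, which is how curves such as those in Figure 6 are drawn. A small sketch using the reported estimates (an illustration, not the original R code):

```python
import math

# Reported coefficients of the fitted logistic model
B0, B_GROUP, B_SR, B_INT = -3.156736, 1.757960, 0.053984, -0.017337

def p_finish(success_rate, group):
    """Predicted probability of finishing the mission (group: 0 = MR, 1 = OR)."""
    logit = (B0 + B_GROUP * group + B_SR * success_rate
             + B_INT * success_rate * group)
    return 1.0 / (1.0 + math.exp(-logit))

# At a success rate of 50, OR participants are predicted to be more
# likely to finish than MR participants
p_mr, p_or = p_finish(50, 0), p_finish(50, 1)
```

For example, at a success rate of 50 the model predicts roughly a 0.39 probability of finishing for MR and roughly 0.61 for OR, matching the ordering visible in Figure 6.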

To assess model fit, it was tested whether the model with predictors fitted significantly better than a null model with only an intercept. The difference in deviance between the two models was computed with the with() and pchisq() functions in R (R Core Team, 2016). The result was highly significant, χ²(3, N = 1528) = 557.69, p < .001, indicating that the full model fitted the data significantly better than the intercept-only model.
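The model comparison used R's pchisq() on the deviance difference. For illustration, the survival function of the chi-square distribution with df = 3 has the closed form SF(x) = erfc(sqrt(x/2)) + sqrt(2x/π)·exp(-x/2), so the p-value can be checked with the standard library alone:

```python
import math

def chi2_sf_df3(x):
    """Survival function (upper-tail p) of the chi-square distribution, df = 3."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

# Sanity check against the familiar critical value chi2(.05, df = 3) = 7.815
p_crit = chi2_sf_df3(7.815)        # close to 0.05

# Deviance difference reported for the model comparison
p_model = chi2_sf_df3(557.6885)    # vanishingly small, hence p < .001
```

With a deviance difference near 558 on 3 degrees of freedom, the p-value underflows far below any conventional threshold.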


8. Discussion Experiment 2

In this experiment no evidence was found in favour of an operation-based rehearsal strategy for complex math problems when compared to a memory-based rehearsal strategy. However, explorative research on the condition data showed that, regardless of rehearsal strategy, higher math success rates were related to higher engagement.

One reason why OR did not result in higher math performance than MR in the present research may be a lack of power; more specifically, the operationalisation of OR may have been too weak for an effect to be observed. Ideally, as in the research of Schutte et al. (2015), a more frequent distribution of practice, together with administration of a larger number of items, would have ensured a more powerful manipulation. However, a problem encountered in experiment 1 was that a standardized, spaced rehearsal condition did not yield a sample large enough for analysis. In retrospect, the chance of finding an effect of OR could have been higher if engagement had been enforced in two ways. First, by seeking cooperation with teachers, as in the experiment of Kebritchi et al. (2010), because they are assumed to positively influence engagement as measured in this experiment. The same influence on engagement could come from parents when GBL is played on home devices. Second, engagement could be enforced by upward adjustment of DIs. Although the DIs for the experiment were specifically chosen between 0.5 and 0.7, this did not result in comparable success rates in the actual experiment. A possible explanation for the discrepancy is that at the time of the pilot, random administration of items was not yet possible on the Squla platform. Difficulty would have caused children to drop out at the beginning of the pilot mission, and the items included in the experiment were therefore based on the performance of children with above-average math skills. As a consequence, higher success rates were observed in the pilot study than in the first itemset of MR. As mentioned by Klinkenberg, Straatemeier and van der Maas (2011), a DI of around 0.5 is experienced by the majority of children as demotivating. Looking at the MR data, with success rates in the first itemset close to 50% (similar to DI = 0.5), it is expected that feelings of demotivation caused dropout of this magnitude. Moreover, it can explain why the majority of dropout took place at the beginning of the game. Based on these observations it has to be acknowledged that, although engagement and difficulty were not the scope of this research, they have proven to be critical factors for the current GBL research.

The relationship between difficulty and engagement has been studied extensively in the field of Computer Adaptive Testing (CAT). With CAT it is possible to program expected success rates according to individual ability levels, which could diminish the chance of dropout. This can be illustrated by the study of Jansen et al. (2013) on the effects of math anxiety and success rates on math performance. During a six-week intervention the researchers used a CAT math game within the online Oefenweb domain to compare a control group and three experimental groups on learning efficiency. The corresponding chance of success on solving math problems for the experimental groups was easy (DI = 0.9), medium (DI = 0.7) or difficult (DI = 0.6). Results indicated that higher success rates positively affected engagement, operationalised as the number of problems that were tackled. Moreover, math performance increased for all experimental groups, with the most learning progress for children in the easy success-rate condition. In line with the results of Jansen et al. (2013), dropout in the present research was lower for children with higher success rates. Although the game used by Jansen et al. (2013) was much more sophisticated than the one presented here, similar results on engagement were obtained: looking specifically into the engagement data of the conditions, higher success rates positively influenced engagement, observed as lower dropout of children during GBL.

Further, two more alternative explanations for not finding a difference in favour of OR are brought up in a meta-review on the testing effect by van Gog and Sweller (2015). First, it could be that the testing effect was present but not observed. The researchers indicate that a delay between the test intervention and the posttest is an important factor in finding differences between the experimental and control group. Although such a delay was implemented in experiment 1, in order to collect data it proved necessary to let go of a standardised time schedule. Although beyond the scope of this research, for future research it could be interesting to investigate how automated time schedules, allowing for individual login frequency within certain time-interval parameters, could result in optimal research and learning schedules. Second, van Gog and Sweller (2015) propose that when problems are complex, the testing effect may disappear or not even be detectable. Based on the present research this suggestion seems plausible. However, the earlier-mentioned research of Jansen et al. (2013) used not only test-based GBL but also administered math problems to 6th graders, and such problems can be considered complex as well. Therefore, the advantage of CAT, being able to manipulate the chance of success while controlling for individual differences in math ability, might be crucial in maintaining engagement in order to promote learning efficiency. In future research it would be interesting to investigate whether game-based CAT is more effective for learning than other math-related games, including a control group; although results are promising, it is still conceivable that CAT improves learning efficiency because more time is spent on math problems, rather than through positive effects of game-based CAT alone (Jansen et al., 2013).


Interesting findings also came from the dropout descriptives and success rates of MR. After initially low success rates in the first itemset, rehearsal led to a strong increase in success rates. However, the mean posttest success rate of MR was much lower than those of the rehearsed itemsets within the MR condition mission. Therefore, when interested in estimating math skill, it is strongly advised to discard success rates coming from a massed-rehearsal GBL design, especially when correct-answer feedback is included.

Finally, it is worth mentioning that dropout should not only be viewed as a problem: when interested in behavior, it also proved to be a source of information for research design and useful as a metric. This is clearly illustrated by the present research, since dropout rates within the missions gave insight into motivation during the experiments. Although dropout used as an engagement metric is not aimed at providing insight into specific GBL characteristics, it does provide a fast answer on whether the research is worth pursuing. Especially in the case of online research, data can be extracted relatively easily and adjusted when necessary. Moreover, it can overcome reliability problems that arise when motivation is measured by an oral response. This can be illustrated by a YouTube video of a boy playing Squla: when the boy, after a substantial amount of thinking time, answers an item incorrectly, he claims the game is boring and immediately exits, instead of admitting it was difficult. Another, related reliability problem can arise because children tend to answer in a socially desirable way. For example, when a researcher asks a child how she likes the game, a socially desirable response would be to exhibit gratefulness; the child would say it is nice, independently of how she really feels about the game. For future research it could be interesting to investigate further engagement metrics, such as time spent on a learning subject or login frequency per learning subject. Having reliable behavioral metrics supported by a powerful database system could be an important addition in the search for more insight into GBL or other online behavior-related research.

In the present research no evidence was found for an OR strategy being superior to an MR strategy for complex math problems. Explorative research indicated that higher success rates were associated with higher engagement. Although engagement and its relation to difficulty were not the main focus of this research, they do seem to be the golden ticket to learning efficiency. From a practical point of view it is important to understand how GBL results can be influenced by memory and rehearsal. Moreover, it is important for all stakeholders in GBL to acknowledge that increasing success rates are not necessarily a sign of learning. In order to produce meaningful results, it is proposed that future research on GBL should be more articulate in the explanation of its learning design and encompass both engagement and learning efficiency.


Literature

Bailey, D. H., Littlefield, A., & Geary, D. C. (2012). The codevelopment of skill at and preference for use of retrieval-based processes for solving addition problems: Individual and sex differences from first to sixth grades. Journal of Experimental Child Psychology, 113, 78–92.

Beard, K. S. (2014). Theoretically speaking: An interview with Mihaly Csikszentmihalyi on flow theory development and its usefulness in addressing contemporary challenges in education. Educational Psychology Review, 27, 353–364.

Boeker, M., Andel, P., Werner, V., & Frankenschmidt, A. (2013). Game-based e-learning is more effective than a conventional instructional method: a randomized controlled trial with third-year medical students. PLOS ONE, 8 (12). doi: 10.1371/journal.pone.0082328

Boyle, E. A., Connolly, T. M., Hainey, T., & Boyle, J. M. (2012). Engagement in digital entertainment games: A systematic review. Computers in Human Behavior, 28(3), 771–780.

Carpenter, S. K. (2012). Testing enhances the transfer of learning. Current Directions in Psychological Science, 21(5), 279–283.

Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to name learning. Applied Cognitive Psychology, 19, 619–636.


tests & measurements (7th ed.) New York: McGraw-Hill.

Eggen, T. J. H. M., & Verschoor, A. J. (2006). Optimal testing with easy or difficult items in computerized adaptive testing. Applied Psychological Measurement, 30(5), 379–393.

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.

Fein, E. C., & Day, E. A. (2004). The PASS theory of intelligence and the acquisition of a complex skill: A criterion-related validation study of Cognitive Assessment System scores. Personality and Individual Differences, 37(6), 1123–1136. doi:10.1016/j.paid.2003.11.017

Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London: SAGE.

Gerbier, E., & Toppino, T. C. (2015). The effect of distributed practice: Neuroscience, cognition, and education. Trends in Neuroscience and Education, 4, 49–59.

Harrell, F. E. (2016). rms: Regression Modeling Strategies. R package version 4.5-0. https://CRAN.R-project.org/package=rms

Huang, W., Huang, W., & Tschopp, J. (2010). Sustaining iterative game playing processes in digital game based learning: The relationship between motivational processing and outcome processing. Computers & Education, 55, 789–797.

Jansen, B., Louwerse, J., Straatemeier, M., Van der Ven, S. H. G., Klinkenberg, S., & Van der Maas, H. L. J. (2013). The influence of experiencing success in math on math anxiety, perceived math competence, and math performance. Learning and Individual Differences, 24, 190–197.

Jönsson, F. U., Hedner, M., & Olsson, M. J. (2012). The testing effect as a function of explicit testing instructions and judgments of learning. Experimental Psychology, 59 (5), 251-257.

Kang, S. H., Pashler, H., Cepeda, N. J., Rohrer, D., Carpenter, S. K., & Mozer, M. C. (2011). Does incorrect guessing impair fact learning? Journal of Educational Psychology, 103(1), 48-59.

Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968.

Kebritchi, M., Hirumi, A., & Bali, H. (2010). The effects of modern mathematics computer games on mathematics achievement and class motivation. Computers & Education, 55(2), 427–443.

Kiili, K. (2005). Digital game-based learning: Towards an experiential gaming model. The Internet and Higher Education, 8(1), 13–24.

Klinkenberg, S., Straatemeier, M., & Van der Maas, H. L. J. (2011). Computer adaptive practice of maths ability using a new item response model for on the fly ability and difficulty estimation. Computers & Education, 57(2), 1813–1824.

Mulligan, N. W., & Peterson, D. J. (2014). The spacing effect and metacognitive control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(1), 306–311.

Musti-Rao, S., Lynch, L. L., & Plati, E. (2015). Training for fluency and generalisation of math facts using technology. Intervention in School and Clinic, 51(2), 112-117.

Nelson, P. M., Burns, M. K., Kanive, R., & Ysseldyke, J. E. (2013). Comparison of a math fact rehearsal and a mnemonic strategy approach for improving math fact fluency. Journal of School Psychology, 51(6), 659-667.

Peng, C. J., Lee, K. L., & Ingersoll, G. M. (2002). An introduction to logistic regression analysis and reporting. The Journal of Educational Research, 96(1), 3-14.

Rawson, K. A., & Dunlosky, J. (2013). Relearning attenuates the benefits and costs of spacing. Journal of Experimental Psychology: General, 142(4), 1113–1129.

R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/.

Schutte, G. M., Duhon, G. J., Solomon, B. G., Poncy, B. C., Moore, K., & Story, B. (2015). A comparative analysis of massed vs. distributed practice on basic math fact fluency growth rates. Journal of School Psychology, 53, 149–159.

Slade, S., & Prinsloo, P. (2013). Learning analytics: Ethical issues and dilemmas. American Behavioral Scientist, 57(10), 1510–1529.

Smith, S. A. (2012). An exploration of the negative effects of repetition and testing on memory. Yale Review of Undergraduate Research in Psychology, 78–91.


Soderstrom, N. C., & Bjork, R. A. (2014). Testing facilitates the regulation of subsequent study time. Journal of Memory and Language, 73, 99–115.

Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL: http://www.jstatsoft.org/v40/i01/.

Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12).

Wickham, H., & Francois, R. (2015). dplyr: A Grammar of Data Manipulation. R package version 0.4.3. https://CRAN.R-project.org/package=dplyr

Van Gog, T., & Sweller, J. (2015). Not new, but nearly forgotten: The testing effect decreases or even disappears as the complexity of learning materials increases. Educational Psychology Review, 27, 247–264. doi:10.1007/s10648-015-9310-x

Van der Ven, S. H. G., Straatemeier, M., Jansen, B. R. J., Klinkenberg, S., & van der Maas, H. L. J. (2015). Learning multiplication: An integrated analysis of the multiplication ability of primary school children and the difficulty of single digit and multidigit multiplication problems. Learning and Individual Differences, 43, 48-62.

Vonderwell, S. K., & Boboc, M. (2013). Promoting formative assessment in online teaching and learning. TechTrends, 57(4), 22–27.

Wang, H., Tsai, C., Chou, H., & Hung, H. (2010). The study of motivation and reasoning faculties of game-based learning in elementary school students. International


Appendix 1.

Item content and pilot success rates (in percentages) for the condition and posttest missions

Item 1
Condition: Martina has 136 marbles, Ruben 178, Tom 204 and Carmen 193. How many marbles do they have together? (success rate: 59)
Posttest: Sol, Samir and Simeon are counting cars for an hour. Sol counts 187 blue cars, Samir 215 black cars and Simeon 189 grey cars. How many cars did they count in total? (success rate: 52)

Item 2
Condition: Calculate. 333 + 777 = ... (success rate: 63)
Posttest: Calculate. 222 + 888 = ... (success rate: 67)

Item 3
Condition: Primary school "de Oorsprong" has 547 chairs. The concierge ordered another 78. How many chairs will the school soon have? (success rate: 65)
Posttest: Suzy, Leon and Tobias talk to their grandma through a webcam. This month Suzy talked 134 minutes, Leon 68 minutes and Tom 212 minutes. How many minutes did they talk to grandma in total?

Item 4
Condition: Calculate. 268 + 9 + 45 = ... (success rate: 68)
Posttest: Calculate. 370 + 18 + 7 = ... (success rate: 64)

Item 5
Condition: Mohammed emptied his money-box. He has one € 50 bill, three of € 20, four of € 5 and twenty € 1 coins. How much euro did he save? (success rate: 53)
Posttest: Abel has € 25 in his money-box. On his birthday he gets another € 30 from his parents and € 62.50 from his grandmothers. How much money has he got now?
