
UNIVERSITY OF AMSTERDAM

Formula Scoring in Educational Games

Bachelor Project Psychological Research Methods

Tara Cohen 24-6-2015

Abstract: Formula scoring has been a topic of debate ever since it was proposed in 1919. In this article formula scoring is compared to number right scoring for a specific type of testing: educational games. This is done using high speed high stakes scoring in both conditions. Educational gaming has two goals: to monitor ability and to promote practice. To achieve these goals the game score has to be valid; it has to be similar to other tests of the same ability and guessing has to be discouraged. The game also has to promote the best possible performance and it has to be motivating and fun. A test was conducted at primary schools. Both scoring rules were equally valid compared to another test and there was no indication of a difference in guessing behavior. Performance did not differ between scoring rule conditions, but an interaction effect between scoring rule and time pressure visualization was found. Most pupils did not report formula scoring to be troublesome or number right scoring to provoke extra guessing. In conclusion, either scoring rule could be used when choosing a method for educational gaming.


Index

1. Introduction
   1.1 Validity
   1.2 Performance
   1.3 Motivation
   1.4 Rekentuin and Hypotheses
2. Methods
   2.1 Participants
   2.2 Materials
   2.3 Procedure
   2.4 Data Analysis Plan
3. Results
   3.1 Validity
   3.2 Performance
   3.3 Motivation
4. Discussion
   4.1 Conclusions
   4.2 Explanations and Implications


1. Introduction

Educational gaming can be used in schools as a tool to help pupils practice skills such as maths and language. Games like this often offer possibilities for monitoring a pupil’s progress. These games are seen as efficient tools to teach children because they are practice based, they can create a setting in which the pupil is motivated to practice, and the level can be adapted to each pupil individually (Kebritchi, 2008). Educational games pose new challenges in scoring, since the scoring method now has multiple objectives. Scoring methods used in school exams, for example, focus mainly on how to reflect the true ability of the pupil and how the pupil ranks in his or her class. When choosing a scoring method in educational gaming an extra dimension becomes relevant: the pupil has to be motivated to keep playing in order to improve his or her ability. In short, the two objectives of scoring in educational games are to reflect the pupil’s ability and to motivate.

There are a few factors to take into account when choosing a scoring rule that complies with these two objectives. The first factor is performance. To reflect the pupil’s ability a scoring rule has to encourage the best possible performance, unclouded by confounds caused by the scoring rule itself. The second factor is validity. The score has to reflect ability and preferably nothing else, so it should be comparable to results on other tests of the same ability and it should discourage guessing. The third factor is fun. If a game is fun, children will be more motivated to play it often.

In this article two scoring rules that can be used in educational games are compared: number right scoring and formula scoring. The simplest way of scoring a test is number right scoring, in which the test score consists of the number of right answers. It is easy to calculate and the strategy that most test takers use is simple, namely answering all the questions (Budescu & Bar-Hillel, 1993). Formula scoring is any scoring rule that penalizes wrong answers. This type of scoring was first introduced by Thurstone (1919). In formula scoring a correct answer earns points, a wrong answer is penalized and an omitted question gives the test taker zero points. The effectiveness of formula scoring has been a topic of debate ever since it was proposed in 1919 (Lord, 1975).

1.1 Validity

To reflect true ability (or, for a test to be valid) it is important that a pupil answers correctly the items that fall within his or her ability and answers incorrectly the items that do not. A way in which this simple principle could fail is if pupils guess when answering questions. If guessing occurred independently of knowledge level, the error it causes would be random; it could then be assumed to affect every pupil’s score to the same extent, so the ability rank estimated for each pupil would remain meaningful. Unfortunately this is not what is expected to happen. A pupil with a higher ability is not expected to guess on an item he or she knows the answer to, whereas a pupil with a lower ability might be tempted to guess. This means that even though an item does not fall within the ability of a pupil, the pupil could still get the item right when he or she should not be able to. This is a systematic error, since the pupils it occurs for differ per item. Furthermore, the reliability of the test is affected when people guess. A person could guess differently at different times, and as a result the test might yield different results for one person at different times (Alnabhan, 2002). What all of this ultimately means is that two people with a different level of knowledge could still get the same score on a test (Bar-Hillel et al., 2005).

The guessing problem was first brought on by the introduction of multiple choice testing (Ben-Simon et al., 1997). Multiple choice tests have been popular for quite some time. This type of testing brought on the guessing problem because it offers a reasonable chance for a test taker to get a question right when he or she does not actually know the answer. Formula scoring was specifically designed to deal with this problem. Under number right scoring the test taker has nothing to lose by guessing; he or she only stands to gain points by guessing the question right. Formula scoring, on the other hand, gives more incentive not to guess than the “no reward for a wrong answer” incentive given in number right scoring, by penalizing wrong answers (Bar-Hillel et al., 2005). So theoretically formula scoring will discourage guessing in tests and make them more valid.

Previous research, however, is inconclusive about this theory. Downing (2003) states that the ability rank remains unchanged under the use of formula scoring and that formula scoring is therefore superfluous. The article does not state, however, whether two tests with different instructions were compared or whether the scores for one test were calculated both ways and then compared. Since the difference in scoring rules is supposed to cause a difference in strategy, examinees should know what scoring rule applies to the test for formula scoring to yield different results. If this was not the case in the research cited by Downing (2003), the comparison between the two scores seems meaningless.

Diamond and Evans (1973) found that the reliability indices of tests are not affected by the method of dealing with guessing. Alnabhan (2002) also found that the reliability of a test was not affected by the method of dealing with guessing when comparing Cronbach’s reliability index. That article further concluded that formula scoring is not more valid than number right scoring when looking at concurrent validity coefficients.

A possible explanation for why no difference is found between the two types of scoring is given by Burton (2002). In this article two sources of test unreliability are named. Under number right scoring this unreliability is guessing, but unreliability can also exist under formula scoring when some examinees rely on partial knowledge and others do not. This seems like a likely explanation, but relying on partial knowledge could be related to how risk averse examinees are. Alnabhan (2002) compares people of two different risk taking levels (high and low) separately and no difference between scoring rules is found in either group. Another explanation for why an effect on validity and reliability was not found is that blind guessing is very uncommon in tests (Downing, 2003; Budescu & Bar-Hillel, 1993). Formula scoring focuses on making random guessing less desirable than omitting, but since random guessing is very rare it could be superfluous. People usually have some background knowledge on which to base their answer even if they do not know the exact one. Budescu and Bar-Hillel (1993) stated that any guessing other than random guessing yields positive expected values under certain formula scoring rules, which means that even a formula scoring rule cannot, or should not, fully prevent guessing. Lord (1963) also concluded that discouraging guessing is not always necessary from a more technical point of view: according to that article it is only beneficial if there is a large amount of variation in guessing behavior and if there are fewer than five answer options.

1.2 Performance

The next factor that has to be taken into account when choosing a scoring method is performance. A good scoring rule should inspire the best possible performance in a pupil. In achieving this goal formula scoring might have a positive effect. The threat of losing points that formula scoring poses might make pupils more focused and serious while answering questions. Pupils might also become more persistent in finding the exact right answer, and guessing or sloppy work might be less tempting than it is under number right scoring. However, formula scoring could also cause stress or distract pupils from performing at their best.

Something else to take into consideration is that discouraging guessing is not always desirable. For example, if trying to answer each item is an aim, formula scoring should not be used (Ben-Simon et al., 1997). This could be something to take into consideration where educational gaming is concerned, since one does not solely want to reflect a true score but also wants to promote practicing, and part of practicing is trying new things. This might even enhance performance, and therefore number right scoring might yield better results.

A facet of performance that will not be extensively discussed or analyzed, but that is included in the actual test executed in this paper, is the speed accuracy trade-off (SAT). In most skills taught in primary school, time is of the essence. Basic math, for example, is a skill in which the level is partially determined by how fast it is done. This means that in educational games developed for pupils in primary schools this has to be an element in scoring. The speed accuracy trade-off simply means that a fast response yields less accurate results than a slow response does. Figure 1 shows an example of what a model of this trade-off could look like, though it can be theorized to look many ways depending on the task and participant. In SAT an instant reaction has chance accuracy and a very slow reaction could, in some models, lead to maximum accuracy. In the type of educational game used in this research article the SAT is framed within the scoring. This is done by using the remaining time in seconds as the number of points a person gains when an item is answered correctly or loses when it is not. This type of scoring is called high speed, high stakes scoring (Klinkenberg, 2014; Maris & van der Maas, 2012).

In this research article high speed high stakes scoring is used to represent formula scoring and is partially used to represent number right scoring. To adapt this scoring rule to number right scoring, only the non-penalty part of the rule is used in the number right scoring condition. Number right scoring in this research could therefore also be called non-penalty scoring. The amount of time left per item is visualized by coins on the screen, one coin per second remaining. These coins also symbolize the number of points to be received (or lost) when the item is answered. The time pressure, or coins, will not be visualized for every game played in the test developed for this research. The formula and the number right high speed high stakes scoring rules can be found in figures 2 and 3.
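
To make the two rules concrete, a minimal sketch in Python is given below. It follows the description above (correct answers earn the remaining seconds, wrong answers cost them only under formula scoring, omitted items score zero); the function name and argument layout are illustrative assumptions, not the game’s actual implementation.

    def hshs_score(correct, response_time, time_limit=20, penalty=True):
        """Score one item under high speed, high stakes scoring.

        correct: True, False, or None (None means the item was omitted).
        response_time: seconds used to answer (ignored for omitted items).
        time_limit: per-item time limit in seconds (20 in this study).
        penalty: True for the formula scoring condition, False for the
                 number right (non-penalty) condition.
        """
        if correct is None:          # omitted item: nothing gained or lost
            return 0
        remaining = max(time_limit - response_time, 0)
        if correct:                  # correct answer: gain the remaining seconds
            return remaining
        # Wrong answer: lose the remaining seconds under formula scoring,
        # lose nothing under number right scoring.
        return -remaining if penalty else 0

    # A wrong answer after 8 seconds costs 12 points under formula scoring
    # and 0 points under number right scoring.
    print(hshs_score(False, 8, penalty=True))    # -12
    print(hshs_score(False, 8, penalty=False))   # 0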

Figure 1: Example of a Speed Accuracy Trade-Off Model

1.3 Motivation

The third factor of importance in choosing a scoring rule for educational games is motivation. Educational gaming has been found to have a positive effect on attitude towards math (Oprins et al., 2013). This positive attitude could encourage pupils to keep playing and, by doing so, to keep practicing. Since practicing is one of the main goals of educational gaming, an important aspect of the scoring rule should be the motivation to keep playing and the enjoyment of the game. Motivation and enjoyment could be influenced by formula scoring. Subtracting points could provide motivation to work with precision, and after points are subtracted it could motivate pupils to regain the lost points by getting the next question right or by playing some more. It could, however, also be demotivating to see points being subtracted after every mistake. This might take away the fun of playing and practicing.

1.4 Rekentuin and Hypotheses

The specific game this research was based on is called Rekentuin. In this game pupils practice math in an online personalized gaming environment. The program automatically adapts to the pupil’s level based on the items the pupil gets right and wrong. High speed high stakes formula scoring is used throughout the game: when there are multiple choice questions, but also when a particular game contains open questions. Guessing is typically not as good a strategy when a test contains open questions, which means that guessing prevention in the form of formula scoring does not seem as necessary as it does with multiple choice questions. Still, there might be an effect of the scoring rule on the other factors mentioned above, and it may still have an effect on risk taking.

This research looks at the effect of formula scoring on a test with open items, whereas previous research only focuses on multiple choice questions. This research also takes into account all the aforementioned factors of importance in choosing a scoring method for educational games specifically. One of these factors, motivation, is not mentioned in previous research.

In this research number right scoring and formula scoring will be compared to determine which is more suitable for educational games with open questions. Which one is more suitable will be determined by three factors. The first focuses on under which method of scoring pupils achieve better results. The second factor is the validity of the test: which test has results most similar to those of another test of the same ability. Since the validity relies partially on pupils not guessing, which method discourages guessing will also be examined. The third and final factor is under which rule pupils feel more confident, happy and positive about the game afterwards, and which game made them feel more motivated to play.

2. Method

2.1 Participants

A total of 197 school-going children between the ages of 7 and 12 participated in this research (Figure 4). There were 91 boys and 99 girls (the sex of some participants is unknown due to missing data). The pupils were tested at school during their school day. The test took about 30 minutes per pupil. As a reward for participating, pupils were given a small decorative stamp. Passive informed consent was sent to parents. All the pupils were familiar with Rekentuin, the educational game after which the computerized part of the test in this research was modelled.


Figure 4: Age Distribution of the Participants

2.2 Materials

The test consists of four parts, three of which were completed by every child. In the first part there are some background questions to be filled out. The second part of the test is a TempoToets. The third part is a computerized math test and the final part is an interview (not every pupil was interviewed).

The TempoToets is a test that is used to follow the progress of pupils’ maths abilities. The part of this test used here is a one-minute test in which pupils are asked to solve as many of the given items as they can. These items start out simple and gradually get more difficult. To score this test the number of correct answers was added up.

The computerized part of the test was modelled to look like Rekentuin, a game all the pupils were familiar with. Every item had a 20-second time constraint. The game is scored with high speed high stakes formula scoring: after answering an item, the remaining time in seconds is the score added in case of a correct answer, and in the formula scoring condition the score subtracted in case of a wrong answer. Each pupil played four games that varied in scoring rule and visualized time pressure. In between each game the pupils filled out two questions, one on how well they thought it went and one on how much fun they thought it was. Table 1 shows the four testing conditions, which were presented in random order for each pupil.

Table 1

Testing Conditions of the Computerized Test:

1. Penalty for wrong answers, time limit visualized on the screen.

2. No penalty for wrong answers, time limit visualized on the screen.

3. Penalty for wrong answers, time limit not visualized on the screen.

4. No penalty for wrong answers, time limit not visualized on the screen.

Note: The order testing conditions were presented in was random for each pupil, so numbers shown here are not representative of the order.

After the pupils finished the tests, some were selected for a short interview on their experiences with the different testing conditions. They were asked to describe the difference between the four games they played, to check whether they had noticed the difference in scoring rules and the time visualization. Then they were asked what they thought about the different scoring rules: whether it bothered them, whether they would guess when there was no penalty, and whether they felt more motivated when points were subtracted. Their answers were scored by the interviewers on a 3-point scale.

2.3 Procedure

Five to ten pupils at a time were taken out of class to participate. First they filled out the background questions. Then, after a short explanation of the rules, they took the TempoToets. After they finished the TempoToets, the rules of the computerized test were explained and the four conditions were briefly mentioned. Then the pupils were asked to play. Once they had finished the test, some were asked to accompany one of the researchers to another room for a short interview. All children were allowed to choose a decorative stamp as a reward for participating.

2.4 Data Analysis Plan

To test which of the two scoring rules yields more valid test results, both scores will be correlated with the scores on the TempoToets. If one of the two scoring rule conditions has a stronger correlation with the TempoToets, the test is more valid under that rule when compared to other math ability tests. To compare the correlations, the equation by Chen and Popovich given in Field (2013) is used. If there is a significant difference, the test with the higher correlation is considered more valid when compared to alternative testing methods.
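
For concreteness, a minimal sketch of this comparison is given below. It uses the t-formula for two dependent correlations that share a variable, the form referred to above; the sample size of 172 in the example is inferred from the df = 169 reported in section 3.1, so small rounding differences are to be expected.

    from math import sqrt

    def dependent_correlation_t(r_xy, r_zy, r_xz, n):
        # t-test for the difference between two correlations (x with y and
        # z with y) that share the variable y; here y is the TempoToets score
        # and x and z are the game scores under the two rules. df = n - 3.
        numerator = (r_xy - r_zy) * sqrt((n - 3) * (1 + r_xz))
        denominator = sqrt(2 * (1 - r_xy**2 - r_xz**2 - r_zy**2
                                + 2 * r_xy * r_xz * r_zy))
        return numerator / denominator

    # The correlations reported in section 3.1 give t of about 0.59,
    # close to the t(169) = 0.58 reported there.
    print(dependent_correlation_t(0.681, 0.664, 0.861, 172))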

To test which method discourages people from guessing, the number of omitted questions will be compared between the two conditions. This will be done using a paired samples t-test or, if the assumptions are not fulfilled, a Wilcoxon signed rank test. If there is a difference between the two conditions, pupils probably did not take as many risks or guess as much in the condition with the higher mean number of omissions. Additionally, the number of participants who do not omit any question will be looked at for both conditions. Never omitting could be a sign of guessing or risk taking. Under number right scoring a participant has nothing to lose by guessing, so if pupils adjust their strategy to the scoring rule, the number of pupils who never omit should go down under formula scoring, where there is something to lose. This holds assuming that most of these pupils did not get every question right. Whether the number of participants who never omit differs significantly between the two conditions will be tested by comparing the never-omitting participants and the remaining participants in both conditions with a chi-square test.
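
A sketch of these two tests on hypothetical omission counts (the real data are not reproduced here) could look as follows:

    import numpy as np
    from scipy.stats import wilcoxon, chi2_contingency

    # Hypothetical data: number of omitted items per pupil, per condition.
    omits_formula = np.array([0, 3, 12, 0, 7, 1, 0, 5])
    omits_number_right = np.array([0, 4, 10, 0, 9, 2, 1, 5])

    # Paired, non-parametric comparison of the number of omissions
    # (omission counts are heavily skewed, so a t-test may not be suitable).
    stat, p = wilcoxon(omits_formula, omits_number_right)
    print("Wilcoxon signed rank:", stat, p)

    # 2x2 chi-square test: pupils who never omit versus the rest, per condition.
    never = [int(np.sum(omits_formula == 0)), int(np.sum(omits_number_right == 0))]
    other = [int(np.sum(omits_formula > 0)), int(np.sum(omits_number_right > 0))]
    chi2, p, dof, expected = chi2_contingency([never, other])
    print("Chi-square:", chi2, "df:", dof, "p:", p)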

Then the results of all four conditions of the 2x2 (coins and scoring rule) design will be compared using a repeated measures ANOVA. To do so, the sum of correct answers in each condition will be calculated. This test will also be done including only the pupils who, in the interview, showed an understanding of the different scoring rules. These two analyses will then be repeated with the test scores calculated with the high speed high stakes formula scoring rule for every condition, so with response speed factored into the scores.
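
A sketch of this 2x2 within-subject analysis is given below, using statsmodels’ AnovaRM on made-up long-format data; the column names are illustrative.

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # Long-format data: one row per pupil per condition (hypothetical scores).
    df = pd.DataFrame({
        "pupil":     [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        "scoring":   ["formula", "formula", "number_right", "number_right"] * 3,
        "coins":     ["yes", "no", "yes", "no"] * 3,
        "n_correct": [15, 16, 17, 14, 12, 13, 14, 12, 18, 17, 19, 16],
    })

    # 2x2 repeated measures ANOVA: scoring rule x visualized time pressure,
    # with the summed number of correct answers as the dependent variable.
    result = AnovaRM(df, depvar="n_correct", subject="pupil",
                     within=["scoring", "coins"]).fit()
    print(result)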

The next part of the analysis is more descriptive. To test pupils’ attitudes towards the different scoring rules, two data sets will be looked at. First, the results of the short interview held with most of the participants will be analyzed. The frequency of answers to the questions on formula scoring will be reported and compared using a chi-square test. The attitudes expressed in the two-question questionnaire the participants filled out after each game will also be compared across conditions using a chi-square test.

3. Results

3.1 Validity

Both conditions correlated highly with the TempoToets. The formula scoring condition was significantly correlated with the TempoToets (r = 0.681, p < .001), as was the number right scoring condition (r = 0.664, p < .001). The two conditions also correlated significantly with each other, r = 0.861, p < .001. There is no significant difference in the strength of the correlations between the two conditions and the TempoToets, t(169) = 0.58, p = 0.563. This was tested using the formula by Chen and Popovich given in Field (2013).

The number of omitted questions per condition was very skewed (Figure 5, Figure 6) with a large spread. This suggests a substantial strategic difference between participants. The mean and standard error of the number omitted did not differ much per condition, with M = 5.15, SE = 6.81 for the formula scoring condition and M = 5.49, SE = 6.86 for the number right condition. A Wilcoxon signed rank test showed that there is no significant difference between the mean number omitted in the two conditions, V = 3949, p = 0.316.

Figure 5: Number Omitted, Formula Scoring
Figure 6: Number Omitted, Number Right Scoring

To further test whether there is a difference in the amount of guessing between the two conditions, the frequency with which participants never omitted on the test was compared, using the counts in Table 2. No difference was found between the two conditions, χ² (1) = 0.056, p = 0.814 (a quick check of this value against the counts in Table 2 is sketched below the table).

Table 2

Number of Participants Who Omitted:

                 Formula Scoring    Number Right Scoring    Total N
Never            49                 52                      101
Other            126                123                     249
Total N          175                175                     350

Notes: Total N = Total Number of Participants.
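
As a quick check, the reported statistic can be reproduced directly from the counts in Table 2; scipy applies the Yates continuity correction to 2x2 tables by default, which is what matches the reported value.

    from scipy.stats import chi2_contingency

    # Counts from Table 2: pupils who never omitted versus the rest, per condition.
    table = [[49, 52],     # never omitted (formula, number right)
             [126, 123]]   # omitted at least once

    chi2, p, dof, expected = chi2_contingency(table)
    print(round(chi2, 3), round(p, 3))   # roughly 0.056 and 0.81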

Out of the 141 pupils who were interviewed, 74 noticed the difference in scoring rules between the games and 67 did not. The same analysis as before was repeated using only the 74 pupils who reported seeing a difference in the scoring rule in the interview. These pupils omitted fewer questions under the formula scoring rule (M = 4, SE = 6.21) than under the number right scoring rule (M = 4.97, SE = 6.69). This difference was not significant when tested with a Wilcoxon signed rank test, V = 442.5, p = 0.0896. The number of pupils who never omitted (Table 3) was also found to be the same across the two conditions, χ² (1) = 0.285, p = 0.594.

Table 3

Number of Participants Who Omitted and Who Reported Seeing the Difference in Scoring Rule:

                 Formula Scoring    Number Right Scoring    Total N
Never            23                 27                      50
Other            45                 41                      86
Total N          68                 68                      136

Notes: Total N = Total Number of Participants.

3.2 Performance

To compare the four conditions, a repeated measures ANOVA was done with the sum scores of number correct per condition. The means and standard deviations per condition can be found in Table 4. No main effect of scoring rule was found, F(1) = 0.118, p = 0.731, and no main effect of visualized time pressure was found, F(1) = 1.805, p = 0.181. An interaction was found between scoring rule and visualized time pressure, F(1) = 4.749, p = 0.031 (Figure 7). A post hoc analysis with separate t-tests shows that the number right scoring rule with visualized time pressure does not yield significantly better scores than formula scoring, either with or without visualized time pressure: t(174) = 1.232, p = 0.22 when condition two (no penalty, time visualized) and condition three (penalty, time not visualized) are compared, and t(174) = -1.795, p = 0.074 when condition two (no penalty, time visualized) and condition one (penalty, time visualized) are compared (Table 1). It does, however, indicate that, if number right scoring is chosen, showing coins yields the best results, t(174) = 2.605, p = 0.01.

The same repeated measures ANOVA was done using the mean scores per pupil calculated with the high speed high stakes formula scoring rule for every condition (Table 5, Figure 8). Again no main effects were found, with F(1) = 1.229, p = 0.269 for time pressure and F(1) = 1.183, p = 0.278 for scoring rule. The interaction effect found in the repeated measures ANOVA on the number correct per pupil was not found when the high speed high stakes formula scoring rule was used to calculate the scores for each condition, F(1) = 4.730, p = 0.031.

Table 4

Mean and Standard Deviation of Summed Number Correct per Condition

         Formula Scoring         Number Right Scoring
         Coins       No Coins    Coins       No Coins
Mean     14.9        15.0        15.2        14.8
SD       3.85        3.94        3.90        4.03

Notes: SD = Standard Deviation. “Coins” refers to visualized time pressure, “No Coins” refers to no visualized time pressure.

Table 5

Mean and Standard Deviation of High Speed High Stakes Scores per Condition

         Formula Scoring         Number Right Scoring
         Coins       No Coins    Coins       No Coins
Mean     9.045       9.254       9.246       9.330
SD       3.528       3.595       3.622       3.538

Notes: SD = Standard Deviation. “Coins” refers to visualized time pressure, “No Coins” refers to no visualized time pressure.

Figure 7: Interaction Plot with Number Correct Data
Figure 8: Interaction Plot with High Speed High Stakes Formula Scoring Data

The same analyses as done on all the data were also done including only the pupils who correctly reported the difference between the scoring rules in the interview (Table 6, Figure 9). Again no main effects were found of either scoring rule, F(1) = 1.336, p = 0.252, or visualized time pressure, F(1) = 0.838, p = 0.363. No significant interaction was found between the scoring rule and visualized time pressure, F(1) = 3.902, p = 0.0523. The means and standard deviations of the sum scores per pupil of each condition can be found in Table 6.

A repeated measures ANOVA was also done with the scores calculated by the high speed high stakes formula scoring rule for each condition (Table 7, Figure 10). No significant main effect of visualized time pressure was found, F(1) = 1.019, p = 0.316, and no significant main effect of scoring rule was found, F(1) = 0.422, p = 0.518. No interaction between scoring rule and visualized time pressure was found, F(1) = 0.934, p = 0.336.

Table 6

Mean and Standard Deviation of Summed Number Correct per Condition With Pupils Who Saw Scoring Rule Difference

         Formula Scoring         Number Right Scoring
         Coins       No Coins    Coins       No Coins
Mean     15.6        15.8        15.7        15.2
SD       3.43        3.51        3.70        4.06

Notes: SD = Standard Deviation. “Coins” refers to visualized time pressure, “No Coins” refers to no visualized time pressure.

Table 7

Mean and Standard Deviation of High Speed High Stakes Scores per Condition With Pupils Who Saw Scoring Rule Difference

         Formula Scoring         Number Right Scoring
         Coins       No Coins    Coins       No Coins
Mean     9.533       9.935       9.878       9.879
SD       3.125       3.195       3.487       4.608

Notes: SD = Standard Deviation. “Coins” refers to visualized time pressure, “No Coins” refers to no visualized time pressure.

Figure 9: Interaction Plot with Number Correct Data (only including pupils who understood the scoring rule)
Figure 10: Interaction Plot with High Speed High Stakes Formula Scoring Data (only including pupils who understood the scoring rule)

3.3 Motivation

The results of the interviews were examined to test the attitudes of the pupils towards formula scoring. The frequencies of answers in the interview can be found in Table 8. There was a significant difference in how many pupils thought the scoring rule was “a lot”, “a little” and “not at all” troublesome, χ² (2) = 19.81, p < .001. Most pupils did not find the scoring rule troublesome. A significant difference was also found between the number of pupils who reported “a lot”, “a little” and no less guessing when formula scoring was used, χ² (2) = 41.71, p < .001. Most children reported that they did not guess any less when formula scoring was used than when number right scoring was used. A non-significant difference was found in the number of pupils who reported being motivated by the formula scoring rule, χ² (2) = 5.931, p = 0.052. This means that, in the opinion of the pupils, formula scoring was for the most part not seen as troublesome or particularly motivating and did not prevent guessing (these chi-square values can be checked against the counts in Table 8, as sketched below the table).

Table 8

What did you think of the (formula scoring) rule? Was it:

              Motivating        Troublesome       Reports Less Guessing*
              N       %         N       %         N       %
A Lot         28      38.4      9       12.2      2       2.9
A Little      16      21.9      17      23.0      13      18.6
Not at All    14      19.2      37      50.0      40      57.1
Don’t Know    15      20.5      11      14.9      15      21.4
Total N       73                74                70

Note: N = Total Number of Participants, % = Percentage of total N.
* Most pupils did not report any guessing; it was therefore not less common under formula scoring.
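
The reported chi-square values follow from the counts in Table 8 when the “Don’t Know” answers are left out and the three remaining categories are tested against an even split; the snippet below is my reading of how these statistics were computed, since the exact procedure is not stated.

    from scipy.stats import chisquare

    # Answer counts from Table 8, excluding "Don't Know", tested against
    # a uniform distribution over the three answer categories.
    counts = {
        "motivating":    [28, 16, 14],
        "troublesome":   [9, 17, 37],
        "less guessing": [2, 13, 40],
    }
    for question, observed in counts.items():
        chi2, p = chisquare(observed)
        print(question, round(chi2, 2), round(p, 4))
    # Prints roughly 5.93, 19.81 and 41.71, matching the values in the text.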

The two questions filled out after each game yielded very similar results in the two conditions, as can be seen in Table 9 and Table 10. The frequencies of answers on how much fun the pupils thought the game was were almost identical in the summed totals of the two games played per condition, χ² (2) = 0.461, p = 0.794. The same goes for how well the pupils thought they did, χ² (2) = 0.993, p = 0.609.

Table 9

How fun did you think this game was? This game was:

                Formula Scoring                 Number Right Scoring
                Game 1    Game 2    Total N     Game 1    Game 2    Total N
Fun             71        65        136         66        70        136
A Little Fun    82        82        164         80        82        162
Not Fun         16        20        36          16        14        30
Total N         169       167                   162       166

Notes: N = Total Number of Participants.

Table 10

How well did you think this game went? This game went:

                Formula Scoring                 Number Right Scoring
                Game 1    Game 2    Total N     Game 1    Game 2    Total N
Good            97        89        186         93        98        191
Ok              53        61        114         52        59        111
Not Good        15        15        30          15        8         23
Total N         165       165                   160       165

Notes: N = Number of Participants.

4. Discussion

4.1 Conclusions

No difference in validity was found between the tests under the two scoring conditions. Most pupils did not omit more than five times in either condition. Both tests also correlated equally highly with another test measuring the same ability. This means that neither scoring rule can conclusively be said to be the more valid on these criteria.

The performance of pupils did not differ between the two scoring conditions either: pupils performed equally well regardless of the scoring rule. One out of three of the analyses showed an interaction effect of visualized time pressure and scoring rule. The effect of visualized time pressure within the number right scoring condition could indicate that visualizing time pressure in games is beneficial for performance under number right scoring. Since this result was only found when number correct scores were looked at, it should be interpreted with caution.

Of the interviewed pupils, 38% reported the formula scoring rule to be motivating (Table 8), but this was not a significant majority. A significant majority of the interviewed pupils, however, reported not finding the formula scoring rule troublesome at all. An interpretation of these two findings could be that overall pupils do not care that much which scoring rule is used. A fraction (12.2%) of pupils found formula scoring very troublesome.

4.2 Explanations and Implications

Overall the results did not show a strategic difference within pupils across the different testing conditions. Pupils not using strategy could be explained in many different ways. The simplest one is that pupils do not want to use strategies. A majority of pupils report not guessing more under the formula scoring rule (Table 8). The majority of pupils also omits fewer than five times per forty items regardless of scoring method (Figures 5 and 6). Another explanation could be that pupils do not understand the implications of scoring rule changes well enough to adapt their strategy. There is a strategic difference in omitting between pupils: a fraction omits very often and most do it rarely. Changing strategies, even if a pupil understands the scoring rule, might be too advanced. To see if this is the case, older participants could be used in similar research. Whether there actually is a strategic difference between pupils could be researched by taking overall math ability and age into account.

The lack of strategic difference within pupils could also be caused by the research methods. In the computerized test used, pupils only had forty items per scoring rule, twenty items per game, before a rule changed. This could be too short a time for pupils to adjust to the scoring rules and think of new strategies. Another explanation for the pupils’ lack of strategy might be that in this research pupils did not get any score-related reward; they might therefore not have been interested in earning as many points as possible. Furthermore, the research was done in an unusual setting. Children might have felt too much pressure to act in a socially desirable way to play strategically, or to be honest in their answer to the question on whether they would guess more under formula scoring. To check whether either of these reasons influenced the final results, further research could be done in which pupils play with one scoring rule for a longer time and in which the reward is linked to the number of points earned.

Since most pupils reported not finding formula scoring troublesome, it does not seem surprising that there is no difference in performance between the scoring rules, even though 38% of the interviewed pupils indicated that they found formula scoring motivating (Table 8).

There are no real implications of the outcome of this research for games like Rekentuin, since no difference in validity, performance or motivation was found between the different scoring rules. The only thing one might want to take into consideration when choosing a high speed high stakes number right scoring rule is that visualizing the time pressure might yield better results than not visualizing it. Educational games can use either scoring rule and get the same number of omitted questions, the same performance from children and even the same enjoyment (Table 8).

5. References

Alnabhan, M. (2002). An empirical investigation of the effects of three methods of handling guessing and risk taking on the psychometric indices of a test. Social Behavior and Personality: An International Journal, 30, 645-652.

Bar-Hillel, M., Budescu, D., & Attali, Y. (2005). Scoring and keying multiple choice tests: A case study in irrationality. Mind & Society, 4, 3-12.

Ben-Simon, A., Budescu, D. V., & Nevo, B. (1997). A comparative study of measures of partial knowledge in multiple-choice tests. Applied Psychological Measurement, 21, 65-88.

Budescu, D., & Bar-Hillel, M. (1993). To guess or not to guess: A decision-theoretic view of formula scoring. Journal of Educational Measurement, 30, 277-291.

Diamond, J., & Evans, W. (1973). The correction for guessing. Review of Educational Research, 181-191.

Field, A. (2013). Discovering statistics using IBM SPSS statistics. Sage.

Kebritchi, M. (2008). Examining the pedagogical foundations of modern educational computer games. Computers & Education, 51, 1729-1743.

Klinkenberg, S. (2014). High speed high stakes scoring rule. In Computer Assisted Assessment. Research into E-Assessment (pp. 114-126). Springer International Publishing.

Lord, F. M. (1975). Formula scoring and number-right scoring. Journal of Educational Measurement, 12, 7-11.

Maris, G., & Van der Maas, H. (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77, 615-633.

Oprins, E. A. P. B., Roozeboom, B., Visschedijk, G. C., Kistemaker, J. A., & MKB, O. T. C. (2013). Effectiviteit van serious gaming in het onderwijs. Onderwijsinnovatie, 6, 32-34.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245.


UNIVERSITY OF AMSTERDAM

Children’s Motivation in High Speed, High Stakes Penalty and Non-Penalty Scoring

A Research Proposal

Tara Cohen 25-6-2015

In this research proposal the effect of the type of scoring rule on motivation will be discussed. First a short explanation of the compared scoring rules will be given. Then previous research on the role and types of motivation will be discussed. After this, research methods will be proposed. The proposal will be concluded with how the data should be analyzed and interpreted.


Introduction and Research Question

Formula Scoring and Number Right Scoring

Formula scoring was first proposed by Thurstone in 1919, and since then there has been a lot of debate on its functionality (Lord, 1975). Formula scoring is a type of scoring in which wrong answers are penalized and right answers are rewarded. Another option is number right scoring, in which right answers are rewarded and wrong answers are not penalized; this can also be called no-penalty scoring. Both scoring rules can be used in many settings, one of which is educational gaming. Educational gaming can be used to help children practice certain skills. An example of an educational game is Rekentuin, a game in which children practice math. This game uses a high speed high stakes formula scoring rule (Figure 1). Under this scoring rule a player has twenty seconds to answer an item. If the item is answered correctly the remaining time in seconds is rewarded as points; if the answer is wrong the number of seconds left is subtracted from the total number of points. The pupils have the option to omit when they do not know the answer, in which case no points are rewarded or subtracted (Klinkenberg, 2014; Maris & van der Maas, 2012). To adapt the high speed high stakes formula scoring rule to number right scoring the same rules apply, except that no points are subtracted when an item is answered wrongly (Figure 2).

Figure 1 + 2: High Speed, High Stakes Scoring Rules
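
Since Figures 1 and 2 are not reproduced in this text, the two rules can also be written compactly (notation my own): let d be the 20-second time limit, t_i the response time on item i, and x_i equal to 1 for a correct answer and 0 for a wrong one, with omitted items scoring 0 in both conditions. Then

    S_i^{\text{formula}} = (2x_i - 1)(d - t_i), \qquad S_i^{\text{number right}} = x_i (d - t_i).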

Previous Research

The research previously done (Cohen, Colnot, Houtkamp, de Mooij & Plak, bachelor thesis, 2015) compared the two scoring rules mentioned. There was no difference found in overall performance between the two scoring rules, the validity of tests with both scoring rules was equal, and there was no significant majority of children who found the formula scoring rule motivating (38%).

Motivation is an important factor in choosing a scoring rule for an educational game, because ideally children will be motivated to play the game and, in doing so, practice their skills. In the research previously done some attention was paid to motivation, but it was not the main focus. The proposed research aims to expand on this topic.

In the previous research children were asked whether they found formula scoring motivating. However, they were not asked whether they found number right scoring motivating, and motivation was not measured in any other way. Children’s drive to play, for example, was not tested at all.

Motivation

There are two ways in which children could be motivated in educational games. Firstly, they can be motivated to do their best and to try to answer every question; secondly, they could be motivated to play longer and more often if they have the chance.

Trying to answer every question might be influenced negatively by formula scoring. Bliss (1980) found that motivation to try to answer each question goes down when a test uses formula scoring: children started avoiding risk when formula scoring was used, which led to them not answering questions they had a reasonable chance of getting right. In practicing, motivation to try to get things right even if it is risky is important, because without taking risks there might be less progress and children might not attempt challenging items. The number of omitted items could be looked at to compare risk taking behavior under formula and number right scoring. This was done in the previous research; no difference in the number omitted was found between the two scoring rules. The tests per condition only lasted about ten to fifteen minutes, however, which might have been too short for children to adapt their strategies to a (new) scoring rule and might explain why no effect was found.

The previous research did not measure whether children would be motivated to play more under either scoring rule. What type of scoring is most motivating can differ between people. People who are very sensitive to punishment are more motivated to avoid punishment than people who are not that sensitive to punishment, and people who are very sensitive to reward are more motivated by reward than people who are not that sensitive to reward (Boksem, Tops, Kostermans & De Cremer, 2008). This means that number right scoring might motivate only part of the children, while formula scoring is expected to motivate both groups, since it offers a combination of reward and punishment. Hoge and Stocking (1912) found that, in rats, this combination is more motivating than either incentive separately. On the contrary, Chase (1932) found that it is highly probable that babies are more motivated by punishment of failure than by reward of success. The article reports that this could also be because the two conditions were not offered in random order. Still, if this is the case one would expect formula scoring (which does include a penalty) to be more motivating than number right scoring (which does not).

Research Question

The research question of the proposed research is whether formula scoring or number right scoring gets children more motivated in educational games like Rekentuin. “Motivated” here means both motivated to try to answer every question and motivated to play the game again or to keep playing. This research would answer the question with a design that not only measures performance but also measures how often and how long children play. It will differentiate between children who are reward motivated and children who are punishment motivated.

Method

Materials and procedure

The participants in the proposed research will be at least 150 primary school children. They will be asked to participate through their school.

The research will be set up as follows. First every child plays a set number of items for about 30 minutes under both scoring rules. The game will be modeled after the educational math game Rekentuin. The number omitted on each test will be collected. After this first test an interview is conducted in which every child is classified into one of two incentive groups: reward sensitive or punishment sensitive. See Table 1 for a schematic overview of the groups compared in this part of the test.

Table 1

Conditions in the First and in the Final Test.

Formula Scoring                             Number Right Scoring
Reward Motivated    Punishment Motivated    Reward Motivated    Punishment Motivated

Note: Scoring is a within-participant variable and Motivation is a between-participant variable.

For the second part, children are divided randomly into two groups. One group will play under formula scoring and the other group under number right scoring. The children will not be told about the existence of the condition they are not in. To prevent children from finding out about the other condition through classmates, the groups might be formed by randomly distributing classes or schools over the two conditions. Over the next five weeks the children will be given the opportunity to play the game when they want to, with the type of scoring determined by the condition they are in. They will be able to play at school at set times or at home. How long and how many times children play will be recorded, and the scores per played game will also be saved. For a schematic overview of the groups in this part of the test, see Table 2.

Table 2

Conditions in the Second Part of the Test.

Formula Scoring                             Number Right Scoring
Reward Motivated    Punishment Motivated    Reward Motivated    Punishment Motivated

Note: Scoring is a between-participant variable and Motivation is a between-participant variable.

The third and final part of the test happens at the end of the research, after five weeks, and is a repetition of the first part (see Table 1). Again children will play a game with a set number of items under the two scoring conditions, taking about 30 minutes per game. Their number omitted scores will be saved.

Data Analysis and Interpretation of Possible Outcomes

The data will be analyzed in three separate parts. First, the number omitted in the two scoring conditions in the first test will be compared within participants. This will be done for all participants together and additionally separately for the two incentive groups. If formula scoring yields a higher number of omitted items, a conclusion might be that formula scoring does indeed promote not trying to answer difficult items.

The second part of the data to be analyzed is the number of times children played and the total time played in the five recorded weeks. This number and this amount of time will be compared between the two conditions (formula scoring and number right scoring), and also separately for the two incentive groups. One would expect children who are punishment sensitive to play less often and for less time in the number right condition, since this condition does not offer a punishment incentive. The condition in which the game is played most will show which scoring rule motivates more overall.
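
One way this comparison could be set up is sketched below: a two-way between-subjects ANOVA on total play time, with scoring rule and incentive group as factors. The data and column names are hypothetical, and a non-parametric alternative may be needed if play time turns out to be heavily skewed.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical data: one row per child in the five-week play phase.
    df = pd.DataFrame({
        "scoring":        ["formula", "formula", "number_right", "number_right"] * 3,
        "incentive":      ["reward", "punishment"] * 6,
        "minutes_played": [120, 95, 80, 60, 150, 110, 90, 70, 130, 100, 85, 65],
    })

    # Two-way between-subjects ANOVA: scoring rule x incentive group.
    model = smf.ols("minutes_played ~ C(scoring) * C(incentive)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))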

The third part of the analysis concerns the final test the children take, which will be analyzed in the same way as the first test. The progress between the overall scores of the first and last test will be compared within subjects and between the scoring rule conditions of the five-week period. This will be done both without and with correcting for time practiced. The same analysis will be done taking the two incentive groups into account separately. One would expect the children who are punishment motivated to improve more in the formula scoring condition, but another expectation is that children in the number right condition improved more due to taking more risks (or chances) than the children in the formula scoring condition.

References

Ames, C. (1992). Classrooms: Goals, structures, and student motivation. Journal of Educational Psychology, 84, 261.

Bliss, L. B. (1980). A test of Lord's assumption regarding examinee guessing behavior on multiple-choice tests using elementary school students. Journal of Educational Measurement, 147-153.

Boksem, M. A., Tops, M., Kostermans, E., & De Cremer, D. (2008). Sensitivity to punishment and reward omission: Evidence from error-related ERP components. Biological Psychology, 79, 185-192.

Chase, L. (1932). Motivation of young children: An experimental study of the influence of certain types of external incentives upon the performance of a task. University of Iowa Studies: Child Welfare.

Hoge, M. A., & Stocking, R. J. (1912). A note on the relative value of punishment and reward as motives. Journal of Animal Behavior, 2, 43.

Klinkenberg, S. (2014). High speed high stakes scoring rule. In Computer Assisted Assessment. Research into E-Assessment (pp. 114-126). Springer International Publishing.

Lord, F. M. (1975). Formula scoring and number-right scoring. Journal of Educational Measurement, 12, 7-11.

Maris, G., & Van der Maas, H. (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77, 615-633.
