
Scoring rules and the visualization of time-pressure:

Strategy and performance of elementary school children

Bachelor project (second version), University of Amsterdam

Student: Jonnemei Colnot (J.M. Colnot)
Student number: 10003185
Supervisor: Han van der Maas
Date: 26/06/2015
Word count: 5069

Abstract

Scoring rules can help adjust for the speed-accuracy trade-off and the effects of guessing by promoting different answering strategies. The current study compared formula scoring and number right scoring in a digital assessment of 197 Dutch elementary school children. The effect of time-pressure visualization on their performance was also assessed. The results indicated that the different scoring rules did not affect the children's answering strategy or score. The visualization of time-pressure did have a slight impairing effect on children's reaction times. In addition, the visualization of time-pressure influenced the children's answering strategy.

Introduction

Throughout their academic careers, people are evaluated on their knowledge and mastery of subjects through the administration of tests. When a test is scored, this score should represent the person's knowledge or mastery as accurately as possible. It is of great importance that the content of the test covers the intended topic as accurately and completely as possible, and that the quality of the items is appropriate. Secondly, the way the test and its items are scored affects the validity of the test and how well the score represents ability. The use of different scoring methods allows correction or control for error, making the score representation as accurate and valid as possible.


The traditional and by far most popular scoring method is number right scoring, where correct answers are scored with a positive value and incorrect or omitted answers are given a value of zero (Lord, 1975; Budescu & Bar-Hillel, 1993; Kurz, 1999; Lesage, Valcke & Sabbe, 2013). To maximize one's score in this situation, it is best to answer every item, since an answered question never has a lower expected score than an omitted question (Lord, 1975). However, this method of scoring allows test takers to answer a multiple choice item by random guessing. Consequently, the score does not discriminate between correct answers acquired through actual knowledge and those resulting from guessing.

Another conventional scoring method, which tries to deal with the problem of guessing, is negative marking, also known as formula scoring. It tries to reduce the tendency to risk a guess by marking incorrect answers negatively. Omits are still given a value of zero, making omission a more favorable option than guessing, where a negative mark is risked (Lesage, Valcke & Sabbe, 2013; Kurz, 1999). There are several methods to correct for guessing within formula scoring. The rights minus wrongs correction model is most generally used, denoted as S = R - W/(k - 1), where R, W, and k represent the number of right answers, the number of wrong answers, and the number of alternatives, respectively (Diamond & Evans, 1973; Lord, 1975; Kurz, 1999). The penalty for an incorrect response is thus based on the number of answer options. For a multiple choice test with 20 items and 4 answer options, where someone answers 15 items correctly, answers 3 incorrectly, and omits 2, the score would be S = 15 - 3/(4 - 1) = 14.
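To make the two rules concrete, the following is a minimal Python sketch (illustrative only, not code from the study) that reproduces the worked example above.

```python
# Illustrative sketch: classical formula scoring versus number right scoring
# for a multiple choice test with k answer options per item.
def formula_score(right: int, wrong: int, k: int) -> float:
    """Rights-minus-wrongs correction: S = R - W/(k - 1)."""
    return right - wrong / (k - 1)

def number_right_score(right: int) -> int:
    """Number right scoring: only correct answers count."""
    return right

# Worked example from the text: 20 items, 4 options, 15 right, 3 wrong, 2 omitted.
print(formula_score(right=15, wrong=3, k=4))   # 14.0
print(number_right_score(right=15))            # 15
```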

An item omitted under formula scoring is assumed to be answered by random guessing under number right scoring (Lord, 1975). With random guessing, error variance increases and test reliability therefore decreases (Mattson, 1965; Michael, 1968). By discouraging guessing, formula scoring increases test reliability (Burton, 2004). On the other hand, formula scoring carries a higher risk of measurement bias: a test taker's score is not based solely on knowledge but also depends on their attitude towards taking risks (Bar-Hillel, Budescu & Attali, 2005; Espinosa & Gardeazabal, 2010). In general, guessing leads to a higher test score, so a test taker who is more inclined to risk a guess will receive a higher score than a test taker with equal knowledge who is less inclined to guess (Muijtjens et al., 1999; Espinosa & Gardeazabal, 2010).

Choosing an adequate scoring rule is therefore a trade-off between reducing bias (number right scoring) and improving test reliability (formula scoring). Either way, when a certain scoring rule is administered, it is important that the test taker is aware of it. The different scoring rules try to promote different answering strategies and therefore need to be recognized and well instructed (Budescu & Bar-Hillel, 1993). Scoring rules are based on the assumption of an ideal test taker who wants to maximize their expected score. Calculating this expected score can cause difficulties: formula scoring requires test takers to assess their degree of knowledge and estimate the probability of a correct answer (Budescu & Bar-Hillel, 1993; Espinosa & Gardeazabal, 2010). This assessment is subjective, and test takers differ in their ability to estimate their probability of answering correctly. Additionally, test takers cannot be expected to perform such expected-score calculations in actual test situations.

Speed-accuracy trade-off

When measuring a person's ability, we want the measurement to be independent of the person's answering strategy. A speed-accuracy trade-off can influence how a person answers test items (Dennis & Evans, 1996). This trade-off remains an obstacle in psychometric and ability testing.

The speed-accuracy trade-off is a within-person, generally negative, correlation between a person's speed and accuracy (Van der Linden, 2007). A person can decide to work faster but less accurately, or focus on being accurate and work more slowly. This creates a problem when we try to compare a person who responded quickly with fewer items correct to someone with more items correct but slower responses.

One proposed solution is to model this speed-accuracy trade-off afterwards, leaving test takers free to use their preferred answering strategy. These strategies, or speed-accuracy compromises, influence the test taker's score. Applying a scoring rule can reduce the influence of answering strategies by promoting a certain strategy (Dennis & Evans, 1996; Van der Linden, 2007; Klinkenberg, 2014).

One way to incorporate this speed-accuracy trade-off into the general scoring rules is to set a time limit for every item and to reward fast responses more highly. However, a scoring rule without negative scoring for inaccurate responses, combined with a higher reward for fast than for slow correct responses, can provoke risk-taking behavior (Van der Maas & Wagenmakers, 2005; Klinkenberg, 2014). When an item is difficult, a fast guess can still receive a high score if it happens to be correct, while an omitted or incorrect answer both receive a score of zero. This makes risking a guess, with the chance of a high reward, very attractive. A high speed, high stakes scoring rule (HSHS; Maris & Van der Maas, 2012; Klinkenberg, 2014), where an inaccurate response receives a negative score that is largest for a fast response, makes fast guessing (i.e., taking risks) unfavorable. This scoring rule tries to deal with both the speed-accuracy trade-off and the problem of guessing.

The HSHS rule is also the scoring rule currently implemented in Math Garden (Oefenweb, 2009), an online game application for arithmetic practice designed for elementary school children. The scoring rule and the time limit are visualized as coins that slowly disappear from the computer screen. The answer is rewarded positively or negatively with the number of remaining coins. Children are also allowed to press a question mark button, by which they omit the item (see figure 1).

Figure 1. Math Garden addition game.

Several schools that use Math Garden have expressed concerns about the time limit visualization. Some children were said to have difficulty staying focused on the game because they attended too much to the time limit. The visualization of the time limit might cause feelings of stress, which could affect the children's performance negatively. We therefore explore whether this visualization affects children's performance and their motivation to keep playing.

Formula scoring was originally developed for scoring multiple choice items. However, there are no multiple choice answer options in the addition game, or in several other games, in Math Garden. This makes guessing a less profitable option in the first place, and the negative scoring rule might not be necessary. Comparing the tendency to omit items between formula scoring (i.e., high speed high stakes) and number right scoring would give more insight into the necessity of this scoring rule. It is possible that children are tempted to omit a difficult item sooner if they lose points for a wrong answer. Additionally, such a comparison can show whether the scoring rule affects children's answering strategy or ability scores in general.

The current study addresses the aforementioned issues. In an addition task, the number of omitted items under formula scoring in the form of an HSHS rule is compared to the number of omitted items under number right scoring. We expect a higher number of omitted items when formula scoring is in use.

In addition, we explore how the speed-accuracy trade-off is affected by time-pressure, namely how the visualization of a time limit affects the children's performance and reaction time. The number of correct items and the reaction time when time-pressure is visualized are compared to when it is not visualized. The number of correct items is expected to be higher without visualization of time-pressure, since children are then not exposed to an additional source of distraction or stress. Reaction time is expected to be longer when time-pressure is visualized: although the visualization encourages children to work fast, it can also distract them from answering the question.

General method

Participants

A total of 197 elementary school children aged 7 to 14 (M = 10.01, SD = 1.59) participated in this study. Of these children, 99 were girls and 91 were boys; for the remaining children this information was missing. The children were recruited from four Dutch elementary schools that used Math Garden for arithmetic practice.

Pre-study

A pre-study was conducted at two elementary schools that had contacted Math Garden (Oefenweb, 2009) with concerns about the visualization of the time limit. Some children seemed to have difficulty handling the time-pressure and remaining focused on the game. In response to this concern, a number of children were observed while they played games in Math Garden. The time limit and scoring rule, visualized by coins disappearing as time decreases, were manually made invisible (i.e., covered with a piece of paper). After the observation, a short interview followed to discuss the children's evaluation of the coins, the time limit, the scoring, and Math Garden in general. The findings from this pre-study were used to design the main study and for further development of Math Garden.

The children's opinions of and experiences with Math Garden varied considerably. About one third to half of the children found the coins representing the time limit stressful or distracting. Several children did not report any difference when the coins were made invisible, while others clearly preferred it. A detailed report can be found in the appendix. The present study can shed more light on how children actually perform under these different conditions.

Experimental set-up

A maximum of ten children at a time were seated behind laptops in a room separate from their classroom. The children were asked to fill in a questionnaire in a booklet next to the laptop, covering general information about their sex, age, grade, and how often they play in Math Garden. Then a one-minute addition test (part of the Tempo Toets Automatiseren, TTA; de Vos, 2010), included in the booklet, was administered and instructed according to the manual.

After that, the children were instructed about the computer task. They were informed that the four games they were about to play had different scoring rules (i.e., whether or not coins are lost for wrong answers), that there was an option to omit an item, and that some of the games did not visualize the time limit (i.e., coins decreasing on screen). They were also asked to answer two questions in the booklet after each game, regarding their experience of the game and their subjective performance.

After the instructions, the children were asked to put on headphones and to start the computer task. The task was designed to resemble Math Garden (figure 2).

Following the computer task, the children received a stamp to thank them for their participation. A structured interview was conducted with as many children as possible. In the interview, two control questions were asked to check whether children had noticed the manipulations and followed the instructions; these were followed by some questions about their opinion of and experience with the scoring rules and the time-pressure visualization. The booklet and the structured interview are included in the appendix.


The computer task presents the children with 80 addition problems that have to be solved within a time limit of 20 seconds each. These problems are divided into four blocks of 20 items. The problems start at an easy level and slowly increase in difficulty. The four sequences of 20 items were taken from the Math Garden database and have similar difficulty ratings, so that they serve as parallel tests.

The four blocks represent different conditions, administered in random order. The conditions involve two different scoring rules: one based on formula scoring, namely the HSHS rule as implemented in Math Garden (Maris & Van der Maas, 2012; Klinkenberg, 2014), and one based on number right scoring. Under the formula scoring rule, a correct response is rewarded with the number of seconds remaining at the moment of answering, while an incorrect response is penalized by the same amount. An omitted answer is scored zero. Number right scoring rewards correct answers identically to formula scoring, but incorrect responses are given the same value as an omitted answer, regardless of response time.
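As a rough sketch of the item-level scoring just described (the thesis does not show the actual task code, so the function and parameter names here are assumptions), the two rules could be expressed as follows.

```python
# Minimal sketch of the two item-level scoring rules as described in the text; the
# actual implementation used in the experiment is not shown, so this is an assumption.
TIME_LIMIT = 20.0  # seconds per item

def item_score(correct: bool, omitted: bool, reaction_time: float, rule: str) -> float:
    """Score one item under 'formula' (HSHS) or 'number_right' scoring."""
    remaining = max(TIME_LIMIT - reaction_time, 0.0)
    if omitted:
        return 0.0                                      # omissions score zero under both rules
    if correct:
        return remaining                                # both rules reward faster correct answers
    return -remaining if rule == "formula" else 0.0     # only HSHS penalizes errors

print(item_score(correct=True,  omitted=False, reaction_time=5.0, rule="formula"))       # 15.0
print(item_score(correct=False, omitted=False, reaction_time=5.0, rule="formula"))       # -15.0
print(item_score(correct=False, omitted=False, reaction_time=5.0, rule="number_right"))  # 0.0
```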

Within each of these two conditions, the visualization of the time-pressure and scoring is manipulated: it is either visible or invisible. For example, the number right scoring condition may be randomly selected first, after which it is randomly determined that the first game has visualized time-pressure (i.e., coins visible) and the second game is played without time-pressure visualization. In the following two games, formula scoring is then administered and the coin visibility is again determined at random.

The items are presented in a set order, so that every child starts and ends with the same item. However, the order of the conditions differs between children, so the items presented in each condition vary over children. In this way, confounding of the conditions with decreasing concentration or learning effects is avoided.
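A hypothetical sketch of this block design (function and field names are mine, not the study's): the 80 items keep a fixed order while the order of the four conditions is randomized per child.

```python
import random

def assign_conditions(seed=None):
    """Randomize the order of the four conditions (2 scoring rules x 2 visualizations)."""
    rng = random.Random(seed)
    scoring_rules = ["number_right", "formula"]
    rng.shuffle(scoring_rules)                 # which scoring rule is played first
    blocks = []
    for rule in scoring_rules:                 # two consecutive games per scoring rule
        visualizations = [True, False]         # coins visible / coins not visible
        rng.shuffle(visualizations)
        for coins_visible in visualizations:
            blocks.append({"scoring": rule, "coins_visible": coins_visible})
    return blocks                              # each block shows its own fixed set of 20 items

print(assign_conditions(seed=1))
```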

The computer task was programmed in Python (Python Software Foundation, version 2.7) and run in PsychoPy (Peirce, 2007, 2008).


Figure 2. Computer task.

Results

The data of 22 participants were excluded from further analysis because these children were not able to complete all parts of the experiment or their responses were not stored correctly. The data of the remaining 175 participants were used for the following analyses. Some of the remaining participants had partially missing data and were not included in the specific analyses affected.

As a validity check, Pearson correlations were calculated between the sum scores of the four computer tests and the written test (TTA; de Vos, 2010). Since the TTA is widely used to measure elementary school children's arithmetic ability, a high correlation between the computer tests and the written test supports the validity of the computer tests. The sum scores show strong correlations with the TTA (see figure 3), supporting the validity of the computer tests.
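A minimal sketch of this validity check, assuming a data frame with one row per child and hypothetical column names for the TTA score and the four computer-test sum scores (this is not the analysis script used in the thesis).

```python
import pandas as pd
from scipy.stats import pearsonr

def validity_check(df: pd.DataFrame) -> dict:
    """Correlate each computer-test sum score with the written TTA score."""
    results = {}
    for col in ["sum_fs_coins", "sum_fs_nocoins", "sum_nr_coins", "sum_nr_nocoins"]:
        r, p = pearsonr(df["tta"], df[col])   # Pearson correlation and p-value
        results[col] = (r, p)
    return results
```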


Figure 3. Pearson correlations r of the sum scores of the four computer tests with the TTA.

To assess how the scoring rules and the time-pressure visualization influenced scores and answering strategy, a repeated measures ANOVA was conducted on the number of correct, incorrect, and omitted items. A Shapiro-Wilk test of normality indicated that the numbers of correct (W = 0.910, p < 0.001), incorrect (W = 0.866, p < 0.001), and omitted items (W = 0.721, p < 0.001) all violated the assumption of a normal distribution. The QQ plots (figure 4) show some noteworthy features, such as a staircase-like pattern, which can be explained by the fact that the data are discrete counts. The QQ plot of correct items can be considered approximately normal, but the distributions of incorrect and omitted items look skewed (figure 4). Because ANOVA is relatively robust against violations of the normality assumption, the analysis was continued.
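A sketch of how such a normality check and 2 x 2 repeated-measures ANOVA could be run, assuming long-format data with hypothetical columns for child id, scoring rule, visualization, and the dependent count; this is not the thesis's own analysis script.

```python
from scipy.stats import shapiro
from statsmodels.stats.anova import AnovaRM

def analyze_counts(long_df, depvar="n_correct"):
    """Shapiro-Wilk normality check followed by a 2x2 repeated-measures ANOVA."""
    w, p = shapiro(long_df[depvar])
    print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.4f}")
    aov = AnovaRM(long_df, depvar=depvar, subject="child",
                  within=["scoring", "visualization"]).fit()
    print(aov)
```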

There were no significant main effects of scoring rule or time-pressure visualization on the number of correct items. However, there was a significant interaction between scoring rule and time-pressure visualization (F(174) = 4.72, p = 0.003): the effect of the time-pressure visualization on the number of correct items differs between formula scoring and number right scoring. Whereas removing the visualization (no coins) increases the number of correct items under formula scoring, it decreases the number of correct items under number right scoring (see figure 5 and table 1).

For the number of incorrect items, a main effect of the visualization of time-pressure was found (F(174) = 6.81, p = 0.009): the number of incorrect items is significantly higher when time-pressure is visualized (coins) than when it is not (no coins). No other significant differences in the number of incorrect items were found between the scoring rules (figure 6 and table 1).

A main effect of the visualization of time-pressure was also found for the number of omitted items (F(174) = 16.23, p < 0.001): the number of omitted items is significantly lower when time-pressure is visualized than when it is not. No main effect of the scoring rule was found. Additionally, an interaction between the scoring rule and the time-pressure visualization was found for the number of omitted items (F(174) = 8.09, p = 0.0049), meaning that the effect of the visualization of time-pressure on the number of omitted items differs between the two scoring rules. When the visualization of time-pressure is removed, the number of omitted items increases more for number right scoring than for formula scoring (see figure 7 and table 1).

These findings are not in line with our expectation that the number of omitted items would be higher under formula scoring than under number right scoring. It seems that the scoring rules do not affect children's answering strategy. However, the visualization of time-pressure does seem to have a slight effect on answering strategy, or at least on the numbers of omitted and incorrect items: when there is no time-pressure visualization, the number of omitted items increases while the number of incorrect items decreases.


Table 1. Mean (M) and standard deviation (SD) of the number of correct, incorrect, and omitted items for the two scoring rules, with and without time-pressure visualization.

                                With visualization      Without visualization
                                M        SD             M        SD
Correct**    Formula scoring    14.91    3.85           15.00    3.94
             Number right       15.23    3.90           14.78    4.03
Incorrect    Formula scoring    2.62*    2.11           2.32*    2.10
             Number right       2.41*    1.82           2.10*    1.84
Omit**       Formula scoring    2.47*    3.43           2.68*    3.62
             Number right       2.37*    3.34           3.12*    3.85

* p < 0.05, main effect of time-pressure visualization
** p < 0.05, interaction effect between scoring rule and time-pressure visualization


Figure 5. Interaction effect between scoring rule and time-pressure visualization for the number of correct items. Coins = visualized time-pressure, no coins = time-pressure not visualized.

Figure 6. Main effect of time-pressure visualization for the number of incorrect items. Coins = visualized time-pressure, no coins = time-pressure not visualized.

Figure 7. Interaction effect between scoring rule and time-pressure visualization, and main effect of time-pressure visualization, for the number of omitted items. Coins = visualized time-pressure, no coins = time-pressure not visualized.

Post hoc paired t-tests were conducted to examine the effects of time-pressure visualization. The data were summed over the two scoring rules and separated by visualization, leaving one condition with and one condition without time-pressure visualization. Paired t-tests were then conducted on the numbers of correct, incorrect, and omitted answers to assess the differences between the computer tests with and without time-pressure visualization.

A Shapiro-Wilk test of normality showed that the differences between the conditions did not meet the assumption of a normal distribution: W = 0.978, p = 0.0067 for correct items, W = 0.953, p < 0.001 for incorrect items, and W = 0.857, p < 0.001 for omitted items. The corresponding QQ plots (figure 8) nevertheless suggest approximate normality, which was considered sufficient to continue with the paired t-tests.

The numbers of correct and incorrect responses do not differ significantly between the conditions with and without visualization of time-pressure, so the visualization does not seem to affect children's performance. The number of omitted items does differ significantly between the conditions (see table 2). Although not significant, the number of incorrect responses shows a trend towards being higher in the condition with time-pressure visualization, similar to what was found with the ANOVA. This suggests that the items omitted in the condition without time-pressure visualization would most likely have been answered incorrectly in the condition with time-pressure visualization. However, this result should be interpreted with caution given the non-normality established above.

To seek further support for this finding, we conducted a paired Wilcoxon signed-rank test as a non-parametric alternative. This test also showed a difference in the number of omitted items between the two visualization conditions (V = 2440.5, p < 0.001).
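A sketch of these post hoc comparisons, assuming two arrays with one value per child (omitted items with and without coins); note that scipy's wilcoxon statistic is defined slightly differently from the V statistic reported by R.

```python
from scipy.stats import ttest_rel, wilcoxon

def compare_visualization(omit_coins, omit_nocoins):
    """Paired t-test and Wilcoxon signed-rank test for the two visualization conditions."""
    t, p_t = ttest_rel(omit_coins, omit_nocoins)
    w, p_w = wilcoxon(omit_coins, omit_nocoins)
    print(f"paired t-test:        t = {t:.3f}, p = {p_t:.4f}")
    print(f"Wilcoxon signed-rank: W = {w:.1f}, p = {p_w:.4f}")
```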

These findings describe a small change in answering strategy due to time-pressure visualization. This change does not, however, seem to be driven by the scoring rules, as was initially expected. We therefore explore this further through a review of the structured interview below.

Figure 8. QQ plots for the time-pressure visualization comparison.

Table 2. Mean number of correct, incorrect, and omitted items with and without time-pressure visualization.

             With visualization      Without visualization
             M        SD             M        SD             t          p
Correct      30.1     7.39           29.8     7.47           1.3433     0.1809
Incorrect    5.0      3.50           4.4      3.20           1.8335     0.0683
Omit         4.84     6.40           5.8      6.96           -4.029*    < 0.001

df = 174. * p < 0.001


To analyze the effects of the scoring rules and the time-pressure visualization on reaction time, we performed a repeated measures ANOVA on the log-transformed reaction times. The log reaction times did not meet the assumption of a normal distribution according to a Shapiro-Wilk test (W = 0.96, p < 0.001), and the QQ plot and histogram showed a left skew (figure 9). Given the robustness of ANOVA against violations of the normality assumption, the analysis was continued.

A main effect of the time-pressure visualization on the reaction times was detected: the mean (log-transformed) reaction times when time-pressure is visualized differ significantly from those when it is not visualized (F(172) = 13.25, p = 0.0003). An interaction effect between the time-pressure visualization and the scoring rules was also detected: the effect of the visualization of time-pressure on the mean reaction times differs significantly between formula scoring and number right scoring (F(172) = 19.55, p < 0.001). There was no significant difference in reaction times between formula scoring and number right scoring (figure 10 and table 3).

Table 3. Mean (M) and standard deviation (SD) of log-transformed reaction times for the two scoring rules, with and without time-pressure visualization.

                      With visualization      Without visualization
                      M         SD            M         SD
Formula scoring**     1.690*    0.37          1.715*    0.33
Number right**        1.754*    0.31          1.638*    0.35

* p < 0.05, main effect of time-pressure visualization
** p < 0.05, interaction effect between scoring rule and time-pressure visualization


Figure 9. QQ plot and histogram of the distribution of log-transformed reaction times.

Figure 10. Interaction effect between scoring rule and time-pressure visualization, and main effect of time-pressure visualization, for the mean log-transformed reaction times. Coins = visualized time-pressure, no coins = time-pressure not visualized.


A post hoc analysis of the effect of time-pressure visualization on reaction time was performed with a paired samples t-test. A longer reaction time in the condition with visualization of time-pressure would suggest that the visualization distracts children or impairs their performance. The differences between the condition with visualization and the condition without did not meet the assumption of a normal distribution according to the Shapiro-Wilk test (W = 0.923, p < 0.001), but a QQ plot showed approximate normality (figure 11), so we continued with the analysis. The paired t-test found a significant difference in average log-transformed reaction times between the two conditions (T = 3.640, p = 0.00036): reaction times when time-pressure is visualized are slightly longer (M = 1.72, SD = 0.30) than when it is not visualized (M = 1.68, SD = 0.29). This is in line with our expectations and suggests that the visualization has a slight distracting effect.

As previous research notes (Budescu & Bar-Hillel, 1993), a scoring rule has to be explicitly known in order to affect answering strategy. Through the interviews we established whether the children understood the different scoring rules and noticed the time-pressure visualization. A total of 141 children were interviewed, of whom 74 noticed and understood the different scoring rules. The data of 68 of those children were evaluated, as six children had missing data from the computer task. For the children who seemed to understand the scoring rules, paired t-tests showed no difference between the scoring rules: t = 1.1556, p = 0.252; t = -1.8741, p = 0.065; and t = 1.0154, p = 0.314 for the number of correct, incorrect, and omitted items respectively. This suggests that the scoring rules do not affect the children's answering strategy, even when they are explicitly known.

Considering that the scoring rules do not seem to affect the children's ability measurement or answering strategy, we continued with some exploratory analyses of the structured interview. The interview responses were only analyzed for the children who answered the control questions accurately. In total, 121 children noticed the difference between the time-pressure visualization conditions (i.e., decreasing coins). The majority of children do not find the coins stressful or distracting, and it does not bother them that the time indication is gone when the coins are not visualized. Still, a substantial number of children found the coins indicating the time limit a little or even very distracting. Of the interviewed children, 74 were judged to have understood the different scoring rules. The interview showed that most children do not mind the penalty for a wrong answer; some even claim it is a better way to learn from their mistakes. The majority of children also indicate that they are not more likely to guess an answer when there is no penalty for incorrect answers. See table 4 for the frequencies.

Figure 11. QQ plot of log-transformed reaction times.

Table 4. Frequencies of answers to the interview questions.

Question                                                                        Very    A little    Not at all    Chi-square
Do the visible coins cause stress?                                              21      29          55            18.0571*
Is the lack of time indication when coins are not visible bothersome?           28      19          60            26.0374*
When visible, are the coins distracting?                                        28      31          54            10.7434*
Do you guess more when there is no negative scoring?                            2       13          40            41.7091*
Do you think the deduction of points for an incorrect answer is bothersome?     9       17          37            19.8095*
Are you extra motivated with a negative scoring rule?                           28      16          14            5.322

* p < 0.001
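The reported chi-square values are consistent with goodness-of-fit tests against an equal split over the three answer categories; a minimal sketch for the first question, with counts taken from table 4.

```python
from scipy.stats import chisquare

counts = [21, 29, 55]        # "Very", "A little", "Not at all" for the stress question
stat, p = chisquare(counts)  # expected frequencies default to an equal split
print(f"chi-square = {stat:.4f}, p = {p:.5f}")  # approximately 18.06, p < 0.001
```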


Discussion

As the results above suggest, scoring rules have little effect on answering strategy when it comes to non-multiple-choice questions. The use of formula scoring does not seem to influence the number of omitted or correct items, so its use appears to make little difference here. The visualization of time-pressure has a small effect on the answering strategy of children: they tend to omit items a little more often when time-pressure is not visualized. In addition, children take longer to answer a question when time-pressure is visualized.

It is possible that when the children felt pressured to work fast, they also felt pressured to perform better; they might therefore have put more effort into answering items correctly, leading them to omit fewer answers. Another possibility is that the visualization serves as motivation: the time-pressure visualization (i.e., the decreasing coins) shows the reward children will receive for a correct answer, making them more motivated to attempt an item. When this is not visualized, children may feel less motivated and be more likely to omit an item.

An explanation for the absence of a difference in answering strategy or ability scores between the two scoring rules could be the use of non-multiple-choice questions. Formula scoring was originally introduced to discourage guessing on multiple choice questions (Lesage, Valcke & Sabbe, 2013; Kurz, 1999), resulting in more omitted items instead of guesses. Since the expected value of guessing on a non-multiple-choice question is extremely low, it is unlikely that children will guess even without formula scoring. It is also possible that the children are simply too young to change their answering strategy. An answering strategy is usually based on calculating one's expected score, and most children will not yet be capable of doing this correctly, limiting the possibility of a strategy change. The importance of understanding the scoring rule and being able to calculate the benefit of a certain answering strategy has already been emphasized. Unfortunately, only a limited number of children were judged to have understood the scoring rules used in the experiment. Because of the children's young age, the instructions and scoring rules might simply have been too difficult to understand. It is hard to properly analyze the effects of the scoring rules on answering strategy if their basic differences are not recognized. To substantiate our findings, a follow-up study could focus on ensuring that the scoring rules are understood; this would increase confidence that the findings are not due to a lack of understanding of the scoring rules.


The visualization of time-pressure is shown to have a slight impairing effect on the children's reaction times. Although many children say the visualization does not distract them, more than half of the children report that it distracts them at least a little. This suggests that although it might slow some children down, it is not a general problem. Since it is largely a subjective issue, an option to hide the coins in Math Garden (i.e., the time-pressure visualization) could reduce stress and distraction for the children who are affected by it and potentially make the game feel more rewarding.

Considering that the ability scores and score representations are unchanged by the different scoring rules in the context of children using Math Garden, the choice of a scoring rule must be justified differently. We want to keep children motivated to practice and to develop their knowledge and mastery of arithmetic. They seem untroubled by a penalty for incorrect responses, and some children report that it motivates them to try harder. If it does not hurt their performance and motivates a fair number of children, a high speed high stakes scoring rule seems the preferable choice.

References

Bar-Hillel, M., Budescu, D., & Attali, Y. (2005). Scoring and keying multiple choice tests: A case study in irrationality. Mind & Society, 4, 3-12.

Budescu, D., & Bar-Hillel, M. (1993). To guess or not to guess: A decision-theoretic view of formula scoring. Journal of Educational Measurement, 30, 277-291.

Burton, R. F. (2002). Misinformation, partial knowledge and guessing in true/false tests. Medical Education, 36, 805-811.

Burton, R. F. (2004). Multiple choice and true/false tests: reliability measures and some implications of negative marking. Assessment & Evaluation in Higher Education, 29, 585-595.

Burton, R. F. (2005). Multiple-choice and true/false tests: Myths and misapprehensions. Assessment & Evaluation in Higher Education, 30, 65-72.

Dennis, I., & Evans, J. S. B. (1996). The speed-error trade-off problem in psychometric testing. British Journal of Psychology, 87, 105-129.

Diamond, J., & Evans, W. (1973). The correction for guessing. Review of Educational Research, 43, 181-191.


Espinosa, M. P., & Gardeazabal, J. (2010). Optimal correction for guessing in multiple-choice tests. Journal of Mathematical Psychology, 54, 415-425.

Klinkenberg, S. (2014). High speed high stakes scoring rule: Assessing the performance of a new scoring rule for digital assessment. In Computer Assisted Assessment: Research into E-Assessment.

Lesage, E., Valcke, M., & Sabbe, E. (2013). Scoring methods for multiple choice assessment in higher education: Is it still a matter of number right scoring or negative marking? Studies in Educational Evaluation, 39, 188-193.

Van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287-308.

Lord, F. M. (1975). Formula scoring and number right scoring. Journal of Educational Measurement, 12, 7-11.

Van der Maas, H. L. J., & Wagenmakers, E. J. (2005). A psychometric analysis of chess expertise. The American Journal of Psychology, 118, 29-60.

Mattson, D. (1965). The effects of guessing on the standard error of measurement and the reliability of test scores. Educational and Psychological Measurement, 25, 727-730.

Maris, G., & Van der Maas, H. L. J. (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77, 615-633.

Michael, J. J. (1968). The reliability of a multiple-choice examination under various test-taking instructions. Journal of Educational Measurement, 5, 307-314.

Muijtjens, A. M. M., van Mameren, H., Hoogenboom, R. J. I., Evers, J. L. H., & van der Vleuten, C. P. M. (1999). The effect of a 'don't know' option on test scores: Number-right and formula scoring compared. Medical Education, 33, 267-275.

Oefenweb. (2009). http://www.oefenweb.nl/

Python Software Foundation. Python Language Reference, version 2.7. Available at http://www.python.org

Peirce, J. W. (2007). PsychoPy—Psychophysics software in Python. Journal of Neuroscience Methods, 162, 8-13.

Peirce, J. W. (2008). Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics, 2, 10.

Rowley, G. L., & Traub, R. E. (1977). Formula scoring, number‐right scoring, and test‐taking strategy. Journal of Educational Measurement, 14, 15-22.

De Vos, T. (2010). Tempo Toets Automatiseren. Amsterdam: Boom Test Uitgevers.

Appendix

Pre-study:

Pre-study of Rekentuin experiences

To find out which problems children experience with Rekentuin, a pre-study was conducted at two elementary schools in the Netherlands. The pre-study focused primarily on the effect of the coins that run down on the screen during an item. In addition, more general experiences with Rekentuin were asked about, and a teacher was interviewed at each school. Because these findings may be of interest to Oefenweb, they have also been included in this report.

The pre-study took place at the Willibrordusschool in Diessen and the Paus Joannesschool in Zaandam. Both elementary schools have been using Rekentuin for several years and are therefore familiar with the program. Our findings are discussed per school below.

Willibrordusschool in Diessen (13-04-2015)

Procedure

The pre-study took place in the school's computer room. From each class (groep three through eight), two or three children had been selected by the teachers. The teachers had been instructed by Ben Hagenberg, an ICT staff member of the school, to select at least one strong and one weak arithmetic student from their class. The fifteen children in total worked with Rekentuin for 30 minutes, while the researchers went around the children one by one to ask some questions about Rekentuin and to watch them play a game. The researchers asked the children what they think of Rekentuin, what they find particularly fun or less fun, whether they also play Rekentuin at home, how they experience the coins in the game, and whether they understand the scoring rule used in Rekentuin.

After the questions, the children played random Rekentuin games while the coins shown on the screen during an item were covered with a small piece of black cardboard. Only the coins were covered, not the total score. The children played one or more games while the researchers asked the next child the questions and placed the cardboard. The researchers then returned to the child to ask how playing without being able to see the coins had been.

Findings: children

The general tendency was that children like Rekentuin. The children indicated that they particularly enjoy the mole game, the kangaroo game, their second garden, the possibility to play different games, winning coins, and buying prizes. Children who said they like Rekentuin less often gave as a reason that they find the games boring. Especially the children in groep seven and eight who had been playing the games since groep three indicated that Rekentuin was not challenging enough. Finally, children with lower arithmetic ability indicated that they do not like Rekentuin very much, because they do not get access to all the games. Furthermore, all children understood the scoring rule used in Rekentuin.

When the children were asked how they experienced the coins in the game, about half of the children found the coins 'fine' and the other half experienced them as 'stressful'. Children who experienced the coins as stressful indicated that they can no longer concentrate well on calculating the sum and that they sometimes forget their answer when they see the coins run down on the screen. After the children had played games in which the coins were covered, they were asked about their experiences. 87% of the children indicated that they found it more pleasant to play without the coins in view. They said they experienced less stress, could concentrate better on calculating the sum, and had the impression that they made fewer mistakes than with the coins in view. When these children were asked whether they would like the coins removed from the screen permanently, they thought this was a good idea. Two of the fifteen children indicated that they actually found it difficult without the coins in view, because they then no longer knew how much time they had left for an item. These two children had no strong preference for playing with or without the coins in view.

Findings: teacher

When the children had finished playing, we spoke with Ben Hagenberg about his experiences with Rekentuin. Ben Hagenberg is an ICT staff member of the school and also teaches a class several afternoons a week.

Ben indicated that children with low ability also suffer more from the coins. He thought it would be a good idea to give children the option to turn the coins on or off. He also indicated that there should be more options to adjust the tempo for weak arithmetic students, for example the option to give a child 30 seconds per item instead of 20. Furthermore, Ben indicated that he did not see children's arithmetic ability improve through the use of Rekentuin, although the children have become more motivated by using it. Ben noted, however, that this does not apply to a large number of children in groep seven and eight, because they have been working with Rekentuin since groep three and have therefore lost interest in the games.

Findings for Oefenweb

Below, a number of findings and remarks from the teacher that may be useful to Oefenweb are listed point by point.

• Rekentuin is too much geared towards children who already have high arithmetic ability.

• Teachers receive too little information about Rekentuin. Among other things, they often do not understand that the program is adaptive and do not know that they can influence which games the children play. Because of this lack of knowledge, many teachers are reluctant to use Rekentuin.

• Ben Hagenberg also indicated that children would probably appreciate receiving more information about Rekentuin within the game itself: information about how the garden works, what happens to the flowers when you do not play, and how to get more games in your garden.

• Ben would like to see a link between Rekentuin results and the student tracking system (in their case ParnasSys). The results of Cito and of Wereld in Getallen are communicated to parents via this system, and Ben thinks it would be a good idea to add the children's Rekentuin results to it.

• In addition, Ben indicated that the prizes that can be bought are not appealing to all ages. He also finds the prizes less suitable for boys ("What are boys supposed to do with jewels?"), while girls might prefer to buy clothes and dress up characters. For children in groep seven and eight, he would like to see somewhat more grown-up prizes.

RKBS Paus Joannesschool in Zaandam (14-04-2015)

Procedure

From groep four through eight, three pupils each were selected to answer questions about Rekentuin in the teachers' room and to play a Rekentuin game on an iPad. Apart from the different research room and the use of an iPad instead of a computer, the procedure was identical to the one followed at the Willibrordusschool.

Findings: children

Most children enjoy playing in Rekentuin. Children like that there are always new games to play. The children who like Rekentuin somewhat less, and therefore play less, find it annoying that not all games are available to them and indicate that they find this demotivating. The children in groep seven and eight sometimes indicated that they find the game boring. All children understood the scoring rule.

When the children were asked what they think of the coins in the game, about one third of the children turned out to experience the coins as disturbing or distracting. After the children had played games in which the coins were covered, they were asked about their experiences. The reactions were mixed. Eight of the fifteen children indicated that they found it more pleasant to play without the coins in view. They said that they were less distracted, could concentrate better, did not forget their answer, experienced less stress, and guessed less than when the coins were in view. For four of the fifteen children, the presence or absence of the coins made no difference. Three of the fifteen children indicated that they found it unpleasant when they could not see the coins; according to them, the uncertainty about the remaining time caused stress. They said that they like to see how many coins they earn or lose.

Findings: teacher

After all fifteen children had taken part in the pre-study, we spoke with Ethlyne Hart about her experiences with Rekentuin. Ethlyne Hart is the school's arithmetic coordinator and also the teacher of groep 6.

Ethlyne indicated that she does not notice the running-down coins causing stress in pupils; in her experience, the coins are generally motivating. Rekentuin is used by most teachers only as extra work for children who have finished their regular tasks, or as something the children may play when they are already at school before the lesson starts. Ethlyne indicated that she sees a small improvement in children's arithmetic ability after they practice a lot with Rekentuin. Children also become more motivated when they practice regularly with Rekentuin. Children who have more difficulty with arithmetic often need extra encouragement to play in Rekentuin.

Below, a number of findings and remarks from the teacher that may be useful to Oefenweb are listed point by point.

• Ethlyne indicated that for a number of children a different groep is shown than the one they are actually in. Patrizia told us that this is a mistake made by the teachers themselves, since they are the ones who enter in the system which groep a child is in. This shows that the teachers are not entirely familiar with all of Rekentuin's settings.

• In addition, Ethlyne turned out not to be familiar with the adaptive aspect of Rekentuin; she did not understand it and does not use Rekentuin to obtain information about her class.

• The teachers at the school made little or no use of the backend. An ICT staff member of the school only prints out a weekly overview showing how much each class uses Rekentuin. Ethlyne did not know that there are many other ways to retrieve information in Rekentuin about a child's level and progress. The number of hours a child plays in Rekentuin is, however, translated into an insufficient, sufficient, or good mark on the child's report card.

• Finally, Ethlyne indicated that the appearance of Rekentuin could be changed. The game has now existed for 6 years, so in her opinion it was time to refresh the layout.


Booklet

Oefenweb Research

Keep this booklet closed for now

Participant number:

Date:   /   /

Test completed? Yes / No


I am a:

Girl    Boy

I am this many years old:

6  7  8  9  10  11  12  13

I am in grade (groep):

4  5  6  7  8

At school I play Rekentuin:

Always
Often
Sometimes
Never

At home I play Rekentuin:

Always
Often
Sometimes
Never


AFTER GAME 1

I thought this game was:

Very fun
A little fun
Not fun

The game went:

Well
Somewhat well
Not well

Now start game 2 on the computer.

AFTER GAME 2

I thought this game was:

Very fun
A little fun
Not fun

The game went:

Well
Somewhat well
Not well


AFTER GAME 3

I thought this game was:

Very fun
A little fun
Not fun

The game went:

Well
Somewhat well
Not well

Now start game 4 on the computer.

AFTER GAME 4

I thought this game was:

Very fun
A little fun
Not fun

The game went:

Well
Somewhat well
Not well


Structured interview

Questions about the coins

Was the child aware of whether or not the coins were shown on the screen?  (1  2)

Does the child get stressed by the coins on the screen?  (1  2  3  4)

Does the child find it bothersome/difficult that there is no time indication without the coins?  (1  2  3  4)

Does the child find the coins on the screen distracting?  (1  2  3  4)

Notes:

Questions about the scoring rules

Does the child understand the difference between the two scoring rules?  (1  2)

Does the child guess more when there is no penalty for a wrong answer?  (1  2  3  4)

Does the child find it bothersome when there is a penalty for a wrong answer?  (1  2  3  4)

Is the child extra motivated when coins are deducted for a wrong answer?  (1  2  3  4)

Notes:

