Academic year: 2021


Using learning analytics: Examining processes on time series of single items and improving the system of Math Garden

Name: Timo Fernhout Student number: 10346597 Date: 15-06-2016

Introduction

There is a large amount of data available on learning due to the emergence of online learning systems. These systems can provide feedback for instructors, give recommendations to students, predict future performance of students and facilitate the development of cognitive models of students (Romero & Ventura, 2010). Furthermore, the data generated by these online learning systems can shed light on the kinds of developmental trajectories that underlie learning processes. These trajectories can subsequently inform theories about developmental change. To be able to distinguish between different trajectories, frequent sampling at small intervals is needed (Adolph, Robinson, Young & Gill-Alvarez, 2008). Moreover, online learning systems often have high ecological validity because they are used as an actual learning environment (Brinkhuis, Savi, Coomans, Hofman, van der Maas & Maris, 2015).

One way in which frequent sampling has been applied is through an approach called the microgenetic method. The microgenetic method has three key properties: (1) observations span the period from when the change started to when it ended, (2) the frequency of the observations is high relative to the rate of change and (3) the observations are subjected to intensive analysis with the intention of finding underlying change processes (Siegler & Crowley, 1991; Siegler, 2006).

One of the key findings of research using the microgenetic method is the existence of high variability in the cognitive development of children. There is high variability in the use of strategies within children when a problem is presented to them at multiple moments close in time, as well as between children (Siegler, 2006). According to Siegler (1996) multiple strategies are available to an individual and the initial arrival of a new and more advanced strategy does not lead to consistent application of that strategy. At first the less adequate strategies will continue to compete with the more advanced strategy, but over time the more advanced strategy will replace the older strategies. Thus developmental change is not a simple change from strategy A to strategy B, but could be a continuous shift in the distribution of use of multiple strategies (Kuhn, 1995).

(Lemaire, 2010). At first children use mostly simple counting strategies. After they get more experience, children will advance to using more complex strategies, such as repeated addition for solving single digit multiplication (van der Ven, Boom, Kroesbergen & Leseman, 2012). In an experiment done by Lemaire and Siegler (1995) children progressed to using more complex strategies more often, but at each time point children used a mixture of strategies.

Mathematical proficiency is essential for functioning in today's society. For instance, mathematical proficiency is associated with higher levels of employability (Hoyles, Wolf, Molyneux-Hodgson & Kent, 2002; Meng & Finnie, 2006) and it is necessary for making well-informed health decisions (Reyna & Brainerd, 2007). Despite the importance of mathematics, relatively little is known about the way in which children learn mathematics (van der Ven et al., 2012). Hence this research will use frequent sampling on the domain of mathematics in order to improve knowledge in this field. This will be done by using data that has been collected with Math Garden.

Math Garden

Math Garden is an adaptive learning environment where children can learn mathematics. Math Garden has currently collected almost half a billion responses from 409,000 children on more than 22,000 items (Brinkhuis et al., 2015). Children can set the difficulty level themselves. They can choose between easy (approximately 90% correct), medium (approximately 75% correct) and hard (approximately 60% correct). Math Garden adaptively matches items to children in such a way that children have a fixed chance of answering an item correctly, where the fixed chance corresponds to the chosen difficulty level. In order to do this Math Garden uses an extension of classic computerized adaptive testing (CAT) methods. CAT is based on item response theory (IRT) and the extension of CAT used by Math Garden is based on an extension of the Rasch model (van der Maas, Kan, Hofman & Raijmakers, 2014). In CAT, item administration depends on the response to the previous item. A subject who answers an item incorrectly or too slowly will receive an easier item, while a correct response within the expected time leads to a more difficult item. One of the differences from classic CAT methods is that Math Garden tries to optimize learning and motivation, whereas classic CAT methods try to optimize measurement precision (Brinkhuis et al., 2015).

The extended CAT method of Math Garden is based on two psychometric innovations. First, it implements an explicit scoring rule that uses both accuracy and response time. It is called the Signed Residual Time (SRT) scoring rule and was introduced by Maris and van der Maas (2012). The scoring rule discourages fast guessing and makes the speed-accuracy trade-off explicit. Second, it uses an Elo estimation algorithm based on the Elo Rating System (ERS), which originated in chess competitions (Elo, 1978). The Elo estimation allows for on-the-fly item calibration by updating the ability estimates of persons and the difficulty estimates of items with each answered item (Klinkenberg, Straatemeier & van der Maas, 2011).
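To make the idea of on-the-fly calibration concrete, the following is a minimal sketch of a generic Elo-style update in Python. It is not the Math Garden implementation: it uses accuracy only (no SRT score), a logistic expected score, and a constant step size `k`, all of which are simplifying assumptions made for illustration. The function names are hypothetical.

```python
import math


def expected_correct(theta, beta):
    """Probability of a correct response under a simple Rasch-type model.

    theta: ability estimate of the person, beta: difficulty estimate of the item.
    """
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def elo_update(theta, beta, correct, k=0.3):
    """Update both estimates after one response (correct is 0 or 1).

    k is a hypothetical constant step size; a real system may vary it
    per person and per item.
    """
    surprise = correct - expected_correct(theta, beta)
    # The person moves in the direction of the surprise, the item in the opposite one.
    return theta + k * surprise, beta - k * surprise


theta, beta = elo_update(0.0, 0.0, correct=1)
print(theta, beta)  # ability goes up, difficulty goes down by the same amount
```

In this sketch an unexpected correct response raises the ability estimate and lowers the difficulty estimate, which is the mechanism that lets both sets of estimates be updated with each answered item.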

One of the assumptions of the Elo estimation used by Math Garden is that each domain within Math Garden is unidimensional. This means that the responses of a person to all items within a given domain are conditionally independent given the ability estimate of that person for that domain. It also means that the responses of all persons on an item are conditionally independent given the difficulty estimate of that item. If this assumption holds it should not be possible to find systematic differences between users with the same ability estimates.

For this research a subset of the data of Math Garden will be used. To be specific, the data is from children that have visited Math Garden almost daily and that have played frequently for prolonged periods. Included in this data is a large set of person-by-item time series, which are time series of responses of a single child to a single item. These responses will be coded as zero for an incorrect response and one for a correct response. The use of a question mark will count as an incorrect response, unless stated otherwise.
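The coding described above can be sketched as follows. The input layout (tuples of user, item and answer in chronological order) and the answer labels `"correct"`, `"incorrect"` and `"?"` are assumptions made for this illustration; they are not the actual Math Garden log format.

```python
from collections import defaultdict


def build_time_series(responses):
    """Group responses into person-by-item time series of 0/1 codes.

    responses: iterable of (user_id, item_id, answer) in chronological order,
    where answer is "correct", "incorrect" or "?" (hypothetical labels).
    A question mark is coded as incorrect (0).
    """
    series = defaultdict(list)
    for user_id, item_id, answer in responses:
        series[(user_id, item_id)].append(1 if answer == "correct" else 0)
    return dict(series)


log = [
    ("u1", "6 x 2", "?"),
    ("u1", "6 x 2", "incorrect"),
    ("u1", "3 x 100", "correct"),
    ("u1", "6 x 2", "correct"),
]
print(build_time_series(log))
```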

Because of the adaptivity of Math Garden it is expected that most of these person-by-item time series are rather short. The reason is that a long time series implies that during that whole time no learning has taken place on the ability level. If learning had taken place, it should be reflected in the ability estimate of the person, which in turn should lead to more difficult items. However, it is not possible to exclude learning on the item level. For instance, if learning takes place on some items but not on others, the ability estimate may not change significantly. This in turn could lead to long time series, because no new and more difficult items would be presented.

Research Questions

Based on the great variation in the use of strategies between persons and within persons as discussed earlier, it is hypothesized that the current Math Garden model cannot account for all the patterns in the data. Most research about learning strategies within mathematics has been done for addition and multiplication; therefore we will focus on these two domains. One of the findings from this research is that children can invent strategies for multi-digit multiplication (Ambrose, Baek & Carpenter, 2003). Therefore it is expected that there are more strategies for multiplication that the current model cannot pick up. We hypothesize that it is possible to find more and larger deviations from the expected patterns based on the current model for multiplication than for addition.

Earlier research using data from Math Garden showed that for most children multiplication with the 100 or 1000 operator is easier than the majority of the single digit multiplication (van der Ven, Straatemeier, Jansen, Klinkenberg & van der Maas, 2015). This is striking because these operators are taught years later than single digit multiplication. To illustrate the kinds of patterns that can be found in the data a selection of time series will be shown for one user, see Figure 1. As can be seen there are some time series that have almost only correct responses and some have almost only question marks. The former often feature the 100 or the 1000 operator, while the latter often feature the regular multiplication tables.

Based on the patterns shown in Figure 1 and the findings from van der Ven et al. (2015) we hypothesize that the 100 or 1000 operator items will be answered correctly more often than would be expected based on the Math Garden model, while the multiplication tables items will be answered correctly less often than expected. Just like the user shown in Figure 1, we expect that a large group of users will perform above expectation on the 100 and 1000 operator items, while performing below expectation on the regular multiplication tables. However, we would also expect that there is a group of users for whom it is the other way around. This is because if almost all users performed better or worse than expected on some item, the difficulty estimate of that item would change to a much lower or a much higher estimate, respectively. The difficulty estimate would change until the users matched to that item performed on average as predicted by the model. Because of this adaptivity it is not possible for all users to perform above or below expectation on a given item. Thus, we hypothesize that it is possible to identify the two aforementioned groups of users. This implies that the domain of multiplication would not be unidimensional, which would be a violation of one of the assumptions of the Math Garden model.

During this research different learning analytics will be collected. Some of these learning analytics may in the future be used in the day-to-day operation of Math Garden. We therefore restrict ourselves to learning analytics that are feasible in a big data setting: they should be fairly simple and not computationally intensive.

The aims of this research are twofold: (1) To shed light on the processes underlying the learning of mathematics and (2) To collect learning analytics to improve the system of Math Garden. This will be done in the following way. First, learning on the level of person-by-item time series will be quantified. Second, descriptives of the person-by-item time series will be collected. Third, these descriptives will be used to compare addition with multiplication and items featuring the 100 or 1000 operator with items featuring the regular multiplication tables.

The structure of this paper is as follows. First, we will introduce the method of data selection. Second, we will explain the learning analytics that we collected. Third, some initial results of the data selection and the learning analytics will be given. Fourth, the results of testing for unidimensionality will be provided. Fifth, results about learning effects will be given. Last, our conclusions and remarks about the research will be presented.

Methods

Data selection

We focused on the items from the domains addition and multiplication and on responses made between 1 September 2014 and 1 September 2015. A subset was created in the following way. First, for each domain users were selected that had a minimum of 500 responses during that period and had at least one response in 10 different weeks. In this way all selected users are users that played frequently and for prolonged periods. Second, all time series shorter than 5 were excluded, because most descriptives will not be meaningful for short person-by-item time series. Third, all responses that were made under the easy difficulty level were excluded. These responses have a chance of 90% of being correct, so most responses will indeed be correct and will not carry much information. The minimum of 500 responses with at least one response in 10 different weeks was still required after the exclusion of data in the second and third steps.
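The selection steps above can be sketched as a small filter pipeline. The record layout (dictionaries with the keys `user`, `item`, `week` and `level`) is a hypothetical one chosen for this illustration; the thresholds are the ones stated in the text and are exposed as parameters.

```python
from collections import Counter, defaultdict


def eligible_users(records, min_responses, min_weeks):
    """Users with enough responses spread over enough different weeks."""
    counts = Counter(r["user"] for r in records)
    weeks = defaultdict(set)
    for r in records:
        weeks[r["user"]].add(r["week"])
    return {u for u, n in counts.items()
            if n >= min_responses and len(weeks[u]) >= min_weeks}


def select_data(records, min_responses=500, min_weeks=10, min_length=5):
    """Apply the three selection steps, then re-check the user criterion."""
    # Step 1: keep frequent, long-term users.
    keep = eligible_users(records, min_responses, min_weeks)
    records = [r for r in records if r["user"] in keep]
    # Step 2: drop person-by-item time series shorter than min_length.
    lengths = Counter((r["user"], r["item"]) for r in records)
    records = [r for r in records if lengths[(r["user"], r["item"])] >= min_length]
    # Step 3: drop responses made under the easy difficulty level.
    records = [r for r in records if r["level"] != "easy"]
    # The user criterion must still hold after the exclusions.
    keep = eligible_users(records, min_responses, min_weeks)
    return [r for r in records if r["user"] in keep]
```

Whether the original analysis measured series length before or after removing the easy responses is not stated beyond the order of the steps; this sketch follows the order in which the steps are listed.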

Learning analytics

The following descriptives of person-by-item time series were collected for the sole purpose of learning analytics:

• Transition probability matrix of correct and incorrect responses

• Percentage correct responses in the last 5 or 2 responses

The transition probability matrix is important because we want to know how stable switches from incorrect to correct responses are. The percentage correct responses in the last 5 or 2 responses are important because we want to know if users are able to answer an item correct at the end of a time series.
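Both descriptives are cheap to compute per time series. The sketch below assumes a series is a list of 0/1 codes; the function names are hypothetical, and how the per-series matrices were averaged in the original analysis is not specified here.

```python
def transition_matrix(series):
    """Row-normalised 2x2 matrix: rows are the current response (0 = incorrect,
    1 = correct), columns are the next response."""
    counts = [[0, 0], [0, 0]]
    for current, following in zip(series, series[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state that never occurs before another response has no defined row.
        matrix.append([c / total for c in row] if total else [float("nan")] * 2)
    return matrix


def percentage_correct_last(series, k):
    """Proportion of correct responses among the last k responses."""
    tail = series[-k:]
    return sum(tail) / len(tail)


def all_correct_in_last(series, k):
    """True when the series has at least k responses and the last k are all correct."""
    return len(series) >= k and all(series[-k:])


example = [0, 1, 1, 0, 1, 1]
print(transition_matrix(example))
print(percentage_correct_last(example, 5), all_correct_in_last(example, 2))
```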

Figure 1. Accuracy development for one user on the multiplication domain. The minimum number of responses for each time series was 5. On the y-axis the items sorted on difficulty are plotted.

Unidimensionality

In order to find deviations from the expected patterns based on the Math Garden model the following descriptives of person-by-item time series were collected:

• Percentage correct in the last 10 responses

• Percentage fluctuations: the number of fluctuations between correct and incorrect responses divided by the maximum number of fluctuations

• Fit: the mean difference between observed and expected score according to the SRT scoring rule

The most important deviation that will be tested is the violation of the assumption of unidimensionality.
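As an illustration of the percentage fluctuations descriptive, the sketch below counts switches between adjacent responses and divides by the number of adjacent pairs. Taking the maximum number of fluctuations to be the series length minus one is an assumption of this sketch. The fit descriptive is left out because it requires the expected SRT score, which is not spelled out in this text.

```python
def percentage_fluctuations(series):
    """Switches between correct (1) and incorrect (0) divided by the maximum
    possible number of switches, assumed here to be len(series) - 1."""
    if len(series) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(series, series[1:]))
    return switches / (len(series) - 1)


print(percentage_fluctuations([0, 1, 0, 1]))  # alternating series
print(percentage_fluctuations([1, 1, 1, 1]))  # perfectly stable series
```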

Learning effects

In order to quantify learning we fitted logistic regressions to person-by-item time series. The logistic function is:

$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

where $\beta_0$ is the intercept and $\beta_1$ is the steepness of the curve. We used the index of the person-by-item time series as the explanatory variable. The steepness of the curve could reflect the extent to which the probability of a correct response increases after a repetition of the same item. A flat curve could indicate that an item is already learned at the start of the data collection, or that an item is not learned during the period of data collection.

We have chosen to use Bayesian logistic regression instead of regular logistic regression, because regular logistic regression cannot handle complete separation (Gelman, Jakulin, Pittau & Su, 2008). Complete separation occurs when a logistic regression function can generate perfect predictions. Furthermore, when a developmental trajectory involves a step-like function where the absence or occurrence of a skill is probabilistic, a simple smoothing function can give a clearer view of the developmental trajectory (Adolph et al., 2008). Therefore we considered a simple smoothing function, in this case a moving average of two responses. We performed a small simulation study in order to test the power of different logistic regression methods, see Appendix A. According to this simulation study it was best to use Bayesian logistic regression with a smoothing function. Therefore this is the logistic regression method we used during the remainder of this research.
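The sketch below illustrates why a prior helps with complete separation. It is a stand-in, not the method of Gelman et al. (2008): it finds the posterior mode of the logistic curve under independent Gaussian priors on both coefficients (the cited approach uses Cauchy priors and rescaled predictors), and the prior scale `prior_sd`, the unscaled index as predictor, and the exact form of the two-response moving average are all assumptions of this illustration.

```python
import math


def moving_average(series, window=2):
    """Smooth a 0/1 series with a moving average over `window` responses."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def _sigmoid(eta):
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _log1pexp(eta):
    return eta + math.log1p(math.exp(-eta)) if eta > 0 else math.log1p(math.exp(eta))


def _log_posterior(b0, b1, xs, ys, prior_sd):
    total = -(b0 * b0 + b1 * b1) / (2 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        total += y * eta - _log1pexp(eta)
    return total


def fit_map_logistic(ys, prior_sd=2.5, max_iter=100, tol=1e-10):
    """Posterior mode of (b0, b1) with the series index 0, 1, 2, ... as predictor.

    Uses Newton steps with step halving; ys may contain fractions after smoothing.
    """
    xs = list(range(len(ys)))
    b0 = b1 = 0.0
    current = _log_posterior(b0, b1, xs, ys, prior_sd)
    for _ in range(max_iter):
        g0, g1 = -b0 / prior_sd ** 2, -b1 / prior_sd ** 2
        h00 = h11 = 1 / prior_sd ** 2
        h01 = 0.0
        for x, y in zip(xs, ys):
            p = _sigmoid(b0 + b1 * x)
            w = p * (1 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01  # positive because the prior regularises
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        while step > 1e-8:
            n0, n1 = b0 + step * d0, b1 + step * d1
            cand = _log_posterior(n0, n1, xs, ys, prior_sd)
            if cand >= current:
                break
            step /= 2
        else:
            break  # no improving step found
        improved = cand - current
        b0, b1, current = n0, n1, cand
        if improved < tol:
            break
    return b0, b1


separated = [0, 0, 0, 0, 1, 1, 1, 1]  # perfectly predictable from the index
print(fit_map_logistic(separated))
print(fit_map_logistic(moving_average(separated)))
```

Without the prior term the likelihood of the separated series keeps increasing as the slope grows, so no finite maximum exists; with the prior the slope estimate stays finite.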

Results

Data selection

All responses within the domains addition and multiplication from users that had a minimum of 500 responses during the period between 1 September 2014 and 1 September 2015 were selected. Another requirement was that the users should have at least one response in 10 different weeks. The exclusion of time series shorter than 5 led to the removal of 42.58% of responses for addition and 21.24% of responses for multiplication. The exclusion of the responses made under the easy difficulty level led to the removal of a further 59.57% of responses for addition and 38.12% of responses for multiplication. Some descriptives of the data after the completion of all the steps of the data selection process are shown in Table 1. The selected number of users for addition was 1.9% of all potential users and for multiplication it was 2.9%. Potential users are defined here as users with a minimum of 15 responses for at least one month during the selected period.

Table 1

The number of users, the number of items, the number of responses, the number of person-by-item time series and the maximum and mean length of a time series for each domain.

                                 Addition     Multiplication
# Users                          2,458        2,683
# Items                          1,172        694
# Responses                      1,104,513    2,013,814
# Time Series                    133,199      178,706
Maximum Length of Time Series    102          264

Figure 2. The distribution of lengths of the person-by-item time series with a length of 50 or shorter.

Learning analytics

The distribution of lengths of the person-by-item time series is shown for lengths of 50 and smaller in Figure 2. As can be seen, a high proportion of the person-by-item time series are short. The mean transition probabilities for addition and multiplication are shown in Table 2 and Table 3, respectively. For both domains, a correct response is most likely to be followed by another correct response. However, the probability that the next response is incorrect is higher than we would want. This implies that a switch from incorrect to correct responses is not very stable.

The probabilities of having all responses correct in the last 2 and 5 responses to an item were calculated, see Table 4. Only in approximately half of the cases are the last 2 responses to an item both correct. If we look at having all responses correct in the last 5 responses, the probabilities are only 0.14 for addition and 0.24 for multiplication. We could therefore conclude that it is uncertain whether users are able to answer an item correctly at the end of a time series.

Table 2

The mean transition probabilities for addition.

From \ To    Incorrect    Correct
Incorrect    0.40         0.60
Correct      0.46         0.54

Table 3

The mean transition probabilities for multiplication.

From \ To    Incorrect    Correct
Incorrect    0.47         0.53
Correct      0.39         0.61

Table 4

The probabilities of having all responses correct in the last 2 and 5 responses to an item.

                        Addition    Multiplication
The last 2 responses    0.46        0.53
The last 5 responses    0.14        0.24

Unidimensionality

The distribution of percentage correct in the last 10 responses to an item for all time series is plotted in Figure 3a. Included in this figure is the expected distribution. The expected distribution is sampled from a binomial distribution with a probability of a correct response of 0.675. This probability was chosen because it is the midpoint between the probabilities of a correct response under the difficulty settings medium and hard. As can be seen, the distribution for addition is not very different from the expected distribution; only the tails have more mass than expected. However, the distribution for multiplication is different from what was expected. Instead of a peak around the value of 0.675 there is a peak around the value of 0.9 and there also seems to be a peak around the value of 0. This bimodal character of the distribution becomes more noticeable when only the time series of length larger than 20 are used, see Figure 3b, while it becomes less noticeable when only the time series of length smaller than or equal to 20 are used, see Figure 3c. When using only the time series of length larger than 40 it becomes even more pronounced, see Figure 3d. At the same time, varying the lengths of the time series does not make a big difference for the distribution of addition. For multiplication this means that for long time series the chance is high that a user answers the last 10 responses to an item either all correctly or all incorrectly. The high chance of answering all of the last 10 responses to an item incorrectly is not desirable. Because these were the last 10 responses, the item was not presented to the user again, which only happens when the item has become too easy according to the ability estimate of the user. In reality, however, the user was not able to answer that item correctly.
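The expected distribution can also be written down exactly instead of sampled. The sketch below computes the binomial probabilities for 10 responses with a success probability of 0.675, the midpoint of 0.75 (medium) and 0.60 (hard); using the exact probability mass function rather than simulated draws is a choice made for this illustration.

```python
from math import comb


def expected_distribution(n=10, p=0.675):
    """Binomial probability of each possible proportion correct in n responses."""
    return {k / n: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}


dist = expected_distribution()
for proportion, probability in dist.items():
    print(f"{proportion:.1f}  {probability:.5f}")
```

Under this model the most likely outcome is 7 out of 10 correct, and 10 incorrect responses in a row has a probability of 0.325 to the power 10, which is on the order of one in a hundred thousand. A visible peak at 0 in the observed distribution is therefore far from what the model predicts.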

Figure 3. From left to right: (a) The distribution of percentage correct in the last 10 responses. (b) The distribution of percentage correct in the last 10 responses for time series of length larger than 20. (c) The distribution of percentage correct in the last 10 responses for time series of length smaller than or equal to 20. (d) The distribution of percentage correct in the last 10 responses for time series of length larger than 40.

The effect of the length of the person-by-item time series for the domain of multiplication is evident in multiple ways. The mean percentages correct for time series of the same length are plotted in Figure 4a. The mean percentage correct decreases when the length increases for multiplication, while it is relatively stable for addition. The decrease in percentage correct for longer time series could partly be caused by the increase in the percentage of question marks, see Figure 4b. The mean percentage of fluctuations between correct and incorrect responses decreases when the length increases for both multiplication and addition, see Figure 4c. This decrease is especially noticeable for multiplication. Longer time series are more stable and more often contain long runs consisting solely of correct or solely of incorrect responses.

Figure 4. From left to right: (a) The mean percentage correct per length of the time series. (b) The mean percentage question mark per length of the time series. (c) The mean percentage fluctuations per length of the time series.

The mean percentage correct per item does not differ much between addition and multiplication; only the dispersion is greater for multiplication, see Figure 5a. Within multiplication there is a difference in the mean percentage correct between items featuring the regular multiplication tables and items featuring the 100 or 1000 operator, see Figure 5b. The table items have a smaller mean percentage correct and a greater standard deviation than the 100/1000 items. Furthermore, the table items have a greater mean percentage question mark and a greater standard deviation than the 100/1000 items, see Figure 6b. The mean percentage question mark is greater for items from multiplication than for items from addition, see Figure 6a.

Another way to look at how well items perform relative to expectation is to use the mean fit per item, see Figure 7a. For addition the fit is centered on zero, which is what we expect. For multiplication the mean fit per item is usually higher than zero, which means that for most items the observed scores are often greater than the expected scores. This is especially true for the 100/1000 items, see Figure 7b.

For each user that was included in both domains the mean percentage correct was calculated for all items within addition and for all items within multiplication. We expected no correlation between these means; however, they had a strong positive correlation, r(1012) = 0.71, p < 0.001. We did not expect this because each user should have a percentage correct around 0.675 for both domains and deviations from that percentage should be random. This result could partly be explained by the fact that users will probably use the same difficulty level on both domains. Users who use the hard difficulty level on both domains can be expected to have a lower percentage correct than users who use the medium difficulty level on both domains. Another partial explanation could be individual differences in the use of question marks. However, this is not a full explanation. Thus we could conclude that users that score above expectation on one domain also tend to score above expectation on the other domain.

Figure 5. From left to right: (a) The distribution of mean percentage correct per item. (b) The mean percentage correct per item on the x-axis, the standard deviation percentage correct per item on the y-axis.

Figure 6. From left to right: (a) The distribution of mean percentage question mark per item. (b) The mean percentage question mark per item on the x-axis, the standard deviation percentage question mark per item on the y-axis.

Figure 7. From left to right: (a) The distribution of fit per item. (b) The mean fit per item on the

Next we tried to identify the two groups of users who differ in their ability for the 100 and 1000 operators and the regular multiplication tables. For each user in the multiplication domain the mean percentage correct was calculated for all items featuring the regular multiplication tables and for all items featuring the 100 or 1000 operator, see Figure 8. Only users that had more than five responses for both sets of items were included. We expected that these means of percentage correct would have a negative correlation. Indeed they had a moderate negative correlation, r(1597) = -0.40, p < 0.001. It thus seems that users that score higher than expected on one set of items tend to score lower than expected on the other set of items, and the other way around. In other words, there appears to be a group of users for whom the 100 and 1000 operators are easier than predicted by the model, while the regular multiplication tables are more difficult than predicted, and one group of users for whom it is the other way around. Therefore we could conclude that the domain of multiplication is not unidimensional and that there are multiple processes present in the learning of multiplication. It should be noted that the negative relationship between the percentages correct for the 100 and 1000 operators and for the regular multiplication tables is a local effect due to the adaptive nature of Math Garden. This negative relationship holds for users who have roughly the same ability estimate for multiplication, because only users with roughly the same ability estimate will receive the same items. Thus, it does not mean that the same negative relationship would hold if users were presented with all items irrespective of their ability.
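The comparison of the two item sets amounts to a Pearson correlation over per-user means. The sketch below uses made-up toy values for five hypothetical users; the function names are illustrative, and the degrees of freedom reported as r(df) are taken to be the number of users minus two, following the usual reporting convention.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy per-user mean percentage correct (not real data).
tables = [0.40, 0.55, 0.60, 0.70, 0.80]
operators = [0.95, 0.85, 0.80, 0.75, 0.60]
r = pearson_r(tables, operators)
print(r, t_statistic(r, len(tables)))
```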

Learning effects

To get a feeling for what a learning effect looks like according to our logistic regression method, three time series with their corresponding curves are plotted in Figure 9. Included here are the time series for three different users on the same item, showing three different possibilities. In Figure 9a the slope of the logistic regression was significant and a learning effect is present. In Figure 9b the slope was not significant and learning did not take place during the data collection. In Figure 9c the slope was again not significant, but this time learning did take place before the data collection started. This shows that interpreting a flat curve is not straightforward.

Figure 8. The mean percentage correct per user over all time series featuring the regular multiplication tables and over all time series featuring the 100 or 1000 operator. Each dot represents one user.

Figure 9. Three time series from three different users for the item 6 x 2, with their corresponding

In order to find differences in learning effects between different lengths of the time series, the mean slope of the fitted logistic regressions was calculated per length. The mean slope decreases when the length increases for both multiplication and addition, see Figure 10a. This decrease is especially noticeable for multiplication. Furthermore, the mean slopes per item are greater for multiplication than for addition, see Figure 10b. Time series with length smaller than 10 or larger than 50 were excluded, because long time series appear to be different from the other time series and according to our simulation it is not possible to find effects for short time series. The difference between the mean of the mean slopes per item for addition (M = 0.08, SD = 0.06) and for multiplication (M = 0.14, SD = 0.09) was significant, t(394) = -8.10, p < 0.001.

Furthermore, for each user that was included in both domains the mean slope of the logistic regression was calculated for all items within addition and for all items within multiplication. Time series shorter than 10 or longer than 50 were excluded. Based on the positive correlation for the means of percentage correct we also expected a positive correlation for the means of the slopes. Indeed, the means of the slope of the logistic regression had a moderate positive correlation, r(242) = 0.33, p < 0.001. This implies that users that show learning effects on one domain also tend to show learning effects on the other domain.

Figure 10. From left to right: (a) The mean slope of the logistic regression per length of the time

Additionally, for each user in the multiplication domain the mean slope of the logistic regression was calculated for all items featuring the regular multiplication tables and for all items featuring the 100 or 1000 operator. Time series shorter than 10 or longer than 50 were excluded. Based on the negative correlation for the means of percentage correct for the two sets of items we expected that users would often only show learning effects for one of the sets of items and not for the other. Accordingly we expected a negative correlation between the means of the slope of the logistic regression for the two sets of items. However, they had no correlation, r(529) = 0.04, p = 0.36. Thus, the negative correlation that holds for the mean percentage correct does not hold for the mean slope of the logistic regression when comparing the two sets of items within multiplication. This is even more surprising when considering that these correlations showed the same positive relationship when they were calculated for the two domains. The existence of multiple processes in the learning of multiplication was evident from our earlier finding. However, these multiple processes are not reflected in the mean slopes of the logistic regression per user. From this we could conclude that either the multiple processes do not cause differences in learning effects or our logistic regression method is not able to pick up these differences in learning effects.

Discussion

We started this research with two aims in mind. On the one hand we wanted to examine the processes underlying the learning of mathematics and on the other hand we wanted to improve the system of Math Garden. As was hypothesized, it was possible to find more and larger deviations from the expected patterns based on the current Math Garden model for multiplication than for addition. For instance, the distribution of percentages correct in the last 10 responses was almost as expected for addition, while it was radically different for multiplication. For long time series within the multiplication domain the last 10 responses were mostly either all correct or all incorrect. This was not the only effect of length: for example, the percentage of question marks increased with increasing length for multiplication, while it was stable for addition.

Furthermore, it was possible to find differences between the items that feature the 100 or 1000 operator and the items that feature the regular multiplication tables. We hypothesized that the 100 or 1000 operator items would be answered correctly more often than would be expected based on the Math Garden model, while the multiplication tables items would be answered correctly less often than expected. This was indeed the case. We also hypothesized that it would be possible to identify two groups of users: one for whom the 100 and 1000 operators are easier than predicted by the model, while the regular multiplication tables are more difficult than predicted, and one for whom it is the other way around. The most important support for this hypothesis was that users who have a higher mean percentage correct than expected on the 100 or 1000 operator items tended to have a lower mean percentage correct than expected on the multiplication tables items and vice versa. From this we could conclude that the domain of multiplication is not unidimensional and that there are multiple processes present in the learning of multiplication.

We did not find a relationship between the mean slopes of the logistic regression per user for the 100 or 1000 operator items and the mean slopes per user for the multiplication table items. From this we could conclude that either the multiple processes present in the learning of multiplication do not cause differences in learning effects, or our logistic regression method is not able to pick up these differences. However, there was a difference in the mean slopes per item between addition and multiplication: learning effects were present more often in time series from multiplication than in time series from addition.

As was discussed earlier, the interpretation of a flat curve is not straightforward. It could indicate either that an item was already learned at the start of the data collection, or that an item was not learned during the period of data collection. This also makes it more difficult to interpret our findings about learning effects. A solution would be to require that an item is not already learned at the beginning of a time series before a logistic regression is fitted. For instance, this could be done by only using time series that begin with three incorrect responses.

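The selection rule suggested above can be sketched as follows. This is a minimal illustration, not the procedure used in this research: the function names, the coding of responses as 0 (incorrect) and 1 (correct), and the example data are all assumptions.

```python
def starts_unlearned(responses, n_incorrect=3):
    """Return True if the series begins with n_incorrect incorrect (0) responses."""
    if len(responses) < n_incorrect:
        return False
    return all(r == 0 for r in responses[:n_incorrect])


def select_unlearned_series(series_by_key, n_incorrect=3):
    """Keep only the time series that start with a run of incorrect responses."""
    return {
        key: responses
        for key, responses in series_by_key.items()
        if starts_unlearned(responses, n_incorrect)
    }


# Hypothetical person-by-item time series (0 = incorrect, 1 = correct).
example = {
    ("user_1", "7x8"): [0, 0, 0, 1, 0, 1, 1, 1],
    ("user_2", "7x8"): [1, 1, 1, 1, 1, 1],
    ("user_3", "7x8"): [0, 0, 1, 1],
}
selected = select_unlearned_series(example)
```

Only the first series would be retained here. The price of such a filter is that it discards many time series, so the remaining sample may no longer be representative.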

Another option is to use elapsed time as the explanatory variable, instead of the index of the time series. The time between responses can vary in length, and during that time children will probably also learn from the education they receive. Elapsed time may therefore be a better explanatory variable, which could be explored in future research.
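As a small sketch of what this change of explanatory variable could look like, the snippet below converts response timestamps into days elapsed since the first response. The function name and the timestamps are hypothetical, and we assume that a timestamp is available for every response.

```python
from datetime import datetime


def elapsed_days(timestamps):
    """Convert response timestamps to days elapsed since the first response."""
    if not timestamps:
        return []
    start = timestamps[0]
    return [(t - start).total_seconds() / 86400.0 for t in timestamps]


# Hypothetical timestamps: two responses in one session, one a week later.
stamps = [
    datetime(2016, 1, 4, 9, 0),
    datetime(2016, 1, 4, 9, 5),
    datetime(2016, 1, 11, 9, 0),
]
days = elapsed_days(stamps)
```

With the index as predictor these three responses are equally spaced, whereas with elapsed time the third response lies a full week after the first two.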

A shortcoming of this research is that it only used the correctness of the responses and disregarded the reaction times, which also carry a lot of information. When only correctness is used, a long series of correct responses can seem stable, while the responses may in fact get faster over time. Learning can thus take place while the correctness of the responses stays the same. In future research, response times could shed light on what happens during the long series of correct and incorrect responses in long time series. However, it should be noted that quantifying learning processes on time series of single items, as collected by Math Garden, is uncharted territory. This research made an important start, which will enable future research on learning processes.

One of the aims of this research was to improve the system of Math Garden. One way in which Math Garden could improve is by estimating a different ability for users on the 100 and 1000 operator items and on the multiplication tables. This research could also be used as a guide for finding violations of the unidimensionality assumption of the Math Garden model. Items within a given domain could be grouped together based on information about the content of the items. On these sets of items the mean percentage correct per user could be calculated. If a negative correlation were found between some of these means, this would be evidence for multidimensionality. Thus, Math Garden could use this to explore whether violation of the unidimensionality assumption is a problem in other domains as well.
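The check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is ours, a plain Pearson correlation is used, and the per-user means are invented values, not results from this research.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / sqrt(sxx * syy)


# Hypothetical mean proportion correct per user on two item sets
# (same user order in both lists).
operator_items = [0.85, 0.80, 0.70, 0.65]
table_items = [0.60, 0.68, 0.75, 0.82]

r = pearson(operator_items, table_items)
```

In an adaptive system that aims for a fixed expected success rate, a clearly negative `r` between two item sets would point towards more than one underlying ability. In practice the means would first have to be computed per user from the raw responses.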

As can be seen from this research, the data collected by Math Garden are rich in information and can be used to infer the processes underlying children's learning of mathematics. New methods need to be developed to make use of all the raw data that are available. We are only at the beginning of unlocking all the insights that can be found with the aid of Math Garden.


References

Adolph, K. E., Robinson, S. R., Young, J. W., & Gill-Alvarez, F. (2008). What is the shape of developmental change? Psychological Review, 115(3), 527.

Ambrose, R., Baek, J. M., & Carpenter, T. P. (2003). Children's invention of multidigit multiplication and division algorithms. The development of arithmetic concepts and skills: Constructive adaptive expertise, 305-336.

Brinkhuis, M. J. S., Savi, A. O., Coomans, F., Hofman, A. D., van der Maas, H. L. J., & Maris, G. (2015). Learning as it happens: Advances in computerized adaptive practice. Manuscript submitted for publication.

Elo, A. (1978). The rating of chessplayers, past and present. Georgetown, CT: Arco.

Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 1360-1383.

Hoyles, C., Wolf, A., Molyneux-Hodgson, S., & Kent, P. (2002). Mathematical skills in the workplace: final report to the Science Technology and Mathematics Council.

Klinkenberg, S., Straatemeier, M., & van der Maas, H. L. J. (2011). Computer adaptive practice of maths ability using a new item response model for on the fly ability and difficulty estimation. Computers & Education, 57(2), 1813-1824.

Lemaire, P. (2010). Executive functions and strategic aspects of arithmetic performance: The case of adults' and children's arithmetic. Psychologica Belgica, 50(3-4).

Lemaire, P., & Siegler, R. S. (1995). Four aspects of strategic change: contributions to children's learning of multiplication. Journal of Experimental Psychology: General, 124(1), 83.

Meng, R., & Finnie, R. (2006). The importance of functional literacy: Reading and math skills and labour market outcomes of high school drop-outs (No. 2006275e). Statistics Canada,


Reyna, V. F., & Brainerd, C. J. (2007). The importance of mathematics in health and human judgment: Numeracy, risk communication, and medical decision making. Learning and Individual Differences, 17(2), 147-159.

Romero, C., & Ventura, S. (2010). Educational data mining: a review of the state of the art. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 40(6), 601-618.

Siegler, R. S. (2006). Microgenetic analysis of learning. In D. Kuhn & R. S. Siegler (Eds.), Handbook of child psychology: Vol. 2. Cognition, Perception, and Language (pp. 464-510). New York: Wiley.

Siegler, R. S., & Crowley, K. (1991). The microgenetic method: A direct means for studying cognitive development. American Psychologist, 46(6), 606.

van der Maas, H. L. J., Kan, K. J., Hofman, A., & Raijmakers, M. E. J. (2014). Dynamics of development: A complex systems approach. Handbook of developmental systems theory and methodology, 270-286.

van der Ven, S. H., Boom, J., Kroesbergen, E. H., & Leseman, P. P. (2012). Microgenetic patterns of children's multiplication learning: Confirming the overlapping waves model by latent growth modeling. Journal of Experimental Child Psychology, 113(1), 1-19.

van der Ven, S. H., Straatemeier, M., Jansen, B. R., Klinkenberg, S., & van der Maas, H. L. (2015). Learning multiplication: An integrated analysis of the multiplication ability of primary school children and the difficulty of single digit and multidigit multiplication problems.


Appendix A - Simulation

We performed a small simulation study to test the power of different logistic regression methods. We focused on time series that are similar to the time series that we collected. As discussed earlier, we chose Bayesian logistic regression instead of regular logistic regression, because regular logistic regression cannot handle complete separation. Furthermore, we considered the quasi-binomial model, because the variation within each person-by-item time series may be much greater or smaller than the binomial model would suggest. The following logistic regression methods were used in the simulation:

• Bayesian logistic regression

• Bayesian logistic regression using smoothing

• Bayesian logistic regression using a quasi-binomial model

• Bayesian logistic regression using a quasi-binomial model and smoothing
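To illustrate the complete separation problem mentioned above: with the index of the time series as the only predictor, a series is completely separated when a single cut point splits all zeros from all ones, and the ordinary maximum likelihood estimate of the slope then does not exist. The sketch below, with a function name of our own choosing, flags such series. It is not part of the analysis code of this research.

```python
def is_completely_separated(responses):
    """Return True if one cut point in the series splits all 0s from all 1s.

    Series containing only one outcome are reported as False here; they are
    degenerate for a different reason (there is no change to model at all).
    """
    if len(set(responses)) < 2:
        return False
    # Count the positions where the response differs from its predecessor.
    changes = sum(1 for a, b in zip(responses, responses[1:]) if a != b)
    return changes == 1


step_like = [0, 0, 0, 1, 1, 1]   # perfect step: completely separated
noisy = [0, 1, 0, 0, 1, 1]       # overlapping outcomes: not separated
```

The idealized step-like patterns used in the simulation are exactly of the first kind, which is why a method that can cope with complete separation is needed.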

Methods

Because we do not know what kind of developmental trajectories underlie the observed data in Math Garden, we chose a simple approach using an idealized step-like function. We created possible response patterns that consist of ones and zeros. These ones and zeros can have different interpretations, such as a correct response and an incorrect response, a fast response and a slow response, or no question mark and a question mark. We used two groups of responses: one group in which there is a step-like change from zero to one, and one group in which all the responses are random ones and zeros. Under the interpretation of correct and incorrect responses, these correspond to having a learning effect and having no effect, respectively. In the first group the place of the step-like change in the time series was varied systematically. Furthermore, we created additional groups of response patterns by adding noise to the first two groups. This was done by letting each response in a response pattern change from one to zero or from zero to one with probability 0.15 or 0.25. All these groups were created for response patterns of the following lengths: 5, 7, 10, 15, 20, 25 and 35.
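The construction of the response patterns can be sketched as follows. This is a minimal reconstruction from the description above and not the original simulation code. The function names, the random seed, the choice to place the step at every possible position, and the number of random patterns per length are all assumptions.

```python
import random


def step_pattern(length, step_index):
    """Zeros before step_index, ones from step_index onward."""
    return [0] * step_index + [1] * (length - step_index)


def random_pattern(length, rng):
    """A pattern of independent random zeros and ones."""
    return [rng.randint(0, 1) for _ in range(length)]


def add_noise(pattern, flip_prob, rng):
    """Flip each response (0 <-> 1) independently with probability flip_prob."""
    return [1 - r if rng.random() < flip_prob else r for r in pattern]


def build_groups(length, rng):
    """Build the four groups of response patterns for one pattern length."""
    # Assumption: the step is placed at every position that leaves at least
    # one zero and at least one one in the pattern.
    steps = [step_pattern(length, k) for k in range(1, length)]
    return {
        "random": [random_pattern(length, rng) for _ in steps],
        "zero_to_one": steps,
        "zero_to_one_noise_0.15": [add_noise(p, 0.15, rng) for p in steps],
        "zero_to_one_noise_0.25": [add_noise(p, 0.25, rng) for p in steps],
    }


rng = random.Random(1)  # fixed seed so the sketch is reproducible
groups = {n: build_groups(n, rng) for n in (5, 7, 10, 15, 20, 25, 35)}
```

Each logistic regression method would then be fitted to every pattern in `groups`, with the position in the pattern as the explanatory variable.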

To summarize, for each length there were four groups of response patterns: random, zero to one, zero to one with noise 0.15 and zero to one with noise 0.25. The last three groups were combined into a group called the change group. A good method should find almost no effects in the random group and should find effects in the change group. To this end, the percentage of significant results was collected for the change group and for the random group, for each method, using an alpha of 0.05.
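The evaluation criterion can be written down compactly. In the sketch below the p-values are invented placeholders, not simulation results, and the function name is our own. In the actual simulation each p-value would come from the slope test of one of the four logistic regression methods.

```python
def proportion_significant(p_values, alpha=0.05):
    """Proportion of p-values that fall below the significance level alpha."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(1 for p in p_values if p < alpha) / len(p_values)


# Hypothetical p-values for one method and one pattern length.
change_group_p = [0.01, 0.20, 0.03, 0.50]
random_group_p = [0.40, 0.70, 0.04, 0.90]

power = proportion_significant(change_group_p)        # should be high
false_alarms = proportion_significant(random_group_p)  # should be low
```

For a well-behaved method the proportion in the random group should stay close to alpha, while the proportion in the change group should be as large as possible.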

Results

Table A1 shows the results for the change group and Table A2 shows the results for the random group. As can be seen, the Quasi-binomial with Smoothing method has the highest probability of significant results for small lengths in the change group. For larger lengths the probabilities are comparable to those of the Bayesian with Smoothing method. However, the Quasi-binomial with Smoothing method also has the highest probabilities in the random group, and for the small lengths these are markedly higher than for all the other methods. Therefore we chose the Bayesian logistic regression with smoothing as the best logistic regression method. It should be noted that this method cannot find significant effects in response patterns of length lower than 10.

After the data were collected for the main research, the correlations between all the logistic regression methods were calculated using all time series (see Figure A1). As can be seen, the correlations are high, so using a different logistic regression method should not lead to big differences. Therefore it was safe to use only one of the logistic regression methods.


Table A1

Probability of a significant result in the change group for all logistic regression methods. Featured in the change group are the response patterns: zero to one, zero to one with noise 0.15 and zero to one with noise 0.25.

| Length of response patterns | Bayesian | Bayesian with Smoothing | Quasi-binomial | Quasi-binomial with Smoothing |
|---|---|---|---|---|
| 5 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.26 |
| 10 | 0.0 | 0.19 | 0.19 | 0.32 |
| 15 | 0.15 | 0.47 | 0.32 | 0.46 |
| 20 | 0.28 | 0.59 | 0.32 | 0.53 |
| 25 | 0.42 | 0.64 | 0.49 | 0.64 |
| 35 | 0.52 | 0.71 | 0.58 | 0.71 |

Table A2

Probability of a significant result in the random group for all logistic regression methods. In the random group the response patterns are randomly generated.

| Length of response patterns | Bayesian | Bayesian with Smoothing | Quasi-binomial | Quasi-binomial with Smoothing |
|---|---|---|---|---|
| 5 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.14 |
| 10 | 0.0 | 0.0 | 0.0 | 0.10 |
| 15 | 0.0 | 0.07 | 0.0 | 0.10 |
| 20 | 0.0 | 0.10 | 0.0 | 0.13 |
| 25 | 0.04 | 0.10 | 0.04 | 0.12 |
| 35 | 0.04 | 0.13 | 0.03 | 0.14 |


Figure A1. The correlation plot of the different logistic regression methods using all the time series.