
Beyond Norm-based

Learning Analytics


Layout: typeset by the author using LaTeX.

Cover illustration: "https://www.freepik.com/free-photos-vectors/business", Business vector created by katemangostar - www.freepik.com


Beyond Norm-based

Learning Analytics

Julius Huizing
11898402
Bachelor thesis
Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
dr. B. Bredeweg
Informatics Institute
Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam

Jul 7th, 2020


Abstract

Open-ended learning environments (OELEs) can be used to teach 21st-century skills to an increasing number of people. However, there is a shortage of effective assessment techniques for evaluating learners' learning processes in OELEs. Developing such techniques can provide teachers with actionable insights that allow them to teach these skills to learners more effectively. This thesis explores three new assessment techniques for evaluating learners' performance in OELEs: (1) using mean variance among student features to assess the learning process of a group of students, (2) using z-scores of individual student features to identify outlier students, and (3) using k-means to automatically partition students into meaningful clusters in which learners show similar learning behavior. These new techniques are applied and evaluated in the context of the OELE DynaLearn, an intelligent learning environment dedicated to learning conceptual modeling. Results suggest that these assessment techniques provide explanatory information about the learning process of learners. Providing teachers with real-time information about these assessments during classroom sessions can give them actionable insights into the learners' learning process.


Contents

1 Introduction

2 Theoretical Background
  2.1 Learning Analytics & Educational Datamining
  2.2 Open-Ended Learning Environments
  2.3 Learning Analytics in OELEs

3 Application Context
  3.1 OELE DynaLearn
  3.2 Datasets

4 Design & Implementation
  4.1 Reconstructing Learner Created Models
  4.2 Representing Learners
    4.2.1 Representing Learners from Element-usage Frequencies
    4.2.2 Representing Learner's Efficiency Scalar
  4.3 Assessment Techniques
    4.3.1 Mean Feature Variance & Group Process
    4.3.2 Individual Z-Scores & Outlier Learners
    4.3.3 K-Means & Meaningful Clusters of Learners
  4.4 Simulating Classroom Sessions

5 Results & Evaluation
  5.1 Group Process
    5.1.1 Using Element-usage Frequencies
    5.1.2 Using Efficiency
  5.2 Outliers
    5.2.1 In Element-usage Frequencies
    5.2.2 In Efficiency
  5.3 Clusters
    5.3.1 In Element-usage Frequencies
    5.3.2 In Efficiency

6 Discussion

7 Conclusion


Chapter 1

Introduction

There is an increasing need for assessment techniques for Open-Ended Learning Environments (OELEs) [1]. Researchers are unanimous in stating that we need to teach the so-called "21st-century skills", and advancement in educational software may allow us to teach and assess these skills for everyone [12]. OELEs are particularly valuable for teaching these skills to a larger number of learners. Instead of directing learners along some predetermined path of learning objectives, as in directed learning environments, OELEs strive to stimulate self-inquiry by enabling learners "to play" with concepts to improve their understanding of these concepts [7]. However, learners' performance in OELEs is hard to assess using traditional assessment techniques: traditional assessment techniques assess learning products, whereas OELEs need assessment techniques that evaluate the learning process [1].

This thesis's primary goal is to explore several new assessment techniques for assessing learners' learning process in OELEs. More specifically, this thesis focuses on how the data generated through learner activity in the OELE DynaLearn can be used to assess learners' learning process in this environment. Consequently, this thesis's main contribution is in its findings on new useful assessment techniques for evaluating learners' learning process in DynaLearn, but these findings generalize to other OELEs dedicated to learning conceptual modeling as well. By exploring new assessment techniques for OELEs, teachers may be equipped with better insights into the learners' learning process, which may allow them to intervene when a learner remains stuck in a sub-optimal learning process. Providing teachers with such actionable insights could improve the effectiveness of using OELEs for teaching 21st-century skills, thereby making these skills available to an increasing number of people.

The setup of this thesis is as follows. In Chapter 2, the theoretical background, including prior findings on assessing learners' learning process in OELEs, is discussed in more detail. Chapter 3 presents the OELE DynaLearn, the application context of this thesis. Chapter 4 provides a methodological outline of how we used different representations for modeling learners in DynaLearn, and how we used these representations to explore three different new assessment techniques for evaluating learners' learning process. Chapter 5 presents the results of applying these assessment techniques within the context of the OELE DynaLearn, and Chapter 6 summarizes these results. Finally, Chapter 7 concludes by evaluating these results in their technological and societal context.


Chapter 2

Theoretical Background

2.1 Learning Analytics & Educational Datamining

Because of the increasing use of educational software to support learning, increasing amounts of data are available about learners [16]. This has given rise to two separate fields of research, Educational Datamining (EDM) and Learning Analytics (LA), that try to use this data to improve educational practice. Although both fields show considerable overlap, the two fields differ in their dominant research goals [15]. In EDM research, educational data is often used to power adaptive systems, for instance to design and develop intelligent tutoring systems [8]. In contrast, in LA research data is often used to inform and empower instructors and learners.

Because this thesis explores new techniques for assessing the learning process of learners in OELEs so as to provide teachers with actionable insights to improve this process, this thesis predominantly aims to add to the findings of LA research. However, some of these insights may also be used to automate parts of the actions instructors or learners would take. Therefore, the findings of this thesis may also be relevant for EDM.

2.2 Open-Ended Learning Environments

Open-ended learning environments (OELEs) are the counterpart of directed learning environments [7]. Directed learning involves the systematic acquisition and retention of externally-defined knowledge and skills [7]. Directed learning environments are therefore often scripted, meaning learners are guided through a pre-determined succession of steps to reach some pre-defined norm. Consequently, learner assessments in these environments are often norm-based, meaning learner performance measures are obtained by comparing a learner's result (e.g. a mathematical proof) with the pre-defined outcome (e.g. the mathematical proof as given in a textbook).

Instead of systematically acquiring well-defined knowledge, open-ended learning is concerned with developing cognitive skills such as identifying and manipulating variables, interpreting data, hypothesizing, and experimenting [7]. To support such learner-centered inquiry, Open-Ended Learning Environments (OELEs) often utilize educational software tools that allow learners to "play" with concepts by manipulating the environment and receiving timely feedback about the effects of their actions [9, 4]. Because the learners themselves establish what is to be learned, how learning will occur, and what tools will be used to learn, OELEs are often not scripted and cannot use pre-defined norms to assess learner performance. Instead, in OELEs it is common to analyze and assess the learning process itself [14].

2.3 Learning Analytics in OELEs

The learning process of learners in OELEs is well known for being challenging to measure and assess because current assessment techniques are based on evaluating learning products instead of the learning process [1]. Traditional learning environments often require the learner to recall specific information, enabling the use of pre-defined norms to evaluate the products of different learners [3]. In this way, assessing learners can be done relatively cheaply by applying the same norm-based models to different learners. In OELEs, however, the learning process itself needs to be assessed. The learning process is often operationalized by making the learners the central point of measurement, for instance by measuring the meta-cognitive skills that are important for completing open-ended learning tasks [10, 13]. However, such learner-centered processes often require fine-grained measurements to capture, and have in the past been difficult to measure in detail for large numbers of learners [1]. Recent advances in both data collection and machine learning could make it possible to understand learners' trajectories in these environments [1].

Most of the recent work on learning analytics in OELEs involves collecting and analyzing multi-modal data [11]. In most research in this area, multi-modal data is used to assess the emotional state of learners, which can indirectly provide insights into the learning process of the learner [2]. For example, [6] created a system to automatically detect the facial expressions of learners and identify changes in emotional state while working on an open-ended learning task. However, multi-modal learning analytics tend to be intrusive and often require investments in extra sensors to make the analytics possible.

This thesis instead focuses on a source of data that is readily available for most OELEs: log data as generated through user activity in the learning environment. Using this log data, the challenge is to find representations for learners that are not only discriminatory, but also explainable. A representation of learners that can be used to identify outlier learners might be valuable in its own right, but is much more valuable if the representation also provides explanatory information about what is divergent about the learner's learning process. Is the learner an outlier because the learner is progressing much more slowly than his peers, or perhaps because the learner is progressing exceptionally quickly? And if the learner is progressing much more slowly, can the underlying deficiency be derived from the representation so as to allow the teacher to effectively intervene?


Chapter 3

Application Context

3.1 OELE DynaLearn

DynaLearn is an interactive learning environment (ILE) for learning conceptual knowledge [5]. DynaLearn is designed for learners to express and develop their understanding of how a system works. To this end, learners use a graphical interface to create a qualitative representation of the system. For instance, to express an understanding of how the temperature of a substance in a system is affected by the energy present in the substance, a learner might create a model consisting of an entity called 'substance', two quantities belonging to that entity called 'energy' and 'temperature', and a positive influence relationship pointing from the first quantity to the second to indicate that the energy of a substance positively influences its temperature (Figure 3.1). By creating more elements and relationships, the learner can create representations of increasingly complex systems. And by running simulations, a learner can test how changes to the system affect the working of the system.

DynaLearn belongs to the category of OELEs because norm-based assessments are not fit for assessing a learner's learning process in the environment and because the learning process is unscripted. For instance, to check whether his current belief that energy positively affects temperature is correct, a learner might rightfully flip the relationship to check whether this leads to erroneous results in simulation. In a directed environment that uses a pre-defined norm to assess the quality of the model, this would probably be considered a regression in the learning process of the learner¹. In DynaLearn, however, such behavior could actually be an indicator of an improvement in the learning process.

¹ Possibly, such an action would not even be allowed in a scripted learning environment.


Figure 3.1: A DynaLearn model representing a learner's possible understanding of how the energy in a substance affects its temperature.

3.2 Datasets

The datasets analyzed consist solely of log data as generated through learner activity in DynaLearn. For every action a learner performs in DynaLearn, a datapoint is generated that summarizes the corresponding action. Each datapoint indicates the corresponding action, the learner who performed that action, and the moment at which the action occurred. We used the data from two datasets that logged the activity of learners as they were working on the same assignments².

² Although learners had to create the same models, the learning environment was still


Chapter 4

Design & Implementation

4.1 Reconstructing Learner Created Models

To analyze a learner's learning process in DynaLearn, a real-time representation of the model a learner is working on needs to be maintained. However, constructing and maintaining a real-time representation of learners' models using only log data is challenging. In DynaLearn, when a user deletes an element or relationship, only the relevant information about that element or relationship gets logged, but no information gets logged about elements or relationships whose lifecycles depend on the deleted element. For example, when an entity is deleted that had two configurations specified with it, the log records only show the delete action as performed on the targeted entity, while the user would see these two configurations disappear as well, including any relationship between them. To maintain representations of the models learners are working on in DynaLearn, the challenge is to design and implement a representation of the model that can keep track of the dependencies between different elements and relationships in the models using only the log files.

The models learners make in DynaLearn form a combination of different graph structures. At first sight, these models seem to form a tree data structure. For instance, when a user creates an entity or agent, he essentially creates the root node of a hierarchical tree structure. All other elements and relationships that are created can be seen as child nodes or edges between these nodes, respectively. It follows that when a parent node is deleted, all child nodes disappear as well; however, deleting a child node will not affect the lifecycle of the parent node (see Figure 4.1).


Figure 4.1: Subsets of the graphs users make in DynaLearn can be considered hierarchical trees, where an entity (or agent) forms the root node and all other elements and relationships can be considered child nodes and edges respectively.

However, keeping only a tree data structure is not sufficient to maintain a representation of a learner's model. First of all, within one model, a learner can create multiple entities and choose to link them. The resulting graph no longer contains a single parent node, and therefore already lacks the requirements to be considered a tree data structure (see Figure 4.2). Furthermore, users can also create multiple directed edges pointing to the same node, violating another of the tree data structure's constraints.

Figure 4.2: Not all the models users can make in DynaLearn can be considered hierarchical trees, since there can be no apparent root node and nodes can have multiple incoming edges.


Directed multigraphs, by contrast, can represent what tree structures cannot. A multigraph is a graph that is permitted to have multiple edges (also called parallel edges), that is, edges with the same end nodes. Thus two vertices may be connected by more than one edge. Representing models as a directed multigraph therefore solves the problems mentioned above, because the graph does not need to have a parent node and nodes may have multiple incoming edges. However, there is no hierarchical structure in directed multigraphs: if an entity node in a directed multigraph were to be deleted, its apparent child nodes (e.g., quantities) would erroneously continue to exist in the graph.

Ultimately, representations of the models were implemented as directed multigraphs in which subsets of the nodes and edges also form hierarchical tree structures. Every new creation, whether it is an element or a relationship, is interpreted as initializing a new node or edge in the directed multigraph. Furthermore, every entity or agent created is interpreted as creating a new hierarchical tree with the corresponding entity or agent as its root node. Every other element that is created is interpreted as a new child node in one of the existing tree structures. When an element is deleted, it is deleted from both the tree and the graph data structure. When an element that has child nodes is deleted, the child nodes and corresponding edges are automatically deleted from the tree structure. In this way, the collection of tree structures can be used to keep the directed multigraph up to date: by obtaining the set of nodes and edges that are in the graph but no longer in the collection of trees, these nodes and edges can also be deleted from the graph.
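To make this bookkeeping concrete, the sketch below shows one possible way to maintain such a dual representation in Python, assuming a networkx MultiDiGraph for the model and a simple parent/child map for the tree structures. The class and method names are illustrative assumptions, not the thesis implementation.

import networkx as nx

class LearnerModel:
    """Hypothetical sketch of a learner's model: a directed multigraph plus
    parent/child bookkeeping so that deletions can cascade as in DynaLearn."""

    def __init__(self):
        self.graph = nx.MultiDiGraph()   # elements as nodes, relationships as edges
        self.children = {}               # node id -> set of dependent child node ids

    def add_element(self, node_id, node_type, parent_id=None):
        self.graph.add_node(node_id, type=node_type)
        self.children.setdefault(node_id, set())
        if parent_id is not None:        # entities/agents are roots and have no parent
            self.children.setdefault(parent_id, set()).add(node_id)

    def add_relationship(self, edge_id, source_id, target_id, edge_type):
        self.graph.add_edge(source_id, target_id, key=edge_id, type=edge_type)

    def delete_element(self, node_id):
        # Deleting a node also deletes all of its dependents, mirroring what the
        # learner sees in the DynaLearn interface.
        for child in list(self.children.get(node_id, ())):
            self.delete_element(child)
        if self.graph.has_node(node_id):
            self.graph.remove_node(node_id)   # incident edges are removed as well
        self.children.pop(node_id, None)
        for dependents in self.children.values():
            dependents.discard(node_id)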

The algorithm for updating a learner created model is shown in Algorithm 1. Every time a data point is processed, it is first determined whether the target type should be represented as a node or an edge in the directed multigraph. An overview of which target types are interpreted as nodes and which as edges is shown in Table 4.1. If the target type is either an 'entity' or an 'agent', a new tree data structure is initialized with that entity or agent as the root node. For any other node, the corresponding tree is retrieved, and the node is added as a child to the proper parent node in the tree.

Algorithm 1 Algorithm for updating a learner created model.

1: procedure updateModel(cton, trget, trgettype) 2: Initialize Node with proper target and targettype

3: if targettype is Entity or if targettype is Agent then

4: processRootNode(Node, action)

5: else

6: processNode(Node, action)

7: if action is delete then


Table 4.1: Nodes and edges in learner created models.

Nodes: agent, assumption, attribute, derivative_value, entity, q_exo_decr, q_exo_incr, q_exo_para_neg, q_exo_para_pos, q_exo_random, q_exo_sin, q_exo_steady, quantity, quantity_allvalues, quantity_space, quantity_space_interval, quantity_space_point, quantity_value

Edges: configuration, correspondence_qs_directed, correspondence_qs_directed_reverse, correspondence_qs_normal, correspondence_qs_reverse, correspondence_qv_directed, correspondence_qv_normal, influence_negative, influence_positive, proportionality_negative, proportionality_positive, q_calc_min, q_calc_plus, q_ineq_eq, q_ineq_gt, q_ineq_gte, q_ineq_lt

4.2 Representing Learners

To allow for effective algorithms for analyzing learners, learners are represented as n-dimensional feature vectors, as shown in Equation 4.1.

\vec{s} = \begin{bmatrix} f_1 & f_2 & \cdots & f_n \end{bmatrix} \quad (4.1)

We tried different sets of features for representing learners. Below, each representation is outlined in detail.

4.2.1 Representing Learners from Element-usage Frequencies

Learners can be represented by how frequently they use each element in their model. This representation can provide valuable insights. Every feature simply indicates how frequently the learner is currently using a specific element. Although these numbers do not explain much by themselves in OELEs, when comparisons are made between learners, the differences between these features give information about which learners use which elements less or more than their peers. Such comparisons allow for valuable insights such as: "learner X uses element a twice as often as learner Y."

For DynaLearn, we represented individual learners as the feature vector shown in Equation 4.2.

\vec{s} = \begin{bmatrix} \#entities & \#configurations & \cdots & \#influences \end{bmatrix} \quad (4.2)

4.2.2 Representing Learner's Efficiency Scalar

Learners can also be represented as a scalar by computing the ratio between their currently active elements and the total number of elements they have created. This representation might provide insight into the efficiency of a learner. Two learners can end up with two identical models, but one of them might have performed more actions to get to the final result. This might indicate that the latter was less efficient during the assignment, and allows for meaningful insights such as: "learner B seems to be less efficient than learner A." Note that inefficiency is not necessarily an indicator of a sub-optimal learning process: inefficiency might also indicate that a learner was exploring the learning environment, which can be very valuable from a learning perspective.

For DynaLearn, we represented learners as the feature vector shown in Equation 4.3.

\vec{s} = \begin{bmatrix} \dfrac{\#active\ elements}{\#create\ actions} \end{bmatrix} \quad (4.3)
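A correspondingly small sketch for the efficiency scalar is given below; it assumes that a running count of create actions is kept per learner while the log is processed, which is an assumption about the bookkeeping rather than part of the logged data format described above.

def efficiency_scalar(model, n_create_actions):
    """Ratio of elements still present in the model to all elements ever created."""
    n_active = model.graph.number_of_nodes() + model.graph.number_of_edges()
    # A learner's first create action yields an efficiency of 1, as in Equation 4.3.
    return n_active / n_create_actions if n_create_actions else 1.0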

4.3 Assessment Techniques

Keeping a real-time representation of a group of learners working on the same assignment makes it possible to assess the group's learning process as a whole, to identify learners whose behavior deviates from that of their peers, and to cluster learners into groups whose members behave similarly. A group of k learners, where each learner is represented by an n-dimensional feature vector, is modeled as the k × n matrix shown in Equation 4.4.

G = \begin{bmatrix}
f_{11} & f_{12} & f_{13} & \cdots & f_{1n} \\
f_{21} & f_{22} & f_{23} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
f_{k1} & f_{k2} & f_{k3} & \cdots & f_{kn}
\end{bmatrix} \quad (4.4)

For each of the representations of learners outlined in Section 4.2, we tried three different techniques for assessing the learning process of a group of learners working on the same assignment: we looked at the mean variance in feature values among learners to assess the group process; we looked at the z-scores of individual learners to identify learners whose behavior deviated from their peers; and we used the k-means algorithm to assess whether learners can be classified into distinct clusters whose members show similar behavior in the OELE. Below, these three techniques are outlined in detail.

4.3.1 Mean Feature Variance & Group Process

Looking at the mean variance in feature values among learners working on the same assignment can give teachers insights about the learning process of the group as a whole. The variance in feature values among learners can be computed by taking the variance of the matrix along its vertical axis for each feature f_j, as shown in Equation 4.5.

\sigma^2_{f_j} = \frac{1}{k} \sum_{i=1}^{k} (f_{ij} - \mu_{f_j})^2 \quad (4.5)

where \mu_{f_j} is computed using Equation 4.6.

\mu_{f_j} = \frac{1}{k} \sum_{i=1}^{k} f_{ij} \quad (4.6)

Applying Equation 4.5 to every column of the matrix representing the learners yields the 1 × n matrix shown in Equation 4.7.

\vec{\sigma}^2_f = \begin{bmatrix} \sigma^2_{f_1} & \sigma^2_{f_2} & \sigma^2_{f_3} & \cdots & \sigma^2_{f_n} \end{bmatrix} \quad (4.7)

Taking the mean of Equation 4.7 yields the scalar representing the mean variance in feature values among learners, as shown in Equation 4.8.

\mu_{\sigma^2_f} = \frac{1}{n} \sum_{j=1}^{n} \sigma^2_{f_j} \quad (4.8)
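For concreteness, a minimal NumPy version of Equations 4.5 through 4.8 could look as follows; the example matrix is made up and only illustrates the shape of the computation.

import numpy as np

def mean_feature_variance(group_matrix):
    """group_matrix has shape (k, n): one row per learner, one column per feature."""
    per_feature_variance = group_matrix.var(axis=0)   # Equations 4.5-4.7 (population variance)
    return per_feature_variance.mean()                # Equation 4.8

# Example: three learners, four element-usage features (made-up numbers).
G = np.array([[2, 1, 0, 3],
              [1, 1, 1, 2],
              [4, 0, 0, 5]], dtype=float)
print(mean_feature_variance(G))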

Below, we reflect upon the different insights about the group's learning process that can be derived by analyzing the mean variance in real time.

Mean Variance in Element-Usage among Learners

Assuming learners in a classroom session are working on the same assignment, all learners should start and (roughly) end up with the same frequencies of elements and relationships. Namely, every learner starts with an empty model and tries to progress towards the same end product as his peers. Consequently, when measuring the variance in element-usage frequencies among learners during a classroom session, the variance is expected to increase as the session begins and to decrease as the session comes to an end, when the learners approach the same final models (Figure 4.3).

Figure 4.3: Expected variance in element-usage frequencies among learners during a classroom session.

Providing teachers with information about the variance in element-usage frequencies among learners could provide them with various valuable insights. First of all, at the beginning of a session, a teacher would expect to see the variance increase. If this is not the case, this indicates that learners are either not working (keeping empty models) or all taking the same steps (perhaps indicating that the assignment is too easy). Secondly, near the end of a session, the teacher expects to see the variance decrease. If this is not the case, this might indicate that the learners have not yet finished the assignment or that learners have found different solutions for the assignment.

Mean Variance in Efficiency among Learners

As a classroom session begins, variance in efficiency among learners is expected to increase rapidly at first. There are likely significant differences between the modeling strategies of different learners: some learners are likely to have a trial-and-error approach (probably resulting in relatively low efficiency), while other learners are likely to behave much more tentatively (probably resulting in relatively high efficiency). Even as the learners attain focus and get more comfortable in the learning environment, we expect the variance to remain roughly constant, since both groups become more efficient.

Providing teachers with information about the variance in efficiency among learners could provide them with various valuable insights. In general, (sudden) drops in variance might indicate which parts of the assignment are relatively easy to complete, and (sudden) spikes in variance might indicate which parts of the assignment are relatively hard to complete.

4.3.2 Individual Z-Scores & Outlier Learners

Having computed the mean and standard deviation of every element's usage, the z-score¹ of each feature value for each learner can also be computed. The z-score of a feature value is the number of standard deviations by which that feature value is above or below the mean value of that feature. Z-scores can, therefore, be used to detect individual learners whose learning behavior deviates significantly from their peers. The z-score is computed using Equation 4.9.

z_{f_j} = \frac{f_j - \mu_{f_j}}{\sigma_{f_j}} \quad (4.9)

Applying this to a learner's row yields the z-scores of each feature value for that learner, as shown in Equation 4.10.

\vec{z}_{learner} = \begin{bmatrix} z_{f_1} & z_{f_2} & \cdots & z_{f_n} \end{bmatrix} \quad (4.10)

Z-scores of Element-Usage among Learners

Providing teachers with real-time information on the z-scores of learners' element-usage frequencies might enable teachers to identify learners whose progress on the assignment is divergent and intervene when necessary. Z-scores are positive when they are above the mean and negative when they are below the mean. Negative z-scores might indicate that the learner progresses more slowly than his peers, while positive z-scores might indicate that the learner progresses more quickly.²

Furthermore, the z-score can be used to detect outlier learners automatically. Observations that are three or more standard deviations away from the mean are so rare that they are usually considered to be outliers. Algorithm 2 would provide the teacher with the learners that require further inspection.

¹ Also known as the standard score.
² Teachers should be careful to check whether this is the case, for using certain elements


Algorithm 2 Algorithm for detecting outlier learners.

1: procedure getOutlierlearners(GropMtrt) 2: outlierlearners = []

3: for learner in GroupMatrix do

4: if Zscore of learner is equal or greater than 3 then

5: add learner to outlierlearners

6: return outlierlearners
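A hedged NumPy sketch of this procedure is shown below. It computes per-feature z-scores (Equation 4.9) over the group matrix and flags a learner when any absolute z-score reaches the threshold, so it covers both the positive and the negative outliers discussed in Chapter 5; the guard against zero standard deviations is an implementation assumption.

import numpy as np

def outlier_learners(group_matrix, learner_ids, threshold=3.0):
    """Return the ids of learners with any |z-score| >= threshold."""
    mu = group_matrix.mean(axis=0)
    sigma = group_matrix.std(axis=0)
    sigma[sigma == 0] = 1.0                 # avoid division by zero for constant features
    z = (group_matrix - mu) / sigma         # shape (k, n): z-score per learner per feature
    flagged = np.abs(z).max(axis=1) >= threshold
    return [lid for lid, is_outlier in zip(learner_ids, flagged) if is_outlier]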

Z-scores of Efficiency among Learners

Providing teachers with real-time information on the z-scores of learners' efficiency might enable teachers to identify learners whose modeling strategy on the assignment is divergent and intervene when necessary. Like element-usage frequency z-scores, negative efficiency z-scores might indicate that the learner has a much less efficient modeling strategy than his peers. In contrast, positive efficiency z-scores might indicate that the learner employs a remarkably efficient modeling strategy. Again, Algorithm 2 can be used to automatically provide the teacher with the names of learners that need further inspection.

4.3.3 K-Means & Meaningful Clusters of Learners

Learners can automatically be classified into distinct clusters using the k-means algorithm. Given n learner feature vectors, the k-means algorithm aims to partition those learners into k (≤ n) clusters so as to minimize the sum of within-cluster variance (measured using the squared Euclidean distance). By using the elbow method, we can use k-means to automatically partition a group of learners into the optimal cluster configuration. We automated the process of finding the optimal number of clusters by iteratively performing k-means for k = 1...10. For each value of k, the inertia (summed within-cluster variance) of the resulting cluster configuration was computed, and these inertia values were paired and ordered with the corresponding increasing values of k. To find the elbow point (i.e., the point at which adding another cluster does not give much better modeling of the data), we approximated the second derivative of each inertia value using Equation 4.11. The optimal value for k was found by choosing the k-value with the maximum absolute second derivative value for its corresponding inertia.
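A possible implementation of this automated elbow procedure, using scikit-learn's KMeans, is sketched below; the n_init and random_state settings and the handling of very small groups are assumptions rather than details taken from the thesis.

import numpy as np
from sklearn.cluster import KMeans

def best_k(group_matrix, k_max=10):
    """Pick the number of clusters via the largest absolute second difference of the inertia curve."""
    k_max = min(k_max, len(group_matrix))        # cannot have more clusters than learners
    ks = list(range(1, k_max + 1))
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=0)
                .fit(group_matrix).inertia_ for k in ks]
    if len(inertias) < 3:
        return ks[0]
    # Discrete second derivative of the inertia curve (defined for interior points only).
    second_derivative = [inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
                         for i in range(1, len(inertias) - 1)]
    return ks[1 + int(np.argmax(np.abs(second_derivative)))]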


Clusters by Element-usage Frequencies among Learners

The k-means algorithm is expected to partition the group of learners into an unknown number of clusters where the learners in each cluster have similar element-usage frequencies, and where the element-usage frequencies of learners from different clusters are expected to differ from each other significantly. K-means is expected to start with finding just one cluster, as all learners start with the same (empty) model. However, as the classroom session progresses, k-means is expected to find an increasing number of distinct clusters as learners take different routes toward the same result. Near the end of a classroom session, k-means is expected to find just one cluster again, as learners have reached the same final model.

Providing teachers with real-time information on the clusters found in the group could lead to various actionable insights. First of all, when no apparent clusters emerge during a classroom session, this might again indicate that learners are not working, or that everybody is following the same route (which might indicate that the assignment is too easy). On further inspection as to which is the case, the teacher could intervene with the necessary measures. Moreover, if distinct clusters emerge but fail to assemble again within a reasonable time, this might indicate that learners are taking different approaches to solve the same problem. If this becomes troublesome, the teacher could automatically assign learners from different clusters to work together and learn from each other's progress.

Clusters by Efficiency among Learners

Similar to k-means applied to the element-usage frequencies representation, the k-means algorithm is expected to partition the group of learners into an unknown number of clusters, where the learners in each cluster have similar modeling strategies, and where the modeling strategies of learners from different clusters are expected to differ from each other significantly. K-means is expected to start with finding just one cluster, as all learners start with the same (empty) model. However, as the classroom session progresses, k-means is expected to find an increasing number of distinct clusters as learners employ different modeling strategies. Near the end of a classroom session, k-means is expected to find multiple clusters, as there remain differences in modeling strategies and, therefore, in efficiency.

Applying this clustering technique to this representation of learners would provide teachers with learners whose modeling strategies are likely to differ significantly. Once again, the teacher could automatically assign learners from different clusters to work together and, this time, learn from each other's modeling strategies.


4.4 Simulating Classroom Sessions

To test the assessment techniques outlined in the previous section, we simulated two classroom sessions where learners were supposed to work on the same assignment, and one long session simulating all learner activity over a longer period in which learners were working on multiple similar assignments asynchronously. The two classroom sessions were derived from the first dataset by splitting the dataset on the dates 2020/02/10 and 2020/02/11. This is possible because, in this dataset, there is a clear distinction between two classes working in DynaLearn on two subsequent days (see Figure 4.4). The third and longer session was obtained by simply taking all learner activity as logged in the second dataset. This was done because the learners logged in this dataset were working asynchronously, and we therefore could not find meaningful clusters of learner activity to split the dataset on (see Figure 4.6).

The sessions were simulated by taking the corresponding data points in chronological order and using these data points to construct and update the learner models as outlined in Sections 4.1 and 4.2. After processing each 1/50th of the data in a session, the assessment techniques outlined in Section 4.3 were performed on the group representation at that time. The results are shown in the next chapter.
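A minimal sketch of such a replay loop is given below; the log column names and the two callbacks (one applying Algorithm 1, one running the assessments of Section 4.3) are assumptions about how the pieces fit together, not the actual implementation.

import pandas as pd

def replay_session(log: pd.DataFrame, update_model, assess):
    """update_model(models, row) applies one log entry to the learner's model (Algorithm 1);
    assess(models) computes mean variance, outliers and clusters at a checkpoint."""
    log = log.sort_values("timestamp")            # process data points chronologically
    models = {}                                   # learner id -> current model representation
    checkpoint = max(1, len(log) // 50)           # assess 50 times per session
    snapshots = []
    for i, (_, row) in enumerate(log.iterrows(), start=1):
        update_model(models, row)
        if i % checkpoint == 0:
            snapshots.append(assess(models))
    return snapshots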

Figure 4.4: Learner activity over time in dataset 1. Learner activity is clearly spread over two distinct days and allows for easy clustering into separate classroom sessions.


Figure 4.5: Datapoints from dataset 1 split into two distinct classroom sessions. In each session, the learners are supposed to be working on the same assignment.

Figure 4.6: Learner activity over time in dataset 2. Learner activity is not clearly spread over distinct days.


Chapter 5

Results & Evaluation

5.1 Group Process

5.1.1 Using Element-usage Frequencies

Results of assessing the group process through mean variance in element-usage frequencies are shown in Figure 5.1. The mean variance is shown both including and excluding outlier learners. During all three sessions, the mean variance starts at zero and increases rapidly as the classroom session starts. During session one, the mean variance fluctuates in a range between 0 and 1.8 standard deviations. However, during sessions two and three, the mean variance reaches a peak of 2.4 and 4.7 standard deviations, respectively. Both session one and session three show a decrease in variance as the session comes to an end if outliers are ignored. However, in session two, the variance increases near the end. In none of the sessions did the variance completely diminish near the end.

As expected, all sessions start with zero variance because all learners begin with the same empty model at the start of each session. The rapid increase in variance at the beginning of each session can be caused by learners pursuing different paths towards the same end model, or by some learners working more efficiently than others, as both result in increased variance in element-usage frequencies. The latter can possibly also explain the high peaks of variance during sessions two and three: for both sessions, these peaks occur near the beginning of the session; if some learners start inefficiently (i.e., keep predominantly empty models) while others progress quickly, the mean variance can reach very high values as zero-valued frequencies occupy the distribution together with high-valued frequencies. The latter could also explain the relatively high mean variance during session three, as learners worked asynchronously on assignments during shorter intervals in that session. Near the end of sessions two and three, variance decreases, supposedly because learners are reaching the same final models. However, contrary to what was expected, variance does not diminish entirely at the end of the sessions. Possibly, learners had not yet finished the assignment or had found different solutions. In particular, the rise in variance near the end of session two might indicate that the learners were not yet finished when the classroom session ended.


Figure 5.1: Mean variance in learners' element-usage frequencies during classroom sessions.


5.1.2 Using Efficiency

Results of assessing the group process through mean variance in efficiency are shown in Figure 5.2. No outliers were detected during sessions one and two. During all three sessions, the mean variance starts at zero and increases as the classroom session starts. For session one, the mean variance hits a peak of 0.4 standard deviations; for session two, 0.3 standard deviations; and for session three, 0.2 standard deviations. Session one shows a sharp drop and peak in variance during the first quarter of the session, while session two shows a sharp drop in variance after the first quarter. Session three shows repeating peaks and drops in variance during the entire session, with almost zero variance in between peaks. Both sessions one and two show a trend of a rapid increase in variance near the beginning of the session, a slight decrease during the session, and a slight increase near the end. All of the sessions end with prevailing variance among learners' efficiency.

As expected, during all three sessions, the mean variance in efficiency starts at zero because all learners start with an efficiency scalar of one as they create their first element (Equation 4.3). The peaks in mean variance might indicate that some learners found that part of the assignment more difficult than others, or that some learners started to explore the learning environment while others continued to work on the assignment. The drops in mean variance might indicate that those parts of the assignment were relatively easy for all learners or that some learners stopped exploring the learning environment and went on to complete the assignment. The latter may also explain why both the drops in variance and the general trend of decreasing variance happen during the first halves of sessions one and two: at the beginning of a session, some learners might have been more exploratory and playful to get familiar with the OELE, leading to increasing variance at the beginning. However, these learners' efficiency could have increased as they reached a certain threshold of familiarity with the environment or as they started to feel the time pressure of having to complete the assignment, leading to a decrease in mean variance. The slight increase in variance near the end of sessions one and two might indicate that learners made more mistakes as they rushed to complete the assignment, or that some learners started to explore the environment because they had completed the assignment before time. As expected, variance among learners remains near the end of the sessions, as learners have reached their current (or possibly final) model with different degrees of efficiency. The peaks and drops in variance during session three may be due to the asynchronous nature with which learners worked on the assignments, leading to sharp increases in variance as some learners started working on assignments earlier than others, and sharp decreases in variance as these other learners caught up.

Figure 5.2: Mean variance in learners' efficiency during classroom sessions.

5.2 Outliers

5.2.1 In Element-usage Frequencies

Results of detecting outliers using element-usage frequencies are shown in Figure 5.3. Learners are denoted positive outliers if one of the z-scores of their element-usage frequencies was above three, and negative outliers if a z-score was below minus three. Positive outliers are detected in all sessions, negative outliers only in session three. During session one, relatively many outliers are detected at the beginning of the session, relatively few during the middle of the assignment, and relatively many (but fewer than at the beginning) near the end of the session. At most, three outliers are detected at one point in time during this session. During session two, there seems to be an upward trend in the number of outliers detected, with very few outliers detected at the beginning of the session, and up to four outliers detected near the end. Session three shows a similar trend as session two, but this session also shows negative outliers and reaches a peak of five outlier learners detected at one point in time.

The appearance of negative outliers in session three may be due to the more complex models learners had to make during this session. These complex models require more elements. Therefore, the mean element-usage frequencies during this session may have reached much higher values than the mean element-usage frequencies of the learners captured in sessions one and two. Consequently, this could cause the z-scores of learners who lagged behind to drop below the minus-three threshold required to be considered a negative outlier. In contrast, during sessions one and two, the mean values were probably not high enough to allow for individual z-score values of minus three. Outliers at the beginning of a session might indicate learners that had a slow start, while outliers near the end of a session are more likely to indicate learners that have gone astray (possibly by misunderstanding the assignment or the environment). The disappearance of outliers in the middle of session two might be due to the teacher intervening with the outlier learners during the break.


Figure 5.3: Outliers detected in learners' element-usage frequencies during classroom sessions.


5.2.2 In Efficiency

Results of detecting outliers using learner efficiency are shown in Figure 5.4. No outliers were detected during sessions one and two, and during session three a maximum of one outlier was found at different points in the session. These outliers were only found during the second half of the session.

The lack of outliers in sessions one and two might be because learners did not vary much in their efficiency during these sessions. This presumption is backed up by the relatively stable mean variance of learners' efficiencies. Compared to the mean variance in element-usage frequencies of learners, the mean variance in efficiency fluctuates much less and within a much smaller range (Figure 5.2 and Figure 5.1). However, another reason might be that the chosen threshold values for the z-score (z ≤ −3 or z ≥ 3) are too strict to detect outliers in efficiency.

Figure 5.4: Outliers detected in learners' efficiency during classroom sessions.

5.3 Clusters

5.3.1 In Element-usage Frequencies

Results of cluster formation using element-usage frequencies are shown in Figure 5.5. At the start of all sessions, the best cluster configuration found consists of one cluster with zero inertia (within-cluster summed variance). We see a general trend of increasing inertia during all sessions, with occasional drops in inertia when a better cluster configuration is found that consists of more distinct clusters. During both sessions one and three, there are points at which a cluster configuration with k = 4 suddenly provides a much better clustering of learners than previous configurations with lower values of k. In session two, the same happens, but for k = 3. Session one ends with an optimal cluster configuration of k = 3, session two with k = 1, and session three with k = 2. However, even for session two, where the last optimal configuration consists of one cluster, relatively much inertia remains within that configuration.

As expected, during all sessions, the optimal cluster configuration at the beginning is to cluster learners into one group, as they all start with the same (empty) models, resulting in zero within-cluster summed variance. In line with expectations, the inertia of the optimal cluster configuration increases during the sessions, supposedly because learners take different paths towards the same final model. The sudden drops in inertia when a better cluster configuration with more clusters is found were not expected, but are not surprising either: learners are likely to move into distinct clusters of similar behavior, but such clusters emerge step-wise over time, and not all learners will enter those clusters at the same time. Only once enough learners cross the boundary of some behavioral cluster does a higher value of k suddenly cluster the learners much better. Contrary to expectations, not all sessions end with k = 1 as their optimal cluster configuration. This might indicate that not all learners had finished the assignment at that point, or that learners have found different solutions for the assignment. Although we expected to see sessions end with a single cluster, the inertia within the last (and single) cluster found for session two might indicate that learners during that session ended up with much more varied models than during other sessions: their models might be so different that no distinct clusters were found.


Figure 5.5: Optimal cluster configurations found in learners' element-usage frequencies during classroom sessions. The numbers within the bars show the best number of clusters found using the elbow method. The x-axis shows the within-cluster summed variance of the best cluster configuration.


5.3.2 In Efficiency

Results of cluster formation using learners' efficiency scalars are shown in Figure 5.6. At the start of all sessions, the best cluster configuration found once again consists of only one cluster with zero inertia (within-cluster summed variance). Also, for all sessions, the optimal number of clusters found increases quickly to two and remains two up until the end of the session. Both sessions one and two show iterative increments and decrements in inertia, while session three shows a trend of increasing inertia during the entire session.

As expected, all sessions start with an optimal cluster configuration of k = 1, as all learners start with the same efficiency scalar. In line with expectations, the value of k for the optimal cluster configuration increases as the sessions progress and some learners work more efficiently on the assignment than others. Surprisingly, the value of k remains two during the progression of all sessions. This might indicate that k-means found a cluster of efficient learners and a cluster of inefficient learners. The iterative increments and decrements in inertia during sessions one and two might indicate parts of the assignment that some learners found relatively hard or easy, leading to an increase or decrease in within-cluster variance, respectively. The increase in inertia during the second half of session three might indicate that learners started working on assignments that some learners found more difficult than others, leading to an increase in within-cluster variance.


Figure 5.6: Optimal cluster configurations found in learners' efficiency during classroom sessions. The numbers within the bars show the best number of clusters found using the elbow method. The x-axis shows the within-cluster summed variance of the best cluster configuration.


Chapter 6

Discussion

The results of applying three new assessment techniques on different learner representations, as derived from learner activity in the open-ended learning environment DynaLearn, suggest that these assessment techniques can provide explanatory information about the learning process of learners when those learners are working on the same assignment.

The main finding with respect to using the mean variance to assess the learning process of a group of learners is that this technique can be used to assess whether all learners in the group have reached the same result when learners are represented by their element-usage frequencies, and that it can possibly be used to assess whether there are parts of the assignment that are relatively hard for some learners when learners are represented by an efficiency scalar.

There are several important findings with respect to using learners' z-scores to identify learners whose behavior deviates significantly from that of their peers. First of all, it was found that this technique can be used to detect positive outlier learners (z-score values above three) when learners are represented by their element-usage frequencies, but that learner-created models need to have a certain level of complexity (i.e., need to consist of a certain number of elements) before negative outliers (learners with z-score values below minus three) can be detected. Secondly, it was found that using this technique on learners represented by their efficiency scalar does not result in valuable outlier detection, probably because the variance between learners' efficiency is not significant enough to designate outlier learners.

Finally, there are two main findings with respect to using the k-means algorithm together with an automated elbow-method procedure for finding the best cluster configuration of learners. First of all, using this technique on learners represented by their element-usage frequencies leads to distinct clusters of learners, but whether the learners in a cluster show similar behavior needs to be assessed manually. Secondly, for all simulated classroom sessions, using this technique on learners represented by their efficiency scalar led to two distinct clusters that persist over time. This indicates that learners work in two distinct clusters of efficiency, where supposedly one group of learners works significantly more efficiently than the other group.

The results of applying these assessment techniques on learner data from DynaLearn suggest that they can provide teachers with actionable insights into their learners' learning process. However, this thesis's goal was to explore possible insights that can be derived from these techniques, rather than to verify their effectiveness. To verify that teachers can derive actionable insights from these assessment techniques, future work could provide teachers with real-time information about the outcomes of these techniques while they are coordinating a classroom session in DynaLearn.


Chapter 7

Conclusion

In this thesis, three new assessment techniques were explored to evaluate learners' learning process in open-ended learning environments. Assuming learners are working on the same assignment, results suggest that the mean variance among learners can provide actionable insights about a group of learners' progress on an assignment, that the z-scores of learners can inform teachers about learners whose behavior deviates significantly from their peers, and that automatic clustering of learners using the k-means algorithm can partition a group of learners into meaningful clusters in which learners show similar learning behavior. The current findings are exploratory, and their effectiveness in providing teachers with actionable insights needs to be verified in future research. However, by providing a basis for the development of new assessment techniques for open-ended learning environments, this thesis adds to the efforts of making open-ended learning environments feasible environments for teaching 21st-century skills to an increasing number of people.


Bibliography

[1] Paulo Blikstein. Using learning analytics to assess students’ behavior in open-ended programming tasks. In Proceedings of the 1st international conference on learning analytics and knowledge, pages 110–116, 2011.

[2] Paulo Blikstein and Marcelo Worsley. Multimodal learning analytics and education data mining: Using computational technologies to measure complex learning tasks. Journal of Learning Analytics, 3(2):220–238, 2016.

[3] Linda A Bond. Norm- and criterion-referenced testing. Practical Assessment, Research, and Evaluation, 5(1):2, 1996.

[4] Crescencio Bravo, Wouter R Van Joolingen, and Ton De Jong. Modeling and simulation in inquiry learning: Checking solutions and giving intelligent advice. Simulation, 82(11):769–784, 2006.

[5] Bert Bredeweg, Jochem Liem, Wouter Beek, Floris Linnebank, Jorge Gracia, Esther Lozano, Michael Wißner, René Bühling, Paulo Salles, Richard Noble, et al. DynaLearn – an intelligent learning environment for learning conceptual knowledge. AI Magazine, 34(4):46–65, 2013.

[6] Sidney K D'mello, Scotty D Craig, Amy Witherspoon, Bethany Mcdaniel, and Arthur Graesser. Automatic detection of learner's affect from conversational cues. User modeling and user-adapted interaction, 18(1-2):45–80, 2008.

[7] Michael J Hannafin, Craig Hall, Susan Land, and Janette Hill. Learning in open-ended environments: Assumptions, methods, and implications. Educational Technology, 34(8):48–55, 1994.

[8] Kenneth R Koedinger, Albert Corbett, et al. Cognitive tutors: Technology bringing learning sciences to the classroom. na, 2006.

[9] Susanne P Lajoie and Roger Azevedo. Teaching and learning in technology-rich environments. 2006.


[10] Susan M Land. Cognitive requirements for learning with open-ended learning environments. Educational Technology Research and Development, 48(3):61–78, 2000.

[11] Roxana Moreno and Richard Mayer. Interactive multimodal learning environments. Educational psychology review, 19(3):309–326, 2007.

[12] Andrew J Rotherham and Daniel T Willingham. "21st-century" skills. American Educator, 17(1):17–20, 2010.

[13] James R. Segedy, Kirk M. Loretz, and Gautam Biswas. Model-driven assessment of learners in open-ended learning environments. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, LAK '13, pages 200–204, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450317856. doi: 10.1145/2460296.2460336. URL https://doi.org/10.1145/2460296.2460336.

[14] James R Segedy, John S Kinnebrew, and Gautam Biswas. Using coherence analysis to characterize self-regulated learning behaviours in open-ended learning environments. Journal of Learning Analytics, 2(1):13–48, 2015.

[15] George Siemens and Ryan SJ d Baker. Learning analytics and educational data mining: towards communication and collaboration. In Proceedings of the 2nd international conference on learning analytics and knowledge, pages 252–254, 2012.

[16] George Siemens and Phil Long. Penetrating the fog: Analytics in learning and education. EDUCAUSE review, 46(5):30, 2011.
