
Thesis

Learning analytics at higher education, finding effective predictors

Gert-Jan de Graaf 10808558

 

University of Amsterdam Faculty of Science

MSc Information Studies: Business Information Systems
Address: Science Park 402, 1098 XH Amsterdam

Final version: July 6, 2016


Abstract

Learning Analytics in higher education has the potential to monitor students and their study results and to give informative feedback. The challenge is to give significance to raw data and to identify effective predictors from a multitude of data. Identifying a student as at-risk in study progress does not by itself solve the problem. Information is needed on why this student is not on track, which learning behavior is applicable, and whether it is possible to change this behavior to optimize study results. In order to identify the effective predictors, data from three courses are analyzed via a quantitative approach. Three predictors are identified: the average grade in pre-education, the online activity score of the student, and the quiz result. The possibility of changing behavior is analyzed with an online survey in which students vote for the dashboard they prefer in the context of Learning Analytics and explain why. Students prefer dashboards that not only show the study results but also contextualize these results.


Table of contents

Abstract
Table of contents
List of figures
List of tables
1. Introduction
1.1. Learning analytics in higher education
1.2. Problem statement
1.3. Research questions
2. State of knowledge on learning analytics at higher education
2.1. Demographic data
2.2. Learner’s dispositions data
2.3. Activity data
2.4. Academic performance
2.5. Feedback
3. Methodology and data
3.1. Method for research sub question 1
3.2. Method for research sub question 2
4. Results
4.1. Results research sub question 1
4.2. Results research sub question 2
5. Discussion
6. Future work
7. Conclusion
References
Appendix A – The Motivated Strategies for Learning Questionnaires
Appendix B – Boxplots predictor variables
Appendix C - Activity score
Appendix D – Complete dataset predictor variables
Appendix E – Online survey


List of figures

Figure 1: Correlation plots average grade pre-education, self-regulation, self-efficacy and final grade
Figure 2: Correlation plots activity score, quiz score and the final grade
Figure 3: Predictor variables in multiple linear regression
Figure 4: Granting permission

List of tables

Table 1: Multiplier for activity score
Table 2: Properties per data source
Table 3: Gender and Final grade
Table 4: Descriptive statistics
Table 5: Pearson correlations
Table 6: Multiple linear regression, first option
Table 7: Multiple linear regression, second option
Table 8: Gender and familiarity with LA
Table 9: Scores per dashboard
Table 10: Most favorable dashboard


1. Introduction

1.1. Learning analytics in higher education

In classical learning environments, teachers or instructors monitor how students work and interact on a daily basis. In blended learning environments, where online and offline learning are combined, there is less face-to-face interaction and student behavior is less visible. However, data on student behavior are stored and can be used to compensate for this omission. Transparency of student activities is essential in the process of online teaching and studying (Cerezo et al., 2011). Learning Analytics can help to make data on student behavior more insightful and transparent.

In recent years universities have initiated Learning Analytics projects to develop Learning Analytics as a tool to guide both teacher and student during the learning process. One of the primary objectives of learning analytics in higher education is to identify students who are at risk in study progress at an early stage of the course, and to give them informative feedback (Campbell et al., 2007; Dawson et al., 2014).

According to the 1st Conference on Learning Analytics and Knowledge (Siemens, 2010) Learning Analytics is defined as “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs”.

Verbert et al. (2012) distinguish six objectives in Learning Analytics:
1. Forecasting student performance;
2. Propose applicable learning material;
3. Boost reflections through visualizations;
4. Enlarge awareness of the social learning environment;
5. Identify students with inefficient study styles;
6. Reveal affective states of students.

These objectives are quite general and partly interrelated; for example, boosting reflections is a logical successor of forecasting performance. Some of the objectives are further analyzed in this thesis.

1.2. Problem statement

One of the most challenging areas of learning analytics is to give significance to raw data (Cerezo et al., 2011). Information such as age, gender, nationality and pre-education is available from the regular student administration (Tempelaar, 2014). There are data from surveys on learner’s dispositions, e.g. learning styles and learning motivation, which students fill in during the first week of the course. There are data stored in Blackboard¹, like the number of logins, the time spent online, or the number of messages posted on discussion forums, all resulting in high volumes of data. And there are data related to quiz results, mostly from quizzes taken by students halfway through the course, while the final course grades are available when the instructor has graded the exams. Huang and Fang (2013) state that simply adding extra variables does not enhance the predictability of mathematical models. Therefore, relevant predictors need to be identified which can help to forecast the student’s performance.

¹ Blackboard is a virtual learning environment and learning management system which enables teachers to provide course materials, class discussions, assignments and assessments to students/learners (Malone, 2006).

Data sources are generated at various time intervals during a course. Data need to be interpreted using predictive models and statistical analysis. Maybe even more importantly, educators (lecturers and course designers) need to know when to intervene. If a student is identified as ‘at-risk’, information is needed on why this student is not on track and which learning behavior is applicable, in order to change inappropriate behavior if possible (Whitepaper Surf, 2016). Each student has his/her own approach when learning in the digital learning environment (Gašević et al., 2015; Lust et al., 2013). For some students information is more decisive than for others. The challenge is to identify that type of information and present it to students in the most applicable way.

1.3. Research questions

The main question of this research is:

Which variables can be identified as effective predictors of course performance of students and how can this information be used by students in order to improve their study results?

The following sub-questions are relevant to ensure that the main research question is answered:

a. Which variables can be identified as effective predictors and how does the effectiveness change during a course?

To answer this sub-question five relational hypotheses are tested:
1. Average grade of pre-education is related to final course grade (H1);
2. Self-regulation score is related to final course grade (H2);
3. Self-efficacy score is related to final course grade (H3);
4. Activity score is related to final course grade (H4);
5. Quiz results are related to final course grade (H5).

b. How can the effective predictors be visualized to students, in order to improve their study behavior which ultimately leads to better results?


2. State of knowledge on learning analytics at higher education

To determine which variables are potentially meaningful, course design and the teachers’ intentions need to be considered. The course setup and a balance between online and offline learning activities determine which data are available and when. The sections below give more details of the various data sources, and when they become available and review their predictive value based upon previous research.

2.1. Demographic data

Demographic data typically consist of the variables Gender, Age, Nationality, Highest pre-education before higher education and the average grade obtained at this pre-education. Tempelaar (2014) discovered that female students are more active in Blackboard than male students.

The framework of Hofstede et al. (2010) converts the nationality of students into six cultural dimensions: Power Distance, Individualism, Masculinity, Uncertainty avoidance, Long term orientation and Indulgence. Since the University of Amsterdam has internationalization as a strategic theme (Universiteit van Amsterdam, 2015), it is important to determine whether a correlation exists between cultural dimensions (nationality) and final course grade.

The variable average-grade-pre-education is expected to have a positive correlation with the final grade of a course. Students who perform well in pre-education are expected also to perform well in higher education.

2.2. Learner’s dispositions data

A disposition can best be described as the most likely action of a student in a learning situation. For example, if a student is disposed to be ‘curious’, he or she is motivated by the unknown and asks relevant questions persistently (Buckingham Shum et al., 2012).

Gathering learner’s dispositions data is done at the beginning or before the start of a course. It is relatively more time-consuming than gathering demographic data, which are already available via the student administration. The predictability of learner’s dispositions data is intermediate according to Tempelaar et al. (2013), which might be a motivation to skip the effort. However, learner’s dispositions data have an important added value: they suggest an attractive starting point for intervention compared with activity data. Learner’s dispositions data give insight into the learning strategies of students. Self-regulation and self-efficacy are two examples of learning strategies.

Self-regulation is the ability of students to regulate their own learning process. It is related to academic success (Winne et al., 2000). Self-regulated students move through the learning process in four main stages (Woolfolk et al., 2008):
1) Analyzing tasks;
2) Developing plans and goals;
3) Starting the learning approach;
4) Adapting the learning approach.

The variable self-regulation is expected to have a positive correlation with the final grade of a course. Students who self-regulate their learning process are expected to perform well in higher education.

Self-efficacy is the ability to develop expectations about one’s own ability to achieve academic success. This type of self-efficacy is called academic self-efficacy and has been studied extensively. “The principal finding is that students’ self-efficacy beliefs are significantly and positively related to academic performance” (Hodges, 2008). Lee (2002) stated that online learning differs from traditional face-to-face learning and that relevant learning strategies depend on the learning context. Lee also defined 11 online learning strategies: self-directed studying, communicating actively, managing simultaneous discussions, being social online, managing huge amounts of information, processing these huge amounts of information, interpreting information, time management, managing nonsynchronous tasks, self-efficacy to complete online courses, and a positive approach toward online courses. Additionally, Lee researched the relationship between academic success and online learning strategies and found time management to be the most effective predictor for academic success, followed by self-efficacy to complete online courses. The variable self-efficacy is expected to have a positive correlation with the final grade of a course. Students with self-efficacy beliefs are expected to perform well in higher education.

2.3. Activity data

Blackboard provides the feature Statistical tracking (Blackboard track data). Different types of items can be tracked:

- Number of logins;
- Messages read;
- Documents read;
- Posts on discussion forums;
- Number of old exams downloaded;
- Time on quiz results.

Tempelaar (2015) stated that the number of old exams downloaded is an effective predictor. However, this predictor only becomes available late, because all students download the old exams during the last week of the course or the last week before the exam.

The Blackboard track data identifies students who are most active on Blackboard. This variable is interpretable in two different ways:

- It is a high-level measure of learning activity, which is positive;
- It is a measure of inefficient learning behavior, which is negative.

Macfadyen and Dawson (2010) already proved clicking behavior is at best a meager predictor for the user-behavior of students. Tempelaar et al. (2015) showed that activity data are the poorest predictor compared to any of the other data components (demographic data, learning dispositions data and quiz results). Gathering activity data is possible during all the weeks of the course.

The variable activity score is expected to have a positive correlation with the final grade of a course. Students who study actively are expected to perform well in higher education.


2.4. Academic performance

The best predictor for academic performance is the performance itself (Tempelaar, 2013). The variable which best correlates with the final course grade is the quiz result. The first quiz results of a course are available when the teacher finishes the examination. Knowing which predictor is the most effective does not solve all the issues in the learning analytics spectrum. When this ‘effective’ predictor becomes available is also of the utmost importance, because this determines when it is possible to act or intervene.

The variable quiz result is expected to have a positive correlation with the final grade of a course.

2.5. Feedback

Dispositional Learning Analytics is a form of Learning Analytics where feedback is based on a combination of activity data and learning dispositional data (Buckingham Shum et al., 2012). Activity data are used to identify at-risk students at an early stage of the course. Dispositional data can create student risk profiles later on in the course. Dispositional data have an advantage over activity data because dispositions give insight into the learning strategies and motivational strategies of the students. This is exactly the information needed to provide feedback to students because these dispositions give context to the results. The power of the Dispositional Learning Analytics model is the complementarity of activity data and learner’s dispositional data.


3. Methodology and data

The answer to research sub question 1 is based on quantitative research. The answer to research sub question 2 is based on qualitative research. The following sections give insight into the details: setting, method, participants, framework, data, and types of analysis that were used to answer the questions in this thesis.

3.1. Method for research sub question 1

A quantitative research method was used to answer the research sub question 1:

Which variables can be identified as effective predictors and how does the effectiveness change during a course?

Data processing and statistical methods

The analysis for finding predictors is executed with Pearson’s correlation since all variables are at interval or ratio level, and are linearly related and normally distributed. Boxplots of each predictor variable are available in ‘Appendix B – Boxplots predictor variables’. The strength of the correlation is described from “very weak” to “very strong” (Evans 1996) and is translated to the absolute values of r:

- .00-.19 “very weak”
- .20-.39 “weak”
- .40-.59 “moderate”
- .60-.79 “strong”
- .80-1.0 “very strong”
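The correlation step and the strength labels above can be sketched as follows. This is a minimal illustration and not the software used in this thesis. The function names are hypothetical, pairs with a missing value are skipped (an assumption about how missing data were handled), and the band boundaries are placed at .20, .40, .60 and .80 to close the gaps between the rounded ranges.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for paired values; pairs containing None are skipped (assumption)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("a variable with zero variance has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def evans_strength(r):
    """Map the absolute value of r to the verbal labels of Evans (1996)."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"
```

For example, `evans_strength(0.442)` returns "moderate" and `evans_strength(0.780)` returns "strong".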

All predictor variables that showed a correlation, and had p-value <00.1 are conceptualized in a framework which gives a clear overview of available predictors and the context which they serve (Robson 2011). After the predictor variables were identified they were further analyzed through multiple linear regression. When multicollinearity or singularity was found between predictor variables, one predictor variable was removed from the analysis, depending on the context of the variable. The presence of multicollinearity was examined with Pearson’s correlation. High multicollinearity was defined as r = very strong.

Outliers, defined as values that deviate more than two standard deviations from the mean of the dataset, were removed.
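The outlier rule can be illustrated with a short sketch. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def remove_outliers(values, k=2.0):
    """Keep values within k standard deviations of the mean (k = 2 in the text)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [v for v in values if abs(v - mean) <= k * sd]
```

Note that the rule is applied once; removing outliers changes the mean and standard deviation, so a repeated application could remove further values.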

Population

The results of three blended learning courses were combined in one dataset. The total number of participating students was 177. In the first course 68 students participated, in the second course 62 students, and in the third 54 students. The students were between 18 and 38 years old; 134 were male and 43 were female.

Before the start of each course students were asked to sign informed consent. Seven students refused participation in the second course and all data of these students were consequently removed from the dataset. Based on the complete population the following predictor variables were added to the dataset:

1) Demographic data: average grade in pre-education;
2) Learner’s dispositions data: self-regulation and self-efficacy;
3) Activity data: activity score;
4) Academic performance: quiz results.

Demographic data

Demographic data were obtained through the student administration. The variable average grade in pre-education was used as the first predictor variable that was analyzed with Pearson’s correlation.

Learner’s dispositions data

MSLQ (Motivated Strategies for Learning Questionnaire) was used as an instrument to obtain the learner’s disposition data. This instrument has 81 questions that can be clustered in 18 scales. For each scale the relevant questions are answered by students on a seven point Likert scale from “Not at all true of me” to “very true of me”. The score in one scale is calculated by taking the mean of all the answers from the questions that are relevant for that scale.
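The scale score calculation can be sketched as follows. The item numbers and scale names below are hypothetical placeholders (the actual items per scale are listed in Appendix A), and skipping unanswered items is an assumption.

```python
from statistics import mean

# Hypothetical item lists per scale, for illustration only.
SCALE_ITEMS = {
    "self_efficacy": [5, 6, 12],
    "self_regulation": [33, 36, 41],
}


def scale_score(answers, items):
    """Mean of the Likert answers (1 to 7) for the items of one scale.

    answers maps an item number to the answer of one student.
    Unanswered items are skipped (assumption); None is returned
    when no item of the scale was answered.
    """
    values = [answers[i] for i in items if i in answers]
    for v in values:
        if not 1 <= v <= 7:
            raise ValueError("Likert answers must lie between 1 and 7")
    return mean(values) if values else None
```

For a student who answered 6, 5 and 7 on the three placeholder self-efficacy items, `scale_score` returns 6.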

Before the start of the first and second course all students filled in the MSLQ. For students of the third course the MSLQ data were not available. The complete raw dataset was processed and the score of each MSLQ scale was added to the dataset. The relevant questions to determine the predictor variables self-regulation and self-efficacy are available in ‘Appendix A – The Motivated Strategies for Learning Questionnaires’, as are all the 18 scales. The variables self-regulation and self-efficacy were taken as the second and third predictor variables and were analyzed with Pearson’s correlation.

Activity data

For all three courses the complete raw activity data were provided through the learning record store (LRS)². Data cleansing was done through the following steps:

1) All rows from the years 2010, 2011, 2012, 2013 and 2014 were removed, which left only the data from 2015;
2) All rows after 1st of December 2015 were removed. By only using records from the first four weeks it is possible to identify at-risk students when the course is halfway;
3) Activity types attempted, attended, interacted, launched and scored were removed because the focus is on activity type accessed. Over 98% of all records are related to activity type accessed, which makes this the most frequently used activity type;
4) Outliers were removed.
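The first three cleansing steps can be sketched as a simple filter. The record layout (the keys student, date and activity_type) is hypothetical and does not reflect the actual LRS export format; step 4, the outlier removal, is left out here.

```python
from collections import Counter
from datetime import date

CUTOFF = date(2015, 12, 1)  # rows after this date are removed (step 2)


def clean_activity_rows(rows):
    """Apply cleansing steps 1 to 3 to an iterable of activity records."""
    kept = []
    for row in rows:
        if row["date"].year != 2015:            # step 1: keep 2015 only
            continue
        if row["date"] > CUTOFF:                # step 2: drop rows after 1 December 2015
            continue
        if row["activity_type"] != "accessed":  # step 3: keep activity type accessed
            continue
        kept.append(row)
    return kept


def activity_counts(rows):
    """Number of remaining activity records per student."""
    return Counter(row["student"] for row in rows)
```

The per-student counts produced this way would be the raw input for the activity score.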

The average number of activities per student thus obtained for the first course was 1181, for the second course 760 and for the third course 1127. To add an equal activity score to the whole population the following multiplier defined in Table 1 was used for each student.

² The learning record store is an open source implementation used at the University of Amsterdam to store and extract data from miscellaneous educational tools relatively easily (Heck, 2013).

Table 1: Multiplier for activity score

course | average score | multiplier | result
1 | 1181 | 1 | 1181
2 | 760 | 1.5539474 | 1181
3 | 1127 | 1.0479148 | 1181
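The multipliers in Table 1 follow from dividing the average of the first course by the average of each course. A minimal sketch, with hypothetical function names:

```python
# Average number of activities per student, per course (from Table 1).
COURSE_AVERAGES = {1: 1181, 2: 760, 3: 1127}
REFERENCE_AVERAGE = COURSE_AVERAGES[1]  # the first course is the reference


def multiplier(course):
    """Factor that scales a course average to the reference average."""
    return REFERENCE_AVERAGE / COURSE_AVERAGES[course]


def activity_score(raw_count, course):
    """Raw activity count of one student, scaled to the reference course."""
    return raw_count * multiplier(course)
```

Rounded to seven decimals, `multiplier(2)` gives 1.5539474 and `multiplier(3)` gives 1.0479148, matching the table. A student in the second course with 760 raw activities therefore receives an activity score of 1181.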

The variable activity score is the fourth predictor variable which was analyzed with Pearson’s correlation. See ‘Appendix C - Activity score’ for a complete explanation of this variable per course.

Performance data

For all three courses the results associated with assessments or assignments which gave marks were obtained via Blackboard. The course design of the three courses was slightly different. In order to compare the results of the three courses, the corresponding results in all three courses were labeled as “quiz result”.

Per course, the variable quiz result was based on the following variable:
- First course: quiz result
- Second course: T1 (the first examination component)
- Third course: Q1 (the first examination component)

In the third course 16 results had a final grade of 0. This result applies if at least one of the mandatory assignments / exams was not completed. In this case the quiz result was determined based on the weighted average of the actual results and added to the dataset.

Table 2 gives an overview of all relevant properties per available data source.

Table 2: Properties per data source

Property | Demographic data | Dispositional data | Activity data | Performance data
Source | Student administration | Questionnaires, self-assessment | Blackboard | Blackboard
Available in week | Week 0 | Week 0, Week 1 | Week 1 - 8 | Week 4 - 8
Difficult to extract | Directly available | Time consuming, each student should fill in questionnaires | Time consuming, huge amounts of data | When teacher finishes examination
Predictability | medium | medium | low | high

3.2. Method for research sub question 2

A qualitative research method was used to answer research sub question 2:

How can the effective predictors be visualized to students, in order to improve their study behavior which ultimately leads to better study results?

In this part, the expectations of users regarding the functionality of the visualizations of learning analytics predictors were studied. A small group of participants took part in this study. Three examples of Learning Analytics Dashboards, each with its own unique purpose, were presented to the participants. The participants could vote for the dashboard of their preference in the context of Learning Analytics. They were asked to explain why. The results were processed and the answers were analyzed.

Setting

An online survey was created using Google Forms. The questions used were discussed with an expert and the online form was tested three times by a fellow student before the survey was launched. The survey consisted of three parts:
Part 1: two general questions related to gender and age, followed by two multiple choice questions related to Learning Analytics.
Part 2: three Learning Analytics dashboards were presented, each with a picture and a short explanation of the functionalities. The students expressed their appreciation on a Likert scale (1 ‘No definitely not’ to 5 ‘Yes absolutely’).
Part 3: students had to vote for their most favorable dashboard and clarify why they had chosen it.

Participants

Students from the study Information Studies were approached through online communication channels and direct contact. Both regular and part-time students were asked, not only from this academic year but also from the year before.

Questions

In the context of Learning Analytics three questions were asked per dashboard: does the dashboard improve the insight of the student in her/his study progress, can the dashboard help to improve understanding about learning, and can the dashboard help to improve learning. The complete survey is available in ‘Appendix E – Online survey’. Three dashboards were presented. The first dashboard works with traffic lights and signals if a student is not on track. The second dashboard signals if students achieve their goals and presents indicators on the basis of self-assessments. The third dashboard shows how active the student is and benchmarks quiz scores against those of other students.

The open question is analyzed according to thematic analysis, where parts of the data are coded and labeled and finally grouped together in a theme (Robson 2011).


4. Results

4.1. Results research sub question 1

Table 3 shows the diversity of the population and the outcome of the course.

Table 3: Gender and Final grade

Gender | Final grade < 5.5 | Final grade ≥ 5.5 | Total
Male | 44 | 90 | 134
Female | 11 | 32 | 43
Total | 55 | 122 | 177

Descriptive predictor variables

An overview of the continuous variables is presented in Table 4. The scores on scales self-regulation and self-efficacy are given for all students from the first and second course. 33 students did not have data for average pre-education grade because these students did not follow the Dutch education system. These students have no Dutch nationality.

Table 4: Descriptive statistics

Statistic | avg grade pre-edu | Self-regulation | Self-efficacy | Activity score | Quiz score | Final grade
Valid | 144 | 123 | 123 | 156 | 177 | 177
Missing | 33 | 54 | 54 | 21 | 0 | 0
Mean | 6.774 | 3.975 | 4.893 | 1181 | 6.244 | 6.097
Median | 6.700 | 4.200 | 5.100 | 1173 | 6.600 | 6.800
Std. Deviation | 0.5311 | 1.100 | 1.230 | 553.2 | 2.202 | 2.257
Minimum | 6.000 | 0.8000 | 0.000 | 16.00 | 0.000 | 0.000
Maximum | 9.000 | 5.900 | 7.000 | 2839 | 10.00 | 9.500

Correlation matrix predictor variables

A Pearson’s correlation was executed to examine the relationship between the five predictor variables and the final grade. Results are presented in Table 5.

Table 5: Pearson correlations

Variable | Statistic | avg grade pre-edu | Self-regulation | Self-efficacy | Activity score | Quiz score
Final grade | Pearson's r | 0.442 | 0.004 | 0.042 | 0.265 | 0.780
Final grade | p-value | < .001 *** | 0.964 | 0.642 | < .001 *** | < .001 ***
avg grade pre-edu | Pearson's r | – | -0.087 | 0.024 | 0.117 | 0.343
avg grade pre-edu | p-value | – | 0.390 | 0.813 | 0.185 | < .001 ***
Self-regulation | Pearson's r | | – | 0.497 | 0.192 | 0.010
Self-regulation | p-value | | – | < .001 *** | 0.041 * | 0.912
Self-efficacy | Pearson's r | | | – | 0.217 | 0.057
Self-efficacy | p-value | | | – | 0.020 * | 0.529
Activity score | Pearson's r | | | | – | 0.240
Activity score | p-value | | | | – | 0.003 **
Quiz score | Pearson's r | | | | | –
Quiz score | p-value | | | | | –

* p < .05, ** p < .01, *** p < .001

Three predictor variables correlate with the final grade:

Average grade pre education

There is a moderate, positive correlation between average grade pre-education and final grade (r = .442, N=144, p <.001).

Activity score

There is a weak, positive correlation between activity score and final grade (r = .280, N=156, p <.001).

Quiz score

There is a strong, positive correlation between quiz score and final grade (r = .780, N=177, p <.001).

The correlations of average grade pre-education, activity score and quiz score with the final grade are presented in Figure 1 and Figure 2.


Figure 1: Correlation plots average grade pre-education, self-regulation, self-efficacy and final grade

Figure 2: Correlation plots activity score, quiz score and the final grade

Multiple linear regression

Predictor variables which correlate are brought together in a conceptual framework which is used in the multiple linear regression analysis:

Figure 3: Predictor variables in multiple linear regression

There was no multicollinearity or singularity between predictor variables, and outliers which deviate more than two standard deviations from the mean were removed from the dataset. Next, the multiple linear regression was run; the results are presented in Table 6.

Table 6: Multiple linear regression, first option

Model Summary
Model | R | R² | Adjusted R² | RMSE
1 | 0.743 | 0.552 | 0.541 | 1.405

ANOVA
Model 1 | Sum of Squares | df | Mean Square | F | p
Regression | 306.4 | 3 | 102.133 | 51.71 | < .001
Residual | 248.9 | 126 | 1.975 | |
Total | 555.3 | 129 | | |

Coefficients
Model 1 | Unstandardized | Standard Error | Standardized | t | p
intercept | -6.548 | 1.569 | | . | < .001
avg grade pre-edu | 1.252 | 0.235 | 0.325 | . | < .001
activity score | 0.000 | 0.000 | 0.023 | . | 0.705
quiz score | 0.663 | 0.067 | 0.606 | . | < .001
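As a consistency check, the model summary values can be recomputed from the ANOVA part of Table 6 with the standard definitions of R², adjusted R² and F. The sketch below is illustrative only; the function name is hypothetical and small differences with the table are due to rounding of the reported sums of squares.

```python
def regression_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Return (R squared, adjusted R squared, F) from an ANOVA table."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual  # equals n - 1
    r_squared = ss_regression / ss_total
    adjusted = 1 - (1 - r_squared) * df_total / df_residual
    f_value = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, adjusted, f_value


# Values taken from the ANOVA part of Table 6.
r2, adj_r2, f = regression_summary(306.4, 248.9, 3, 126)
```

With these inputs R² is approximately 0.552, the adjusted R² approximately 0.541 and F approximately 51.7, in line with the reported values.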

Predictor activity score is removed from the analysis because p = 0.705, which means it is not significant. The multiple linear regression is executed again with the predictor variables average grade pre-education and quiz score; the results are presented in Table 7.

Table 7: Multiple linear regression, second option

Model Summary
Model | R | R² | Adjusted R² | RMSE
1 | 0.755 | 0.570 | 0.564 | 1.413

ANOVA
Model 1 | Sum of Squares | df | Mean Square | F | p
Regression | 372.9 | 2 | 186.430 | 93.39 | < .001
Residual | 281.5 | 141 | 1.996 | |
Total | 654.3 | 143 | | |

Coefficients
Model 1 | Unstandardized | Standard Error | Standardized | t | p
intercept | -6.122 | 1.512 | | . | < .001
avg grade pre-edu | 1.196 | 0.229 | 0.297 | . | < .001
quiz score | 0.671 | 0.061 | 0.629 | . | < .001

The R² value of 0.570 is the proportion of variance in the final grade that is explained by the predictor variables average grade pre-education and quiz score. The value p < .001 in the ANOVA means the model is significant. The unstandardized values in the coefficients are used in the formula to predict the final grade:

Final grade = (average grade pre-education * 1.196) + (quiz score * 0.671) – 6.122.

According to the Adjusted R², this model explains 56.4% of the variance in the final grade.
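The prediction formula can be written as a small function. The coefficients are the unstandardized values from Table 7. The at-risk check is an illustration only: using 5.5 as the threshold is an assumption based on the grade boundary in Table 3, and the function names are hypothetical.

```python
INTERCEPT = -6.122
B_PRE_EDUCATION = 1.196
B_QUIZ = 0.671


def predict_final_grade(avg_grade_pre_education, quiz_score):
    """Predicted final grade according to the regression model of Table 7."""
    return (B_PRE_EDUCATION * avg_grade_pre_education
            + B_QUIZ * quiz_score
            + INTERCEPT)


def is_at_risk(avg_grade_pre_education, quiz_score, pass_mark=5.5):
    """True when the predicted final grade is below the pass mark (assumption)."""
    return predict_final_grade(avg_grade_pre_education, quiz_score) < pass_mark
```

For a student with an average pre-education grade of 7.0 and a quiz score of 7.0 the predicted final grade is 6.947. The prediction is not clipped to the grade scale, so extreme inputs can give values outside the range of possible grades.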

4.2. Results research sub question 2

Table 8 shows the diversity of the population and the familiarity with Learning Analytics.

Table 8: Gender and familiarity with LA

Gender | Familiar with LA: No | Familiar with LA: Yes | Total
Female | 13 | 7 | 20
Male | 19 | 11 | 30
Total | 32 | 18 | 50

The result of the question ‘Would you grant the University of Amsterdam permission to use your data for Learning Analytics purposes?’ is presented in ‘Figure 4: Granting permission’.

The bigger green picture of a human represents 42 votes of students who grant permission, the smaller red picture of a human represents 8 votes of students who do not grant permission.

Figure 4: Granting permission

The average scores of all votes on the Likert scale for the second part are presented in Table 9.

Table 9: Scores per dashboard

Question | Option 1: Course signals | Option 2: Open Learning Initiative | Option 3: Adobe Learning Dashboard
Q1: Does this dashboard improve insight of the student in her/his study progress? | 3.6 | 3.9 | 3.8
Q2: Can this dashboard help students to improve understanding about learning? | 2.9 | 3.8 | 3.5
Q3: Can this dashboard help students to improve his/her learning? | 2.9 | 3.8 | 3.6

The average score for the first question is comparable although the Open Learning Initiative has the advantage. The Open Learning initiative also has the highest average score for both the second and third question, followed by Adobe Learning Dashboard. Course signals scores lower for question two and three.

Table 10: Most favorable dashboard

 | Option 1: Course signals | Option 2: Open Learning Initiative | Option 3: Adobe Learning Dashboard
Number of votes | 2 | 27 | 21

Below, a summary of explanations extracted from the thematic analysis is given. The complete thematic analysis is available in ‘Appendix F – Thematic analysis’.

Option 1: Course signals
- Most intuitive dashboard.

Option 2: Open Learning Initiative
- Detailed overview of progress, overview of individual focus points;
- Self-assessments for changing behavior;
- Insight multiple aspects, multiple sources to understand students learning progress;
- Best level of detail and abstraction;
- Accurate information of learning status and competences;
- Detail of what learning is, not only if a goal was met;
- Combining background information with current performance and goals.

Option 3: Adobe Learning Dashboard
- Better visuals, clear where to improve;
- Nice addition to also have the benchmarking;
- Easier to understand;
- Benchmarking helps to motivate;
- Clearer and less extractions;
- Also provides contextual data.

The results of the thematic analysis are presented in Table 11.

Table 11: Thematic analysis

Theme                   Option 1: Course Signals  Option 2: Open Learning Initiative  Option 3: Adobe Learning Dashboard
Additional information  0                         4                                   1
Benchmark               0                         0                                   3
Best lay-out            0                         1                                   3
Detailed information    0                         9                                   0
Improvement options     0                         4                                   0
Intuitive               1                         3                                   4


5. Discussion

The goal of this research was to identify which learning behaviors can be discovered by learning analytics to predict students’ performance in higher education courses and to identify students at risk at an early stage of a course. A second question was how to visualize this information for students, especially for students at risk, in order to help them change their behavior and achieve better results.

To find effective predictors, a group of five variables was considered as candidate predictors of the final course grade of the student, and five relational hypotheses were tested. The first hypothesis was that there is a correlation between the average pre-education grade and the final grade of a course: students who perform well during their pre-education are expected to perform well in higher education too. This study shows that the average pre-education grade has a moderate, positive correlation with the final grade (r = .442, N = 144, p < .001). This predictor is available before the course begins, which is an advantage compared to the other variables. In our case, however, it was only available for students who followed their pre-education in the Netherlands. This is a disadvantage, especially for tracks with a high number of international students, given that the University of Amsterdam has internationalization as a strategic theme. The majority of the students did have an average pre-education grade.

The second hypothesis was that students who better self-regulate their learning process perform better in higher education. To find out whether there is a correlation, the self-evaluation questionnaire MSLQ was used (Pintrich, 1991). We did not find a correlation between the MSLQ self-regulation score and the final grade, which was surprising because others did find one. Why does self-regulation not correlate with the final grade? A possible explanation is that the survey was a snapshot taken before the start of the course, when the students were not yet active. This data has an important upside because it suggests an attractive starting point for intervention. For many students the feedback ‘you are not active enough’ is not an opportunity to change, whilst the feedback ‘there are more efficient learning styles for you’ is. Therefore we argue that it would be interesting to measure self-regulation at different moments during a course. Van Vliet et al. (2015) showed that MSLQ scores can change when students are exposed to an active-learning teaching design.
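As an illustration of how such a correlation can be computed, the sketch below calculates Pearson's r for the average pre-education grade and the final grade. It uses only a handful of rows taken from Appendix D, so the value it prints is not the r = .442 reported above. It assumes that students with a missing pre-education grade are simply left out of the pair, which would explain why N differs per predictor; the function name `pearson_r` is illustrative only.

```python
from math import sqrt

# A few (average pre-education grade, final grade) pairs taken from Appendix D.
# None marks a missing pre-education grade (shown as "." in the appendix).
rows = [
    (None, 3.0),
    (None, 8.5),
    (6.9, 8.0),
    (6.4, 6.5),
    (6.4, 6.0),
    (6.4, 7.0),
    (6.0, 7.0),
    (6.3, 8.0),
]


def pearson_r(pairs):
    """Return (r, n) computed over the pairs in which both values are present."""
    complete = [(x, y) for x, y in pairs if x is not None and y is not None]
    n = len(complete)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in complete) / n
    mean_y = sum(y for _, y in complete) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in complete)
    sxx = sum((x - mean_x) ** 2 for x, _ in complete)
    syy = sum((y - mean_y) ** 2 for _, y in complete)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy), n


r, n = pearson_r(rows)
print(f"r = {r:.3f} over N = {n} complete pairs")
```

With so few rows the coefficient is unstable; the point of the sketch is only the handling of missing values and the formula itself.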

The third hypothesis in this research was that students who develop expectations about their ability to achieve academic success (self-efficacy beliefs) achieve more than other students. As with the self-regulation score described above, we could not find a correlation between the MSLQ self-efficacy score and the final grade.

The fourth hypothesis was that students who study actively and are active on Blackboard perform well in higher education. The activity score showed a weak, positive correlation with the final grade (r = .280, N = 156, p < .001). This is in line with the results of Tempelaar et al. (2015), who showed that activity data are a poor predictor of course achievement. The activity score is indeed the poorest predictor among the variables that do correlate. A small number of students are among the most active on Blackboard but do not complete the course successfully, possibly because of inefficient learning behavior. Buckingham Shum (2012) argued for combining activity data with dispositional data to identify at-risk students (students who are not active), using the dispositional data to give insight into learning behavior and to provide feedback.

The fifth hypothesis was that students with high quiz results also obtain a satisfactory final course grade. Our research shows a strong, positive correlation between the quiz results and the final grade (r = .780, N = 177, p < .001), which makes the quiz result the most effective predictor in our study. Since each student is supposed to have a quiz result, this predictor is available for all students, in contrast to the average pre-education grade, which is missing for many students. This implies that it is very important that quiz results are available promptly after the quiz is taken. We also recommend more quizzes, held early in the course.

The moment at which predictor data become available varies. The average pre-education grade is available before the course starts, activity data are available during the course, and quiz results are available after the examination, when the course is at least halfway. This answers the question of how the effectiveness of the predictors changes during a course. The challenge is to identify at-risk students before the first quiz, at an early stage when there is still time to change and improve achievement. Activity data, together with the average pre-education grade and the learning dispositions, are available in the first half of the course. This is in line with Dispositional Learning Analytics, which combines activity data with learning disposition data. Our research adds one variable to this model: the average pre-education grade.

To answer the second research sub-question, “How can the effective predictors be visualized to students, in order to improve their study behavior which ultimately leads to better results?”, 50 students participated in an online survey. They evaluated three dashboards, each presented with a picture and an explanation of its functionality. The first dashboard uses traffic lights to signal whether a student is on track (Verbert 2013). The second dashboard is more detailed: it signals whether students achieve their goals, presents topics that need attention, and calculates indicators on the basis of self-assessments (Verbert 2013). The third dashboard shows how active the student is during the course and compares both the quiz score and the participation score with those of the class.

A majority of 27 students chose the second dashboard, Open Learning Initiative, as the most favorable. It has the highest score in the categories additional information, detailed information and improvement options. Students explain their preference by noting that self-assessments help them change their study behavior. The dashboard gives a detailed overview of individual focus points and insight into multiple aspects, which helps the student understand the learning progress. It gives information about what learning is and combines background information with current performance and goals.

21 students voted for the third dashboard, Adobe Learning Dashboard, as their most favorable because it has better visuals and is easier to understand. Benchmarking against other students helps to motivate. The dashboard is clear and easy to grasp at first sight. An advantage of the Adobe Learning Dashboard is that activity data and quiz results are available for all students, while self-assessment, used in the Open Learning Initiative, is not always available and can be rather time consuming. We propose testing the feasibility of deploying the Adobe Learning Dashboard during a bachelor course and the Open Learning Initiative dashboard during a master course. During a bachelor course activity data and quiz results are available, and bachelor students can compare their results with those of other students. Students in master courses are more experienced and therefore more capable of setting individual performance goals.

With only 2 votes, Course Signals, the signaling dashboard, was the least popular in this survey.

The Open Learning Initiative dashboard seems to have it all and sounds too good to be true. Do students have the knowledge and experience to set individual performance goals, and is self-assessment information available and accurate enough to assist in changing study behavior? To use this dashboard within higher education, agreements are needed between the ICT department, which administers the dashboard, and the students and teachers who deliver the content needed in the dashboard. Data must be delivered in a format that meets the standards of the dashboard. Students need training in setting performance goals, and teachers need training so that they are able to align the course setup with the possibilities of the dashboard.

Learning Analytics offers very interesting possibilities for both students and teachers in higher education. Before data can be used for learning analytics they need to be cleansed, and data from different courses need to be aligned meaningfully. Activity data are available for each course in Blackboard, but teachers are free in how they set up their courses, which leads to large datasets that differ from each other. There is room for improvement in how data are delivered from Blackboard and in the guidelines for aligning and improving blended learning courses. All students of the first and second course completely filled in the MSLQ. To find out whether self-regulation and self-efficacy have added predictive value for learning analytics, the use of the MSLQ instrument to measure students’ motivation and learning strategies could be expanded to more courses, or to more moments within one course.


6. Future work

This thesis shows that the quiz result is the most effective predictor. Since this predictor only becomes available after the teacher has finished grading the quiz, it should be researched whether academic performance can be used sooner, for example before the course begins. To support this, Business Intelligence should be integrated with Learning Analytics. Business Intelligence makes it possible to use each student’s results from previous courses: aggregating the final grades of previous courses yields an average course grade per student. Using grades of previous courses might result in a more complete picture of the students who are at risk.
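The aggregation described above could look roughly like the following sketch. The record layout, the student and course identifiers, and the threshold of 5.5 are all assumptions made for illustration; they are not taken from the thesis data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical final grades from earlier courses: (student_id, course, grade).
previous_grades = [
    ("s1", "course_a", 7.5),
    ("s1", "course_b", 6.5),
    ("s2", "course_a", 5.0),
    ("s2", "course_b", 4.0),
    ("s3", "course_a", 8.0),
]


def average_grade_per_student(records):
    """Aggregate the final grades of previous courses into one average per student."""
    grades = defaultdict(list)
    for student_id, _course, grade in records:
        grades[student_id].append(grade)
    return {student_id: mean(values) for student_id, values in grades.items()}


def flag_at_risk(averages, threshold=5.5):
    """Return the students whose average falls below the (assumed) threshold."""
    return sorted(s for s, avg in averages.items() if avg < threshold)


averages = average_grade_per_student(previous_grades)
at_risk = flag_at_risk(averages)
print(averages)
print(at_risk)
```

Because the average exists before a course starts, a flag like this could be raised earlier than one based on the first quiz, which is the motivation given above.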

Limitations

Although this thesis answered the research questions, there are limitations that need to be considered. The quantitative analysis was based on data from three blended learning courses at one college. Other faculties are expected to deliver comparable output, but this expectation has not been validated, which limits how far the results of this study can be generalized. Women made up about a quarter of the students in the three courses, so the population was predominantly male. More diverse courses are expected to deliver comparable output, but this has not been validated either, which also limits generalization. Both limitations can be addressed if comparable research is done among other, more diverse, faculties or universities.

The qualitative analysis was conducted among 50 participants. To generalize further, a larger number of students should participate.


7. Conclusion

This thesis researched which variables are effective predictors and how these results should be visualized to students in order to improve their study behavior and consequently their study results. From the current state of knowledge five candidate predictor variables were identified that were expected to correlate with the final course grade of a student: average grade pre-education, self-efficacy, self-regulation, activity score and quiz results. These variables were analyzed with Pearson’s correlation, and the ones that correlate were conceptualized in a framework and further analyzed through multiple linear regression. In a population of 177 students the variable quiz results is the most effective, followed by average grade pre-education and activity score. The variables self-efficacy and self-regulation, from the learning dispositions data, do not correlate with the final course grade, which is surprising.

To test the visualization an online form was created in Google Forms, where students voted which of the three dashboards they preferred and why. The first dashboard uses traffic lights to signal if a student is on track. The second dashboard uses information from self-assessments and personal goals to signal if a student is on track. The third dashboard shows the online activity of a student and benchmarks quiz scores against other students. In a population of 50 students the second dashboard, Open Learning Initiative, was the favorite with 27 votes. The third dashboard, Adobe Learning Dashboard, followed with 21 votes, which leaves 2 votes for Course Signals.

Since the second and third dashboard use different sources and visualizations, the question is why students vote for a particular functionality. Themes were coded via thematic analysis: Open Learning Initiative scored best on the themes ‘Additional information’, ‘Detailed information’ and ‘Improvement options’, whereas Adobe Learning Dashboard scored best on ‘Benchmark’, ‘Intuitive’ and ‘Best lay-out’. The Adobe Learning Dashboard combines activity data with quiz results, which is relatively easy to use in bachelor studies. The Open Learning Initiative lets the student set individual performance goals, which suits experienced students and should therefore be applied in master studies.

The most effective predictor is the academic performance itself, the quiz results. This variable only becomes available after the teacher has finished grading the quiz. Since the goal of Learning Analytics is to identify at-risk students in time, this is a shortcoming. Therefore we argue for using Business Intelligence in higher education to present an average course grade per student, which may act as the most effective predictor and is available before the course begins.


References

Buckingham Shum, S., & Deakin Crick, R. (2012). Learning dispositions and transferable competencies: Pedagogy, modelling and learning analytics. In Paper presented at the 2nd international conference on learning analytics & knowledge, Vancouver, British Columbia.

B. J. P. Campbell, P. B. Deblois, and D. G. Oblinger. Academic Analytics A New Tool, no. August 2007.

Cerezo, R., Sanchez-Santillan, M., Puerto Paule-Ruiz, M., Carlos Núnez, J. (2015). Students' LMS interaction patterns and their relationship with achievement: A case study in higher education. Computers & Education 96, 42–54.

Dawson, S., Gašević, D., Siemens, G., & Joksimovic, S. (2014). Current state and future trends: a citation network analysis of the learning analytics field. Proceedings of the fourth international conference on learning analytics and knowledge (pp. 231–240). New York, New York, USA.

Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Pacific Grove, CA, USA: Brooks/Cole Publishing.

Gašević, D., Dawson, S., & Siemens, G. (2015). Let’s not forget: Learning analytics are about learning. TechTrends, 59 (1), 64-71.

Heck, A (2013). Constructive Overview Aggregating Comparative Hits (COACH), technical report University of Amsterdam.

Hodges, C.B. (2008). Self-efficacy in the context of online learning environments: A review of the literature and directions for research. Performance Improvement Quarterly, 20 (3-4), 7-25.

Hofstede G., Hofstede G. J., Minkov M. (2010). Cultures and organizations: Software of the mind. Revised and expanded third edition. – Maidenhead: McGraw-Hill.

Huang, S., & Fang, N. (2013). Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models. Computers & Education, 61, 133–145

Lee, I. (2002). Relationships between e-learning strategies and learning achievement. Journal of Educational Technology, 18 (2), 51–67.

Lust, G., Elen, J., & Clarebout, G. (2013). Regulation of tool-use within a blended course: Student differences and performance effects. Computers & Education, 60 (1), 385-395.

Macfadyen L.P. and Dawson S. (2010). Mining LMS data to develop an “early warning system” for educators: A proof of concept, Computers & Education 54 (2), 588–599.

Malone, T. (2006) What is Blackboard? Retrieved March 31, 2016, from

Robson, C. (2011). Real world research (123-129). West Sussex, John Wiley & Sons.

Siemens, G. (2010, July 22). About: 1st international conference on learning analytics and knowledge. Retrieved March 30, 2016, from https://tekri.athabascau.ca/analytics/about

Tempelaar, D. T. (2013). Learning Analytics, formatieve toetsing & leerdisposities: Eindrapportage bij ‘stimuleringsregeling Learning Analytics 2013’.

Tempelaar, D. T. (2014). Learning Analytics And Formative Assessments In Blended Learning Of Mathematics And Statistics. Innovative Infotechnologies for Science, Business and Education, 14-19.

Tempelaar, D. T., Rienties, B., & Giesbers, B. (2015). In search for the most informative data for feedback generation: Learning analytics in a data-rich context. Computers in Human Behavior 47, 157–167.

Universiteit van Amsterdam (2015), Instellingsplan 2015-2020: grenzeloos nieuwsgierig.

Verbert, K., Manouselis, N., Drachsler, H., & Duval, E. (2012). Dataset-driven research to support learning and knowledge analytics. Journal of Educational Technology & Society, 15(3), 133–148.

Van Vliet E, Winnips J, Brouwer N (2015). Flipped-class pedagogy enhances student metacognition and collaborative-learning strategies in higher education but effect does not persist. CBE—Life Sciences Education 14.

Whitepaper Surf (2016). Hoe data de kwaliteit van het hoger onderwijs kunnen verbeteren. Retrieved April 14, 2016, from https://www.surf.nl/binaries/content/assets/surf/nl/kennisbank/2016/whitepaper-la-web-def.pdf

Winne, P.H. & Perry, N.E. (2000). Measuring self-regulated learning. In P. Pintrich, M. Boekaerts, & M.Seidner, Handbook of self-regulation (p. 531-566). Orlando: Academic Press.

Woolfolk, A., Walkup, V., & Hughes, M. (2008). Psychology in education. Harlow, UK: Pearson-Longman.

Appendix A – The Motivated Strategies for Learning Questionnaires

The table below shows how each scale, of type Motivation or Learning strategies, correlates with the final grade of 380 Midwestern college students who followed the 3rd semester (January till May); the MSLQ was administered at the end of winter 1990.

Scale  Type  Correlation with final grade
Value component: extrinsic goal orientation  Motivation  0.02
Resource management strategies: help seeking  Learning strategies  0.02
Cognitive and metacognitive strategies: rehearsal  Learning strategies  0.05
Expectancy component: control of learning beliefs  Motivation  0.13
Cognitive and metacognitive strategies: critical thinking  Learning strategies  0.15
Cognitive and metacognitive strategies: organization  Learning strategies  0.17
Cognitive and metacognitive strategies: elaboration  Learning strategies  0.22
Value component: task value  Motivation  0.22
Value component: intrinsic goal orientation  Motivation  0.25
Resource management strategies: time and study environment  Learning strategies  0.28
Cognitive and metacognitive strategies: self-regulation  Learning strategies  0.3
Resource management strategies: effort regulation  Learning strategies  0.32
Expectancy component: self-efficacy for learning & performance  Motivation  0.41
Affective component: test anxiety  Motivation  -0.27
Resource management strategies: peer learning  Learning strategies

For each scale the relevant questions are answered by students on a seven-point Likert scale from “Not at all true of me” to “very true of me”. The end score of the scale is constructed by taking the mean of all the answers to the relevant questions for that scale. There is a total of 81 questions.

Before the start of the first and second course all students filled in the MSLQ. The complete raw dataset was processed and the score of each MSLQ scale was added to the dataset. The predictor variables Self-regulation and Self-efficacy are determined on the basis of the following questions, from a total of 81:

Self-regulation:

33 During class time I often miss important points because I'm thinking of other things. (REVERSED)
36 When reading for this course, I make up questions to help focus my reading.
41 When I become confused about something I'm reading for this class, I go back and try to figure it out.
44 If course materials are difficult to understand, I change the way I read the material.
54 Before I study new course material thoroughly, I often skim it to see how it is organized.
55 I ask myself questions to make sure I understand the material I have been studying in this class.
56 I try to change the way I study in order to fit the course requirements and instructor's teaching style.
57 I often find that I have been reading for class but don't know what it was all about. (REVERSED)
61 I try to think through a topic and decide what I am supposed to learn from it rather than just reading it over when studying.
76 When studying for this course I try to determine which concepts I don't understand well.
78 When I study for this class, I set goals for myself in order to direct my activities in each study period.
79 If I get confused taking notes in class, I make sure I sort it out afterwards.

Self-efficacy:

5 I believe I will receive an excellent grade in this class.
6 I'm certain I can understand the most difficult material presented in the readings for this course.
12 I'm confident I can understand the basic concepts taught in this course.
15 I'm confident I can understand the most complex material presented by the instructor in this course.
20 I'm confident I can do an excellent job on the assignments and tests in this course.
21 I expect to do well in this class.
29 I'm certain I can master the skills being taught in this class.
31 Considering the difficulty of this course, the teacher, and my skills, I think I will do well in this class.
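The scoring rule for these two scales, the mean of the answers to the listed items, can be sketched as follows. The item numbers and the REVERSED markers come from the lists above. Reverse-coding an answer x on the seven-point scale as 8 - x is assumed here as the usual convention, and the example answers are hypothetical.

```python
from statistics import mean

# Item numbers as listed above; items 33 and 57 are marked REVERSED.
SELF_REGULATION_ITEMS = [33, 36, 41, 44, 54, 55, 56, 57, 61, 76, 78, 79]
SELF_EFFICACY_ITEMS = [5, 6, 12, 15, 20, 21, 29, 31]
REVERSED_ITEMS = {33, 57}


def scale_score(answers, items, reversed_items=frozenset(), scale_max=7):
    """Mean of the answers for one scale, reverse-coding the reversed items."""
    values = []
    for item in items:
        answer = answers[item]
        if item in reversed_items:
            # Assumed convention: 1 becomes 7, 2 becomes 6, and so on.
            answer = scale_max + 1 - answer
        values.append(answer)
    return mean(values)


# Hypothetical answers of one student: 5 on every item, except the reversed ones.
answers = {item: 5 for item in SELF_REGULATION_ITEMS + SELF_EFFICACY_ITEMS}
answers[33] = 2  # reversed item, counts as 6
answers[57] = 3  # reversed item, counts as 5

self_regulation = scale_score(answers, SELF_REGULATION_ITEMS, REVERSED_ITEMS)
self_efficacy = scale_score(answers, SELF_EFFICACY_ITEMS)
print(round(self_regulation, 2), self_efficacy)
```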


Appendix B – Boxplots predictor variables

Average grade pre-education

6.9,6.4,6.4,6.4,6.4,6,6.3,6.8,6.3,7.3,7.5,6.4,6.5,7,6.7,6.3,7,7.1,6.7,6.9,6.3,7,7.1,7.1,7.2,6.6,6.5,6.2,7.1,6.5,6.4,6.8,8.6,6.3,7,7.1,7.3,6.9,7.2,8.2,8.1,8,6.7,6.7,8.6,6.9,7.4,6.9,7.5,6.6,6.4,7,7.2,6.3,7.6,9,6.7,6.7,6.3,6.3,6.3,6.3,7.6,6.7,7,6.1,6.8,6.3,7,6.5,6.8,6.7,6.1,7,6.6,6.3,6.8,7,7.5,7.2,6.2,6.6,6.7,7.1,7.6,6.1,7,6.7,7.1,7.3,6.8,6.6,7.1,6.4,6.5,6.5,6.4,6.2,7,6.1,6.7,6.3,6.3,6.3,6.7,7,6.1,6.8,6.3,7,6.5,6.8,6.7,6.1,7,6.3,6.5,7,7.5,7.2,6.2,6.6,7.6,6.1,7,6.7,7.1,7.3,6.8,6.6,7.1,6.4,6.5,6.4,6.2,6.3,7,6.1,6.2,6,6.3,6.1,6.5,7

Population size: 144
Median: 6.7
Minimum: 6
Maximum: 9
First quartile: 6.325
Third quartile: 7
Interquartile Range: 0.675
Outliers: 9, 8.6, 8.6, 8.2, 8.1

Self-regulation

4.9,4,2.7,5.6,4.7,3.8,4.6,4.2,2.3,5.8,4,3.8,2,4.1,3.3,4.3,3.5,4.6,4.1,3.5,4.4,4.7,3.8,3.8,3.4,4.4,3.3,0.8,4.8,4.8,4.8,3.1,3.7,4.8,2.6,1.3,3.9,1.3,4.2,5.1,5.3,3.3,4.8,4.4,3.9,1.1,4.2,1.9,4.3,4.4,4,3.8,2.8,1.3,3.4,4.1,4.7,4.9,4,4.7,4.9,4.8,2.7,4.9,4.9,5.4,4.2,4.3,5.7,5.2,3.2,3.7,5.9,3,4.3,5,2.3,4,3.9,4.8,1.3,4.9,1.3,4.2,4.4,4,5.9,4.3,5.8,3.9,4.6,3.3,4.5,1.3,4.6,3.8,4.8,5.2,2.2,4.3,4.3,4.5,4.8,4,4.3,3.7,4.8,4.3,3.4,4.9,4.8,4.5,3.8,1.7,3.9,4.3,4.1,4.7,3.8,3.1,4.8,3.8,4.8

Population size: 123
Median: 4.2
Minimum: 0.8
Maximum: 5.9
First quartile: 3.5
Third quartile: 4.8
Interquartile Range: 1.3
Outliers: 0.8, 1.1, 1.3, 1.3, 1.3, 1.3, 1.3, 1.3

Self-efficacy

5.8,5.4,4.3,4.1,6.5,4.8,4,5.3,3.8,6.5,6.1,5.1,5.6,5.4,5.9,6.1,5.9,5.3,3.9,3,3.6,6.4,3.9,5,5.5,4.4,2.5,4.3,3.1,4.9,6.9,4.5,4.9,5.1,4,0,4.6,0.5,3.4,5.1,5.5,5.4,5.5,5.1,5.9,5.9,6,4.6,6.1,4.3,5.3,6,5.3,2.5,4.6,3.8,5.9,4.4,3.6,4.8,5.6,3.9,4.6,6.6,5.6,7,4.3,5.4,5.8,2.6,4.8,4.9,5.8,4,6.3,5.6,4.1,4,6.1,5.6,5,5.5,1.4,4.4,5.5,4.6,6.5,3.6,6.1,5.4,4.4,5.1,4.9,0,5.8,3.1,6,5.6,5.1,5.9,5.5,5,6.3,5.4,4.6,5.1,5.6,4.9,6,5.8,4.9,5.1,4.3,4.6,4.9,4.9,5.5,5,4.8,5.8,5.3,5.5,4.6

Population size: 123
Median: 5.1
Minimum: 0
Maximum: 7
First quartile: 4.4
Third quartile: 5.6
Interquartile Range: 1.2
Outliers: 0, 0, 0.5, 1.4, 2.5, 2.5, 2.6

Activity score

642,2341,1043,1510,1090,1173,,1013,621,1457,457,1985,1694,1255,670,1254,1297,440,727,,700,1977,893,1149,1605,784,805,788,1147,1257,1482,1301,862,1717,1569,651,676,451,520,768,839,1317,860,1350,1254,1334,1052,1066,2131,2160,2068,682,,701,1413,683,889,836,,1472,1138,1325,716,1862,2415,408,2451,1403,1330,,1613,692,1922,718,1021,,738,1296,1016,695,1364,1459,533,454,1767,1444,1226,1529,1078,729,1986,800,1148,1290,771,1027,1240,,1296,1399,1543,981,233,1987,1242,744,,1172,,1231,1296,1134,1518,990,981,1778,946,1526,517,1784,1433,935,1486,160,,656,1426,1019,,270,1702,690,509,870,1295,,,1888,,1830,815,2726,,1809,,1953,,2839,1354,2179,598,693,,2406,244,326,890,1189,174,2068,1176,2293,,1247,,1295,,187,1020,1728,789,1209,1107,1305,266,1409,16

Population size: 156
Median: 1172.5
Minimum: 16
Maximum: 2839
First quartile: 750
Third quartile: 1479.5
Interquartile Range: 729.5
Outliers: 2839, 2726

Quiz score

4,8.3,7.2,7.5,8,7.6,6.1,8.5,6.4,8.6,0,7.1,5,7.2,5.6,7.8,7.5,7.4,7.9,0,7.2,7.9,0,6.8,6.2,6.9,5.8,5.8,8.6,5.5,9.4,7.7,6.6,4.5,7,5.4,8.1,1.,0,5.3,5.2,7.4,7.5,7.3,8.6,3.9,8.2,7.1,7.6,6.8,7,7.6,7.9,4.3,5.3,2.3,0,6,3.8,7.3,8.5,7.1,8.6,9.5,7.1,5.2,7.4,7.6,8.5,7,6.5,5,7.5,7.5,6,7,6,6,6,6.5,8,6,5.5,0,7.5,5.5,6.5,6,0,7.5,7,6,7.5,7,6,7,8,8.5,6,4,5.5,6.5,0,5.5,6,5,6.5,7.5,6.5,6,7.5,5.5,6.5,6.5,5.5,4,5,6,0,6.5,6,5.5,6.5,4.7,8.1,3.6,8.7,6.8,6.8,7.1,8.7,7.9,2.3,6.3,7.1,0,7.1,4.4,8.4,6,0,8.1,7.4,4.7,3.1,8.4,8.1,7.1,6.8,5.5,8.7,1.,7.1,6.8,4.7,7.6,8.4,9.5,6.3,8.7,6.6,6.8,6.3,6.3,6,7.4,0,9.5,8.7,4.7,6.3,3.4,7.9,5.8,8.4,4.2,6.3

Population size: 177
Median: 6.5
Minimum: 0
Maximum: 9.5
First quartile: 5.5
Third quartile: 7.5
Interquartile Range: 2
Outliers: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1., 1., 2.3, 2.3
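The summary values listed for each boxplot can be reproduced along the lines of the sketch below. It assumes the common rule that a value is an outlier when it lies more than 1.5 times the interquartile range outside the quartiles. This is consistent with the average grade pre-education above, where the upper fence is 7 + 1.5 × 0.675 = 8.0125 and the listed outliers are exactly the values above it. The quartile interpolation method and the sample values are assumptions, so quartiles computed this way may differ slightly from the ones reported.

```python
from statistics import median, quantiles


def boxplot_summary(values, whisker=1.5):
    """Summary statistics and outliers for one variable; None marks a missing value."""
    data = sorted(v for v in values if v is not None)
    # Assumed method: linear interpolation with both end points included.
    q1, _q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    return {
        "population_size": len(data),
        "median": median(data),
        "minimum": data[0],
        "maximum": data[-1],
        "first_quartile": q1,
        "third_quartile": q3,
        "iqr": iqr,
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical sample with one missing value and one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100, None]
summary = boxplot_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```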

Appendix C - Activity score

All activities – first course

Removed all rows after 1st of December 2015: 1,106 records. This way only records relevant to the first half of the course are kept.

Removed activity types attempted, attended, interacted, launched and scored (1,303 records), because the focus is on activities (79,503 records).

Removed four outliers.

Descriptives

Descriptive Statistics

                 Grand Total
Valid            69
Missing          1
Mean             1152
Median           1079
Std. Deviation   515.5
Minimum          408.0
Maximum          2451
20th percentile  682.0
40th percentile  889.0
60th percentile  1254
80th percentile  1510

Distribution plot (mean 1152, SD 516)

Score  Values
1      <103
2      104 - 619
3      620 - 1152
4      1153 - 1669
5      1670 - 2185
6      >2186

Percentiles, 5 groups

Group  Values
1      1 - 682
2      683 - 889
3      889 - 1254
4      1255 - 1510
5      >1510

Scores first course

Activity score  Activity score Leren - normdist  Activity score Leren - perc5
647  3  1
642  3  1
2341  6  5
1043  3  3
1510  4  4
1090  3  3
1173  4  3
1013  3  3
621  3  1
1079  3  3
1457  4  4
457  2  1
1985  5  5
552  2  1
1694  5  5
1255  4  4
670  3  1
631  3  1
1254  4  3
1297  4  4
440  2  1
727  3  2
700  3  2
1977  5  5
893  3  3
1149  3  3
1605  4  5
784  3  2
805  3  2
788  3  2
1147  3  3
1257  4  4
1482  4  4
1301  4  4
862  3  2
1717  5  5
1569  4  5
651  3  1
676  3  1
451  2  1
520  2  1
768  3  2
839  3  2
1317  4  4
860  3  2
1350  4  4
1254  4  3
1334  4  4
1052  3  3
1066  3  3
2131  5  5
2160  5  5
2068  5  5
682  3  1
998  3  3
701  3  2
1413  4  4
683  3  2
889  3  2
836  3  2
1472  4  4
1138  3  3
1325  4  4
716  3  2
1862  5  5
2415  6  5
408  2  1
2451  6  5
1403  4  4

All activities – second course

Removed all rows from 2012, 2013 and 2014: 491 records.

Removed all rows after 1st of December 2015: 4,132 records.

Removed all activity types attempted, attended, interacted, launched and scored: 2,710 records.

Removed one student who did not have any activity at all.

Removed the outliers: 2017, 2261, 8, 14, 16.
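The first cleaning steps listed above amount to filtering the activity log and counting what remains per student. A minimal sketch is given below; the record layout and the activity types "viewed" and "downloaded" are assumptions for illustration, while the cut-off date and the excluded types are the ones named above. Removing outliers and students without any activity is left out.

```python
from collections import Counter
from datetime import date

CUTOFF = date(2015, 12, 1)
EXCLUDED_TYPES = {"attempted", "attended", "interacted", "launched", "scored"}

# Hypothetical log records: (student_id, activity_type, day).
log = [
    ("s1", "viewed", date(2015, 11, 3)),
    ("s1", "scored", date(2015, 11, 4)),       # excluded activity type
    ("s1", "viewed", date(2015, 12, 9)),       # after the cut-off date
    ("s2", "downloaded", date(2015, 11, 20)),
    ("s2", "viewed", date(2015, 12, 1)),       # on the cut-off date, kept
    ("s2", "viewed", date(2014, 10, 1)),       # row from an earlier year
]


def activity_counts(records, cutoff=CUTOFF, excluded=EXCLUDED_TYPES, first_year=2015):
    """Count the remaining activities per student after the filters are applied."""
    kept = [
        student
        for student, kind, day in records
        if day.year >= first_year and day <= cutoff and kind not in excluded
    ]
    return Counter(kept)


counts = activity_counts(log)
print(dict(counts))
```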

Standard normal distribution

Score  Values
1      <152
2      153 - 443
3      444 - 734
4      735 - 1025
5      1026 - 1316
6      >1317

Percentiles, 5 groups

Group  Values
1      1 - 472
2      473 - 657
3      658 - 833
4      834 - 979
5      >980

Descriptives

Descriptive Statistics

                 number of activities
Valid            52
Missing          0
Mean             734.0
Std. Deviation   290.0
Minimum          75.00
Maximum          1279
20th percentile  472.6
40th percentile  657.8
60th percentile  833.2
80th percentile  979.0

Distribution plot: number of activities

number of activities  Activity score - normdist  Activity score perc5
75  1  1
99  1  1
150  1  1
292  2  1
333  2  1
343  2  1
445  3  1
447  3  1
462  3  1
469  3  1
475  3  1
479  3  2
496  3  2
515  3  2
602  3  2
609  3  2
631  3  2
631  3  2
637  3  2
654  3  2
657  3  2
661  3  3
694  3  3
730  3  3
739  3  3
754  4  3
789  4  3
792  4  3
798  4  3
799  4  3
830  4  3
834  4  4
834  4  4
834  4  4
856  4  4
878  4  4
900  4  4
922  4  4
929  4  4
939  4  4
956  4  4
977  4  4
982  4  5
984  4  5
993  4  5
1038  5  5
1137  5  5
1144  5  5
1148  5  5
1237  5  5
1278  5  5
1279  5  5
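The two score columns can be derived from the raw number of activities roughly as sketched below. The class limits of the normal-distribution score in this appendix lie close to the mean plus or minus one and two standard deviations (for this course 734 ± 290 gives 444 and 1024, and 734 ± 580 gives 154 and 1314), so the sketch assumes that rule; the third course appears to use 1.5 standard deviations for the outer classes, which is why the multiples are parameters. The percentile boundaries are passed in as given in the descriptive statistics. A value exactly on a limit is placed in the lower class, which is also an assumption, and a few rows in the tables deviate from these rules.

```python
from bisect import bisect_left


def normdist_score(value, mean, sd, inner=1.0, outer=2.0):
    """Score 1-6 based on the distance from the mean in standard deviations."""
    cut_points = [
        mean - outer * sd,
        mean - inner * sd,
        mean,
        mean + inner * sd,
        mean + outer * sd,
    ]
    # bisect_left counts the cut points strictly below the value.
    return bisect_left(cut_points, value) + 1


def percentile_group(value, boundaries):
    """Group 1-5 based on the 20th, 40th, 60th and 80th percentile."""
    return bisect_left(boundaries, value) + 1


# Mean, standard deviation and percentiles of the second course, as listed above.
MEAN, SD = 734.0, 290.0
PERCENTILES = [472.6, 657.8, 833.2, 979.0]

for activities in (75, 292, 479, 754, 1038):
    print(
        activities,
        normdist_score(activities, MEAN, SD),
        percentile_group(activities, PERCENTILES),
    )
```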


All activities third course

Removed all rows after 1st of December 2015: 12,234 records.

Removed all activity types accessed, built, experiences, interacted and scored: 32,463 records.

Removed the outliers: 3756, 4266, 4273, 3147, 3195, 3216, 3260, 3263, 18, 64, 0, 0.

Standard normal distribution

Score  Values
1      <101
2      101 - 458
3      459 - 1175
4      1175 - 1891
5      1891 - 2249
6      >2249

Percentiles, 5 groups

Group  Values
1      1 - 520
2      521 - 845
3      846 - 1247
4      1247 - 1839
5      >1840

Descriptives

Descriptive Statistics

                 activity
Valid            36
Missing          0
Mean             1175
Std. Deviation   716.2
Minimum          153.0
Maximum          2709
20th percentile  520.0
40th percentile  845.2
60th percentile  1247
80th percentile  1839

Distribution plot: activity

activity  Activity score OOP - normdist  Activity score OOP perc5
153  2  1
166  2  1
178  2  1
233  2  1
258  2  1
311  2  1
486  3  1
571  3  2
626  3  2
658  3  2
661  3  2
753  3  2
778  3  2
830  3  2
849  3  3
972  3  3
973  3  3
1122  3  3
1135  3  3
1190  4  3
1236  4  3
1236  4  3
1292  4  4
1361  4  4
1624  4  4
1649  4  4
1726  4  4
1746  4  4
1802  4  4
1864  4  5
1973  5  5
2079  5  5
2188  5  5
2296  6  5
2601  6  5
2709  6  5


Appendix D – Complete dataset predictor variables

avg grade pre-edu  Activity score  Multiplier  Activity score  Self-regulation  Self-efficacy  Quiz score  Final grade
.  642  1  642  4.9  5.8  4  3
.  2341  1  2341  4  5.4  8.3  8.5
6.9  1043  1  1043  2.7  4.3  7.2  8
6.4  1510  1  1510  5.6  4.1  7.5  6.5
6.4  1090  1  1090  4.7  6.5  8  6
6.4  1173  1  1173  3.8  4.8  7.6  7
.  .  1  .  4.6  4  6.1  2.5
6.4  1013  1  1013  4.2  5.3  8.5  7
6.0  621  1  621  2.3  3.8  6.4  7
6.3  1457  1  1457  5.8  6.5  8.6  8
.  457  1  457  4  6.1  0  1
6.8  1985  1  1985  3.8  5.1  7.1  6.5
6.3  1694  1  1694  2  5.6  5  3
7.3  1255  1  1255  4.1  5.4  7.2  8.5
7.5  670  1  670  3.3  5.9  5.6  7
6.4  1254  1  1254  4.3  6.1  7.8  8
6.5  1297  1  1297  3.5  5.9  7.5  0
7.0  440  1  440  4.6  5.3  7.4  6
6.7  727  1  727  4.1  3.9  7.9  7.5
6.3  .  1  .  3.5  3  0  1
7.0  700  1  700  4.4  3.6  7.2  6.5
7.1  1977  1  1977  4.7  6.4  7.9  8
6.7  893  1  893  3.8  3.9  0  1
6.9  1149  1  1149  3.8  5  6.8  8
6.3  1605  1  1605  3.4  5.5  6.2  7.5
7.0  784  1  784  4.4  4.4  6.9  6.5
7.1  805  1  805  3.3  2.5  5.8  7.5
7.1  788  1  788  0.8  4.3  5.8  6.5
7.2  1147  1  1147  4.8  3.1  8.6  9
6.6  1257  1  1257  4.8  4.9  5.5  7
6.5  1482  1  1482  4.8  6.9  9.4  9.5
6.2  1301  1  1301  3.1  4.5  7.7  7.5
7.1  862  1  862  3.7  4.9  6.6  5
.  1717  1  1717  4.8  5.1  4.5  7
6.5  1569  1  1569  2.6  4  7  2.5
6.4  651  1  651  1.3  0  5.4  6
6.8  676  1  676  3.9  4.6  8.1  6.5
8.6  451  1  451  1.3  0.5  10  9
.  520  1  520  4.2  3.4  0  1
6.3  768  1  768  5.1  5.1  5.3  6.5
7.0  839  1  839  5.3  5.5  5.2  7.5
7.1  1317  1  1317  3.3  5.4  7.4  7
7.3  860  1  860  4.8  5.5  7.5  7.5
6.9  1350  1  1350  4.4  5.1  7.3  7.5
7.2  1254  1  1254  3.9  5.9  8.6  8.5
8.2  1334  1  1334  1.1  5.9  3.9  5
8.1  1052  1  1052  4.2  6  8.2  8.5
8.0  1066  1  1066  1.9  4.6  7.1  9
6.7  2131  1  2131  4.3  6.1  7.6  8
6.7  2160  1  2160  4.4  4.3  6.8  8
8.6  2068  1  2068  4  5.3  7  8.5
6.9  682  1  682  3.8  6  7.6  7
7.4  .  1  .  2.8  5.3  7.9  7.5
6.9  701  1  701  1.3  2.5  4.3  3
7.5  1413  1  1413  3.4  4.6  5.3  8
6.6  683  1  683  4.1  3.8  2.3  1.5
.  889  1  889  4.7  5.9  0  1
.  836  1  836  4.9  4.4  6  8
6.4  .  1  .  4  3.6  3.8  4.5
7.0  1472  1  1472  4.7  4.8  7.3  7
7.2  1138  1  1138  4.9  5.6  8.5  3
6.3  1325  1  1325  4.8  3.9  7.1  6
7.6  716  1  716  2.7  4.6  8.6  8
9.0  1862  1  1862  4.9  6.6  9.5  9.5
.  2415  1  2415  4.9  5.6  7.1  7.5
.  408  1  408  5.4  7  5.2  3
6.7  2451  1  2451  4.2  4.3  7.4  7.5
.  1403  1  1403  4.3  5.4  7.6  8
6.7  856  1.5539474  1330  5.7  5.8  8.5  5.4
.  .  1.5539474  .  5.2  2.6  7  5.1
.  1038  1.5539474  1613  3.2  4.8  6.5  5.4
6.3  445  1.5539474  692  3.7  4.9  5  4.2
6.3  1237  1.5539474  1922  5.9  5.8  7.5  5.4
6.3  462  1.5539474  718  3  4  7.5  6.7
6.3  657  1.5539474  1021  4.3  6.3  6  6.8
7.6  .  1.5539474  .  5  5.6  7  8.2
.  475  1.5539474  738  2.3  4.1  6  4.7
.  834  1.5539474  1296  4  4  6  6.7
6.7  654  1.5539474  1016  3.9  6.1  6  6.6
7.0  447  1.5539474  695  4.8  5.6  6.5  6.8
6.1  878  1.5539474  1364  1.3  5  8  5.4
6.8  939  1.5539474  1459  4.9  5.5  6  6.3
6.3  343  1.5539474  533  1.3  1.4  5.5  3.8
.  292  1.5539474  454  4.2  4.4  0  1
7.0  1137  1.5539474  1767  4.4  5.5  7.5  7.9
6.5  929  1.5539474  1444  4  4.6  5.5  3.7
6.8  789  1.5539474  1226  5.9  6.5  6.5  6.8
6.7  984  1.5539474  1529  4.3  3.6  6  6.9
6.1  694  1.5539474  1078  5.8  6.1  0  1.2
7.0  469  1.5539474  729  3.9  5.4  7.5  7.1
6.6  1278  1.5539474  1986  4.6  4.4  7  6.9
6.3  515  1.5539474  800  3.3  5.1  6  6.1
.  739  1.5539474  1148  4.5  4.9  7.5  7
6.8  830  1.5539474  1290  1.3  0  7  7.7
7.0  496  1.5539474  771  4.6  5.8  6  6.8
7.5  661  1.5539474  1027  3.8  3.1  7  7.3
7.2  798  1.5539474  1240  4.8  6  8  7.4
.  .  1.5539474  .  5.2  5.6  8.5  4.7
6.2  834  1.5539474  1296  2.2  5.1  6  5.7
6.6  900  1.5539474  1399  4.3  5.9  4  2.7
6.7  993  1.5539474  1543  4.3  5.5  5.5  6.4
7.1  631  1.5539474  981  4.5  5  6.5  6.3
.  150  1.5539474  233  4.8  6.3  0  3.2
.  1279  1.5539474  1987  4  5.4  5.5  6.7
7.6  799  1.5539474  1242  4.3  4.6  6  7.4
6.1  479  1.5539474  744  3.7  5.1  5  3.4
7.0  .  1.5539474  .  4.8  5.6  6.5  4.6
6.7  754  1.5539474  1172  4.3  4.9  7.5  7.1
7.1  .  1.5539474  .  3.4  6  6.5  7.5
.  792  1.5539474  1231  4.9  5.8  6  6
7.3  834  1.5539474  1296  4.8  4.9  7.5  4.7
6.8  730  1.5539474  1134  4.5  5.1  5.5  6.8
6.6  977  1.5539474  1518  3.8  4.3  6.5  4.7
7.1  637  1.5539474  990  1.7  4.6  6.5  7
6.4  631  1.5539474  981  3.9  4.9  5.5  6.5
