
Designing a Self-Assessment Tool for UDL Course Design

Author:

Evelyne van Oers

Student Number: 11026049

evelyne.van.oers@hotmail.com

Master Information Studies

Track: Information Systems

July 2, 2019

1st Supervisor:

dr. Bert Bredeweg

B.Bredeweg@uva.nl

2nd Supervisor:

dr. Frank Nack

F.M.Nack@uva.nl


Evelyne van Oers

University of Amsterdam
evelyne.van.oers@hotmail.com

ABSTRACT

Universal Design for Learning (UDL) is a framework which describes course design principles for inclusive education; however, teachers struggle to design courses and introduce new practices in education. A major challenge in addressing this problem is that users (teachers) should be provided with information that is not outdated. Technology and tools are developed at a rapid pace, not only in education but also in other domains; in such contexts, solutions with static information, such as an immutable list of examples, will become obsolete the moment new, improved practices and approaches are created.

This paper describes the development of the UDL Assessment Tool. The developed tool is designed around the UDL framework and stays up to date with novel educational practices by deploying a database of user generated content (UGC). As a result, the information the tool provides to the user is both in line with the UDL framework and contemporary.

Preliminary results show that an assessment tool that makes use of a UGC database can successfully be used to conduct quality assessments. Additionally, preliminary results indicate that users value the idea of contributing to a UGC database; however, how the UGC database will expand over time, and the effect of this expansion on the assessment quality and user satisfaction, remain topics for further research.

KEYWORDS

Education, Database, Knowledge Base, Assessment, Universal Design for Learning, UDL, User Generated Content, UGC

1 INTRODUCTION

Each learner is unique, and education should therefore be versatile to support each learner as well as possible; yet the way in which education is generally conducted does not take this notion into account [Meyer et al. 2014]. To deviate from the general approach to education, a teacher has to make use of emerging practices, that is to say, practices and tools that are not yet fully explored or adopted in the field of education [Veletsianos 2016]. The problem, however, is that teachers struggle with introducing novel technologies and practices in their courses [Anderson 2016]. New emerging technologies and practices are constantly introduced, and have become more and more relevant in the last few years [Veletsianos 2016]. In other words, the number of emerging technologies and practices is ever growing, and the number and nature of possibilities within education is constantly changing. This means that the body of information on educational technologies and practices is highly dynamic. As a result, any information system in this domain should contain and provide information dynamically; not doing so would result in the information becoming obsolete the moment an improvement on an educational practice is introduced.

This paper describes the design of the UDL Assessment Tool, an assessment tool built around the UDL framework. The target user group is (student) teachers and UDL experts who wish to create courses that comply with the UDL framework. This framework describes course design principles for inclusive education. Two main goals exist for the development of this tool:

(1) The tool should help teachers assess their course on UDL design principles, with the goal of stimulating teachers to create inclusive education.

(2) The tool should support the growth of its information database over time, such that assessments can be conducted with up-to-date and relevant information. To achieve this, the tool will contain a database of user generated content (UGC), such that teachers can share new emerging practices.

Regarding the first point, this paper attempts to answer the question whether the UDL Assessment Tool can be successfully used to assess courses along the UDL design principles. Regarding the second point, this paper discusses the steps taken to design and implement a database that grows from UGC.

The next section provides background information on the UDL framework, UGC, and the design principles used to evaluate the UDL Assessment Tool. Section 3 describes the context in which the UDL Assessment Tool was developed. Section 4 describes how the design principles have been incorporated into the tool. Sections 5 and 6 describe the experiments conducted and the results obtained. Discussion of the results and future research takes place in section 7. The conclusion of this paper is given in section 8.

2 RELATED RESEARCH

2.1 UDL

The basis for this project is the Universal Design for Learning (UDL) framework. The UDL concept was developed when the emergence of digital tools offered new flexibility in education [Meyer et al. 2014].

Flexibility and variation in the provision of learning material is a necessity in educational programs, as this ensures that students from many different backgrounds can optimally gain new knowledge. Although the UDL framework was initially developed specifically to make education more accessible for learners with disabilities, recent research includes initiatives of adopting the UDL framework in more general educational contexts [Dean et al. 2017; Rao and Meo 2016].

The UDL framework is built around the assumption that every brain, and therefore every learner, is unique; every learner has different preferences and weaknesses. The UDL framework consists of nine course design principles based on this notion, equally divided over three main categories. The nine principles can be found in appendix A.


2.1.1 UDLnet. Before the UDL Assessment Tool, another research project had been carried out to inspire teachers to use UDL in the classroom. This previous project, the UDLnet Network, "aspires to bridge the gap between policy and practice by collecting and creating best practices under the framework of UDL" [Riviou et al. 2014]. The project tried to achieve this goal by creating a teacher network in which teachers would be empowered to create and share 'good UDL practices'. Although the project discussed in this paper does not share the UDLnet approach of creating a teacher network, the goals of both projects are similar. It was envisioned that the UDLnet approach would eventually lead to more effective use of UDL in education. Whether the approach was successful remains unclear: although papers announced annual analytical reports in 2015, we are unaware of any publicly available articles on research outcomes [Riviou et al. 2014, 2015].

2.2 Formative and Summative Assessment

During the development of the UDL Assessment Tool, a distinction was made between formative and summative assessment. The following definition is commonly used:

"The main purpose of formative assessment is seen as helping learning, while the main purpose of summative assessment is to provide information about what learning has been achieved at a certain time." [Dolin et al. 2018]

Norcini et al. [2011] described the following set of criteria for ‘good’ assessment, which are still relevant in newer research [Kibble 2017]:

(1) Validity or Coherence. The assessment measures what it purports to measure.

(2) Reproducibility or Consistency. The assessment results are the same if repeated under similar circumstances.

(3) Equivalence. Equivalent scores are given as a result of the same assessment regardless of who carries out the assessment.

(4) Feasibility. The assessment is practical, i.e. it does not cost too much time or money to carry out, given the circumstances and context.

(5) Educational effect. The assessment works as a motivator for learners to prepare for it in a way that benefits them from an educational perspective.

(6) Catalytic effect. The assessment results and feedback have educational value and cause a learner to move forward in their learning.

(7) Acceptability. The assessment results appear credible and hold value to those involved.

The criteria apply to both formative and summative assessment, though the importance of each criterion changes depending on which of the two types of assessment is carried out. Norcini et al. [2011] are of the opinion that for formative assessment, educational effect, feasibility, acceptability and catalytic effect are key, whereas equivalence and consistency are of lesser importance. For summative assessment, it is validity, consistency, and equivalence that are paramount. Feasibility, acceptability, and educational effect are also important, but to a lesser degree. Regarding a catalytic effect, Norcini et al. [2011] note that it is "desirable but less emphasized in [summative assessment settings]".

2.2.1 Online Formative Assessment.

Recent literature recognises a shift in education from offline to online or blended learning [Baleni 2015; Spector et al. 2016]. With this shift, the use of online tools to aid in formative assessment has been more seriously considered in the last few years [Spector et al. 2016]. For the design of online formative assessment exercises, the literature identifies ten design principles [Baleni 2015]:

(1) Authenticity. The assessment activity is authentic through being relevant to real life experiences of the learner.

(2) Engagement and Support. The assessment activity provides engagement and support for learners, motivating them to participate.

(3) Sharing of Information. Learners are given opportunities to share information with peers to construct knowledge.

(4) Timely Feedback. Learners receive useful, ongoing and timely feedback.

(5) Clear Rubrics. The assessment activity needs to be accompanied by analytical and transparent rubrics.

(6) Self-reflection. Learners receive opportunities for self-reflection on their own understanding.

(7) Monitoring of Progress. Learner achievements during assessment can be documented and monitored.

(8) Assessment Validity. Evidence of the alignment of learning outcomes with assessment criteria is provided.

(9) Multiple roles. The assessment activity involves learners in multiple assessment roles.

(10) Multiple solutions. The assessment activity is flexible in that it provides room for multiple approaches and solutions.

2.3 User Generated Content

User Generated Content (UGC) describes content which is created by users, rather than by paid professionals, and distributed on platforms such as social media or wikis [Daugherty et al. 2008]. The use of UGC in the context of wikis hinges on the concept of the "Wisdom of Crowds" [Surowiecki 2005]. Wisdom of crowds describes the idea that a large group of people can help create a quality product by providing small contributions.

A disadvantage of using UGC is that a lot of information is generated by users, and separating good quality information from poor quality information may be a labor-intensive task [Ludwig et al. 2015]. Furthermore, the wisdom of crowds only applies when many authors contribute: a piece of UGC with many contributing authors will likely be of relatively high quality, whereas many pieces of UGC may be contributed by a single author and therefore do not benefit from the wisdom of crowds principle.

Another concern is that users should be motivated to create content. Motivations for the creation of UGC differ per user. Three popular reasons that users report for creating content online are the need to belong to a community, the need for self-expression, and the motivation to gain and share knowledge [Daugherty et al. 2008; Matikainen et al. 2015].

3 ORIGINAL ASSESSMENT PROCESS

The instrument that has been developed in this project, the UDL Assessment Tool, is used in the assessment of a course on the basis of the UDL principles. A course is created by a teacher, and consists of one or multiple building blocks. Each building block is a course component: a technique or activity which can contribute to one or more principles of UDL. This section describes how the new assessment process differs from the assessment process used prior to the introduction of the UDL Assessment Tool. Details of the tool design are elaborated upon in the next section.

Figure 1: The five steps of a summative UDL assessment process and the supporting tools.

During development of the UDL Assessment Tool, a distinction was made between formative and summative assessment processes. A summative assessment is carried out by assessors to grade the quality of a course in an official context. The courses that need to be graded are submitted by teachers in training. We define a formative assessment process as any course assessment process where the result of the evaluation is not recorded officially. The group of users who use the UDL Assessment Tool for formative assessment includes teachers, as well as teachers in training. Formative assessment processes can take many forms, and are often in the shape of peer- or self-evaluation; additionally, in a formative assessment users may use the tool for exploratory purposes, to gain inspiration to create new courses. Summative assessments, in contrast, do have a clear structure. This structure encompasses five steps:

(1) Examination of course material: a teacher in training provides their course material for assessment. This is often in the shape of an online learning environment. Assessors will analyze the provided course material and test for the presence of UDL principles.

(2) Examination of documentation report: the teacher in training has to provide a documentation report along with their course material. Assessors will compare the documentation with the provided course material, and will evaluate whether every UDL principle is accounted for in both the documentation and course.

(3) Comparison to score rubric: assessors are in the possession of a score rubric. After the assessors have analyzed the course material and documentation, they will each individually rate the submitted course according to the rubric.

(4) Discussion of rubric score: assessors will report to each other on their assessment. They will discuss their findings and differences of opinion.

(5) Final decision on assessment: when assessors have come to an agreement, the final score for the assessment will be officially documented.

Figure 1 displays each of these process steps and the tools used at each step. The UDL Assessment Tool is introduced in the first and second step of the process. During examination of the course material, assessors can use the UDL Assessment Tool to check for the building blocks used in the course. In the second step of assessment, the assessor can evaluate the course as a selection of building blocks. The evaluation screen will provide the assessor with information visualizations that can aid in evaluating whether and how the course accounts for every UDL principle.

In the original assessment process, no instrument was available to aid in the first step of the process. Assessors had to note the components they encountered within a course mentally or on paper. In the new assessment process, the components are selected from the building blocks database of the UDL Assessment Tool (Figure 1, 1.a). The selection of building blocks can then directly be evaluated in the second step of the process. This evaluation functionality substitutes the UDL quick reference card used in the original process (Figure 1, 2.a), a static table which provides quick look-up on the UDL principles. The quick reference card can be found in appendix A.

4 TOOL DESIGN

The UDL Assessment Tool should address the following two goals: firstly, the UDL Assessment Tool should be usable for conducting assessments. Section 2.2 elucidated the difference between formative and summative assessment. The UDL Assessment Tool is designed such that it can be used in both situations. Secondly, the information on which the UDL Assessment Tool bases its assessment should stay relevant and up-to-date. This goal is approached by introducing a database which takes UGC as input to grow over time.

This section describes how the need to support formative and summative evaluation, as well as the need for a growing information database, are reflected in the design choices for the UDL Assessment Tool. For the rest of this report, ‘user’ will refer to teachers (in training) who wish to create courses that adhere to the UDL design principles.

4.1 UGC Database

The UDL Assessment Tool works with courses and building blocks. A building block is a generally described educational practice, which may be included in a course. Every building block contributes to fulfilling at least one of the UDL design principles. For example, a course may use instructional videos to provide an alternative to text reading, or a course may not have any deadlines to give a learner more freedom. These are both building blocks, and they are both contained in the building block database. Additionally, each building block has a set of labels associated with it. For example, the building block 'instructional videos' would have labels such as 'instruction' and 'video'. As noted before, new practices are constantly introduced in the domain of emerging educational practices; consequently, it must be possible to add new building blocks and labels to the database without compromising the assessment functionality of the UDL Assessment Tool.
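To make the data model concrete, the sketch below shows one possible representation of building blocks and labels, assuming a simple in-memory structure; the paper does not prescribe a specific implementation, and the class names, field names, and the principle assignments in the examples are illustrative only.

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Label:
    name: str                 # e.g. "video"
    verified: bool = False    # labels must be verified by a moderator before they are shown

@dataclass
class BuildingBlock:
    name: str                 # e.g. "instructional videos"
    description: str          # a generally described educational practice
    principles: Set[int]      # UDL principles (1-9) this block contributes to; at least one
    labels: List[Label] = field(default_factory=list)
    verified: bool = False    # unverified blocks are not shown in the tool

# Example entries mirroring the two examples in the text; the principle numbers
# are hypothetical and not taken from the paper.
videos = BuildingBlock(
    name="instructional videos",
    description="Provide an alternative to text reading.",
    principles={1},
    labels=[Label("instruction"), Label("video")],
)
no_deadlines = BuildingBlock(
    name="no deadlines",
    description="Give the learner more freedom in pacing.",
    principles={8},
)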

To solve this issue, it was decided to design the user interaction for the UDL Assessment Tool in such a way that the selection of a building block takes at most seven steps. (See figure 2.) In addition, step 5 gives the user direct access to submit content to the databases of the UDL Assessment Tool. This allows the database to grow on UGC, and the user is motivated to create this content because, in return, they receive an assessment that takes everything in their course into account.

User interaction with the tool will take place in the following way: firstly, a user has an idea of what building block should be added to their course. It can happen that this building block already exists within the database. In this case, the user succeeds at step 1 and moves on to the next building block. If the building block does not yet exist, the user will proceed to step 2: creating a new building block. The user is asked to provide a name, a description, and the UDL design principles this building block belongs to. In step 3, the user is asked to add descriptive labels. It is possible that not all labels for a building block exist, in which case the user can add them in step 4. In step 5, the user can still go back to step 2 and 3 and edit the building block or the labels. Only when the building block is confirmed by the user in step 5, are the new building block and any optional new label added to their respective databases. The building blocks and labels still need to be verified by a moderator before they show up in the UDL Assessment Tool. Verification of the building block in step 6 allows it to show up for users, and it can now be selected by the user in step 7. Moderators are capable of verifying labels at any time: if an unverified label is linked to a verified building block, only the unverified label is invisible to the user; the building block can be used without problem.

Figure 2: The seven steps a user may take to select a building block.
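The verification rule described above, where an unverified label attached to a verified building block hides only the label and not the block, can be sketched as follows. This is a minimal illustration reusing the hypothetical BuildingBlock and Label dataclasses from the sketch in section 4.1; it is not the tool's actual code.

from typing import List

def visible_blocks(blocks: List[BuildingBlock]) -> List[BuildingBlock]:
    """Return only moderator-verified building blocks, with unverified labels hidden."""
    result = []
    for block in blocks:
        if not block.verified:
            continue  # unverified blocks do not show up for users at all
        shown = BuildingBlock(
            name=block.name,
            description=block.description,
            principles=block.principles,
            labels=[lab for lab in block.labels if lab.verified],  # hide only unverified labels
            verified=True,
        )
        result.append(shown)
    return result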

4.2 Assessment Design Principles

To assure that the UDL Assessment Tool is designed to conduct good quality assessments, it is important to establish what 'good assessment' is. Section 2.2 put forward two sets of design principles. The first set of principles, provided by Norcini et al. [2011], was used as a basis for both the formative and the summative assessment design criteria. The second set of principles, provided by Baleni [2015], was used only for establishing the formative assessment design criteria. The two sets were combined, leading to the set of 13 design principles for formative assessment tools displayed in table 1.

As can be seen from the table, the criterion Assessment Validity is covered by both criterion 1 of the criteria for Good Assessment and criterion 8 of the Online Formative Assessment criteria. Both state that the assessment should guarantee to measure the thing it sets out to measure. Additionally, criterion 6 from both sets of criteria describes the need for a Catalytic Effect: the outcome of the assessment should prompt the learner to reflect and move forward in learning.


Table 1: Criteria for design of the UDL Assessment Tool for formative assessment. GA = Criteria for Good Assessment [Norcini et al. 2011], OFA = Criteria for Online Formative Assessment [Baleni 2015]

Criterion                   OFA  GA
1. Assessment Validity       8    1
2. Feasibility               -    4
3. Educational Effect        -    5
4. Catalytic Effect          6    6
5. Acceptability             -    7
6. Authenticity              1    -
7. Engagement and Support    2    -
8. Sharing of Information    3    -
9. Timely Feedback           4    -
10. Clear Rubrics            5    -
11. Monitoring of Progress   7    -
12. Multiple Roles           9    -
13. Multiple Solutions      10    -

4.3 Summative Assessment

The tool was developed alongside the Good Assessment principles of Norcini et al. [2011] to ensure it can be used effectively as a summative assessment tool. In a summative assessment, students will not come in contact with the UDL Assessment Tool. For this reason, the educational effect that may be caused by the summative assessment will not be caused by the tool. The principles reproducibility or consistency and equivalence depend on how the assessors use the tool, and are not directly incorporated into its design. However, the way in which the tool computes the assessment score is fixed; that is to say, the score of each building block is static and a specific selection of building blocks will always yield the same score. This should contribute to equivalent and reproducible summative assessments. The rest of this section illustrates how the other design principles present themselves in the tool design.

4.3.1 Assessment Validity. Each building block in the database has been reviewed by experts on UDL education and has been assigned to at least one of the nine UDL principles [Meyer et al. 2014]. The course created by the user is scored on each UDL principle by counting the number of building blocks in the course that belong to that category. It is hypothesized that this approach measures the presence of each UDL principle in a valid way.
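As an illustration of this counting approach, the sketch below computes the nine-dimensional score vector of a course from its selected building blocks; it reuses the hypothetical BuildingBlock dataclass from the sketch in section 4.1 and is not the tool's actual implementation.

from typing import Dict, List

def score_course(course_blocks: List[BuildingBlock]) -> Dict[int, int]:
    """Count, per UDL principle (1-9), how many selected building blocks contribute to it."""
    score = {principle: 0 for principle in range(1, 10)}
    for block in course_blocks:
        for principle in block.principles:
            score[principle] += 1
    return score

def covered_principles(score: Dict[int, int]) -> int:
    """Number of UDL principles covered by at least one building block (maximum 9)."""
    return sum(1 for count in score.values() if count > 0)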

4.3.2 Feasibility. Feasibility hinges on whether the user believes the evaluation they get in return is worth their time and money. To achieve feasibility, steps have been taken to reduce the usage cost. Firstly, the UDL Assessment Tool bears no financial costs for the user. Secondly, the user interaction is inspired by online shopping websites in order to improve the intuitiveness of use: users can add building blocks to their course similar to adding products to an online shopping cart. (See appendix B.) By diminishing the costs for the user in this way, it is hypothesized that the feasibility of using the UDL Assessment Tool is sufficient.

4.3.3 Catalytic Effect. Feedback after a summative assessment is the responsibility of the assessor. Hence, the UDL Assessment Tool is designed to only play a supporting role in causing a catalytic effect through summative assessment. The tool contains an evaluation screen with statistics that give insight into how well a course scores on the UDL principles. The assessor has to interpret these results and can then choose whether they will pass on feedback to the student. Only when the assessor decides to do so can a catalytic effect be caused.

Figure 3: Evaluation of an assessment.

4.3.4 Acceptability. The UDL Assessment Tool aims for acceptability of the results through transparency. Users can see which principles a building block belongs to, and they can see the effect of adding or removing building blocks on the overall score of the course.

4.4 Formative Assessment

The development of the tool for formative assessment purposes was carried out under guidance of the design principles in table 1. Some of the design principles, such as timely feedback and sharing of information, are contained within the tool explicitly: feedback is updated for each change the user makes to their course, and the user can share their knowledge about UDL and expand the database through uploading their own building blocks. The implementation of the assessment validity, feasibility, and acceptability criteria is described in section 4.3. The other principles are contained within the tool in a more implicit manner, and whether the tool abides by these principles sufficiently hinges completely on the user experience. This subsection explicates the steps taken during development to include the aforementioned principles; the success of their inclusion is discussed in section 6.

4.4.1 Educational Effect. The building block database contains many concrete examples of things that can be used in a course. It is hypothesized that users will see the value of this database and browse it to gain inspiration while designing their course.

4.4.2 Catalytic Effect. To make a catalytic effect likely to occur, the feedback on the evaluation screen is designed to motivate the user to cover all nine UDL principles. This is done through showing the user a progress bar, which gives them insight into how many principles they still have to cover before their course covers all UDL principles.


Figure 4: Progress bar to encourage user.

4.4.3 Authenticity. To make the tool as authentic as possible, each building block in the database is a practice used in education. Additionally, because users can contribute to the database by proposing their own building blocks, over time the authenticity of the tool will increase.

4.4.4 Clear Rubrics. As can be seen from figure 1, the UDL Assessment Tool is separated from the score rubric in the assessment process (step 3: Comparison to score rubric). This design decision was made to generalize the tool so that it could be used in different UDL-focussed courses, as rubrics may differ per course. However, the tool is constructed around the nine UDL design principles, which are used as criteria to measure course quality with respect to UDL. In other words, a user will receive feedback on how many of the UDL principles are included in their course. To make the presence of the UDL design principles as a measure more prominent, two design decisions were made: firstly, it is possible for users to filter the database on each principle (figure 5); secondly, each building block in the database is color-coded per category (figure 5, figure 6), and labeled with its corresponding UDL principle.

Figure 5: Filter and color-coding to make UDL principles prominent.

4.4.5 Monitoring of Progress. Progress during the creation of a course can be monitored by the user through the use of the evaluation screen. (See figure 3.) Each time a user edits their course by selecting or removing building blocks, changes to the course are saved and will be displayed when the user returns to the website. The tool does not contain features that allow a teacher or coach to see an overview of the user progress. However, the tool makes it possible for the user to download score graphs in the evaluation screen, and can thus be used to enhance documentation reports.

Figure 6: The colours of the three UDL principle categories as established by [CAST 2012].

4.4.6 Multiple Roles. The UDL Assessment Tool is designed with general purpose in mind. The aim during the design of the tool was to allow for the following ways in which the tool can be used for formative assessment:

(1) Assessment for ideation. The building block database can be browsed during the ideation phase of a design process, to gain inspiration and an insight into the balance of a course before building it.

(2) Self evaluation. The tool can be used on its own as self-assessment tool, allowing for the user to retrieve feedback without input from others.

(3) Peer to peer feedback. The tool can also be used in peer to peer sessions. A user can assess a course from another user and give them feedback in person.

(4) Assessment as checklist. The tool is available at all times and can be used as a quick checklist to assess the constructed course before it is handed in or shared for public use.

4.4.7 Multiple Solutions. A course can achieve a maximum score of 9 out of 9 UDL principles. For every UDL principle, there are multiple building blocks in the database. This means that users can select different building blocks and design their own course and still obtain a maximum score.

4.4.8 Engagement and Support. The UDL Assessment Tool is flexible in that it can be used in more than one way and at the pace of the user. It is assumed that this flexibility will motivate users to work with the tool.

5 METHOD

To test whether the UDL Assessment Tool successfully incorporates the design principles described in the previous sections, three moments of evaluation were organized:

(1) Usage experiment for summative assessment
(2) Evaluation survey for summative assessment
(3) Evaluation survey for formative assessment

The usage experiment for summative assessment has been conducted with three assessors, to test for equivalence and consistency. The former is tested through comparing the assessment results of the assessors among each other, while the latter is tested through introducing two evaluation moments and comparing the assessment results produced by one assessor at the two different moments. During the experiment, the assessors each used the UDL Assessment Tool to evaluate the same three courses on two different occasions.


First, the assessors were introduced to the usage of the UDL Assessment Tool. After a demonstration of the tool's core functionality, the assessors were asked to evaluate each of the three courses individually for the first time. The assessment outcome for each course was documented. Before the second evaluation moment took place, a 24-hour interval was introduced in which the assessors were not allowed to use the tool nor discuss tool results with the other assessors. During the second evaluation moment, the assessors carried out the evaluations of the same three courses for a second time.

The same three assessors were asked to answer the evaluation survey for summative assessment after the usage experiment had been concluded. Considering the size of this test group, results of this usage experiment and survey will only be illustrative.

The evaluation survey for formative assessment has been conducted with a group of 22 student teachers from the Dutch university of applied sciences NHL Stenden. The group consists of 10 bachelor students and 12 master students. Each of the participants follows a course in which they need to design an online course that adheres to the principles of UDL. The participants were asked to use the UDL Assessment Tool for formative self-evaluation. After evaluating the design of their online course, participants were asked to fill out a survey.

Table 2 shows which principles are tested by which evaluations. Both surveys in this experiment make use of 5-point Likert scale questions. The survey questions, and the corresponding design principles for which they test, can be found in appendices C and D.

Table 2: All assessment design principles and the way they are evaluated.

Criterion                           Evaluation moment
1. Assessment Validity              2, 3
2. Feasibility                      2, 3
3. Educational Effect               3
4. Catalytic Effect                 3
5. Acceptability                    2, 3
6. Authenticity                     3
7. Engagement and Support           3
8. Sharing of Information           n/a
9. Timely Feedback                  n/a
10. Clear Rubrics                   3
11. Monitoring of Progress          3
12. Multiple Roles                  3
13. Multiple Solutions              3
14. Reproducibility or Consistency  1
15. Equivalence                     1

As can be seen from table 2, the catalytic effect is only measured in the context of formative use of the tool. This is the result of time constraints: testing for a catalytic effect would require students to resubmit their course after having received the result of a summative assessment carried out with the use of the UDL Assessment Tool. Since there will be no second summative assessment opportunity for participating test subjects before the publication of this paper, measuring the catalytic effect of the tool in the context of summative assessment remains a topic for future research.

6 RESULTS

6.1 Usage experiment for summative assessment

For the usage experiment, three assessors have assessed three courses at two points in time. This resulted in a total of 18 assessment scores, which can be found in appendix E. This data has been collected to determine how well the UDL Assessment Tool scores on consistency and equivalence.

6.1.1 Consistency. Consistency is defined as the achievement of similar assessment results when an assessor carries out the assessment multiple times. To calculate the assessment consistency, the first assessment of a specific case by a specific assessor is compared to the second assessment of the same case by the same assessor. The tool returns the assessment score on the nine UDL principles as a 9-dimensional vector. The first assessment of a specific case can be interpreted as a gold standard: to achieve the highest consistency, the aim is to get a score for the second assessment moment that is as close as possible to the first assessment moment. Therefore, to determine consistency, the Mean Absolute Deviation (MAD) of the second assessment score is calculated with respect to the score for the first assessment. (See table 3.)

Table 3: Consistency score for summative assessment. MAD = Mean Absolute Deviation, PDE = Principle Detection Error

Case                MAD     PDE
Assessor 1 Case 1   0.6667  1
Assessor 1 Case 2   0.444   0
Assessor 1 Case 3   0.778   0
Assessor 2 Case 1   0.556   1
Assessor 2 Case 2   1.000   2
Assessor 2 Case 3   0.889   0
Assessor 3 Case 1   0.444   0
Assessor 3 Case 2   0.556   0
Assessor 3 Case 3   1.222   0
Total               0.728   0.444

The results show that the MAD for assessment consistency is on average 0.728. This means that, for each design principle, the assessor on average deviated 0.728 building blocks from their previous assessment. The principle detection error is calculated by counting how often the detection of a UDL principle was inconsistent over the two assessment moments. In this experiment, the average principle detection error is 0.444. This means that on average, when an assessor re-assessed a course, the number of UDL principles detected in the course changed by 0.444. However, results from the evaluation survey for summative assessment (section 6.2) indicate that these inconsistencies in score are likely caused by human error, rather than by the tool itself.
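The consistency metrics can be reproduced with a short script; the sketch below is a plain reading of the definitions above (MAD between the two 9-dimensional score vectors, and PDE as the number of principles whose detection flips between the two moments), not the analysis code used for the paper.

from typing import Sequence

def mad(first: Sequence[int], second: Sequence[int]) -> float:
    """Mean Absolute Deviation of the second assessment with respect to the first."""
    assert len(first) == len(second) == 9  # one score per UDL principle
    return sum(abs(a - b) for a, b in zip(first, second)) / 9

def principle_detection_error(first: Sequence[int], second: Sequence[int]) -> int:
    """How many principles were detected (score > 0) in one assessment but not the other."""
    return sum(1 for a, b in zip(first, second) if (a > 0) != (b > 0))

# Example: assessor 1, case 1 (scores taken from Table 7 in appendix E).
first  = [3, 4, 3, 0, 3, 1, 2, 0, 3]
second = [4, 3, 4, 0, 2, 0, 1, 0, 3]
print(mad(first, second))                        # 0.666..., reported as 0.6667 in Table 3
print(principle_detection_error(first, second))  # 1 (principle 6 was detected only the first time)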

6.1.2 Equivalence. To calculate the equivalence score, a similar approach is used as for the consistency score. For each assessor, the scores from the two assessment moments were averaged. As a result, each assessor is associated with three scores, one for each case. For each case, the MAD was calculated with respect to the mean of the three scores associated with that specific case. The principle detection error was calculated by counting the occasions in which there was a disagreement on the presence of a specific UDL design principle. If the only disagreement comes from an assessor who was not consistent in their detection of a principle (PDE as described for consistency), then the PDE for equivalence is scored 0.5 for that principle.

Table 4: Equivalence score for summative assessment. MAD = Mean Absolute Deviation, PDE = Principle Detection Error

Case    MAD    PDE
Case 1  0.383  1
Case 2  0.556  2.5
Case 3  0.481  0
Total   0.473  1.167

Table 4 shows that the MAD for equivalence is on average 0.473 for this experiment. This means that an assessor on average deviated 0.473 building blocks per principle from the average of the assessments carried out by themselves and their colleagues. The average PDE for equivalence is 1.167. This means that on average, when the three assessors assessed one case with the UDL Assessment Tool, they disagreed on the presence of 1.167 principles per case.
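A direct reading of the equivalence procedure is sketched below for the MAD part (the 0.5-rule for the PDE is omitted for brevity); it assumes the per-assessor score vectors have already been averaged over the two assessment moments and is illustrative only.

from typing import List, Sequence

def equivalence_mad(assessor_scores: List[Sequence[float]]) -> float:
    """MAD of each assessor's averaged score vector against the mean vector over all assessors.

    assessor_scores: one 9-dimensional score vector per assessor for a single case,
    each already averaged over the two assessment moments.
    """
    n_assessors = len(assessor_scores)
    mean_vector = [sum(scores[p] for scores in assessor_scores) / n_assessors for p in range(9)]
    deviations = [abs(scores[p] - mean_vector[p])
                  for scores in assessor_scores for p in range(9)]
    return sum(deviations) / len(deviations)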

6.2 Evaluation survey for summative assessment

The assessors, too, were asked whether use of the UDL Assessment Tool was feasible, and whether its assessment scores were valid and acceptable. All three assessors agreed strongly that the tool produced valid assessments. One assessor agreed that using the tool was feasible, and two agreed strongly that this was the case. Lastly, one assessor agreed that the assessment scores produced by the tool were acceptable, and two agreed strongly with this sentiment. In addition, after the experiment took place, the assessors noticed that the scores they received from the UDL Assessment Tool were similar:

"The results generally coincide, this goes for the consecutive assessments taken by one assessor as well as when comparing assessment scores produced by multiple assessors. As a result, we could quickly concur and agree on the final grade for these students."

Although assessors generally agreed on the building blocks that were present in or absent from the courses, sometimes there was a substantial disagreement. However, assessors saw this as something positive:

"This instigated a discussion based on the selected building blocks. We could discuss concretely on whether an element really was in a course or not. While discussing our findings, it also became clear how the dataset could be expanded and improved."

Both of these sentiments can be found in the consistency and equivalence scores of the usage experiment: a course is scored on nine principles, and the equivalence and consistency errors indicate that, per course, only around one disagreement occurred on whether a certain principle was included.

6.3 Evaluation survey for formative assessment

The UDL Assessment Tool will be used for the assessment of students that follow one of two UDL courses: one given at bachelor level and one at master level. Last year, 22 master students and 15 bachelor students handed in a course that had to be assessed on UDL design principles. With the assumption that the population size is the same this year, this means that the sample for this experiment covers 54% of the master students as intended users, and 75% of the bachelor students as intended users.

Table 5 shows the responses of the participants, for questions that correspond to a subset of the design criteria for formative assessment.

Table 5: Survey results for formative assessment. SA = Strongly Agree, A = Agree, U = Undecided, D = Disagree, SD = Strongly Disagree

Criterion                   SA  A   U  D  SD
1. Assessment Validity       7  15  0  0  0
2. Feasibility               7  13  2  0  0
3. Educational Effect        9  13  0  0  0
4. Catalytic Effect          8  11  2  1  0
5. Acceptability            11  11  0  0  0
6. Authenticity              4  13  5  0  0
7. Engagement and Support    4  12  5  1  0
10. Clear Rubrics            7   9  6  0  0
11. Monitoring of Progress   3  15  3  1  0
13. Multiple Solutions       1  14  6  1  0

Figures 7 and 8 show the distribution of responses to survey questions for master students and bachelor students respectively. Although not all intended users have responded to the survey, it can already be concluded that at least half of the master students agree or strongly agree that the UDL Assessment Tool is valid, feasible, acceptable, holds educational value, and can be used to track progress.

For the bachelor students, at least half of the entire group finds the UDL Assessment Tool valid, feasible, acceptable, and believes the tool has an educational and catalytic effect.

Overall, participants report positively on all of the questions regarding the inclusion of the design criteria, with only one negative response each for criteria 4, 7, 11 and 13. Calculating the median and mode yields the results in table 6. For this test sample, the median and mode for all criteria are at least 4 out of 5 ('Agree') on the Likert scale.
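The medians and modes in table 6 follow directly from the response counts in table 5; a minimal sketch of that calculation, mapping the Likert categories to 1-5, is shown below purely as a worked example and is not the analysis code used for the paper.

from statistics import median, multimode

LIKERT = {"SD": 1, "D": 2, "U": 3, "A": 4, "SA": 5}

def median_and_mode(counts: dict) -> tuple:
    """Expand per-category response counts into individual responses; return (median, mode(s))."""
    responses = [LIKERT[cat] for cat, n in counts.items() for _ in range(n)]
    return median(responses), multimode(responses)

# Criterion 5 (Acceptability) from Table 5: 11 Strongly Agree, 11 Agree.
print(median_and_mode({"SA": 11, "A": 11, "U": 0, "D": 0, "SD": 0}))
# (4.5, [5, 4]) -> reported in Table 6 as median "SA - A" and mode "SA, A"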

To test whether the UDL Assessment Tool supported usability for multiple roles (criterion 12), participants were asked to imagine the ways in which they would use the UDL Assessment Tool. Fifteen students indicated they would use the tool for gaining inspiration, nineteen students would use the tool for intermediate self evaluation, ten students would use the tool as a last check before handing in their assignment, and six students would use the tool to ask peers for feedback.


Figure 7: Survey responses by master students on whether a criterion (C) from table 5 is included.

Figure 8: Survey responses by bachelor students on whether a criterion (C) from table 5 is included.

7 DISCUSSION

Taking into account the results from the previous section, it can be concluded that users generally experience that the UDL Assessment Tool adheres to the established design criteria. (See table 2.) According to the interviewed users, the information they gain (criteria 1, 5) is worth the effort they put into selecting or adding building blocks to create their course (criterion 2). Users reported they could use the tool to gain inspiration (criterion 3) because of the authentic examples offered (criterion 6), and to monitor their progress (criterion 11). Overall, users were motivated when using the tool (criterion 7), and were inspired to change things in their designed course after receiving the assessment (criterion 4). The users agreed that in their experience, it was clear how the UDL design principles were used as a rubric (criterion 10), and how they could score on these principles for their specific case (criterion 13). When asked how they would use the tool, they reported multiple forms of use (criterion 12). Assessors stated that the tool helped them conduct assessments in a sufficiently consistent and equivalent way (criteria 14, 15). Lastly, the tool provides feedback to the user in an interactive manner (criterion 9) and allows users to share information by allowing them to add new building blocks (criterion 8).

Table 6: Median and mode of survey results for formative assessment.

Criterion                   Median  Mode
1. Assessment Validity      A       A
2. Feasibility              A       A
3. Educational Effect       A       A
4. Catalytic Effect         A       A
5. Acceptability            SA - A  SA, A
6. Authenticity             A       A
7. Engagement and Support   A       A
10. Clear Rubrics           A       A
11. Monitoring of Progress  A       A
13. Multiple Solutions      A       A

7.1 Limitations and Future Research

Although users were generally satisfied with the quality of the UDL Assessment Tool, a few limitations were encountered.

7.1.1 Incomplete Database. At the beginning of the experiment, it was assumed that the incompleteness of the initial set of building blocks entered into the database would pose no issues. This assumption was made because users had the opportunity to report any building blocks they used that were not in the database; however, during the formative assessment sessions, users struggled with the fact that the database was still incomplete. The building blocks available for certain UDL principles gave some users a wrong idea about the scope of those principles, which made them hesitant to suggest new building blocks for these principles. It is hypothesized that by having UDL experts introduce more versatile building blocks to the database, users who have less knowledge of UDL will feel more confident in suggesting additional building blocks for these principles.

7.1.2 Improved Assessment Validity. It was hypothesized that counting the number of building blocks that contribute to a specific UDL design principle would return a valid and acceptable assessment score. Results seem to confirm this hypothesis, but some participants noted that, intuitively, certain building blocks seemed more important to include than others. A possible approach to account for this intuition could be to introduce weighted scores, and give moderators the option to assign a higher score to building blocks deemed more important than the rest. Research is necessary to determine the best approach to decide the weight of contribution towards UDL design principles for each building block.
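To make the weighted-score suggestion concrete, the sketch below shows one possible variant of the counting approach from section 4.3.1, with a moderator-assigned weight per building block; the weights and the idea of summing them are illustrative assumptions, not part of the current tool.

from typing import Dict, List, Set, Tuple

# Each entry: (set of UDL principles the block contributes to, moderator-assigned weight).
WeightedBlock = Tuple[Set[int], float]

def weighted_score(course_blocks: List[WeightedBlock]) -> Dict[int, float]:
    """Sum moderator-assigned weights per UDL principle instead of counting blocks."""
    score = {principle: 0.0 for principle in range(1, 10)}
    for principles, weight in course_blocks:
        for principle in principles:
            score[principle] += weight
    return score

# Hypothetical example: a heavily weighted block and a lightly weighted one.
print(weighted_score([({1, 3}, 2.0), ({3}, 0.5)]))  # principle 1 scores 2.0, principle 3 scores 2.5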

7.1.3 The Tool is Domain-Specific. The UDL Assessment Tool is designed around design principles that are specifically tailored towards assessment. From a scientific perspective, this means that to translate the approach of using a UGC database to other domains where such an approach is viable, we expect that certain adjustments need to be made. While designing the UDL Assessment Tool, there was the luxury that the user base would be motivated to contribute new content to the database, because doing so is a necessary step towards receiving the assessment. In other words, users of the UDL Assessment Tool are by definition motivated to gain and share knowledge, as this is exactly what happens when an assessment is carried out. As described in section 2.3, this sharing of knowledge is one of the motivators for creating UGC. In other domains, such an inherent motivation may be absent, and extra steps may need to be taken to ensure that users will be motivated to create UGC.

7.1.4 Moderation. Currently, the UDL Assessment Tool uses moderation to verify whether building blocks or labels are of a high enough quality to be included in the database. The literature in section 2.3 does, however, warn about large quantities of sub-par UGC. As a result, moderators might experience a work overload. While this might not particularly affect the UDL Assessment Tool, because it caters to a restricted user group, research in this area is worthwhile when translating the approach to other domains that target a broader user group.

8 CONCLUSION

This paper stated that the goal when developing the UDL Assessment Tool was twofold: the tool had to support the growth of its information database over time, and the tool should be designed to yield good quality assessments. The former objective has not yet been tested at the time of writing this paper, but preliminary response from users shows that the option to add up-to-date content is valued. It is, however, too early to tell whether this functionality really causes the database of information to stay up to date with quality information; observing the expansion of the database content and the relation this has to user satisfaction and assessment quality is a subject for future research. For the latter objective, test results were collected in both summative and formative assessment contexts. According to users that participated in testing the UDL Assessment Tool, the tool adheres sufficiently to all the design principles defined in section 4.2. Although assessments carried out with the tool were not perfectly consistent and equivalent, assessors reported that the disagreements that occurred were focused: it helped them to ask critical questions on why the disagreements occurred, resulting in better quality assessments. In conclusion, this paper is a first step towards a solution for dealing with tasks that require users to stay up-to-date with a possibly infinite set of creative options. This paper specifically looked at the domain of UDL assessment and emerging practices in education, but it seems worthwhile to attempt to scale this approach to different domains in the future.

REFERENCES

Terry Anderson. 2016. Theories for learning with emerging technologies. Emergence and Innovation in Digital Learning: Foundations and Applications (2016), 35–50.

Zwelijongile Gaylard Baleni. 2015. Online formative assessment in higher education: Its pros and cons. Electronic Journal of e-Learning 13, 4 (2015), 228–236.

CAST. 2012. www.cast.org.

Terry Daugherty, Matthew S Eastin, and Laura Bright. 2008. Exploring consumer motivations for creating user-generated content. Journal of Interactive Advertising 8, 2 (2008), 16–25.

Tereza Dean, Anita Lee-Post, and Holly Hapke. 2017. Universal design for learning in teaching large lecture classes. Journal of Marketing Education 39, 1 (2017), 5–16.

Jens Dolin, Paul Black, Wynne Harlen, and Andree Tiberghien. 2018. Exploring relations between formative and summative assessment. In Transforming Assessment. Springer, 53–80.

Jonathan D Kibble. 2017. Best practices in summative assessment.

Thomas Ludwig, Christian Reuter, and Volkmar Pipek. 2015. Social haystack: Dynamic quality assessment of citizen-generated content during emergencies. ACM Transactions on Computer-Human Interaction (TOCHI) 22, 4 (2015), 17.

Janne Tapani Matikainen et al. 2015. Motivations for content generation in social media. Participations: Journal of Audience and Reception Studies (2015).

Anne Meyer, David Howard Rose, and David T Gordon. 2014. Universal Design for Learning: Theory and Practice. CAST Professional Publishing.

John Norcini, Brownell Anderson, Valdes Bollela, Vanessa Burch, Manuel João Costa, Robbert Duvivier, Robert Galbraith, Richard Hays, Athol Kent, Vanessa Perrott, et al. 2011. Criteria for good assessment: consensus statement and recommendations from the Ottawa 2010 Conference. Medical Teacher 33, 3 (2011), 206–214.

Kavita Rao and Grace Meo. 2016. Using universal design for learning to design standards-based lessons. SAGE Open 6, 4 (2016), 2158244016680688.

Katerina Riviou, Georgios Kouroupetroglou, and Alan Bruce. 2014. UDLnet: A framework for addressing learner variability. In Proceedings of the International Conference on Universal Learning Design, Paris, Vol. 4. 83–93.

Katerina Riviou, Georgios Kouroupetroglou, Nikolaos Oikonomidis, and Ellinogermaniki Agogi. 2015. A network of peers and practices for addressing Learner Variability: UDLnet. In AAATE Conf. 32–39.

J Michael Spector, Dirk Ifenthaler, Demetrios Sampson, Joy Lan Yang, Evode Mukama, Amali Warusavitarana, Kulari Lokuge Dona, Koos Eichhorn, Andrew Fluck, Ronghuai Huang, et al. 2016. Technology enhanced formative assessment for 21st century learning. (2016).

James Surowiecki. 2005. The Wisdom of Crowds. Anchor.

George Veletsianos. 2016. The defining characteristics of emerging technologies and emerging practices in digital education. Emergence and Innovation in Digital Learning: Foundations and Applications (2016), 3–16.


A QUICK REFERENCE CARD


B SCREENSHOTS UDL ASSESSMENT TOOL


C EVALUATION SURVEY FOR FORMATIVE ASSESSMENT

(1) I am a:
• Master Student • Minor Student

(2) The course I will evaluate is in the following design phase:
• Ideation • Creation • Evaluation

(3) The UDL tool measures in a valid way how well my course scores on the nine principles of UDL. (Assessment Validity)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(4) The time I put into working with the UDL Assessment Tool was worth what I got in return. (E.g. feedback, inspiration, etc.) (Feasibility)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(5) The UDL Assessment Tool inspires me and gives me new ideas to work with. (Educational Effect)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(6) Based on feedback received from the UDL Assessment Tool, I am inclined to make adjustments to my course. (Catalytic Effect)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(7) The feedback received from the UDL Assessment Tool would be valuable material for my report. (Acceptability)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(8) I recognize the building blocks in the database as daily practices in education. (Authenticity)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(9) Working with the UDL Assessment Tool motivates me. (Engagement and Support)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(10) The nine principles of UDL can be recognized within the UDL Assessment Tool as test standards. (Clear Rubrics)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(11) The UDL Assessment Tool is useful for monitoring and documenting my progress. (Monitoring of Progress)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(12) The UDL Assessment Tool sufficiently suits and supports the problem context of my course. (Multiple Solutions)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(13) For which formative assessment purposes would you use the UDL Assessment Tool? (Multiple Roles)
• To gain inspiration and ideas prior to development • To ask feedback to peers • For intermediate self evaluation • As a last check before I hand in my assignment • Other, specifically: ...

D EVALUATION SURVEY FOR SUMMATIVE ASSESSMENT

(1) The UDL tool measures in a valid way how well the course scores on the nine principles of UDL. (Assessment Validity)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(2) The time I put into working with the UDL Assessment Tool was worth what I got in return. (E.g. feedback quality, insight into UDL, etc.) (Feasibility)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree

(3) The feedback provided by the UDL Tool reflects well the UDL quality of the course. (Acceptability)
• Strongly Agree • Agree • Undecided • Disagree • Strongly Disagree


E USAGE EXPERIMENT RESULTS

Table 7: Assessment scores on each of the 9 UDL Principles (P) of assessor 1.

Case and assessment moment  P1 P2 P3 P4 P5 P6 P7 P8 P9
Case 1 assessment 1          3  4  3  0  3  1  2  0  3
Case 1 assessment 2          4  3  4  0  2  0  1  0  3
Case 2 assessment 1          4  5  3  1  5  1  3  1  3
Case 2 assessment 2          6  4  3  1  5  1  3  2  3
Case 3 assessment 1          4  4  2  1  6  1  4  1  3
Case 3 assessment 2          3  3  4  1  4  1  3  1  3

Table 8: Assessment scores on each of the 9 UDL Principles (P) of assessor 2.

Case and assessment moment  P1 P2 P3 P4 P5 P6 P7 P8 P9
Case 1 assessment 1          2  2  2  0  2  1  2  0  3
Case 1 assessment 2          3  3  2  0  3  1  2  1  4
Case 2 assessment 1          4  6  6  0  4  2  4  1  4
Case 2 assessment 2          3  4  5  1  4  2  5  0  2
Case 3 assessment 1          3  2  4  1  3  1  3  1  4
Case 3 assessment 2          3  4  6  1  2  1  4  1  2

Table 9: Assessment scores on each of the 9 UDL Principles (P) of assessor 3.

Case and assessment moment  P1 P2 P3 P4 P5 P6 P7 P8 P9
Case 1 assessment 1          3  2  5  0  2  1  4  0  2
Case 1 assessment 2          3  2  3  0  3  1  3  0  2
Case 2 assessment 1          4  6  6  0  6  0  6  1  3
Case 2 assessment 2          4  5  5  0  4  0  6  1  2
Case 3 assessment 1          3  4  5  2  4  3  4  1  2
Case 3 assessment 2          3  4  3  1  8  1  6  1  2
