Automatic formation of satisfactory study groups


Bachelor Informatica


Brian Groskamp

June 16, 2020

Informatica, Universiteit van Amsterdam


Abstract

Group-based learning has been an important part of education for decades due to the academic benefits associated with teamwork and the lower overhead for teachers compared with individual learning. Considerable research indicates that forming satisfactory study groups is a difficult challenge. Current group formation techniques are either survey-based or random-based. Survey-based techniques require manual labor by both the teaching team and the students, while the random-based ones rarely yield satisfactory study groups. We investigate a hybrid approach, where we leverage historical data to automate the group formation process. We align existing definitions for satisfactory study groups with this hybrid approach and introduce a Group Composition Score (GCS). The GCS factors in individual student performance and peer familiarity. We use the GCS to formulate a fitness function when encapsulating the group formation problem as a genetic algorithm. Our framework is successful at optimizing GCS values within reasonable time. Although the exponential function for peer familiarity delivered promising results, no positive relation could be found between GCS values and student grades. More research is needed to find the familiarity function which would render GCS a good predictor of group performance.


  2.2 Genetic Algorithms
    2.2.1 Initial Population
    2.2.2 Fitness Function
    2.2.3 Selection
    2.2.4 Crossover
    2.2.5 Mutation

3 Related work
  3.1 Student Attributes
    3.1.1 Knowledge
    3.1.2 Personality Traits
    3.1.3 Social Interaction
    3.1.4 Team Role
  3.2 Techniques
    3.2.1 Problem Encapsulation
    3.2.2 Student Representation
    3.2.3 Fitness Function

4 Defining Satisfactory Study Groups
  4.1 Current Satisfactory Study Group Definitions
  4.2 Goodness of Heterogeneity
  4.3 Group Composition Score

5 Forming Satisfactory Study Groups
  5.1 Input Data
  5.2 Genetic Algorithm
    5.2.1 Initialization of Solution Population
    5.2.2 Solution Evaluation
    5.2.3 Termination Condition
    5.2.4 Selection, Crossover and Mutation
  5.3 Satisfactory Indicator

6 Experimental setup
  6.1 Data
    6.1.1 Grades
    6.1.2 Study Groups
  6.2 Proof of Concept
    6.2.1 Selecting an appropriate Genetic Algorithm
    6.2.2 Parameter tuning
  6.3 Experiments
    6.3.1 Effectiveness of the Framework
  6.4 Runtime Performance of the Framework
  6.5 Comparison with GH

7 Results
  7.1 Effectiveness of the Framework
  7.2 Runtime Performance of the Framework
  7.3 Comparison with GH

8 Discussion
  8.1 Important factors when creating satisfactory study groups
  8.2 Accuracy in real-world satisfactory study group formation
  8.3 Devising a System for Real-world Satisfactory Study Group Formation
  8.4 Threats to Validity
  8.5 Ethical Aspects
  8.6 Future work
    8.6.1 Evaluate the Student Satisfaction of GCS
    8.6.2 Re-evaluate the Framework in different Setting
    8.6.3 Relationship between Peer Familiarity and Group Performance
    8.6.4 Dynamic Satisfactory Indicator
    8.6.5 Hyperparameter optimization


CHAPTER 1 Introduction

Group-based learning has become increasingly prevalent in higher education in the last decades, a change stimulated by students, companies and educational institutions. Group work aids students by improving their self-confidence and teaches them highly valued competences such as communication, problem-solving, leadership and self-management. Prospective employers rely more and more on teamwork among their employees to increase productivity, so students who are skilled at teamwork stand out in the job market. In addition to increasing the employability of students, group-based learning also lowers the workload of academic instructors, making this form of teaching beneficial for educational institutions as well [7, 45]. The rising use of educational technology has made collaborative learning easier than ever by giving students simpler ways to communicate with each other and by making it possible to work together remotely on the same assignment [13], a development that has further helped spread the use of group assignments.

Previous studies and surveys have shown that students do see clear benefits associated with this type of assessment. However, many students also report having had negative experiences with group-based learning. The composition of the group is a significant factor in student satisfaction: both the best and the worst aspects of group work reported by students relate to the group composition itself and not to the concept of group-based learning. Many students report that they appreciate collaborating on a shared goal with students who hold different views. A vast majority of students report that workload-related aspects, such as an uneven distribution of the work, difficulties aligning the agendas of all members and differences in work ethic, are the primary reasons for disliking group work [4, 5, 33].

1.1 Problem Statement

Although several methods for group formation have been studied in the past and numerous important attributes have been identified, the majority of existing research has been focused on a theoretical approach, heavily based on psychology literature in a search for the best possible study groups. Many of the attributes considered by these studies, such as calendars, personality traits, preferred learning styles and social skills, rely on the results of students sharing more data and filling in surveys and personality tests. While positive results have been achieved using these methods, this kind of data is generally not available in real-world Educational Technology (EdTech) systems and extra effort from both the students and the instructors would therefore be required before groups could be made.

As a result, these methods are generally not used in education. Most educational institutions and EdTech platforms currently use one of three possible strategies for group formation: random assignment, self-selection by the students or selection by the academic instructor. Each strategy has its own noteworthy drawbacks. Random group formation has a very low chance of resulting in well-balanced groups. Self-selection can discriminate against less socially connected students and can also lead to groups that perform well socially but not academically. Selection by the academic instructor is a very labor-intensive process due to the vast number of possible groupings, making it very costly and still highly unlikely to result in optimal or even good groupings [14].

The need for a simpler, automated process that sacrifices some of the quality of the groups for practicability has already been acknowledged in earlier research. Small-scale experiments have been conducted where groups for a single course were formed based on just the student grades for prerequisite courses [3, 46]. While these studies required fewer student attributes, they did rely on a priori knowledge of which previous courses would be good indicators of performance for the course for which groups should be formed.

1.2 Research Question

A generic approach that is applicable to any course and does not require information that is not already available is needed. We therefore extend earlier work that used a genetic algorithm and a proposed group-quality metric to form study groups with the constraint that only student data which is already readily available in real-world educational technology systems can be used. In doing so, we attempt to answer the following research question:

"How can we devise a system that takes in a set of real-world data about students and outputs satisfactory study groups?"

To answer this research question, we answer the following sub questions:

(RQ1): What defines a satisfactory study group?

(RQ2): Which factors are important in the creation of satisfactory study groups?

(RQ3): What are the limits of the accuracy of a system that uses real-world available data to form satisfactory study groups?

1.3 Thesis Outline

We start in chapter 2 with some background information on the problem of automatic group formation and genetic algorithms, the type of heuristic algorithm used to answer our research question. In the next chapter, chapter 3, we discuss the findings of related studies into group formation and previous approaches to automatic group formation. Then, in chapter 4, we formulate a definition of a satisfactory study group based on the literature about group processes. A design for a framework to form satisfactory study groups is discussed in chapter 5, after which we detail our implementation of this framework in chapter 6. The results of evaluating the performance and accuracy of this implementation are given in chapter 7. In chapter 8, the threats to validity, ethical implications and suggestions for future work are also discussed. In chapter 9, we end this thesis with a short summary and a conclusion.


CHAPTER 2 Theoretical background

Previous research studies on group formation have already shown that it is a difficult problem to solve. The number of attributes that have been identified in related research studies is vast, and even for a small course with only a couple dozen students we quickly hit a computational barrier because the number of possible grouping combinations would be too large to fully explore. Simply evaluating all possible options in order to find an optimal or even just near-optimal approach is therefore not feasible. A heuristic approach is thus needed to find suitable solutions.

This chapter details why the search space of group formation is this large and explains genetic algorithms, the type of heuristics used in this study.

2.1 Automatic Group Formation

Collaborative learning in the form of assessed group assignments has been widely used in education since the last century because of its significant positive impact on students, both on an academic and on a social level. Earlier research has shown that cooperative learning, when implemented correctly, can help students achieve better academically, can improve the relations between students and leads to numerous other positive outcomes such as improved self-esteem among students [21, 41, 43].

Although group studying generally shows a positive impact, the composition of the group has proven to be of significant importance for both the end result and the student satisfaction. Several important factors were deemed to be influential on these criteria, such as pairing students before forming groups and the heterogeneity of groups [27, 34].

The rising use of computers and the internet in education has greatly simplified collaborative learning, and online group assessments are now commonplace in higher education [13]. This shift to online education also provides academic instructors with new possibilities. Whereas study groups previously had to be formed manually by either the instructor or the students themselves, computers now make it possible to use student data for a data-driven and automated approach to forming groups.

While computers have made the process drastically easier and faster, group formation is still a difficult problem due to its vast search space. In a class of n students, the number of possible groups of size g that can be formed is equal to

$$\binom{n}{g} = \frac{n!}{g! \cdot (n - g)!}$$

For reasonably sized classes, the total number of possible groups is still small enough to be enumerated on any standard household computer. For example, a class of 50 students has 2,118,760 distinct possible groups of 5 students, a search space small enough to be evaluated in full. However, the problem arises when the best set of groups needs to be found. The total number of possible groupings is equal to


$$\frac{\left(\frac{n!}{g! \cdot (n - g)!}\right)!}{\left(\frac{n}{g}\right)! \cdot \left(\frac{n!}{g! \cdot (n - g)!} - \frac{n}{g}\right)!}$$

For the same class of 50 students with 10 groups of 5 students each, this results in roughly $5.02 \times 10^{56}$ possible groupings, a search space far too large for any computer to fully explore.
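The two counts in this section can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and `possible_groupings` simply evaluates the formula as given in the text, that is, the number of ways to pick $n/g$ groups out of all possible groups of size $g$.

```python
from math import comb


def possible_groups(n: int, g: int) -> int:
    """Number of distinct groups of size g that can be drawn from n students."""
    return comb(n, g)


def possible_groupings(n: int, g: int) -> int:
    """Number of ways to pick n/g groups out of all possible groups of size g,
    following the formula given in the text."""
    if n % g != 0:
        raise ValueError("this sketch assumes n is divisible by g")
    return comb(comb(n, g), n // g)


print(possible_groups(50, 5))              # 2118760
print(f"{possible_groupings(50, 5):.2e}")  # 5.02e+56
```

Python integers have arbitrary precision, so the second count is computed exactly and only rounded when it is printed.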

Heuristics are thus required in order to search for suitable solutions. Although a 2013 study has shown that existing heuristic grouping algorithms are unlikely to find optimal or near-optimal solutions, these heuristic algorithms are still capable of finding better than random groupings [14].

2.2 Genetic Algorithms

Genetic Algorithms (GAs) are a type of heuristic algorithm inspired by the mechanics of natural selection and natural genetics, developed by John H. Holland in the 1970s [17]. They aim to mimic biological evolution by constantly picking the best options from a set of possible solutions. GAs have already been proven to be a successful and popular type of optimization algorithm in other complex problems with large search spaces such as image segmentation [29] and route planning [18]. Several previous studies on automatic group formation have already attained positive results using genetic algorithms as well [28]. The five important phases that make up GAs are outlined in this section.

2.2.1 Initial Population

The process starts with a set of random solutions, called a population. Each solution in this population is called a chromosome. A chromosome consists of an ordered set of genes, which represent variables in a solution. A subset of the chromosome consisting of multiple genes is called an allele. A variable to control in this phase is the population size, i.e. the number of random solutions to start with. A larger population introduces more variety to the gene pool, but also needs more computations per round of evolution since more chromosomes need to be evaluated.

2.2.2 Fitness Function

In order for the GA to be able to find a solution to a problem, it must know what a good solution looks like. A fitness function is used for this. This function determines the fitness of a chromosome, i.e. it scores how well a given solution solves the problem, and is unique to the problem that needs to be solved. The optimal solution to the problem has the highest possible fitness value. Not unlike biological evolution, the survival chance of a chromosome is positively related to its fitness, and fitter chromosomes have a larger chance of reproducing and surviving.

2.2.3 Selection

In the selection phase, pairs of individuals are picked as parents so they can pass on their genes to the next generation. Fitter chromosomes have a larger chance of being selected for reproduction. Several algorithms can be used for selection, such as Roulette Wheel Selection or Tournament Selection [50]. The choice of selection algorithm heavily influences the offspring. Pairing only the best chromosomes with each other leads to faster convergence towards a local maximum, whereas pairing good chromosomes with bad ones creates more diversity and slows down convergence, which decreases the chance of getting stuck in a suboptimal local maximum.
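As an illustration of the two selection algorithms named above, the following minimal Python sketch shows one common form of each. The function names, the tournament size `k` and the `rng` parameter are assumptions made for this example and are not taken from [50].

```python
import random


def tournament_selection(population, fitness, k=3, rng=random):
    """Draw k chromosomes at random and return the fittest of them.

    A larger k increases selection pressure (assumed default: k=3).
    """
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)


def roulette_wheel_selection(population, fitness, rng=random):
    """Return one chromosome, chosen with probability proportional to fitness.

    Assumes all fitness values are non-negative and at least one is positive.
    """
    weights = [fitness(chromosome) for chromosome in population]
    return rng.choices(population, weights=weights, k=1)[0]
```

Calling either function twice yields a pair of parents. Tournament selection only compares fitness values, whereas roulette wheel selection depends on their magnitudes, so the two can behave quite differently on the same population.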

2.2.4 Crossover

During the crossover phase, one or more crossover points are selected at random for a pair of parents. Numerous crossover algorithms exist, such as Partially Mapped Crossover, Order Crossover and Position Based Crossover, but all of them take a pair of parents and use the genes of the parents to create offspring: two new chromosomes that share genes with both parents [37]. The offspring is then added to the population. One can experiment with both the crossover algorithm and the crossover rate, i.e. the chance that two chromosomes create offspring. A lower crossover rate reduces the speed at which the population gets replaced by offspring, while a higher crossover rate increases it.
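To make the idea concrete, the sketch below shows a simplified variant of Order Crossover for chromosomes that are permutations, such as an ordering of students. It is an assumption-laden illustration and not the exact operator from [37]: the function name is hypothetical, genes are assumed to be unique and hashable, and the positions outside the copied segment are filled from left to right.

```python
import random


def order_crossover(parent_a, parent_b, rng=random):
    """Create one child from two permutation chromosomes.

    A random segment is copied from parent_a; the remaining positions are
    filled with the missing genes in the order they appear in parent_b.
    """
    size = len(parent_a)
    # Two distinct cut points, so the copied segment is never empty.
    start, end = sorted(rng.sample(range(size + 1), 2))
    segment = parent_a[start:end]
    kept = set(segment)
    # Genes of parent_b that are not already in the copied segment.
    remaining = iter(gene for gene in parent_b if gene not in kept)
    child = []
    for position in range(size):
        if start <= position < end:
            child.append(parent_a[position])
        else:
            child.append(next(remaining))
    return child


parent_a = [1, 2, 3, 4, 5, 6]
parent_b = [6, 5, 4, 3, 2, 1]
offspring = (order_crossover(parent_a, parent_b),
             order_crossover(parent_b, parent_a))
```

Swapping the roles of the parents, as in the last lines, gives the two offspring described above. Every child is again a valid permutation, which matters when each gene must appear exactly once.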

2.2.5 Mutation

To maintain diversity in the gene pool, the mutation phase randomly changes genes in the offspring. The mutation rate, i.e. the chance that a gene in the offspring changes to a random other value, can be experimented with. A lower mutation rate makes the algorithm converge faster towards a local maximum fitness and terminates the simulation earlier. A higher mutation rate forces the algorithm to explore new directions, which can drastically slow down the process but reduces the risk of terminating too early in a lower local maximum.
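For chromosomes in which every gene must appear exactly once, changing a gene to an arbitrary value would produce an invalid solution. One common workaround, assumed here purely for illustration, is swap mutation: with probability `mutation_rate`, a gene trades places with a randomly chosen gene. The function name and parameters are hypothetical.

```python
import random


def swap_mutation(chromosome, mutation_rate, rng=random):
    """Return a mutated copy in which each gene is, with probability
    mutation_rate, swapped with a gene at a random position."""
    mutated = list(chromosome)
    for i in range(len(mutated)):
        if rng.random() < mutation_rate:
            j = rng.randrange(len(mutated))
            mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```

With a mutation rate of 0 the chromosome is returned unchanged; with higher rates more positions are shuffled, which corresponds to the broader but slower exploration described above.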


CHAPTER 3 Related work

Group formation (GF) has proven to be one of the most challenging tasks in collaborative learning and various research studies have been conducted on this problem [28]. This chapter discusses some of the attributes and techniques used in group formation in previous studies and relates these to our new work.

3.1 Student Attributes

A variety of student attributes has been used in previous studies as factors in group formation. The attributes can be divided into four main categories: knowledge, personality traits, social interaction and team role [28]. A majority of related research studies has focused on forming the best groups theoretically possible and has therefore often used attributes from two or more of these categories. Although this might indeed result in good groupings, the majority of these attributes are not readily available to a teaching team or EdTech system during the group formation process. To comply with our restriction of only using real-world data, we omit most of the factors that have been deemed important and focus only on two types of attributes that are readily available: historic grades and group compositions.

Examples of the categories of student attributes identified in previous studies are given in the following subsections.

3.1.1 Knowledge

Knowledge, especially in the form of previously achieved grades or a GPA, has proven to be both a popular and an effective attribute in previous studies. By forming groups with heterogeneous knowledge and skill levels, low and high performing students can be distributed over all groups, making them more balanced and leading to better results [3, 46]. The knowledge of a student can either be estimated by an instructor with the use of tools like a Likert scale [11, 16] or be derived from previously attained grades or a GPA [3, 14, 46].

3.1.2 Personality Traits

Personality traits or learning styles are important indicators of how a student prefers to study and how he or she wishes to execute assignments. Various models of learning styles have been in use, such as Kolb's Learning Style Inventory (LSI), the Herrmann Whole Brain Model (HBDI) and the Myers-Briggs Type Indicator (MBTI), all of which rely on the student filling in a questionnaire [6].

3.1.3 Social Interaction

Social interaction refers to the social skills involved in group projects, such as participation, social grounding and conversation skills. These attributes significantly influence how group members interact with each other and how ideas and knowledge are shared and discussed. This sharing and debating of information is crucial in forming a shared understanding about concepts and tasks and thus has a major role in aligning team members to achieve the same outcome [24, 44].

3.1.4 Team Role

A team role is the way a person is supposed to behave, contribute and interrelate with other team members throughout collaborative team work. Several team role models have been developed, with the one proposed by Belbin being the most widespread and accepted model. All models indicate a positive relationship between team performance and balance level among the roles of its members [7, 49].

3.2 Techniques

Previous studies have acknowledged the large search space that comes with the problem of group formation and while some studies have explored other techniques like K-means [2] or a greedy algorithm [1], a majority of the literature uses a form of heuristic algorithm. Genetic algorithms have especially been prevalent and have shown good results in smaller scale experiments [3, 46]. Besides these positive results, a heuristic algorithm like GA is flexible in use since it does not require any model to be trained, making it relatively easy to experiment with or to later extend our approach.

Because of the importance of this specific type of algorithm for this thesis, this section briefly details how previous research studies have implemented GAs in the group formation process.

3.2.1 Problem Encapsulation

In order to represent group compositions in a genetic algorithm, the groups and students need to be encoded into a chromosome. Previous literature has demonstrated two ways: every gene can represent a student and the position of the gene within the chromosome represents the group [3], or every gene represents a group and the position of the gene represents the student [46]. In both cases, the length of the chromosome is equal to the number of students being considered in the grouping process.
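The two encodings can be illustrated with a short Python sketch that decodes each kind of chromosome back into groups. The function names and the use of integer student indices and group labels are assumptions made for this example; the cited studies may represent genes differently.

```python
def decode_positional(chromosome, group_size):
    """Genes are students; the position of a gene determines its group
    (the first encoding described above)."""
    return [chromosome[i:i + group_size]
            for i in range(0, len(chromosome), group_size)]


def decode_labelled(chromosome):
    """Genes are group labels; the position of a gene identifies the student
    (the second encoding described above)."""
    groups = {}
    for student_index, group_label in enumerate(chromosome):
        groups.setdefault(group_label, []).append(student_index)
    return groups


# Six students (0..5) in two groups of three, encoded both ways.
print(decode_positional([3, 0, 5, 1, 4, 2], group_size=3))
# [[3, 0, 5], [1, 4, 2]]
print(decode_labelled([0, 1, 1, 0, 1, 0]))
# {0: [0, 3, 5], 1: [1, 2, 4]}
```

Both chromosomes have length six and describe the same grouping. In the positional encoding every chromosome is a permutation and group sizes are fixed by construction, while the labelled encoding has to enforce group sizes separately.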

3.2.2 Student Representation

A majority of previous studies on group formation, both those using genetic algorithms and the ones using other algorithms, represent a student $S$ either as a scalar value representing one attribute or as a multi-attribute vector of the student's attributes $S = \{A_1, A_2, \ldots, A_{i-2}, A_{i-1}, A_i\}$, where $i$ is the total number of attributes representing a student [3, 11, 14, 46]. The number and type of attributes used varies per study.

3.2.3 Fitness Function

To evaluate the quality of formed groups in any kind of heuristic, a fitness function is required. The fitness functions in the literature are all based on the same principle: qualities within a group should be heterogeneous, but the overall quality of groups should be homogeneous. The exact implementation of this idea differs per study. The calculation of a fitness per group ranges from something as simple as a mean of the scalar attributes representing each member [3] or the Euclidean distance between student attribute vectors [46] to a newly introduced metric for good group heterogeneity [11].

Another simple and interesting approach is to first classify the members of a group as either poor, moderate or good students. The number of students belonging to every category can then be counted and the fitness can be calculated based on the ratio between these three types of students [3].


CHAPTER 4 Defining Satisfactory Study Groups

Over the past five decades, Collaborative Learning (CL) has emerged as a predominant way of teaching and assessing students in all levels of education, making it a fertile area of study. A large number of research studies executed by numerous different researchers from different backgrounds has validated the benefits associated with CL across different settings and countries, with research participants varying widely as to cultural background, economic class, education level and gender. Despite this large body of research validating the positive effects of a collaborative approach to learning, no clear consensus has been reached about the definition of CL. There is a wide variety of uses of the term inside each academic field and even academics in the same field might not agree on a shared interpretation. On top of that, many teachers hold their own unique view about how CL should be organized, making it a difficult term to define [10, 21, 25].

In spite of this lack of a universally agreed upon definition, the overwhelmingly positive impact CL has on students and their work has led to a wide range of research studies about teamwork in education. These studies have had various angles of approach, with some focusing on students' perception of face-to-face group work and their perceived strategies for improving group work [4, 27, 33], while others focused on finding variables that contribute to positive student attitudes towards teamwork in general [5, 36] or on variables contributing to positive academic end results [27, 48]. This diversity in approaches has led to numerous attributes being identified as important for the performance or satisfaction of a group. However, many of these variables rely on data acquired through separate tests and surveys and using them would violate the constraint of our research question.

In this chapter we first discuss several existing definitions from related studies into automatic group formation. After this, we go into more detail about the method we have chosen. Finally, we detail our extension to this approach.

4.1 Current Satisfactory Study Group Definitions

While an exact and universally agreed upon definition of a satisfactory study group does not exist, earlier studies on automatic group formation have had to form their own definitions. These definitions differ in the number and the type of attributes used. In Table 4.1 we present the approaches of five previous studies. We then explain which approach we decided to extend, and why.

| Study | Experiment Size | Accounts for Outliers | Attributes Used | Domain |
|---|---|---|---|---|
| Graf and Bekele [11] | 512 students | Yes | Numerous, with values from 1 to 3 | Continuous |
| Zhamri Che Ani et al. [3] | 35 students | No | One prerequisite grade | Discrete |
| Sukstrienwong [46] | 48 students | No | Two prerequisite grades | Continuous |
| Henry [14] | 25 students | No | Previous grades, academic interests, availability, peer preferences | Continuous |
| Tang and Chen [47] | 1000 simulated students | No | Knowledge, interests and skills inferred from analyzing browsing behavior | Discrete |

Table 4.1: The experiment size and attributes used in previous research studies on automatic group formation


After reviewing the above studies we opted to use the Goodness of Heterogeneity (GH) developed by Graf and Bekele [11] as a foundation for our research. This decision is based on the following reasons:

• It uses a vector representation of student attributes, which simplifies extensions and makes the approach more versatile than, for example, the scalar representations used by Zhamri Che Ani et al. [3].

• While all experiments have shown at least moderately successful results, GH is the only one to be tested on a sizeable set of real student data.

• It is the only definition that accounts for outliers in a way that is suggested by psychology literature [42].

4.2 Goodness of Heterogeneity

GH is developed based on the suggestion by Slavin [42] that in a reasonably heterogeneous group, most group members should score roughly halfway between the best and the worst scoring member of a group. This would imply that a group has one low performer, one high performer and other than that mostly average performing students. To calculate the GH, we start by first calculating the Average Distance (AD) using Equation 4.1.

$$AD = \frac{\max(S_1, S_2, \ldots, S_i) + \min(S_1, S_2, \ldots, S_i)}{2} \tag{4.1}$$

where $S_i$ represents the score of student $i$. This AD value is then used to define the Goodness of Heterogeneity using Equation 4.2.

$$GH = \frac{\max(S_1, S_2, \ldots, S_i) - \min(S_1, S_2, \ldots, S_i)}{1 + \sum_j |AD - S_j|} \tag{4.2}$$

It is trivial to show that for completely homogeneous groups this results in a GH of 0. For GH < 1, heterogeneity is unreasonable and student scores are at the two extremes, meaning that there are no average students. A group with good heterogeneity should have GH > 1, and the higher the GH, the better the heterogeneity.
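Equations 4.1 and 4.2 can be sketched in Python as follows. The text above does not state which students the index $j$ ranges over. This sketch assumes that the sum runs over the group members other than one highest and one lowest scorer. If the sum included those two members, they alone would contribute max − min to the sum, so GH would always stay below 1 and the threshold GH > 1 could never be met. The function name is hypothetical.

```python
def goodness_of_heterogeneity(scores):
    """Compute GH for one group from the scores of its members.

    Assumption: the sum in the denominator skips one highest and one
    lowest scorer and covers only the remaining members.
    """
    if not scores:
        raise ValueError("a group needs at least one member")
    ordered = sorted(scores)
    lowest, highest = ordered[0], ordered[-1]
    average_distance = (highest + lowest) / 2      # Equation 4.1
    middle = ordered[1:-1]                         # members besides the extremes
    deviation = sum(abs(average_distance - s) for s in middle)
    return (highest - lowest) / (1 + deviation)    # Equation 4.2


print(goodness_of_heterogeneity([2, 2, 2]))     # 0.0, homogeneous group
print(goodness_of_heterogeneity([1, 2, 3]))     # 2.0, one low, one average, one high
print(goodness_of_heterogeneity([1, 1, 3, 3]))  # below 1, scores at the two extremes
```

The three example groups reproduce the three cases described above: a GH of 0 for a homogeneous group, a GH above 1 when the remaining member sits halfway between the extremes, and a GH below 1 when there are no average students.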

4.3 Group Composition Score

Several studies have shown that familiarity among team members reduces uncertainty. Friends or people who have worked together before know what behavior to expect from each other and are more likely to hold positive attitudes towards their team members and to group work in general. It is also suggested that greater familiarity among team members leads to more peer pressure and more pro-social behavior, factors that are positively related to both group performance and satisfaction [15, 27, 40].

This increase in peer pressure is especially important in larger groups. It has been shown that social loafing, the reduced effort of an individual in a group setting compared with an individual setting, increases proportionally to group size [22, 26]. When aiming for better group performance and student satisfaction, the importance of familiarity among group members would thus increase with the number of individuals in a group.

GCS extends GH, which only considers the heterogeneity of individual student attributes, by also considering these inter-student relations. We introduce a peerFamiliarityBonus that increases the GH based on the number of inter-student relations in a group, as can be seen in Equation 4.3.

$$GCS = GH + \mathit{peerFamiliarityBonus}(\mathit{familiarityScore}) \tag{4.3}$$

where familiarityScore is the number of pairs in a group that have collaborated on an assignment in the past and peerFamiliarityBonus is a function returning a value based on this score.


The literature contains no quantification of the relationship between peer familiarity and group performance, and it is thus unknown what type of function peerFamiliarityBonus should be. We thus propose three types of relationships based on the literature: exponential, additive and multiplicative.

The exponential relationship is suggested based on the theory that the negative effects of social loafing increase progressively with the size of a group. An exponential relationship would minimally affect the GH for smaller groups where this is not a necessity, but would yield increasingly high bonuses for larger groups with a large number of existing inter-student relations.

The additive relationship is suggested based on the same idea: a small bonus would be awarded to the GH for every connection between peers in a group. This would still minimally affect smaller groups, but would linearly add up for larger groups. This would make it a progressively more important factor for larger groups as well.

Finally, to cover the possibility that peer familiarity is a more significant influence than GH for any size of group, a multiplicative relationship is proposed. Here the GH score would be multiplied by the number of peer-connections in a group, which would make peer familiarity the primary component of the GCS.
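The three proposed relationships can be sketched in Python as shown below. The exact functional forms and constants are not specified in this chapter, so everything beyond the general shape is an assumption made for illustration: the function names, the exponential `base`, the additive `weight` per connection, and the representation of past collaborations as a set of `frozenset` pairs.

```python
from itertools import combinations


def familiarity_score(group, past_pairs):
    """Number of pairs in the group that have collaborated before.

    past_pairs is assumed to be a set of frozensets of two student IDs.
    """
    return sum(1 for a, b in combinations(group, 2)
               if frozenset((a, b)) in past_pairs)


def group_composition_score(gh, familiarity, relationship="exponential",
                            base=1.1, weight=0.1):
    """Combine GH with a peer familiarity bonus (illustrative forms only)."""
    if relationship == "exponential":
        # No bonus without connections, growing bonus with each connection.
        return gh + base ** familiarity - 1
    if relationship == "additive":
        # A fixed small bonus for every connection.
        return gh + weight * familiarity
    if relationship == "multiplicative":
        # GH multiplied by the number of connections, as described above.
        return gh * familiarity
    raise ValueError(f"unknown relationship: {relationship}")


past = {frozenset(("ann", "bob")), frozenset(("bob", "cas"))}
score = familiarity_score(["ann", "bob", "cas"], past)  # two known pairs
print(group_composition_score(2.0, score, "additive", weight=0.5))  # 3.0
```

Note that under this multiplicative form a group without any prior connection scores 0 regardless of its GH, which reflects the statement that peer familiarity becomes the primary component of the GCS.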


CHAPTER 5 Forming Satisfactory Study Groups

We design a framework to automatically form satisfactory study groups using the Group Composition Score (GCS) detailed in the previous chapter. The framework takes historic grades and group compositions as student performance and peer familiarity attributes, feeds these to a genetic algorithm and outputs new study groups. A genetic algorithm is used because of the size of the search space and the good results attained with this type of algorithm in previous studies. The framework can be divided into seven parts, as can be seen in Figure 5.1. This chapter details all parts in the order in which they are executed.

[Flowchart with the following labeled parts. Input: student performance attributes, peer familiarity attributes. Genetic algorithm: initialize solution population, evaluate solutions, satisfactory indicator, termination condition reached? (yes/no), selection, crossover, mutation. Output: study groups.]

Figure 5.1: Design of the framework that uses student performance and peer familiarity attributes to form satisfactory study groups

5.1 Input Data

To calculate the score of a given group, two data sets are required as input: student performance attributes and peer familiarity attributes. The set of student performance attributes should contain one or more features for every student that relate to his or her academic performance, either on a course or on a global level. The set of peer familiarity attributes should be related to how much interaction any given pair of students has had. Examples of this could be the number of previous shared (online) group assignments or known friendships between students.

5.2 Genetic Algorithm

A genetic algorithm is used to optimize the score of all groups that need to be formed. Positive results have been achieved using genetic algorithms for group formation in previous studies. On top of that, using a heuristic benefits the versatility of the framework: it can be dynamically adjusted and/or extended for every run without the overhead that would occur with, for example, strategies that require a model to be trained. The genetic algorithm consists of several steps that are outlined in this section.

5.2.1 Initialization of Solution Population

As is required in genetic algorithms, a set of random chromosomes representing solutions should be created at the initialization of the process. Solutions to our group formation process must therefore be encoded into chromosomes. For a set of n students, denoted as $S = \{s_1, s_2, \ldots, s_{n-2}, s_{n-1}, s_n\}$, a chromosome of length n is used in which every gene represents a student, as can be seen in Figure 5.2. Students are represented by the vector $S_i = (\mathit{StudentID}_i, \mathit{SPA}_i)$, where $\mathit{StudentID}_i$ is the unique identifier for student i and $\mathit{SPA}_i$ is the vector of student performance attributes of student i.

[Figure 5.2: a row of n cells containing S_1, S_2, S_3, …, S_{n-2}, S_{n-1}, S_n, with the indices 1, 2, 3, …, n-2, n-1, n underneath.]

Figure 5.2: Structure of a chromosome. The value in the cell represents the student. The number underneath the cell is the index within the chromosome.

Groups are represented by the alleles in the chromosome. For groups of size r, every allele of r genes in the chromosome represents a single group. Suppose that for a class of nine students with three groups of three students each the groups are $G_1 = \{S_3, S_7, S_2\}$, $G_2 = \{S_5, S_1, S_4\}$ and $G_3 = \{S_6, S_9, S_8\}$. The chromosome would be encoded as $\mathit{chromosome}_x$ in Figure 5.3. For a set of groups with $G_1 = \{S_7, S_9, S_2\}$, $G_2 = \{S_4, S_1, S_5\}$ and $G_3 = \{S_8, S_3, S_6\}$, the chromosome would be structured as $\mathit{chromosome}_y$ in Figure 5.3.

Structure of chromosome_x:
Student:  3  7  2 | 5  1  4 | 6  9  8
Index:    1  2  3 | 4  5  6 | 7  8  9
          Group 1 | Group 2 | Group 3

Structure of chromosome_y:
Student:  7  9  2 | 4  1  5 | 8  3  6
Index:    1  2  3 | 4  5  6 | 7  8  9
          Group 1 | Group 2 | Group 3

Figure 5.3: Structures of chromosome_x and chromosome_y. The number in the cell represents the student; the number underneath the cell is the index within the chromosome.
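The encoding described above can be illustrated with a minimal Python sketch. It assumes that students are identified by plain integer IDs and that the number of students is an exact multiple of the group size; the function name `decode_groups` is hypothetical and not part of the thesis implementation.

```python
from typing import List, Sequence


def decode_groups(chromosome: Sequence[int], group_size: int) -> List[List[int]]:
    """Split a permutation of student IDs into consecutive groups of `group_size`."""
    if group_size <= 0 or len(chromosome) % group_size != 0:
        raise ValueError("chromosome length must be a positive multiple of group_size")
    # Every consecutive block of `group_size` genes forms one group.
    return [list(chromosome[i:i + group_size])
            for i in range(0, len(chromosome), group_size)]


# The two chromosomes of Figure 5.3, written as lists of student IDs.
chromosome_x = [3, 7, 2, 5, 1, 4, 6, 9, 8]
chromosome_y = [7, 9, 2, 4, 1, 5, 8, 3, 6]

print(decode_groups(chromosome_x, 3))  # [[3, 7, 2], [5, 1, 4], [6, 9, 8]]
print(decode_groups(chromosome_y, 3))  # [[7, 9, 2], [4, 1, 5], [8, 3, 6]]
```

Because the groups are read off by position, any permutation of the students is a valid solution, which is why the genetic operators only need to preserve the permutation property.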

5.2.2 Solution Evaluation

Every chromosome in the population consists of r groups and our goal is to balance these groups. We do so by evaluating the quality of each group $G_i$ using the satisfactory indicator explained later in this chapter. To form fair groups, the standard deviation of all group scores $GS_i$ is calculated with Equation 5.1.

$$\text{Standard Deviation} = \sqrt{\frac{\sum_{i} (GS_i - \mu)^2}{r}} \qquad (5.1)$$

where $GS_i$ is the calculated satisfactory score for group i, $\mu$ is the mean calculated satisfactory score for all groups and r is the number of groups in the chromosome. To balance the groups as much as possible, the standard deviation should be as low as possible and this is thus a minimization problem.
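As a minimal sketch of Equation 5.1, the fitness of a chromosome can be computed from its list of group scores. The function name `balance_fitness` and the example scores are hypothetical; in the framework the scores would come from the satisfactory indicator.

```python
import math
from typing import Sequence


def balance_fitness(group_scores: Sequence[float]) -> float:
    """Population standard deviation of the group scores (lower is better)."""
    r = len(group_scores)
    if r == 0:
        raise ValueError("at least one group score is required")
    mu = sum(group_scores) / r
    return math.sqrt(sum((gs - mu) ** 2 for gs in group_scores) / r)


# Hypothetical scores for three groups: mean 15, squared deviations 9, 25 and 64.
print(f"{balance_fitness([12.0, 10.0, 23.0]):.3f}")  # 5.715
# Perfectly balanced groups give the minimum fitness of zero.
print(balance_fitness([15.0, 15.0, 15.0]))           # 0.0
```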

5.2.3 Termination Condition

To halt the group formation process, a termination condition is needed. On triggering this condition, the process will halt and the pool of solutions will be frozen. In order to maximize control over solution quality and execution time, the termination condition will be triggered after a fixed number x of iterations. A larger x increases both the execution time and the solution quality. This relationship makes it possible to dynamically sacrifice quality for speed or vice versa. A possible additional use case would be to let a course instructor specify this balance.

5.2.4 Selection, Crossover and Mutation

If the termination condition has not been reached yet, the population should evolve into a next generation and new groups should be formed. The selection procedure selects pairs of chromosomes to create new combinations of study groups using a crossover operator. This should include a repair operator so that the resulting offspring will still be a valid permutation of all students. A mutation operator can then randomly swap two students within a chromosome.
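The following sketch illustrates one crossover operator with a built-in repair step, together with a swap mutation. It assumes the partially mapped crossover (PMX), which is a standard permutation-preserving operator; it is written from the textbook description and is not the code of the MOEA framework used later in this thesis. The function names and the fixed cut points are chosen for illustration only.

```python
import random
from typing import List, Sequence


def pmx_crossover(parent_a: Sequence[int], parent_b: Sequence[int],
                  lo: int, hi: int) -> List[int]:
    """Copy parent_a[lo:hi] into the child, take the rest from parent_b and
    repair duplicates by following the mapping defined by the copied segment."""
    child = list(parent_b)
    child[lo:hi] = parent_a[lo:hi]
    segment = set(parent_a[lo:hi])
    # Maps a gene copied from parent_a to the gene parent_b holds at that position.
    mapping = {parent_a[i]: parent_b[i] for i in range(lo, hi)}
    for i in list(range(lo)) + list(range(hi, len(child))):
        gene = parent_b[i]
        while gene in segment:  # repair step: resolve the duplicate
            gene = mapping[gene]
        child[i] = gene
    return child


def swap_mutation(chromosome: Sequence[int], rng: random.Random) -> List[int]:
    """Return a copy of the chromosome with two randomly chosen students swapped."""
    mutated = list(chromosome)
    i, j = rng.sample(range(len(mutated)), 2)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


parent_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent_b = [9, 3, 7, 8, 2, 6, 5, 1, 4]
child = pmx_crossover(parent_a, parent_b, lo=3, hi=7)
print(child)                      # [9, 3, 2, 4, 5, 6, 7, 1, 8]
print(sorted(child) == parent_a)  # True: every student still appears exactly once
print(swap_mutation(child, random.Random(0)))
```

In a full run the cut points would be drawn at random and both operators would be applied with a given probability, as discussed in the parameter tuning of the experimental setup.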

5.3 Satisfactory Indicator

The satisfactory indicator is a function that calculates a score for every study group based on the performance and peer familiarity attributes of the students. It returns integer or float values representing the quality of all study groups that make up a chromosome. These group scores will then be used by the evaluation function to establish the quality of a chromosome as a whole.

5.4 Output Data

On reaching the termination condition, the genetic algorithm is brought to a halt and the chromosome pool is frozen. The chromosome with the highest satisfactory score is selected and the groups encoded into this chromosome are returned as output. All other groups in the solution pool are discarded.


CHAPTER 6 Experimental setup

To test the accuracy of the framework design as proposed in the previous chapter, an experiment has been designed in which data from a real-world Educational Technology system is used. This chapter describes the source of the data used in our experiment and outlines all decisions that are made in the implementation of the framework.

6.1 Data

To satisfy our self-imposed restriction of only using real-world data, a data set from FeedbackFruits, an educational technology provider for tertiary education, is used to verify the results and performance of our framework. The data set consists of a large number of historic group assignments, graded group and individual assignments from a large number of mostly European educational institutions. The company provides a number of online tools aimed at supporting instructors with their courses that can be integrated in the Learning Management System (LMS) of an educational institution. These tools include both individual and group assignments that can be graded by either an instructor or peer student(s). The vast majority of their data is from peer feedback assignments where students graded other students.

FeedbackFruits operates on the European market and thus has to comply with European regulations for data storage and processing. These regulations mandate careful and informed processing of all personal data, which includes student names, contact information but also marks and progress reports. FeedbackFruits therefore only collects and stores the information that is critical for the performance of its applications and nothing more [8]. This has resulted in a data set that is large in size, but shallow in depth with a small number of attributes stored per student.

This data set does however contain the two factors needed to calculate Group Composition Scores: previously attained grades and historical study groups.

6.1.1 Grades

The data set includes student grades and progressions for both individual and group assignments, stored as percentages. Only grades that are reported back to the LMS are considered. For graded group assignments, an individual grade for each group member is stored. The calculated knowledge score of each student is the average recorded grade of the student in the data set. For students that have no previously recorded grades available, a mean grade of 50% is used as knowledge score. Since the vast majority of the data is related to peer feedback, most of the grades are awarded by students and not by instructors.
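The knowledge score described above amounts to an average with a fallback value. A minimal sketch, assuming grades are given as a list of percentages per student (the function and constant names are hypothetical):

```python
from typing import Sequence

DEFAULT_KNOWLEDGE_SCORE = 50.0  # used when a student has no recorded grades


def knowledge_score(grades: Sequence[float],
                    default: float = DEFAULT_KNOWLEDGE_SCORE) -> float:
    """Average recorded grade in percent, or the default for unknown students."""
    if not grades:
        return default
    return sum(grades) / len(grades)


print(knowledge_score([80.0, 60.0, 70.0]))  # 70.0
print(knowledge_score([]))                  # 50.0
```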

6.1.2 Study Groups

A large number of historic group assignments is available in the data set, for which not only the grade but also the group composition is known. The data set is limited to interactions that have taken place within the platform and we cannot know social relations outside of the Educational Technology platform. But, if two students have been part of the same study group in the past, we can reasonably make the assumption that they have interacted with each other before and that they know each other. By counting the number of pairs in a group that have worked with each other in the past, we can calculate a familiarity score.
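The pair counting described above can be sketched as follows. The sketch assumes that historic groups are available as lists of integer student IDs; `familiarity_score` is a hypothetical name, and a pair is counted once regardless of how often the two students worked together.

```python
from itertools import combinations
from typing import Iterable, Sequence, Set, Tuple


def familiarity_score(group: Sequence[int],
                      past_groups: Iterable[Sequence[int]]) -> int:
    """Count the pairs in `group` that shared at least one historic study group."""
    known_pairs: Set[Tuple[int, int]] = set()
    for past in past_groups:
        # Store every pair of a historic group in sorted order.
        known_pairs.update(combinations(sorted(set(past)), 2))
    return sum(1 for pair in combinations(sorted(set(group)), 2)
               if pair in known_pairs)


history = [[1, 2, 3], [4, 5]]
print(familiarity_score([1, 2, 4, 5], history))  # 2: the pairs (1, 2) and (4, 5)
print(familiarity_score([6, 7], history))        # 0: no shared history
```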

6.2 Proof of Concept

We use the well-established open source Java MOEA framework [31] to implement the GA. The MOEA framework provides fast implementations of an extensive set of multi-objective evolutionary algorithms and has been used in numerous other research studies [19, 38, 39], making it highly suitable to test our group formation framework. This section describes how we configure the MOEA framework to run our experiment.

6.2.1 Selecting an appropriate Genetic Algorithm

The MOEA framework provides a wide set of algorithms to be used in optimization, including state-of-the-art algorithms like NSGAIII. As the optimal algorithm depends on the specific problem to solve, three popular algorithms are evaluated: ϵ-MOEA [30], NSGAIII [20] and NSGAII [9].

In order to determine the best algorithm to solve our group formation problem, we define four different problems:

50 students, 5 groups: The formation of 5 groups of 10 students each.

50 students, 10 groups: The formation of 10 groups of 5 students each.

100 students, 5 groups: The formation of 5 groups of 20 students each.

100 students, 10 groups: The formation of 10 groups of 10 students each.

These problems are chosen in such a way that we can perform a rudimentary sensitivity analysis: a different number of students leads to a different length of the chromosome in which all students need to be stored and will increase the complexity of the problem. By varying the number of groups that need to be formed within that number of students, we can change the size of the search space of the problem. The more students and the more groups, the bigger the search space.

All three considered algorithms are executed on all four problems. All other input and parameters, such as the student data used as input and the GA population size of 100, were identical across experiments. The best performing algorithm is the one that is most effective at lowering the fitness value. To account for the stochastic nature of genetic algorithms, every combination of algorithm and problem has been run 100 times and the average fitness score has been recorded. The number of iterations has been limited to 7,000 since at this point the fitness is already converging towards a plateau (the difference in fitness values is ≈ 0.001). The results are shown in Figures 6.1, 6.2, 6.3 and 6.4.

Figure 6.1: eMOEA, NSGAII and NSGAIII fitness values over iterations for the group formation problem of dividing 50 students in 5 groups.


Figures 6.1, 6.2, 6.3 and 6.4 show that the convergence slows down when the number of students per group increases. What is also interesting to note is that NSGAII, the predecessor to NSGAIII, outperforms its successor on all problems. However, ϵ-MOEA outperforms both NSGAIII and NSGAII on all four problems. ϵ-MOEA is therefore used for all experiments from here on.

6.2.2 Parameter tuning

With ϵ-MOEA applied to our group formation problem, three parameters can be tuned: permutation rate (PMX), swap rate (SWX) and population size. Typically, one would use hyperparameter optimization to tune all three hyperparameters at a time. Unfortunately, despite our best efforts we could not manage to get this working in our experimental setup. We therefore opt to use a simplified approach to tuning these parameters.


The permutation crossover operator and the swap mutation operator provide us with the PMX rate and the SWX rate as parameters to tweak. Both variables represent the probability of their respective operator being applied to a chromosome. They influence the diversity and rate of convergence of the gene pool, and their optimal values depend on the type of problem to which they are applied. To find approximately good values for our group formation problem, a small experiment has been performed.

The common PMX rates $P_r = \{0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8, 0.9, 1.0\}$ and SWX rates $S_r = \{0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1\}$ are considered [35]. Due to the large number of possible combinations of these two sets, it is not feasible to brute-force all of them for all four problems. We therefore estimate that both the right PMX rate and the right SWX rate are somewhere in the middle of their respective sets. A small-scale experiment found that $P_r = 0.6$ and $S_r = 0.05$ worked best.

The next and last parameter to tune is the population size. Since a common population size is 100 [12, 23], we will consider 5 values with 100 as the center: 50, 75, 100, 125 and 150. To test these population sizes, we use the same four different problems as used for the algorithm selection.

Every combination of population size and problem is run 100 times and the average value is displayed in Figure 6.5. Although all population sizes perform almost equally well over a large number of iterations, we can see that a population size of 50 scores notably better for all four problems on a smaller number of iterations.

Figure 6.5: Five different population sizes are tested on four group formation problems

6.3 Experiments

The performance and accuracy of the framework are evaluated. The experiments are run using the proof of concept implementation described in the previous section. The effectiveness, runtime performance and accuracy compared with Goodness of Heterogeneity are measured. The experiments for these measurements are outlined in the following subsections.


6.3.1 Effectiveness of the Framework

To verify that the framework is indeed able to form balanced groups, we compare the framework against historic data. Different problems will be used than the ones in the last section. While these were valuable to study the algorithms, they do not accurately represent real study groups and are therefore not suitable to measure realistic effectiveness or accuracy.

For each study group size from 4 to 10 students, we first collect historic study group data from real assignments and group them by their number of team members. For every set of study groups of equal team size we then calculate the fitness score using the solution evaluation function from the design. These scores will form our reference to compare the framework with.

For every group size from 4 to 10 students, we now create two fake courses using data from real students: one small course and one large course. The small course has enough students for 10 study groups, and the large course has enough students for 20 study groups. So, for group size 4, the small course would have 40 students that get divided over 10 groups. The large course would have 80 students that get divided over 20 groups. For both the small and the large course, the framework will use 20,000 iterations to form the study groups. After this, the fitness of both courses will be calculated in order to compare it with the reference scores.

6.4 Runtime Performance of the Framework

The runtime of the framework is important for usability in real-world settings: if the forming of satisfactory study groups would take unreasonably long, the chance of it being used is low. We therefore conduct an experiment into the runtime of the problems from subsection 6.2.1. Every problem is executed with 1,000 to 20,000 iterations, in incremental steps of 1,000. Since the goal of this experiment is to measure the execution time and not the accuracy, we have chosen this larger number of iterations to get a better picture of the runtime of the problems.

6.5 Comparison with GH

To compare the accuracy of the Group Composition Score (GCS) with the Goodness of Heterogeneity (GH), we look for correlations between the two scores and achieved grades for past group assignments. Given that we do not know the type of relationship between peer familiarity and group performance, three GCS implementations are considered:

additive: peerFamiliarityBonus implementation awards a bonus equal to 10% of the starting GH value for every peer connection found.

exponential: peerFamiliarityBonus implementation multiplies the GH value by 1.02 for every peer connection found.

multiplicative: peerFamiliarityBonus implementation multiplies the GH value by the number of peer connections found.

The GH is implemented as described in Equation 4.2. The multiplicative factor 1.02 of the exponential implementation has been chosen because experimentation shows that it is the value which gives the best correlation coefficient with a p-value still below 0.05. Further exploration is done into possible multiplicative factors, and it is found that a plateau is reached with p < 0.2 for multiplicative factors ≥ 1.15, as can be seen in Figure 6.6 and Table 6.1.
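The three variants described above can be sketched as follows. The sketch assumes that the GH value of a group has already been computed (Equation 4.2 is not reproduced here) and that `connections` is the number of peer connections found in the group; the function names are hypothetical and do not come from the thesis implementation.

```python
def gcs_additive(gh: float, connections: int, bonus_rate: float = 0.10) -> float:
    """Add 10% of the starting GH value for every peer connection."""
    return gh + bonus_rate * gh * connections


def gcs_exponential(gh: float, connections: int, factor: float = 1.02) -> float:
    """Multiply the GH value by the factor once for every peer connection."""
    return gh * factor ** connections


def gcs_multiplicative(gh: float, connections: int) -> float:
    """Multiply the GH value by the number of peer connections."""
    return gh * connections


gh, connections = 2.0, 3  # hypothetical GH value and connection count
print(round(gcs_additive(gh, connections), 4))     # 2.6
print(round(gcs_exponential(gh, connections), 4))  # 2.1224
print(gcs_multiplicative(gh, connections))         # 6.0
```

Under this reading, the additive and exponential variants leave the GH value unchanged for a group without any peer connections, whereas the multiplicative variant maps such a group to a score of zero.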


Figure 6.6: Four different multiplicative factors for the exponential implementation of the Group Composition Score

From visually inspecting Figure 6.6 we can observe that the cluster of points shrinks with every increase in the multiplicative factor. Outliers also appear to be moving further away from the cluster and the y-axis. In Table 6.1 we see the Pearson correlation coefficients and p-values for the variants of the exponential GCS. The high p-values for the larger factors mean that these correlations are not statistically significant: the approach may be right, but the amount of data is insufficient to show it.

Multiplicative factor: 1.02, 1.15, 1.5, 2.0
Pearson Correlation Coefficient: -0.180, 0.02983, 0.02982
P-value: 2.26674e-15, 0.19131, 0.19150

Table 6.1: Pearson Correlation Coefficients and P-values for exponential GCS implementations with multiplicative factors 1.02, 1.15, 1.5 and 2.0 (values rounded to 5 decimals)

We expect grades and the degree of peer familiarity to follow reasonably normal distributions. To look for a correlation between the GH or the GCS implementations and attained grades we therefore use Pearson’s correlation coefficient, since it performs well on normally distributed data [32]. The coefficient is calculated for all four scores and the achieved grades. If GCS is an improvement over GH, the correlation coefficient should be higher.
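For reference, Pearson's correlation coefficient between a list of scores and a list of grades can be computed as in the sketch below. It is written directly from the textbook definition and is not taken from the thesis implementation; the function name and the example values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


scores = [1.0, 2.0, 3.0, 4.0]  # hypothetical GH or GCS values
grades = [8.0, 7.5, 7.0, 6.0]  # hypothetical grades that fall as the score rises
print(pearson_r(scores, grades) < 0)  # True: an inverse relationship
```

A coefficient close to +1 would indicate that higher scores go together with higher grades, which is the relationship the comparison is looking for.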


CHAPTER 7 Results

The performance and accuracy of the framework are evaluated using the experiments and implementation outlined in the previous chapter. The first section of this chapter contains the results of comparing the fitness scores achieved by the framework with the fitness scores of random, historic study groups. The second section displays the execution time of the framework for multiple problem sizes. The third and final section shows the results of comparing the accuracy of the Group Composition Score with Goodness of Heterogeneity.

7.1 Effectiveness of the Framework

Figure 7.1 compares the performance of the framework with random, historic data. The framework has been executed 100 times and the average value of all runs is displayed in the graph.

Figure 7.1: Group Composition Scores for several group sizes for both the framework and random historic groups.

It is clear that the fitness of the historic groups progressively climbs with the group size. The standard deviation of the GCS scores per group is thus increasing, meaning that the groups are less balanced. The framework performs notably better for both the smaller and the larger courses. Contrary to the historic groups, the framework actually leads to lower fitness scores for larger groups, meaning that the groups are better balanced. The framework is thus capable of substantially improving the fitness scores in the group formation process.

7.2 Runtime Performance of the Framework

Figure 7.2 shows the execution time of four problems as a function of the number of iterations. The timing experiment is run 100 times and the average execution time is plotted.

Figure 7.2: Execution time in seconds versus number of iterations for several problem sizes

Since we know that the time needed for a single iteration is related to the computational difficulty of the problem that is being solved, we can see the strong effect that both the number of students considered and the number of groups that need to be formed have on the complexity of the problem. The execution times of all four problems are almost perfectly linear, making them very predictable. The largest problem of 100 students and 5 groups takes less than 20 seconds for 20,000 iterations. This predictable yet fairly short execution time makes the framework very suitable for real-world applications.

7.3 Comparison with GH

A total of 2184 historic study groups are considered in the comparison of GCS with GH. The grades of these study groups are plotted versus the four different scores in Figure 7.3. The red data points represent scores of zero, meaning that these groups lacked the data required to calculate any score. For the calculation of the correlation coefficient, these data points are not considered.


Figure 7.3: GH scores versus achieved grades for historic group assignments. Red crosses indicate groups for which not enough data was present to calculate a score. Blue crosses represent groups that did have enough data to calculate a score.

All four plots have sizable clusters against the y-axis and against the line y = 10. While the spread of all three GCS graphs is bigger, no obvious relationship can be seen with the naked eye. Table 7.1 contains the Pearson Correlation Coefficient for all four graphs. All correlations are evaluated at a significance level of 0.05.

Score                   Pearson Correlation Coefficient
GH                      -0.341
GCS (exponential)       -0.180
GCS (additive)          -0.291
GCS (multiplicative)    -0.243

Table 7.1: Pearson Correlation Coefficients for GH, GCS (exponential), GCS (additive) and GCS (multiplicative)

A positive relationship between the scores and attained grades was to be expected, but all four correlations are negative. This implies that there are inverse relationships between grades and scores, and especially for GH the negative correlation is in fact moderately strong. Although all three GCS correlations are closer to a positive relationship than GH, they still have weak to moderate negative relations. Neither Goodness of Heterogeneity nor any of the three Group Composition Scores thus shows a positive relationship with grades on this data set.


CHAPTER 8 Discussion

While the effectiveness and execution time of the framework show good results, neither our newly introduced Group Composition Score nor the Goodness of Heterogeneity on which we build shows a positive relationship with group performance. However, future research would be needed before we can discard the framework and GCS and GH. In this chapter we discuss the results and the answers to our research questions and identify possible causes for our negative results. We also briefly discuss the ethics of using automated group formation methods and we propose some research directions for future work.

8.1 Important factors when creating satisfactory study groups

Numerous attributes important for group performance have been identified in chapter 3, of which many have had positive results in previous research studies. A majority of these factors, however, are not present in real-world settings. Using these would violate our constraint of exclusively using readily available data to increase ease of use and practicability. With this restriction in mind, a brief literature review led us to identify grade-wise heterogeneity among members of a group and familiarity among team members as important factors for team performance and student satisfaction.

8.2 Accuracy in real-world satisfactory study group formation

Looking at Figure 7.1 we see that the proposed framework is very effective at optimizing the fitness scores of study groups. Figure 7.2 shows us that this optimization can be achieved within seconds, making it highly applicable in real-world settings. However, when validating its predictive ability using historic group compositions and their achieved grades as we do in Figure 7.3, we have to conclude that neither Goodness of Heterogeneity nor our proposed extension Group Composition Score is accomplishing its goal of accurately predicting which groups will perform well.

The likely cause of the negative coefficients is the large clusters at the y-axes of all four plots in Figure 7.3. The vast majority of groups get relatively low scores but have attained high grades. A probable explanation for this lies in the data set used: most of the assignments in the data provided by FeedbackFruits consist of peer feedback, where students review and grade other students. More lenient grading by students could explain the higher than expected grades for most of the study groups.

Low scores could have two causes. First, the data set could contain a lot of new users of the FeedbackFruits platform, which results in a lack of the data that would be needed to calculate accurate scores. Second, not all groups are random: if a group of students works together on a regular basis, they receive the same grades as well and their GPAs would be similar or even equal. This would lead to low scores for all four methods, even if the students work together very well and only score perfect grades.


Although no positive relationship between the scores and achieved grades is found, the second goal of GCS is to improve student satisfaction. Unfortunately, we do not have access to a sufficiently large data set of group compositions and student satisfaction to verify whether GCS has any impact on this.

8.3 Devising a System for Real-world Satisfactory Study Group Formation

Group formation, be it manual or automatic, is a complicated problem. A wide range of possible attributes is proposed by psychology and computer science literature and many different techniques are utilized to use these attributes to form satisfactory study groups. While many of these tools show at least some positive results, the lack of a consensus on the meaning of a satisfactory study group has led to a variety of approaches. Not all these approaches have proven to be practical in real-world situations because they often require data about students which is not available in most situations.

We thus propose a new framework for automatic formation of satisfactory study groups. The framework is based on a Genetic Algorithm (GA) because of the flexibility that it gives us to dynamically change the target of the algorithm at every run without overhead caused by factors such as having to retrain a model. The framework takes in student performance and peer familiarity attributes and uses the GA to form and then output satisfactory study groups. In Figure 7.1 we can see that the framework is very effective at optimizing the fitness, especially on larger groups. In Figure 7.2 we see that even on a larger problem of 5 groups of 20 students, 20,000 iterations can be executed in less than 20 seconds, making the framework fast enough to be applicable in real-world settings.

8.4 Threats to Validity

This thesis builds upon earlier studies on the topic of automatic group formation. While numerous research studies have been conducted and have shown positive results, they have only validated their results on a very small scale. Groups have been formed for single projects in small classes, but no study evaluated the end results of the groups. The majority of these previous studies did not actually evaluate their method in the real world and only ran simulations with either real or fake student data. The primary way in which results are validated is by using the psychology literature. While this literature does provide a solid foundation, it cannot be ruled out that these previously researched methods would fail in real-world settings, in which case the foundation for this thesis would be weak.

As already mentioned in subsection 6.2.2, the parameters for the genetic algorithm should normally be chosen using methods like hyperparameter optimization. Unfortunately we were not capable of successfully applying this technique and we therefore had to opt for a considerably weaker approach to finding suitable values. Although our experiments show good results for the permutation rate, mutation rate and population size we settled on, these values might not be optimal or even near-optimal. It therefore cannot be ruled out that our lack of hyperparameter optimization has a negative effect on the outcomes of our framework.

Our choice for the Pearson correlation coefficient is based on the assumption that both grades and peer familiarity follow a normal distribution. However, while this assumption about peer familiarity might be true on a macro-level, students with very few or very many connections might not fit this assumption. If a large enough number of groups consists of such socially atypical students, the resulting outliers might distort the Pearson correlation coefficient, which is particularly sensitive to outliers.

8.5 Ethical Aspects

As a result of the increasing usage of group assignments in tertiary education, an increasingly large percentage of the final grade of any student is determined by group work. Since the literature suggests that group composition has a large effect on group performance and end result, the group formation process has indirectly become a strong influence on a student's final grade and GPA. The developed framework could thus become a factor in the grades of students. The concept of an algorithm having any influence on the academic achievements of a student might raise ethical concerns.

However, we deem the proposed framework morally responsible. The goal of the framework is to offer an improvement over the now commonly used strategy of randomly forming groups, aiming to improve both end result and student satisfaction. We do not believe that our framework could lead to worse-performing groups than random groups and it should thus not have a regressive effect on individual student grades.

8.6 Future work

8.6.1 Evaluate the Student Satisfaction of GCS

The correlation between group grades and GCS has been evaluated in chapter 7 using data from FeedbackFruits. This data set does however not contain any data related to how students experienced the group work. The second goal of GCS, improving student satisfaction, could therefore not be evaluated in this thesis. We propose a future study in which GCS is implemented in a real-world setting to form study groups, after which the students would receive a survey with questions about how they experienced working in their group.

8.6.2 Re-evaluate the Framework in a Different Setting

This thesis has used the peer feedback data from FeedbackFruits to evaluate the performance and accuracy of the proposed framework. Since both the newly introduced GCS and the already established GH lead to weak negative correlations, it is possible that the problem lies with the data and not with the method. We propose a new study wherein the framework will again be evaluated but against a different data set, such as one where students are only graded by their instructor.

8.6.3 Relationship between Peer Familiarity and Group Performance

Considerable research in psychology has shown that social loafing, the phenomenon where team members perform worse in a group than in an individual assignment, increases progressively with the group size. A quantifiable relationship has nevertheless not been identified. We suggest a study in which a larger set of data could be explored to find what type of quantifiable relationship exists between peer familiarity and grades or student satisfaction.

8.6.4 Dynamic Satisfactory Indicator

The design of the framework allows for the satisfactory indicator to be set dynamically for each run. This opens up the possibilities to move away from a one-size-fits-all approach of forming study groups to one where a course instructor could possibly even dynamically determine the attributes and technique used in the group formation. Allowing teachers to influence the algorithm would incorporate the domain-specific knowledge of an instructor about what factors are good predictors of group quality and success in the course. We propose a new study into an extended framework which can be easily adjusted by a teaching team to fit the needs of a specific course.

8.6.5 Hyperparameter Optimization

The crossover rate, mutation rate and population size used as parameters for the genetic algorithm of the proposed framework are chosen using a rudimentary method and are not likely to be optimal. The accuracy of the framework can possibly benefit from a more sophisticated method for tuning these parameters, such as hyperparameter optimization. We therefore propose a small study into optimizing the values of the proposed framework.
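Random search is one simple form of hyperparameter optimization that could serve as a baseline for such a study. The following is a minimal sketch under stated assumptions: the ranges in `SEARCH_SPACE` are illustrative and not taken from this thesis, `run_ga` is a hypothetical callable that runs the genetic algorithm once and returns the best fitness found, and `toy_run_ga` is a made-up stand-in so that the example runs on its own.

```python
import random
from typing import Callable, Dict, Tuple

# Illustrative ranges only; these are assumptions, not values from the thesis.
SEARCH_SPACE = {
    "crossover_rate": (0.5, 1.0),
    "mutation_rate": (0.001, 0.2),
    "population_size": (20, 200),
}


def sample_params(rng: random.Random) -> Dict[str, float]:
    """Draw one random configuration from SEARCH_SPACE."""
    return {
        "crossover_rate": rng.uniform(*SEARCH_SPACE["crossover_rate"]),
        "mutation_rate": rng.uniform(*SEARCH_SPACE["mutation_rate"]),
        "population_size": rng.randint(*SEARCH_SPACE["population_size"]),
    }


def random_search(
    run_ga: Callable[..., float],
    n_trials: int = 50,
    repeats: int = 3,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Return the best configuration found and its mean fitness."""
    rng = random.Random(seed)
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        # A genetic algorithm is stochastic, so average over several runs.
        scores = [
            run_ga(seed=rng.randrange(2**32), **params) for _ in range(repeats)
        ]
        score = sum(scores) / repeats
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_run_ga(crossover_rate, mutation_rate, population_size, seed):
    """Made-up stand-in for a real GA run; peaks at arbitrary values."""
    noise = random.Random(seed).gauss(0, 0.01)
    return (
        -(crossover_rate - 0.8) ** 2
        - (mutation_rate - 0.05) ** 2
        - ((population_size - 100) / 100) ** 2
        + noise
    )


best_params, best_score = random_search(toy_run_ga, n_trials=30)
```

In a real study `toy_run_ga` would be replaced by the framework's genetic algorithm, and the search should be validated on courses that were not used for tuning.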


CHAPTER 9 Conclusion

The importance of group-based learning in higher education has been steadily increasing in recent decades. Companies highly value collaboration skills, instructors experience less overhead and students report positive effects on personal strengths such as their self-confidence. The introduction of the internet in education has further facilitated a growth in the prevalence of group-based learning, boosted by the emergence of online educational technology.

Group composition has been shown to be a major influence on both group performance and individual student satisfaction. Substantial research has been conducted into the formation of study groups. Current group formation techniques are either survey-based or random-based. Survey-based grouping techniques require more work from both students and instructors, and random-based grouping techniques are highly unlikely to construct satisfactory study groups.

This thesis proposes and investigates a new hybrid approach whereby historical student data is utilized to automate the group formation process. The Goodness of Heterogeneity (GH) is extended and a Group Composition Score (GCS) is introduced to score study groups based on both performance attributes of individual students and peer familiarity. We encapsulate the group formation problem as a genetic algorithm and employ the GCS to formulate a fitness function.

A framework is designed with the goal of forming satisfactory study groups based on student performance and peer familiarity attributes. The framework uses a Genetic Algorithm (GA) for optimizing the grouping problem while keeping the approach dynamic and flexible.

The framework is evaluated together with the GCS. Effectiveness, execution time and accuracy are measured. The results show that the framework is successful at increasing the homogeneity of GCS scores among the groups in a course within seconds. The framework is efficient and versatile. However, in the experiments neither GCS nor GH shows a positive correlation with group performance. Further research is needed to investigate a possible relationship between GCS and student satisfaction.


Bibliography

[1] Samira Abnar, Fatemeh Orooji, and Fattaneh Taghiyareh. “An evolutionary algorithm for forming mixed groups of learners in web based collaborative learning environments”. In: 2012 IEEE international conference on technology enhanced education (ICTEE). IEEE. 2012, pp. 1–6.

[2] Sofiane Amara et al. “Group formation in mobile computer supported collaborative learning contexts: A systematic literature review”. In: Journal of Educational Technology & Society 19.2 (2016), pp. 258–273.

[3] Zhamri Che Ani et al. “A method for group formation using genetic algorithm”. In: International Journal on Computer Science and Engineering 2.9 (2010), pp. 3060–3064.

[4] Jane Burdett. “Making groups work: University students’ perceptions”. In: International Education Journal 4.3 (2003), pp. 177–191.

[5] Jane Burdett and Brianne Hastie. “Predicting satisfaction with group work assignments”. In: Journal of University Teaching & Learning Practice 6.1 (2009), p. 7.

[6] Frank Coffield et al. “Learning styles and pedagogy in post-16 learning: A systematic and critical review”. In: (2004).

[7] Carol L Colbeck, Susan E Campbell, and Stefani A Bjorklund. “Grouping in the dark: What college students learn from group projects”. In: The Journal of Higher Education 71.1 (2000), pp. 60–83.

[8] Council of European Union. Council regulation (EU) no 679/2016. https://eur-lex.europa.eu/eli/reg/2016/679/oj. 2016.

[9] Kalyan Deb et al. “A fast and elitist multiobjective genetic algorithm: NSGA-II”. In: Evolutionary Computation, IEEE Transactions on 6 (May 2002), pp. 182–197. doi: 10.1109/4235.996017.

[10] Pierre Dillenbourg. What do you mean by collaborative learning? 1999.

[11] Sabine Graf and Rahel Bekele. “Forming heterogeneous groups for intelligent collaborative learning systems with ant colony optimization”. In: International conference on intelligent tutoring systems. Springer. 2006, pp. 217–226.

[12] R. L. Haupt. “Optimum population size and mutation rate for a simple real genetic algorithm that optimizes array factors”. In: IEEE Antennas and Propagation Society International Symposium. Transmitting Waves of Progress to the Next Millennium. 2000 Digest. Held in conjunction with: USNC/URSI National Radio Science Meeting (C. Vol. 2. 2000, 1034–1037 vol.2.

[13] Michael Henderson, Neil Selwyn, and Rachel Aston. “What works and why? Student perceptions of ‘useful’ digital technology in university teaching and learning”. In: Studies in Higher Education 42.8 (2017), pp. 1567–1579.

[14] Tyson R Henry. “Forming productive student groups using a massively parallel brute-force algorithm”. In: Proceedings of the World Congress on Engineering and Computer Science. Vol. 1. 2013, pp. 23–25.
