
Evaluation of grouping methods for solving one-mode blockmodeling problems


Academic year: 2021


Evaluation of grouping methods for

solving one-mode blockmodeling

problems

E Spoelstra

21234272

Dissertation submitted in partial fulfilment of the requirements

for the degree

Magister Scientiae

in

Computer Science

at the

Potchefstroom Campus of the North-West University

Supervisor:

Mr H Foulds

Co-supervisor:

Prof S ter Horst


Abstract

Allocating resources to a group, subject to restrictions, is a scheduling problem in the manufacturing and service industries. One such problem is the scheduling of timetables for industrial training, schools, colleges and universities. This study proposes a method for solving one-mode block modelling problems. The question this study attempts to answer is: how does the proposed method compare to methods from the literature designed to solve similar problems, and how can the method be improved? This was done by creating the method, studying the literature and implementing the methods found therein. The proposed method was then improved empirically by implementing and comparing all these methods. The Improved Proposed Method combines the speed of the Hill Climbing method with the nature-inspired improvements of the Grouping Genetic Algorithm, and uses some random number generation and combinatorics. The result is a method that is time efficient and finds good solutions for grouping problems that have one type of object with values relative to each other. Results from this study may benefit the North-West University, where the method might be implemented to schedule the assessment week and examination timetables. At a later date, the method can be used to optimise class timetables for universities or related application areas.

Keywords: Combinatorial Optimization; Grouping; Timetabling; Random Number Generation; Simulated Annealing.

The allocation of resources to a group, subject to restrictions, is a scheduling problem in the manufacturing and service industries. One problem of this kind is the scheduling of timetables for industrial training, schools and universities. In this study a method is proposed for solving one-mode block modelling problems. The question this study attempts to answer is: how does the proposed method compare to methods from previous research that were designed to solve similar problems, and how can the proposed method be improved? This was done by designing and coding the method, studying the literature and implementing other methods found in it. The proposed method was then improved empirically by implementing and comparing all these methods. The improved proposed method combines the speed of the Hill Climbing method, the nature-inspired improvements of the Grouping Genetic Algorithm and the use of random numbers and combinatorics. This developed into a method that is fast and efficient and that finds good solutions for grouping problems that have one object with values relative to each other. Results of this study can benefit the North-West University, where it can be implemented to improve the assessment week and examination timetable schedule. At a later stage, this method can be used to optimise class timetables for universities or related fields.

Keywords: Combinatorial Optimisation; Grouping; Timetable Design; Random Numbers; Simulated Annealing.


Acknowledgements

I wish to thank everybody who helped with this dissertation. They are briefly mentioned here. My parents, Japie and Hester Spoelstra, for their patience and reading. Andrea Scholtz for his clarification of parts of the work and assistance with the basic mathematical formulas in Section 4.2.4, as well as his intensive reading and editing near the end of the study, without which this study may not have been completed on time. A great thanks to my supervisor Sanne ter Horst and the language editor Bouke Spoelstra. I must also thank Henry Foulds and Magda Huisman for their efforts to see this study greatly improved and completed on time, as well as all others who listened and contributed necessary ideas and encouragement. I also want to thank the creators and contributors of LaTeX.


Contents

1 Introduction
  1.1 Problem statement
  1.2 Formulation
  1.3 Background
    1.3.1 Timetable grouping
  1.4 Research question
  1.5 Objectives
  1.6 Methodology
  1.7 Overview of Dissertation

2 Literature Study: Overview
  2.1 Introduction
  2.2 Clustering problem
    2.2.1 The model
    2.2.2 Example and context
    2.2.3 Comparison
  2.3 Bin packing
    2.3.1 The Model
    2.3.2 Example and context
    2.3.3 Comparison
  2.4 Knapsack
    2.4.1 The Model
    2.4.2 Example and context
    2.4.3 Comparison
  2.5 Four colour map problem
    2.5.1 The model
    2.5.2 Example and context
    2.5.3 Comparison
  2.6 Applicable method: Hill Climbing
    2.6.1 Method overview
    2.6.2 Example
  2.7 Applicable method: Genetic algorithms
    2.7.1 Method overview
    2.7.2 Example
    2.7.3 Conclusion

3 Data Overview
  3.1 Data
    3.1.1 Defining large datasets

4 Implementation of the Proposed method
  4.1 Overview of Method
  4.2 Algorithm
  4.3 Discussion of Procedure - OptimizationProgram
    4.3.1 The three nested loops
    4.3.2 Repeating the program
    4.3.3 Initialising the statsControlParameter
    4.3.4 Iterations necessary to ensure strictness
    4.3.5 List used when restarting the while
    4.3.6 List used when restarting the outermost for
    4.3.7 Iterations necessary to ensure successes
    4.3.8 Stop-criteria
    4.3.9 Reason for adding a group
    4.3.10 The two possible strategies
  4.4 Discussion of Procedure - initialiseGreedyList
  4.5 Discussion of Procedure - BadnessCalculator
  4.6 Discussion of Procedure - RandomChanger
  4.7 Discussion of Procedure - AcceptanceFunction

5 Literature Study: Hill Climbing method implementation
  5.1 Overview of Method
  5.2 Algorithm
  5.3 Discussion of Procedures as Implemented
    5.3.1 Hill Climbing method program
    5.3.2 Creating the initial list
    5.3.3 Improvement program part 1
    5.3.4 Improvement program part 2
    5.3.5 Improvement program part 3
    5.3.6 Appending program
  5.4 Conclusion

6 Literature Study: Grouping Genetic Algorithm Implementation
  6.1 Overview of Method
  6.2 Algorithm
  6.3 Discussion of Procedures as Implemented
    6.3.2 Creating the initial solutions
    6.3.3 Ranking and fitness program
    6.3.4 Crossover program
    6.3.5 Program for repairing the child
  6.4 Conclusion

7 Results
  7.1 Hardware and software
  7.2 Overview of results
  7.3 Dataset 1
  7.4 Dataset 2
  7.5 Dataset 3

8 Evaluation of Results of Methods
  8.1 Least groups
  8.2 Speed
  8.3 Adaptability

9 Implementation of Improved Proposed program
  9.1 Overview of Method
  9.2 Algorithm
  9.3 Discussion of Procedures and Differences
    9.3.1 Greedy algorithm used after each while
    9.3.2 Using the RepairChild idea
    9.3.3 Reduced successes and iterations
    9.3.4 Maximum iterations only count failed iterations
  9.4 Overview of Results
  9.5 Evaluation of Results of Methods
    9.5.1 Least groups found
    9.5.2 Speed
    9.5.3 Adaptability

10 Conclusion
  10.1 Future work


List of Figures

2.1 Countries linked with vertices (Stromquist 1975)
7.1 Proposed method applied to Dataset 1
7.2 Grouping Genetic Algorithm applied to Dataset 1
7.3 Hill Climbing method applied to Dataset 1
7.4 Proposed method applied to Dataset 2
7.5 Grouping Genetic Algorithm applied to Dataset 2
7.6 Hill Climbing method applied to Dataset 2
7.7 Proposed method applied to Dataset 3
7.8 Grouping Genetic Algorithm applied to Dataset 3
7.9 Hill Climbing method applied to Dataset 3
9.1 Improved Proposed Method applied to Dataset 1
9.2 Improved Proposed Method applied to Dataset 2
9.3 Improved Proposed Method applied to Dataset 3


List of Code snippets

3.1 Example of the data used
4.1 Excerpt from: Proposed method - OptimisationProgram.m
4.2 Excerpt from: Proposed method - OptimisationProgram.m
4.3 Excerpt from: Proposed method - InitialiseGreedyList.m
4.4 Excerpt from: Proposed method - BadnessCalculator.m
4.5 Excerpt from: Proposed method - RandomChanger.m
5.1 Excerpt from: Hill Climbing method - InitialiseGreedyList.m
5.2 Excerpt from part 1: Hill Climbing method - ImprovementProgram.m
5.3 Excerpt from part 2: Hill Climbing method - ImprovementProgram.m
5.4 Excerpt from part 3: Hill Climbing method - ImprovementProgram.m
5.5 Excerpt from: Hill Climbing method - Append.m
6.1 Grouping Genetic Algorithm - GGA.m
6.2 Grouping Genetic Algorithm - GreedyAlgorithm.m
6.3 Grouping Genetic Algorithm - ComputeOFV.m
6.4 Grouping Genetic Algorithm - Crossover.m
6.5 Grouping Genetic Algorithm - RepairChild.m
9.1 Excerpt from: Proposed method - OptimisationProgram.m
9.2 Excerpt from: Improved Proposed method - IPM.m
9.3 Excerpt from: Proposed method - OptimisationProgram.m
9.4 Excerpt from: Improved Proposed method - IPM.m
9.5 Excerpt from: Improved Proposed method - IPM.m
9.6 Excerpt from: Improved Proposed method - IPM.m


Nomenclature

LaTeX: A mark-up typesetting language specially suited for scientific documents.

Badness: A measure of how bad any one solution is in terms of the desired solution or the best solution imaginable (not necessarily reachable).

Binary acceptance: A solution is either feasible or non-feasible, with no solutions in between.

Clash: When two modules with a discordance are placed in the same group.

Constraint: A condition of an optimization problem that the solution must satisfy.

Discordance: When two modules cannot be presented at the same time.

Ellipsis inside code: Some code has been removed for ease of reading.

Greedy algorithm: An algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage.

Profit: The difference between the badness value of the previous solution and the badness value of the current solution.

Relations: The value at the intersection of two modules in the matrix; the logical or natural association between modules.

Singleton: A group consisting of only one module.

Strictness: A measure of the probability of picking a worse solution over a better solution.

Uphill solution: A solution that is worse than the previous solution.

Abbreviations

GGA: Grouping Genetic Algorithm
HC: Hill Climbing
IPP: Improved Proposed Program


Reference notation

As prescribed by the NWU, the reference and bibliography styles used in this dissertation are the author-year style according to the Harvard referencing style.


Chapter 1

Introduction

In the manufacturing and service industries, management is continually confronted by the sequencing and scheduling of tasks (Haouari and Serairi 2009; Labbé et al. 1995; Pisinger and Sigurd 2005). Scheduling often plays a crucial role in survival in the marketplace (Pinedo 2010). One such scheduling problem is that of allocating resources to a group, subject to constraints. An illustration of this is the scheduling of timetables for industrial training, schools, colleges and universities. These can be class timetables or examination timetables. In these cases the resources are time slots and venues that must be allocated.

One method of scheduling any type of timetable is based on the concept of timetable groups. These groups are placed in non-overlapping timeslots that are predetermined and, to a large extent, unchangeable. A student can register for at most one module per group. If a student were to register for more than one module from the same group, those modules would be presented at the same time and the student would be able to attend only one of the classes. Subject or timetable groups are therefore constructed from the module combinations for which students are allowed to enrol.

Based on informal conversations with lecturers working at universities in different countries (Spoelstra et al. 2014):

Japie Spoelstra, South Africa
Jan Geertsema, United States
Jaap Molenaar, Netherlands
Manie Spoelstra, United States
Bouke Spoelstra, Taiwan

it is known that such systems are in use at many existing universities. Timetable groups in this study were based on the grouping system used for class timetables at the North-West University (NWU), formerly the Potchefstroom University, until the year 2000. From 2000 the North-West University experimented with other techniques, finally changing back to a timetable group system for class timetables in 2005. There are unique challenges at the NWU, posed by the fact that many subjects are presented across faculty boundaries in one class group (e.g. first-year Economics is presented in curricula of both the Faculty of Natural Sciences and the Faculty of Economic and Management Sciences), and by limited resources with respect to instructors and venues. This makes it an ideal situation for investigating methods of optimally grouping modules for the purpose of scheduling. These circumstances and problems were the motivation for the research presented in this study.

1.1 Problem statement

The management committee of the NWU's Potchefstroom campus has considered also implementing the timetable grouping system for assessment weeks and examination timetables, and then keeping the placements and structure stable from year to year. Because the placement is permanent, in that modules retain their grouping, the complications will be limited if the grouping can be optimised once. At the moment the NWU will benefit from implementing it for the assessment week and the examination timetables. At a later date, the class timetables can be optimised as well, since the current placing was done without adequate data and under time pressure, mostly by hand, and edited through trial and error.

The proposed method presented in this dissertation can be applied to almost any problem that has the same basic structure, from university timetables to company meetings or even the assignment of employees to projects. A non-academic example would be table placings for guests at a wedding reception or diplomatic banquet, where there may be deep divisions as to dietary requirements, religion, old feuds, political views, etc. The difficulty in adapting this method to other problems lies in assigning the different aspects, such as employees and projects, to variables in such a way that the data are in the same format as in this study. In general, if problem A can be converted to fit the model of problem B, then the method used to solve problem B can be applied to solve problem A. In other words, if the variables of a new problem and the relative discordances between each pair of variables can be identified, the problem can be presented as a one-mode block modelling problem, and the proposed method can be used to solve it. Thus the proposed method is applicable to any grouping problem that can be converted to a one-mode block modelling problem.

An example of a grouping problem from scheduling is the problem of electives. Suppose a student can take any two of three electives: subject one ($S_1$), subject two ($S_2$) and subject three ($S_3$). There can be three students in this curriculum, each taking a different combination of electives: student A takes subjects one and two, student B takes subjects two and three, and student C takes subjects one and three. This leads to three objects that need to be grouped, namely the three subjects. Subjects $S_1$ and $S_2$ cannot be in the same group, since it is possible for a student to pick both. The same holds for subjects $S_2$ and $S_3$, and for subjects $S_1$ and $S_3$. The matrix would then be

\[
\begin{array}{c|ccc}
    & S_1 & S_2 & S_3 \\ \hline
S_1 & 1 & 1 & 1 \\
S_2 & 1 & 1 & 1 \\
S_3 & 1 & 1 & 1
\end{array}
\]

If subject $S_1$ was in a different curriculum with subject $S_4$, and subject $S_2$ in yet another curriculum with subject $S_5$, then the matrix can be expanded to

\[
\begin{array}{c|ccccc}
    & S_1 & S_2 & S_3 & S_4 & S_5 \\ \hline
S_1 & 1 & 1 & 1 & 1 & 0 \\
S_2 & 1 & 1 & 1 & 0 & 1 \\
S_3 & 1 & 1 & 1 & 0 & 0 \\
S_4 & 1 & 0 & 0 & 1 & 0 \\
S_5 & 0 & 1 & 0 & 0 & 1
\end{array}
\]

This means that subjects $S_1$, $S_2$ and $S_3$ need to be in separate groups, even though they are electives.

The same approach can be used for any electives where students can take any two subjects out of three or more available electives.
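As a rough check of the five-subject example, the sketch below runs a simple first-fit pass over the expanded matrix. It is written in Python purely for illustration (the dissertation's own code is in Matlab), the function name `first_fit_grouping` is hypothetical, and first-fit is a stand-in technique, not the proposed method. The diagonal of the example matrix is ignored because a subject is never compared with itself.

```python
# Clash matrix for S1..S5, copied from the expanded example in the text.
Y = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
]


def first_fit_grouping(clash):
    """Place each subject in the first group where it clashes with no member.

    Hypothetical helper: a plain first-fit heuristic, used only to
    illustrate the example. Subjects are never compared with themselves,
    so the diagonal entries of `clash` play no role.
    """
    groups = []
    for subject in range(len(clash)):
        for group in groups:
            if all(clash[subject][other] == 0 for other in group):
                group.append(subject)
                break
        else:
            # No existing group can take this subject: open a new group.
            groups.append([subject])
    return groups


groups = first_fit_grouping(Y)
labels = [[f"S{s + 1}" for s in group] for group in groups]
print(labels)  # [['S1', 'S5'], ['S2', 'S4'], ['S3']]
```

Under these assumptions the pass needs three groups, which agrees with the observation that the three electives must be kept apart, while S4 and S5 can share groups with subjects they do not clash with.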

The data used for this problem were gathered from the curricula of the NWU. In these curricula, a number of modules are prescribed per study year per semester for each curriculum, based on the knowledge needed to complete a degree. The modules selected for curricula occur in many combinations and sometimes include electives to give the students greater diversity. For this study no modules were seen as electives and the assumption is made that all modules that can be combined in a degree have been so combined. It does happen in some curricula that a third-year module is placed in the second year of the curriculum; therefore the study years cannot be handled separately.

The data were collected and combined in matrix form for ease of use. In this matrix the row and column headings represent modules, are identical, and have the same order top-to-bottom and left-to-right. If two modules are in the same year in the same curriculum, this is regarded as a clash, meaning they cannot be in the same timeslot. The corresponding row-column intersection is given as a 1 in the matrix. Otherwise the value at the intersection of the row and column of these modules is a 0. The matrix is completely symmetric, since one module clashing with another is the same as the second clashing with the first. The diagonal entries are zeros, since a module cannot clash with itself. This matrix is used as the primary source of data for the implementation of the method. The goal is to find sub-matrices, or groupings of rows and columns, such that no sub-matrix contains a clash, that is, a 1. Problems whose data are modelled in this form are called one-mode block modelling problems.
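The construction of the matrix described above can be sketched as follows. This is a minimal Python illustration, not the dissertation's Matlab code: the function `build_clash_matrix`, the module codes and the two sample curriculum years are all hypothetical, and each inner list is assumed to hold the modules prescribed together in one year of one curriculum.

```python
from itertools import combinations


def build_clash_matrix(modules, curriculum_years):
    """Build a symmetric 0/1 clash matrix with a zero diagonal.

    `modules` fixes the row and column order. Each entry of
    `curriculum_years` lists the modules that share a year in one
    curriculum; every pair in such a list is marked as a clash.
    """
    index = {name: i for i, name in enumerate(modules)}
    n = len(modules)
    matrix = [[0] * n for _ in range(n)]
    for year_modules in curriculum_years:
        for a, b in combinations(year_modules, 2):
            i, j = index[a], index[b]
            if i != j:  # a module never clashes with itself
                matrix[i][j] = 1
                matrix[j][i] = 1  # keep the matrix symmetric
    return matrix


# Hypothetical module codes and curricula, for illustration only.
modules = ["MATH111", "PHYS111", "ECON111", "ACCT111"]
curriculum_years = [
    ["MATH111", "PHYS111", "ECON111"],  # first year of one curriculum
    ["ECON111", "ACCT111"],             # first year of another curriculum
]

Y = build_clash_matrix(modules, curriculum_years)
for row in Y:
    print(row)
```

In this sketch a module shared between curricula (here the Economics module) accumulates clashes from both, which is how subjects presented across faculty boundaries end up constraining the grouping.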

The final grouping then represents the modules that can be in the same timeslot without causing a clash. That is to say, no student can be enrolled for two modules in the same group. Two modules from different groups could possibly be taken together, but there the constraints are curriculum content and module prerequisites. The groups themselves are not seen as the primary source of enrolment possibilities; that remains determined by the curricula prescribed in the yearbook of the university.

This study does not focus on the soft constraints of timetabling problems, such as the order of the modules on an examination timetable, or which lecturer presents which subject. It focuses only on the grouping of the subjects in non-clashing slots. Even though many problems have a fixed maximum number of groups or slots that can be used, it is assumed that, in most cases, using fewer than the maximum is better. For example, if the university designates 18 days for the examination period, using only 17 or 16 days would be better, leaving open slots for studying or for an emergency. Therefore the goal is to minimise the number of required groups.

After much discussion and thought it was decided that it is practical to define the problem in mathematical terms as well as giving a full explanation. The mathematical formulation will be a benefit when comparing methods and applying algorithms. In the following section the problem is formally defined mathematically.

1.2 Formulation

This section presents the mathematical formulation of the problem discussed in words above. For consistency, the terminology used from here on is listed below.

Modules. Subjects, question papers, employees, tasks, guests, etc., that have to be scheduled will be called modules and are denoted by a finite set $X$ of size $N$.

Data. A symmetric $N \times N$ matrix $Y$ whose entries consist of ones and zeros, with zeros on the diagonal. The rows and columns of $Y$ are indexed by the elements of $X$.

Discordances. The zero and one entries $y_{ij}$ of $Y$, corresponding to the elements $x_i$ and $x_j$ of $X$, depict the discordances between modules, therefore

\[
y_{ij} =
\begin{cases}
0 & \text{if } x_i \text{ and } x_j \text{ can be in the same group} \\
1 & \text{if } x_i \text{ and } x_j \text{ cannot be in the same group.}
\end{cases}
\]

Groups. For a subset $S$ of $X$, with size $s$, we define $Y[S]$ to be the $s \times s$ sub-matrix of $Y$ whose rows and columns are those associated with the elements of $S$. The subset $S$ is called admissible if $Y[S]$ consists of zeros only, i.e. $Y[S]$ is an $s \times s$ zero matrix. An admissible grouping is a partitioning of $X$ into disjoint subsets $S_1, \ldots, S_M$ (i.e. $S_j \subseteq X$ for $j = 1, \ldots, M$), such that each $S_j$ is an admissible subset of $X$.

Admissible groupings do exist. The groups can always be a partitioning consisting of singletons, that is, all the subsets consisting of one element each.

Grouping objective. Determine an admissible grouping that meets this optimisation criterion:
- Minimise $M$: Determine an admissible grouping with as few subsets as possible. Rationale: exams can then be scheduled in as few time-slots as possible, leaving more time between the time-slots.
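The definitions above translate almost directly into code. The following Python sketch is illustrative only (the dissertation's implementation is in Matlab): the helper names `is_admissible_subset` and `is_admissible_grouping` and the small matrix `Y` are assumptions made for this example, with modules represented by their row indices.

```python
def is_admissible_subset(Y, subset):
    """True if the sub-matrix Y[S] contains zeros only."""
    members = list(subset)
    return all(Y[i][j] == 0 for i in members for j in members)


def is_admissible_grouping(Y, groups):
    """True if `groups` partitions all modules into admissible subsets."""
    placed = sorted(m for group in groups for m in group)
    if placed != list(range(len(Y))):
        return False  # some module is missing or appears more than once
    return all(is_admissible_subset(Y, group) for group in groups)


# Hypothetical 4-module data matrix: symmetric, zeros on the diagonal.
Y = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

singletons = [[i] for i in range(len(Y))]  # always admissible, M = 4
better = [[0, 3], [1], [2]]                # admissible with M = 3
clashing = [[0, 1], [2], [3]]              # modules 0 and 1 clash

print(is_admissible_grouping(Y, singletons))  # True
print(is_admissible_grouping(Y, better))      # True
print(is_admissible_grouping(Y, clashing))    # False
```

The singleton grouping shows why an admissible grouping always exists, and the second grouping shows the objective at work: it remains admissible while using one group fewer.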

1.3 Background

The problem of grouping as well as the one-mode block modelling problem stems from the system of scheduling timetables, called timetable grouping. This section explains this system as background to the problem at hand.


1.3.1 Timetable grouping

Timetable grouping is a transparent, easy-to-understand timetabling system with low maintenance. Though it does have negative points, most are due to a lack of personal consideration and are not based on academic or administrative reasons.

Timetable grouping considers the timeslots, also called periods, in the timetable to be set beforehand. In a class timetable this would be the class periods in a week. All periods have the same length and the same start times right through every day of the week. These periods are then allocated a group number, so that every group has the same number of periods and total length of contact time per week. Preferably all timetable groups would have evenly distributed ‘good’ and ‘bad’ periods, where good periods refer to periods at pleasant times e.g. Wednesday 11h00 and bad to periods that are at inconvenient times e.g. 16h00 on Friday. The periods per group should also be evenly distributed over the course of the week. In an examination setting the timeslots are the days and the sessions in the day, for example, a morning and afternoon session each day during the examination.

After the timeslots are set, be it for examinations or classes, groups can be allocated to a timeslot. For classes this will mean that all modules in one group will be delivered on the same day and time each week. If this is continued for a longer period, the modules are the same each year, with only small modifications for new curricula or modules. In the case of examinations, the same concept holds; all modules in the same group would be in the same timeslot during the examinations. Over a longer period small modifications can be made such as a cyclical rotation of weeks each year. The placement of the groups can be optimised to give the students the best spread of modules. But in essence it will be the same annually.

As is the case for any timetabling system, this system has advantages and disadvantages. Starting with the advantages, the first is simplicity: the system is easy to create, update and maintain. It requires very little effort, leaving a timetabling officer with spare time to focus on other tasks. The system is also very transparent, meaning anyone who uses it can understand how it works. It encourages trust in the system and the people behind it when the users know why and how their timetable has been constructed. It is also a fair system, since no module gets more or less time or consideration, or is assigned more importance, than another. It does not consider any module as being 'better' than another or as requiring more effort. In cases where the institution has decided that one module does require less or more time than another, half or double timetable groups can be used respectively. This is mostly the case with practical modules that require laboratory work, and small essential modules that may only require half the periods in a weekly timetable. Because this system is consistent, it inspires trust from the users: they know what their timetables will look like and why, and they also know that they get the same priority as all other users of the timetable. The last advantage is the modular quality of the timetable. Because the timeslots and the grouping are considered separately, it is easy to modify one without upsetting the entire timetable. For example, the institution may change the start and end times in a day. In that case the timetable slots would change a bit to adapt, but the groupings would remain similar, thereby giving the users only one small difference to adapt to instead of an entire overhaul of the system.


The first disadvantage is inflexibility. The timetable grouping system is not very flexible in accommodating exceptions to, or deviations from, the rules. This can be seen as an advantage, depending on the user. The system does not consider the lecturers of any module when grouping or placing modules, and expects the lecturers to keep the timetable in mind when determining who will present which course. This leads to humanitarian concerns: the system does not make allowances for lecturers' or students' personal lives. It does not change the times based on the preferences of the users, or their time availability. Again, this can be seen as an advantage depending on the point of view. To the user it may seem rigid and unaccommodating, but in the larger picture of scheduling it would be impossible to allocate all users the periods they want. The system promotes fairness for most if not all: if everyone cannot have their desired periods, no one will. The system also has a large negative consequence in that it sometimes over-allocates venues. Venues are scheduled for a specific module, and all its timeslots, yet lecturers do not use all the given time every week. Therefore a usable venue may stand open during that timeslot. If an institution has enough venues for all its courses this is not a big problem, as it allows venues time to air and freshen up before the next session begins.

1.4 Research question

A method has been developed by the author for solving one-mode block modelling problems, specifically the problem discussed in the problem formulation. In this dissertation the proposed method is discussed in detail, specifying how the method is adaptable to certain changes which may occur in the problem statement (Section 1.1). The questions this study attempts to answer are: How does the proposed method compare to methods from literature designed to solve similar problems, and what are its shortcomings, or theirs, when compared? How can the proposed method be improved upon? To answer these questions, alternative methods will be discussed, implemented and then evaluated using the same considerations.

1.5 Objectives

The aim of this research is ultimately to find or develop a good method for solving very large one-mode block modelling problems such as the problem discussed in Section 1.1. To achieve this aim, the objectives are to:

1. Find methods in literature that can solve this relatively simple problem and apply them. These methods will all be evaluated according to the same considerations.
2. Evaluate the proposed method.
3. Compare all the methods.
4. Improve the proposed method, or develop another to solve this problem.


1.6 Methodology

The word methodology, used since 1970 to mean 'a way of developing software or research', means 'the science of methods' (Schach 2007). This dissertation uses the "design and create" research method. For many research projects in computing, the research involves designing, analysing and developing a computer-based product. This dissertation also exhibits academic qualities of research such as analysis, explanation, argument, justification and critical evaluation (Oates 2006). It is now commonly held that object-oriented methodologies are more effective for managing the complexities in large and complex software artefacts and research (Preiss 2000).

This study started with the creation of a method to solve the problem of grouping one-mode block matrices. This method was created and applied to the test data. After the method was created, the literature was investigated. The investigation focused on finding a similar or better method to do the same grouping of one-mode block-modelling matrices. When any grouping method was found, that method was evaluated with the intent of applying it. Two methods were found that could be applied to the same matrices. These were thoroughly studied and understood before encoding and applying them to the same test data. During this study, procedures were found that could potentially improve the proposed method. These procedures were then adapted and applied to create a new method, called the Improved Proposed Program. This program was also applied to the test data. All the results were evaluated and compared to find the best method.

1.7 Overview of Dissertation

In Chapter 2 possible methods from the literature are discussed and a short overview is given of the two methods used. Chapter 3 focuses on the data, specifying the origin, format and data that will be used throughout the dissertation. Following this, the proposed method is presented in detail in Chapter 4, before the alternative methods from literature are fully discussed and implemented. The Hill Climbing method is discussed in Chapter 5 and the Grouping Genetic Algorithm (GGA) in Chapter 6. Extracts of the code implementations for these methods are given in these chapters. These extracts are representations of the methods and are used to explain the flow and procedures of the methods. There can be many other ways to program these methods. The full code of the methods as implemented is given in the appendix. This code can be copied and pasted, but will require some editing to convert from PDF characters to Matlab characters.

The results after implementing the methods and running the programs developed for these methods, are presented in Chapter 7 followed by the evaluation of these results in Chapter 8. Chapter 9 will once again focus on the proposed method but with some improvements based on new knowledge. After that a conclusion is drawn in Chapter 10 along with a mention of possible future work.


Chapter 2

Literature Study: Overview

2.1 Introduction

In this chapter, a number of optimization models from the literature are discussed. These models were encountered while looking for viable methods to solve the one-mode block modelling problem. Each of these general models was studied with the aim of applying the many methods created to solve it to the one-mode block modelling problem. The approach followed is to investigate and discuss models found in the literature that are used to solve grouping problems and then attempt to fit the one-mode block modelling problem into the literature model. The discussions are sectioned according to the models. Each model, as presented in the literature, is discussed and explained, whereafter it is applied to the one-mode block modelling problem if possible. If the one-mode block modelling problem can be formulated in such a way that it fits the model found in the literature, it might be possible to use one of the methods for solving the literature model to solve the one-mode block modelling problem in this study. Methods have been found that have already been applied to one-mode block modelling problems. They are briefly mentioned in Section 2.6 and Section 2.7 and fully discussed and implemented in Chapters 5 and 6, after the discussion of the proposed method in Chapter 4.

As discussed in the previous chapter, the one-mode block modelling problem that is created from timetabling data addresses a timetabling problem. When discussing timetabling, a differentiation is usually made between course timetabling and exam timetabling. With exam timetabling the spread of the modules over the exam period for any one student is important, while the spread of modules for course timetabling is less relevant. Another difference is that all the students of a module with large student numbers should usually write exams at the same time, while a course can be split into many class groups not presented at the same time. A further difference comes in when doing venue allocations for the modules. In an exam set-up the students can be spread over many venues, but when running classes one class group needs to be in one venue (Burke and Petrovic 2002). The grouping done via the one-mode block modelling problem is done before the groups of modules are allocated to the available time slots. The distribution of the groups over the various time slots can be accounted for when grouping the modules into timetable groups, but that would be done by an additional process after the possible solutions have been found.


2.2 Clustering problem

2.2.1 The model

The first model encountered while researching grouping methods is clustering. Clustering is generally defined as grouping a set of nodes or objects into clusters, where the elements in each cluster are as similar (for a problem-specific definition of similar) as possible. Likewise, objects from different clusters should be as dissimilar as possible. A similarity function has to be defined to measure similarity (Kashan et al. 2013; Ferrari and Castro 2015). Clustering is meant as a learning task that should output a group for any given input, even inputs that are given after the process has been run and have therefore not already been mapped (Liu et al. 2015). Mathematically it could be written as (Xu and Lee 2015; Tsai et al. 2015):

Objects The modules or vectors to be clustered are denoted by $X = \{x_1, x_2, \ldots, x_n\}$.

Clusters The clusters are defined as subsets or partitions $s_i$ of $X$, where $S = \{s_1, s_2, \ldots, s_k\}$, with a given maximum number of clusters $k$, so that $X = \bigcup_{i=1}^{k} s_i$ and $s_i \cap s_j = \emptyset$ for $i \neq j$. Each cluster has a value assigned to it, for example the centre or mean of the cluster, defined as $C = \{c_1, c_2, \ldots, c_k\}$ where $c_i$ is the value of cluster $s_i$.

The similarity function This function depends entirely upon the application but is a function of the cluster values, therefore $SF(c_i)$. This is also the function that is optimised, so that $SF(c_i) = \text{Min}$.

The problem The aim is to optimise the clustering, thus to minimise the similarity function in such a way that any input $y_i$ gets assigned to a cluster in a reasonable time with reasonable accuracy.

2.2.2 Example and context

In Acedo-Hernández et al. (2016) the clustering problem to be solved is the placing of LTE sites inside multi-storey buildings. The system that was considered consists of a baseband unit controlling several radio heads, each in charge of several antennas. All these are usually on the same floor. As in macrocellular network planning, adding a small cell system to an already existing network is a trade-off between improved coverage and capacity and the extra expenses due to the small cell infrastructure.

2.2.3 Comparison

To start off with the comparison, the similarities in clustering are considered. The discordances can be swapped to refer to a 0 as a similarity and a 1 as a dissimilarity. If we say that the value of each group is $c_i$ for timetable group $s_i$, which is a grouping of modules from $x_1, \ldots, x_n$, then the similarity function would be

\[ \frac{1}{k} \sum_{s_i \in S} c_i = \text{number of clashes}, \]

where

\[ c_i = \sum_{x_p, x_q \in s_i} d(x_p, x_q) \]

is the value of cluster $i$ and $d(x_p, x_q)$ is the similarity or distance between modules $x_p$ and $x_q$. This implies that a cluster where modules are similar would form a group wherein no module has any discordance with the other modules in that group. Therefore $c_i$ should always be equal to zero, indicating no clashes, and the similarity function should also be equal to zero.
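The group value described above can be illustrated with a minimal Python sketch (the dissertation's own programs are written in Matlab). The function names and the 4-module matrix are hypothetical, and the sketch makes two assumptions: each unordered pair of modules is counted once, and the normalising factor $1/k$ is left out, since only the distinction between zero and non-zero matters for feasibility.

```python
from itertools import combinations

def group_clashes(clash_grid, group):
    """Value of one group: the number of clashing module pairs inside it."""
    return sum(clash_grid[i][j] for i, j in combinations(group, 2))

def total_clashes(clash_grid, groups):
    """Sum of the group values; 0 means the grouping has no clashes."""
    return sum(group_clashes(clash_grid, g) for g in groups)

# Hypothetical 4-module discordance matrix (1 = clash, 0 = no clash).
clash_grid = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

feasible = [[0, 2, 3], [1]]    # no clashing pair shares a group
infeasible = [[0, 1], [2, 3]]  # modules 0 and 1 clash

print(total_clashes(clash_grid, feasible))    # 0
print(total_clashes(clash_grid, infeasible))  # 1
```

A grouping is acceptable only when the total is exactly zero, which is why the quantity acts as a constraint rather than as an objective to be minimised.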

The problem when matching the format used in this study to the data format used in clustering is the value to be minimised. In clustering the value to be minimised is the similarity function used. The number of clusters, k, is not as important or is given, whereas, in the problem this study addresses, the number of groups, which corresponds to the number of clusters k, is the most important consideration. The similarity function must always be equal to zero and cannot be minimised to be almost zero or as close as possible to zero, since zero is the relative distance between two modules that do not clash. This is a set constraint.

There are many applications in exam timetabling where clustering techniques are used (Shatnawi et al. 2010), but most of those are applied after the grouping has already been done, or with different focus areas. The conclusion is that it would be impractical to try to use clustering methods on the problem of this study, since the data and end goal differ too much. There was no evidence in the literature that this has been tried before either.

2.3 Bin packing

2.3.1 The Model

A second possibility is bin packing. Bin packing is the general term for any model in which one has a number of unsplittable items with different weights (also called sizes) and an unlimited or given maximum number of bins to place them in. The bins have a capacity that cannot be exceeded. The goal is to use as few bins as possible (Labbé et al. 1995). In addition to the capacity, a bin can have a cost. The aim would then be to minimize the sum of the costs of the bins that are used and not the number of bins. A problem can often be reduced to a bin packing model, or one sub-part of a problem may be reduced to a bin packing model. A mathematical formulation would be (Pisinger and Sigurd 2005; Haouari and Serairi 2009):

Objects The objects are a given set of items $X = \{x_1, x_2, \ldots, x_n\}$ with each item $x_i$ having a corresponding weight $y_i$.

Bins The bins $C = \{c_1, c_2, \ldots, c_k\}$ have capacities $N = \{n_1, n_2, \ldots, n_k\}$ and a fixed cost $s_i$ for using each bin, where $c_i \cap c_j = \emptyset$ for $i \neq j$.

A packing A partitioning $V_1, \ldots, V_d \subseteq X$ of the items, with $\bigcup_{i=1}^{d} V_i = X$ and $V_i \cap V_j = \emptyset$ if $i \neq j$, with the property that $d \leq k$ and $\#(V_i) \leq n_i$ for each $i$, that is, the contents of each part fit within the capacity of its bin.

The problem The cost of a packing is given by

\[ \sum_{i=1}^{d} s_i, \]

thus minimise the cost of the packing, using at most $k$ bins, while not exceeding the capacity of any bin.

2.3.2 Example and context

An example of a bin packing model is in hosting, both service and virtualized hosting. The objects, in this case services or virtual machines must be assigned to clustered servers. Each server needs to provide enough resources, so that all its processes can run. Each machine has its own capacity for resources. The aim is to find a feasible assignment that minimizes the weighted cost (Gabay and Zaourar 2016).

In timetabling, bin packing is primarily used when doing venue allocations. The venues are the bins, each with a given capacity, namely the number of seats available. The objects would be the students needing placement. In an exam timetabling problem, the students would not be grouped and each student would have a weight of one. The students would then be allocated to venues so that as few venues as possible are used. If the students of one module need to be together in a venue, the objects can be the modules, where the number of students is the weight (Ross et al. 2002).
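As a small illustration of venue allocation as bin packing, the following Python sketch places modules (weighted by student numbers) into venues using the first-fit decreasing heuristic. This is a standard bin packing heuristic chosen here for illustration and is not taken from Ross et al. (2002); the function name, the module labels, the student numbers and the assumption that all venues have the same capacity are hypothetical.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack modules (name -> student count) into equal-capacity venues, largest first."""
    venues = []  # each venue keeps its free seats and the modules placed in it
    for module, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{module} does not fit in any venue")
        for venue in venues:
            if venue["free"] >= size:       # first venue with enough seats left
                venue["free"] -= size
                venue["modules"].append(module)
                break
        else:                               # no existing venue fits: open a new one
            venues.append({"free": capacity - size, "modules": [module]})
    return [v["modules"] for v in venues]

# Hypothetical modules and student numbers, venues with 100 seats each.
sizes = {"A": 70, "B": 50, "C": 30, "D": 30, "E": 20}
print(first_fit_decreasing(sizes, capacity=100))  # [['A', 'C'], ['B', 'D', 'E']]
```

Note that the heuristic looks only at the fixed weight of each module and never at which modules are placed together, which is exactly the information the discordance matrix of this study carries.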

2.3.3 Comparison

At first view this seems close to the one-mode block modelling problem, even though no indication was found that methods for solving the bin packing model have been applied to the one-mode block modelling problem. The goal of the bin packing model is to minimise the number of bins, analogous to minimising the number of groups. It also handles objects that are allocated using a name which has no numerical value.

The problem is that the bin packing problem’s objects have fixed weights attached to them, whereas the objects of this study’s data have values relative to each other. There is no logical way of relating the discordances between modules of the one-mode block-modelling problem to an object as a fixed weight. This holds an interesting future possibility, granting the modules weights as well, but then in addition to the discordances already noted. The bin packing problem does not take into account that some items cannot be placed with others, thus making this method an improbable choice. A similar, yet reversed problem is discussed in the next section.

2.4 Knapsack

2.4.1 The Model

In the ordinary knapsack model, a set of items with weights and profits is to be placed in a knapsack with a fixed capacity. The profit is a fixed value that each item has, which is added to the total profit if that item is packed. Sometimes there are a number of sacks, not just one. The idea is then to maximize the value of the knapsack with the smallest profit (Hochbaum 1995).


Mathematically that would be described as follows (Yamada et al. 1998; Fujimoto and Yamada 2006).

Objects Given a set of items $X = \{x_1, x_2, \ldots, x_n\}$, each having a corresponding weight from $Y = \{y_1, y_2, \ldots, y_n\}$ and a corresponding profit from $P = \{p_1, p_2, \ldots, p_n\}$.

Knapsacks Defined as $k$ knapsacks, one per player, $C = \{c_1, c_2, \ldots, c_k\}$, each with a respective capacity $N = \{n_1, n_2, \ldots, n_k\}$, where one object can only be placed in one knapsack, so $c_i \cap c_j = \emptyset$ for $i \neq j$. Each knapsack is a set of objects.

Selection Using a variable $n = n_j$ with $n_j = 1$ if item $j$ is selected, 0 otherwise.

The Problem To maximise the minimum of the profits $p_k(x)$, with $p_k$ as the set of all the profits, where

\[ p_i(x) = \sum_{j \in c_i} p_j n_j. \]

The goal is to get the maximum profit from the combined knapsacks.

2.4.2 Example and context

The Dynamic airlift loading problem involves the assignment of palletized cargo to specified pallet positions in a specified available aircraft. This must be done without deviating from the preferred aircraft load requirements while still satisfying the temporal constraints on the pallets, if possible. Overall the goal includes minimizing the number of flights required to transport all the pallets, the total number of aircraft required and minimizing the violations to an aircraft’s allowable cabin load and pallet temporal restrictions (Roesener and Barnes 2016).

2.4.3 Comparison

Just as in Section 2.3, the model seems very similar to the one-mode block modelling problem and therefore should be considered. The groups correspond to the players with knapsacks, the modules correspond to the objects, and there is even a 1 or 0 constraint. However, this constraint has nothing to do with whether two objects can be placed together or not. Just as with the bin packing problem, there are no fixed weights, profits or capacity per group in the problem considered in this study. These aspects could be added to the problem by giving each module a weight and each group a capacity, but doing so will not solve the grouping with the discordances as relations. No comparison between this problem and one-mode block modelling problems was found in the literature and, in light of the comparison above, it would be impractical to attempt to apply the methods to the problem under investigation.


2.5 Four colour map problem

2.5.1 The model

The four-colour map model refers, in general, to all the proofs of the statement made in 1852: any planar map, with regions bordering each other can be coloured using only four different colours where no two bordering regions use the same colour (Purves et al. 2000). This statement also led to methods for colouring such a map.

The shapes and sizes of the countries do not matter, but what does matter is which region borders which. For this, a graph (Figure 2.1) was used (Stromquist 1975), where the midpoint of a region is denoted by a dot or vertex and any two bordering regions are connected with a line segment or edge. Edges may be curved, but never intersect.

Figure 2.1: Countries linked with vertices (Stromquist 1975)

The only intersection would be the dot itself. If we then colour the vertices of this graph with four colours such that no edge joins two of the same colour, the problem is solved. According to Bagchi and Datta (2013) it can be formulated for higher dimensions as follows:

For d ≥ 1, let χ(d) be the smallest number such that any (finite) set of d-dimensional non-overlapping closed balls (not necessarily of the same size) in a d-dimensional Euclidean space may be coloured in χ(d) colours, so that any two touching balls receive different colours. Thus, we have χ(1) = 2 (trivial) and χ(2) = 4.

If we know that our problem is in a three-dimensional space, the question then is what $\chi(3)$ would be. If we define $\tau(d)$ as the number of tangent balls that can occur in a $d$-dimensional Euclidean space, Bagchi and Datta (2013) prove that the number of colours needed is bounded by

\[ d + 2 \leq \chi(d) \leq \tau(d) + 1 \leq 3^d, \quad \forall d \geq 2, \]

so that $5 \leq \chi(3) \leq 13$.
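As a worked instance of the bound for $d = 3$: the stated upper limit of 13 corresponds to $\tau(3) = 12$. This value is not given explicitly in the text and is assumed here; the last term is read as the power $3^d$.

```latex
\[
  \underbrace{3 + 2}_{=\,5} \;\leq\; \chi(3) \;\leq\; \underbrace{\tau(3) + 1}_{=\,12 + 1 \,=\, 13} \;\leq\; \underbrace{3^{3}}_{=\,27}
\]
```

For the grouping problem this means that, if the data could be represented in three dimensions, at most 13 groups would be needed.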

2.5.2 Example and context

Randic et al. (2009) propose a graphical representation of proteins to support their further study. Different graphical representations, however, gave different views of the same structure. They therefore propose a compact 2-D graphical representation of proteins based on a four colour map and a virtual genetic code. A virtual genetic code is a subset of the standard genetic code and consists of 20 codons coding the 20 naturally occurring amino acids.


2.5.3 Comparison

The data structure used here correlates closely to the data structure used for this study. This model also uses objects, in this case regions, with names that have no computational value, and with relative values between each pair of these objects. The methods for colouring, as well as the four colour theorem, are only relevant for a planar map. It might not be possible for the data structure this study employs to be displayed in a two-dimensional space, e.g. if there are six modules that all have discordances with each other. Because of this, the research on map colouring in more than two dimensions was included. Although it does not give definite answers towards methods of obtaining a colouring, it does give an idea of the number of groups that would be optimal. This, however, depends on the minimum number of dimensions in which the main data can be represented. Although there are many methods for colouring a map with four colours, none of them is yet applicable to a space of more than two dimensions. Although graph colouring techniques are used for some timetabling applications, they cannot be applied to the one-mode block modelling problem, and therefore this part of the literature study did not yield an applicable result for our problem.

The next two sections discuss methods that have been applied to the one-mode block modelling problem or similar problems. Both these methods are discussed in detail in later chapters, therefore they are not discussed in full here.

2.6 Applicable method: Hill Climbing

2.6.1 Method overview

The Hill Climbing (HC) method was evaluated as an applicable method for the problem in this study and is fully discussed and implemented in Chapter 5. This section presents an overview of the method and its applicability. The HC method uses the greedy algorithm as an integral part. The greedy algorithm operates by considering items one by one, in any arbitrary order, and inserting them into groups. An item is only inserted into an existing group if the placing is feasible; otherwise a new group is created (Lewis 2009). The solution found with the greedy algorithm is used as an initial feasible solution. From here the algorithm examines neighbouring solutions. If a neighbouring solution is better than the current solution, it replaces the current solution. This continues until no further improvements are possible (Sakamoto et al. 2014). In HC, solutions tend towards a local optimum, although it can be implemented as a hybrid, along with other methods, to counter the locality (Bolaji et al. 2014).

This method has been applied to block modelling problems in the literature (Lewis 2009) and is therefore highly applicable to this situation. Most of the block modelling problems, however, have a binary acceptance. This means that a solution is either feasible or non-feasible with no solutions in between. For the problem in this study this method can be used.
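The overview above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of Chapter 5 or of Lewis (2009): the function names are hypothetical, and the neighbourhood move used here (trying to dissolve one group by moving all of its modules into other groups without clashes) is an assumption chosen to keep the example short.

```python
def greedy_grouping(clash_grid, order=None):
    """Insert modules one by one into the first clash-free group, else open a new group."""
    groups = []
    for m in (order if order is not None else range(len(clash_grid))):
        for g in groups:
            if all(clash_grid[m][other] == 0 for other in g):
                g.append(m)
                break
        else:
            groups.append([m])
    return groups

def hill_climb(clash_grid, groups):
    """Accept a neighbouring solution only if it uses one group fewer."""
    groups = [list(g) for g in groups]
    improved = True
    while improved:
        improved = False
        # Try the smallest groups first: they are the easiest to dissolve.
        for idx in sorted(range(len(groups)), key=lambda i: len(groups[i])):
            trial = [list(g) for i, g in enumerate(groups) if i != idx]
            for m in groups[idx]:
                target = next((g for g in trial
                               if all(clash_grid[m][o] == 0 for o in g)), None)
                if target is None:
                    break                      # this group cannot be dissolved
                target.append(m)
            else:
                groups, improved = trial, True  # better neighbour found
                break
    return groups

# Hypothetical example: modules 0-1, 1-2 and 2-3 clash.
clash_grid = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
start = greedy_grouping(clash_grid, order=[0, 3, 1, 2])
print(start)                          # [[0, 3], [1], [2]]
print(hill_climb(clash_grid, start))  # [[1, 3], [2, 0]]
```

With this insertion order the greedy algorithm needs three groups, and a single improving move brings the solution down to two. Because only improving moves are accepted, the search stops at the first solution from which no group can be dissolved, which illustrates the tendency towards local optima mentioned above.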


2.6.2 Example

Optimization, modelling and resolution are crucial to achieving optimized network performance, especially with the emergence of several new networking paradigms. One such paradigm that requires resolution is that of Wireless Mesh Networks. This is an important networking infrastructure for providing cost-effective broadband wireless connectivity. The issues with wireless mesh networks are closely related to node placement problems such as router placement. Node placement problems have been investigated in the optimization field due to numerous applications such as facility location, logistics or clustering (Sakamoto et al. 2014).

2.7 Applicable method: Genetic algorithms

2.7.1 Method overview

In Chapter 6, the Grouping Genetic Algorithm (GGA) is discussed completely along with its implementation. In general, a genetic algorithm (GA) is an evolutionary strategy that emulates the natural evolutionary process. It seeks good quality solutions to optimisation problems. The techniques used are inspired by biological evolution, where the existing solutions get recombined and modified (Quiroz-Castellanos et al. 2015). Since it is population based, each individual in the population represents a feasible solution and has a fitness value derived from the objective value of the feasible solution. These feasible solutions are called chromosomes. This method works iteratively, i.e. each new generation of solutions is created from the previous generation (Solbeyko and Monch 2015). The first solutions are not created to be optimal, only to be feasible, although each has a small probability of being the optimum.

There are several necessary aspects required to make the implementation of this algorithm suitable (Balash-Masoliver et al. 2014):

• The search space must be diverse and not well known.

• There must be a suitable encoding to represent the solutions of the problem.

• Each solution should have a fitness value.

Falkenauer and Delchambre (1992) proposed a new version of the genetic algorithm, called the Grouping Genetic Algorithm (GGA). It is argued that the normal crossover and mutation operators are not able to preserve the group features. In this crossover operator, crossover happens on the group representation of the parent chromosomes (Moghaddam and Moghaddam 2015).
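To illustrate what crossover on the group representation can look like, the sketch below injects whole groups from one parent into the other and repairs the result. It is a simplified Python illustration under stated assumptions, not the GGA of Chapter 6: the function names, the fixed selection probability of 0.5 and the first-fit repair step are all hypothetical choices.

```python
import random

def fits(clash_grid, module, group):
    """True if the module clashes with no member of the group."""
    return all(clash_grid[module][other] == 0 for other in group)

def group_crossover(clash_grid, parent_a, parent_b, rng=random):
    """Inject a random selection of whole groups from parent_a into parent_b."""
    injected = [list(g) for g in parent_a if rng.random() < 0.5]
    taken = {m for g in injected for m in g}
    child = injected[:]
    orphans = []
    for g in parent_b:
        if taken.isdisjoint(g):
            child.append(list(g))                           # group survives intact
        else:
            orphans.extend(m for m in g if m not in taken)  # group is broken up
    for m in orphans:                                       # first-fit repair
        for g in child:
            if fits(clash_grid, m, g):
                g.append(m)
                break
        else:
            child.append([m])
    return child

# Hypothetical example: two feasible parent groupings of four modules.
clash_grid = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
parent_a = [[0, 2], [1, 3]]
parent_b = [[0, 3], [1], [2]]
print(group_crossover(clash_grid, parent_a, parent_b, random.Random(1)))
```

The point of operating on groups is visible in the code: a group copied from a parent stays intact in the child, whereas an operator working on individual module positions would scatter its members.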

2.7.2 Example

Lewis and Paechter (2007) apply the GGA to a timetabling problem. This problem has a set of hard constraints:

1. No student is required to attend more than one event at any time.

2. Only one event can be in any one room at any timeslot.

3. The room satisfies all requirements of the event.

Their problem also contains soft constraints:

1. No student should attend an event in the last timeslot of a day.

2. No student should sit more than two classes in a row.

3. No student should have a single class a day.

For this problem they specify that it is important that the groups themselves are the building blocks of the problem and not the states of the items. In this case, the items are the events and the groups are the timeslots.

2.7.3 Conclusion

The literature study started as a way of finding algorithms or methods for solving the problem under consideration which could then theoretically be applied to the data of this study and be compared to the method that was developed. The impression was that there would be numerous applicable methods in the literature. An intensive literature study delivered only a few such methods. The methods or problems discussed above are those that seemed most probable for implementation. One of the reasons for the lack of methods may be that most problems can be reduced to one or a combination of the already mentioned, yet impractical, problems and therefore the methods to solve these are not applicable to the problem at hand. The literature study is, however, not quite as unproductive as it would seem at first. The Hill Climbing (Chapter 5) and GGA (Chapter 6) are applicable and will be investigated thoroughly in this study.


Chapter 3

Data Overview

The methods discussed in the succeeding chapters will be applied to data from the NWU, modified for the newest curriculum changes in the yearbook for 2015. A curriculum is a programme of modules spread over a few years that together form a degree. The university allows students certain electives. For this problem the data have been simplified so that two electives taken together are not considered a clash and are given a value of 0, while any main subject together with a choice module is a clash with a value of 1. Any two modules in the same curriculum that are not electives in the same year and semester are seen as a clash and allocated a value of 1. This imposes hard (or absolute) constraints on grouping. The hard constraints are satisfied when the sum of the clash entries within all groups equals 0. The first section gives complete detail about the data that will be used.
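How such clash data can be derived from curricula is sketched below in Python. This is a simplified illustration and not the procedure used to build the datasets of this study: the module labels and curricula are hypothetical, and electives are ignored, so every pair of modules listed together in one year and semester of a curriculum is treated as a clash.

```python
from itertools import combinations

def build_clash_grid(modules, curricula):
    """Return a symmetric 0/1 matrix with a 1 where two modules share a curriculum slot."""
    index = {m: i for i, m in enumerate(modules)}
    n = len(modules)
    grid = [[0] * n for _ in range(n)]
    for slot in curricula:                 # one year and semester of one curriculum
        for a, b in combinations(slot, 2):
            grid[index[a]][index[b]] = 1
            grid[index[b]][index[a]] = 1   # clashes are mutual
    return grid

modules = ["M1", "M2", "M3", "M4"]           # hypothetical module labels
curricula = [["M1", "M2", "M3"], ["M1", "M4"]]
for row in build_clash_grid(modules, curricula):
    print(row)
# [0, 1, 1, 1]
# [1, 0, 1, 0]
# [1, 1, 0, 0]
# [1, 0, 0, 0]
```

The rows and columns refer to the same modules in the same order, so the matrix is symmetric with zeros on the diagonal.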

Such data may occur in many forms and come from different fields. As has been mentioned, this study uses university data for subjects in preallocated curricula. The yearbook of the NWU contains set curricula with fixed modules for each year and each semester. Some of the curricula allow a few alternative module options. In the case of a wedding reception, the modules would be the guests, and clashes based on their food preferences, friendships or backgrounds may be used as data. Data can also represent a diplomatic meeting when allocating seating for all the dignitaries, where the clashes would be long-standing feuds or previous alliances, religions, or their siding on a specific topic under discussion. The modules in this case would be the dignitaries themselves. Yet another example may be the tasks when building a house. Some tasks cannot, by nature, be done together, such as laying the foundation and roofing, but other tasks, like digging the second foundation while laying the first, or painting walls while tiling the roof, may be done simultaneously. The idea would be to maximise the number of grouped tasks, thereby minimising the total time and cost. Although these tasks have an order as well, the order can possibly be worked into valid data.

3.1 Data

This section explains and gives an example of the data to be used. The form in which the data are presented can also be called one-mode block modelling data. One-mode refers to the aspect that there is only one type of module with values in relation to each other; therefore the row and column indices of the matrix are in the same order and refer to the same modules throughout. Block modelling refers to the matrix form in which these data can be presented.


Three sets of data will be used in an effort to adequately evaluate the proposed method and compare the proposed method to the two methods from literature, namely the Hill Climbing and Grouping Genetic Algorithm. The first two sets of data are reasonably small, but contain many discordances. Dataset 3 is large (see Section 3.1.1) but sparse. Dataset 1 (Code 3.1) is a logically chosen sub-set of modules from dataset 3, specifically chosen for the many discordances at first glance. Dataset 2 however, is a set of randomly chosen modules with discordances that may not be as logical. These datasets will not be given in this dissertation due to their size, but are available on request.

Code 3.1: Example of the data used

1 clashGrid = 2 [0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 1 1 1 ; 3 1 0 0 1 1 0 1 0 1 1 0 0 1 0 1 0 1 0 ; 4 0 0 0 1 1 0 1 1 1 0 0 0 0 0 0 1 1 1 ; 5 1 1 1 0 0 0 0 1 0 1 1 1 1 1 1 0 0 0 ; 6 1 1 1 0 0 0 0 1 0 1 1 1 1 1 1 1 0 1 ; 7 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 ; 8 1 1 1 0 0 0 0 1 0 1 1 1 1 1 1 1 0 1 ; 9 0 0 1 1 1 1 1 0 1 0 0 1 0 0 0 0 1 0 ; 10 1 1 1 0 0 1 0 1 0 1 1 1 1 0 1 1 0 1 ; 11 0 1 0 1 1 1 1 0 1 0 1 0 0 1 0 1 1 1 ; 12 0 0 0 1 1 1 1 0 1 1 0 0 1 0 1 0 1 0 ; 13 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 ; 14 0 1 0 1 1 1 1 0 1 0 1 0 0 1 0 1 1 1 ; 15 1 0 0 1 1 1 1 0 0 1 0 0 1 0 1 0 1 0 ; 16 0 1 0 1 1 1 1 0 1 0 1 0 0 1 0 1 1 1 ; 17 1 0 1 0 1 1 1 0 1 1 0 1 1 0 1 0 0 0 ; 18 1 1 1 0 0 1 0 1 0 1 1 1 1 1 1 0 0 0 ; 19 1 0 1 0 1 1 1 0 1 1 0 1 1 0 1 0 0 0 ];

As the first set has many discordances, it was thought to be a good start for testing methods and program functionality. If the method could find a good solution (relative to an idealistic minimum) for this set, the program would be applied to dataset 3, which is much larger but has fewer discordances per module, in order to test the speed of the methods. Dataset 3 contains 693 modules with discordances. Not all modules will have a discordance, while some modules might have many discordances. Although this matrix is rather sparse, it requires more computation time and resources because of its size.

3.1.1 Defining large datasets

In this dissertation, the term “large dataset” is used often. But when is a dataset large, especially with modern-day computers, which can handle exceptional amounts of information in reasonably short times, such as a few hours or at most a week or two? In this study, a dataset was seen as large when it exceeded as few as 60 modules, because of the nature of the optimization problem.


To explain this, a few starting values need to be assigned or decided upon. The values used here are only example values. If the dataset has 60 modules that have to be scheduled into at most 12 groups (12 being a good, though not optimal, number of groups for a normal timetable) and all the possible ways of creating such a grouping are considered, how long will that take? The table below shows a few possible distributions of modules into groups, and the pattern it will follow if continued. These are not solutions, since there is no discordance matrix given. They represent possible ways in which the modules can be distributed over 12 groups, whether viable or not. The last row shows the distribution used for computation purposes, where the modules are evenly distributed between all the groups. There are many ways to group the modules for each of these distributions.

Groups #:           1   2   3   4   5   6   7   8   9  10  11  12
Distribution 1:    60   0   0   0   0   0   0   0   0   0   0   0
Distribution 2:    59   1   0   0   0   0   0   0   0   0   0   0
Distribution 3.1:  58   2   0   0   0   0   0   0   0   0   0   0
Distribution 3.2:  58   1   1   0   0   0   0   0   0   0   0   0
Distribution 4.1:  57   3   0   0   0   0   0   0   0   0   0   0
Distribution 4.2:  57   2   1   0   0   0   0   0   0   0   0   0
Distribution 4.3:  57   1   1   1   0   0   0   0   0   0   0   0
...
Distribution n:     5   5   5   5   5   5   5   5   5   5   5   5

The number of groupings that follow an even distribution starts by selecting 5 modules out of the 60 to place in group one. The order of these modules in the group does not matter and no module can be selected twice. Therefore there are $\binom{60}{5} = 5461512$ different selections for group one. For each of these selections, there are $\binom{55}{5}$ ways to select 5 modules for the second group. Thus there are already

\[ \binom{60}{5}\binom{55}{5} = 5461512 \times 3478761 \approx 1.90 \times 10^{13} \]

ways of placing modules into the first 2 groups. By continuing in this manner, each time selecting 5 modules out of the remaining modules for each of the 12 groups, you get

\[ \binom{60}{5}\binom{55}{5}\binom{50}{5}\binom{45}{5}\binom{40}{5}\binom{35}{5}\binom{30}{5}\binom{25}{5}\binom{20}{5}\binom{15}{5}\binom{10}{5}\binom{5}{5} \]

ways of placing the modules into the twelve groups. However, the order in which you place the groups themselves also does not matter. The number of groupings therefore needs to be divided by $12!$. This gives

\[ \frac{\binom{60}{5}\binom{55}{5}\binom{50}{5}\binom{45}{5}\binom{40}{5}\binom{35}{5}\binom{30}{5}\binom{25}{5}\binom{20}{5}\binom{15}{5}\binom{10}{5}\binom{5}{5}}{12!} \approx 1.9 \times 10^{48} \]

groupings for distribution n, where the modules are evenly distributed. If you can form and evaluate one grouping in 0.001 milliseconds, it will require

\[ \frac{1.9 \times 10^{48} \times 0.001}{1000 \times 3600 \times 24 \times 365 \times 1000} \approx 6 \times 10^{31} \]

millennia to consider all possible groupings for validity. This is only for the distribution where all the modules are evenly distributed.
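The count for the even distribution can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes 3600 seconds per hour, 365 days per year and 0.001 milliseconds ($10^{-6}$ seconds) per grouping.

```python
from math import comb, factorial

def even_groupings(n_modules, n_groups):
    """Ways to split n_modules into n_groups unordered groups of equal size."""
    size, remainder = divmod(n_modules, n_groups)
    if remainder:
        raise ValueError("modules must divide evenly over the groups")
    ways, remaining = 1, n_modules
    for _ in range(n_groups):
        ways *= comb(remaining, size)    # choose the next group from what is left
        remaining -= size
    return ways // factorial(n_groups)   # the order of the groups does not matter

def millennia(groupings, seconds_each):
    """Time needed to evaluate every grouping, expressed in millennia."""
    seconds_per_millennium = 3600 * 24 * 365 * 1000
    return groupings * seconds_each / seconds_per_millennium

count = even_groupings(60, 12)
print(comb(60, 5), comb(55, 5))         # 5461512 3478761
print(f"{count:.2e}")                   # roughly 1.9e48 groupings
print(f"{millennia(count, 1e-6):.1e}")  # roughly 6e31 millennia
```

The same two functions can be reused for other settings, for example `even_groupings(60, 4)` for four groups of 15 modules each.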


The number of groups affects the number of solutions greatly: if the number of groups considered is only 4, the time falls drastically to approximately $3.8 \times 10^{15}$ millennia. This is still an impracticable amount of time, and getting all modules into 4 groups with the added constraint of no clashes is very unlikely. If you have 5 modules that all have discordances with one another, this is already impossible. If the computer you use is very good and needs less than a millionth of a second to create and evaluate a grouping, let us say a billionth of a second (on the long scale), that is $1 \times 10^{-12}$ seconds, it will still take approximately $3.8 \times 10^{9}$ millennia.

Consider also that we only looked at one of the distributions for both 12 and four groups, the one in which the modules can be evenly distributed over the groups. The number of possible solutions will only grow if you compute all the other patterns’ number of solutions and add it to this one as shown here for 12 groups:

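Distribution 1: $\binom{60}{60}$
Distribution 2: $\binom{60}{59}\binom{1}{1}$
Distribution 3.1: $\binom{60}{58}\binom{2}{2}$
Distribution 3.2: $\binom{60}{58}\binom{2}{1}\binom{1}{1} \div 2!$
Distribution 4.1: $\binom{60}{57}\binom{3}{3}$
Distribution 4.2: $\binom{60}{57}\binom{3}{2}\binom{1}{1}$
Distribution 4.3: $\binom{60}{57}\binom{3}{1}\binom{2}{1}\binom{1}{1} \div 3!$
...
Distribution n: $\binom{60}{5}\binom{55}{5}\binom{50}{5}\binom{45}{5}\binom{40}{5}\binom{35}{5}\binom{30}{5}\binom{25}{5}\binom{20}{5}\binom{15}{5}\binom{10}{5}\binom{5}{5} \div 12!$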

If it takes roughly $6 \times 10^{31}$ millennia to consider all the solutions for a dataset of only 60 modules in 12 groups, consider the impossibility of evaluating all the possible solutions for a dataset of nearly 700 modules, as was used in this study. Consider also the added solutions that will need to be considered if there is no set number of groups, and the groups need to change dynamically according to the constraints. It may well be that more than 12 groups are necessary, which further increases the time required.

The results of the programs and the time used, when applied to the data, are presented and compared in Chapter 7. The evaluation of a method is done in Chapter 8 by comparing the number of groups that the method finds as the least number of groups. The time required to find this solution is also important, as is the adaptability of the method to changes in the data format.


Chapter 4

Implementation of the Proposed method

This chapter contains a full discussion of the method developed by the author, referred to as the Proposed method (PM). This method was developed to be used at the NWU for examination scheduling. The use of simulated annealing was one of the ideas proposed by Professor J. Spoelstra. The method itself has since been edited substantially in order to implement it practically and improve upon it. This chapter starts with an overview of the method and algorithm depicting the idea behind the program and how decisions are made. It is followed by a section with a full discussion of the code given in the appendix, with some excerpts included in this chapter where necessary. A discussion of the program, reasons for certain chosen values and other options considered follows.

4.1 Overview of Method

The goal of this method is to create a grouping of modules that uses a minimum number of groups while satisfying the constraints. The input to this method is a one-mode block matrix, as discussed in Chapter 3. The output is a list of positive integers in which the index corresponds to the module and the value corresponds to the group in which it has been placed. This final solution is referred to as the best found solution, since it is the best solution the program is able to produce. Similarly, any solution that is accepted by the method as valid is called a success, or successful solution.

Input: An N × N symmetric matrix, with entries {0,1} called ClashGrid

Output: A vector (also called list and solution) of length N with positive integer entries.

Interpretation: This is the representation of a grouping. The j-th entry indicates the group number in which entry j is placed.
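The input and output formats can be illustrated with a small Python sketch. The four-module clash grid, the helper name `count_clashes` and the use of 0-based module indices are assumptions made for illustration; they are not taken from the thesis code, which is written in MATLAB.

```python
def count_clashes(clash_grid, grouping):
    """Count the pairs of modules that clash and share a group."""
    n = len(grouping)
    clashes = 0
    for i in range(n):
        for j in range(i + 1, n):
            if grouping[i] == grouping[j] and clash_grid[i][j] == 1:
                clashes += 1
    return clashes


# Hypothetical symmetric {0,1} ClashGrid for four modules:
# module 0 clashes with module 1, and module 1 clashes with module 2.
clash_grid = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# Grouping vector: entry j holds the group number of module j.
grouping = [1, 2, 1, 1]

print(count_clashes(clash_grid, grouping))      # valid grouping, no clashes
print(count_clashes(clash_grid, [1, 1, 2, 2]))  # modules 0 and 1 share group 1
```

A grouping is valid for the clash constraint exactly when this count is zero.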

Programs:

• OptimizationProgram: This program is responsible for calling all the necessary procedures and contains the loop-structures.


– Input: The N × N matrix

– Output: The best found grouping list

• initialiseGreedyList: This program constructs an initial list using the greedy algorithm.

– Input: The N × N matrix

– Output: A re-ordered N × N matrix ; an initial grouping list

• BadnessCalculator: This program computes a corresponding badness value for a given list.

– Input: A grouping list ; the N × N matrix ; the number of preferred groups

– Output: A corresponding badness value ; the number of groups

• RandomChanger: This program changes a given list using random number generation.

– Input: A grouping list ; the iteration of the loop where this procedure is called

– Output: A new grouping list

• AcceptanceFunction: This program decides if a given list is accepted or not

– Input: The profit between the last and current badness values ; a control parameter that defines the strictness

– Output: A {0,1} - yes/no answer
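As an illustration of the greedy (first-fit) initialisation named in the list above, the following Python sketch places each module in the first group that contains none of its clashing modules and opens a new group when no such group exists. It is a minimal sketch of the textbook first-fit rule, not the thesis's initialiseGreedyList: the function name `first_fit_grouping` is hypothetical, modules are processed in their given order, and the re-ordering of the matrix mentioned above is not reproduced.

```python
def first_fit_grouping(clash_grid):
    """Build an initial grouping list with the first-fit rule."""
    n = len(clash_grid)
    grouping = [0] * n
    groups = []  # groups[g] lists the modules already placed in group g + 1
    for module in range(n):
        for g, members in enumerate(groups):
            # The module fits if it clashes with no member of this group.
            if all(clash_grid[module][other] == 0 for other in members):
                members.append(module)
                grouping[module] = g + 1
                break
        else:
            # No existing group is free of clashes, so open a new one.
            groups.append([module])
            grouping[module] = len(groups)
    return grouping


# Hypothetical clash grid: module 0 clashes with 1, and module 1 clashes with 2.
example_grid = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(first_fit_grouping(example_grid))
```

The result of first-fit depends on the order in which modules are visited, which is one reason an ordering step before the greedy pass can matter.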

The method starts from an initial list generated by the greedy (or first-fit) algorithm (Section 4.4, p. 45). The proposed method then attempts to improve this initial solution through a procedure that uses random number generation (Section 4.6, p. 48). After each iteration of random number generation, the solution's badness is calculated as an indication of how good or bad this solution is in relation to the criteria (Section 4.5, p. 47). The previously accepted solution's badness is subtracted from this solution's badness to compute the profit. A decision criterion from simulated annealing is then used: if the profit is acceptable, the new solution is accepted; otherwise the method iterates again from the previous solution (Section 4.7, p. 49).
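The acceptance step can be sketched in Python as follows. The exact decision function used by the Proposed method is not given in this overview, so the sketch assumes the standard Metropolis rule from simulated annealing; the function name `accept` and its parameter names are hypothetical. The profit is taken, as described above, to be the new badness minus the previously accepted badness, so a negative profit is an improvement.

```python
import math
import random


def accept(profit, control, rng=random):
    """Return 1 if the new solution is accepted and 0 otherwise.

    profit  : new badness minus previously accepted badness
    control : positive control parameter; smaller values are stricter
    rng     : source of random numbers (assumption: any object with random())
    """
    if control <= 0:
        raise ValueError("control must be positive")
    if profit <= 0:
        # The new solution is no worse, so it is always accepted.
        return 1
    # A worse solution is accepted with a probability that shrinks as the
    # profit grows or as the control parameter is made stricter.
    return 1 if rng.random() < math.exp(-profit / control) else 0


print(accept(-2.0, 0.5))                    # improvement, always accepted
print(accept(3.0, 0.5, random.Random(1)))   # worse solution, accepted only by chance
```

Lowering `control` over the course of the run makes worse solutions progressively less likely to be accepted, which corresponds to making the decision stricter.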


4.2 Algorithm

A pseudo-code overview is given in Algorithm 1 below, followed by the detailed discussion in Section 4.3. The full code of the program is given in the appendix, Code 2.

Algorithm 1 Proposed method

1: n = numberOfModules
2: Apply the greedy algorithm to initialise a list
3: for 5 × do
4:     Start from the best solution yet
5:     if Only one clash then
6:         See if one of the clashing modules can be moved to any other group to avoid a clash
7:     end if
8:     if One or more clashes then
9:         Add a group
10:     end if
11:     for n × 50 do
12:         while (# iterations <= maxIterations) and (# successful solutions <= threshold) do
13:             Choose strategy 1 or 2 with a 1:2 ratio
14:             if Strategy one is chosen then
15:                 Remove one group, adding its modules to the others
16:                 Call RandomChanger
17:                 Compute the badness of the new solution using BadnessCalculator
18:                 Use AcceptanceFunction to decide if the solution is accepted or not
19:                 Test newest solution against best, and replace if better
20:             end if
21:             if Strategy two is chosen then
22:                 Call RandomChanger
23:                 Compute the badness of the new solution using BadnessCalculator
24:                 Use AcceptanceFunction to decide if the solution is accepted or not
25:                 Test newest solution against best, and replace if better
26:             end if
27:         end while
28:         if No viable solution was found then
29:             Increase the count of failures by one
30:         end if
31:         if Three failures have been reached then
32:             Break the for loop
33:         end if
34:         Make the variable used in the decision stricter
35:     end for
36: end for

4.3 Discussion of Procedure - OptimizationProgram

The first program under discussion is the program that calls all the other functions. This program is called the optimisation program, as it contains the iterations necessary for the optimisation procedures. The following sections discuss the reasons why certain values were chosen and why the loops are used as they are. This program receives the block matrix as input and, at the very end, produces the best found solution as output.

Code 4.1: Excerpt from: Proposed method - OptimisationProgram.m

...
numberOfWhiles = numberModules*100;
numberOfMaxSuccesses = numberModules*10;
statsControlParameterFactor = 0.6;
