Automatically providing feedback on the modularization quality of student programs

Jeroen Terstall

10766030

Bachelor thesis Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie University of Amsterdam

Faculty of Science Science Park 904 1098 XH Amsterdam

Supervisors

dhr. drs. M.S. (Martijn) Stegeman, Jelle van Assema Faculty of Science

University of Amsterdam Science Park 904 1098 XH Amsterdam

Abstract

Providing feedback on the code quality of student programs is time-consuming. This research aimed to use the coupling metrics ICP and CCBC to provide automatic feedback on the modularization quality of student programs made in introductory programming courses. A tool was created that implements ICP and CCBC and provides feedback. The correctness of the feedback was determined through comparison with feedback given by TAs. The usefulness of the feedback was determined by observing students’ ability to improve code with the assistance of the feedback. Students were able to improve code based on feedback provided by the tool. The tool is able to provide correct feedback on merge and move opportunities of classes and files. However, opportunities to split classes proved to be more difficult to provide feedback on.

1 Introduction

In addition to being graded on the correctness of their programs, students of programming courses are judged on the quality of their code. Code quality is an aspect of software quality that concerns directly observable properties of source code [15]. However, providing feedback on code quality requires a significant amount of time. This research aimed to reduce the amount of time required by creating a tool that automatically provides feedback on one aspect of code quality, modularization. The purpose of the tool was to help students of introductory programming courses improve the modularization quality of their code. This research focused on students of introductory programming courses because the means to evaluate and receive feedback at any stage of development will be most beneficial to novice programmers. These students recently started programming and do not possess the programming skills and experience to write well-modularized software.

Similar research focused on assessing the modularization quality in student programs by observing unwanted dependencies [3]. However, no research has attempted to provide automatic feedback on the modularization quality of student programs. It is important that research is conducted to explore methods to provide automatic feedback on modularization quality. If the research is successful, the amount of time spent providing feedback on code quality will be reduced. Additionally, students will gain the means to receive feedback on their work prior to final grading. Lastly, this research will identify if methods used to assess modularization quality can be utilized to provide feedback.

The structure of the thesis is as follows. First, the theoretical background of the research is discussed. Then, the research question is formulated based on the found literature. Next, the method to answer the research question is outlined. Afterwards, the results of the chosen methodology are shown. Subsequently, the discussion analyzes the observed results. Then, the research question and sub-questions are answered in the conclusion. Finally, directions for future work are suggested.

2 Theoretical Foundation

2.1 Coupling

Research defines a well-modularized program as having two properties: high cohesion and low coupling [13]. Coupling is defined as the measure of strength of interconnection between modules [17]. The degree of coupling determines the probability that a modification in one module affects another module. Cohesion is the degree of relatedness between internal components of a module [16]. To narrow down the scope of this research, coupling was chosen as the means to provide feedback on student programs. Coupling was chosen over cohesion because no cohesion metric currently exists that captures all aspects of cohesion [1].

Several books on software design have mentioned coupling. Budgen [2] described coupling as a useful quality measure for assessing the extent of modular structure in software. Martin [8] stated that loose coupling isolates elements of systems to allow for change to be isolated. McConnell [9] stated that modules should be coupled in such a way that they can be reused effortlessly.

The degree of coupling can be measured by metrics. Coupling metrics can be divided into two categories: conceptual and structural coupling metrics [12]. Structural coupling metrics use the structure of a program as the basis for measurement. Conceptual coupling metrics capture semantic information stored in the source code of modules and compare it to that of other modules. Conceptual coupling metrics aim to determine the relatedness of the tasks that modules perform. The closer the similarity in semantic information stored in modules, the more conceptually related their tasks.

2.2 Structural Coupling Metrics

Several metrics were found that measure the degree of structural coupling in a program. Coupling between objects (CBO) [4] and direct class coupling (DCC) [10] determine the number of classes that are coupled by method invocations or shared variable instances. Response for a class (RFC) [4] determines the set of methods called in response to a received message. Message passing coupling (MPC) [7] determines the number of external method calls made in a module. Data abstraction coupling (DAC) [7] determines the number of attributes in a class that have an external class as their type. Lastly, information-flow-based coupling (ICP) [6] calculates the number of method calls to external modules weighted by the number of parameters passed.

2.3 Conceptual Coupling Metrics

Conceptual coupling is measured using conceptual coupling between classes (CCBC) as defined in the paper by Bavota et al. [1]. CCBC vectorizes the source code of each method in a module using Latent Semantic Indexing (LSI). Then, the conceptual similarity is determined by calculating the cosine of the angle between the vectors for all combinations of methods between two modules. The higher the conceptual similarity, the more likely it is that the modules perform similar tasks.

2.4 Choosing a Coupling Metric

The coupling metrics described were ranked in a paper by Poshyvanyk et al. [12]. The coupling metrics were compared on how well they perform change impact analysis. The metrics were used to predict classes which could be changed in a large-scale open source software system. If a class needed to be changed, the perceived coupling to other classes was calculated to determine the impact a change in the class would have. The results were evaluated by comparing the suggested classes to the history of changes made in the open source software system.

The results can be utilized to determine the usefulness of coupling metrics for this research. This research aimed to retrieve the degree of coupling of student programs to determine if they should be refactored. The research by Poshyvanyk et al. detected classes which change together as a result of the degree of coupling. The motivation for determining the degree of coupling is different but the implementation is not. Because of this, the results can be utilized to choose the metrics for this research. The results indicated that CCBC performed the best out of the conceptual coupling metrics and ICP out of the structural coupling metrics. These metrics were chosen to create a tool to provide automatic feedback on modularization quality.

2.5 Feedback

Feedback can have a positive or negative impact on the improvement of a student. Constructive feedback can aid the student in gaining knowledge, while bad feedback will not promote learning [5]. For a student to learn from feedback, it should state three things: how a task should ideally be performed, the current level in relation to ideal performance, and the means to close the “gap” between the two levels [14]. For this research, this was interpreted as follows. Feedback on modularization quality should state three things: the ideal structure of a program, the current areas that need improvement, and how the student can improve the modularization quality of the program.

2.6 Related Work

One paper was found that researched how to assess the quality of modularization in student programs. Research by Cai et al. [3] utilized a design structure matrix (DSM) to visualize the ideal modular structure of an assignment. The DSM was compared to the student implementation to detect unwanted dependencies. The research concluded that 74% of the 85 student submissions introduced unwanted dependencies.

However, DSMs are not suited to provide automatic feedback. A DSM does not allow for alternative design choices with high modularization quality. In larger assignments, several ways exist to structure a project, and it is hard to capture all of these structures using a DSM. Additionally, DSMs identify differences with the proposed design. This means that feedback can solely be given on the student program in relation to the proposed design. General feedback on the structure of the program, which is correct regardless of implementation, cannot be given. Because of this, an alternative measure of modularization quality was chosen.

3 Research Question

The goal of this research was to create a tool that automatically provides constructive feedback on the modularization quality of student programs made in introductory programming courses. The literature indicated that high cohesion and low coupling are properties of a well-modularized program. Measuring the degree of coupling was chosen to assess the modularization quality of student programs. The selected coupling metrics were ICP and CCBC. The research question was formulated as follows:

How can ICP and CCBC contribute to automatically providing constructive feedback on the modularization quality of code made by novice programming students?

To assist in answering the research question, the following sub-questions were defined:

Can ICP and CCBC be used to provide correct feedback on code made by novice programming students?

Can ICP and CCBC be used to provide useful feedback on code made by novice programming students?

4 Method

4.1 Method Outline

To answer the research question and its sub-questions, the following approach was taken. First, extensive research of the existing literature was conducted. Based on the found literature, metrics to measure the degree of coupling were chosen. A tool that utilizes the metrics was created as a proof-of-concept to provide feedback based on correlations between values of the metrics and observed mistakes in student programs. The correctness of feedback provided by the tool was evaluated with the assistance of TAs. The usefulness of the feedback provided by the tool was evaluated with the assistance of students.

4.2 Student Programs Used

The student programs that the tool was tailored towards were assignments from a heuristics course, written in Python. In this course, students chose problems to solve using heuristics. Heuristics are problem-solving techniques that aim to solve a problem within a reasonable time frame. These students had taken prior programming courses. However, this was the first course in which the students were required to structure a larger-scale project and make use of modularization. For this reason, student programs made for this course were selected to be used in this research. In total, 91 student programs were made available.

4.3 Analyzing the Source Code

Both coupling metrics required analysis of the source code. Source code can be analyzed dynamically or statically. Dynamic analysis of source code is performed at runtime, while static analysis does not require the code to be run. Dynamic analysis of the source code was used to detect dependencies between modules needed for the ICP metric. Dynamic analysis was chosen for this task because module names can be given aliases and can be renamed in Python. Determining which module a function belongs to is difficult without dynamically examining the stack.
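The aliasing problem described above can be illustrated with a small sketch. It is not taken from the tool; the standard-library json module and the names j, serialize, and encode are assumptions chosen purely for illustration. None of the three call names reveals the defining module in the source text, but the function object itself reports it at runtime.

```python
import json as j                      # module imported under an alias
from json import dumps as serialize   # function imported under an alias

encode = j.dumps                      # the same function bound to a third name

# The source text never spells "json.dumps" for two of these names, yet at
# runtime every function object still records the module that defined it.
origins = {
    name: func.__module__
    for name, func in [("j.dumps", j.dumps),
                       ("serialize", serialize),
                       ("encode", encode)]
}
print(origins)
```

A static analysis would have to follow every import and assignment to reach the same conclusion, which is the difficulty that motivates the dynamic approach.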

Static analysis of the source code was utilized to implement the CCBC metric. Because student programs are often inefficient, dynamic analysis of the source code is more time-intensive than static analysis. For this reason, static analysis was chosen in all cases where it did not lead to a significantly more difficult implementation.

Static analysis was carried out using an open source Python library named RedBaron [11]. RedBaron is a Python library which programmatically parses source code into an abstract syntax tree (AST). RedBaron is able to modify the AST and return modified source code. Other libraries exist that can modify the AST. However, these libraries do not return modified source code. Additionally, RedBaron can be utilized to retrieve source code from various nodes in the AST. These nodes represent various types of code. An example of a node is a DefNode, which represents a function or method. Queries can be done in RedBaron to retrieve all nodes of a certain type and retrieve the corresponding source code.
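The kind of query described above, retrieving every function or method together with its source code, can be sketched as follows. To keep the example free of third-party dependencies it uses Python's standard-library ast module instead of RedBaron, so it shows the idea rather than the tool's actual calls. The sample source and the names Car, move, and solve are hypothetical.

```python
import ast

# A hypothetical student file containing one method and one function.
source = '''
class Car:
    def move(self, steps):
        return steps

def solve(board):
    return board
'''

tree = ast.parse(source)

# Collect the source text of every function or method definition,
# keyed by its name (the role a DefNode query plays in RedBaron).
functions = {
    node.name: ast.get_source_segment(source, node)
    for node in ast.walk(tree)
    if isinstance(node, ast.FunctionDef)
}

for name, text in functions.items():
    print(name, "->", text.splitlines()[0])
```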

RedBaron is not able to parse all student programs. Because of this, no feedback can be given on several student programs, and these were excluded from the research. Out of the 91 student programs, 47 could be parsed by RedBaron.

4.4 Implementation of Coupling Metrics

This section will discuss the implementation of ICP and CCBC.

4.4.1 Information-flow-based Coupling (ICP)

ICP, as defined by Lee et al. [6], measures structural coupling by detecting method and function calls made between two modules. The amount of information that flows through these calls determines the degree of coupling between two modules. The information flow is measured by counting the number of parameters passed in these function and method calls. The following formula is used to calculate the ICP between two classes $c_i$ and $c_j$. The same formula applies to files.

$$\mathrm{ICP}_{i \to j} = \sum_{k=1}^{|\mathrm{calls}(c_i, c_j)|} p(\mathrm{call}(c_i, c_j)_k)$$

where $p(\mathrm{call}(c_i, c_j)_k)$ is the number of parameters in the $k$-th call from $c_i$ to $c_j$.

To prevent loops from inflating the total value of ICP, every function or method call is counted once.
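As a small worked example of the formula and the counting rule above, the sketch below sums the parameter counts of a hypothetical list of observed calls from one module to another. The call-site labels and parameter counts are invented for illustration. Identifying a call by its call site is an assumption about how "counted once" is applied.

```python
def icp(observed_calls):
    """Sum parameter counts over distinct call sites.

    observed_calls: iterable of (call_site, parameter_count) pairs for
    calls made from module c_i to module c_j.
    """
    counted = {}
    for site, parameter_count in observed_calls:
        # A call site executed repeatedly (e.g. in a loop) is kept once.
        counted.setdefault(site, parameter_count)
    return sum(counted.values())


# Hypothetical trace: the call on line 12 runs twice inside a loop.
observed = [
    ("solver.py:12", 2),
    ("solver.py:12", 2),
    ("solver.py:30", 3),
    ("solver.py:41", 0),
]
print(icp(observed))  # 2 + 3 + 0 = 5
```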

To retrieve the parameters passed in function and method calls between each module, a Python decorator was used. Python decorators were chosen because decorators allow for functionality to be added to a function or method without modifying the original source code. The decorator is added to every DefNode in the student program using RedBaron. This allows for examining the stack when a function or method is called to determine the information of the call. This information reveals the module to which the calling function belongs and the module to which the called function belongs. If these two modules differ, the line number is retrieved to determine if the function call has been counted before. If the modules are different and the function call has not been counted before, the number of parameters is added to the ICP value.
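A minimal sketch of such a decorator is given below. It is not the tool's implementation: the names track_calls, icp, and counted_sites are hypothetical, and the two throwaway modules board and solver merely stand in for two student files. The sketch inspects the caller's stack frame, compares modules, and counts each call site once.

```python
import functools
import inspect
import types

icp = {}               # (caller module, callee module) -> accumulated ICP
counted_sites = set()  # call sites that have already contributed


def track_calls(func):
    """Hypothetical decorator that records cross-module calls to func."""
    callee = func.__module__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The frame one level up the stack belongs to the calling function.
        caller_frame = inspect.currentframe().f_back
        caller = caller_frame.f_globals.get("__name__")
        site = (caller, caller_frame.f_lineno, func.__qualname__)
        if caller != callee and site not in counted_sites:
            counted_sites.add(site)
            # Assumption: every passed argument counts as one parameter
            # (for methods, args also contains self).
            key = (caller, callee)
            icp[key] = icp.get(key, 0) + len(args) + len(kwargs)
        return func(*args, **kwargs)

    return wrapper


# Two throwaway modules stand in for two student files.
board = types.ModuleType("board")
exec("def area(width, height):\n    return width * height\n", board.__dict__)
board.area = track_calls(board.area)

solver = types.ModuleType("solver")
solver.area = board.area
exec(
    "def run():\n"
    "    for _ in range(3):\n"
    "        area(2, 3)\n"      # executed three times, counted once
    "    return area(4, 5)\n",  # a second, distinct call site
    solver.__dict__,
)
result = solver.run()
print(icp)
```

The call inside the loop contributes its two parameters only once, and the call on the following line contributes two more, so the ICP from solver to board is 4.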

4.4.2 Conceptual Coupling Between Classes (CCBC)

CCBC, as defined by Bavota et al. [1], is based on semantic information captured by identifiers and comments in source code. Originally, this metric is used to measure the degree of conceptual similarity between two classes. However, CCBC was additionally used to measure the conceptual similarity between two files. This is done because some students structure their programs in files instead of classes.

To calculate CCBC, a measure of the conceptual similarity between methods is needed: conceptual coupling between methods (CCM). To measure CCM, comments and identifiers are retrieved from functions and methods using RedBaron nodes. Latent semantic indexing (LSI) is used to create a term-by-document matrix that captures the distribution of comments and identifiers used in each method. LSI vectors are created using the TfidfVectorizer and TruncatedSVD classes from the scikit-learn open source machine learning library. To measure the similarity between two methods, the cosine of the angle between the vectors is calculated as follows.

$$\mathrm{CCM}(m_i, m_j) = \frac{\vec{m}_i \cdot \vec{m}_j}{\|\vec{m}_i\| \cdot \|\vec{m}_j\|}$$

where $\vec{m}_i$ and $\vec{m}_j$ are the vectors created using LSI of the methods $i$ and $j$ respectively, and $\|\vec{m}_i\|$ and $\|\vec{m}_j\|$ are the Euclidean norms of the vectors of method $i$ and $j$ respectively. A value between zero and one is returned by CCM. The closer the value is to one, the more similar two methods are in the tasks they perform. CCBC is calculated by averaging the CCM of all method combinations between two classes or files. This formula is defined as follows.

$$\mathrm{CCBC}(c_i, c_j) = \frac{\sum_{m_h \in c_i} \sum_{m_k \in c_j} \mathrm{CCM}(m_h, m_k)}{|c_i| \times |c_j|}$$

where $|c_i|$ and $|c_j|$ are the numbers of methods in $c_i$ and $c_j$ respectively.
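The two formulas above can be sketched with the scikit-learn classes named in the text. This is an illustrative sketch under stated assumptions, not the tool's code. Each method is assumed to be already reduced to a string of its identifiers and comments, the function name ccbc and the parameter n_components=2 are hypothetical choices, and the LSI space is fitted on the methods of the two modules only.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def ccbc(methods_a, methods_b, n_components=2):
    """Average CCM over all method pairs between two classes or files."""
    documents = list(methods_a) + list(methods_b)
    # Term-by-document weights, then projection into the LSI space.
    tfidf = TfidfVectorizer().fit_transform(documents)
    lsi = TruncatedSVD(n_components=n_components,
                       random_state=0).fit_transform(tfidf)
    vectors_a = lsi[: len(methods_a)]
    vectors_b = lsi[len(methods_a):]
    # ccm[h, k] is the cosine between method h of a and method k of b.
    ccm = cosine_similarity(vectors_a, vectors_b)
    return float(ccm.mean())


# Hypothetical identifier/comment text for the methods of three classes.
car = ["move car left on grid", "move car right on grid"]
car_copy = list(car)
queue = ["push item onto priority queue", "pop item from priority queue"]

similar = ccbc(car, car_copy)
dissimilar = ccbc(car, queue)
print(round(similar, 2), round(dissimilar, 2))
```

In this toy setting, the duplicated class yields a CCBC well above the value obtained for the unrelated class, whose vocabulary does not overlap with the first.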

4.5 Feedback Categories

For each program, at most four results are retrieved, namely ICP between classes, ICP between files, CCBC between files, and CCBC between classes. Analysis of the code was done after the coupling metrics were implemented. For each student program, the values of the metrics were observed and the code was examined for areas of improvement. Based on correlations between values of the metrics and corresponding improvements, thresholds were selected to determine the feedback that should be given. The following categories, as shown in table 1, were defined as a result of the analysis of the student programs.

Feedback Category                           Located in       Indicates
A. Extremely High CCBC files                -                Duplicate code
B. High CCBC files                          -                Similar tasks
C. Extremely High CCBC classes              Same file        Merge possibility
D. Relatively high ICP and CCBC classes     Separate files   Dependent & similar
E. Relatively high ICP and CCBC files       -                Dependent & similar
F. High ICP and High CCBC classes           Separate files   Extremely coupled
G. High ICP classes                         Separate files   Too dependent
H. High CCBC classes                        Separate files   Similar tasks
I. Extremely High CCBC classes              Separate files   Duplicate code
J. Low CCBC and ICP classes                 Same file        Unrelated
K. Low Average CCBC and ICP classes         Same file        Multiple unrelated

Table 1: Feedback Categories

Category A: Extremely High CCBC values between files

An extremely high CCBC value (CCBC > 0.75) indicated that duplicate code was found between two files. Based on this observation, the following feedback was given:

The following combination(s) of files seem to contain a lot of duplicate code between them. It would improve the maintainability of your code to create a new file which will contain the duplicate code and which will execute the task that the duplicate code executes. This ensures that this particular piece of code will be reusable and modifiable for multiple files. Aim to create files in which every file has its own separate task. An alternative solution, would be to merge the files into one.

+ List of all file combinations where this feedback applies

Category B: High CCBC values between files

A high (CCBC > 0.45) but not extremely high CCBC value (CCBC <= 0.75) indicated that some tasks performed in the files were similar and could be outsourced to a separate file:

The following combination(s) of files seem to contain parts of code between them that perform similar tasks. It would improve the maintainability of your code to identify these tasks and create a new file or class that will exclusively execute this task which will then be reusable by multiple files. Aim to create files where every file has its own well-defined task.

Category C: Extremely high CCBC values between classes located in the same file

An extremely high CCBC value between classes (CCBC > 0.75) located in the same file indicated that both classes performed (nearly) the same task and most likely contained duplicate code between them:

The following combination(s) of classes which are located in the same file seem to contain a lot of duplicate code or perform (nearly) the exact same task. It would be better to merge these two classes into one class to improve the modularization of your code. Ensuring that each class has only one well-defined task that it performs will make sure that your code will be easily understood by others and is easily maintainable.

+ List of all class combinations where this feedback applies

Category D: Relatively high ICP and CCBC values between classes located in separate files

When the CCBC and ICP values were relatively high but not high enough to indicate problems on their own and the classes were located in separate files (0.25 <= CCBC <= 0.45, 15 <= ICP <= 100), it indicated that the classes should share the same file location because they performed similar tasks:

The following combination(s) of classes seem to be performing similar tasks and are very dependent on each other, which means that methods from one class calls methods from the other class excessively. Because they are located in different files, it means that changes to one class can not be made without having an effect on a different file. If structurally possible, it would be best to move these classes into the same file to ensure that dependencies between files is limited and that each file contains classes that perform tasks which are similar. This ensures that your program will be easily modifiable and maintainable.

+ List of all class combinations where this feedback applies

Category E: Relatively high ICP and CCBC values between files

The same applied to files which had relatively high values in both metrics (0.25 <= CCBC <= 0.45, 15 <= ICP <= 100). These files seemed to execute similar tasks and were highly dependent. The files should be merged if possible or rewritten in some way:

The following combination(s) of files seem to perform similar tasks and are very dependent on each other. If possible, it would be best to merge these files into one file or to rewrite them in such a way that each file performs a well-defined task and is not dependent on other files. This ensures that your code can easily be modified without affecting other files and it will ensure that the structure of your program is easily understood by other developers.

Category F: High ICP and CCBC values between classes located in separate files

A high ICP value between classes (ICP > 100) indicated that these classes should be moved to the same file location to localize dependencies. The CCBC value was taken into account. If the same classes had an extremely high CCBC value (CCBC > 0.75), the classes should be merged into one class:

The following combination(s) of classes seem to be too dependent on each other to function without the other. These classes also seem to perform (nearly) the same task or contain duplicate code between them. To improve the maintainability of your code, it would be better to move these classes into the same file so that dependencies are limited to one file. If possible, it would be best to merge these classes into a single class to make sure that each class performs its own well-defined task.

Category G: High ICP value between classes located in separate files

A high ICP value (ICP > 100) without an extremely high CCBC value (CCBC <= 0.75) resulted in the following feedback:

The following combination(s) of classes are extremely dependent on each other. To improve the maintainability of your code, it would be best to move these classes into the same file so that dependencies are limited to classes in the same file and changes can be made to these classes without affecting other parts of your program.

Category H: Extremely High CCBC values between classes located in separate files

An extremely high CCBC value (CCBC > 0.75) indicated that classes contained duplicate code or performed (nearly) the same task. The classes should be merged into one class:

The following combination(s) of classes seem to perform the same task conceptually and are located in different files. This means that either these classes contain a lot of duplicate code between them or these classes perform (nearly) the same task. To improve the maintainability of your code, it would be best to move these classes into the same file and if possible, to also merge them into a single class. This will ensure that classes with similar tasks are grouped together and that each class has its own well-defined task.

Category I: High CCBC values between classes located in separate files

A high (CCBC > 0.45) but not extremely high (CCBC <= 0.75) CCBC value indicated that the classes should be moved into the same file:

The following combination(s) of classes seem to perform similar tasks and are located in separate files. It would be best to move these classes into one file to make sure that each class with a similar task is grouped in a single file which will improve how easily your code is understood by others.

Category J: Low Average CCBC and ICP values between classes located in the same file

Low average coupling between all combinations of classes in the file (CCBC <= 0.12, ICP <= 10) indicated that the classes in the file should be split into separate files:

Classes in filename.py are not dependent on each other nor do they seem to perform similar tasks. If the structure of your program allows it, it would be better to split the classes in this file into separate files. Aim to create files which contain classes that have similar tasks or which work towards the same goal. Each file should have a well-defined task that it executes and the classes should reflect that. Class combination(s) that can be put in separate files are:

+ List of class combinations which can be split from each other

Category K: Low CCBC and ICP values of classes located in the same file

Low coupling between classes (CCBC <= 0.12, ICP <= 10) but not a low average coupling in the file occasionally indicated that classes could be split. More careful feedback was given than when low average coupling was observed. The feedback indicated that changes could be made if the structure of the program allows it:

The following class combinations(s) seem to perform dissimilar tasks and are not dependent on each other but are located in the same file. If possible, it would be better to split these classes into separate files. Only do this if doesn’t interfere with dependencies of other classes in the file. Aim to create files that contain classes that perform similar tasks and which are not too dependent on classes in other files.
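The thresholds stated in the class-level categories above can be summarized in a short sketch. The function names and the returned descriptions are hypothetical, and the order in which the conditions are checked is an assumption, since the text does not state which category takes precedence when several apply.

```python
def classify_class_pair(ccbc, icp, same_file):
    """Return a short description of the feedback for one pair of classes,
    or None when no threshold from the text is met."""
    if same_file:
        if ccbc > 0.75:
            return "merge classes (extremely high CCBC)"
        if ccbc <= 0.12 and icp <= 10:
            return "consider splitting into separate files (low coupling)"
        return None
    # Classes located in separate files.
    if icp > 100 and ccbc > 0.75:
        return "move to same file and merge (high ICP, extremely high CCBC)"
    if icp > 100:
        return "move to same file (high ICP)"
    if ccbc > 0.75:
        return "move to same file and merge (extremely high CCBC)"
    if ccbc > 0.45:
        return "move to same file (high CCBC)"
    if 0.25 <= ccbc <= 0.45 and 15 <= icp <= 100:
        return "move to same file (relatively high ICP and CCBC)"
    return None


def low_average_coupling(pairs):
    """pairs: (ccbc, icp) for every combination of classes in one file."""
    if not pairs:
        return False
    mean_ccbc = sum(ccbc for ccbc, _ in pairs) / len(pairs)
    mean_icp = sum(icp for _, icp in pairs) / len(pairs)
    return mean_ccbc <= 0.12 and mean_icp <= 10


print(classify_class_pair(0.80, 5, same_file=True))
print(classify_class_pair(0.30, 40, same_file=False))
print(low_average_coupling([(0.05, 2), (0.10, 8), (0.15, 4)]))
```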

4.6 Evaluation

The evaluation aimed to assess the correctness and usefulness of the feedback provided. The correctness of the feedback was determined through its observed similarity with feedback given by TAs. The usefulness of the feedback was assessed through its observed usefulness in assisting students in improving their code and expanding their knowledge.

4.6.1 Teaching Assistant Evaluation

The first part of the evaluation session aimed to determine if feedback provided by the tool was similar to feedback given by TAs. The similarity of the feedback provided was determined using the following aspects.

• If the TA has provided feedback on the same files or classes

• Similarity in reasons for giving the feedback

• Similarity in the suggested changes

• Similarity in motivation given, the benefit of the suggested change

The evaluation proceeded as follows. Three TAs were shown three student programs originating from the heuristics course that the tool was tailored towards. The selected student programs all received at least one piece of feedback from the tool. Additionally, the three programs were selected so that together they covered as many feedback categories as possible.

The student programs contained the following categories of feedback. The first program contained cases of extremely high CCBC values between classes (category I), high CCBC values between classes (H), and low average CCBC and ICP between classes (K). The second program contained cases of extremely high CCBC between files (A), high CCBC between files (B), and low CCBC and ICP between classes (J). The final program was selected because it was a program in which all code was stored in one file. This program contained a case of low CCBC and ICP between classes (J).

The feedback given on these student programs was not shown to the TAs. With permission of the TAs, the evaluation session was recorded. The TAs were asked to provide feedback on the student programs with the following instructions:

“You will be shown three student programs from the heuristics course, written in Python. It is your job to provide feedback on the following aspect. How well the design principle, separation of concerns, is applied. Separation of concerns as I perceive it, is how well the program is divided into classes and files in such a way that each file and class performs its own well-defined task and each file or class is not too dependent on other files and classes. Try to articulate your thought process out loud.”

Additional questions were asked throughout the session to ensure that the aspects required to judge the similarity of the feedback were named by the TAs.

4.6.2 Student Evaluation

The second part of the evaluation session aimed to determine if feedback given by the tool can be used by students to improve the modularization quality of code. Specifically, the following aspects were judged:

• The students’ understanding of why the feedback was given.

• The students’ ability to identify the specific area to which the feedback applies

• The students’ ability to improve the code after reading the feedback

• The students’ understanding of the motivation behind the change

In addition to these aspects, the students were asked about their overall impressions of the feedback after the evaluation session. The evaluation session was conducted as follows. Two programming students were selected from a programming minor course. These students were provided with three partial student programs from the heuristics course. Each partial student program contained one to two files and was provided with feedback given by the tool. The student programs were a subset of the student programs which were shown to the TAs. Each piece of feedback originated from a different feedback category. The selected categories were A, C, J, and K: two cases of high coupling and two cases of low coupling.

The students received the following instructions and were shown the partial student programs with feedback afterwards:

“You will be shown a selection of Python files and a piece of feedback which is given on those files. This feedback is specifically aimed at judging how well the code is split into files and classes. I ask you to read the feedback and study the files. Then, I wish to know two things. Why do you think this feedback is given on this piece of code? And, do you have any idea how you would improve the code after reading the feedback? Try to articulate your thought process out loud.”

Additional questions were asked throughout the session to ensure that enough information was retrieved from the session. The session was recorded with permission granted by the students.

5 Results

5.1 Teaching Assistant Evaluation

5.1.1 First TA Evaluation Program

The general structure of this program is illustrated in appendix A. Every file in the program contained a Car class. The first part of the feedback indicated that these Car classes had extremely high CCBC values between them. This indicated that they performed nearly the exact same task and should be merged and moved into the same file. Multiple files contained a PQueue class. The second part of the feedback stated that the PQueue classes had a high CCBC value and performed very similar tasks. It was suggested to move these classes into the same file. The final piece of feedback given by the tool indicated that multiple files had low average coupling between their classes. Each of these files contained the following three classes: Car, PQueue, and Game. The tool suggested splitting the Car and Game classes from the PQueue class in each of the files because they were neither similar in the tasks performed nor dependent on each other.

The following feedback was given by the TAs. All TAs indicated that the Car classes, located in multiple files, contained duplicate code and should be merged into one class. This would allow for reuse of the code and would minimize the amount of duplicate code in the student program. Furthermore, all TAs indicated that the PQueue classes could also be merged for the same reasons.

Next, all TAs indicated that the Game class, located in multiple files, could be merged into a single class. They indicated that the variations in each Game class could be solved through the use of a base class and sub-classes. The main reason given for this improvement was to minimize the amount of duplicate code in the program and the ability to reuse the Game class in multiple locations.

All TAs indicated that the Car, Game, and PQueue classes should be split into separate files. TAs one and two indicated that each of the classes should receive its own file. TAs one and two gave no specific reason or motivation, as this feedback followed from the merging of classes mentioned above. TA three indicated that because Car and Game both pertain to the Grid, these classes should be placed into the same file and PQueue should be placed into a separate file. This would improve the readability of the program and how easily it is understood by others.

Finally, TA three gave feedback on the location of various functions. TA three indicated that there were a significant number of functions that should be moved into a helper file. Additionally, TA three indicated that the Game class could be split into multiple classes: a class containing the algorithm and a class containing the board representation. Finally, TA three indicated that a separate main file could be created with several functions from the various files.

The feedback by the TAs was judged on the four aspects mentioned in the method. The results are shown in table 2.

Feedback                        Given by tool  Given by TAs  Similar reason  Similar change  Similar motivation
Merge Car classes               Yes            3/3           3/3             3/3             0/3
Merge PQueue classes            Yes            3/3           0/3             0/3             0/3
Split Car, Game, and PQueue     Yes            3/3           1/1 (TA3)       1/3 (TA3)       1/1 (TA3)
Merge Game classes              No             3/3           -               -               -
Relocation of functions         No             1/3 (TA3)     -               -               -
Split Game class                No             1/3 (TA3)     -               -               -

Table 2: Results first TA evaluation program

5.1.2 Second TA Evaluation Program

The general structure of this program is illustrated in appendix B. The feedback provided by the tool was as follows. The first part of the feedback indicated that the program contained several algorithm files with duplicate code between them. Extremely high CCBC values were detected between these files and the feedback was given to outsource the duplicate code to a new file. The second part of the feedback indicated that there were files with high CCBC values which contained code that performed similar tasks and should be outsourced. The third and final part of the feedback indicated that classes with low coupling could be split from each other. Various Fly classes could be split from the Stack and Queue classes located in the same file.

All TAs noticed that the algorithm files contained code which was duplicate. TA one suggested to retrieve the duplicate code and place it into a separate file which handles the selection of algorithms through parameters. TA one said: ”separation of concerns would be better applied with this change”. TA two and three indicated that one of the functions in each file was identical and could be moved into a data structure file. The actual algorithms could remain in separate files. TA three additionally stated that it was possible to merge multiple files into one. However, the current structure of each algorithm having its own file was correct as well.

Furthermore, TA two indicated that the Stack and Queue classes could be split from the Fly classes into a data structure file because the tasks performed were dissimilar. TA three indicated that the current structure of placing all classes into one file is not wrong. However, TA three did indicate that the Stack and Queue classes could be split from the Fly classes.

Additional feedback was given by TA three. The various Fly classes could be merged into a single Fly class because the added functionality was minimal and the fly object represented in each class stays constant. The other TAs mentioned that the use of a base Fly class and sub-classes was correct.

The results of comparing the feedback are shown in table 3.

Feedback                        Given by tool  Given by TAs  Similar reason  Similar change  Similar motivation
Duplicate code between files    Yes            3/3           3/3             3/3             3/3
Similar tasks between files     Yes            3/3           0/3             3/3             0/3
Split Stack and Queue from Fly  Yes            1/3 (TA1)     1/1             1/1             0/1
Merge Fly classes               No             1/3 (TA3)     -               -               -

Table 3: Results second TA evaluation program

5.1.3 Third TA Evaluation Program

The general structure of this program is illustrated in appendix C. The feedback provided by the tool was as follows. All code was located in one file and contained five classes: Board, HashBoard, Car, HorCar, and VerCar. The tool indicated that the Car classes could be split from HashBoard due to being dissimilar and not being dependent on each other.

TA one indicated that if the student had separated the program into multiple files, it would be properly structured. TA one indicated that the code was structured in such a way that no functions were reused between classes/functions. All TAs mentioned that separate files needed to be created to improve the readability of the program. TA two and three indicated that a separate file needed to be created for classes, algorithms, visualization, and initializing the program.

TA three mentioned additional improvements. TA three indicated that the HashBoard and Board classes could be merged. The three Car classes could additionally be merged into one class because they perform nearly the same task.

The results of comparing the feedback of TAs to feedback given by the tool are shown in table 4.


Feedback                         Given by tool  Given by TAs  Similar reason  Similar change  Similar motivation
Split HashBoard and Car classes  Yes            0/3           -               -               -
Split into multiple files        No             3/3           -               -               -
Merge HashBoard and Board        No             1/3 (TA3)     -               -               -
Merge Car classes                No             1/3 (TA3)     -               -               -

Table 4: Comparison third TA evaluation program

5.1.4 Overall performance TA evaluation

Out of seven pieces of feedback given by the tool, six were additionally given by the TAs. Out of these six pieces of feedback, four were given for reasons similar to those of at least one of the TAs. A similar change was suggested five out of six times. Similar motivation was given two out of six times. The tool provided one piece of feedback not given by any of the TAs. The tool did not provide seven pieces of feedback which were provided by at least one of the TAs.
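The totals above follow directly from tables 2, 3, and 4. As a check on the arithmetic, the tally can be sketched in Python. The record layout and the names `ROWS` and `summarize` are assumptions made for this illustration and are not part of the tool.

```python
# Each row: (feedback, given by tool, number of TAs giving it,
#            similar reason, similar change, similar motivation).
# The counts are the numerators from tables 2, 3, and 4; None marks a "-" cell.
ROWS = [
    ("Merge Car classes", True, 3, 3, 3, 0),
    ("Merge PQueue classes", True, 3, 0, 0, 0),
    ("Split Car, Game, and PQueue", True, 3, 1, 1, 1),
    ("Merge Game classes", False, 3, None, None, None),
    ("Relocation of functions", False, 1, None, None, None),
    ("Split Game class", False, 1, None, None, None),
    ("Duplicate code between files", True, 3, 3, 3, 3),
    ("Similar tasks between files", True, 3, 0, 3, 0),
    ("Split Stack and Queue from Fly", True, 1, 1, 1, 0),
    ("Merge Fly classes", False, 1, None, None, None),
    ("Split HashBoard and Car classes", True, 0, None, None, None),
    ("Split into multiple files", False, 3, None, None, None),
    ("Merge HashBoard and Board", False, 1, None, None, None),
    ("Merge Car classes (third program)", False, 1, None, None, None),
]


def summarize(rows):
    """Count agreement between the tool and at least one TA per feedback item."""
    by_tool = [r for r in rows if r[1]]
    shared = [r for r in by_tool if r[2] > 0]
    return {
        "tool_feedback": len(by_tool),
        "also_given_by_tas": len(shared),
        "similar_reason": sum(1 for r in shared if r[3]),
        "similar_change": sum(1 for r in shared if r[4]),
        "similar_motivation": sum(1 for r in shared if r[5]),
        "false_positives": sum(1 for r in by_tool if r[2] == 0),
        "false_negatives": sum(1 for r in rows if not r[1] and r[2] > 0),
    }


summary = summarize(ROWS)
```

A piece of feedback counts as matching when at least one TA agreed, which is the criterion used in the paragraph above.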

5.2 Student Evaluation

5.2.1 First Student Evaluation Program

The feedback provided for this program, consisting of two files, belonged to category A, an extremely high CCBC value between files. The specific feedback provided is shown in appendix D. Student one identified that the feedback was given because there was a lot of duplicate code between the files. After examining the code, student one suggested creating a file to which a duplicate function and variable could be outsourced. Additionally, student one indicated that two files made use of a stack and queue which were highly similar. The stack and queue could be merged into a single class and outsourced to a separate file. Student one identified parts of the code that were similar but not identical and suggested not modifying these parts. When asked about the motivation for the changes suggested in the feedback, student one indicated that the maintainability would improve and that changes could then be made in one location.

Student two identified that duplicate code was the reason the feedback was given. The student correctly indicated that the code could be improved by creating separate files which can be imported. After examining the code, student two noticed a duplicate function which could be outsourced to a separate file. Code with slight differences was found to be acceptable by the student. Improving the maintainability was given as the main motivation for the changes suggested in the feedback.

Table 5 shows the results of evaluating the students’ ability to incorporate the feedback.


                                Student 1  Student 2
Understood reason               Yes        Yes
Identified area of improvement  Yes        Yes
Able to improve code            Yes        Yes
Understood motivation           Yes        Yes

Table 5: Evaluation first student program

5.2.2 Second Student Evaluation Program

The feedback provided for this program, consisting of one file, belonged to category J, low CCBC and ICP values between classes in the same file. The specific feedback provided is shown in appendix E. Student one indicated that the feedback was given because the file contained many unrelated classes. The student mentioned that the classes should be structured in such a way that conceptually related classes are placed in the same file. Student one indicated that the feedback pointed towards the Stack and Queue classes being different from the various Fly classes. The student indicated that the Stack and Queue classes should receive their own file. Additionally, the student indicated that the readability would improve by ensuring that classes which are related are put together. The changes would make sure that classes can be found where they are expected to be found.

Student two indicated that the feedback was given because unrelated classes were located in the same file. Additionally, the student indicated that the classes should be split into separate files. The motivation given for the suggested change was that classes are easier to find. If related classes are placed together, it is immediately noticeable that they perform similar tasks. Initially, the student did not understand the examples given and could not suggest an improvement. After explanation of the examples, however, the student indicated that the Queue and Stack classes could be placed in a separate file.

Table 6 shows the results of evaluating the students’ ability to incorporate the feedback.


                                Student 1  Student 2
Identified area of improvement  Yes        No
Able to improve code            Yes        No
Understood motivation           Yes        Yes

Table 6: Evaluation second student program

5.2.3 Third Student Evaluation Program

The third student program, consisting of two files, was provided with two pieces of feedback. The first piece of feedback belonged to category C, an extremely high CCBC value between classes. The second piece of feedback belonged to category K, a low average coupling between classes in a file. The specific feedback provided is shown in appendix F. After reading the first part of the feedback, student one understood that multiple classes were located in the two files with a significant amount of overlap in tasks. The student identified two classes not mentioned in the feedback for this evaluation but which were given in the complete feedback of the program. The student indicated that the feedback applied to those classes as well. The student indicated that the code could be improved by merging the classes mentioned in the feedback and implementing small differences through the use of variables. The motivation given was to minimize duplicate code.

Student two identified that there were multiple classes which performed similar tasks. The student indicated that the classes should be merged in some way. The motivation behind the change was unclear. The examples given needed additional explanation before the student could identify the specific area the feedback applied to. After explanation, the student indicated that the classes mentioned in the feedback could be merged into a separate file.

After reading the second part of the feedback, student one indicated that the feedback was given because the classes performed dissimilar tasks and were located in the same file. Student one stated that the Car and Game classes could be split from the PQueue class because the tasks that they perform were unrelated. This would ensure that conceptually related classes are found in the same files.

Student two identified the changes needed according to the feedback. The student indicated that the Game and Car classes could be split from the PQueue class. The student explained that because the Game class was large, it should receive its own file. This was not the correct reason. Additionally, according to the student, Car should be placed into the same file as Game because it is used in the Game class. The motivation behind the change was unclear to the student.

Table 7 shows the results of evaluating the students’ ability to incorporate the feedback.

                                Student 1  Student 2
First feedback part
Identified area of improvement  Yes        No
Able to improve code            Yes        No
Understood motivation           Yes        No
Second feedback part
Understood reason               Yes        No
Identified area of improvement  Yes        Yes
Able to improve code            Yes        Yes
Understood motivation           Yes        No

Table 7: Evaluation third student program

5.2.4 Impressions by Students

Student one indicated that the feedback was easy to read. The student stated that it was helpful that the feedback gave reasons for a change. The student explained that novice programmers need a reason to make changes to code when the code is functional.

Student two indicated that the feedback was clear, but too long. This made it hard to retrieve the essence of what was wrong with the code. The student preferred to receive one line which indicates the mistake in the code and another which indicates why it is a mistake.


5.2.5 Overall Performance

Table 8 summarizes the results of the student evaluation.

                                Results from both students
Understood reason               7 out of 8
Identified area of improvement  6 out of 8
Able to improve code            6 out of 8
Understood motivation           6 out of 8


6 Discussion

6.1 Teaching Assistant Evaluation

Several observations can be made based on the results obtained from the evaluation. First of all, the TAs did not make a distinction between the high CCBC value categories (B and H) and the extremely high CCBC value categories (A and I) for classes and files. This became apparent in the evaluation of the first and second student programs. For both high and extremely high CCBC values, the TAs indicated that duplicate code was present and could be outsourced. This indicates that these categories should be merged into a single category that mentions duplicate code between files and classes. If the categories were combined, the correct feedback would be provided. Additionally, this distinction caused the wrong reasons and improvements given in the feedback for the first and second evaluation programs.

Seven false negatives were encountered. The first false negative occurred in the first program. A merge of the various Game classes was suggested by the TAs. However, the metrics did not indicate a significantly high CCBC value (0.24) or ICP value (0) between the Game classes. This indicates that the current metrics cannot detect the possible merge between the Game classes. The Game class contained a significant number of methods. Various methods pertaining to the grid were duplicates, but the methods concerning the algorithms were different. This means that the CCM value is high for some methods while the overall CCBC value remains lower. A solution could be to suggest the creation of a sub-class if a significant number of methods with high CCM values is observed. Another solution could be to choose the highest CCM as the CCBC value between two classes.
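The two proposed solutions can be compared on a small example. The sketch below assumes that CCBC is the mean of the pairwise CCM values between the methods of two classes, following the definition by Poshyvanyk et al. [12]. The CCM values, the 0.7 threshold, and all function names are hypothetical and chosen only for illustration.

```python
from statistics import mean


def ccbc_mean(ccm):
    """CCBC as the mean of all pairwise CCM values (assumed definition)."""
    return mean(value for row in ccm for value in row)


def ccbc_max(ccm):
    """Alternative: use the highest pairwise CCM value as the class-level score."""
    return max(value for row in ccm for value in row)


def similar_method_count(ccm, threshold=0.7):
    """Count methods of the first class that closely match a method of the second."""
    return sum(1 for row in ccm if max(row) >= threshold)


# Rows: methods of one Game class; columns: methods of another Game class.
# Two grid methods are near duplicates, two algorithm methods are unrelated.
ccm = [
    [0.95, 0.10, 0.05, 0.00],
    [0.10, 0.90, 0.00, 0.05],
    [0.05, 0.00, 0.10, 0.05],
    [0.00, 0.05, 0.05, 0.15],
]

print(round(ccbc_mean(ccm), 2))    # stays low despite the duplicated methods
print(ccbc_max(ccm))               # driven by the single best-matching pair
print(similar_method_count(ccm))   # number of methods with a close match
```

In this example the mean stays low although half of the methods are near duplicates, which mirrors the missed Game merge. The maximum would flag the pair, but it would also react to a single shared method, so counting the methods with a close match may be the more conservative signal for suggesting a sub-class.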

The relocation of specific pieces of code was not found by the tool. The current metrics are not able to detect this improvement. However, only TA three mentioned the change. This suggests that relocation of code is of a lower priority for novice programming students than minimizing duplicate code and the location of classes. Future implementations of a tool might incorporate metrics to detect relocation of code, but this might be a feature for more advanced students.

TA three, on three separate occasions, provided feedback on possible merges between classes. For the second student program, TA three mentioned that the base Fly class and Fly sub-classes could be merged into one Fly class. The other two TAs, however, specifically indicated that the separation into different Fly classes was correct. The same applies to the merging of Car classes in the third student program. The other TAs did not indicate the same improvement. This indicates that merging classes which are too specific is a matter of personal preference and that the tool not providing this feedback is acceptable for now.


However, as observed with the merging of the Game classes, the metrics did indicate relatively high to high CCBC values between the various Fly classes and between the various Car classes. This indicates that a relatively high CCBC value could point to classes that are too similar. An additional category could be tested which provides feedback on relatively high CCBC values. However, this might increase the number of false positives detected by the tool on other student programs. An additional metric might be needed to distinguish between merge possibilities for relatively high CCBC values.

The merge of HashBoard and Board in the third student program was also indicated by TA three. These two classes did not exceed the relatively high CCBC threshold. This indicates that providing no feedback in this case could be correct. Additionally, TAs one and two did not mention merging HashBoard and Board. More TAs should review this file to filter out personal preferences of TAs and to determine if the classes should be merged.

The final false negatives were located in the third program. The TAs indicated that the program, consisting of one file, could be split into multiple files. This cannot be detected with the current metrics. However, cohesion metrics might be well suited to determine files and classes which should be split into smaller files and classes. Bavota et al. [1] indicated that cohesion is suited for the task but that no cohesion metric has been found that incorporates all aspects of cohesion.

The tool provided one false positive. A suggestion to split the various Car classes from the HashBoard classes was given for the third student program. These classes were semantically unrelated, but the file contained a third class to which both were related, the Board class. The feedback did mention that the classes could only be split if the structure of the program allowed for it. However, additional work could be done to prevent giving this feedback if mutual high coupling to another class is determined. If no mutual high coupling to a third class is detected, a suggestion to split can be given. An alternative solution would be to only include feedback to split classes if the average coupling in the file is low, which performed well in the evaluation.
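The proposed check for mutual coupling to a third class can be sketched as follows. This is a minimal illustration and not the tool's implementation: the function name, the dictionary layout, the 0.5 threshold, and the coupling values are all hypothetical.

```python
def should_suggest_split(a, b, coupling, classes, high=0.5):
    """Suggest splitting classes a and b only if they are weakly coupled
    and no third class in the file is highly coupled to both of them."""

    def value(x, y):
        # Coupling is treated as symmetric; missing pairs count as uncoupled.
        return coupling.get(frozenset((x, y)), 0.0)

    if value(a, b) >= high:
        return False
    return not any(
        value(a, other) >= high and value(b, other) >= high
        for other in classes
        if other not in (a, b)
    )


# Hypothetical coupling values for the classes in the third program's file.
coupling = {
    frozenset(("Board", "HashBoard")): 0.8,
    frozenset(("Board", "Car")): 0.7,
    frozenset(("HashBoard", "Car")): 0.1,
}

with_board = should_suggest_split(
    "HashBoard", "Car", coupling, ["Board", "HashBoard", "Car"]
)
without_board = should_suggest_split(
    "HashBoard", "Car", coupling, ["HashBoard", "Car"]
)
```

With the Board class present the suggestion is suppressed, and without it the split is suggested, which matches the behavior described above.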

The motivation given by the tool matched two out of six times. The main reason observed is that the tool uses general terms such as maintainability to motivate the changes suggested. However, TAs do not talk in these terms. The three motivations most used by the TAs were the ability to reuse the code, improving the readability of the program, and allowing for changes to be made in one location. The intention of the motivation is similar, but the TAs word it in simpler terms. The feedback can be improved by providing the motivation in these terms.


indicate whole classes and files to be split, merged, or moved. The TAs could specify the structure of the resulting program. This difference is acceptable. Providing general directions will force the student to use their own knowledge to improve the code, which is the desired effect of the feedback. Forcing the student to think about the new structure will be beneficial to the learning process.

Finally, the tool performed well in detecting files and classes with highly similar tasks and which contained duplicate code. If the previously mentioned distinction is removed, all but one of the classes and files that were too similar should be detected for the correct reasons. Significant improvements could be made in the low coupling categories. Feedback given to split classes when the average coupling was low was correct. If conceptual coupling between other classes and the average coupling of the file are taken into account when providing feedback on low coupling, the false positive should be eliminated for the set of student programs used in the evaluation.

6.2 Student Evaluation

The reason for giving feedback was unclear once. A wrong reason was given which was not mentioned in the feedback. This might be due to the length of the feedback. The student indicated that the size made it difficult to determine what was wrong with the code. This should be solved by making the feedback more concise.

The causes of the inability to improve the code and the inability to name the area of improvement were identical. The manner in which the list of file/class combinations that the feedback applied to was given was unclear to the student. After explanation of the examples, the student was able to name the areas of improvement and improve the code. This indicates that the feedback should explain what the examples represent more clearly. Instead of using the current form, ”FileLocation.ClassName” to describe a class, a simplified form would be ”ClassName in the file FileLocation”. It is less succinct but more expressive and might alleviate the confusion caused by the examples. Additionally, what the examples represented was also unclear for one piece of feedback. This might be because the function of the examples is often explained in the first sentence of the feedback. It would be better to state the function of the examples in the last sentence.
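The suggested rewording is a small formatting step. A minimal sketch is given below; the function name `describe_class` and the example references are hypothetical, and the sketch assumes that the class name itself contains no dots.

```python
def describe_class(reference):
    """Turn 'FileLocation.ClassName' into 'ClassName in the file FileLocation'."""
    # Split on the last dot so that dotted file locations stay intact.
    file_location, _, class_name = reference.rpartition(".")
    if not file_location:
        # No file part present: return the reference unchanged.
        return reference
    return f"{class_name} in the file {file_location}"


print(describe_class("breadth_first.PQueue"))
print(describe_class("algorithms.depth_first.Car"))
```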

The motivation behind the changes was unclear two out of eight times. This occurred because the student did not know when classes needed to be placed into the same file and why. This did not become clearer after reading the feedback. This might be due to the use of general terms in the feedback such as maintainability, as mentioned in the discussion of the TA evaluation. Use of the simpler terms that the TAs use might alleviate the confusion.


Often, the students were steered in the right direction by the examples given in the feedback and subsequently used their own knowledge to improve the code. The reason for this is that the feedback only provides general instructions and indications to improve the code. This is the desired effect of the tool. The students should use the tool to receive an indication of how to improve the code and utilize their own knowledge to modify the program. This observation confirms that the difference observed in the TA evaluation between feedback given by the TAs and the tool is acceptable.

The knowledge imparted after reading the feedback was not satisfactory. The first student had prior knowledge about modularization and seemed to know how to improve the code regardless of the feedback. The first student utilized the feedback as a general indicator of where to look for improvements. The second student indicated that the general rule for when to place classes together was unclear. This was not made clear to the student after reading the feedback. This indicates that the optimal structure of a program should receive a more prominent role in the feedback. A possible solution might be to mention the general rule at the start of the feedback, whenever the tool is run.

A possible addition to the feedback in the case of highly related classes and files was observed. The students indicated that code which was not completely identical could not be outsourced to a separate file. As mentioned by the TAs, minor differences can be solved by implementing base classes and sub-classes. This should be mentioned by the feedback when high and extremely high CCBC values are found.
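The kind of example that the feedback could include is sketched below. It is not taken from the student programs: the class names, methods, and win condition are hypothetical and only illustrate how shared code can move into a base class while small differences remain in sub-classes.

```python
class Game:
    """Base class holding the code shared by all variants."""

    def __init__(self, cars):
        self.cars = list(cars)

    def is_solved(self, exit_position):
        # Hypothetical win condition: the first car has reached the exit.
        return self.cars[0] == exit_position

    def next_state(self, frontier):
        raise NotImplementedError("each variant chooses its own search order")


class BreadthFirstGame(Game):
    def next_state(self, frontier):
        # Explore the oldest state first.
        return frontier.pop(0)


class DepthFirstGame(Game):
    def next_state(self, frontier):
        # Explore the newest state first.
        return frontier.pop()
```

Both variants reuse `is_solved` unchanged, so a change to the shared logic only has to be made in one location.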

In general, after explanation of the examples, the feedback could be used in all cases to improve the code. The students were not familiar with the programs before the evaluation session, but were able to improve the code and indicate what was wrong with the code. This indicates that the feedback, in its current state, is useful in helping students improve their code. With the changes suggested above, the usefulness should increase further. However, it is still unclear if any lasting knowledge is imparted by the feedback, but this is of lower priority than being able to assist in improving code.

6.3 Threats to Validity

The aspects which might threaten the validity of the results are discussed in this section.

The evaluation used a limited number of TAs and students. However, the quality of the evaluation is a good starting point for further, more extensive research regarding this subject. This research aimed to be an exploration of one method to provide automatic feedback on novice student programs. Because of this, the small scale of the evaluation is sufficient to determine if the current direction has potential.

The tool is tailored towards providing feedback for one specific course. The correlations between the metrics and the improvements in the student programs are based on a limited number of student programs. It is possible that different courses require different thresholds and the current thresholds might provide erroneous feedback. However, the thresholds can easily be adjusted and providing feedback on a new course only requires additional analysis of the student programs. If the tool is tested on more courses, general thresholds which are correct for all courses should be determined.

Only a select number of categories could be evaluated due to limitations in scope and time. The validity of the remaining categories is unknown and further research is needed to clarify it.

The students selected for the evaluation were slightly more advanced than the aim of this research required. These students had already taken a previous course in which they were required to make use of modularization. Because of this, their ability to improve the code is better than that of true novices. However, this might be offset because they were not familiar with the code provided in the evaluation. They were required to improve code which they had never seen before. The true usefulness of the tool can be determined once a complete tool is created, by making it available to students in an introductory course and determining whether their progress improves.

Finally, a small selection of student programs was evaluated. Because of this, the validity of the tool on the rest of the student programs remains unknown. Evaluating a tool correctly requires a significant amount of time to evaluate as many programs as possible. It is therefore beneficial to do preliminary research to determine the correctness of various directions in determining the quality of code. The current small scale of the evaluation is satisfactory for this goal.


7 Conclusion

In this research, a tool was created that provides automatic feedback on the modularization quality of code made by novice programming students. The tool was created using the ICP and CCBC metrics. The correctness and usefulness were determined by evaluating with assistance of TAs and students. The following questions were answered through this approach.

The first sub-question was defined as follows.

Can ICP and CCBC be used to provide correct feedback on code made by novice programming students?

ICP and CCBC can be used to provide correct feedback on highly similar modules which should be merged or moved together. Additionally, modules with duplicate code between them can be detected and given correct feedback. Opportunities to merge classes are missed when a large number of methods is present and only a subset of the methods execute similar tasks.

Feedback given on classes which are dissimilar and should be placed in separate files is more inconsistent. Correct feedback can be provided when the average coupling between classes is low. However, correct feedback cannot reliably be given when the average coupling is not low. Finally, correct feedback on splitting a class into sub-classes and on relocating specific pieces of code cannot be provided with the currently used metrics.

The second sub-question was defined as follows.

Can ICP and CCBC be used to provide useful feedback on code made by novice programming students?

The ICP and CCBC metrics can be used to create a tool that provides useful feedback. The feedback was shown to be able to help students improve their code in almost all cases. When the improvements suggested in the discussion are implemented, the feedback is predicted to be able to help students improve the code in all cases. However, the results indicate that the current state of the feedback does not provide the students with knowledge of how a program should be modularized correctly.

The research question was defined as follows.

How can ICP and CCBC contribute to automatically providing constructive feedback on the modularization quality of code made by novice programming students?


It has been shown that ICP and CCBC can be used to automatically provide feedback on classes that are highly similar and should be merged or moved together. These metrics can also be used to provide feedback on modules which contain duplicate code. Finally, files can be detected which contain multiple dissimilar classes. It has also been shown that feedback generated by utilizing these metrics can be used by students to improve the modularization quality of their code. These are the ways in which ICP and CCBC contribute to automatically providing constructive feedback on the modularization quality of code made by novice programming students.


8 Future work

Cohesion and coupling are both essential in judging the modularization quality of source code. This research focused on measuring the degree of coupling in student programs. However, further research could be done in using cohesion to judge the modularization quality of source code and possibly combine the two metrics to provide more accurate feedback.

Furthermore, future research could explore alternative coupling metrics and determine the most effective metric for providing automatic feedback on modularization quality. Other combinations of metrics could prove to be more effective in determining the modularization quality of code made by novice programming students.


References

[1] Bavota, G., De Lucia, A., Marcus, A., and Oliveto, R. (2013). Using structural and semantic measures to improve software modularization. Empirical Software Engineering, 18(5):901–932.

[2] Budgen, D. (2003). Software design. Pearson Education.

[3] Cai, Y., Iannuzzi, D., and Wong, S. (2011). Leveraging design structure matrices in software design education. In Software Engineering Education and Training (CSEE&T), 2011 24th IEEE-CS Conference on, pages 179–188. IEEE.

[4] Chidamber, S. R. and Kemerer, C. F. (1994). A metrics suite for object oriented design. IEEE Transactions on software engineering, 20(6):476–493.

[5] Hattie, J. and Timperley, H. (2007). The power of feedback. Review of educational research, 77(1):81–112.

[6] Lee, Y., Liang, B., Wu, S., and Wang, F. (1995). Measuring the coupling and cohesion of an object-oriented program based on information flow. In Proc. International Conference on Software Quality, Maribor, Slovenia, pages 81–90.

[7] Li, W. and Henry, S. (1993). Object-oriented metrics that predict maintainability. Journal of systems and software, 23(2):111–122.

[8] Martin, R. C. (2009). Clean code: a handbook of agile software craftsmanship. Pearson Education.

[9] McConnell, S. (2004). Code complete. Pearson Education.

[10] O’Keeffe, M. and Cinnéide, M. O. (2006). Search-based software maintenance. In Software Maintenance and Reengineering, 2006. CSMR 2006. Proceedings of the 10th European Conference on, pages 10–pp. IEEE.

[11] Peuch, L. (2014-present). Redbaron, a bottom-up approach to refactoring in python. https://github.com/PyCQA/redbaron.

[12] Poshyvanyk, D., Marcus, A., Ferenc, R., and Gyimóthy, T. (2009). Using information retrieval based coupling measures for impact analysis. Empirical software engineering, 14(1):5–32.

[13] Praditwong, K., Harman, M., and Yao, X. (2011). Software module clustering as a multi-objective search problem. IEEE Transactions on Software Engineering, 37(2):264–282.

[14] Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional science, 18(2):119–144.

[15] Stegeman, M., Barendsen, E., and Smetsers, S. (2016). Designing a rubric for feedback on code quality in programming courses. In Proceedings of the 16th Koli Calling International Conference on Computing Education Research, pages 160–164. ACM.

[16] Stevens, W. P., Myers, G. J., and Constantine, L. L. (1974). Structured design. IBM Systems Journal, 13(2):115–139.

[17] Yang, H. Y. (2010). Measuring indirect coupling. PhD thesis, The University of Auckland, New Zealand.

A Overview First Program

This program solves the Rush Hour game with various algorithms. The files and classes located in the program are illustrated in the following figure.

B Overview Second Program

Two species of flies exist whose genomes contain the same numbers, but in a different order. The assignment is to find the smallest number of mutations necessary to transform one genome into the other. A mutation can only flip a complete subrow, e.g. 3-5-6 becomes 6-5-3. The files and classes located in the program are illustrated in the following figure.
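The mutation described above can be sketched in Python as follows. This is a minimal illustration only: the function name `flip_subrow`, the list representation of a genome, and the 0-based inclusive indices are assumptions and are not taken from the student program.

```python
def flip_subrow(genome, start, end):
    """Return a copy of `genome` with the subrow genome[start..end] reversed.

    `start` and `end` are 0-based and inclusive (an assumption made for
    this sketch). The input list is left unchanged.
    """
    if not 0 <= start <= end < len(genome):
        raise ValueError("require 0 <= start <= end < len(genome)")
    # Keep the part before the subrow, reverse the subrow, keep the rest.
    return genome[:start] + genome[start:end + 1][::-1] + genome[end + 1:]


genome = [1, 3, 5, 6, 2]
mutated = flip_subrow(genome, 1, 3)
print(mutated)  # [1, 6, 5, 3, 2]
```

Here the subrow 3-5-6 is flipped to 6-5-3 while the surrounding elements keep their positions, which corresponds to a single mutation in the assignment.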

C Overview Third Program

This program solves the Rush Hour game with various algorithms. The files and classes located in the program are illustrated in the following figure.

D First Feedback Student Evaluation

The following combination(s) of files seem to contain a lot of duplicate code between them. It would improve the maintainability of your code to create a new file which will contain the duplicate code and which will execute the task that the duplicate code executes. This ensures that this particular piece of code will be reusable and modifiable for multiple files. Aim to create files in which every file has its own separate task. An alternative solution would be to merge the files into one.

E Second Feedback Student Evaluation

The following class combination(s) seem to perform dissimilar tasks and are not dependent on each other but are located in the same file. If possible, it would be better to split these classes into separate files. Only do this if it doesn’t interfere with dependencies of other classes in the file. Aim to create files that contain classes that perform similar tasks and which are not too dependent on classes in other files.

NewFlyClass.Queue and NewFlyClass.Dif3Fly

NewFlyClass.Stack and NewFlyClass.ScoreFly

NewFlyClass.Stack and NewFlyClass.Fly

NewFlyClass.Queue and NewFlyClass.Fly

NewFlyClass.Queue and NewFlyClass.UndoFly

NewFlyClass.Queue and NewFlyClass.ScoreFly

NewFlyClass.Stack and NewFlyClass.BidirectFly

NewFlyClass.Stack and NewFlyClass.Dif2Fly

NewFlyClass.Stack and NewFlyClass.Dif3Fly

F Third Feedback Student Evaluation

F.1 First Part

The following combination(s) of classes seem to perform the same task conceptually and are located in different files. This means that either these classes contain a lot of duplicate code between them or these classes perform (nearly) the same task. To improve the maintainability of your code, it would be best to move these classes into the same file and if possible, to also merge them into a single class. This will ensure that classes with similar tasks are grouped together and that each class has its own well-defined task.

RushHour AStar.Car and RushHour Pruned 100it 50proc.Car

F.2 Second Part

Classes in RushHour Pruned 100it 50proc.py are not dependent on each other nor do they seem to perform similar tasks. If the structure of your program allows it, it would be better to split the classes in this file into separate files. Aim to create files which contain classes that have similar tasks or which work towards the same goal. Each file should have a well-defined task that it executes and the classes should reflect that. Class combination(s) that can be put in separate files are:

RushHour Pruned 100it 50proc.Game and RushHour Pruned 100it 50proc.PQueueItem

RushHour Pruned 100it 50proc.Car and RushHour Pruned 100it 50proc.PQueueItem

The same applies to file RushHour AStar.py as well. Classes that can be put in separate files are:

RushHour AStar.Car and RushHour AStar.PQueueItem
