Automatically providing feedback aimed at student programmers on the quality of variable naming in their code.


Sjors Witteveen

10808493

Bachelor thesis Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie
University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisors
Dhr. drs. M.S. Stegeman
Dhr. J. van Assema MSc

Faculty of Science
University of Amsterdam
Science Park 904
1098 XH Amsterdam

July 2, 2017


Abstract

In this project, a tool was designed to automatically provide appropriate and useful feedback, aimed at students from an introductory programming course, on the quality of the variable naming in their code. To accomplish this, guidelines on the length of variable names were implemented in the tool to decide whether variable names are correct or incorrect. Along with feedback on why variable names are incorrect, the tool provides example names exhibiting the error from the student’s code. For more complex types of feedback, the tool gives the student only one example of the error from their code, so that the other errors of the same type have to be recognized by the student. The feedback from the tool was compared to feedback from teaching assistants from an introductory programming course. This comparison showed that, overall, the tool performs well in determining when a name is too long, but is less successful in determining when a name is too short. It can, however, successfully determine when a variable name could be made longer without reducing the readability of the code. Tests in which students were asked to apply the feedback indicated that the feedback was useful, as the students applied it well and were able to use it to recognize other examples as well.


1 Introduction

Being able to quickly understand code can be of tremendous help when working on large programming projects, especially when working together with other people on the same project, as you are then required to understand code written by others as well. Hence, the code should not only work well, it should also be easy to understand or, in other words, the code should be elegant. One aspect of elegant coding is properly naming your variables. Naming your variables in such a way that they reveal intent can make it much easier to understand and change code [7]. Many introductory programming courses rely on human teachers to give feedback on students’ code, which can be very time consuming. To resolve this issue, a tool that automatically provides feedback on code can be designed to help students learn without the need for human teachers.

Using a tool as a style checker for code is not a new concept. An example of a style checker tool for Java is Checkstyle¹. Regarding variable naming, Checkstyle mainly examines the syntax of the names. For instance, it checks whether only valid characters are used in the name or, depending on the user’s preference, it can enforce strict camel casing. This is a very useful tool, but it does not provide feedback on the semantics of variable names, which would be an important aspect in teaching students how to properly name their variables.

The goal of this project is to create a tool that can provide feedback on a student programmer’s code. This tool should not only improve the student’s code, but also help the student learn why certain variable names should be improved. Therefore, the main focus of this tool is not to be able to recognize and present all incorrect variable names, but rather to provide valid and useful feedback aimed at beginner programmers based on the errors made in naming their variables. The tool will be designed to work with Python code.

In general, it holds that names that are too short do not convey enough meaning, while names that are too long are hard to type and can obscure the visual structure of a program [8]. Thus, by evaluating the length of a variable name, which is a syntactic property, it is, to some extent, possible to determine the descriptiveness of a variable. For this reason, the length of the variable name will be the main property that will be examined by the tool in order to provide feedback. The question that will be answered in this research is as follows:

How can we use the length of variable names to automatically provide appropriate and useful feedback aimed at students from an introductory programming course on the quality of variable naming in their code?

Because the scope of this research is relatively small, word recognition and meaning have not been taken into account. Checking whether a name corresponds to what a certain variable represents is a very difficult task that could not have been accomplished within the available time.

To answer the research question, firstly, guidelines concerning variable name length and theories about providing feedback were studied. Secondly, to test whether these guidelines apply to assignments from an introductory programming course as well, a pretest was conducted. Then, a tool that can automatically provide feedback was created using this newly acquired knowledge. Lastly, this tool was evaluated on validity and usefulness.


2 Related work

2.1 Guidelines on variable naming

A study by Stegeman, Barendsen and Smetsers [12] has resulted in the design of a rubric that can be used to evaluate code of a student of an introductory programming course. This rubric can be used to decide whether the student’s code is ‘good’ or ‘bad’ beyond simply whether the code works or not. Regarding the naming of identifiers in code, the rubric states that it is expected that the name describes the intent of the underlying code. Meaningless, misleading and unreadable names are considered a shortcoming, while it is desirable to have a consistent vocabulary.

Judging code on readability can be subjective, but there are some general guidelines that are widely accepted and presented in a large number of handbooks. A generally accepted guideline and perhaps the most important one is that variable names should be meaningful [5, 8, 7]. Another guideline presented in the handbook Code Complete [8] states that, as discovered by Gorla, Benander and Benander [4], the effort required to debug a program is minimized when the average variable name length in the program is ten to sixteen characters, as this would be a good balance between having descriptive enough names, while maintaining a good readability of the code.

Along with the handbook Clean Code [7], Code Complete also states that the length of a variable name should be related to the length of the scope, but this can be interpreted in many different ways. Code Complete splits this guideline into the following four parts:

• Rarely used variables should have longer names.

• Global variables should have longer names.

• Loop variables should have shorter names.

• Local variables should have shorter names.

Clean Code interprets the guideline slightly differently and claims that:

• Variables with a small scope should have shorter names.

• Variables with a big scope should have longer names.

• Variable names lose their meaning over a longer distance.

Although most of the differences between the two interpretations of the guideline are very subtle, they could be of great importance when implementing the guideline in the tool. One difference that does stand out is that, on the one hand, rarely used variables should have longer names, while, on the other hand, variables with a small scope, which are thus rarely used, should have shorter names. The reason why rarely used variables should have longer names is that, in general, short names do not convey enough meaning, while rarely used long names barely decrease the readability of the code. At the same time, variables with small scopes should have shorter names, because small pieces of code are generally easily understood, which means that longer names will only decrease the readability of the code. This seeming contradiction should be taken into account when designing the tool.

More guidelines concerning variable name length are found in other previously conducted research. Empirical research by Butler, Wermelinger, Yu and Sharp [1] resulted in a guideline that states that variable names should not be longer than twenty-five characters. In another study, Lawrie, Morrell, Feild and Binkley [6] discovered that identifier names composed of dictionary words are more easily understood than those composed of a single letter and thus single-letter variable names should be avoided. The latter is also presented in the handbooks Code Complete [8] and Clean Code [7].

Altogether, these guidelines form a sufficient base to evaluate the quality of variable names and to provide feedback on the descriptiveness and readability of the code. The following are all four guidelines that resulted from related work:

1. Average variable name length should be between 10 and 16 characters [4, 8].

2. Identifier names of more than twenty-five characters should be avoided where possible [1].

3. Single-letter names, like i, j or k, are sufficient variable names only for simple loops. Otherwise, they should be avoided [1, 8, 7].

4. Shorter names are better for variables with a small scope, while longer names are better for variables with a large scope [7, 8, 11].

While the first three guidelines are very concrete, the last one can, as the related work shows, be interpreted in many different ways. How exactly this guideline is interpreted in the tool is discussed in the Tool design section (5).

2.2 Providing feedback

As proposed by Sadler [10], a student needs to know three things in order to learn. Firstly, a student needs to know what good performance on a task is. In the automatically generated feedback, this can be accomplished by telling the student what the properties of a good variable name are, for instance, that a certain name should be long and descriptive. Secondly, a student needs to know how their own performance relates to good performance. This can be achieved by telling the student that a certain variable name in their code is too short and not descriptive enough. Thirdly, the student needs to know what to do to close the gap between their own performance and good performance, which is easily accomplished in the context of this tool. This can be done by telling the student that the variable name should be made longer and more descriptive.

Research conducted by Fregeau [3] and Cohen and Cavalcanti [2] on providing feedback on second language writing showed that the outright correction of errors is an ineffective way of providing feedback. This type of feedback leads to passive action by the students, where they simply copy the teacher’s corrections without recording or studying the feedback. In addition, they do not learn to recognize or correct errors on their own. Although this research was done on second language writing, writing code is very similar to writing in a second language. Moreover, both second language writing and writing code result in a text file where errors can be discovered and pointed out. Applying this to providing feedback automatically, it would be a good idea to provide feedback in such a way that the students have to recognize errors themselves, instead of presenting them with every mistake they have made.

2.3 Feedback tools

Few studies on tools that provide feedback on variable names, or on identifiers in general, were found, but there is one study by Relf [9] that is relevant to some extent. In this study, a tool was developed that assisted the user in choosing identifier names. The resulting paper primarily focuses on investigating whether programmers improve the readability of their source code if they have support from such a tool. The paper does not elaborate much on the design of this tool, besides that it uses nineteen identifier-naming style guidelines, including a guideline on short and on long names. Although not much is stated about these two guidelines, it seems that they are merely used as simple hard constraints, without examining the scope of the variable.


The results do demonstrate a significant improvement in the source code readability with the use of this tool, but because a total of nineteen guidelines are used, not much can be said about the impact of length on the quality of a variable name.


3 Pretest

The related work yielded some directly applicable guidelines that can be used in the tool. It is important to note that these guidelines are aimed at programming in general and are therefore possibly not directly applicable to introductory level programming. Assignments for introductory programming courses are generally less complex and abstract, which could lead to shorter variable names. Another difference is that students will often be asked to implement a mathematical or scientific formula, where very short variable names such as ‘x’ or ‘y’ will suffice. To discover how the guidelines for an introductory programming course differ from those for general programming, a pretest with a teaching assistant from such a course was conducted.

3.1 Method

The teaching assistant (TA) was presented with four pieces of code made by students for an assignment and asked to provide feedback on the variable naming and, more specifically, the length of the variable names. After providing feedback on the separate programs, the TA was asked to choose which of the four programs was the best based solely on the naming of variables. The variable names in the four programs ranged from very short, by the standards of student assignments, to relatively long. The average variable name lengths of the four programs, in number of characters, were as follows:

• Program 1: 2.70

• Program 2: 5.72

• Program 3: 8.00

• Program 4: 10.00

3.2 Results

From the feedback provided by the TA, two guidelines appeared to be similar to those found in related work. These guidelines are that single-letter variable names should be avoided and that the length of the variable name should be relative to the length of the scope. The other two guidelines, however, appeared to be slightly different. Firstly, the program that was chosen by the TA to be the best based on variable naming was the third program. This program had an average variable name length of eight characters, which is below the optimal average variable name length of ten to sixteen characters, as proposed by Gorla, Benander and Benander [4]. The TA stated that the variable names in this program were of proper length and descriptive enough. Secondly, even though the TA agreed with the guideline that variable names should consist of twenty-five or fewer characters, it appeared that even names of around eighteen characters were considered too long, as the TA pointed out that one of the names appearing in program 2, consisting of eighteen characters, was too long.

Another naming error that occurred very often in students’ assignments is that multiple languages (in this case, Dutch and English) are used for the naming of variables. While this aspect was not mentioned in any of the related work, it appeared to be one aspect that the TA paid a great deal of attention to. According to the TA, using multiple languages in naming variables is bad practice and thus exclusively one language should be used.

3.3 Conclusion

In general, it seems that variable names in assignments from introductory programming courses are relatively short. This needs to be taken into account when designing the tool. More important to note is that, apart from the disagreement about specific values, the general ideas are the same and can be used in the tool. Because of this, the tool will be made in such a way that these values can be easily adjusted through parameters, where the specific values need to be discovered by testing.

Using exclusively one language in naming your variables is considered very important by the TA. Although this has little to do with variable name length, it is such a common error that it could be implemented in the tool nonetheless.


4 Method

4.1 Tool

To answer the research question, a tool was created that can automatically provide feedback on the quality of variable naming in students’ code from an introductory programming course. The guidelines resulting from related work and the pretest were implemented in the tool, such that the tool can determine which variable names are incorrect. Ideas from related work about providing feedback were applied to come up with the best way of providing feedback. How exactly the tool was designed is described in the Tool design section (5).

The related work showed that the outright correction of errors is an ineffective way of providing feedback [3, 2]. For some of the more complicated types of feedback, the tool will therefore provide only one example of such an error from the student’s code. This way, the students will be able to see how the feedback applies to their code, but they will have to recognize all other errors of the same type themselves, which means they will need a good understanding of the feedback that is given. This best example is chosen using a couple of measures (described in section 5.4.3) to make sure it is the variable name that best matches that type of feedback.

4.2 Evaluation

The tool was then evaluated on validity and usefulness in two separate parts. The first part consisted of teaching assistants (TAs) giving feedback on the variable naming in three snippets of code (Appendix). In the second part, two students were asked to improve the variable names in the same three snippets of code by using the feedback provided by the tool. The three snippets used for this evaluation are assignments made by students from an introductory Python programming course from 2014 aimed at physics students and were chosen with the intention to have as much variety in variable naming as possible. Several variable naming mistakes are present in the snippets of code, consisting of names that are too long or too short for various reasons.

4.2.1 Feedback from teaching assistants

The validity of the tool was evaluated in order to make sure the feedback it provides is correct and corresponds with feedback given by TAs. This was done by comparing the feedback from TAs to the feedback provided by the tool. The three TAs that were asked to provide feedback are all TAs from the minor programming at the University of Amsterdam and are all familiar with providing feedback on Python code. They were asked to provide feedback on the naming of variables in the snippets of code similarly to how they would provide feedback to a student in class. The feedback was provided verbally and recorded to simulate the same scenario as in class with a student as much as possible. Important to note is that the TAs did not have access to the feedback from the tool, so they were in no way influenced or biased.

In case the best example of an error given by the tool was not specifically mentioned by the TA, the TA was asked to evaluate that specific name. This was necessary to decide whether this given best example is a true or false positive, because a TA not mentioning a variable name does not mean that the name is correct, as the TA likely does not want to give away all mistakes the student has made.

4.2.2 Use of feedback by students

The usefulness of the tool was evaluated in order to see whether the tool helps the students to improve their code and also to learn from doing so. To do this, two students were asked to use feedback on the three snippets of code provided by the tool to improve the variable names in the code. They could simply adjust the code with a text editor, to simulate the scenario where the students would use the tool themselves. The results of this are adjusted versions of the snippets of code. The students were also asked to verbally give the reason why they adjusted a variable name.

4.2.3 Comparing to feedback from the tool

After all the data from the TAs and students were collected, the results were compared to the feedback from the tool. For this purpose, the tool was adjusted slightly, so that in every case it outputs all examples from the code, instead of just the best example. This was done for two reasons. Firstly, to get a better picture of how well the tool can predict incorrect names, all names that are considered incorrect by the tool are compared to the feedback from the TA, instead of only the best example. Secondly, to discover whether the students truly understand the feedback given by the tool, it is interesting to see whether they also recognize the same type of mistake in other variable names marked as incorrect by the tool.


5 Tool design

The design of the tool is as follows. Firstly, all variable names will be retrieved from the code (5.1). Secondly, the tool will accumulate data about the code itself and the variable names (5.2) that can then be used to decide whether a variable name is correct or incorrect (5.3). Lastly, by using these data and incorrect names, appropriate, helpful and understandable feedback can be provided by the tool (5.4).

5.1 Retrieving all variable names

The Python library RedBaron² was used to retrieve all variable names from code. RedBaron uses many different node types, one of which is the NameNode. A NameNode consists of a data attribute (string) and other information about the node, such as its position in the RedBaron node tree, which represents the structure of the code. With a simple command from RedBaron, all NameNodes in a piece of code can be requested. A complication is that NameNodes represent not only variable names, but all identifiers, such as function and class names. To overcome this problem, it is necessary to search more specifically for variable names. For this tool, three types of variable instantiations in code as plain text are used from which the variable name can be identified:

• Assignments

• Function parameters

• For loop iterators

RedBaron is used to find all assignments, from which the target value can be requested in order to retrieve the variable NameNode. Similarly with function parameters, RedBaron can be used to find all functions, from which then the arguments can be requested. Lastly, RedBaron is used to find all for loops, from which iterators can be requested.

The list of variables that is retrieved at this stage is not ready to be used. The problem is that variables that are re-assigned with a new value will appear multiple times in the list. To solve this problem, the string values (i.e. the names) of all the NameNodes are put in a set, so that the duplicates are removed. Then, using a simple RedBaron command, these strings are used to find the first occurring NameNode whose value is equal to each string. After doing this, a list of first occurring NameNodes of all variables remains. Another function is created that takes any of these NameNodes as parameter to find all other instances of the same variable name. This function does not yet distinguish between global and local instances of the same variable name, which means that possibly different global and local variables with the same name are gathered in the same list.
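The retrieval step can be illustrated with a minimal sketch. It is not the tool's implementation: it swaps RedBaron for Python's standard `ast` module so that it runs without extra dependencies, and the function name `collect_variable_names` is hypothetical. It covers the same three sources of variable names (assignments, function parameters and for loop iterators) and keeps only the first occurrence of each name.

```python
import ast


def collect_variable_names(source):
    """Return {name: line of first occurrence} for variables in `source`.

    Assumption: the standard `ast` module stands in for RedBaron here.
    """
    first_seen = {}

    def record(name, lineno):
        # Re-assigned variables show up repeatedly; keep the earliest line.
        if name not in first_seen or lineno < first_seen[name]:
            first_seen[name] = lineno

    def record_target(target):
        # A target can be a plain name or a tuple such as "a, b = 1, 2".
        # Only names that are stored to count as variables being introduced.
        for node in ast.walk(target):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                record(node.id, node.lineno)

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):            # assignments
            for target in node.targets:
                record_target(target)
        elif isinstance(node, ast.For):             # for loop iterators
            record_target(node.target)
        elif isinstance(node, ast.FunctionDef):     # function parameters
            for arg in node.args.args:
                record(arg.arg, arg.lineno)
    return first_seen
```

Function names such as `add` in `def add(x, y)` are not recorded, which mirrors the reason the text gives for not simply requesting every name node. Like the function described above, this sketch does not distinguish between global and local variables that share a name.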

5.2 Accumulating data from variables

The next step is to accumulate data from the code itself and, more specifically, the variable names. To be able to determine which variable names are correct and incorrect and for what reason, data is required about the length of all individual variable names (5.2.1), the average variable name length (5.2.2), the languages used in all variable names (5.2.3) and the scope of all variables (5.2.4).

5.2.1 Individual variable name length

The length of all individual variable names can be easily requested in Python using the string value of the variable NameNode.


5.2.2 Average variable name length

Using the list of all variable names, the average variable name length is easily calculated by adding all of the lengths of the variables together and dividing this by the total number of different variable names. For the tool, the decision has been made to only take one instance of every variable into account when calculating the average, because that is how this measure was specified by Gorla, Benander and Benander [4] in their research.
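As a small illustration of this calculation, the sketch below counts every distinct name once before averaging. The function name `average_name_length` is hypothetical, and returning 0.0 for a program without variables is an assumption made here, not something the text specifies.

```python
def average_name_length(names):
    """Average length over distinct variable names (each name counted once)."""
    distinct = set(names)
    if not distinct:
        return 0.0  # assumption: no variables means an average of zero
    return sum(len(name) for name in distinct) / len(distinct)
```

For the names `i`, `total`, `total` and `count`, the three distinct names have lengths 1, 5 and 5, so the average is 11 / 3.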

5.2.3 Variable name language

The pretest showed that exclusively one language should be used when programming, as opposed to using more than one language. To check the variable name language, the spell checking library PyEnchant³ is used. Because the student assignments used for intermediate testing and for the evaluation are from a Dutch introductory programming course, the tool will only be able to recognize Dutch and English in variable names. Using PyEnchant, the Dutch and English dictionaries have to be instantiated. These dictionaries can then be used to check whether a certain string is a valid word in that language. If any variable name is present in one dictionary, while any different variable name is present in the other dictionary, it means that multiple languages are used in the program.

A small problem that was faced during the design of this part of the tool was that single letters are considered words in most dictionaries. The result of this was that when there are at least two single-letter variable names in the program, the tool would conclude that multiple languages are used, while this is not the case. This problem is avoided by simply skipping the single-letter variable names. Another problem is that variable names often consist of multiple words. The tool will simply not recognize such names as dictionary words (in either language), which means that in some cases where multiple languages are used, this is not recognized by the tool. Although this does lead to some false negatives, false positives, which are a larger problem, are avoided.
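The rule described above can be sketched as follows. To keep the example runnable without installed spell-checking dictionaries, a tiny hypothetical `WordList` class stands in for a PyEnchant dictionary; it only mimics the `check(word)` method that PyEnchant's `Dict` objects provide. The function name `uses_multiple_languages` and the example word lists are assumptions as well.

```python
class WordList:
    """Hypothetical stand-in for a spell-checker dictionary with check()."""

    def __init__(self, words):
        self._words = {word.lower() for word in words}

    def check(self, word):
        return word.lower() in self._words


def uses_multiple_languages(names, dictionaries):
    """True if two different names are words in two different dictionaries."""
    accepted = {}  # language label -> names found in that dictionary
    for name in set(names):
        if len(name) <= 1:
            continue  # single letters count as words in most dictionaries
        for language, dictionary in dictionaries.items():
            if dictionary.check(name):
                accepted.setdefault(language, set()).add(name)
    languages = list(accepted)
    for index, first in enumerate(languages):
        for second in languages[index + 1:]:
            # The rule needs two different names, one per dictionary.
            if any(a != b for a in accepted[first] for b in accepted[second]):
                return True
    return False
```

A multi-word name such as `total_count` is found in neither word list, so it never triggers the check, which reproduces the false negatives mentioned above. The sketch follows the stated rule literally, so a word that happens to exist in both languages is not treated specially.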

5.2.4 Variable scope

The related work showed that the guideline concerning the variable scope can be interpreted in multiple ways. For the purpose of the tool, the scope of each variable should have information about whether the scope is global or local and about the size of the scope. While determining whether a scope is global or local is very straightforward, accumulating information about the size of the scope can be done in many different ways. The seeming contradiction resulting from related work, where rarely used variable names should be longer, while variable names with a small scope should be shorter, has led to the invention of a measure where instances of the same variable are divided into clusters using the line number of each instance. Two clusters of variable instances are separated by a certain number of lines where no instance of the variable is found. The idea behind this is that in clusters consisting of many variable instances the variable name should be shorter in order to increase the readability of the code, while in clusters consisting of very few variable instances, the variable name can be made longer to make it more descriptive.

Each variable’s scope will be determined by searching through the RedBaron node tree. From a variable instance, RedBaron is used to look upwards in the tree to find the first function definition. If no function definition is found, then this means that the variable is a global variable. For each variable, the scope is found and added to the set of all scopes. If a scope has already been found before, the variable is simply added to the existing scope. This set now consists of all global and local scopes with every variable instance from that specific scope.
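A minimal sketch of this scope lookup is given below. It again uses the standard `ast` module as a stand-in for RedBaron. Because `ast` nodes have no parent links, the sketch walks downwards and carries the enclosing function along, instead of looking upwards from each instance. The names `group_instances_by_scope` and `GLOBAL_SCOPE` are hypothetical, and identifying a scope by its function name alone is a simplifying assumption.

```python
import ast
from collections import defaultdict

GLOBAL_SCOPE = "<global>"  # hypothetical label for the global scope


def group_instances_by_scope(source, variable_names):
    """Map (scope, name) to the sorted line numbers of the name's instances."""
    instances = defaultdict(list)

    def visit(node, scope):
        if isinstance(node, ast.FunctionDef):
            # Everything below belongs to this function definition.
            scope = node.name
            for arg in node.args.args:
                if arg.arg in variable_names:
                    instances[(scope, arg.arg)].append(arg.lineno)
        elif isinstance(node, ast.Name) and node.id in variable_names:
            instances[(scope, node.id)].append(node.lineno)
        for child in ast.iter_child_nodes(node):
            visit(child, scope)

    visit(ast.parse(source), GLOBAL_SCOPE)
    return {key: sorted(lines) for key, lines in instances.items()}
```

An instance with no enclosing function definition ends up under the global label, matching the rule that a variable without a surrounding function definition is global.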

³ https://pyenchant.readthedocs.io/


Next, all variable instances of one specific scope will be divided into separate clusters. Instances that are close to each other in line number will be part of the same cluster. When two consecutive instances are separated by at least five lines in which no other instance of the variable is found, the two instances are placed in different clusters. The number of lines that separates two clusters is a parameter of the tool and can thus be easily set to any value. Intermediate testing in the design phase, judged on the expertise of the designer, gave the best results with this parameter set to five.
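The clustering step can be sketched as below. The function name `cluster_by_line` and the parameter name `min_gap` are hypothetical. The sketch assumes one particular reading of the rule: the gap is counted as the number of instance-free lines strictly between two consecutive instances.

```python
def cluster_by_line(line_numbers, min_gap=5):
    """Split the line numbers of one variable's instances into clusters.

    Assumption: a new cluster starts when at least `min_gap` lines without
    any instance lie strictly between two consecutive instances.
    """
    clusters = []
    for line in sorted(line_numbers):
        if clusters and line - clusters[-1][-1] - 1 < min_gap:
            clusters[-1].append(line)   # close enough to the previous instance
        else:
            clusters.append([line])     # gap is large enough: new cluster
    return clusters
```

With instances on lines 1, 2, 4, 10, 11 and 30, lines 5 to 9 form a gap of five instance-free lines, so the result is three clusters: lines 1, 2 and 4, lines 10 and 11, and line 30.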

5.3 Deciding which names are incorrect

The following step is to use the data accumulated from the code to decide which variable names in the code are incorrect. Because the main type of feedback the tool should eventually provide is related to the length of the variable names, all variable names are firstly categorized based on their length. The following five name length categories are distinguished:

• Single-letter: Iterator variables are excluded from this category, as single-letter variable names are sufficient variable names for simple for loops [1, 8, 7]. All other single-letter names are always considered incorrect.

• 2-7 characters: Names from this category are generally too short, but can be justified in some cases, as explained in 5.3.1.

• 8-14 characters: This is the optimal name length category. Names from this category are considered correct by the tool and are not further examined.

• 15-23 characters: Names from this category are generally too long, but can be justified in some cases, as explained in 5.3.2.

• >23 characters: Names from this category are too long in any case and should thus be shortened.

All specific length values dividing the categories are parameters of the tool that can be easily adjusted. The values used for this categorization and for the evaluation are chosen based on the related work, pretest and some intermediate testing. The optimal average variable name length of ten to sixteen characters discovered in research [4] is slightly lowered as a result of the pretest. Based on a combination of the results from the pretest and intermediate testing, the optimal variable name length is adjusted to eight to fourteen characters. Another adjustment to the related work concerns the guideline that states that variable names should not be longer than 25 characters. Similarly to the other values, this value has been lowered by two characters. This value is still significantly higher than what resulted from the pretest, because although names above eighteen characters are incorrect in some cases, in other cases it is correct to use such long names.
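The categorization can be sketched as a small function with the boundary values exposed as parameters. The function name `length_category`, the parameter names and the category labels are hypothetical. Only the boundary values 7, 14 and 23 come from the text.

```python
def length_category(name, is_loop_iterator=False,
                    short_max=7, optimal_max=14, long_max=23):
    """Place a variable name in one of the five length categories."""
    length = len(name)
    if length == 1:
        # Single-letter iterators of simple for loops are accepted.
        return "accepted-iterator" if is_loop_iterator else "single-letter"
    if length <= short_max:
        return "short"        # 2-7 characters, examined further
    if length <= optimal_max:
        return "optimal"      # 8-14 characters, considered correct
    if length <= long_max:
        return "long"         # 15-23 characters, examined further
    return "too-long"         # more than 23 characters, always incorrect
```

Keeping the boundaries as keyword arguments reflects the point that the values are meant to be tuned by testing rather than fixed.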

5.3.1 Category: 2-7 characters

Variable names in the category that ranges from 2-7 characters are further examined using the accumulated data to decide whether they are correct or incorrect. These data consist of whether the variable is global or local and the clusters of the variable. The number of variable instances in the largest cluster is the main value that decides the correctness of the variable. If the largest cluster consists of many instances, it is acceptable for the variable name to consist of 2-7 characters, because if the name were longer, the readability would decrease due to the large number of long names in a small piece of code. In contrast, if the largest cluster consists of few instances, the name could be made longer in order to make it more descriptive. More concretely, if the variable is global and the largest cluster consists of fewer than four variable instances, the name is considered incorrect. Because local variables should generally be shorter, a local variable is considered incorrect if the largest cluster consists of fewer than three variable instances. Both the values three and four are parameters of the tool and can be easily adjusted if necessary. The specific values of three and four are chosen based on intermediate testing.

Another reason why a variable name from this category can be considered incorrect is that the variable can lose its meaning over a longer distance. This is the case when two instances of the same variable are at least ten lines of code apart and no other instance is found in between. A name of 2-7 characters is likely not very descriptive, which means the programmer will have to remember what exactly the variable name represents, whereas a longer and thus probably more descriptive name would not require this. The minimum number of lines of code between two instances for a variable to be considered incorrect is a parameter of the tool and can be adjusted if necessary. The value of ten was found by intermediate testing.
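
The distance check can be sketched as follows. The function and parameter names are hypothetical, and it is an assumption that the distance is measured as the difference between the line numbers of two consecutive instances. The example uses the line numbers at which 'vy' appears in Snippet 2 of the appendix.

```python
MIN_GAP_LINES = 10  # hypothetical parameter name; the value is from the text


def largest_gap(line_numbers):
    """Largest line-number difference between two consecutive instances."""
    ordered = sorted(line_numbers)
    gaps = (later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return max(gaps, default=0)


def loses_meaning_over_distance(line_numbers, min_gap=MIN_GAP_LINES):
    """True if two consecutive instances are at least min_gap lines apart."""
    return largest_gap(line_numbers) >= min_gap


vy_lines = [45, 48, 60, 62]
print(largest_gap(vy_lines))                  # 12
print(loses_meaning_over_distance(vy_lines))  # True
```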

Altogether, there are three scenarios in which a name from this category can be considered incorrect:

1. The variable is global and the largest cluster consists of fewer than four variable instances.

2. The variable is local and the largest cluster consists of fewer than three variable instances.

3. No variable instance is found in between two other instances over a distance of at least ten lines.
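
Taken together, the three scenarios amount to a small decision function. The sketch below is an illustration under assumptions, not the tool's implementation: the function name and argument names are hypothetical, and the cluster size and the largest gap between instances are assumed to have been computed beforehand.

```python
def short_name_is_incorrect(is_global, largest_cluster_size, largest_gap,
                            global_min=4, local_min=3, min_gap=10):
    """Decide whether a 2-7 character name is considered incorrect.

    largest_cluster_size: number of instances in the variable's largest cluster.
    largest_gap: largest number of lines between two consecutive instances.
    The default thresholds are the parameter values given in the text.
    """
    # Scenarios 1 and 2: the name is rarely used within a small piece of code.
    cluster_threshold = global_min if is_global else local_min
    if largest_cluster_size < cluster_threshold:
        return True
    # Scenario 3: the variable is used over a long distance.
    return largest_gap >= min_gap


print(short_name_is_incorrect(True, 3, 0))    # global, small cluster
print(short_name_is_incorrect(False, 3, 2))   # local, cluster large enough
print(short_name_is_incorrect(False, 5, 12))  # large gap between instances
```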

5.3.2 Category: 15-23 characters

Similarly to the 2-7 character category, the variables in this category are further examined using the retrieved data. If the largest cluster of a variable consists of many instances, the name should be shorter, because the large number of long names in a small piece of code will then obscure the visual structure of the code. If the largest cluster consists of few instances, the long name is considered correct, because it is then not a large hindrance to the visual structure of the code. The concrete cases mirror those of the 2-7 character category, with the conditions reversed. If the variable is global and the largest cluster consists of four or more instances, the name is considered incorrect. If the variable is local and the largest cluster consists of three or more instances, the name is also considered incorrect. These specific values are parameters chosen through intermediate testing.

Thus, there are two scenarios in which a name from this category can be considered incorrect:

1. The variable is global and the largest cluster consists of four or more variable instances.

2. The variable is local and the largest cluster consists of three or more variable instances.
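
For names of 15-23 characters the same thresholds are applied in the opposite direction. A minimal sketch, with a hypothetical function name and the parameter values from the text:

```python
def long_name_is_incorrect(is_global, largest_cluster_size,
                           global_min=4, local_min=3):
    """Decide whether a 15-23 character name is considered incorrect.

    A long name is flagged when it is used frequently within a small
    piece of code, i.e. when its largest cluster reaches the threshold.
    """
    threshold = global_min if is_global else local_min
    return largest_cluster_size >= threshold


print(long_name_is_incorrect(True, 4))   # global, four instances: flagged
print(long_name_is_incorrect(False, 2))  # local, two instances: not flagged
```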

5.4 Providing feedback

Taking all information from the previous sections, there are now ten different scenarios in which feedback can be provided. No feedback is given for scenarios that do not occur in a particular piece of code. These scenarios are all based on a guideline found in either the related work or the pretest. The textual feedback is, at this point, fairly straightforward. It has gradually been altered over the course of the design phase of the tool to make sure that it is not too complicated and can be understood by beginner level programmers. In some cases, one or more examples are given along with the textual feedback, to give the student a better understanding of the mistake that has been made. The following are the ten scenarios, along with the textual feedback that will be given.

1. Average variable name length is below eight characters.

“Your variable names likely do not convey enough meaning, as they are too short on average.”

2. Average variable name length is above fourteen characters.

“Your variable names are too long on average. Long variable names can be hard to type or can obscure the visual structure of your program.”

3. Multiple languages are used.

“Make sure all your variable names are in the same language.”

4. Single-letter names.

“Variable names should not consist of a single letter, unless it is a very simple for loop iterator or it is a widely accepted name (e.g. ‘x’ & ‘y’ as coordinates).

You should reconsider the following names:”

5. Names that are always too long (>23 characters).

“Variable names of more than 23 characters should be avoided where possible. The following names are too long:”

6. Names that are too short: the variable is global and the largest cluster consists of less than four variable instances.

“Global variable names should be relatively long, unless it is used frequently within a small piece of code. Short names will likely not convey enough meaning.

One example of a short global variable name in your code that could be longer:”

7. Names that are too short: the variable is local and the largest cluster consists of less than three variable instances.

“Local variable names should generally be longer when the variable is rarely used. Short names will likely not convey enough meaning, while rarely used long names will not decrease the readability of your code.

One example of a short local variable name in your code that could be longer:”

8. Names that are too short: no variable instance is found in between two other instances over a distance of at least ten lines of code.

“Variable names should generally be longer when the variable is used over a longer distance. Short names are more likely to lose their meaning over a longer distance.

One example of a short variable name that is used over a long distance:”

9. Names that are too long: the variable is global and the largest cluster consists of four or more variable instances.

“Global variables that are frequently used within a small piece of code should be shorter. Long names that are frequently used within a small piece of code will decrease the readability of your code.

One example of such a long global variable name in your code that should be shorter:”

10. Names that are too long: the variable is local and the largest cluster consists of three or more variable instances.

“Local variable names should be relatively short, unless it is rarely used. Long names that are frequently used within a small piece of code will decrease readability of your code.

One example of such a long local variable name in your code that should be shorter:”

5.4.1 No example feedback

As seen in the feedback above, no example is given in the first two scenarios, because the average variable name length is not caused by just one or two naming errors, but by all variables together. In addition, this feedback is meant more as a hint than as something that can be adjusted in the code right away.

In the third scenario, a number of names in a different language could be pointed out, but it is up to the student to decide which language all the variable names should be in. It is therefore not always possible to determine in which language the names are correct and in which they are incorrect. Besides that, because this feedback is very easy to understand, there is no need to give examples along with it.
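
The feedback for the first two scenarios can be sketched as below. The function name is hypothetical, and it is an assumption that the average is taken over the distinct variable names rather than over all instances; the thresholds and messages are taken from the text.

```python
TOO_SHORT = ("Your variable names likely do not convey enough meaning, "
             "as they are too short on average.")
TOO_LONG = ("Your variable names are too long on average. Long variable "
            "names can be hard to type or can obscure the visual structure "
            "of your program.")


def average_length_feedback(names, optimal_min=8, optimal_max=14):
    """Return the feedback for scenario 1 or 2, or None if neither applies.

    Assumption: `names` holds each distinct variable name once.
    """
    if not names:
        return None
    average = sum(len(name) for name in names) / len(names)
    if average < optimal_min:
        return TOO_SHORT
    if average > optimal_max:
        return TOO_LONG
    return None


print(average_length_feedback(["a", "b", "positie", "max_number"]))
```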

5.4.2 All examples feedback

In the fourth and fifth scenario, all examples of that naming error will be given to the student. Because the feedback given in these two scenarios is very simple, all examples can be easily recognized, which means there would be no reason to provide just one example. The name of the examples will be given, along with the line number in which the variable first appears:

“You should reconsider the following names:

‘a’, first appearance in line 5.

‘b’, first appearance in line 5.

‘n’, first appearance in line 24.”
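
Producing such a list is a matter of formatting the flagged names with the line of their first appearance. A minimal sketch, in which the function name and the input structure (a mapping from name to first line) are assumptions and plain quotes are used instead of typographic ones:

```python
def format_all_examples(header, first_lines):
    """Format all flagged names, ordered by their first appearance.

    first_lines maps each flagged name to the line number of its
    first appearance in the code.
    """
    entries = sorted(first_lines.items(), key=lambda item: (item[1], item[0]))
    lines = [header]
    for name, line in entries:
        lines.append("'{}', first appearance in line {}.".format(name, line))
    return "\n".join(lines)


print(format_all_examples("You should reconsider the following names:",
                          {"n": 24, "a": 5, "b": 5}))
```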

5.4.3 Best example feedback

In the sixth through tenth scenarios only one example will be given to the student. As explained in 4.1, this is done because the student will then have to recognize all other examples in the code themselves. The goal is that the one given example is the best example for that specific error, so that it costs the student minimal effort to identify the problem and the risk of providing a false positive is minimized. Using the methods from 5.3, a list with all incorrect variable names is created for each of the five scenarios.

For the sixth and seventh scenario, the following properties of the names, listed in order of precedence, are used to reduce the list of incorrect names to just one best example:

1. Dictionary word.

If there is at least one non-dictionary word (English or Dutch) present in the list of incorrect names, all dictionary words will be filtered out of the list, because the chance of a name not being descriptive enough, and thus being a good example of this specific error, increases when the name is not a dictionary word.

2. Size of largest cluster.

The variable(s) where this cluster is the smallest will remain in the list.

3. Name length.

Next, the name lengths of all remaining variables in the list are compared in order to find the shortest name. If multiple variables share the shortest name length, they will all remain in the list.

4. Line number.

If there are, at this point, still multiple names in the list, the first occurring variable in the code will be chosen as the best example.

The best example for these scenarios is given as follows:

“One example of a short global variable name in your code that could be longer:

- ‘var’, first appearance in line 34”

“One example of a short local variable name in your code that could be longer:

- ‘list’, first appearance in line 42”
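
The four selection steps for the sixth and seventh scenario can be sketched as a cascade of filters. This is an illustration under assumptions, not the tool's code: the function name, the candidate records and the small `dictionary_words` set are hypothetical, and the candidate values below are invented for the example.

```python
def best_short_example(candidates, dictionary_words):
    """Reduce a list of incorrect short names to one best example.

    Each candidate is a dict with the keys 'name', 'largest_cluster'
    (size of the largest cluster) and 'first_line' (first appearance).
    """
    if not candidates:
        return None
    remaining = list(candidates)
    # 1. Dictionary word: drop dictionary words if any other name exists.
    non_dictionary = [c for c in remaining
                      if c["name"].lower() not in dictionary_words]
    if non_dictionary:
        remaining = non_dictionary
    # 2. Size of largest cluster: keep the smallest.
    smallest = min(c["largest_cluster"] for c in remaining)
    remaining = [c for c in remaining if c["largest_cluster"] == smallest]
    # 3. Name length: keep the shortest.
    shortest = min(len(c["name"]) for c in remaining)
    remaining = [c for c in remaining if len(c["name"]) == shortest]
    # 4. Line number: the first occurring variable wins.
    return min(remaining, key=lambda c: c["first_line"])


candidates = [
    {"name": "list", "largest_cluster": 2, "first_line": 42},
    {"name": "vy", "largest_cluster": 2, "first_line": 45},
    {"name": "tmp", "largest_cluster": 1, "first_line": 50},
]
print(best_short_example(candidates, {"list"})["name"])  # tmp
```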

For the eighth scenario the best example is chosen in a similar way. The only difference is that, instead of the size of the largest cluster, the largest distance in lines between two instances of each variable is compared. The variable(s) where this distance is the largest will remain in the list. The example for this scenario is given as follows:

“One example of a short variable name that is used over a long distance:

- ‘vy’, first appearance in line 45”

The best examples for the ninth and tenth scenario are chosen slightly differently. In these cases, it is not checked whether the name is a dictionary word, because long names often consist of multiple words, which are not worse than names consisting of one long word. The following properties are used in this order to reduce the list of incorrect names to one best example for these scenarios:

1. Size of largest cluster.

The variable(s) where this cluster is the largest will remain in the list.

2. Name length.

The variable(s) with the longest name will remain in the list.

3. Line range of largest cluster.

The variable(s) where the largest cluster has the biggest line range (i.e. is spread out the most), will remain in the list.

4. Line number.

If there are, at this point, still multiple names in the list, the first occurring variable in the code will be chosen as the best example.

The best example for these scenarios is given as follows:

“One example of such a long global variable name in your code that should be shorter:

- ‘list_of_student_ids’ is used frequently in lines 2-7”

“One example of such a long local variable name in your code that should be shorter:

- ‘longest_cold_period’ is used frequently in lines 40-44”
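
Because each step keeps the maximum of one property, the selection for the ninth and tenth scenario can also be written as a single lexicographic comparison. The sketch below makes that assumption explicit; the function name, the record layout and the candidate values are hypothetical and only serve as illustration.

```python
def best_long_example(candidates):
    """Pick the best example of a long name that should be shorter.

    Each candidate is a dict with the keys 'name', 'largest_cluster',
    'cluster_start', 'cluster_end' (line range of the largest cluster)
    and 'first_line'. The tuple key mirrors the order of precedence:
    cluster size, name length, line range, then earliest first line.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: (
        c["largest_cluster"],
        len(c["name"]),
        c["cluster_end"] - c["cluster_start"],
        -c["first_line"],  # negated so that the earliest line wins a tie
    ))


candidates = [
    {"name": "longest_cold_period", "largest_cluster": 4,
     "cluster_start": 40, "cluster_end": 44, "first_line": 38},
    {"name": "list_of_student_ids", "largest_cluster": 4,
     "cluster_start": 2, "cluster_end": 7, "first_line": 2},
]
best = best_long_example(candidates)
print("'{}' is used frequently in lines {}-{}".format(
    best["name"], best["cluster_start"], best["cluster_end"]))
```

In this invented example both names have the same cluster size and length, so the wider line range of the largest cluster decides.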

6 Results

The evaluation showed that the feedback from all three TAs was very similar in structure. They all went over most of the names and picked out the ones that they thought were incorrect. Along with the incorrect name, they gave a reason as to why the name is incorrect. The results from the students' part of the evaluation were structurally very similar to the results from the TAs. The students corrected names that they thought were incorrect with the help of the feedback and also gave the reason why.

The first column in Tables 1, 2 and 3 shows the ten scenarios where feedback can be given, as described in 5.4. In the Tool column, for every scenario that takes place, all automatically generated incorrect names with their line number in the code provided by the tool are given. The names highlighted in red are the names that are given as (best) example by the tool, while names that are not highlighted are generated specifically for the evaluation, as described in 4.2.3. Because the TAs and students were not directly asked about the average variable name length, it was difficult to evaluate this. For this reason, those two particular rows were left blank.

Table 1: Code snippet 1

6.1 Code snippet 1

Table 1 shows that the feedback that multiple languages are used was given by the tool and by all three TAs. The students used the feedback to successfully correct all variable names such that only one language was used. Every example of a single-letter variable was also pointed out by the TAs and corrected successfully by the students.

The only incorrect name that was found by the tool for the seventh scenario was the variable named ‘list’. It turned out that two out of the three TAs gave similar feedback on this name. One TA considered the name itself to be descriptive enough, but because ‘list’ is already a built-in name in Python, the TA declared that the name should be adjusted to something like ‘a_list’. The students decided to ignore the feedback given by the tool, because they both thought the name was descriptive enough. This is an interesting case, where most of the TAs agreed with the tool, but the students were so sure of their own knowledge that they decided to ignore the feedback given by the tool. They did, however, adjust the name for the same reason as the other TA.

The variable name that was considered incorrect by the tool for scenario ten, was also considered incorrect by all three TAs and adjusted correctly by the students.

Two of the three TAs provided feedback about the iterator names ‘i’. These TAs declared that the variable name ‘i’ should only be used in for loops where the iterator is used as a simple counter, but in any other case, the name should be more descriptive. The tool, on the other hand, simply gave the feedback that the name ‘i’ is correct if it is a very simple for loop iterator. Although both the TAs and the tool agreed that the name ‘i’ is in some cases fine to use for an iterator, there is a slight difference in the formulation of the feedback.

6.2 Code snippet 2

Similarly to the first code snippet, the tool gave feedback on the fact that multiple languages are used and on all the single-letter variables in the code, as can be seen in Table 2. Again, the TAs’ feedback concurred with the feedback from the tool and the students correctly adjusted all single-letter names.

The variable names the tool considered to be incorrect for scenario six appeared to be correct, according to all three TAs. The variable name ‘list’ was not mentioned by any of the TAs, which likely means that the TAs considered the name to be correct or simply overlooked it. One of the students did use the feedback to adjust the names ‘bal_A’ and ‘bal_B’ to ‘data_ball_A’ and ‘data_ball_B’ respectively, while the other student decided that the names were descriptive enough and left them unchanged.

The names considered incorrect by the tool for the seventh scenario were mostly not considered incorrect by the TAs and were also not corrected by the students for the reason given by the tool. Even the best example given by the tool was only considered incorrect by one TA and was not adjusted by the students. Most of the names provided by the tool in this scenario consisted of ‘x’ or ‘y’ along with a word describing in more detail what kind of coordinates they represent, such as ‘new_x’ or ‘x_min’. As stated by the feedback given in the fourth scenario, coordinates are an exception to the rule that short names are not descriptive. However, this feedback is only coincidentally provided as a result of having single-letter variable names in the code. One of the students applied this feedback to the names considered incorrect in the seventh scenario and thus decided not to adjust these names.

The variable name ‘vy’ given as best example for the eighth scenario was considered incorrect by all three TAs, but not all TAs gave the same reason. Two of the three TAs agreed that the name is not descriptive enough and should be adjusted to ‘velocity_y’, but they did not mention the large distance over which the variable was used. Both students corrected the name to something similar to what the TAs would have corrected it to. The variable name ‘new_y’ was not considered incorrect by the TAs and was not corrected by the students.

As with snippet 1, two of the three TAs declared that the variable name ‘i’ used as an iterator in this snippet of code should be more descriptive. One of the students corrected the name, while the other decided to leave it as it is.

6.3 Code snippet 3

As seen in Table 3, the feedback that multiple languages are used was provided by the tool on the third snippet of code. All TAs agreed with this feedback and the students successfully corrected all names such that only one language was used.

The variable name ‘line’ was given as best example for the seventh scenario by the tool, but was not considered incorrect by the TAs, because they all thought the name was descriptive enough and should not be any longer. For this same reason, both students decided to not apply the feedback in the code.

The variable name ‘datum’ (translates to ‘date’ in English) that was considered incorrect by the tool for the seventh and eighth scenario, was not considered incorrect for either of those reasons by any of the TAs. The students, however, did use the feedback to correct this name. One of the students recognized ‘datum’ as incorrect for the reasons given for both scenario seven and eight and decided to make the name longer and thus more descriptive. The other student only recognized the name as a mistake for scenario eight and adjusted the name as suggested by the feedback for this scenario given by the tool.

The best example variable name for the eighth scenario that was given by the tool was ‘regel’ (translates to ‘line’ in English). Although all three TAs agreed that the name should be adjusted, because the variable represented something entirely different than a line, after being asked the question whether ‘regel’ would be descriptive enough, they all agreed that this was the case. Therefore, this example given by the tool was not a good example. Both students said the same thing as the TAs, but when coming up with a better name, they did apply the feedback for the eighth scenario to make sure the new name was long and descriptive enough.

For the tenth scenario the name ‘langste_vriesperiode’ (translates to ‘longest freezing period’ in English) was given as best example. While two of the three TAs agreed that this name was too long, the other TA thought the name was correct, because in this case, shortening the name would make it not descriptive enough. One of the students had the same idea and decided not to adjust this name. The other student, however, did correct the name based on the feedback provided by the tool.

The variable name ‘i’ was used correctly in this snippet of code, according to all three TAs. In this case, the name ‘i’ was used to iterate over a range of 21, thus making it a simple for loop counter. Neither of the students adjusted this name in the code.

One of the students decided to shorten the variable names ‘dooiperiode’ and ‘vriesperiode’ based on the feedback given for the tenth scenario. In contrast to the student, all TAs agreed that these names were fine, declaring that they were not too long and were descriptive enough.

7 Discussion

7.1 Average variable name length

The feedback about the average variable name length could be a good addition to the tool, but because of the way the TAs provided feedback, it was not possible to properly evaluate this. In future research, it would be a good idea to do a quantitative evaluation with many experts and ask them whether they think on average the variable names in one or more snippets of code are of sufficient length, too short or too long. However, the usefulness of this type of feedback is nonetheless difficult to evaluate, because it cannot be applied directly in code.

7.2 Multiple languages used

Based on the results from the TAs and students, the conclusion can be drawn that in all three snippets, the feedback given that multiple languages are used is valid and useful to the students. The only problem with this type of feedback is the use of multiple words in one variable name. For future design of such a tool, it would be a good idea to first split the variable name into separate words in order to reduce the number of false negatives.
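
Splitting a name into separate words before the language check could look like the sketch below. This is only an illustration of the suggestion above and not part of the tool: the function name is hypothetical, and it is an assumption that words are separated by underscores, digits or camel-case capitals. Compound words such as ‘vriesperiode’ would remain unsplit.

```python
import re


def split_name(name):
    """Split a variable name into lower-case words.

    Handles snake_case, camelCase and trailing digits; each resulting
    word could then be looked up in a dictionary per language.
    """
    words = []
    for part in re.split(r"[_\d]+", name):
        # A capital followed by lower-case letters, a run of lower-case
        # letters, or a run of capitals (an abbreviation).
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part))
    return [word.lower() for word in words]


print(split_name("langste_vriesperiode"))  # ['langste', 'vriesperiode']
print(split_name("maxNumber"))             # ['max', 'number']
print(split_name("max_number2"))           # ['max', 'number']
```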

7.3 Single-letter names

Feedback about the single-letter variable names also turned out to be valid and useful to the students. The TAs’ feedback concurred with the ideas from related work, where single-letter variable names are in almost all cases not descriptive enough. The only exceptions are when the variable serves as a counter in a for loop or when the variable name is a widely accepted name, such as the coordinates ‘x’ or ‘y’.

7.4 More than twenty-three character names

Although none of the three snippets contained a name of more than twenty-three characters, it became evident, especially from the pretest, that names around this length are too long. The exact number, however, is still up for debate.

7.5 Final five scenarios

Many names that were considered incorrect by the tool for the final five scenarios were not considered incorrect by the TAs. This does not necessarily mean that the feedback provided by the tool is not valid, because in some cases, the students did use the feedback to adjust the names to make them arguably even better. On the other hand, there were many variable names considered incorrect by the tool that even the students did not adjust, which means that in deciding which names could be made better, the tool also produced many actual false positives.

In these same five scenarios, there were also cases where the TAs agreed with the tool, but where the students decided to not adjust the particular variable name given. It could be that in these cases the feedback given was not clear or specific enough. One example of this is the name ‘list’ in code snippet 1. The feedback given here was that the name was too short and therefore likely not descriptive enough and the suggestion is made that the name could be made longer. Because ‘list’ is already an existing word, the students could have thought that it is descriptive enough, while according to the TAs, it should also be made clear what kind of list the variable represents. This is, however, not directly instructed to the students in the tool’s feedback, which could mean that for future design of such a tool, this type of feedback could be slightly altered.

From the results it does appear that in these five scenarios where only one example is given, the students do try to find the other cases of similar errors. The best example of this is the variable name ‘datum’ in code snippet 3. Both students studied the textual feedback, recognized this name as an error of that type and adjusted the name accordingly. Although it is difficult to conclude that students learn more from providing just one example, it does seem to be a successful strategy that can be used in future designs.

8 Limitations

The evaluation was conducted with only three TAs and two students. Although a more quantitative evaluation could have led to interesting conclusions about the validity of the tool, this was not possible due to the number of TAs, students and time available for this research. Therefore, the decision was made to go with a qualitative approach, where the TAs would not only give a list of correct and incorrect names, but also the reason why certain names are incorrect. The part of the evaluation with the students turned out to take much more time than anticipated, but the results contained plenty of information to draw conclusions from.

The students that took part in the evaluation were also slightly more advanced than true beginner level programmers. They are students from the minor programming at the University of Amsterdam, which is a minor aimed at students that want to start programming. However, the students were near the end of the minor, which means that they had already followed most courses and were therefore slightly more advanced, which could have affected the results. Adding to this, the three snippets of code that they were asked to correct were not created by themselves, but were taken from an introductory Python programming course aimed at physics students. Because the students did not create the code themselves, there were likely many incorrect names that the students would normally have corrected already, even without the help of the feedback from the tool. It would be a better idea to have the students use the feedback to improve a snippet of code that they themselves have created, in order to see what they will actually learn from the feedback.

Another problem is that the three snippets of code that were used for the evaluation were not very large. In general, this means that there are fewer variable names and, thus, also fewer incorrect names to choose the best example from. Therefore, the strategy of choosing one best example is likely to be less effective in smaller snippets of code. The fact that the snippets of code were from a course for physics students adds to the problem. Many of the assignments were to code physics related formulas with many variable names such as ‘v_x’ and ‘v_y’, which are acceptable names for the velocity in the x and y direction (coordinates), but are very troublesome for the tool.

The values of the parameters were chosen based on related work, the pretest and intermediate testing in the design phase. Although much thought went into deciding upon these values, one can never be sure these values are truly optimal. A problem here is that it is not possible to single out one parameter and then find the optimal value for it, as some of the parameters might be dependent on one another and the tool needs values for all parameters in order to work. In addition, the intermediate testing is mostly based on the expertise of the designer, which could have led to subpar values.

9 Conclusion

The research question that this project attempted to answer is the following:

How can we use the length of variable names to automatically provide appropriate and useful feedback aimed at students from an introductory programming course on the quality of variable naming in their code?

To determine how to do this, previously conducted research was examined to discover when a variable name is too short or too long. It turned out that single-letter names and names of more than twenty-five characters are always incorrect and cannot be justified [1, 8, 7]. The optimal average variable name length turned out to be between ten and sixteen characters [4, 8]. Anything from two characters to this optimal length could be too short, while anything from this optimal length to twenty-three characters could be too long, depending on how the variable is used in the code.

The pretest showed that, in student assignments, most of the variable names should be even shorter than described in this previously conducted research. From this result and intermediate testing of the tool it turned out that it would be better to lower the maximum length and optimal average variable name length by two characters.

Altogether, it appears that by using the length of variable names along with their position in the code it is, to some extent, possible to provide feedback aimed at beginner level programmers on variable naming in Python code. By examining the clustering of instances of a variable, the tool can decide whether a name should be shorter or whether a name could be longer. It is, however, difficult to determine whether a variable name is descriptive enough using this method and, thus, to be certain a variable name should be longer, as the results show that plenty of short names are considered to be descriptive enough by TAs. To minimize this problem, the tool can search for the best example out of all incorrect names, so that the given example is more likely to be incorrect, but this still does not guarantee that the given variable name is incorrect. Providing feedback on simpler cases, such as single-letter names, names of more than twenty-three characters and snippets of code where multiple languages are used, turned out to be valid and useful to the student.

Although more research is required to optimize this tool such that it can be used in practice, as the number of false positives provided by the tool, at this point, is too large, many aspects of the design of this tool, as discussed in the Discussion section (7), can be used in future designs of such a tool.

References

[1] Simon Butler, Michel Wermelinger, Yijun Yu, and Helen Sharp. Exploring the influence of identifier names on code quality: An empirical study. In Software Maintenance and Reengineering (CSMR), 2010 14th European Conference on, pages 156–165. IEEE, 2010.

[2] Andrew D Cohen and Marilda C Cavalcanti. Feedback on compositions: Teacher and student verbal reports. Second language writing: Research insights for the classroom, pages 155–177, 1990.

[3] LA Fregeau. Preparing ESL students for college writing: Two case studies. The Internet TESL Journal [on-line], 5(10), 1999.

[4] Narasimhaiah Gorla, Alan C. Benander, and Barbara A. Benander. Debugging effort estimation using software metrics. IEEE Transactions on Software Engineering, 16(2):223–231, 1990.

[5] Andrew Hunt and David Thomas. The Pragmatic Programmer: From Journeyman to Master. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[6] Dawn Lawrie, Christopher Morrell, Henry Feild, and David Binkley. What’s in a name? a study of identifiers. In Program Comprehension, 2006. ICPC 2006. 14th IEEE International Conference on, pages 3–12. IEEE, 2006.

[7] Robert C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1 edition, 2008.

[8] Steve McConnell. Code Complete, Second Edition. Microsoft Press, Redmond, WA, USA, 2004.

[9] Phillip A Relf. Tool assisted identifier naming for improved software readability: an empirical study. In Empirical Software Engineering, 2005. 2005 International Symposium on, pages 10–pp. IEEE, 2005.

[10] D Royce Sadler. Formative assessment and the design of instructional systems. Instructional science, 18(2):119–144, 1989.

[11] Ben Shneiderman. Software Psychology: Human Factors in Computer and Information Systems (Winthrop Computer Systems Series). Winthrop Publishers, 1980.

[12] Martijn Stegeman, Erik Barendsen, and Sjaak Smetsers. Towards an empirically validated model for assessment of code quality. In Proceedings of the 14th Koli Calling International Conference on Computing Education Research, pages 99–108. ACM, 2014.

10 Appendix

Code snippets used for evaluation

Snippet 1

import random

# roll dice, with variable a the number of sides of the dice
# and variable b the times you may roll the dice
def roll_dice(a, b):
    number_of_dice_to_roll = 0
    while number_of_dice_to_roll < b:
        print random.randint(0, a)
        number_of_dice_to_roll = number_of_dice_to_roll + 1
    return "That's all!"

# returns a new list with the cumulative sum of a list
def cumulative_sum(number_list):
    # number_list is a list of numbers
    new_list = []
    sum = 0
    for i in number_list:
        sum += i
        new_list.append(sum)

    return new_list

# sum_of_list takes a list as an argument and sums all the elements
def sum_of_list(n):
    sum = 0
    for i in n:
        sum += i

    return sum

# finds the position of an element in a list that is equal to a value
def find_position(a, b):
    positie = 0
    while positie < len(b):
        if b[positie] == a:
            return positie + 1 # 1 is added because it is about the position
        positie = positie + 1

    return None

# finds maximum value of a list
def find_maximum(list):
    max_number = 0
    for i in list:
        # second variable, so the code also works for negative elements in the list
        max_number2 = i
        # the or statement always replaces the value of max_number with the first element in the list
        if max_number2 > max_number or max_number == 0 and i != 0:
            max_number = max_number2

    return max_number

Snippet 2

import math

'''
Builds a dictionary by using the first digit from the list as the key.
'''
list = [45, 90, 30, 33, 34, 64, 34, 28, 72, 20]
def sort_by_digit(n):
    d = dict()
    for i in n:

        sec_digit = i % 10 # find the remainder
        if sec_digit not in d:
            d[sec_digit] = [i]
        else:
            d[sec_digit].append(i)
    return d

# exercise 3.3
# determines whether two balls collide by comparing the distance with the sum
# of the radii
def ball_collide(a, b):
    x1 = a[0]
    x2 = b[0]
    y1 = a[1]
    y2 = b[1]

    afstand = math.sqrt( (x2 - x1)**2 + (y2 - y1)**2)
    som_radius = a[2] + b[2]
    if afstand <= som_radius:
        return "They are colliding"
    else:
        return "They are not colliding"

bal_A = (2, 3, 1)
bal_B = (8, 5, 3)
print ball_collide(bal_A, bal_B)

# exercise 3.4
# checks when the velocities must be reversed, i.e. at the edges
# of the box
def ball_step(a, b):
    start_x = a[0]
    start_y = a[1]
    vx = a[2]
    vy = a[3]

    new_x = (vx * b) + start_x
    new_y = (vy * b) + start_y

    x_min = 0
    x_max = 10
    y_min = 0
    y_max = 10

    if new_x >= x_max:
        vx = -vx
    if new_x <= x_min:
        vx = -vx
    if new_y >= y_max:
        vy = -vy
    if new_y <= y_min:
        vy = -vy

    return (new_x, new_y, vx, vy)

print "Ball's new position:", ball_step((5, 5, 2, 1), 0.01)

Snippet 3

def langste_vries_periode_met_1_moment_dooi(opened_file):
    """
    @return: the longest period in which the maximum temperature stayed below zero,
    while still allowing 1 day of thaw, in De Bilt
    @param: opened_file is the previously opened file on which the function is run
    """
    # required counters
    vriesperiode = 0
    dooiperiode = 0
    langste_vriesperiode = 0

    # The first 21 lines must be skipped to reach the data in the file
    for i in range(21):
        opened_file.next()

    # loop header missing in the source; reconstructed here as an assumption from the use of `line` below
    for line in opened_file:
        # splits the data into strings, so that temperature and date are treated as separate elements
        regel = line.split(",")

        # As soon as the maximum temperature drops below 0 the counter starts, and it is stored
        # when it is longer than the previously stored longest period of frost
        if int(regel[3]) < 0:
            vriesperiode += 1
            if vriesperiode > langste_vriesperiode:
                langste_vriesperiode = vriesperiode
                datum = int(regel[2])
        else:
            # The temperature may rise above 0 for 1 day, and the dooiperiode counter then starts counting
            dooiperiode += 1
            vriesperiode += 1

            # as soon as it thaws for more than 1 day (the temperature rises above 0), the dooiperiode
            # and vriesperiode counters are reset to 0
            if dooiperiode > 1:
                vriesperiode = 0
                dooiperiode = 0
            else:
                # vriesperiode including 1 thaw day is stored as langste_vriesperiode,
                # together with the corresponding date
                if vriesperiode > langste_vriesperiode:
                    langste_vriesperiode = vriesperiode
                    datum = int(regel[2])

    return langste_vriesperiode, datum
