
Bachelor Informatica

Regarding the Impact of Code Smells on the Energy Efficiency of Different Computer Languages

Sander van Oostveen

June 15, 2020

Informatica
Universiteit van Amsterdam


Abstract

In order to meet the Energy Efficiency Directive of the European Commission, it is imperative to develop energy-efficient software. Previous research has shown that different computer languages have different energy consumption rates. Additional research has shown that code smells can also impact the energy consumption. Following these findings, we aim to research whether code smells impact the energy consumption of software projects, each written in a different computer language, and whether this impact differs between languages. For this purpose we designed a framework to conduct research on the impact of code smells on the energy consumption for different languages. We use this framework to show that, in general, for the languages C++, Java and Python, and contrary to our expectations, adding the 'Long Function' code smell to the software projects increased the energy consumption. However, in some cases no significant impact could be found for C++, and in a single case the impact decreased the energy consumption for Python. As such, we conclude that the impact of the 'Long Function' code smell differs between languages.


Contents

1 Introduction
  1.1 Problem Statement
    1.1.1 Research Questions
    1.1.2 Research Method
  1.2 Contributions
  1.3 Ethical Implications
  1.4 Outline

2 Background
  2.1 The Running Average Power Limit Counters
  2.2 Definition of Code Smells
  2.3 Statistical Analysis
    2.3.1 Anomaly Detection
    2.3.2 Proving Statistical Significance

3 Design of the Framework
  3.1 Creating the Codebase
    3.1.1 Selection of the Computer Languages
    3.1.2 Requirements for the Codebase
    3.1.3 Expected Impact of the Code Smell
  3.2 The Execution Environment
    3.2.1 Execution of Experiment
    3.2.2 Performing the Measurements
  3.3 Hardware Specific Measuring
    3.3.1 Method of Measuring

4 Experimental Setup
  4.1 Data Preparation
    4.1.1 Selection of Relevant Computer Languages
    4.1.2 The Computer Language Benchmark Game
    4.1.3 Project Euler
    4.1.4 Selecting the Code Smell
    4.1.5 Applying the Code Smell to the Codebase
  4.2 System Specifications
  4.3 Energy Consumption Measurements
    4.3.1 Measuring the Cumulative Energy Consumption
    4.3.2 Reading the RAPL Counters
  4.4 Dealing with Anomalies
    4.4.1 Determining the Eps Distance
    4.4.2 Detecting Clusters Using DBSCAN

5 Results
  5.1 The BetterCodeHub Scores
  5.2 Impact of the Code Smell on the Energy Consumption of C++
  5.3 Impact on the Energy Consumption of Java
  5.4 Impact on the Energy Consumption of Python

6 Discussion
  6.1 Accuracy of the BetterCodeHub Grades
  6.2 Significance of Impact on the Different Languages
    6.2.1 Impact on C++
    6.2.2 Impact on Java
    6.2.3 Impact on Python
  6.3 General Impact of the 'Long Function' Code Smell
    6.3.1 Impact of 'Long Function' on the Execution Time
  6.4 Difference in Impact between Languages
  6.5 Threats to Validity

7 Related Work
  7.1 Energy Efficiency of Software
  7.2 Energy Consumption of Computer Languages
  7.3 Impact of Refactoring on Energy Consumption

8 Conclusions
  8.1 Future Work


CHAPTER 1

Introduction

It is commonly accepted among most scientists that climate change is an increasing concern on a global scale. One of the key causes of climate change is the release of greenhouse gases emitted by the burning of fossil fuels for, amongst other things, the production of electricity [9]. In order to decrease these emissions, policies and directives such as the 2012 Energy Efficiency Directive (EED) by the European Commission have been proposed. These policies are aimed at reducing the amount of energy produced, and at improving the efficient usage of energy.

As the energy consumption of ICT devices and services is growing at a faster rate than the worldwide energy consumption, research into increasing the energy efficiency of this sector is increasingly relevant towards achieving the goals of the EED. While much research has focused on improving the efficiency of hardware, more recently research has also been conducted on the energy efficiency of software.

1.1 Problem Statement

In this thesis, we expand on the research conducted by Koedijk in 2019 [11] on the impact of computer languages on the energy consumption of a software project, and by Kok in 2019 [12] on the impact of refactoring code smells on the energy consumption of Java programs. These findings could indicate that good programming practices, such as refactoring code smells in a software project, can negatively impact the energy efficiency of said project. At the same time, the choice of computer language used to write a software project can impact the energy efficiency of the project. Understanding how the presence of code smells impacts the energy efficiency of software projects written in different languages can therefore promote energy-efficient programming practices. Unlike most related research, rather than refactoring existing code smells out of the original software project, we look at the impact of adding a code smell to the original project.

1.1.1 Research Questions

In this research we aim to answer the following questions:

1. Does the presence of a code smell impact the energy consumption of software projects each written in a different computer language?

2. Is the impact of a code smell on the energy consumption of a software project different between computer languages?

To answer these questions, we must first answer the following sub-questions:

• How do we add code smells to software projects in a manner that is comparable between languages?


• How do we measure the impact of the presence of code smells on the energy consumption?

1.1.2 Research Method

To answer these questions we conduct an experiment in which we measure the energy consumption of a computer system during the execution of specific software projects. These projects are the variable input, while the energy consumption is the output; environmental factors such as the system and the method of measurement are kept constant.

Data Preparation

To conduct these experiments we prepare a selection of software projects. These projects must be written in different, specific languages. However, they must implement the same functionality using the same algorithms, while also maintaining a similar structure at code level. This allows us to measure the differences caused by the computer language the project was written in. As these projects are structured in the same manner between the languages, we can add code smells to alter the structure of the program in a comparable way. As such, we require a source of software projects fitting these requirements, and a selection of code smells regarding the structure of software projects written in the selected languages.

We may source our software projects from codebases such as the Computer Language Benchmarks Game (https://benchmarksgame-team.pages.debian.net/benchmarksgame/) or from existing libraries implementing well-known algorithms. The benefit of relying on the Benchmarks Game over libraries is that it provides a source of software projects purposefully written in different computer languages but in a comparable way, so as to highlight the differences in performance between the languages. The measurement of energy consumption would therefore be in line with the purpose of this data set.

Adding in Code Smells

Given a selection of software projects written in different computer languages such that they are functionally and structurally comparable, we may add changes at code level. In order to compare the impact of these changes between languages, it is important that the change we add is not specific to any one language. Hence, we opted to use a code smell impacting the overall structure of the software project, as this structure is the same amongst our selection of software projects. Any change we make to this structure is reproducible between different languages, and its impact on the energy consumption can be measured for each language.

Measuring the Energy Consumption

In order to measure the impact of a code smell on the energy consumption of a software project implemented in a specific language, we must measure the energy consumption. Such measurements can be done through either a hardware approach, a software approach, or a theoretical model. While relying on hardware improves the overall accuracy of the measurements, it is not readily available. A software approach, on the other hand, is accessible on most systems, but has reduced accuracy compared to the hardware approach. While a theoretical model could highlight inherent differences in the predicted power consumption of different computer languages for different software projects, this approach would be more time-consuming than the previous two. Due to these restrictions, a software approach was selected, based on the RAPL counters present on most Intel chips.

1.2 Contributions

Through this research, we aim to make the following contributions:

1. A framework for performing energy consumption measurements on code smells (https://github.com/sandervano/GreenCodeSmells).


2. A codebase of functionally and structurally comparable software projects each written in a different language.

3. A methodology with which to compare the impact of code smells on the energy consumption of a software project.

1.3 Ethical Implications

Through this research, we aim to highlight the necessity of evaluating current programming practices with regard to the energy efficiency of software projects. By showcasing that current practices, such as the refactoring of code smells, may negatively affect the energy efficiency of a software project, we may develop a practice of energy-efficient programming.

Through such practices, and through increased awareness of the impact software development has on the energy efficiency of a project, we can contribute to reducing the consumption rate of the ICT sector. In accordance with the EED, we aim to increase the energy efficiency of the ICT sector, with the ultimate goal of minimising the increase in global warming due to the production of energy from fossil fuels.

1.4 Outline

First, we discuss the background information used in this research to support our methodology. After this, we provide a general overview of the framework we designed for conducting our research, highlighting the choices and assumptions that must be made and giving our expectations on the impact of some of those choices. Following this, we explain in the experimental setup which of the previously highlighted choices we made and why. In the subsequent chapter we show the results of our experiment, which we then discuss along with their implications in the discussion. Next, we briefly discuss related research and how it links to our own. Lastly, we conclude our research in the conclusion.


CHAPTER 2

Background

In this chapter, we introduce and highlight several terms used throughout the paper to support our research. Our aim is to give a broad explanation of these concepts and their relevance to our work.

2.1 The Running Average Power Limit Counters

Starting with the Sandy Bridge architecture, Intel chipsets have supported the Running Average Power Limit (RAPL) counters. These counters store estimates of the power and energy consumption of different parts of the system. The estimates are based on readings from a mix of on-chip hardware counters and I/O models [5].

Figure 2.1: Diagram of the different domains of the RAPL counters [16]. The Package contains readings of the energy and power consumption of the entire processor, including powerplane 0 (PP0) and powerplane 1 (PP1). Powerplane 0 includes readings for the cores on the processor, and powerplane 1 for an on-core GPU. DRAM provides readings for the system memory, and is not included in the package.

The RAPL counters are split into several domains to allow for specific readings of the different parts of the processor and memory, as shown in Figure 2.1. The top domain is the package, containing readings of the energy and power consumption of all the underlying powerplane domains and any other uncore devices located on the processor. The packages together form the entire processor. Most client processors only have a single package, Package0, while server processors tend to have two. Inside the package domain is the core powerplane (PP0). This provides collective readings of the energy and power consumption of the processor's multiple cores. Next is the uncore powerplane (PP1). This domain contains readings on the consumption of the integrated graphics (GPU) or other uncore devices, but is not supported by every chipset. The energy and power consumption of the package is always equal to or larger than that of PP0 and PP1 combined. Lastly, the DRAM domain provides readings for the memory, separate from the processor.


2.2 Definition of Code Smells

Bloaters:
  Mysterious Name: code should be straightforward and clear.
  Long Function: long functions should be decomposed.
  Large Class: the class has too many methods, fields or parameters.
  Primitive Obsession: too many variables that belong in a class.
  Long Parameter List: too many parameters in a method.
  Loops: loops are deprecated and should be replaced with first-class functions.
  Data Clumps: a method requires many variables that could be grouped in a class.

Object-Orientation Abusers:
  Repeated Switches: make it more difficult to add features and should be replaced with polymorphism.
  Temporary Field: variables in a class are rarely used.
  Refused Bequest: a wrongly inherited subclass.
  Alternative Classes with Different Interfaces: classes with similar behaviour should have a common superclass.

Change Preventers:
  Divergent Change: a class should have only one reason to change.
  Shotgun Surgery: a single change should not affect multiple classes.
  Global Data: global data can be modified from anywhere without knowing what causes it, making it error-prone and more costly to add features.
  Mutable Data: changes to data can lead to unexpected consequences.

Dispensables:
  Comments: the need for comments hints at other code smells.
  Duplicate Code: the code's semantics already exist elsewhere.
  Lazy Element: a rarely used class.
  Data Class: a class containing only data.
  Speculative Generality: unused methods, fields or parameters in a class.

Couplers:
  Feature Envy: extensive usage of another class.
  Insider Trading: modules should be localised, decreasing the messaging between modules as much as possible.
  Message Chains: classes that request a class that requests a class, etc.
  Middle Man: a class that only delegates to other classes.

Table 2.1: The code smells as identified by Fowler [8].

In his book published in 2018, Fowler identifies 24 code smells [8] as shown in Table 2.1. These code smells indicate patterns in pieces of code that reduce their readability or maintainability. Identifying and refactoring these patterns thereby helps improve the quality of the code in a software project. Such smells can be categorised based on their impact on the software project.

2.3 Statistical Analysis

To verify the data retrieved from our experiments, we must perform a statistical analysis. This analysis consists of anomaly detection followed by a significance analysis.

2.3.1 Anomaly Detection

Anomalies are data points in our results that behave differently from the general data points. The presence of such points can skew further analysis of the results. As such, it is desirable to detect and remove these anomalies. Some clustering algorithms are suitable for detecting


anomalies. By forming clusters of the general data points, any points outside the cluster can be labelled as anomalies.

Figure 2.2: Cluster labelling by DBSCAN [4] with the minimum number of neighbors set at four. Here, the red dots are core points, the yellow dots are border points, and the blue dot is an anomaly. Within the fixed area, core points have at least the minimum number of neighbors, while border points have fewer than the minimum number but lie within the area of a core point. Anomalies do not have the minimum number of neighbors, nor do they border a core point.

One such algorithm is DBSCAN. The algorithm counts the number of points in a predefined area around a given point. If this number is equal to or larger than a predefined minimum, the point is labelled as a core point. Any point within the predefined area of a core point is part of the cluster formed by the connected core points. If a point is connected to a core point but does not have the minimum number of neighbors, it is labelled as a border point. A point not connected to any core points is outside of the cluster, and is considered an anomaly.

Choosing the Parameters for DBSCAN

The two parameters of DBSCAN must be chosen based on the data it is used on. Following Ester et al. [7], we can set the minimum number of points neighboring a core point for two-dimensional data, including the point itself, at four. The radius of the area in which the minimum number of neighbors must lie is called the eps distance. Based on the minimum number of neighbors, we can determine the eps distance by finding the distance to the fourth nearest neighbor for every sample in our data. If we rank these distances from greatest to smallest, our optimal eps distance can be found as the first knee point in the ranking. The knee point (or elbow point) of the distances is the point where the curve formed by the ranked distances visually bends from a steep slope to a nearly flat slope.
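To make this concrete, the ranked fourth-nearest-neighbor distances and their knee point can be computed as in the sketch below. This is a minimal illustration assuming the scikit-learn and kneed packages, not necessarily the implementation used by the framework.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from kneed import KneeLocator  # an implementation of the Kneedle algorithm

def estimate_eps(samples: np.ndarray):
    """Estimate the DBSCAN eps distance as the knee point of the ranked
    distances to the fourth nearest neighbor (the point itself included)."""
    nn = NearestNeighbors(n_neighbors=4).fit(samples)
    distances, _ = nn.kneighbors(samples)      # column 0 is the point itself
    ranked = np.sort(distances[:, -1])[::-1]   # rank from greatest to smallest
    knee = KneeLocator(range(len(ranked)), ranked,
                       curve="convex", direction="decreasing").knee
    # Kneedle may fail on data without a clear bend, in which case a
    # fallback is needed (see Section 4.4.1).
    return ranked[knee] if knee is not None else None
```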

2.3.2 Proving Statistical Significance

Having detected anomalies, we must prove the statistical significance of our results. We may show the significance of the impact on the energy consumption due to the addition of the code smell by proving that the samples from the original and from the smelly measurements are from different distributions for a specific problem written in a specific language. Additionally, we wish to know the direction of the impact.

The aim of a statistical test is to reject a null hypothesis in favor of an alternative hypothesis. By rejecting the null hypothesis we can claim that the alternative hypothesis is significantly true. For this purpose, we derive a statistic from our sample distribution called the p-value. This p-value is the probability of obtaining a result at least as extreme as the observed one, given that the null hypothesis is true. If this p-value is smaller than a predetermined significance level (the α-value), we may reject the null hypothesis. The significance level indicates the probability of rejecting the null hypothesis when it is in fact true. Typically, the significance level is set at 0.05.


The Mann-Whitney U test

In their research, both Koedijk [11] and Kok [12] use the Mann-Whitney U test [13] to prove the significance of their results. This test looks at the probability that a sample from one distribution is greater than a sample from another distribution. If this probability is 50%, the two samples belong to the same distribution. There are two variants of the Mann-Whitney U test: the one-sided and the two-sided test. While the two-sided test has an alternative hypothesis claiming that the two samples are from different distributions, the one-sided test has an alternative hypothesis saying that the first sample is from a greater distribution [14].

To find the direction of the difference between the distributions, we use the one-sided test twice. The greater one-sided Mann-Whitney U test determines the probability that a sample from the first distribution is larger than a sample from the second distribution; the lesser one-sided test determines the probability that it is smaller. If the p-value of the greater one-sided test is smaller than our significance level, we can say that the first distribution is significantly greater than the second distribution, and vice versa for the lesser one-sided test.
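As an illustration, both one-sided tests can be run with SciPy's mannwhitneyu. This is a sketch of the procedure described above, not necessarily the exact analysis code of the framework.

```python
from scipy.stats import mannwhitneyu

ALPHA = 0.05  # significance level

def impact_direction(original, smelly):
    """Classify the impact of the code smell using two one-sided
    Mann-Whitney U tests, as described above."""
    # H1: samples from `smelly` tend to be greater than those from `original`.
    p_greater = mannwhitneyu(smelly, original, alternative="greater").pvalue
    # H1: samples from `smelly` tend to be smaller than those from `original`.
    p_less = mannwhitneyu(smelly, original, alternative="less").pvalue
    if p_greater < ALPHA:
        return "significant increase in energy consumption"
    if p_less < ALPHA:
        return "significant decrease in energy consumption"
    return "no significant impact"
```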


CHAPTER 3

Design of the Framework

In this chapter we discuss the design of the framework used to measure and analyse the impact of code smells on the energy consumption of different computer languages. The framework measures the total energy consumption of the system during the execution of software projects solving the same problem using the same implementation, but written in different languages. Within a specific language and for a specific problem, the energy consumption of a project with and without a specific code smell may be compared to determine the impact the code smell has on the energy consumption for that language. This impact may then be compared between languages to determine if certain computer languages are more or less energy-efficient with regard to certain code smells.

The framework is split into three parts, as shown in Figure 3.1: a part specific to the featured languages, a part specific to the execution environment, and a part specific to the system hardware.

Figure 3.1: Diagram of the measurement framework. The framework measures the impact of code smells on the energy consumption for different computer languages. For the specific languages, the user must supply their own codebase containing software projects both with and without a specific code smell. The framework executes these projects and measures the energy consumption of the system for the duration of the execution. Each measurement is repeated n times. The measurement setup is specific to the hardware used.


3.1 Creating the Codebase

To support measurements of the impact of code smells on any computer language, the framework is supplied with a codebase featuring multiple problems. For each problem, the codebase features a software project solving that problem in each of the selected languages. The choices of which languages, which problems and which code smell to feature are not independent of each other, and each of these choices has several criteria to satisfy.

3.1.1 Selection of the Computer Languages

The selection of computer languages to be featured in the framework is based on several criteria. Each language provides its own implementation of all the problems making up the codebase. As such, the selection of featured languages is not separate from the selection of the codebase, nor is it entirely separate from the selection of the specific code smells featured in the research. However, the choice of featured languages is the most practical to start with. The most important criterion for the selected languages is their structural similarity.

When selecting the computer languages, it is important that the selected languages are structurally similar to each other. In order to compare the differences between languages, the changes added to the software projects in the different languages must be similar. As the added code smell changes the overall structure of the project, the structure of the projects must be similar across the languages if the change is to be similar too. Languages that are closely related and use the same paradigms are therefore more suited to be selected together for the codebase.

3.1.2 Requirements for the Codebase

The codebase consists of a series of problems solved by a specific algorithm. For each selected language and for each problem, there exists a software project in the codebase solving this problem which is also written in the specific language. For a given problem, the software projects solving it must meet the following criteria:

• For each selected language and for each problem, at least one software project implementing the problem in the language must be present.

• The software projects must solve the same problem with the same solution.

• The software projects must use the same algorithms to solve the same problem.

• The software projects solving the same problem must have the same structure at code level.

• The structure of the software projects must allow for the addition of the selected code smell(s).

Additionally, the choice of problems used in the codebase must take into account the parts of the system used most intensively to solve the problem. The energy consumption of a software project is the sum of the consumption of the CPU, memory, and disk [1]. Parts of the system that are used more intensively consume more energy. Choosing problems known to rely heavily on a specific part of the system therefore allows for more predictable readings of the energy consumption behavior of different parts of the system during the execution of the problem. Additionally, due to hardware limitations, the parts of the system available for measurements of the energy consumption may be limited, and this can impact the accuracy of the readings. Problems are therefore chosen with their system usage in mind.

Given these criteria, the following assumption can be made of two software projects, each written in a different computer language, solving the same problem:

Assumption 1 As the performance of these software projects can be compared between the used languages, the impact of the same code smell on the energy consumption for the used languages may also be compared.


As of the time of writing, we have found no research supporting or opposing this assumption. To prove or disprove this assumption would require further research on the comparability of the impact of code-level changes between different computer languages. This however would be outside the scope of this research.

3.1.3 Expected Impact of the Code Smell

To determine this impact, a code smell must be selected to be featured in the research. Although the framework and codebase can potentially be reused for multiple code smells, each experiment measures only the impact of a single code smell. The selection of this code smell is based on its expected impact, and the availability of tools used for detection and refactoring.

In his thesis, Kok [12] provides his expectations on the impact refactoring has on the energy consumption in Java for 22 of the 24 code smells identified by Fowler [8]. Table 3.1 shows our expectations for some of the code smells, generalised for multiple languages. Additionally, expectations for the 'Mysterious Name' and 'Comments' code smells are added, as these were not broadly discussed by Kok.

'Long Function' (refactoring: Extract Function): Extracting a new function from a 'Long Function' adds an extra function call on the stack. This is expected to increase the energy consumption. However, certain compiler optimisations can remove this overhead by inlining the code. This suggests that refactoring 'Long Function' will either increase the energy consumption, or will have no impact on compiled languages. In line with these expectations, Dhaka et al. [6] and Kok [12] for Java, and Park et al. [17] for C++, found that 'Extract Function' refactoring increased energy consumption. Inversely, adding the 'Long Function' code smell to a software project is expected to decrease the energy consumption.

'Primitive Obsession' (refactoring: Replace Data Value with Object; Replace Type Code with Class; Replace Type Code with Subclass): When a primitive type is wrapped in an object, this is called boxing. As an object, the data takes up more memory than as a primitive datatype. This increases the memory required by the project and is thus expected to increase the energy consumption. Additionally, certain languages such as Java and C# support features such as autoboxing. This can lead to continued boxing and unboxing, which has been shown to significantly increase execution time for Java [2]. For these languages it is expected that this will further increase the energy consumption.

'Loops' (refactoring: Replace Loops with Pipeline): As the logic of loops and pipelines differs, the impact on performance will depend on the usage. As such, the impact this will have on the energy consumption is unclear.

'Mysterious Name' (refactoring: Rename Function; Rename Class; Rename Parameter): As renaming does not necessarily alter the length of the name, this is not expected to have an impact on the energy consumption.

'Refused Bequest' (refactoring: Push Down Field; Push Down Method; Replace with Delegate): To refactor 'Refused Bequest', unused class functionality is removed. As these parts of the class are removed, the size of the class decreases. This reduces the amount of memory needed when loading the object, which could reduce the energy consumption. However, some compilers might support dead-code elimination optimisations. Therefore it is expected that refactoring 'Refused Bequest' will either decrease the energy consumption or have no impact.

'Global Data' (refactoring: Encapsulate Field): Encapsulating global data is likely to increase overhead, although this might be optimised by the compiler. As such, it is expected that this will either increase the energy consumption or have no impact.

'Mutable Data' (refactoring: Encapsulate Field; Replace Derived Variable with Query; Combine Functions into Class; Combine Functions into Transform; Change Reference to Value): Similarly to 'Global Data', we expect that refactoring with 'Encapsulate Field' will either increase the energy consumption or have no impact. Replacing a derived variable with a query only moves logic to the right method; as the necessary calculations remain the same, this is not expected to impact the energy consumption. Combining functions into a class will increase the overhead and memory usage, similar to the 'Large Class' code smell, and is therefore expected to increase the energy consumption. Combining functions into a transform behaves similarly to wrapping functions into another function; as this resembles 'Long Function', it is expected to increase the energy consumption. By changing a reference to a value, a primitive is swapped for an object; similar to 'Primitive Obsession', this refactoring is expected to increase the energy consumption or have no impact.

'Duplicate Code' (refactoring: Extract Function; Pull Up Method): Extracting the function is expected to have a similar effect to refactoring 'Long Function', increasing the energy consumption or having no impact. However, as 'Pull Up Method' decreases the amount of code, it also decreases the amount of code that needs to be compiled or interpreted. Further compiler optimisations could also lead to the more heavily used function being cached, which could further improve performance. This is expected to decrease the energy consumption.

'Comments' (refactoring: Extract Function; Rename Function; Introduce Assertion): Extracting a function or introducing an assertion to improve code readability increases the overhead. Therefore this is expected to increase the energy consumption.

Table 3.1: The expected impact of refactoring methods on the energy consumption for several code smells, adding to the expectations provided by Kok [12].

We expect that adding a specific code smell to a software project has the opposite impact on the energy consumption of the project compared to refactoring it away (see Table 3.1).

Additionally, Koedijk [11] showed that the energy consumption of a software project scales with the total execution time. When the execution time increases or decreases due to the addition of a code smell, the energy consumption is expected to increase or decrease in a similar manner. However, the way specific code smells impact the execution time is not clear, and may be specific to the software project they are added to.


3.2 The Execution Environment

Given a codebase that meets the discussed criteria, the framework can begin executing each software project in the codebase and measuring the energy consumption of the system during program execution. While the execution of the experiment is relatively straightforward, there are several kinds of measurements that can be taken during execution.

3.2.1 Execution of Experiment

The experiment consists of executing each software project provided in the codebase. By repeating the execution and measurement of a single software project a set number of times, the distribution of measurements for the software project may be determined. For a normally distributed set of samples, it is typical to repeat the measurements at least 30 times to prove statistical significance. However, Koedijk [11] found that in several cases the energy consumption of software projects did not follow a normal distribution. As such, it is recommended to repeat the measurements at least 50 times.

During each of the repeated measurements, the software project is executed using the same parameters. For the duration of the execution, the framework measures the energy consumption of the system. Before and after the repeated executions of each software project, measurements are taken of the system while idle. The average of these 'idle measurements' may be used to determine the idle energy consumption of the system and framework during the repeated execution of the project. This consumption is subtracted from the measurements during the analysis of the results, isolating the energy consumption of the software project. From these results, the total energy consumption of a software project during its execution time can be determined.
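A minimal sketch of such a measurement loop is given below. The read_energy_joules callback stands in for whatever energy source the hardware-specific component provides (see Section 3.3); the repeat count and idle window are illustrative, not the framework's exact values.

```python
import subprocess
import time

N_REPEATS = 50  # at least 50 runs, as recommended above

def measure_project(command, read_energy_joules, idle_seconds=30):
    """Repeatedly run one software project, recording the energy and time
    of each run plus idle readings before and after the repetitions."""
    def idle_power():
        start = read_energy_joules()
        time.sleep(idle_seconds)
        return (read_energy_joules() - start) / idle_seconds  # Watts

    idle_before = idle_power()
    samples = []
    for _ in range(N_REPEATS):
        e0, t0 = read_energy_joules(), time.time()
        subprocess.run(command, check=True)  # same parameters every run
        e1, t1 = read_energy_joules(), time.time()
        samples.append((e1 - e0, t1 - t0))
    idle_after = idle_power()
    return samples, (idle_before + idle_after) / 2  # average idle power
```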

3.2.2 Performing the Measurements

During the execution of a software project, there are two kinds of measurements the framework can take to determine the total energy consumption of the project: either the current system power, or the cumulative energy consumed by the system.

The Current System Power

The current system power measures the power used by the system at the moment of measurement. The energy consumption of a software project can be determined as the sum of the averages of every two consecutive power measurements, multiplied by the time between these two measurements, as shown in Equation 3.1.

E = \sum_{n} (t_{n+1} - t_n) \cdot \frac{P_n + P_{n+1}}{2}    (3.1)

However, the power readings are in general less accurate than the cumulative energy consumption. Because the power is only sampled at discrete points in time, the energy consumed between two measurements cannot be fully taken into account. As the time period between measurements increases, the accuracy of the resulting energy consumption decreases.
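For illustration, Equation 3.1 is simply the trapezoidal rule applied to the sampled power readings; a minimal sketch:

```python
import numpy as np

def energy_from_power(timestamps, power_watts):
    """Approximate the total energy (Joules) from sampled power readings
    using the trapezoidal rule of Equation 3.1."""
    t = np.asarray(timestamps)
    p = np.asarray(power_watts)
    # Equivalent to numpy's built-in np.trapz(power_watts, x=timestamps).
    return float(np.sum((t[1:] - t[:-1]) * (p[:-1] + p[1:]) / 2.0))
```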

The Cumulative Energy

The cumulative energy consumed measures the total amount of energy consumed by the system since a specific starting point, depending on the method of measurement and hardware. This allows for a straightforward measurement, as the energy consumption of a software project can be determined by subtracting the cumulative energy consumed by the system before the execution of the project from the value after it, as seen in Equation 3.2.

E = E_{\text{after}} - E_{\text{before}}    (3.2)


A disadvantage of this approach compared to measuring the current system power is its sensitivity to disturbances. If during the measurements a change occurs to the system outside of the experiment, this can temporarily increase the energy consumption. Even if this change is quickly reverted, the total cumulative energy measured remains skewed.

3.3 Hardware Specific Measuring

To measure the energy consumption of a system, the framework requires a hardware-specific implementation. The method used to perform these measurements may differ given the available resources and the system used for the experiment. Certain methods may provide more accuracy or other benefits, given the specifics of the research. As such, this component should be selected based on the circumstances of the research itself, as no single method is universally applicable.

3.3.1 Method of Measuring

There are two primary methods to measure the cumulative energy consumption of a system: using an external measuring device, or using an internal software model specific to the system architecture.

External Hardware

An external measuring device allows for accurate measurement of the power and cumulative energy consumption of a given system. As these devices are connected through the power plug, they can directly measure the energy consumed by the system without impacting the system itself. The downside of such a method is the limited availability of such devices within the constraints of a research project. Furthermore, using such a device requires an additional setup to integrate its measurements with the framework. Most available external devices allow for measuring both the cumulative energy consumed and the current system power, although with varying degrees of accuracy. The research conducted by Koedijk [11] and by Kok [12] both used a Racktivity PDU to measure the current system power.

Internal Software

An internal software model is often more easily available, but at the cost of accuracy. Such models make estimations of the energy consumption of the system over a small period of time. These estimations are based on on-chip measuring devices. While they allow estimates of the consumption of different parts of the system, they may not cover all parts of the system. As such, the available models and the system parts they can measure must be taken into account when using this method of measurement. Extra care can be taken in selecting the codebase to only include software projects that make intensive use of the parts of the system covered by the model. These parts tend to be split between the CPU, memory, and disk.

A common example of a software model is the RAPL counters. These counters are available on most modern Intel chipsets, and on some modern AMD chipsets. They allow for measuring both the cumulative energy consumed and the current system power. On Linux, there are a limited number of methods for reading these counters for both Intel and AMD chipsets.

The perf_event interface allows for reading these counters on both Intel and AMD. This method requires more setup and is not straightforward to implement. Values read from the counters using this method must be converted to Joules or Watts. The constant used for this conversion is, however, dependent on the architecture, and must therefore be determined separately for each new system.

Instead, the powercap interface provided by sysfs allows for easy and straightforward readings of these values. It automatically detects the architecture used and determines the correct rate of conversion. As such, it provides readings directly in Joules or Watts. This method, however, does not support AMD chipsets, even if they have RAPL counters.
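As an illustration, the powercap counters can be read directly from sysfs, as sketched below. Domain names and numbering vary per system, and reading the counters may require elevated privileges; this is a minimal sketch, not the framework's exact reader.

```python
from pathlib import Path

RAPL_ROOT = Path("/sys/class/powercap")

def read_rapl_domains():
    """Read the cumulative energy counters of all RAPL domains exposed
    by the powercap interface, converted from microjoules to Joules."""
    readings = {}
    for domain in RAPL_ROOT.glob("intel-rapl:*"):
        name = (domain / "name").read_text().strip()        # e.g. "package-0", "dram"
        energy_uj = int((domain / "energy_uj").read_text())
        readings[name] = energy_uj / 1e6
    return readings
```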


Lastly, with suitable access privileges it is possible to directly read the Model Specific Registers (MSR) containing the RAPL values. This method again requires the conversions specific to the architecture of the chipset.


CHAPTER 4

Experimental Setup

In this chapter we discuss how the experiments were conducted, and how we applied the framework described above to conduct them. First we explain the process of preparing our codebase, and the steps this entailed. Following this, we detail our methods of measurement.

4.1 Data Preparation

Based on what we discussed in the previous chapter, we have built a codebase consisting of software projects. Each project implements a specific programming problem. For each problem, a software project exists for each of the specific languages we selected. These projects were then structurally changed to add a specific code smell.

4.1.1 Selection of Relevant Computer Languages

We based the selection of the featured languages on the popularity of the language, and the availability of code smell detection tools supporting the languages. To determine the most popular computer languages, we relied on rankings provided by several external sources.

Github (https://github.com/) is a version-control platform allowing many software developers to share and collaborate on software projects. Each year they publish a number of statistics on the usage of their platform. As part of this they release a ranking of the most popular computer languages, based on the number of contributors to git repositories tagged with the specific language.

Another ranking of computer languages is provided by TIOBE (https://www.tiobe.com/). Their company reviews and grades the quality of software projects for software developers. Using the number of hits when querying a specific language on several popular search engines, they provide an index for the popularity of computer languages. This method, however, has a bias towards computer languages that are more difficult to learn: while these languages might not be more popular than others, the number of searches for them might be higher due to their difficulty. Using a similar method, the PYPL PopularitY of Programming Languages Index counts the number of tutorials found for a specific computer language on search engines to establish a ranking of the most popular languages. Similar to the TIOBE index, this method suffers from the difficulty bias.

In Table 4.1 we show the top 10 of all three discussed indexes. We find that in all three rankings, both Java and Python consistently rank among the top 3 most popular languages. While JavaScript scores high in two of the three indexes, it ranks considerably lower in the other, and is therefore not chosen to be featured in this research.

Amongst the languages researched by Koedijk [11], C and C++ were found to consume the least amount of energy during program execution. Similarly, Pereira et al. [18] found that both C and C++ ranked highly among all languages in terms of energy efficiency. As such, we wanted to include one of these two languages, in order to feature several languages with different known rates of energy consumption.


Rank  Github      TIOBE         PYPL
1     JavaScript  C             Python
2     Python      Java          Java
3     Java        Python        JavaScript
4     PHP         C++           C#
5     C#          C#            PHP
6     C++         Visual Basic  C/C++
7     TypeScript  JavaScript    R
8     Shell       PHP           Objective-C
9     C           R             Swift
10    Ruby        SQL           TypeScript

Table 4.1: The top 10 ranking of computer languages according to Github, TIOBE and PYPL, based on the most recent rankings at the time of writing.


To detect the presence of code smells, and to show that our manual addition increased the presence of the specific code smell, we use BetterCodeHub (https://bettercodehub.com/). While BetterCodeHub supports the grading of code quality for Python, Java and C++, among others, it does not support grading for the C language. As such, the languages C++, Java and Python were chosen to be featured. All three selected languages are object-oriented, imperative and C-like, and therefore have a similar structure. This makes them suitable for our codebase.

4.1.2 The Computer Language Benchmark Game

As previously discussed, we require a selection of software projects. The Computer Language Benchmarks Game (https://benchmarksgame-team.pages.debian.net/benchmarksgame/) is an open-source benchmark for different computer languages. Users may submit their implementation of specific problems written in different languages. Each submission must meet strict requirements for how it implements specific algorithms to solve its specific problem. All submissions are then executed and ranked based on a number of benchmarks. As such, the Benchmarks Game provides a large source of software projects which are comparable between different languages. Both Koedijk [11] and Pereira et al. [18] used the Benchmarks Game as the source of the codebase for their measurements on the energy consumption of different languages.

However, as our criteria for the codebase are stricter, we had to manually go through the software projects written in C++, Java, and Python for each problem to find suitable sources. For each problem, we needed a software project with a similar structure for each language. Of the 10 problems used by the benchmark, we found suitable implementations for only two: Fasta and Nbody.

The Fasta problem creates large DNA sequences and saves them. As such, this problem is considered memory intensive. Nbody simulates a model of the orbit of Jovian planets. This problem is therefore considered CPU intensive.

For each of these two problems, we found one software project solving this problem with a similar structure for each of the languages C++, Java, and Python.

4.1.3 Project Euler

In order to further expand our codebase, we use Project Euler (https://projecteuler.net/). This website hosts a large selection of problems intended to be solved using computer programming. Although it does not host implementations or solutions of these problems, users are free to implement them in their own way using whichever computer language they desire. Many of these implementations are made freely available online to serve as reference material for others trying to challenge Project Euler. This means that for these problems, there is a large source of implementations in several different computer languages from which we can select software projects fitting our criteria.



- Write Short Units of Code (Long Function): lines of code
- Write Simple Units of Code (Large Class): McCabe index
- Write Code Once (Duplicate Code): lines of duplicate code
- Keep Interfaces Small (Long Parameter List): number of parameters
- Separate Concerns in Modules (Feature Envy, Insider Trading, Middle Man): number of incoming calls
- Couple Architecture Components Loosely (Insider Trading, Message Chains): number of incoming and outgoing calls
- Keep Architecture Components Balanced: lines of code per component
- Keep your Codebase Small: person-years
- Automate Tests: lines of test code
- Write Clean Code (Comments, Mysterious Name, Speculative Generality): number of instances

Table 4.2: The categories used by BetterCodeHub to grade software projects. Several categories correspond with one or more code smells present in the project (shown in parentheses). For each category we note the metric used to judge the quality of the code in relation to the category. The categories and metrics are based on the book by Visser et al. [21] on writing maintainable Java software.


From the problems featured by Project Euler, we selected problem 96, Su Doku. The solving of a sudoku is a well-known programming problem, assuring an expansive supply of software projects solving it. A common algorithm used to solve sudokus is depth-first backtracking. This algorithm visits the empty positions in a given sudoku in some order and fills in a possible value at each position. If no values are possible at a position, the algorithm moves back to a previous position to try the next possible value there. This is repeated until the algorithm either fills in all empty positions and solves the sudoku, or has tried all possible values at the initial position without finding a solution. As the algorithm must go through different values at different positions, it is CPU intensive.
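For illustration, a minimal Python sketch of the depth-first backtracking algorithm described above; the projects in our codebase implement the same idea, in each of the three languages.

```python
def solve(grid):
    """Depth-first backtracking sudoku solver. `grid` is a 9x9 list of
    lists, with 0 marking an empty position. Fills the grid in place
    and returns True once solved."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:              # next empty position
                for value in range(1, 10):       # try each candidate value
                    if is_valid(grid, row, col, value):
                        grid[row][col] = value
                        if solve(grid):          # recurse on remaining positions
                            return True
                        grid[row][col] = 0       # backtrack
                return False                     # no candidate fits here
    return True                                  # no empty positions left

def is_valid(grid, row, col, value):
    """Check that `value` does not already occur in the row, column,
    or 3x3 box of position (row, col)."""
    if value in grid[row]:
        return False
    if any(grid[r][col] == value for r in range(9)):
        return False
    r0, c0 = 3 * (row // 3), 3 * (col // 3)
    return all(grid[r][c] != value
               for r in range(r0, r0 + 3)
               for c in range(c0, c0 + 3))
```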

4.1.4 Selecting the Code Smell

Now that we have the software projects, we select a code smell to add to them. As previously mentioned, our choice is limited to structural code smells which can be detected in our selected languages using the available detection tools. In this research, we use BetterCodeHub as our detection tool. BetterCodeHub uses several metrics to grade software projects on 10 different criteria that promote better code quality, based on the work of Visser et al. [21]. A score from 0 to 10 is given based on how many of the 10 criteria are met within certain bounds. Several of these criteria are directly or indirectly based on the detection of code smells, as can be seen in Table 4.2.

Based on the BetterCodeHub categories which detect only a single code smell, we selected the 'Long Function' code smell, as it has been featured in several previous studies on the effect of refactoring (see Table 3.1). We therefore have a clear grasp of the expected impact of adding 'Long Function' to a software project.


4.1.5 Applying the Code Smell to the Codebase

Next, we add the ’Long Function’ code smell to our codebase. To satisfy Assumption 1, we must add the code smell in a structurally similar manner for each problem.

(a) The Make Repeat Fasta function body is added to the Main function.

(b) The Main function body and the Nbody Advance function body are added to the Make Nbody function.

Figure 4.1: Diagrams of the general structure of the Fasta and Nbody problems before and after adding the 'Long Function' code smell. The dark gray blocks indicate a function in the software project, and the lighter gray dashed box indicates a function body added to another function.

Adding ’Long Function’ to Fasta

Figure 4.1a shows the structure of the software projects implementing the Fasta problem before and after adding the code smell. We remove the 'Make Repeat Fasta' function and paste its function body inside the 'Main' function. This increases the length of the 'Main' function, and removes one function call from the call stack. Any arguments given to the 'Make Repeat Fasta' function are now initialised in the 'Main' function to prevent issues with the new scope of the function body. We do not alter any functions that are used multiple times, as moving these function bodies would result in the 'Duplicate Code' code smell.
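To make the transformation concrete, the schematic Python sketch below shows a helper being inlined into 'Main'. The function names follow the diagrams, but the bodies are simplified stand-ins rather than the actual Fasta code.

```python
# Before: 'Make Repeat Fasta' is a separate function called from Main.
def make_repeat_fasta(sequence, n):
    return "".join(sequence[i % len(sequence)] for i in range(n))

def main_original():
    print(make_repeat_fasta("ACGT", 12))

# After: the function body is pasted into Main, adding 'Long Function'.
def main_smelly():
    # Former arguments are initialised locally to preserve the old scope.
    sequence, n = "ACGT", 12
    print("".join(sequence[i % len(sequence)] for i in range(n)))
```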

Adding ’Long Function’ to Nbody

We use a similar method to add the code smell to the software projects solving Nbody. As seen in Figure 4.1b, we remove the 'Main' function's body and add it to the 'Make Nbody' function. Additionally, we remove the 'Nbody Advance' function and add its function body to the 'Make Nbody' function. This increases the length of the 'Make Nbody' function, and removes one function call from the call stack. Any arguments given to the 'Nbody Advance' function are now initialised in the 'Make Nbody' function to prevent issues with the new scope of the function body. We do not alter the 'Nbody Energy' function, as it is used twice, to prevent adding the 'Duplicate Code' code smell.

Adding ’Long Function’ to Su Doku

As in the previous cases, we add 'Long Function' to the Su Doku problem by combining several functions into a single, long function. As shown in Figure 4.2, we move the function body of the 'Is Sudoku Valid' function to the body of the 'Is Sudoku Solved' function. To the 'Main' function, we add three hardcoded sudokus, to be solved sequentially by the algorithm. This is done because not all found implementations contained methods to read sudokus from user input; this addition is identical in the original and smelly counterparts.


Figure 4.2: Diagram of the general structure of the Su Doku problem before and after adding the 'Long Function' code smell. The dark gray blocks indicate a function in the software project, and the lighter gray dashed box indicates a function body added to another function.

4.2 System Specifications

Table 4.3 shows the specifications of the system used to run our framework.

Operating System   Ubuntu 18.04.3
CPU                Intel i7-6700K
DRAM               8 GB DDR3

Table 4.3: System specifications of the machine used to run the framework.

Each of the selected languages used the compiler or interpreter version shown in Table 4.4.

Language   Compiler or Interpreter   Version
C++        g++                       4:7.2
Java       javac                     11.0.7
Python     python                    3.6.9

Table 4.4: The compilers or interpreters, and their versions, used to compile or interpret each computer language.


4.3 Energy Consumption Measurements

To measure the energy consumption, we decided which specific metric to measure, and how to measure this metric.

4.3.1 Measuring the Cumulative Energy Consumption

Measuring the cumulative energy consumption of the system is more prone to unintended interference from the system, but this can be mitigated by repeating the experiment. Furthermore, the cumulative energy consumption gives a clear and accurate measure of the total energy consumption, as we do not need to average over a period of time as with current system power measurements. As such, we have chosen to directly measure the cumulative energy consumption.

4.3.2 Reading the RAPL Counters

We use the RAPL counters to measure the cumulative energy consumption. These counters are accessible on most modern Intel chipsets, with counterparts available on some modern AMD chipsets. Desrochers et al. [5] show that, although RAPL readings have a slight but constant offset against actual values, they do follow the general trend of the actual power consumption. The estimates are found to be most accurate when the system parts are used intensively, which fits our codebase of CPU- and memory-intensive algorithms. The RAPL counters on our system return three sets of readings: the energy consumption of the Package0 domain, the Powerplane0 domain, and DRAM. While Package0 encompasses the energy consumption of Powerplane0 along with the other parts of the CPU, DRAM measures the energy consumption of the memory outside of the CPU. As such, to measure the total system energy consumption, the measurements for Package0 and DRAM must be added together. While this does not take the energy consumption of the disk into account, it is the largest system-wide energy measurement that can be taken using the RAPL counters. In practice, we find that most of the energy is consumed by the CPU.

4.4 Dealing with Anomalies

Once measurements have been taken, they must be converted from the cumulative energy consumption to the total energy consumption. For this, we use Equation 3.2 as described in Chapter 3.2.2. To minimise the impact of the idle energy consumption of the system and framework, we determine the average idle power consumption before and after the experiments for a single software project. We multiply this average power consumption by the execution time of the specific measurement to estimate the idle energy consumption during the measurement, and subtract this from the measurement to derive the total energy consumption of a single measurement. However, these samples of the total energy consumption are prone to error, and as such, anomaly detection is needed to remove samples clearly outside of the norm. For this, we use DBSCAN to find clusters in our samples.
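A minimal sketch of this correction, combining Equation 3.2 with the idle-power estimate (all names are illustrative):

```python
def project_energy(e_before, e_after, idle_power, exec_time):
    """Estimate the energy consumed by the project itself: the cumulative
    consumption during the run (Equation 3.2) minus the estimated idle
    consumption. Energies in Joules, idle_power in Watts, exec_time in
    seconds."""
    return (e_after - e_before) - idle_power * exec_time
```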

4.4.1 Determining the Eps Distance

To use DBSCAN, we must determine the optimal eps distance. This distance depends on the distances between the samples of a single experiment, and must therefore be recalculated for each new set of samples. There are two methods for determining the optimal eps distance. Upon ordering the distances of each sample to its fourth nearest neighbor, the optimal eps value is either the first knee point of the graph, or the first valley. Figure 4.3 shows the ordered distances to the fourth nearest neighbor for the samples of the original Java Nbody software project. In it, we determined the first knee point using the Kneedle algorithm, and the first valley by searching for the first local minimum in the second derivative of the curve.

We find that in most cases, using the knee point as the eps distance gives more accurate results. However, for some sets of samples, the Kneedle algorithm may fail to find a knee point if


Figure 4.3: The ordered distances to the fourth nearest neighbor of a cluster, plotted over their rank in the order. To determine the optimal eps distance, both the first local valley and the first knee point are determined. While the first knee point gives better accuracy, it cannot always be determined from the data, in which case the first local valley is used.

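A sketch of this procedure is given below, assuming scikit-learn for the nearest-neighbor distances and the kneed package as an implementation of the Kneedle algorithm; the parameter choices are illustrative.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from kneed import KneeLocator  # Kneedle algorithm implementation

    def optimal_eps(samples):
        points = np.asarray(samples, dtype=float).reshape(-1, 1)
        # n_neighbors=5: the nearest neighbor of each point is itself,
        # so column 4 holds the distance to the fourth nearest neighbor.
        distances, _ = NearestNeighbors(n_neighbors=5).fit(points).kneighbors(points)
        ranked = np.sort(distances[:, 4])
        # Preferred method: the first knee point of the ranked curve.
        knee = KneeLocator(range(len(ranked)), ranked,
                           curve="convex", direction="increasing").knee
        if knee is not None:
            return ranked[knee]
        # Fallback: the first local minimum of the second derivative.
        second = np.gradient(np.gradient(ranked))
        for i in range(1, len(second) - 1):
            if second[i - 1] > second[i] <= second[i + 1]:
                return ranked[i]
        raise ValueError("no knee point or valley found")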

4.4.2 Detecting Clusters Using DBSCAN

Figure 4.4: The cluster found by DBSCAN for a given set of measurements. The blue samples are labeled as part of the cluster, while the red crosses are detected anomalies.

Using the optimal eps distance found with the above method, we can now use the DBSCAN algorithm to find clusters in our set of samples. DBSCAN labels points as either part of a cluster or as anomalies; however, depending on the eps used and the sample distribution, DBSCAN might find more clusters than intended. As we expect only a single distribution in our findings, we regard all points labelled as part of any cluster to be part of the same cluster. All other points we regard as anomalies and discard from further analysis and results. Figure 4.4 shows an example of how DBSCAN labels a set of samples. All blue samples are labeled as part of a single cluster, while the red crossed samples are found to be anomalies.
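A minimal sketch of this filtering step, assuming scikit-learn's DBSCAN implementation, is shown below; the min_samples value of 4 mirrors the fourth-nearest-neighbor distance used to choose eps and is our assumption rather than a prescribed constant.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def remove_anomalies(samples, eps):
        points = np.asarray(samples, dtype=float).reshape(-1, 1)
        labels = DBSCAN(eps=eps, min_samples=4).fit_predict(points)
        # DBSCAN labels anomalies with -1; every clustered point is kept
        # and treated as part of one and the same distribution.
        return points[labels != -1].ravel()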


CHAPTER 5

Results

In this chapter, we present the results of our experiment. First, we show the grades given by BetterCodeHub for the original and smelly versions of all software projects. We then show the measured energy consumption for the original and smelly counterparts of the different problems for each of the languages C++, Java, and Python. We use boxplots to show the mean values and sample distributions of our measurements.

5.1 The BetterCodeHub Scores

Given the original and smelly versions of the three problems in the three languages, we test the score of each version of each software project on BetterCodeHub. Table 5.1 shows the scores given for both versions, and notes the category whose change after adding the code smell caused the difference in score.

Software Project    Original Score    Smelly Score    Difference
C++ Fasta           8                 8               No difference detected
C++ Nbody           8                 8               No difference detected
C++ Su Doku         8                 8               No difference detected
Java Fasta          4                 5               ’Write Code Once’ improved
Java Nbody          7                 6               ’Write Simple Units of Code’ worsened
Java Su Doku        5                 6               ’Keep Interfaces Small’ improved
Python Fasta        5                 6               ’Write Simple Units of Code’ improved
Python Nbody        6                 6               No difference found
Python Su Doku      4                 5               ’Keep Interfaces Small’ improved

Table 5.1: The scores given by BetterCodeHub for the original and smelly version of each software project. The last column shows which category changed the score. We note that in most cases, the score was not impacted, or improved after adding the code smell.

5.2 Impact of the Code Smell on the Energy Consumption of C++

In Figure 5.1a we show the distribution of total energy consumption of the normal and smelly counterparts of the software projects solving the Fasta problem in C++. Of the 50 samples taken, 1 was discarded as an anomaly for the normal version, and 2 were discarded as anomalies for the smelly version. The mean energy consumption was found to be 0.1295 kilojoule for the normal version and 0.1293 kilojoule for the smelly version.


(a) Energy consumption of Fasta in C++. (b) Energy consumption of Nbody in C++. (c) Energy consumption of Su Doku in C++.

Figure 5.1: Total energy consumption of normal and smelly counterparts implementing a specific problem in C++. The total energy consumption is the sum of the energy consumption of Package0 and DRAM.


Figure 5.1b shows the distribution of total energy consumption of the normal and smelly versions of the software projects solving the Nbody problem in C++. Of the 50 samples taken, 4 were discarded as anomalies for the normal version, and 3 were discarded as anomalies for the smelly version. The mean energy consumption was found to be 0.1295 and 0.1338 kilojoule for the normal and smelly versions respectively.

Additionally, the distribution of total energy consumption of the normal and smelly versions of the software projects solving the Su Doku problem in C++ is shown in Figure 5.1c. Of the 50 samples taken, 3 were discarded as anomalies for the normal version, and 3 were discarded as anomalies for the smelly version. The mean energy consumption was found to be 6.495e-05 and 6.712e-05 kilojoule for the normal and smelly versions respectively.

(a) Execution time of Fasta in C++. (b) Execution time of Nbody in C++. (c) Execution time of Su Doku in C++.

Figure 5.2: Execution time of normal and smelly counterparts implementing a specific problem in C++.

As shown in Figure 5.2, the execution time of the different software projects closely mirrors the total energy consumption shown in Figure 5.1.

5.3 Impact on the Energy Consumption of Java

In Figure 5.3a we show the distribution of total energy consumption of the original and smelly counterparts of the software projects solving the Fasta problem in Java. Of the 50 samples taken, 2 were discarded as anomalies for the original version, and 6 were discarded as anomalies for the smelly version. The mean energy consumption was found to be 0.1657 kilojoule for the original version and 0.1677 kilojoule for the smelly version.

Figure 5.3b shows the distribution of total energy consumption of the original and smelly versions of the software projects solving the Nbody problem in Java. Of the 50 samples taken, 2 were discarded as anomalies for the original version, and 4 were discarded as anomalies for the smelly version. The mean energy consumption was found to be 0.1875 and 0.1990 kilojoule for the original and smelly versions respectively.


(a) Energy consumption of Fasta in Java. (b) Energy consumption of Nbody in Java. (c) Energy consumption of Su Doku in Java.

Figure 5.3: Total energy consumption of the original and smelly counterparts implementing a specific problem in Java. The total energy consumption is the sum of the energy consumption of Package0 and DRAM.


Additionally, the distribution of total energy consumption of the normal and smelly versions of the software projects solving the Su Doku problem in Java is shown in Figure 5.3c. Of the 50 samples taken, 5 were discarded as anomalies for the normal version, and 8 were discarded as anomalies for the smelly version. The mean energy consumption was found to be 2.130e-03 and 4.552e-03 kilojoule for the normal and smelly versions respectively.

(a) Execution time of Fasta in Java. (b) Execution time of Nbody in Java. (c) Execution time of Su Doku in Java.

Figure 5.4: Execution time of normal and smelly counterparts implementing a specific problem in Java.

As shown in Figure 5.4, the execution time of the different software projects closely mirrors the total energy consumption shown in Figure 5.3.

5.4 Impact on the Energy Consumption of Python

In Figure 5.5a we show the distribution of total energy consumption of the original and smelly counterparts of the software projects solving the Fasta problem in Python. Of the 50 samples taken, 4 were discarded as anomalies for the original version, and 2 for the smelly version. The mean energy consumption was found to be 2.273 kilojoule for the original version and 2.278 kilojoule for the smelly version.

Figure 5.5b shows the distribution of total energy consumption of the original and smelly versions of the software projects solving the Nbody problem in Python. Of the 50 samples taken, 3 were discarded as anomalies for the original version, and 2 for the smelly version. The mean energy consumption was found to be 15.66 and 15.39 kilojoule for the original and smelly versions respectively.


(a) Energy consumption of Fasta in Python. (b) Energy consumption of Nbody in Python. (c) Energy consumption of Su Doku in Python.

Figure 5.5: Total energy consumption of normal and smelly counterparts implementing a specific problem in Python. The total energy consumption is the sum of the energy consumption of Package0 and DRAM.

Additionally, the distribution of total energy consumption of the normal and smelly versions of the software projects solving the Su Doku problem in Python is shown in Figure 5.5c. Of the 50 samples taken, 4 were discarded as anomalies for the normal version, and 3 were discarded as anomalies for the smelly version. The mean energy consumption was found to be 0.4171 and 1.039 kilojoule for the normal and smelly versions respectively.

(a) Execution time of Fasta in Python. (b) Execution time of Nbody in Python. (c) Execution time of Su Doku in Python.

Figure 5.6: Execution time of normal and smelly counterparts implementing a specific problem in Python.

As shown in Figure 5.6, the execution time of the different software projects closely mirrors the total energy consumption shown in Figure 5.5.


CHAPTER 6

Discussion

In this chapter we discuss and interpret the implications of the results found in the previous chapter. We first discuss the significance of our findings for the different computer languages. From there, we discuss the general impact we found the ’Long Function’ code smell to have on the energy consumption of software projects, compared to our expectations and the findings from related research. Following this, we discuss the impact this code smell had on the different computer languages C++, Java and Python. Lastly, we argue the validity of our research, based on our methodology and considerations discussed in previous chapters.

6.1 Accuracy of the BetterCodeHub Grades

As can be seen from Table 5.1, BetterCodeHub could not always detect a change in the quality of the software projects after adding the code smell, and when it did, it detected a change in a category other than the ’Write Short Units of Code’ category we expected to relate to the ’Long Function’ code smell. The main reason for this is that each criterion the grades are based on only detects whether a specific code smell is present within a certain margin. These margins are set for each category based on BetterCodeHub’s own insight into what constitutes good code quality. If the change made by adding the code smell stays within the margin, then no difference will be detected by BetterCodeHub, and the score remains the same. Similarly, if the original project already contained the code smell, adding more of the code smell will not change the grade, as the project is already above the margin for the specific category. In both cases we may have made the same structural change in the software projects for two different languages, satisfying Assumption 1, while BetterCodeHub does not detect a difference in the score for the ’Write Short Units of Code’ category.

Additionally, other criteria can change after adding the ’Long Function’ code smell, as reflected in the changes in the scores. However, upon inspection of the code, we found that in most cases this was an arbitrary difference that we expect would not impact the energy consumption. In the case of ’Write Simple Units of Code’, the change in the grade was caused by moving function bodies from and to specific classes. When a function body was moved from outside into a specific function of a class, as in the case of the Nbody problem seen in Figure 4.1b, this also increased the complexity of the class as a whole, while at code level it only increased the length of a specific function. Conversely, moving the function body from within a class to outside has the opposite effect, and is therefore detected as improving the simplicity of units of code. For these reasons we do not expect this to impact the energy consumption beyond the impact of the ’Long Function’ code smell.
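To make this distinction concrete, the fragment below gives a hypothetical, simplified illustration in the style of the Nbody problem: inlining the body of a short helper lengthens the calling function without changing its behaviour. The names and physics are illustrative only and do not reproduce our actual software projects; bodies are assumed to be objects with x, y, vx, vy and mass attributes.

    from itertools import combinations

    def apply_gravity(a, b, dt):
        # Short unit of code: a single pairwise velocity update.
        dx, dy = a.x - b.x, a.y - b.y
        mag = dt / (dx * dx + dy * dy) ** 1.5
        a.vx -= dx * b.mass * mag
        a.vy -= dy * b.mass * mag
        b.vx += dx * a.mass * mag
        b.vy += dy * a.mass * mag

    def advance(bodies, dt):
        # Original version: the work is delegated to a short helper.
        for a, b in combinations(bodies, 2):
            apply_gravity(a, b, dt)

    def advance_smelly(bodies, dt):
        # Smelly version: the helper body is inlined. The function grows
        # longer, and inside a class this also raises the complexity of
        # the class as a whole, while the behaviour stays identical.
        for a, b in combinations(bodies, 2):
            dx, dy = a.x - b.x, a.y - b.y
            mag = dt / (dx * dx + dy * dy) ** 1.5
            a.vx -= dx * b.mass * mag
            a.vy -= dy * b.mass * mag
            b.vx += dx * a.mass * mag
            b.vy += dy * a.mass * mag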

In the case of ’Keep Interfaces Small’, this criterion is based on the number of parameters in a function call. If, by adding the code smell, we remove a function call which previously had too many parameters, this is detected by BetterCodeHub as an improvement in the score. However, as the parameters are now instead initialised as variables inside the function body of another function, we do not expect this difference to have an additional impact on the energy consumption.



Lastly, there is the single case of the grade for ’Write Code Once’ improving for Java Fasta. Upon inspection we found that this was due to two functions in the original software project sharing a similar portion of code. In the smelly version we did not remove or otherwise change this portion of code, but moved the function body of one of these two functions to the main function, as described in Figure 4.1a. To prevent any interference, we renamed some variables in the moved function body which were used differently in the host function, such as the common loop variables ’i’ and ’j’. This caused the two similar portions of code to differ enough that they were no longer detected as duplicate code. Simply renaming a variable, however, is not expected to impact the energy consumption.

Based on these insights, we find that while the grading criteria used by BetterCodeHub are indicative of several relevant code smells, they are not accurate enough to be useful in detecting the changes we made to the software projects. As discussed, we do not expect that the differences in the BetterCodeHub grades reflect the impact of the code smell on the energy consumption of the software projects.

6.2 Significance of Impact on the Different Languages

For each of the following languages we show the significance of our findings. We apply the one-sided Mann-Whitney U test twice, testing whether a sample from the original measurements tends to be greater or lesser than a sample from the smelly measurements. If the greater p-value is smaller than the significance level (α-value) of 0.05, the distribution of smelly samples is significantly smaller than that of the original samples, and vice versa for the lesser p-value. If neither p-value is smaller than the α-value, we cannot reject the null hypothesis that the two sets of samples are from the same distribution.
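A sketch of this test procedure, assuming scipy.stats, is shown below. The direction naming follows the convention of Table 6.1, where both one-sided tests are taken from the perspective of the original samples.

    from scipy.stats import mannwhitneyu

    ALPHA = 0.05

    def significance(original, smelly):
        # Two one-sided Mann-Whitney U tests, named from the perspective
        # of the original samples, matching Table 6.1.
        p_greater = mannwhitneyu(original, smelly, alternative="greater").pvalue
        p_lesser = mannwhitneyu(original, smelly, alternative="less").pvalue
        if p_lesser < ALPHA:
            return "smelly distribution significantly larger"
        if p_greater < ALPHA:
            return "smelly distribution significantly smaller"
        return "null hypothesis cannot be rejected"

Applied to the anomaly-filtered energy samples of an original and a smelly version, this procedure yields the p-values reported in the tables below.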

6.2.1 Impact on C++

Problem     Greater p-value    Lesser p-value    Rejects Null hypothesis?
Fasta       0.5158             0.4871            No
Nbody       1.0                1.3139e-11        Yes
Su Doku     0.6991             0.3035            No

Table 6.1: The p-values corresponding to the greater and lesser Mann-Whitney U tests between the energy consumption of the original and smelly counterparts for the different problems implemented in C++. The null hypothesis is rejected if either p-value is less than the α-value of 0.05.

In Table 6.1 we show the p-values derived from the Mann-Whitney U tests for the distributions found in Figures 5.1a, 5.1b, and 5.1c. As neither the greater nor the lesser p-value for Fasta is below the α-value of 0.05, we cannot reject the null hypothesis that the samples from the smelly version in Figure 5.1a are from the same distribution as those of the original version. For Nbody we find that the p-value for the lesser Mann-Whitney U test is below our α-value. This shows that the sample distribution of the smelly version in Figure 5.1b is significantly larger than the sample distribution of the original version. For Su Doku, we find that neither the greater nor the lesser p-value is below the α-value of 0.05. Therefore we cannot reject the null hypothesis that the samples from the smelly version in Figure 5.1c are from the same distribution as those of the original version.

6.2.2 Impact on Java

In Table 6.2 we show the p-values derived from the Mann-Whitney U tests for the distributions found in Figures 5.3a, 5.3b, and 5.3c. For Fasta, Nbody, and Su Doku alike, we find that the p-value for the lesser Mann-Whitney U test is below our α-value. This shows that for all three problems, the sample distribution of the smelly version is significantly larger than that of the original version.
