Actionable Measurements - Improving The Actionability of Architecture Level Software Quality Violations


University of Amsterdam

Master’s Thesis

Actionable Measurements in Better Code

Hub

Improving The Actionability of Architecture Level Software

Quality Violations

Wojciech Czabański

Student Id: 11570946

wojciech.czabanski@gmail.com

supervised by

Dr. Paul Martin, p.w.martin@uva.nl, University of Amsterdam
Dr. Magiel Bruntink, m.bruntink@sig.eu, Software Improvement Group


Contents

Abstract

1 Introduction
  1.1 Problem analysis
  1.2 Research questions
    1.2.1 Architecture smell impact and classification
    1.2.2 Facilitating architecture smell detection and actionability
  1.3 Outline

2 Background
  2.1 Program analysis
    2.1.1 Static analysis
    2.1.2 Dynamic analysis
  2.2 SIG Maintainability Model
  2.3 Design rule spaces
  2.4 Software design defects
  2.5 Decoupling level

3 Motivating examples
  3.1 Unhealthy inheritance hierarchy
    3.1.1 Internal
    3.1.2 External
  3.2 Cross-module cycle
  3.3 Cross-package cycle

4 Research method
  4.1 Assumptions
  4.2 Problem analysis
    4.2.1 SIG context
    4.2.2 General context
  4.3 Solution design
  4.4 Validation

5 Experiments
  5.1 Data sources
    5.1.1 Selection criteria
    5.1.2 Selected repositories
  5.2 Hotspot distribution
    5.2.1 Discussion
  5.3 Approach validation
    5.3.1 Discussion
  5.4 Detector
    5.4.1 Prototype overview
    5.4.2 Implementation
    5.4.3 Validation
    5.4.4 Discussion

6 Prototype evaluation
  6.1 Evaluation setup
  6.2 Refactoring recommendations
  6.3 Visualisation
    6.3.1 Rationale
    6.3.2 User journey
    6.3.3 Hotspot examples
  6.4 Evaluation with an expert panel

7 Discussion
  7.1 Discussion – prototype validation
    7.1.1 Demographics questions
    7.1.2 Modularity and static analysis questions
    7.1.3 Pre-experiment questions
    7.1.4 Post-experiment questions
  7.2 General discussion
    7.2.1 Improvements
    7.2.2 Limitations
  7.3 Answering the research questions
    7.3.1 Architecture smell impact and classification
    7.3.2 Facilitating architecture smell detection and actionability
  7.4 Threats to validity
    7.4.1 Conclusion validity
    7.4.2 Construct validity
    7.4.3 Internal validity

8 Related work
  8.1 Automated detection and refactoring of code smells
  8.2 Classifying architecture bad smells
  8.3 Software architecture recovery

9 Conclusion
  9.1 Recommendations for SIG
    9.1.1 Bundle chart recommendations
    9.1.2 Hotspot recommendations
  9.2 Future work

Bibliography

Appendix A – Interview questions and scales

Appendix B – Interview transcripts


Abstract

The number of software systems to maintain is growing. Often, legacy software was not constructed according to the principles of modular software design, or its architectural components became tightly coupled, and thus less modular, during development. This leads to increased effort dedicated to software architecture refactoring when such systems are further developed and maintained. Evaluating the severity of a system's architectural drift and identifying which areas and issues need to be addressed is a difficult task.

To remedy that, it is important to firstly identify the risk profile of the coupling between components, where tightly coupled components present a higher risk because they make the system more difficult to understand, test and change in isolation. Furthermore, we want to identify patterns that indicate specific design problems in the architecture-as-implemented. In this thesis we use the concept of architecture hotspots to identify violations in architecture design and build a prototype tool that can identify and present hotspots to developers in a manner facilitating action. We then evaluate the tool by applying it to open-source projects and validate the approach through interviews with developers.


Acknowledgements

I would like to thank my supervisors – Paul Martin and Magiel Bruntink – for their constant feedback during the project, guiding advice and exploratory questions that allowed me to complete this project. Furthermore, I want to express my gratitude to the members of the Software Improvement Group for creating a very welcoming and inclusive environment and helping me over the course of my research. Lastly, I would like to thank my family and friends for their support during the most demanding moments.

Wojciech Czabański

Amsterdam, The Netherlands August 2018

1 Introduction

As software systems grow in size and complexity their maintenance becomes the dominant factor in the effort and costs of a software project, ranging up to 80% of the total costs [14].

Software maintainability is an internal quality of a software product that describes the effort required to maintain a software system. Low maintainability has a detrimental effect on code comprehension. Glass [14] argues that the most challenging part of maintaining software is understanding the existing product. It follows that code which is hard to understand is also difficult to test and to modify in a controlled manner. If changes are difficult to introduce and the code is hard to understand, the probability of introducing bugs is very high. This significantly raises the cost of developing and maintaining a software system.

However, there is also a human aspect related to maintainability. Codebases with low maintainability tend to be difficult for developers to work in. Hard-to-maintain code is usually not only hard to understand, but often also hard to test and prone to errors due to hidden dependencies and undocumented assumptions. A combination of the problems mentioned above eventually leads to brittle, unstable software which fails seemingly randomly. Not only is this detrimental to the users' trust in the software, but it also negatively impacts the motivation and confidence of the developers and the project managers, who may simply not know which of the changes are breaking and which are not. Low developer motivation can lead to a high turnover rate in the project, and a large quantity of defects will lead to a slower release schedule. Eventually, the pace of development slows down drastically, further elevating the cost of maintaining the system. The human aspect, however, will not be the main focus of this project.

An important question to answer is: how can maintainability be measured? If it is such a crucial aspect of code quality, then it is of utmost importance to be able to tell to which degree a codebase is maintainable; otherwise project managers risk vastly underestimating the budgets for system maintenance. Over time, numerous software metrics were developed to describe different aspects of software source code. Metrics can describe the volume of the source code, using the number of lines of code or function points. They can also describe complexity, for example using cyclomatic complexity. Other metrics describe the cohesion of classes or even entire components. Different metrics are used to describe software on multiple levels of decomposition – from units through classes to packages and components. The relationship between the values of those metrics – the measurements – and maintainability needs to be specified using a maintainability model that is based on empirical research and specifies how the selected metrics relate to the maintainability of a software system.

I will focus on the maintainability model created and used by the Software Improvement Group [15]. The model uses a single value – a real number ranging from 1 to 5 – to describe the overall maintainability of a codebase. This number is computed from four properties of the code: analysability, changeability, stability and testability, also expressed as real numbers ranging from 1 to 5. The properties themselves are derived from specific source code metrics. For example, the analysability property is influenced by the "volume" metric, which is derived from the number of source lines of code in the codebase. From this model, 10 guidelines have been derived to help developers quickly evaluate the maintainability of their code and provide actionable suggestions to increase its quality. The guidelines are implemented in a SIG tool called Better Code Hub [38]. The tool provides feedback to developers on the quality of their code at different levels of abstraction: for functions – unit complexity; for classes and interfaces – small interfaces and separation of concerns; and for components – keep components balanced and keep components loosely coupled.

In this master's project, I will put emphasis on architecture-level quality, focusing on component independence. The goal is to provide developers with actionable guidelines, in addition to the diagnosis provided by the Better Code Hub tool, so that they can improve the maintainability of the code. Another desirable output would be a classification of common modularization problems that occur in software code. A further welcome result would be a useful visualisation of the modularization violation "hotspots" for the maintainer.

1.1 Problem analysis

For most of the guidelines in the Better Code Hub tool, it is relatively easy to suggest actions for developers to improve the score; however, for architecture-level metrics such as Component Balancing and Component Independence, also referred to as modularity metrics, it is considerably more challenging. Currently, the tool provides an overview of components and the interactions between them, such as incoming and outgoing code to or from modules in other components, but does not provide much guidance as to the actions that the developer could take to resolve the problem. In simple terms, Better Code Hub provides a diagnosis, but does not say much about the treatment of detected modularity problems. Attempts have been made to generate suggestions for improving the modularity metrics by suggesting move-module refactoring operations, framing the problem as a multi-objective search [20]. There have also been different classifications of architecture problems, such as [12], [10], [24] and [31]. However, only the last one draws a clear connection between the architectural smells and maintainability measurements, through the Design Rule Spaces [2] and the Dependency Structure Matrix. Kazman et al. adapt Baldwin and Clark's concept to reconstructing software architecture and reasoning about the quality of the modularization in an earlier paper [40].

What has been identified is that, even though in certain cases the suggestions are helpful, in most cases the context of the operation is missing, and the refactoring operations would either not contribute to improving the modularity metrics or would cause the codebase to degrade or become less semantically consistent. Identifying patterns in poorly modularized code can be a starting point for devising recommendations to the code maintainer on how the component can be improved. The patterns can be detected by the Titan toolkit [41]; however, it is not yet known how actionable they are and whether the patterns can also be identified outside the scope of the original dataset from [31]. I intend to use the patterns detected by the Titan toolkit to conduct a study on open source projects to evaluate whether those patterns can be identified in other projects and whether they reflect actual problems in the codebase.

In order to validate the actionability of the hotspots, I construct a hotspot detector based on the formulas described by Kazman et al. [31], integrate it with the Better Code Hub analysis tool provided by the Software Improvement Group and provide a visualisation of the hotspots as refactoring candidates for the developers. I then conduct a series of interviews in order to obtain information from the developers on the usefulness of the approach.

1.2 Research questions

The research questions addressed by this thesis can be divided into two main topics. For each topic we pose two subquestions.

1.2.1 Architecture smell impact and classification

The first topic can be addressed by answering these two questions:

1. RQ 1.1 – What architecture smells can be detected in open source software?

2. RQ 1.2 – How do the detected architecture smells impact the maintainability of open source projects?

By answering these questions I intend to address a threat to validity remarked upon by Kazman et al. in the hotspot pattern classification article [31], as the number of investigated projects was low (nine open source projects) and all of them were written in a single language (Java). Suryanarayana [36] notes that there are few empirical studies on architecture smells and there is not much insight into their characteristics and impact.

1.2.2 Facilitating architecture smell detection and actionability

The second topic can be addressed by answering these two questions:

1. RQ 2.1 – How can architecture smells be detected automatically as a part of an automated source code analysis pipeline?

2. RQ 2.2 – How can architecture smells be presented to developers in a manner facilitating action?

Architecture smells can span multiple classes and packages. It is very common for visualisations to include too many details, which blur the overall picture and make identifying the problems in the codebase more difficult. Answering this research question aims to provide a visualisation which clearly outlines the architectural problem.

1.3 Outline

In this section we outline the general structure of the thesis. In Chapter 2 we present the background of the problems we have set out to solve: helping developers identify and act upon architecturally significant code anomalies. In addition to measuring software maintainability, it also includes a broader view of program analysis and software architecture refactoring. In Chapter 3 we present and discuss three examples of architecture smells that we intend to discover and visualize. Chapter 4 is dedicated to describing the research method and the context of the project: who the stakeholders are, what their problems are, and the criteria that a solution should satisfy. Furthermore, in Chapter 5 we describe the experiment and the environment in which it is implemented. It is followed by Chapter 6, containing the evaluation of the experiment and the processed results obtained over its course. Afterwards, in Chapter 7 we discuss the results and answer the research questions. In Chapter 8 we outline the related areas of work which either influenced our work or which we found interesting. Lastly, in Chapter 9 we discuss the conclusions from the project and outline areas for future work. We attach three appendices to our work. Appendix A contains the interview questions and structure that we used for the prototype validation. Appendix B contains the transcripts of all the interviews we conducted. Lastly, Appendix C contains the tables with the answers to the polls that we conducted during the interviews.

2 Background

In this chapter we present all the background that we needed for our research. This involved looking into how software programs can be analyzed: dynamically and statically, how code quality can be measured and how software architecture can be reconstructed from source code and represented.

2.1 Program analysis

Program analysis [33] is the process of analyzing the behavior of a software program with regards to a certain property such as correctness, robustness or maintainability. In the thesis, the main focus is the maintainability property.

2.1.1 Static analysis

Methods of static program analysis focus on analyzing the program without executing it. Commonly, the source code is used as input for static analysis tools; in certain cases other inputs are used as well, such as revision history or metadata about the software system.

Syntactic analysis and software metrics. Syntactic analysis and software metrics computation involves analysing the source code of a system, often represented as a graph. We also observe textual and visual representations, often encountered in CASE tools. Our focus in this thesis is the source code as a graph. Commonly used metrics are: module size in SLOC (source lines of code), module size in statements, comment line percentage, documentation line percentage, number of classes, number of methods per class, number of calls per method, number of statements per method, maximum cyclomatic complexity, maximum module depth and average module depth. Computing software metrics along with the dependency graph analysis serves as a starting point for reasoning about a software system. For example, the cyclomatic complexity metric can be used to evaluate the testability of units within the codebase.
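To make the metric computation concrete, the sketch below counts the decision points of a small, hypothetical Java method; the method and the numbers are illustrative only and are not taken from any of the tools or systems discussed in this thesis.

    // Hypothetical example: cyclomatic complexity is one plus the number of decision
    // points in a unit (here counting ifs, loops and short-circuit boolean operators).
    public final class ShippingCost {

        // Decision points: if (1), && (2), second if (3), for loop (4) -> complexity = 4 + 1 = 5.
        static double costFor(double weightKg, boolean express, double[] surcharges) {
            double cost = 5.0;
            if (weightKg > 10.0 && !express) {
                cost += 2.5;
            }
            if (express) {
                cost *= 2.0;
            }
            for (double surcharge : surcharges) {
                cost += surcharge;
            }
            return cost;
        }

        public static void main(String[] args) {
            System.out.println(costFor(12.0, false, new double[] {1.0, 0.5}));
        }
    }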

These metrics are by no means unique: they are well documented in the literature and implemented in other tools as well. The tools that we have evaluated for this project are as follows:

1. Source Monitor (http://www.campwoodsw.com/sourcemonitor.html) – a freeware tool for calculating metrics for C#, C++, C and Visual Basic codebases. It calculates a basic set of metrics and allows browsing the source code.

2. Understand (https://scitools.com/) – a commercial code analyzer which, in addition to metrics, can provide the dependency graph for over 15 programming languages.

3. Software Analysis Toolkit (https://www.sig.eu/) – a commercial tool available within the Software Improvement Group. It does not only compute the metrics, but also exposes the source graph, which, while being an intermediate analysis stage, allows us to reason about its structure through comparison with other code graphs and, if necessary, implement other metrics which are not supported by default. It also supports parsing of a wide variety of languages, ranging from modern ones, such as Java or C#, to older ones, such as COBOL.

4. SonarQube (https://www.sonarqube.org/) – a commercial tool which analyses the codebase and provides a suite of metrics and a list of code quality violations that need to be addressed. It also computes the design rule hierarchy.

Considerations. The business side of the research endeavour is to enhance the actionability of the architecture-level violations while still addressing a research problem. For this purpose, Source Monitor is too basic, because it only exposes the software metrics without the dependency graph. SonarQube is aimed at being integrated with a development and operations pipeline, making it difficult to use in a research setting with limited infrastructure. Understand alleviated most of the problems with the previous tools: requiring very little infrastructure, exporting the extracted dependency graph to a commonly used format and supporting a wide enough variety of programming languages. It is, however, more difficult to integrate into the SIG infrastructure. The Software Analysis Toolkit is in this case easier to integrate with the existing pipeline, and this led us to choosing the Software Analysis Toolkit as the tool to preprocess the source code for further analysis.

Semantic analysis and formal methods. Formal methods are mathematical tools that can be applied to software source code to reason about its runtime behavior. The formal methods we have investigated for the purposes of this project are abstract interpretation and Hoare logic.

Abstract interpretation is a variant of static analysis. The reasoning behind abstract interpretation is that, while the concrete semantics of a program represents all its states during an execution [8], the number of all possible states that a program can be in is too large for any non-trivial program. Abstract semantics is a superset of the concrete semantics that aims to cover all possible cases of concrete interpretation while still being computable.

Hoare logic is a formal system that can be used to reason about the correctness of programs. It is based on assertions about program state: if a set of preconditions holds before a command of the program executes, a set of postconditions is guaranteed to hold afterwards.
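As a minimal, hypothetical illustration (not tied to any of the tools listed below), the Hoare triple {0 <= n < Integer.MAX_VALUE} m := n + 1 {m > 0} can be mirrored in code with assertions for the pre- and postcondition:

    // Hypothetical illustration of a Hoare triple: if the precondition holds before the
    // command, the postcondition holds afterwards.
    public final class HoareTripleDemo {

        // {0 <= n && n < Integer.MAX_VALUE}   m := n + 1   {m > 0}
        static int increment(int n) {
            assert 0 <= n && n < Integer.MAX_VALUE : "precondition violated";
            int m = n + 1;                                // the command
            assert m > 0 : "postcondition violated";
            return m;
        }

        public static void main(String[] args) {
            System.out.println(increment(41)); // prints 42; run with -ea to enable the assertions
        }
    }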

The following list contains tools that guarantee soundness. That is, the results yielded by the tools are guaranteed not to contain false negatives, meaning that properties that are checked and confirmed to be true by the tools will be satisfied for all the executions of the program. The tools utilize the theoretical concepts introduced by abstract interpretation and Hoare logic to produce the analyses.

1. AbsInt / Astrée (https://www.absint.com/astree/index.htm) – Detects similar classes of errors as dynamic analysis tools: inconsistent locking, violations of user-defined assertions, bound errors and floating point errors. Used for mission-critical software. The tool is proprietary.

2. CodePeer (https://www.adacore.com/codepeer) – Detects code constructs that are likely to lead to runtime errors, such as buffer overflows. The tool is proprietary.

3. CPAChecker (https://cpachecker.sosy-lab.org/) – Performs a reachability analysis of a software program and verifies whether any of the states that are reached violate the given specifications. This allows for predicting defects by generating invalid states and verifying the behavior of the program. The invalid states are then presented in the form of a call graph. The tool is open source.

4. Frama-C (https://frama-c.com/) – Identifies dead code and weakest preconditions. Performs invariant checking, program slicing and impact analysis.

5. Julia (https://www.juliasoft.com/) – The list of checks is available on the software website. It does not differ much from competing static analysis tools such as SonarQube, but yields fewer false positives and no false negatives. The tool is proprietary.

6. Predator (http://www.fit.vutbr.cz/research/groups/verifit/tools/predator/) – Analyzes sequential C programs for pointer arithmetic, block operations with memory, usage of invalid pointers and other low-level memory concerns. The tool is open source.

Considerations. The available abstract interpretation tools are powerful, allowing for memory error detection, invariant detection and even bug prediction. There is a considerable overlap between the outputs of dynamic analysis tools and the static semantic analysis tools. However, the analyzed open source and freeware tools are primarily focused on defect detection and memory concerns, respectively. The proprietary tools provide a more comprehensive output, including potential security violations and maintainability issues. Nevertheless, none of the tools that we investigated from this group provide insights into the implemented architecture of the systems. This is why we decided not to incorporate any of the tools described above in our final workflow.


2.1.2 Dynamic analysis

The dynamic analysis of software involves executing the program on a real or virtual processor. Dynamic analysis can be used to test the behavior of a program or reason about its structure based on a subset of inputs. In this section we showcase the tools that we investigated as part of the background study.

1. Google Sanitizers (https://github.com/google/sanitizers/) – A free suite of tools from Google capable of performing memory error detection, threading analysis and stack/heap overflow detection in C++ programs.

2. Bounds Checker (https://www.microfocus.com/products/devpartner/) – A proprietary tool capable of performing memory error detection, threading analysis and stack/heap overflow detection in C++ programs.

3. Daikon (https://plse.cs.washington.edu/daikon/) – A free tool capable of detecting invariants in code and stack/heap overflow errors. Can analyse C/C++/C#/Java and Perl programs.

4. DMalloc (http://dmalloc.com/) – An open source tool for analyzing stack and heap overflow exceptions in C programs.

5. IBM AppScan (https://www.ibm.com/security/application-security/appscan/) – A proprietary suite from IBM utilizing both static and dynamic program analysis techniques to identify vulnerabilities in C/C++/Java/Cobol and JavaScript programs.

6. Iroh (https://github.com/maierfelix/Iroh/) – An open source library capable of providing call graphs and reverse engineering JavaScript programs.

7. ParaSoft (https://www.parasoft.com/) – A proprietary tool suite aimed at detecting memory errors in C# and C++ programs.

8. PurifyPlus (https://teamblue.unicomsi.com/products/purifyplus/) – A proprietary suite aimed at memory error detection, performance profiling and code coverage analysis in C++, C# and Java programs.

9. Valgrind (http://valgrind.org/) – An open source memory error detection and multithreaded program analysis tool. Suitable for C#, C++, Java and Perl programs.

10. BinNavi (https://github.com/google/binnavi/) – An open source tool that can be used for call graph analysis and reverse engineering. Works with x86, ARM, MIPS, and PowerPC executables.

Considerations. The input data for the thesis project is the source code only, which means that the executable for every project would need to be built locally. This in turn limits both the number of projects that can be analyzed and the range of languages that the projects are written in, due to the additional effort required to set up build and runtime environments for every project in the candidate dataset. In addition, the focus of most dynamic analysis tools reviewed is to detect possible faults in the codebase as opposed to analyzing its maintainability. One may argue that the detected defect count could constitute a measure of maintainability; however, in this thesis we focus on maintainability from the code maintainer's standpoint. In our case the defects that already exist are of less significance – we want to focus on the areas that are most likely to introduce defects upon change. However, Cornelissen et al. [7] and Shantawi et al. [35] use execution traces to recover software component and connector views in object-oriented codebases, which could be used in connection with the source graphs obtained by static analysis. Nevertheless, in the end, considering the additional complexity required to perform a dynamic analysis and the scope of our project, we chose not to introduce any of the tools described above into our workflow.


2.2 SIG Maintainability Model

The Software Improvement Group developed a maintainability model for software [38] based on years of empirical software engineering research. They use an in-house software suite called the Software Analysis Toolkit to conduct static analysis of software systems. The maintainability model is also accessible to GitHub developers through the Better Code Hub system, which uses the Software Analysis Toolkit as a backend component to analyse a GitHub repository and evaluate it against 10 criteria, also referred to as guidelines.

The following section describes the guidelines checked by Better Code Hub and the associated software metrics that are computed by the Software Analysis Toolkit:

1. Write Short Units of Code – source lines of code broken down per unit. Longer units of code tend to be more complex and therefore more difficult to test, maintain and understand for the developers changing it.

2. Write Simple Units of Code – cyclomatic complexity [29] broken down per unit. More complex units require more tests and in extreme cases can become untestable altogether if the number of decision points is too large. Untestable units are very difficult to maintain and are a very likely source of future defects.

3. Write Code Once – the degree of code duplication, broken down into duplicated segments and the number of occurrences per segment. While a certain degree of duplication is expected, more than 5% [16] may indicate design problems. Duplicated code causes the system to be larger than necessary and more difficult to modify, as the developers need to recognize whether the duplication is a deliberate choice to make different parts of the code structurally independent, or an oversight which is likely to cause a defect should the duplicated parts of the code indeed change independently.

4. Keep Unit Interfaces Small – limiting the number of parameters in interface methods. Shorter parameter lists make the interfaces more cohesive and easier to reuse in the future. Long parameter lists make the interfaces bloated, difficult to test thoroughly, complicated to implement and more difficult to comprehend (a small sketch after this list illustrates one common refactoring).

5. Separate Concerns in Modules – reducing the number of incoming and throughput calls, described by the fan-in metric [17]. Modules which have lots of throughput code act as intermediaries between two other modules. This indicates that perhaps some functionality has been misplaced and needs to be put in a different module. A large number of throughput code lines makes a module more tightly coupled with other modules and therefore more difficult to maintain in isolation.

6. Couple Architecture Components Loosely – minimizing the number of throughput components, that is, components that have high fan-in and fan-out metric values [5]. Similarly to modules, components that act as intermediaries tend to be more tightly coupled and more difficult to maintain in isolation.

7. Keep Architecture Components Balanced – relative size of components, measured in source lines of code [5]. If the component sizes are unbalanced, that may mean that some concerns are leaking between components. For example, in a three-component system, the business logic may leak into the UI code and data model components, leading to the business logic component being the smallest of the three. Balanced component sizes make it easier to develop different parts of a system independently.

8. Keep Your Codebase Small – The complete size of the codebase measured in source lines of code and function points [1]. Larger systems tend to be more effortful to develop and maintain, therefore the size of a codebase should be kept below a certain limit.

9. Automate Tests – the ratio of source lines of production code to source lines of test code. A second metric that is used is the assertion statement density in test code, measured as a ratio of assert statements to the number of source lines of code.

10. Write Clean Code – absence of certain code smells: "TODO" comments, "catch all" exception handling and commented-out code. Version control systems take care of removed code, so there is no reason to keep it commented out. "TODO" comments, as indicators of work that may need to be done, should rather be placed in an issue tracking system, and a "catch all" exception handler may indicate that there is no proper error handling strategy in place.
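As an illustration of guideline 4, the sketch below shows a common way to shorten a unit interface by grouping related parameters into a parameter object; the class and parameter names are hypothetical and are not taken from Better Code Hub or any analyzed system.

    // Hypothetical sketch: replacing a long parameter list with a parameter object.
    public final class ReportService {

        // Before (commented out): five loosely related parameters.
        // String render(String title, String author, int year, boolean draft, String locale) { ... }

        // After: the parameters form one cohesive value object that is easy to pass around and test.
        record ReportOptions(String title, String author, int year, boolean draft, String locale) { }

        String render(ReportOptions options) {
            String prefix = options.draft() ? "[DRAFT] " : "";
            return prefix + options.title() + " by " + options.author()
                    + " (" + options.year() + ", " + options.locale() + ")";
        }

        public static void main(String[] args) {
            ReportOptions options = new ReportOptions("Thesis", "W. Czabanski", 2018, false, "en");
            System.out.println(new ReportService().render(options));
        }
    }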

2.3 Design rule spaces

Design rules are a concept borrowed from Baldwin et al. [2]; they represent design decisions that are used to decouple a system into modules. On a practical level, in object-oriented systems the design rules are usually represented by interfaces or abstract classes. The design structure can be represented using a matrix whose rows and columns are the classes and whose cells capture the structural relations between them. The four most prominent types of structural relations that we are interested in are inheritance, aggregation, dependency and co-evolution.

Kazman et al. introduce design rule spaces [40] as a method to partition a codebase, together with a design rule hierarchy that organizes the partitions into layers, based on the principle that elements in lower layers only depend on elements in higher layers and that elements in the same layer form mutually independent groups – modules. They also propose a tool to derive these hierarchies from structural dependency matrices obtained from source code [41].
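A minimal sketch of the underlying data structure is shown below; it is our own simplification for illustration and does not reproduce the Titan or Software Analysis Toolkit data model. Rows and columns stand for files, and each cell records which relation types connect them.

    import java.util.EnumSet;
    import java.util.Set;

    // Simplified design structure matrix: files on both axes, relation types in the cells.
    public final class DesignStructureMatrix {

        enum Relation { INHERIT, AGGREGATE, DEPEND, EVOLVE }

        private final String[] files;
        private final Set<Relation>[][] cells;

        @SuppressWarnings("unchecked")
        DesignStructureMatrix(String... files) {
            this.files = files;
            this.cells = new Set[files.length][files.length];
            for (int i = 0; i < files.length; i++) {
                for (int j = 0; j < files.length; j++) {
                    cells[i][j] = EnumSet.noneOf(Relation.class);
                }
            }
        }

        void add(int from, int to, Relation relation) {
            cells[from][to].add(relation);
        }

        Set<Relation> relations(int from, int to) {
            return cells[from][to];
        }

        public static void main(String[] args) {
            DesignStructureMatrix dsm = new DesignStructureMatrix("Shape.java", "Circle.java");
            dsm.add(1, 0, Relation.INHERIT);  // Circle inherits from Shape (a design rule)
            dsm.add(1, 0, Relation.EVOLVE);   // the two files also change together
            System.out.println(dsm.files[1] + " -> " + dsm.files[0] + ": " + dsm.relations(1, 0));
        }
    }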

2.4 Software design defects

The concept of "code smell" [11] was defined by Fowler in the following way: "Smells are certain structures in the code that indicate violation of fundamental design principles and negatively impact design quality". Since then, the concept was picked up and expanded by various researchers to accommodate for the scope of the design principle violations.

Martin [28] frames the component design guidelines with three principles: the acyclic dependencies principle, the stable dependencies principle and the stable abstractions principle. The acyclic dependencies principle states that there should be no cyclic dependencies between components; cyclic dependencies are considered undesirable in other classifications as well. The stable dependencies principle requires modules that are easy to change (have low stability) not to be depended on by modules which are difficult to change (have high stability). If the principle is not followed, then such an architecture can be costly to maintain due to the difficulty of making changes to components that are not designed to change much (stable components). The last principle is the stable abstractions principle, according to which a component should be as abstract as it is stable. Components not adhering to this principle fall into either what Martin refers to as the "zone of pain" – components which are very concrete and heavily depended on, and therefore very costly to change – or the "zone of uselessness" – components which are very abstract, but with very few dependents.
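For reference, Martin quantifies these principles with component metrics that are not part of the SIG model: instability I = Ce / (Ca + Ce), computed from afferent (Ca) and efferent (Ce) couplings; abstractness A, the fraction of abstract types; and the distance from the "main sequence" D = |A + I - 1|, whose extremes correspond to the two zones mentioned above. A small sketch:

    // Martin's component metrics (not part of the SIG Maintainability Model).
    public final class ComponentMetrics {

        // I = Ce / (Ca + Ce): 0 = maximally stable, 1 = maximally unstable.
        static double instability(int afferent, int efferent) {
            return (afferent + efferent) == 0 ? 0.0 : (double) efferent / (afferent + efferent);
        }

        // A = abstract types / all types in the component.
        static double abstractness(int abstractTypes, int allTypes) {
            return allTypes == 0 ? 0.0 : (double) abstractTypes / allTypes;
        }

        // D = |A + I - 1|: D near 1 means the component sits in the zone of pain
        // (concrete and stable) or the zone of uselessness (abstract and unstable).
        static double distance(double abstractness, double instability) {
            return Math.abs(abstractness + instability - 1.0);
        }

        public static void main(String[] args) {
            double i = instability(40, 2);   // heavily depended on, few outgoing dependencies
            double a = abstractness(1, 30);  // almost entirely concrete
            System.out.printf("I=%.2f A=%.2f D=%.2f%n", i, a, distance(a, i));
        }
    }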

The two relatively new concepts are architecture smells and design smells. Suryanarayana and Sharma propose that architecture smells represent design violations impacting the component and system levels [37]. As examples of such they propose:

• Cyclic Dependencies – This smell arises when there are cycles in the dependency structure of the components in the project. Cyclic dependencies make it very difficult to layer a system hierarchically and decrease modularity.

• Feature Concentration – This smell occurs when a component realizes more than one concern or implements more than one feature. It makes the component less reusable, more coupled and more difficult to understand. The prevention of this smell is reminiscent of applying the Single Responsibility Principle known from Martin [27] to components.

Sharma defines a design smell as a design violation impacting a set of classes and identifies a set of design smells [34]. As examples of such he proposes:

• Hub-like modularization – This smell describes an abstraction that has a high number of both incoming and outgoing calls to other abstractions. It is undesirable for abstractions to have an excess number of outgoing dependencies, because it increases coupling for any class making use of the abstraction.

• Multipath hierarchy – A smell arising when a class inherits multiple times from the same ancestor, often indirectly. This is an instance of a problem with multiple inheritance, informally known as the "deadly diamond of death" [26], where a class inherits twice from the same ancestor through two different base classes. In languages such as C#, this problem is alleviated by preventing multiple inheritance and enabling the use of "mixins" – extensions to a class which allow extending the functionality of a class without having to inherit from it.

• Unexploited encapsulation – This smell arises when there are excessive type checks in the client code. Modern object-oriented languages introduced support for run-time polymorphism to avoid having to rely on explicit type checks, which make the code brittle and prone to errors.

Architecture hotspot smells. The architecture hotspot smells are code anomalies introduced by Kazman et al. [31] which are related to an increased software defect density. The hotspots are detected using the design rule spaces, obtained from a structural code dependency graph and a co-change dependency graph derived from the revision history of a software project's codebase.

The hotspots that we are focusing on are:

• Unstable Interface – Interfaces represent the design rules, therefore they should remain stable. If the interfaces often change together with the files that depend on them, it makes the entire hierarchy more error-prone and the risk of a defect being introduced increases.

• Implicit Cross-module Dependency – Ideally, structurally independent modules should not change together. However, if they frequently do, it is an indicator of certain implicit dependencies (e.g. code clones), which either need to be removed or made explicit. Implicit, undocumented dependencies negatively impact program comprehension and can contribute to defects.

• Unhealthy Inheritance Hierarchy – Represents class hierarchies violating the Liskov Substitution Principle [22]. If a hierarchy violates the Liskov Substitution Principle, it means that the supertypes cannot be replaced by subtypes without changing the behavior. This can lead to difficult-to-find defects, where subclasses violate one or more of the postconditions required by the superclass.

• Cross-module Cycle – There should be no cycles among structurally independent modules. Cycles among modules contribute to high coupling and prevent static linking, if a module represents a single unit of translation in the system. Lastly, they make program comprehension more difficult by increasing cognitive load and can potentially lead to defects.

• Cross-package Cycle – Packages in the system should be organized in a hierarchical manner. Cyclic dependencies make it more difficult to organize a codebase hierarchically and to comprehend the structure of the system.

In our work we focused mainly on the Unhealthy Inheritance Hierarchy hotspot. We discuss it in more detail in the Motivating Examples chapter (3).

Architecture instability smells. Fontana et al. investigate Java projects for recurring patterns of architecturally relevant code anomalies [10]. The authors use compiled Java projects to construct a dependency graph. They define three main instability smells within their classification:

• Hub-Like Dependency – Defined as code which has a high number of incoming (fan-in) and outgoing (fan-out) dependencies with a large number of abstractions. This kind of dependency will cause a cascade of changes in all its incoming dependencies if it changes, leading to increased risk of introducing a defect.

• Unstable Dependency – A component which depends on another component that is less stable than itself. Similarly to the hub-like dependency, depending on less stable dependencies can cause cascades of changes, which lead to an increased risk of introducing defects.

• Cyclic Package Dependency – A subsystem which is part of a cyclic dependency structure. For this smell, the same problems apply as for the "hotspot" cross-module cycle and cross-package cycle smells.

We also looked into the relationships with the other classifications. The Cyclic Package Dependency pattern is also part of the "hotspot" classification. The Hub-Like Dependency is referred to as "throughput code" within the SIG Maintainability Model, albeit not directly referred to as a "smell".

2.5 Decoupling level

Kazman et al. propose a new metric – the decoupling level [32] – which is based on the earlier defined concepts of design rule spaces and the design rule hierarchy. The metric aims at describing how costly the maintenance of the architecture would be. It builds on two previously defined metrics: the independence level, describing how independent the modules within the system are, and the propagation cost, measuring how tightly coupled a system is. The decoupling level alleviates the problem with the independence level – insensitivity to module size – and with the propagation cost – sensitivity to system size.

We looked into the definition and usage of the decoupling level metric as an alternative to the currently used metrics for measuring architecture-level quality: component independence and component balancing. The latter metrics depend heavily on what is considered a component within a system and on fan-in and fan-out metric thresholds, whereas the decoupling level does not suffer from the same problems, as it is expressed as a single real number ranging from 0 to 1.
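As an illustration of one ingredient, the sketch below computes the propagation cost under its common definition – the density of the visibility matrix, i.e. the transitive closure of the structural dependency matrix, with every element able to reach itself. The decoupling level itself is more involved and is defined in [32]; this sketch only makes the underlying idea concrete.

    // Propagation cost under its common definition: the fraction of element pairs (i, j)
    // such that i can reach j through the dependency graph (self-reachability included).
    public final class PropagationCost {

        static double of(boolean[][] depends) {
            int n = depends.length;
            boolean[][] visible = new boolean[n][n];
            for (int i = 0; i < n; i++) {
                System.arraycopy(depends[i], 0, visible[i], 0, n);
                visible[i][i] = true;                       // every element "sees" itself
            }
            for (int k = 0; k < n; k++) {                   // Warshall transitive closure
                for (int i = 0; i < n; i++) {
                    for (int j = 0; j < n; j++) {
                        visible[i][j] |= visible[i][k] && visible[k][j];
                    }
                }
            }
            int reachablePairs = 0;
            for (boolean[] row : visible) {
                for (boolean cell : row) {
                    if (cell) reachablePairs++;
                }
            }
            return (double) reachablePairs / ((double) n * n);
        }

        public static void main(String[] args) {
            // A depends on B, B depends on C, D is isolated: 7 of 16 pairs are visible.
            boolean[][] deps = {
                    {false, true,  false, false},
                    {false, false, true,  false},
                    {false, false, false, false},
                    {false, false, false, false}
            };
            System.out.println(of(deps)); // 0.4375
        }
    }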

3 Motivating examples

In this chapter we discuss and illustrate examples of the hotspots that we investigated. The examples were retrieved from an open source project written in Java. The hotspots can be larger or smaller, but for the purpose of illustrating them we chose relatively small, yet non-trivial, hotspots.

The hotspots, being graphs, are presented as square matrices where the first column is a list of vertices – all the files that are part of the hotspot – and the remaining cells are potential edges of the graph. Edges can have different types, depending on the type of the dependency, for example: Uses, Inherits, Call, Import From. We highlight the edges causing the violation in red. It is worth noting that the matrix forms a design rule hierarchy, meaning that the classes lower in the matrix should only depend on those which are higher in the matrix.

3.1 Unhealthy inheritance hierarchy

We divide the unhealthy inheritance hierarchy hotspots into two categories based on the area in which they are detected. In the paper by Kazman et al. [31], this difference is expressed through the use of different formulas to detect the hotspots within the dependency graph.

3.1.1 Internal

We refer to unhealthy inheritance hotspots as "internal" if the cause is a dependency from a parent class to a subclass. Such a dependency should be avoided because it implies that a parent class has knowledge about the existence or type of specific child classes. This, in turn increases coupling.

Figure 1: An example of an internal Unhealthy Inheritance Hierarchy hotspot.

In the case of this four-class hotspot, we can see an abstract class CauseOfBlockage and three child classes. The CauseOfBlockage class instantiates one of its child classes: NeedsMoreExecutorImpl. This is a problem because it couples the parent class to one of its children. A possible remedy is to break the dependency by extracting the instantiation of the child classes to a separate, specialized Creator class [21].
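The sketch below is a simplified, hypothetical reconstruction of this situation – not the actual Jenkins code – with the instantiation moved to a separate creator class whose name we invented for illustration:

    // Simplified, hypothetical reconstruction of the hotspot above (not the actual Jenkins code).
    // Before: the abstract parent instantiated one of its own subclasses (see the commented method),
    // which is the parent-to-child dependency that causes the hotspot.
    abstract class CauseOfBlockage {
        abstract String getShortDescription();

        // static CauseOfBlockage needsMoreExecutors() {
        //     return new NeedsMoreExecutorImpl();   // the violating edge
        // }
    }

    class NeedsMoreExecutorImpl extends CauseOfBlockage {
        @Override String getShortDescription() { return "Waiting for the next available executor"; }
    }

    // After: the instantiation lives in a separate creator, so the hierarchy only points upwards.
    final class CauseOfBlockageCreator {
        static CauseOfBlockage needsMoreExecutors() {
            return new NeedsMoreExecutorImpl();
        }

        public static void main(String[] args) {
            System.out.println(needsMoreExecutors().getShortDescription());
        }
    }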

3.1.2 External

Figure 2: An example of an external Unhealthy Inheritance Hierarchy hotspot.


In this case we can see a hotspot consisting of 10 files. At the top there is the base class AbstractActionManager. Classes from 2 to 9 are its subclasses and the bottom class is a factory responsible for creating all of the subclasses.

The violations here are the uses of the superclass in all of the subclasses. Upon inspecting the code of AbstractActionManager.java, we notice that it has multiple responsibilities, violating the Single Responsibility Principle: it serves not only as a base class, but also lists and filters valid next actions using the ActionManagerFactory. A possible remedy here would be to extract this responsibility to a separate class, which would resolve the hotspot.

3.2 Cross-module cycle

A cross-module cycle is a hotspot occurring when two or more files form a cyclic dependency and they do not belong to the same module. This harms modularity because, similarly to packages, the design rule space modules should form a hierarchical structure.

Figure 3: An example of a Module Cycle hotspot.

In this example we can see six classes, the HexView being at the bottom of the hierarchy. Classes from 2 to 6 (UpperPane, ASCIIPane, HexPane, AddressPane) use the HexView and the HexEditor creates the Panes and uses the view. However, we can also spot a dependency cycle within this module:

1. HexView creates and calls the HexEditor.
2. HexEditor creates the UpperPane.
3. UpperPane uses the HexView.

Upon identification, the developer can break the cycle. There can be multiple ways of achieving that in this case. One method would be to extract the creation of the HexEditor to a separate class – this would remove the dependency from HexView to HexEditor. Another method would be to extract the constants from HexView to a separate class – this would remove the Use dependencies of the Panes on the HexView.
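The sketch below illustrates the second option on heavily simplified, hypothetical stand-ins for these classes; the extracted HexModel class and its constant are invented for illustration, and the real classes are much larger.

    // Hypothetical, simplified sketch of breaking the cycle by extracting constants.
    final class HexModel {                         // invented name for the extracted constants
        static final int BYTES_PER_LINE = 16;
    }

    class UpperPane {
        String header() {
            // Previously this read HexView.BYTES_PER_LINE, which closed the cycle
            // HexView -> HexEditor -> UpperPane -> HexView.
            return "Offset header, " + HexModel.BYTES_PER_LINE + " bytes per line";
        }
    }

    class HexEditor {
        final UpperPane upperPane = new UpperPane();
    }

    class HexView {
        final HexEditor editor = new HexEditor();  // the remaining dependency points one way only

        public static void main(String[] args) {
            System.out.println(new HexView().editor.upperPane.header());
        }
    }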

3.3 Cross-package cycle

The cross-package cycle is a hotspot occurring between classes in different packages. Packages should form a hierarchical structure, and cycles between them make this impossible, thereby reducing modularity.

Figure 4: An example of a Package cycle hotspot.

In this example a simple two-element dependency cycle is presented. The PDStrikeoutAppearanceHandler and PDAnnotationStrikeout classes belong to different packages and refer to each other.

Upon inspection, this hotspot can also be resolved in multiple ways. One solution could be to move the responsibility for creating the PDStrikeoutAppearanceHandler from PDAnnotationStrikeout to a separate factory class. Another way would be to investigate which properties of the PDAnnotationStrikeout need to be exposed to the PDStrikeoutAppearanceHandler and provide them separately, without referencing the entire object.

4 Research method

This chapter is dedicated to describing the research approach that we have chosen. Given that the research is conducted with the goal of improving the Software Improvement Group's Better Code Hub product, we chose to base our approach on Technical Action Research (TAR) as described by Wieringa [39].

In essence, the idea behind the method is to develop a solution for a specific problem that a client has, test it on a small scale and gradually attempt to generalize it to a broader scope, so that it can be applied in a real world setting. Ideally, in the end the result would be an artifact which is useful in solving problems in an industrial setting.

Before adapting the approach to our particular improvement problem, we need to make sure that it is properly defined. In addition to that, in order to address the broader scope, we also need to define knowledge questions. An example of an improvement problem can be "How can we detect architecturally significant code anomalies, automatically?". Knowledge problems could be "What kind of architecture representation facilitates identifying code anomalies?" or "How can we evaluate the impact of a code anomaly on the architecture?".

In order to solve the improvement problem, we develop what Wieringa calls treatments – prototype artifacts that solve or reduce the severity of the problem and that can be validated in the client's environment. In our particular context, the client's problem is that the users of their code quality analysis tool, Better Code Hub, find it difficult to act upon violations of architecture quality.

Currently, Better Code Hub diagnoses modules within components which have too much throughput code, but does not suggest any actions. However, we believe that the throughput code may simply be a symptom of a larger problem – an architectural smell. Identifying the structural issue behind the measurement would allow Better Code Hub to guide a developer towards a fix.

To generalize the treatment means to distinguish between the particular problem experienced by the Software Improvement Group in Better Code Hub and the general problem of architecturally relevant code anomalies, informally referred to as "architecture smells". This is relevant because not all code anomalies are architecturally relevant [25], and fixing those that are not will not have a large impact upon the maintainability of the architecture.

4.1 Assumptions

In our approach we take Component Independence and Component Balance issues to be indicators of structural problems within the codebase. We assumed that detecting the structural problems, which we refer to as "architecture smells", will help the developers in addressing the structural issues, the end result being improved values of the component-level metrics. We verified the correlation between the density of hotspot occurrences and the component interface lines of code measurements from the Software Analysis Toolkit. This way, we can hypothesize that if the developers resolve the design defects described by the hotspots, the result will be reflected in the metrics calculated by the Software Analysis Toolkit.

4.2 Problem analysis

In TAR the researcher aims to solve a specific improvement problem and investigate whether, and to what extent, the approach can be generalized; therefore we look at the problem from the specific context – the SIG point of view – and from the general context.

4.2.1 SIG context

The developers reported two main problems with the existing recommendations. The first problem is that the developers are not always sure what needs to be done about a certain recommendation. The proposed solution here is to present the recommendation in the context of the hotspot (antipattern) that it is a part of. The second problem that we discovered is that the developers do not know where to start, as the recommendations are prioritized by affected file size, in descending source lines of code.

Stakeholders' goals

• Better Code Hub/SIG – Provide relevant, actionable feedback for the clients on how to improve their system’s Component Independence, so that it will satisfy the guideline within the maintainability model used by Better Code Hub.

• Better Code Hub users/SIG clients – Verify that the codebase is of high maintainability. Obtain actionable feedback on how to make the components more independent so as to make the system more maintainable.

Stakeholders' criteria

• Better Code Hub/SIG – The tool should provide suggestions for refactoring actions, which, after applying will be reflected in the Component Independence score.

• Better Code Hub users/SIG clients – Refactoring suggestions should be understandable and actionable. The refactored system should be less bug-prone and easier to maintain.

4.2.2 General context

In the general context we consider the area of identifying architectural design defects and connecting them to maintainability measurements. Even though in our research we utilize the maintainability model from SIG, the approach can be applied in any code quality analysis tool, since the model is based on the ISO 25010 maintainability standard.

Stakeholders' goals

• Developers – Identifying problematic areas of the code. Providing information on which modules are affected and how to address the violations.

• Researchers – Observe the common architecture level design defects. Devise automated solutions to the mentioned design defects.

• Software development tool providers – Create commercial tools that assist developers and architects in evaluating the impact and solving architecture level design defects.

Stakeholders' criteria

• Developers – High relevance of the refactoring suggestions. Limited false positives.

• Researchers – Connection to other aspects of maintainability, e.g. bug prediction and the quality of the architecture-as-implemented.

• Software development tool providers – Applicability to a large number of languages. Reliability of the suggestions.

4.3 Solution design

The prototype tool will support the developer in the following ways:

• It will enhance the list of files to refactor provided by Better Code Hub with a list of hotspots that the file is a part of. We believe that providing a structure to the problematic file will make it easier for the developer to decide on what needs to be done to improve the system.

• For each hotspot, the tool will visualize all the involved classes and highlight the dependencies which are causing the problem. We expect the visualisation to support the developer in coming up with an improved design.

Resolving the design problems highlighted by the hotspots should improve the overall maintainability score. However, we anticipate that for a single hotspot, or a few hotspots, the metric obtained from the Software Analysis Toolkit may not show a large difference.

The final goal within the SIG context is to make the component independence violations more actionable. We aim to achieve that by providing structural information about the violations: classifying them and providing information on why they have been classified as such.


Figure 5: A high level overview of the solution

Figure 5 illustrates the steps involved in the detection, from cloning the source code repository to providing the refactoring recommendations. Note that the visualisations of the hotspots embedded in the refactoring recommendations are not constructed as a part of the detector.

4.4 Validation

We will use the proof-of-concept tool on a selected subset of open source repositories available on GitHub. The results obtained from the analysis will then be discussed with domain experts, to establish whether the identified hotspots are relevant and provide enough detail to act upon. In addition to that, we will use a historical comparison: we will compare snapshots of the same projects in different time periods.

We expect that the experts will assess the suggestions as both useful and easy to act upon, compared to what is available now. In addition, we expect that the hotspot density will change in the same way as the component independence measured by the Software Analysis Toolkit: a higher amount of interface code within a component corresponds to less independence and therefore to a higher number of hotspots within that component.

5 Experiments

In this section of the report we discuss how and which data we gather, how we process the data and use it to obtain answers to the research questions. We also discuss the different validation strategies that we apply to subsequent steps. We divided the validation into three main sections: approach validation, detector validation and prototype validation. The approach and detector validations are presented in the following section. The validation of the prototype is discussed in more detail in the Evaluation section (6).

5.1 Data sources

As the data sources for the validation we selected a number of GitHub repositories that are both available as open source projects and contain the majority of their code in languages supported by the Better Code Hub tool.

5.1.1 Selection criteria

The projects we targeted needed to fulfill certain criteria to make them viable for investigation.

1. Large enough volume – Projects that are too small in terms of source lines of code would not be suitable for detecting architecture smells. On the other hand, there is a limitation of the code analysis tooling (BCH) of 200k SLOC. This means that our preferred projects would need to be between 50k SLOC and 200k SLOC in volume.

2. Long enough history – We want to be able to compare the evolution of the architecture-as-implemented as measured by the component independence and hotspot density metrics. Projects with a very short history may not have undergone enough development to show any relevant patterns. The project will need to have lasted for at least 3 years.

3. Language supporting inheritance – The hotspot of our choice relies heavily on the capability of language to support the inheritance mechanism.

4. Strongly and statically typed language – The detection of hotspots relies heavily on the ability to identify class hierarchies and the dependency relationships between the classes statically. However, we will be interested in projects using dynamic languages for the purpose of identifying the limitations of the hotspot approach.

5.1.2 Selected repositories

The following list contains the systems selected for analysis. The volumes had been measured with the Understand tool and represent lines of code excluding empty lines and comment lines.

1. Jenkins – an open source automation server built with Java and developed since 2008. Volume: 100k LOC.

2. Bitcoin – an implementation of a peer-to-peer network to manage digital currency transactions. Built with C/C++ and developed since at least 2013. Volume: 120k LOC.

3. jMonkeyEngine – a 3d game development suite developed in Java and maintained since 2008. Volume: 240k LOC.

4. Apache PDFbox – an open source Java tool for working with PDF documents developed and maintained since at least 2011. Volume: 150k LOC.

5. JustDecompileEngine – an open source .NET decompilation engine developed since 2011. Volume: 115k LOC.

6. nunit – a unit testing framework for the .NET environment developed since 2009. Volume: 59k LOC.

7. openHistorian – a back office system designed to efficiently integrate and archive process control data, developed since 2016. Volume: 72k LOC.

(23)

8. OpenRA – a real-time strategy game engine developed since 2013. Volume: 110k LOC.

9. Pinta – a clone of the Paint.NET raster image editing tool, developed since 2010. Volume: 54k LOC.

10. ShareX – an open source program that lets you capture or record any area of your screen, developed since 2013. Volume: 95k LOC.

11. react – a declarative JavaScript library for building user interfaces developed since 2013. Volume: 90k LOC

12. angularjs – a development platform for building mobile and desktop web applications using TypeScript/JavaScript and other languages, developed since 2014. Volume: 221k LOC.

5.2 Hotspot distribution

In order to address research questions RQ 1.1 and RQ 1.2 (subsection 1.2) we investigated the distribution and impact of hotspots in a number of open source systems. We used the Understand code analyzer to generate source graphs, which we then fed into the Titan tool in order to obtain the hotspots. Afterwards we aggregated the hotspot quantities and types per analyzed system in the table below. Note that the file count indicates how many distinct files contain hotspots: one file can be part of multiple hotspots, therefore we count each file only once, even if it is part of multiple hotspots. Note that we only use the Understand and Titan tools in this experiment; for the detector and the prototype, we use our own implementation and the Software Analysis Toolkit source code analyzer.
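To make this counting rule concrete, the sketch below is our own illustration (class and file names are hypothetical; it is not part of Titan, Understand or Better Code Hub): collecting the affected files into a set ensures that a file shared by several hotspots contributes only once to the file count.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Illustration only: each hotspot is represented as the set of file paths it spans.
    class HotspotFileCounter {

        static int countDistinctAffectedFiles(List<Set<String>> hotspots) {
            Set<String> affected = new HashSet<>();
            for (Set<String> hotspot : hotspots) {
                affected.addAll(hotspot); // set semantics deduplicate files shared between hotspots
            }
            return affected.size();
        }

        public static void main(String[] args) {
            List<Set<String>> hotspots = List.of(
                    Set.of("A.java", "B.java"),
                    Set.of("B.java", "C.java")); // B.java is part of both hotspots
            System.out.println(countDistinctAffectedFiles(hotspots)); // prints 3, not 4
        }
    }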

Table 1: Hotspot overview in selected systems

System                Language     Unhealthy Inheritance   Cross-module cycle   Package cycle
                                   instances (files)       instances (files)    instances (files)
Bitcoin               C++          16 (75)                 31 (108)             49 (117)
Jenkins               Java         80 (170)                10 (403)             513 (372)
jMonkeyEngine         Java         69 (436)                59 (402)             335 (410)
JustDecompileEngine   C#           79 (290)                8 (205)              92 (89)
nunit                 C#           24 (94)                 6 (62)               62 (74)
openHistorian         C#           12 (37)                 31 (89)              63 (114)
OpenRA                C#           19 (150)                35 (273)             202 (206)
pdfbox                Java         64 (252)                23 (379)             261 (301)
Pinta                 C#           17 (57)                 12 (112)             109 (91)
ShareX                C#           11 (76)                 38 (205)             189 (248)
react                 JavaScript   0 (0)                   0 (0)                0 (0)
angularjs             JavaScript   0 (0)                   0 (0)                0 (0)

In order to be able to reason about the impact of the hotspots on the overall maintainability of the projects, we compare the number of files affected by hotspots to the total number of files containing code in the analyzed projects. Kazman et al. [31] show that files containing hotspots are more bug prone and exhibit maintenance problems; therefore we believe that the higher the percentage of files affected by hotspots, the less maintainable the codebase is. The percentage of files affected by hotspots is then juxtaposed with the component independence metric (CI) as measured by Better Code Hub (BCH). We would expect a negative correlation between the percentage of files affected by hotspots and the component independence metric.


Table 2: Hotspot impact on selected systems

System                Files analyzed   Files affected   % Files affected   CI measured
                                       by hotspots      by hotspots        by BCH
Bitcoin               675              117              17.33%             0.9894
Jenkins               1112             403              36.24%             0.9868
jMonkeyEngine         2077             436              20.99%             0.6812
JustDecompileEngine   814              290              35.62%             0.8311
nunit                 781              94               12.03%             0.6329
openHistorian         726              114              15.70%             0.9572
OpenRA                1157             273              23.60%             0.8362
pdfbox                1279             379              29.63%             0.6283
Pinta                 400              112              28.00%             0.7421
ShareX                677              248              36.63%             0.6842
react                 687              0                0.00%              1.0000
angularjs             1173             0                0.00%              0.9574

5.2.1 Discussion

We observe two main findings from our investigation. First, no hotspots were identified in the react and angularjs projects. We suspect that this may be due to the nature of the type system implementation in JavaScript: a dynamic type system with prototype-based inheritance, as opposed to class-based inheritance, will not exhibit the Unhealthy Inheritance hotspot.

Second, we expected the percentage of files affected by hotspots to be negatively correlated with component independence as measured by Better Code Hub. The correlation between the percentage of files affected and the Component Independence metric as measured by Better Code Hub is -0.3796, which may indicate a weak correlation. This could mean that the impact of the hotspots on the maintainability of the system is measurable using static analysis services such as Better Code Hub. However, if we exclude the react and angularjs data points, as we suspect that hotspots may be present there but are not detected due to the type system of the language (JavaScript), then the correlation changes to -0.0162, which indicates almost no correlation. Taking that into account, based on the above analysis we conclude that the impact of hotspots as a whole on the codebase may not be measurable using the Component Independence metric from Better Code Hub. In the next section we therefore investigate whether the files provided as refactoring candidates by Better Code Hub contain hotspots.

5.3 Approach validation

In order to verify that our approach – detecting hotspots in source code and providing refactoring suggestions – is relevant, we conducted a correlation analysis. The main question we wanted to answer was: is there a correlation between the number of hotspots a file is part of and the amount of undesirable interface code in that file? We calculated the correlation of two metrics: hotspot density, which represents the number of hotspots that a file is part of, and component interface code, which represents the number of lines in a file that comprise interface code. A greater number of hotspots and a higher number of interface code lines indicate code which is prone to errors and difficult to maintain, and should be minimized. We expected that a higher number of interface code lines would be positively correlated with hotspot density. If this were the case, then we could use hotspots to show developers the modularity problems in their code and expect that resolving the hotspots would also improve the measurement of Component Independence Interface LOC, which is a metric of code maintainability in the SIG Maintainability Model. We investigated three open source systems which are available on GitHub: Jenkins, jMonkeyEngine and Bitcoin.
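The correlation itself is a standard Pearson computation over per-file value pairs. The sketch below is our own illustration with hypothetical toy data; the extraction of the two metrics from the analysis output is not shown, and the helper is not part of Better Code Hub or the Software Analysis Toolkit.

    import java.util.List;

    // Minimal Pearson correlation over two equally long lists of per-file metric values.
    class MetricCorrelation {

        static double pearson(List<Double> x, List<Double> y) {
            int n = x.size();
            double meanX = x.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            double meanY = y.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            double cov = 0.0, varX = 0.0, varY = 0.0;
            for (int i = 0; i < n; i++) {
                double dx = x.get(i) - meanX;
                double dy = y.get(i) - meanY;
                cov += dx * dy;
                varX += dx * dx;
                varY += dy * dy;
            }
            return cov / Math.sqrt(varX * varY);
        }

        public static void main(String[] args) {
            List<Double> hotspotDensity = List.of(3.0, 1.0, 0.0, 5.0);      // hypothetical per-file values
            List<Double> interfaceLoc   = List.of(120.0, 40.0, 10.0, 200.0);
            System.out.println(pearson(hotspotDensity, interfaceLoc));       // close to 1.0 for this toy data
        }
    }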


Table 3: Correlation between Hotspot Density (HD) and Component Independence Interface LOC

System          Language   HD / CI LOC correlation
Jenkins         Java       0.795
Bitcoin         C++        0.678
jMonkeyEngine   Java       0.232

5.3.1 Discussion

For two of the projects, Jenkins and Bitcoin, there is a strong positive correlation. This indicates that suggesting refactoring recommendations based on hotspot density can address the underlying component independence problems highlighted by the interface code metric calculated by the Software Analysis Toolkit. For jMonkeyEngine, the correlation was not significant. We identify three main reasons why this correlation is different in the case of the jMonkeyEngine project.

Firstly, since it is a game engine, it contains a high number of utility classes with a rather broad interface which are at the same time very loosely coupled. This results in a high measurement of the Component Independence Interface metric, but a low hotspot density. Examples of such modules are Vector4f and Matrix4f, which implement common vector and matrix operations, and renderer classes (GLRenderer, PssmShadowRenderer), which encompass a variety of rendering operations and expose a broad interface. Secondly, the engine is very modularized, consisting of more than 20 components. This elevates the amount of measured interface code, but does not necessarily impact the hotspot density. Thirdly, the engine interfaces with other libraries, exposing their functionality through an interface but mostly consisting of glue code. For example, jMonkeyEngine provides physics simulation functionality by interfacing with the Bullet physics library. This also causes the amount of interface code to be higher, but the implementation is unlikely to contain much complexity, because its purpose is to adapt between the API of the game engine and the API of the library.

Looking at it from the opposite perspective, hotspot detection would be more appropriate for projects which contain mostly business logic components and little glue code.

5.4 Detector

In order to obtain the hotspots for the distribution analysis (section 5.2) we used the Titan toolkit. However, the research environment in which we conduct the experiment imposes limitations on our inputs, therefore we decided to apply the state-of-the-art hotspot approach described in [31] to Better Code Hub by creating a detector within the Better Code Hub system. We use the source code graph created by the Software Analysis Toolkit as opposed to the Understand source code analysis tool used by Kazman et al. The main reason for this is that all the other metrics are obtained from the graph computed by the Software Analysis Toolkit. We needed to make sure that our detector yields comparable results, to ensure that the correlation between hotspot density and the interface lines of code still holds for the hotspots detected by our implementation of the detector. We expected our detector to generate a subset of the hotspots detected by Titan, because the source code analysis tool used by Titan generates graphs with more detail, but for fewer languages than the Software Analysis Toolkit available at the Software Improvement Group.
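One simple way to check this subset expectation is to compare the sets of files flagged by both tools. The sketch below is our own illustration with placeholder names; it is not an interface offered by Titan, the Software Analysis Toolkit or Better Code Hub, and the extraction of the file sets from the tool outputs is not shown.

    import java.util.Set;

    // Hypothetical check: the files flagged by our detector should all be contained
    // in the set of files flagged by Titan.
    class DetectorComparison {

        static boolean isSubsetOfTitan(Set<String> detectorFiles, Set<String> titanFiles) {
            return titanFiles.containsAll(detectorFiles);
        }
    }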


5.4.1 Prototype overview

Figure 6: The overview of the detector environment and the control flow.

In Figure 6 we present the environment of the hotspot detector and the visualisation component that make up the complete prototype, indicated by green rectangles. The hotspot visualisation is part of the Edge frontend component and only consumes the hotspot data produced by the detector. The detector itself is part of the GuidelineChecker component. The other main actors in the control flow are the Scheduler, the Jenkins automation server and the Software Analysis Toolkit. The Scheduler is an orchestrating component which schedules a Jenkins task and notifies the Edge component that the analysis is finished. The Jenkins component clones the source code repository and invokes the Software Analysis Toolkit (SAT), which outputs a source code graph generated from the cloned repository; lastly, the guideline checker performs a check of all 10 guidelines as described in section 2.2. The hotspot detector is invoked as part of the Couple Architecture Components Loosely guideline check and follows the logic illustrated in Figure 5. Once the guideline check is done, the analysis result is stored in the MongoDB database, from where it can be retrieved by the Edge frontend component and visualised by the visualisation part of the prototype.
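As a rough indication of this control flow only, the outline below uses placeholder types and method names; the actual Better Code Hub components have different interfaces, so this is a sketch of the ordering of the steps rather than of the real code.

    import java.util.List;

    // Placeholder outline: clone -> SAT graph -> guideline checks (incl. hotspot
    // detection) -> store result for the Edge frontend. All names are hypothetical.
    class AnalysisPipelineOutline {
        interface SourceGraph {}
        interface Hotspot {}
        interface AnalysisResult {}

        interface Sat { SourceGraph buildGraph(String clonedRepoPath); }
        interface HotspotDetector { List<Hotspot> detect(SourceGraph graph); }
        interface GuidelineChecker { AnalysisResult checkGuidelines(SourceGraph graph, List<Hotspot> hotspots); }
        interface ResultStore { void save(AnalysisResult result); }

        void run(String clonedRepoPath, Sat sat, HotspotDetector detector,
                 GuidelineChecker checker, ResultStore store) {
            SourceGraph graph = sat.buildGraph(clonedRepoPath);   // Software Analysis Toolkit step
            List<Hotspot> hotspots = detector.detect(graph);      // part of the CACL guideline check
            AnalysisResult result = checker.checkGuidelines(graph, hotspots);
            store.save(result);                                   // persisted, later read by the Edge frontend
        }
    }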

5.4.2 Implementation

The general control flow of the detector is as follows: first, the class hierarchies are extracted from the source code graph (Listing 1) and represented as sets of vertices. Next, each class hierarchy is checked for the presence of both internal and external hotspots (Listings 2 and 3). While detecting internal hotspots we investigate only the classes and edges that belong to the hierarchy; the corresponding subgraph is constructed in Listing 4. For detecting external hotspots we also need the neighbourhood of the class hierarchy, specifically its clients. The clients are classes which in any way depend on any of the classes in the hierarchy. The pseudocode for constructing the neighbourhood is available in Listing 5.

Listing 1: Class hierarchy extraction

    extractClassHierarchies(dependencyGraph: Pseudograph): List<Set> {
        inheritanceEdges := dependencyGraph.edges().filter(e -> e.getEdgeType() == Inheritance)
        baseClasses := inheritanceEdges.map(e -> e.getTo())
        derivedClasses := inheritanceEdges.map(e -> e.getFrom())
        inheritanceGraph : Multigraph
        inheritanceGraph.addVertices(baseClasses)
        inheritanceGraph.addVertices(derivedClasses)
        inheritanceGraph.addEdges(inheritanceEdges)
        return inheritanceGraph.connectedSets()
    }

Listing 2: Internal hotspot detection

    detectInternal(dependencyGraph: Pseudograph, hierarchyClasses: Set): Hotspot {
        internalHierarchyGraph := constructSubgraph(dependencyGraph, hierarchyClasses)
        violatingEdges : Set
        foreach (class in hierarchyClasses) {
            edges := internalHierarchyGraph.edgesOf(class)
            // child classes are those that inherit from the current class
            childClasses := edges.filter(e -> e.getTo() == class
                    && e.getRelationshipType() == Relationship.Inheritance)
                .map(e -> e.getFrom())
            // a class calling one of its own children violates the hierarchy
            callsFromCurrentClassToChildren := edges.filter(e -> e.getEdgeType() == Relationship.Call
                    && e.getFrom() == class
                    && childClasses.contains(e.getTo()))
            violatingEdges.addAll(callsFromCurrentClassToChildren)
        }
        if (!violatingEdges.isEmpty()) {
            return new Hotspot(internalHierarchyGraph, violatingEdges, hierarchyClasses)
        }
    }

Listing 3: External hotspot detection

    detectExternal(dependencyGraph: Pseudograph, hierarchyClasses: Set): Hotspot {
        hierarchyGraphWithClients := constructNeighbourhood(dependencyGraph, hierarchyClasses)
        clients := hierarchyGraphWithClients.vertexSet()
        clients.removeAll(hierarchyClasses)
        violatingEdges : Set
        foreach (client in clients) {
            clientToHierarchyEdges := hierarchyGraphWithClients.edgesOf(client)
            clientToConnections := clientToHierarchyEdges.map(e -> e.getTo())
            if (clientToConnections.equals(hierarchyClasses)) {
                violatingEdges.addAll(clientToHierarchyEdges)
            }
        }
        if (!violatingEdges.isEmpty()) {
            return Hotspot(hierarchyGraphWithClients, violatingEdges, hierarchyClasses)
        }
    }

Listing 4: Class hierarchy subgraph construction

    constructSubgraph(dependencyGraph: Pseudograph,
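For illustration, constructing the subgraph induced by the hierarchy classes is a standard graph operation. The minimal sketch below assumes a JGraphT-style graph representation (the Pseudograph and Multigraph names in the pseudocode match JGraphT types); it is our own sketch, not the detector's implementation in Better Code Hub or the Software Analysis Toolkit.

    import java.util.Set;
    import org.jgrapht.Graph;
    import org.jgrapht.graph.AsSubgraph;
    import org.jgrapht.graph.DefaultEdge;
    import org.jgrapht.graph.Pseudograph;

    // Sketch: the induced subgraph keeps exactly the vertices in hierarchyClasses
    // and all edges of the dependency graph that run between them.
    class SubgraphConstructionSketch {

        static Graph<String, DefaultEdge> constructSubgraph(
                Pseudograph<String, DefaultEdge> dependencyGraph, Set<String> hierarchyClasses) {
            return new AsSubgraph<>(dependencyGraph, hierarchyClasses);
        }
    }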
