
Software Agility

Ferron Saan

September 23, 2017

Supervisors: Hans Dekkers (UvA), Patrick Das (ING) and Paolo Brunasti (ING)
Host organisation: ING Customer Experience Center

Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering


Contents

1 Introduction
2 Related Work
  2.1 Software Agility
  2.2 Code Quality
  2.3 Agility Metrics
  2.4 Candidate Metrics
    2.4.1 Change Coupling
    2.4.2 Code familiarity
    2.4.3 Code Understandability
    2.4.4 Metrics Selected
3 Materials and Methods
  3.1 Case overview
  3.2 Version Control Repository
  3.3 Ticket system
  3.4 Apprenticing protocol
  3.5 Metrics
4 Apprenticing with developers
  4.1 SRS-213
  4.2 SRS-275
  4.3 SRS-306
  4.4 SRS-667
  4.5 SRS-694
  4.6 SRS-768
  4.7 SRS-782
  4.8 SRS-984
  4.9 SRS-993
5 Results
  5.1 Summary
  5.2 How did the selected metrics perform?
    5.2.1 Code familiarity
    5.2.2 Code understandability
    5.2.3 Change coupling
    5.2.4 Raw metrics
    5.2.5 Coupling
    5.2.6 Number of commits
    5.2.7 Number of tickets
    5.2.8 Number of bugs
  5.3 What other metrics would be needed?
  5.4 Agility equation
  5.5 Threats to validity
6 Conclusion
References
A Appendix
  A.1 Logbook Developer 1
  A.2 Logbook Developer 2
  A.3 Logbook Developer 3


Abstract

Software startups face a unique business context in which producing visible results is more important than the long-term perspective. As a result, it is essential that the software can be adjusted easily. This is possible when (a) the developers are familiar with the code, or (b) the code is easy to understand. Understandability is extensively researched, but due to the small team size and high code churn of startups, the usual understandability metrics do not apply.

The goal of this research is to assess whether the agility of startups can be measured. To do so, we first selected 15 metrics based on a literature review. We calculated these metrics for the current code base of a project, and for each month of the project's history. We then compiled a list of 9 problems by apprenticing with developers, to see whether the characteristics of these problems are covered by the selected metrics. Finally, we looked for issues that were revealed during apprenticing but that were not covered by our metrics.

Recurring themes in these 9 problems are the required domain knowledge, engineering ability, the complexity of the existing code and the inherent complexity of the task. We were able to identify 3 of the 9 problems. In these cases, we found that the metrics for code familiarity and code understandability were able to indicate a problem for agility. In 6 of the 9 problems, our metrics were not able to identify the problem. To do so, metrics would need to be included that cover changes to data structures, required domain knowledge and a task's inherent complexity.

Based on the recurring themes found by apprenticing with developers and the additional required metrics, we propose software agility as a heuristic formula of a team's competence and complexity. Competence consists of knowledge of the domain and engineering ability; complexity consists of the complexity of the existing code and the complexity of the change.


1 Introduction

Software development agility is important because change is the constant [36]. Changeability is an important aspect, especially in environments where software changes are frequently required. One such environment is software startups. Software startups are newly created companies with no operating history that are fast at producing (cutting-edge) technologies. These companies develop software under highly uncertain conditions, tackling fast-growing markets in small teams. Software startups therefore present a unique combination of characteristics which pose several challenges to software development activities [43].

Software with high technical quality can evolve with low cost and risk to keep meeting functional and non-functional requirements [57]. However, high technical quality comes at a price, a price most startups cannot pay because of the business context they are in. This business context more or less implies a time-pressured, code-oriented development process [61]. As a result, startups focus, at least in the very beginning, on producing visible results in the short term instead of taking a long-term perspective. Because of this short-term focus, the goal is mainly to produce working code instead of quality code. Technical debt is a metaphor introduced by [18] for the result of fulfilling an immediate need with a compromise on the quality of the code. The more technical debt is incurred, the harder it gets to quickly adapt to customer needs, which is essential in this unique business context.

In order to enable a project to quickly adapt to customer needs, developers need to be (a) familiar with the code, or (b) the code must be easy to understand. Code familiarity is reflected by the fifth law of Lehman [36], Conservation of Familiarity, which states that if the code base grows too fast, developers risk losing their understanding of it. Simply put, you can't safely change what you don't understand.

Software quality is extensively researched and there is a lot of literature with recommendations and best practices for writing ‘better’ code [8] [37] [30] [34] [39] [54, Chapter 6] [38, Chapter 2]. Maintainability is one of the characteristics of software quality. According to the ISO 25010 guidelines, maintainability means that code can be efficiently and effectively maintained by the team intended to maintain it. Understandability of the software is vital from a maintenance perspective.

However, due to the small team size of startups and their high code churn, the usual maintainability metrics do not apply [50]. The goal of this research is to assess whether the agility of startups can be measured. We will first identify which metrics from the literature can be used to detect issues that impede agility. Then, for one particular case, we will compile a list of problems by apprenticing with developers to learn about the problems they encountered during development. For each of these problems, we will analyze what makes it hard and check whether the metrics correctly identify the issues pointed out by the developers, or suggest new metrics where they do not. Next we will compute the metrics for the case and check whether the outliers indicate real problems. We will then reflect on the findings and see if we have enough insight to propose likely indicators that can be validated in future research.


2 Related Work

2.1 Software Agility

Agility is defined by [31] as "the ability to create and respond to change, and the ultimate expression of agility from a software perspective is continuous delivery and deployment". In [2] several definitions of agility are listed; the common components are the high dynamics of the development process and the ability to quickly react to changes. An important aspect of software agility is therefore changeability. To be agile, developers need to be able to respond to change. This requires a good understanding of the code that has to be changed, which can come either from prior knowledge [58] or from the code being easy to understand [35]. The amount of effort required to make changes is lower when developers have prior knowledge of the code that has to be changed [58]. So code understandability and code familiarity are both important aspects of software agility.

2.2 Code Quality

Code understandability refers to how readable the code is. This has a large impact on how quickly developers can begin working with an existing code base they are not yet familiar with. Code understandability is an essential aspect of code quality because code is read far more often than it is written. This is supported by different quality models such as ISO 25010 [1], Boehm [12], Dromey [22] and the SIG model [30], in which understandability and changeability play an important role.

Arguably the most important aspect of code understandability is intention-revealing names. Intention-revealing names solve the problem of the "implicitness" of code (the code implicitly requires that you know certain details or characteristics about it) [38, Chapter 2]. Lines of code and cyclomatic complexity are other widely used metrics for code understandability. Larger methods cause lower analyzability and are more difficult to understand [30] [9, Chapter 3] [37]. Other code smells that have a negative impact on the understandability of the code are long parameter lists [9, Chapter 3], long logic statements [39] [38], deep nesting [38], crosscutting concerns [56], change coupling [9] [33], spatial complexity [16], comments [9, Chapter 3] [8] [33], duplicated code [9, Chapter 3] and feature envy [33]. In the literature there is a lot of critique on some of the metrics mentioned above, such as the critique on McCabe's cyclomatic complexity by [51] and [60]; both studies found no evidence that files with high complexity scores can be used to find bugs. However, cyclomatic complexity has the same intention as metrics that measure the readability of natural-language texts, such as Flesch-Kincaid [17]. These metrics try to quantify readability and are backed up by some evidence from psychology. Obviously, reading code is different from reading natural language, but there are similarities. And although complexity is sometimes needed, more often than not it probably isn't. Furthermore, it must be noted that software metrics cannot determine quality, only a lack thereof.

Code changeability characterizes the amount of effort required to change (a part of) the system [1]. It refers to the effort required to change the code, for example to address changed customer requirements. The size of the code [30] [9, Chapter 3], the programming languages used, the amount of cloned code [9, Chapter 3], and characteristics like coupling and cohesion are examples of things that affect the changeability of a software system [9] [33]. These characteristics require knowledge about the system; this knowledge can consist of prior experience or can be gained because the code is easy to understand.

2.3 Agility Metrics

Over the last decade various tools have been developed to measure agility in software development teams. The Agility Metric used by [32] is independent of specific software development process models. It takes the financial indicator 'turnover' and the hardware-manufacturing indicator 'cycle time' [41] and applies them to software projects. Software development agility is then defined as 'the ability to minimize the period of the unvalidated inventory state' in the software validation model [11], because according to the authors only validation can confirm the quality of a product. Other agility metrics are the Sidky Agile Measurement Index (SAMI) [52], the Comparative Agility tool [59] and the Agility Measurement Index [21]. These agility metrics, however, are assessed by an agile coach or measured with questionnaires, not by looking at the code itself.


2.4 Candidate Metrics

2.4.1 Change Coupling

Change coupling is considered a bad symptom in software design [8] [27] [28] [4] [19]. Change couplings are implicit, evolutionary dependencies between parts of a system which change together and are therefore linked to each other, even though they are not structurally related. As a result, developers who modify code need to modify the implicitly related code as well. This introduces potential issues during development: not only is it easy to forget to make one of the changes, it also has a negative impact on the readability of the code [9] [33]. A developer who knows where this change coupling exists has knowledge of the code, and as long as this knowledge is available change coupling is not an issue. So change coupling requires developers to have knowledge about the system. Change coupling is found by analyzing the version history. However, since startups don't have an extensive version history, one can argue that the required knowledge is still memorized. Change coupling is therefore only a problem when this knowledge is not available, for example when files with change coupling have a low rate of change, when a developer leaves the team, or when new developers join the team. A minimal mining sketch follows below.
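Since this thesis mines change coupling from the Git history (see section 2.4.4), a minimal sketch of how such mining could look is given below. It uses GitPython, which is also used later in this thesis; the repository path, branch name and the exact coupling ratio are illustrative assumptions, not the thesis's implementation.

    # A minimal sketch, assuming a local checkout at "path/to/project"
    # with a "master" branch.
    from collections import Counter
    from itertools import combinations

    from git import Repo

    repo = Repo("path/to/project")
    changes = Counter()      # number of commits touching each file
    co_changes = Counter()   # number of commits touching each file pair

    for commit in repo.iter_commits("master"):
        files = sorted(commit.stats.files)  # files touched in this commit
        changes.update(files)
        co_changes.update(combinations(files, 2))

    def coupling_ratio(a, b):
        # Fraction of the rarer file's commits that also touch the other
        # file; thresholds of 0.25 / 0.50 / 0.75 would mirror the 25/50/75
        # columns used in section 2.4.4.
        pair = tuple(sorted((a, b)))
        return co_changes[pair] / min(changes[a], changes[b])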

2.4.2 Code familiarity

If you want the work done right, it should be done by the developer who has the most knowledge about it, not by somebody who is unfamiliar with that specific part of the code. Several studies found that the more people touch the same code, the greater the chance of misunderstandings and mistakes [40] [46] [10]. Research in domains such as manufacturing supports this idea: when someone performs a task repeatedly, the time taken to complete subsequent tasks goes down and quality goes up [20]. Software development of course differs from such domains, but the main idea remains the same and is supported by several studies such as [49] [47] [5] [26]. By reusing knowledge and expertise, quality can be maintained because developers don't have to acquire new knowledge [6]. In short, having one clear "owner" for a part of the code can get more work done, faster, better and by fewer people. On the other hand, an individual owner has a higher tolerance for complexity, simply because they wrote the code and know how it works, so they don't need to simplify it just to make changes. This can build up over time, resulting in technical debt only one developer is familiar with. That can become a serious problem for startups, who might not have the luxury of depending on one developer due to time pressure. However, the aforementioned studies do not mention small teams, so does this apply to startups as well? Research into the 'ramp-up' time for new developers suggests it does [44] [45]: it is better to keep teams together for a long time so that developers do not have to acquire new knowledge. A sketch of how ownership can be made concrete from the version history follows below.
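To make "ownership" concrete, the thesis later uses git blame (section 2.4.4). The following is a hedged sketch of that idea using GitPython's blame API; the repository path, revision and author labels are illustrative.

    # A minimal sketch: per-author ownership of a file as the percentage
    # of lines last edited by that author, via GitPython's blame.
    from collections import Counter

    from git import Repo

    def ownership(repo_path, file_path):
        repo = Repo(repo_path)
        counts = Counter()
        for commit, lines in repo.blame("HEAD", file_path):
            counts[commit.author.name] += len(lines)
        total = sum(counts.values())
        return {author: 100.0 * n / total for author, n in counts.items()}

    # e.g. {'Developer 1': 80.42, 'Developer 2': 19.58} (hypothetical output)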

One way to reduce the impact of missing knowledge is stratified code. Stratification means keeping the levels of decomposition stratified so that you can view the system at any single level and get a consistent view [39, Chapter 5]. When the code is stratified, it becomes less important to be familiar with it, because the way it is written is consistent with other parts of the code. Another way to get familiar with new code is through documentation. Even when the code is designed so that changes can be carried out efficiently, the design principles and design decisions are often not recorded [42]. On the flip side, documentation that is out of date is even worse than no documentation at all.

2.4.3 Code Understandability

When the required knowledge is not available, it is important that the code is easy to understand so that changes can be made relatively easily. Code understandability refers to how readable the code is. This has a large impact on how quickly new developers can begin working with an existing code base. Arguably the most important aspect of code understandability is intention-revealing names; these names act as 'beacons' which help with top-down program comprehension [53]. Intention-revealing names solve the problem of the "implicitness" of code (the code implicitly requires that you know certain details or characteristics about it) [38, Chapter 2]. Lines of code and cyclomatic complexity are other widely used metrics for code understandability. Larger methods cause lower analyzability and are more difficult to understand [30] [9, Chapter 3]. Other code smells that have a negative impact on the readability of the code are long parameter lists [9, Chapter 3], long logic statements [39] [38], deep nesting [38], crosscutting concerns [56], change coupling [9] [33] and spatial complexity [16].

As mentioned before, if you are familiar with the code you have a higher tolerance for complexity, and thus code understandability is less important. The trade-off to be made is that of speed versus understandability; this trade-off can be made for files with a low rate of change.

Code understandability becomes less of an issue if the 'bad' code is hidden behind a layer so that you only need to interact with that layer [9]. This way, (new) developers don't have to understand the bad code, mitigating the issues related to understandability, at least until bug fixes need to be made inside it.

2.4.4 Metrics Selected

Below are the candidate metrics selected to identify agility problems and the way they are collected (at file level); a sketch of how they might be computed follows the list:

• Raw metrics are collected with Radon¹; these include lines of code (loc), logical lines of code (lloc), source lines of code (sloc), the number of comment lines (comments) and the number of blank lines (blank).

• Cyclomatic complexity is computed using Radon; it returns a cyclomatic complexity score (cc) based on an analysis of the AST of a Python program (more details can be found in the Radon documentation²).

• Coupling, the number of calls to other files.

• Change coupling is determined by analyzing the Git history; the score is the sum of coupling as used in [19]. It is divided into multiple columns (25, 50 and 75) to show change coupling of at least 25%, 50% and 75%.

• Readability is based on a code analysis with Pylint³ and includes intention-revealing names, redefined builtin names, too long lines, bad continuation, superfluous parentheses, too many arguments, unused variables, unused arguments, broad excepts, too many branches, too many statements, too many lines, too many locals, too many nested blocks, long logic statements and todos.

• Naming is also based on a code analysis from Pylint and includes invalid names, redefined outer names and redefined builtin names.

• Documentation is based on a code analysis from Pylint as well, and includes missing doc strings, no name in modules and empty docstrings.

• Code familiarity is determined using the "git blame" command, which indicates which author last edited each line, as also used in [29]. Additionally, this is divided into multiple columns (total, 25, 50, 75) to show the total number of owners and the number of owners with at least 25%, 50% and 75% ownership.

• Number of commits, based on the Git history.

• Number of tickets, based on the key of a ticket being mentioned in a commit message.

• Number of bugs, the subset of the Jira tickets flagged as a bug.

¹ https://pypi.python.org/pypi/radon
² http://radon.readthedocs.io/en/latest/intro.html#cyclomatic-complexity
³ https://www.pylint.org/
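As announced above, here is a hedged sketch of how the Radon- and Pylint-based metrics might be collected per file. Radon's Python API and Pylint's JSON output are real interfaces, but the exact message buckets shown are assumptions that only approximate the thesis's groupings (the readability bucket is omitted for brevity).

    # A minimal per-file collector, assuming Radon and Pylint are installed.
    import json
    import subprocess

    from radon.raw import analyze          # loc, lloc, sloc, comments, blank
    from radon.complexity import cc_visit  # cyclomatic complexity per block

    def file_metrics(path):
        with open(path) as fh:
            source = fh.read()
        raw = analyze(source)
        blocks = cc_visit(source)
        cc = sum(b.complexity for b in blocks)

        # Pylint messages, grouped roughly as above; the symbols listed
        # are examples of each category, not the thesis's exact lists.
        result = subprocess.run(
            ["pylint", "--output-format=json", path],
            capture_output=True, text=True,
        )
        messages = json.loads(result.stdout or "[]")
        naming = sum(m["symbol"] in ("invalid-name", "redefined-outer-name",
                                     "redefined-builtin") for m in messages)
        documentation = sum("docstring" in m["symbol"] for m in messages)

        return {"loc": raw.loc, "lloc": raw.lloc, "sloc": raw.sloc,
                "comments": raw.comments, "blank": raw.blank,
                "cc": cc, "avg_cc": cc / max(len(blocks), 1),
                "naming": naming, "documentation": documentation}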


3 Materials and Methods

3.1 Case overview

The idea behind the application is to create a global network solution to share innovation projects across the company. The application enables innovation managers to track their innovation portfolio, allows innovation coaches to track and take action on their projects, allows project members to see and manage the details of their projects and allows all employees to browse the innovation projects and their FinTech partnerships. Development of this project started in November 2015. The application is built in Flask⁴ for Python. A brief overview of the project history can be seen in figures 1a, 1b and 2. Currently, the application has 414 active users across various countries and 1549 projects.

Figure 1: Project size overview: (a) number of files over time, (b) lines of code over time

Figure 2: Number of commits per author over time


3.2 Version Control Repository

The version control repository is hosted on GitHub⁵. GitPython⁶ is used to interact with the Git repository. The number of commits at the time of writing is 3658. The entire project history is available in this repository.
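A minimal sketch of this interaction (the repository path is illustrative), reproducing the overall commit count and the per-author counts behind figure 2:

    from collections import Counter

    from git import Repo

    repo = Repo("path/to/project")
    commits = list(repo.iter_commits("master"))
    print(len(commits))                                   # 3658 at the time of writing
    per_author = Counter(c.author.name for c in commits)  # basis for figure 2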

3.3 Ticket system

Jira⁷ is used to plan and track our sprints. The built-in export function is used to extract the tickets to a csv file. The total number of tickets at the time of writing is 1000, of which 504 could be related to a specific commit; 200 of those 504 are labeled as bug. We use various states for each ticket in Jira, namely To Do, In Progress, Testing, Ready for Production and Done (which means in production). And although the state of a ticket might not always be correct throughout a sprint, after each deploy everything that is deployed is marked appropriately as 'done'. Furthermore, on February 9th 2017 we switched from Jira to Jira and Confluence⁸, which meant that we lost the activity history of each ticket from before that time. We did have an export from that time, meaning we could access static data such as create date, resolve date and so on; data such as state changes, however, is unavailable. Lastly, ticket history data is not available for export in Jira (without a plugin). A sketch of how tickets are related to commits follows below.
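Relating tickets to commits works by scanning commit messages for ticket keys. A hedged sketch follows; the key pattern and repository path are assumptions consistent with the SRS-<n> keys used in this thesis.

    import re
    from collections import defaultdict

    from git import Repo

    TICKET_KEY = re.compile(r"\bSRS-\d+\b")

    repo = Repo("path/to/project")
    commits_per_ticket = defaultdict(list)
    for commit in repo.iter_commits("master"):
        for key in TICKET_KEY.findall(commit.message):
            commits_per_ticket[key].append(commit.hexsha)

    # len(commits_per_ticket) gives the tickets traceable to at least one
    # commit (504 of the 1000 tickets in this case).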

3.4 Apprenticing protocol

In order to understand why certain tickets were perceived as difficult, we apprenticed with developers to learn about the problems they encountered. Using a think-aloud protocol, we went through the Git log with each developer and discussed which tickets were difficult and why. This process was done with four developers and took about two hours in total. The logs are available in Appendix A. The think-aloud protocol is a robust research method that has a sound theoretical basis and provides a valid source of data [23] [15].

3.5 Metrics

Each candidate metric is calculated for the entire project, for each month from 2015-07 until 2017-06, and for the day before the first commit and the day after the last commit related to each ticket that was perceived as difficult. The data is saved in a csv file and then analyzed using Pandas⁹. A sketch of this snapshot procedure follows below.
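The following is a sketch of the snapshot procedure under stated assumptions (branch name, paths, and a truncated metric set are illustrative); `file_metrics` is the collector sketched in section 2.4.4.

    import csv

    import pandas as pd
    from git import Repo

    repo = Repo("path/to/project")
    months = pd.date_range("2015-07-01", "2017-06-01", freq="MS")

    with open("metrics.csv", "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["month", "file", "loc", "cc"])  # truncated header
        for month in months:
            # last commit before each month boundary
            found = list(repo.iter_commits("master", until=month.isoformat(),
                                           max_count=1))
            if not found:
                continue
            repo.git.checkout(found[0].hexsha)
            for path in repo.git.ls_files().splitlines():
                if path.endswith(".py"):
                    m = file_metrics(path)  # from the earlier sketch
                    writer.writerow([month.date(), path, m["loc"], m["cc"]])
        repo.git.checkout("master")          # restore the working tree

    df = pd.read_csv("metrics.csv")          # analysis then proceeds in Pandas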

⁵ http://github.com/
⁶ https://gitpython.readthedocs.io/en/stable/
⁷ https://www.atlassian.com/software/jira
⁸ https://www.atlassian.com/software/confluence
⁹ http://pandas.pydata.org/


4 Apprenticing with developers

The following section contains the 9 tickets that were perceived as difficult by the developers. For each ticket an overview is included, as well as the results of the selected metrics for one file.

Remarks:

• Lines touched is the sum of lines added and removed, as returned by Git. An edited line therefore counts as two lines touched (one removed and one added); see the sketch after this list.

• Ownership with a '-' means the file did not exist yet.
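A small sketch of the "lines touched" bookkeeping referenced above (the repository path is illustrative); Git reports insertions and deletions per commit, so one edited line counts twice.

    from git import Repo

    repo = Repo("path/to/project")
    for commit in repo.iter_commits("master", max_count=5):
        stats = commit.stats.total  # dict with 'insertions' and 'deletions'
        touched = stats["insertions"] + stats["deletions"]
        print(commit.hexsha[:8], touched)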

4.1 SRS-213

Ticket SRS-213 was difficult because the developer did not have a clear definition of the problem, and during development the vision of the product owner changed.

Table 1: SRS-213 Overview

Ticket Key        SRS-213
Description       Measure of Success
Before date       2015-10-13
After date        2015-10-16
Commits           5
Commit size       [487, 399, 290, 88, 104]
Files             3
Ownership before  models.py 80.42, views.py 97.84, views_admin.py 100

Hypothesis based on our metrics: The project was three months old at this time, and therefore the ownership percentages are very high. Because of these high ownership percentages this ticket is expected to have a low impact on agility.

Calculating our initial set of metrics yields table 2.

Table 2: Results for ticket SRS-213, file views.py

Metric              Before          After           Difference
avg cc              2.36956521739   2.49090909091   0.121343873518
blank               907             1107            200
bugs                5               5               0
cc                  327             411             84
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            520             554             34
commits             148             153             5
coupling            650             758             108
documentation       137             164             27
lloc                1431            1827            396
loc                 1580            2018            438
max ownership       97.8369384359   98.3976099946   0.560671558628
naming              277             351             74
owners              4               4               0
owners25            1               1               0
owners50            1               1               0
owners75            1               1               0
readability         252             312             60
sloc                2097            2574            477
tickets             9               10              1


Observations: During development of this new feature (a new class, separate from existing code) a lot of lines were touched (487, 399, 290, 88 and 104 respectively, 1368 in total). But in total only 438 new lines were added to views.py, and 105 lines were added in the other two files. This means a lot of rework was done across the different commits (code churn), which is in line with the changing vision of the product owner. Even though the readability, naming and cyclomatic complexity scores were alarming, this was not perceived as a problem, which can be explained by the fact that the developer was the owner of the code.

A changed vision of the product owner is not necessarily reflected in the code, but rather in the tickets. Therefore, looking for changes in the Jira ticket, such as an updated description, more story points or new subtasks, is a way to identify a changed vision. Unfortunately, it is not possible to calculate these metrics due to the reasons described in section 3.3. After looking up the Jira ticket manually, however, it shows only the initial description without any other changes. This means the sense-making and discussions happened on the work floor rather than in writing on the Jira ticket.

Conclusion: The metrics we selected show alarming scores for code understandability (readability, naming, cyclomatic complexity, lines of code). However, because the developer was familiar with the code, he had a higher tolerance for the complexity; the code itself was not perceived as an issue. Therefore, our metrics would not be able to detect a problem for this specific developer working on this ticket. In order to do so, we should look at other metrics such as code churn.

What happened here is a perfect example of what Alexander describes in chapter 6 of his book Notes on the Synthesis of Form [3, Chapter 6]. In the process of trying to understand the problem, the developer's thoughts distort the problem and make it too unclear to solve. After some iterations both the developer and the product owner had a better idea of the problem and agreed on a solution. A crucial step at this point is to refactor the code; it is for a reason that refactoring is an essential step in eXtreme Programming [7]. In our case, however, the data structure was never refactored to match the solution. And good data structures make the code easy to design and maintain, which is emphasized by this quote from Linus Torvalds: 'Bad programmers worry about the code. Good programmers worry about data structures and their relationships' [55].


4.2 SRS-275

Ticket SRS-275 was a difficult ticket because the developer was unfamiliar with the existing code, and the logic regarding sprints and metrics was (too) complicated.

Table 3: SRS-275 Overview

Ticket Key        SRS-275
Description       Display Metrics per sprint
Before date       2016-02-22
After date        2016-02-25
Commits           2
Commit size       [445, 85]
Files             4
Ownership before  test.py 3.19, views.py 22.14, common_test_tools.py 0, test_sprint.py 44.25

Hypothesis based on our metrics: The developer had a low ownership percentage in all the files, and because the scores for code understandability are bad as well, this will be a problem for agility.

Calculating our initial set of metrics yields table 4.

Table 4: Results for ticket SRS-275, file views.py

Metric              Before          After           Difference
avg cc              3.08571428571   3.27096774194   0.185253456221
blank               1231            1154            -77
bugs                12              12              0
cc                  540             507             -33
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            222             465             243
commits             329             333             4
coupling            811             714             -97
documentation       174             154             -20
lloc                2457            2290            -167
loc                 2777            2621            -156
max ownership       73.511342155    71.6945557389   -1.81678641615
naming              370             339             -31
owners              6               6               0
owners25            1               1               0
owners50            1               1               0
owners75            0               0               0
readability         368             347             -21
sloc                3001            3089            88
tickets             46              46              0

Observations: One big commit (445 lines touched) and one commit with some fixes and a changed user requirement (85 lines touched). In the big commit, however, most of the lines were in test files. The big differences in the table above can be attributed to another commit in this period in which some endpoints were commented out due to changed requirements. Readability, naming and complexity scores were bad, and since the developer was not the owner, this contributed to the difficulty of the task.

This ticket is a continuation of ticket 4.1, and the developer 'hacked' together a solution to make it work with the existing code, making the code progressively worse. The 'hacked' solution is reflected by the fact that the readability and complexity scores only dropped a little even though over 200 lines were commented out. One of the issues during this ticket was that the data structure chosen earlier (see the previous ticket) withheld what could have been an easy solution. The quality of the code is driven by the data structure. A metric that can capture this would be a valuable addition, for example flagging situations where the complexity of the new code is significantly higher than that of the old code. In this case, the cyclomatic complexity of the touched function was 17 before and 24 afterwards, while the number of lines increased by only 1; a sketch of such a check follows below.
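As announced above, here is a hedged sketch of the suggested metric: flag functions whose cyclomatic complexity jumps while their length barely changes. The thresholds are illustrative assumptions, not values from the thesis.

    from radon.complexity import cc_visit

    def complexity_spikes(before_src, after_src, cc_jump=5, max_line_growth=5):
        # Compare per-function complexity between two versions of a file.
        before = {b.name: b for b in cc_visit(before_src)}
        spikes = []
        for block in cc_visit(after_src):
            old = before.get(block.name)
            if old is None:
                continue
            d_cc = block.complexity - old.complexity
            d_lines = (block.endline - block.lineno) - (old.endline - old.lineno)
            # SRS-275's +7 complexity for +1 line would be flagged here
            if d_cc >= cc_jump and d_lines <= max_line_growth:
                spikes.append((block.name, d_cc, d_lines))
        return spikes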

Conclusion: The selected metrics show low code familiarity and bad scores for code understandability, which indicates that it will be difficult for the developer to make the change. Our metrics were therefore able to correctly identify an issue regarding agility. The underlying issue of a data structure that is no longer appropriate for the situation is not revealed by the selected metrics.


4.3 SRS-306

Ticket SRS-306 was difficult because of its impact on the structure of the application; it had been thought of at a later stage of the development process. Furthermore, the inherent complexity of the required permissions contributed to the difficulty of this ticket as well.

Table 5: SRS-306 Overview

Ticket Key        SRS-306
Description       Built Modules should only be visible to users that have the permission
Before date       2016-04-18
After date        2016-04-30
Commits           6
Commit size       [1004, 444, 3124, 259, 2, 29]
Files             12
Ownership before  test.py 83.98, views_configy.py 95.94, models.py 87.75, bl_common.py 77.27, startup_actions.py 25.25, test_base.py 0, bl_configy.py 98.53, process_permission.py 96.42, process_configy.py 97.87, views_admin.py 57.24, test_modules.py 100, process_admin.py 97.14

Hypothesis based on our metrics: This ticket was a big change to the structure of the application, as can be seen from the number of files touched and the commit sizes. However, the ownership percentages show that the developer is familiar with the code, reducing the impact on agility.

Calculating our initial set of metrics yields table 6.

Table 6: Results for ticket SRS-306, file process_admin.py

Metric              Before          After           Difference
avg cc              2.57777777778   2.55555555556   -0.0222222222222
blank               265             56              -209
bugs                0               0               0
cc                  116             23              -93
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            134             893             759
commits             14              23              9
coupling            49              9               -40
documentation       47              11              -36
lloc                803             145             -658
loc                 792             140             -652
max ownership       97.1428571429   95.6841138659   -1.45874327693
naming              90              11              -79
owners              2               2               0
owners25            1               1               0
owners50            1               1               0
owners75            1               1               0
readability         76              73              -3
sloc                925             1033            108
tickets             1               1               0

Observations: The main difficulty was the impact on the structure of the existing application and the inherent complexity of the task. This feature had different requirements for the data structure, and therefore the data structure had to be adjusted. The initial data structure was not designed for change. As a result, it required extra effort to complete this ticket even though the developer was familiar with the code. Again, the number of files touched shows that this was a big change.

The inherent complexity of the task is hard to determine, but it can be reflected in Jira, for example by the number of story points assigned, since story points represent an estimation of the work. Furthermore, during the time period of this ticket a large amount of code was commented out; the 87 remaining lines have a complexity of 23 but a readability score of 73, which indicates an issue for further development.

Conclusion: This ticket was inherently complex, and it was extra difficult because it consisted of a feature with a big impact on the structure that had been thought of at a later stage of development. A metric that can detect a change in the data structure is the number of files touched. Detecting the inherent complexity can be done by, for example, looking at the number of story points assigned in Jira.


4.4 SRS-667

The main difficulty of ticket SRS-667 was finding the right library to export data to a csv file. The design needed to be flexible so it could be reused whenever data needed to be exported.

Table 7: SRS-667 Overview

Ticket Key        SRS-667
Description       Export data from app into Excel sheet (csv)
Before date       2016-05-18
After date        2016-05-27
Commits           5
Commit size       [219, 6, 24, 56, 103]
Files             7
Ownership before  test.py 84.35, bl_project.py 73.33, bl_export_data.py 52.22, bl_common.py 81.25, test_export_data.py 100, views.py 89.57, general_config.py 90.61

Hypothesis based on our metrics: The export functionality is a new feature that is not closely related to any existing feature. Therefore, most metrics will not be useful for constructing a hypothesis. An example of an indicator that would help is the developer's expertise regarding the programming language and the framework used, such as knowledge of the plugins available for that framework.

Calculating our initial set of metrics yields table 8.

Table 8: Results for ticket SRS-667, file bl_project.py

Metric              Before          After           Difference
avg cc              6.25925925926   6.34482758621   0.0855683269476
blank               241             253             12
bugs                6               6               0
cc                  169             184             15
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            89              158             69
commits             126             134             8
coupling            58              61              3
documentation       26              28              2
lloc                780             822             42
loc                 1202            1255            53
max ownership       73.8732854344   74.325134973    0.451849538649
naming              54              59              5
owners              3               3               0
owners25            2               2               0
owners50            1               1               0
owners75            0               0               0
readability         69              75              6
sloc                1293            1414            121
tickets             15              15              0

Observations: The main difficulty is not reflected in the metrics shown above, but is revealed in the time span of this ticket. The ticket was marked as critical because certain data needed to be exported as soon as possible. To get there, several libraries were tried; this learning process is visible in the commits, where one can see that three different libraries were tried. On top of that, the developer mentioned that he wanted the solution to be flexible despite the time pressure. In the end, using the chosen library was relatively easy (50 lines). However, the code required to collect the data in order to export a list of projects was quite difficult (15 cc in 53 lines). As a result, making changes to this code became more difficult for developers not familiar with it, as can be seen in section 4.9. On the other hand, it did become easy to add more exports, as can be seen in ticket SRS-765 (Appendix A.1).

This time, the learning process was visible in the commits. However, it could also happen locally; if that is the case, data from the IDE could provide this information.

Conclusion: The selected metrics were not able to detect agility issues because the ticket consisted of a new feature that was independent of existing code. However, the metrics afterwards did show potential issues for agility at a later stage, such as the complicated way of collecting data.


4.5 SRS-694

Ticket SRS-694 was difficult because the underlying design decisions regarding expenses and the corresponding permissions were not known to the developer. Furthermore, the required solution was inherently complex due to these required permissions.

Table 9: SRS-694 Overview

Ticket Key        SRS-694
Description       Easier way to approve multiple expenses at once
Before date       2016-08-03
After date        2016-09-01
Commits           7
Commit size       [154, 101, 6, 17, 74, 47, 272]
Files             5
Ownership before  test.py 15.45, process_expenses.py 89.15, test_feature_694.py -, startup_actions.py 62.31, views.py 19.51

Hypothesis based on our metrics: Even though the developer is familiar with the code touched during this ticket, the underlying expense states and permissions were not known to him. Because a crucial part of the task is understanding the required logic, not knowing it will have a negative impact on agility.

Calculating our initial set of metrics yields table 10.

Table 10: Results for ticket SRS-694, file process_expenses.py

Metric              Before          After           Difference
avg cc              18.0            15.5            -2.5
blank               56              61              5
bugs                6               6               0
cc                  54              62              8
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            12              121             109
commits             33              40              7
coupling            9               11              2
documentation       4               5               1
lloc                194             237             43
loc                 227             275             48
max ownership       89.1525423729   92.9978118162   3.84526944331
naming              13              14              1
owners              2               2               0
owners25            1               1               0
owners50            1               1               0
owners75            1               1               0
readability         14              21              7
sloc                239             396             157
tickets             9               9               0

Observations: The difficulty of this ticket (the different expense states and permissions) is not visible in the metrics, simply because this logic was not touched. Even though the developer owned the majority of the code regarding expenses, approving multiple expenses at once required new domain knowledge which was not readily available and not clearly visible in the code. Once this knowledge was available, it remained a difficult ticket because the states of the expenses and the permissions had to be mapped to the corresponding security functions, which were unknown and hard to understand for the developer. For example, the metrics for code understandability were quite bad for the security.py file and the ownership percentage of the developer was 0.32%, which indicates a problem. In the end coupling increased by 2, which corresponds to the security checks that were needed.

Some information from Jira could give insight into the complexity of the domain knowledge, such as the number of story points or the number of images attached. In this case, however, the state machine was drawn on a whiteboard and not recorded in Jira, so Jira would not be able to provide any information regarding the required domain knowledge.

Conclusion: The selected metrics would not be able to detect an issue in this situation, because the required knowledge was in the domain and in a part of the code that was not touched. It is hard to determine required domain knowledge, but in order to detect issues with the latter, the metrics need to include familiarity and understandability of the coupled files as well.


4.6 SRS-768

Ticket SRS-768 was a difficult ticket because it required a change in the structure of the models. Furthermore, some small issues had to be addressed in parallel, which required switching focus to completely different tasks.

Table 11: SRS-768 Overview

Ticket Key        SRS-768
Description       upload invoices / documents
Before date       2016-10-10
After date        2016-10-19
Commits           16
Commit size       [227, 7, 401, 16, 566, 31, 325, 337, 375, 81, 339, 182, 60, 213, 87, 117]
Files             24
Ownership before  test.py 81.17, bl_sys.py 100, test_funct_permissions.py 99.3, views_dms.py 100, bl_common.py 81.87, process_config.py 63.67, process_views.py 54.52, constants_def.py 96.06, general_config.py 96.11, bl_project.py 67.62, process_expenses.py 19.27, helper.py 75.63, test_navigation.py 95.72, security.py 97.69, bl_config.py 98.95, models.py 85.77, bl_dms.py 100, models_audit.py 96.01, process_commons_funct.py 97.05, commons.py 90.82, startup_actions.py 33.01, run.py 90.48, test_dms_functions.py -, bl_export_data.py 86.82

Hypothesis based on our metrics: Before, documents were connected to a project; now they could be connected to expenses as well. This time, the way to link a document was made flexible to support other objects too. As a result, some implementations of the previous solution had to be reworked. This is visible in the number of files touched, which shows that this was a big change. Since the developer is familiar with the files touched, the impact on agility is not related to code complexity but rather to changed requirements and the decision not to make the first implementation flexible.

Calculating our initial set of metrics yields table 12.


Table 12: Results for ticket SRS-768, file process_expenses.py

Metric              Before          After           Difference
avg cc              18.0            15.5            -2.5
blank               56              61              5
bugs                6               6               0
cc                  54              62              8
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            12              121             109
commits             33              42              9
coupling            11              11              0
documentation       4               5               1
lloc                194             237             43
loc                 227             275             48
max ownership       89.1525423729   92.9978118162   3.84526944331
naming              13              14              1
owners              2               2               0
owners25            1               1               0
owners50            1               1               0
owners75            1               1               0
readability         14              21              7
sloc                239             396             157
tickets             9               9               0

Observations: Even though some code was commented out, code understandability got worse (complexity, readability, naming) while only 48 new lines were added. This decrease in understandability was not perceived as a problem because the developer was familiar with the code. So once again, the impact on the existing structure of the application was the main cause of the difficulty, and this is not reflected by the selected metrics. The chosen data structure was not sufficient to support changes at a later stage, as was also the case in section 4.3, although this time on a smaller scale.

Conclusion: The selected metrics show high ownership numbers for the touched files, and would therefore not indicate an agility problem. The code itself was not perceived as the problem; the data structure chosen for the previous implementation was not sufficient to support changes at a later stage. The large number of files touched can indicate such a change in a data structure.


4.7 SRS-782

Ticket SRS-782 was difficult because the code was (completely) written by someone else. Furthermore, this was the first time more advanced performance evaluations were made, and some logic needed to be written for that.

Table 13: SRS-782 Overview

Ticket Key        SRS-782
Description       Performance on the Dashboard page (see what was done for the Project List page)
Before date       2016-07-25
After date        2016-07-27
Commits           2
Commit size       [62, 98]
Files             1
Ownership before  bl_dashboard.py 0

Hypothesis based on our metrics: The developer had no familiarity with the code, which means that the code must be easy to understand. Since this was not the case (it was 'more complicated than necessary', which is reflected by the average complexity of 22.5), it will have a negative impact on agility.

Calculating our initial set of metrics yields table 14.

Table 14: Results for ticket SRS-782, file bl_dashboard.py

Metric              Before          After           Difference
avg cc              22.5            22.0            -0.5
blank               43              50              7
bugs                1               1               0
cc                  45              44              -1
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            15              64              49
commits             8               10              2
coupling            3               3               0
documentation       3               3               0
lloc                158             158             0
loc                 207             209             2
max ownership       100.0           73.1481481481   -26.8518518519
naming              20              20              0
owners              1               2               1
owners25            1               2               1
owners50            1               1               0
owners75            1               0               -1
readability         7               7               0
sloc                222             273             51
tickets             1               1               0

Observations: The difficulty of this ticket is reflected by the bad scores for code understandability and by the fact that the developer was not familiar with the code. The 49 new comment lines consist of explanations and commented-out old code. In total, 26 percent of the lines were touched by the developer, consisting of comments to make the code easier to understand and two new lines for measuring performance. So only comments were added to improve understandability, and comments themselves are considered a bad smell [9], especially when they are there to explain bad code.

Conclusion: The selected metrics would point out that there was one owner, which makes it important that the code is easy to understand for the other developers. This was not the case, and therefore the selected metrics successfully indicated an agility problem.


4.8 SRS-984

Ticket SRS-984 had a big impact on the existing design of the application: basically all database calls had to be adjusted to hide certain projects. Furthermore, because we did not yet use branches for the development of bigger features, it took extra effort to make sure intermediate deploys did not cause any issues.

Table 15: SRS-984 Overview

Ticket Key        SRS-984
Description       We need to hide certain projects - ideally with a flag per case
Before date       2016-11-17
After date        2016-12-02
Commits           15
Commit size       [7, 439, 66, 253, 89, 9, 17, 7, 155, 20, 251, 15, 2, 2]
Files             33
Ownership before  test.py 82.35, app_security.py 96.97, test_bug_490.py 0, test_bug_728.py 0, test_bug_772.py 0, test_bug_776.py 0, test_bug_788.py 0, test_hidden_project.py -, test_web_toi_add.py 0, test_category.py 95, constants_def.py 96.37, bf_forms.py 64.24, bl_project.py 73.2, test_bug_fixes.py 97.18, test_project_country.py 98.77, test_projects_list.py 93.67, test_global_views_permission.py 100, test_project_role.py 96.8, test_project.py 0, models.py 85.90, startup_actions.py 37.45, test_expenses.py 79.40, bl_common.py 85.71, test_login.py 89.68, test_milestone.py 33.82, test_web_permissions.py 92.83, test_feature_681.py 0, test_dms.py 98.39, test_metrics.py 36, test_finance_permissions.py 98.85, test_export_data.py 77.03, test_project_ideafit.py 0, common_tools.py 80.42

Hypothesis based on our metrics: Just like tickets 4.3 and 4.6, this ticket required a change to the structure of the application. As a result, a lot of files were touched. The ownership percentages are high for the relevant files (such as permissions and models) and low for the other files, but most of the changes required in those files were small changes to database queries. Therefore, the impact on agility will be low.

Calculating our initial set of metrics yields table 16.


Table 16: Results for ticket SRS-984, file bl_project.py

Metric              Before          After           Difference
avg cc              5.25531914894   5.32653061224   0.0712114633087
blank               373             349             -24
bugs                26              28              2
cc                  247             261             14
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            273             121             -152
commits             288             301             13
coupling            121             138             17
documentation       43              45              2
lloc                1057            1098            41
loc                 1354            1393            39
max ownership       73.2            69.3866666667   -3.81333333333
naming              79              81              2
owners              2               2               0
owners25            2               2               0
owners50            1               1               0
owners75            0               0               0
readability         131             133             2
sloc                1626            1525            -101
tickets             88              94              6

Observations: In order to hide a project, a simple flag was added to the models. Although this is a simple change, almost all database calls had to be adjusted, and the permissions had to be changed as well. The developer was familiar with the files where this logic had to change, such as app_security and bl_forms. And although the developer was not familiar with most of the other files, this was not a problem because the changes required were pretty simple, which is reflected by the fact that there were (almost) no differences between the before and after metrics. Also, the IDE was able to help in finding the places where changes were required, minimizing the impact of code unfamiliarity. So this time the data structure was able to support the change, and the impact of code unfamiliarity was small because the IDE was able to help. The problematic part of this ticket was the intervening issues that had to be resolved while keeping this ticket separate from the releases in between. The main cause of this issue can be attributed to the fact that we did not use Git properly: had we used branches to develop separate features, this would not have been a problem.

Conclusion: The selected metrics show a range of ownership percentages and a large number of files touched, but most of these files only required small changes. The problem in this ticket lay outside the code, and therefore our metrics would not have been able to detect an issue regarding agility.


4.9 SRS-993

Ticket SRS-993 required only a small change (10 lines touched in total), but in a part of the code the developer had no knowledge of. The difficulty was in understanding the current implementation of the export functions, which was too complex due to the variable naming and split-up SQL statements.

Table 17: SRS-993 Overview

Ticket Key        SRS-993
Description       The export of projects and fintechs is not filtered on the two types
Before date       2016-12-08
After date        2016-12-10
Commits           1
Commit size       [10]
Files             2
Ownership before  bl_export_data.py 17.45, bl_project.py 29.99

Hypothesis based on our metrics: This ticket required a small change in an unfamiliar file with bad scores for understandability, which means a problem for agility.

Calculating our initial set of metrics yields table 18.

Table 18: Results for ticket SRS-993, file bl_project.py

Metric              Before          After           Difference
avg cc              5.36734693878   5.40816326531   0.0408163265306
blank               361             364             3
bugs                28              28              0
cc                  263             265             2
change coupling25   0               0               0
change coupling50   0               0               0
change coupling75   0               0               0
comments            141             141             0
commits             301             302             1
coupling            147             149             2
documentation       45              45              0
lloc                1105            1113            8
loc                 1400            1408            8
max ownership       70.0104493208   69.1428571429   -0.867592177937
naming              81              81              0
owners              2               2               0
owners25            2               2               0
owners50            1               1               0
owners75            0               0               0
readability         136             138             2
sloc                1552            1560            8
tickets             94              95              1

Observations: This ticket required a very specific change in a big file unfamiliar to the developer and with bad scores for code understandability. Therefore, the assigned developer had to acquire new knowledge in order to complete the ticket. Had it been assigned to the original developer, it would likely have been less of a problem. This supports the idea of having one clear owner of the code, so that another developer does not have to acquire new knowledge. On the other hand, had the code been easier to understand, this would have reduced the importance of being familiar with it. The size of the change does not matter, because new knowledge has to be acquired nonetheless. Therefore, metrics for code familiarity and code understandability suffice.

Conclusion: The selected metrics show a low score for code familiarity and bad scores for code understandability, and would therefore identify an agility issue regardless of the size of the change.


5 Results

5.1 Summary

A summary of the tickets is shown below in table 19. It shows whether or not our selected metrics were able to detect an agility issue and why: 'yes' for the situations where our metrics successfully identified an agility issue, and 'no' for the situations where our metrics did not suffice.

Table 19: Metric predictions

SRS-213  no   the problem was unclear (learning by programming)
SRS-275  yes  metrics showed low familiarity and bad understandability
SRS-306  no   task was inherently complex and our data structure did not support this change
SRS-667  no   new feature independent of existing code
SRS-694  no   required domain knowledge and knowledge of code that was not touched (coupling)
SRS-768  no   metrics showed high familiarity but it required a change in the data structure
SRS-782  yes  metrics showed low familiarity and bad understandability
SRS-984  no   the problem was outside of the code (wrong way of working)
SRS-993  yes  metrics showed low familiarity and bad understandability

Table 20 lists the knowledge that was required to complete each ticket.

Table 20: What knowledge was required

SRS-213  required new knowledge about the domain
SRS-275  required new knowledge about the domain, code base and existing implementation
SRS-306  required new knowledge about code and task was complex
SRS-667  required new knowledge about plugins or libraries
SRS-694  required new knowledge about domain and task was complex
SRS-768  required new knowledge about domain and code base
SRS-782  required new knowledge about existing implementation
SRS-984  required new knowledge about domain and code base
SRS-993  required new knowledge about existing implementation

When we summarize table 20, we see the following recurring themes:

Table 21: Recurring themes

Knowledge of domain and code base      SRS-213, SRS-275, SRS-306, SRS-694, SRS-768, SRS-984
Engineering ability                    SRS-667, SRS-984
Complexity of existing implementation  SRS-275, SRS-782, SRS-993
Complexity of task                     SRS-306, SRS-694

• Knowledge of domain and code base: knowledge about the domain (such as the expense flow) and knowledge about the code base (such as implicit design decisions).

We built something that worked according to the needs at the time, rather than building a data structure that could change along with possible domain changes. As a consequence, changes required additional effort, which is not visible in our selected metrics. However, "responding to change over following a plan" is one of the four values of the Agile Manifesto [25]. Here, responding to change was perceived as a problem, which can be explained by the complexity of the existing implementation and the complexity of the task.

• Engineering ability: technical skills, but also knowledge about the programming languages and frameworks used, and other important developer skills such as communication skills.

In one case we did not use Git properly, which required additional effort from the developers, and in another a new library had to be used. Our metrics did not detect such situations.


• Complexity of existing implementation: the complexity of the existing code and implementation.

In the situations where not being familiar with the code was perceived as a problem by the developers, the code was too complex and not easy to understand. The selected metrics were able to detect these situations.

• Complexity of task: the inherent complexity of the task.

In some situations the task or required change is just inherently complex, which had an impact on agility. The selected metrics were not able to detect these situations.
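Taken together, these themes feed the heuristic proposed in the abstract and elaborated in section 5.4. One plausible reading, noting that the exact functional form below is an assumption and not necessarily the thesis's formula:

    \[
      \text{Agility} \sim \frac{\text{Competence}}{\text{Complexity}}
      = \frac{\text{domain knowledge} + \text{engineering ability}}
             {\text{complexity of existing code} + \text{complexity of change}}
    \]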

Other aspects that stood out are the degradation of the code base and the number of changes to data structures that were needed.

Degradation of code base: Another aspect that stands out is that for these tickets the code scored badly on understandability and got worse as well. Even when the code was perceived as difficult, there was never an effort made to actually improve it, resulting in a degradation of the code base. For example, in tickets 4.2, 4.7 and 4.9 developers were unfamiliar with the code and the code was hard to understand. The complexity during these tickets changed by +7, -1 (even though 49 lines were removed) and +2 respectively. In the other tickets the complexity increased by +84 (4.1), +15 (4.4), +8 (4.6), +8 (4.5) and +14 (4.8). In 4.3 the remaining complexity was 23 in 87 lines of code.

Changes to the data structure: Furthermore, several of the difficult tickets had an impact on the data structure of the application, which affected quite a lot of files. In figure 3 below these tickets are clearly visible (4.6 and 4.8). Other tickets with more than 20 files touched are SRS-866 (logical delete of projects, 35 files) and SRS-840 (move idea fit from sprint to project, 25 files), which had an impact on the data structure as well. However, there is also some noise, such as the commits with 21 files (SRS-404) and 30 files (SRS-669), which touched quite a lot of front-end files but also consist of multiple different changes committed at once. In total there are 10 tickets with more than 20 files touched, of which 5 had a big impact on the data structure. The other 5 comprise 3 front-end changes, a collection of changes committed at once, and a change that was quite big but did not have an impact on the data structure.

Figure 3: Number of files touched per commit
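The "many files touched" signal behind figure 3 can be computed directly from the history. A minimal sketch follows, with the 20-file threshold taken from the discussion above and an illustrative repository path:

    from git import Repo

    repo = Repo("path/to/project")
    big_commits = [(c.hexsha[:8], len(c.stats.files))
                   for c in repo.iter_commits("master")
                   if len(c.stats.files) > 20]
    # Candidates for data-structure changes; manual triage is still needed,
    # since front-end sweeps and batched commits give the same signal
    # (as with SRS-404 and SRS-669 above).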


George R.R. Martin's quote about architects versus gardeners¹⁰ perfectly describes the situation of this use case. Two years ago a seed was planted, not knowing how it would grow. This is reflected by the changes that were required to the data structures. Foote and Yoder call this situation 'piecemeal growth' [24], where change threatens to outpace our ability to cope with it. The biggest risk associated with piecemeal growth is that it will gradually erode the overall structure of the system and inexorably turn it into a big ball of mud [24]. And this is happening in this case, with the degradation of the code base as mentioned above. So in order to remain agile and be able to cope with this change, this project highly depends on the developers and on assigning the right tasks to the right developers. And in order to avoid a big ball of mud, it is important that insights about how data structures might have been designed better are applied, and that hard-to-understand code is refactored instead of made progressively worse.

5.2 How did the selected metrics perform?

5.2.1 Code familiarity

Code familiarity identifies potentially problematic files by showing the number of developers familiar with a file. In this use case, not all developers are familiar with all parts of the code, as can be seen in Figure 4. The majority of the files have only 1 or 2 owners, which can be a problem if those files are hard to understand as well.

Figure 4: Number of owners per file
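This thesis does not prescribe a tool for the familiarity metric; following Bird et al. [10], ownership can be approximated from the version history as each author's share of the commits touching a file. A minimal sketch under that assumption:

import subprocess
from collections import Counter, defaultdict

def ownership_per_file(repo_path):
    """Approximate ownership: each author's share of the commits touching a file."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--format=@%an"],
        capture_output=True, text=True, check=True).stdout
    touches = defaultdict(Counter)  # file -> commit count per author
    author = None
    for line in log.splitlines():
        if line.startswith("@"):        # author marker emitted by --format
            author = line[1:]
        elif line.strip():              # a file path changed in this commit
            touches[line.strip()][author] += 1
    return {path: {a: n / sum(c.values()) for a, n in c.items()}
            for path, c in touches.items()}

# Example: count a developer as an "owner" above some share, e.g. 20%:
# shares = ownership_per_file(".")
# owners = [a for a, s in shares["bl_project.py"].items() if s >= 0.2]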

5.2.2 Code understandability

Code understandability (cyclomatic complexity, readability, naming, documentation) is an essential metric to discover files that will be hard to understand for developers not familiar with the code. For example, most of the files have a low level of complexity (see Figure 5), but there are 15 files with a complexity of over 50. Examples are files such as common_tools (complexity 241; 1 owner with 80%, the other owners have 12%, 6% and 2% ownership respectively) and bl_project (complexity 269; 2 owners with 56% and 44% ownership respectively).
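A minimal sketch of how such scores can be computed per file, assuming the radon package (an assumption; this research does not name its analyzer):

# pip install radon
from radon.complexity import cc_visit
from radon.metrics import mi_visit

def understandability(path):
    """File-level cyclomatic complexity and maintainability index."""
    with open(path) as f:
        source = f.read()
    blocks = cc_visit(source)          # one entry per function/method/class
    total_cc = sum(b.complexity for b in blocks)
    mi = mi_visit(source, multi=True)  # maintainability index on a 0-100 scale
    return total_cc, mi

total_cc, mi = understandability("bl_project.py")
print(f"complexity: {total_cc}, maintainability index: {mi:.1f}")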

Potential problems for further development are in bl_forms.py (47 commits), common_tools.py (6 commits), bl_project.py (87 commits) and app_security.py (1 commit), which score significantly worse on all metrics regarding code understandability and only have 1 or 2 developers familiar with the code. The number of commits and the last-modified dates show that bl_forms and bl_project are files that are regularly touched, but only by one or two developers. For them, the complexity itself is not a problem, which is also confirmed by various tickets mentioned above where the complexity of these files was not perceived as an issue. What this shows is a dependency on specific developers of the team for specific files.

Figure 5: Complexity per file

5.2.3 Change coupling

Change coupling did not help reveal any issues regarding agility. This can be explained by the fact that the change coupling scores were low for all files (except for the email templates, which had a 100% change coupling between the .txt and the .html variants). Although change coupling did not help in this use case, it is still a promising metric because it represents required knowledge about the code base.
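For reference, a sketch of one common change-coupling definition, the co-change ratio mined from the Git history (this research does not spell out its exact formula, so this is an assumption; the template paths in the final comment are illustrative):

import subprocess
from collections import Counter
from itertools import combinations

def change_coupling(repo_path):
    """Co-change ratio: commits touching both files of a pair, divided by the
    commit count of the less frequently changed file."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--format=@"],
        capture_output=True, text=True, check=True).stdout
    commits, current = [], []
    for line in log.splitlines():
        if line == "@":                 # commit boundary emitted by --format
            if current:
                commits.append(current)
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:
        commits.append(current)
    file_count, pair_count = Counter(), Counter()
    for files in commits:
        unique = sorted(set(files))
        file_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {pair: n / min(file_count[pair[0]], file_count[pair[1]])
            for pair, n in pair_count.items()}

# coupling = change_coupling(".")
# coupling[("templates/invite.html", "templates/invite.txt")] -> 1.0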

5.2.4 Raw metrics

The raw metrics (loc, lloc, sloc, comments and blank) did help in identifying understandability issues, as well as giving an indication of what changed during commits. For example, in ticket 4.1 they showed the code churn in the various commits. In ticket 4.7 the number of comments shows how the complexity was dealt with.
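These five counts match what radon's raw module reports, so a sketch (again assuming radon) is short:

# pip install radon
from radon.raw import analyze

with open("bl_forms.py") as f:
    raw = analyze(f.read())

# analyze() returns a namedtuple containing exactly these counts:
print(raw.loc, raw.lloc, raw.sloc, raw.comments, raw.blank)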

5.2.5 Coupling

Coupling shows the dependencies of a file. In ticket 4.5 it showed that coupling increased by two, which corresponded with the two permission checks that were needed. However, without looking at the coupled code this metric does not provide enough information.

5.2.6 Number of commits

The number of commits shows how often a file was touched. On its own it is not able to detect agility issues, but in combination with file ownership and code understandability it can point to problematic files.

5.2.7 Number of tickets

The number of tickets did not reveal any agility issues. This might be due to the fact that the number of tickets is based on the Git commit message including the ticket key, and is therefore a subset of the number of commits.
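A sketch of that extraction, assuming the SRS-&lt;number&gt; key format used in this project:

import re
import subprocess
from collections import defaultdict

TICKET_KEY = re.compile(r"\bSRS-\d+\b")  # Jira key format used in this project

def tickets_per_file(repo_path):
    """Distinct ticket keys found in the messages of commits touching each file.
    Commits without a recognizable key are skipped, so this undercounts."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--format=@%s"],
        capture_output=True, text=True, check=True).stdout
    tickets = defaultdict(set)
    current = None
    for line in log.splitlines():
        if line.startswith("@"):                 # commit subject line
            match = TICKET_KEY.search(line)
            current = match.group(0) if match else None
        elif line.strip() and current:
            tickets[line.strip()].add(current)
    return {path: len(keys) for path, keys in tickets.items()}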


5.2.8 Number of bugs

The number of bugs did not reveal any more issues than the number of commits could, because the number of bugs is a subset of the number of commits as well. When a file has significantly more bugs than other files it might point to potential issues, but this was not the case here.

5.3 What other metrics would be needed?

Table 19 shows that our metrics did not suffice. In order to detect more problems for agility, the following metrics need to be added:

• A metric to detect code churn. We already calculated the code churn for each ticket but did not include code churn as a metric. It helped identify situations where a lot of code was rewritten in order to satisfy changed requirements, such as ticket 4.1. Furthermore, code churn can also identify a learning curve (4.4). A per-ticket churn sketch is given after this list.

• A metric to detect changes to the data structure. Changing a data structure has a big impact on the existing code, as shown in tickets 4.3 and 4.6. The number of files touched was able to point out 6 out of 10 situations where the data structure changed.

• A metric to detect domain knowledge. The amount of domain knowledge a developer has can help explain certain design decisions; for example, a developer familiar with the domain will be able to create a better-fitting data structure. [13] examined consensus theory to measure domain knowledge. However, it is very hard to measure knowledge because it is hard to define what constitutes knowledge. A rough estimation of domain knowledge could be the amount of experience a developer has in this domain. Such a metric will distinguish a novice engineer from a more experienced one and consequently increase the chance that a developer is knowledgeable of the domain. Another metric could be the amount of experience a developer has as a software developer, since it is likely that a more senior developer will be able to cope with unknowns better.

• A metric to detect the inherent complexity of the task. When a task is inherently complex, the solution will be complex as well, for example ticket 4.5. As a result, the metrics will show a complex block of code even though this complexity is needed. Task complexity involves the cognitive factors of a task [48]. [14] created a summary of complexity as a function of objective characteristics that contribute to task complexity, such as multiple path-goal connections, interrelated and conflicting subtasks, and constraints that need to be satisfied. This might be reflected by the number of story points assigned in Jira, because developers will unconsciously take such characteristics into account when estimating the workload.
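The per-ticket churn mentioned in the first bullet can be recovered from the Git history; a minimal sketch (lines added plus lines deleted over all commits mentioning the ticket key):

import subprocess

def code_churn(repo_path, ticket_key):
    """Churn for one ticket: added + deleted lines over all commits whose
    message mentions the ticket key."""
    numstat = subprocess.run(
        ["git", "-C", repo_path, "log", f"--grep={ticket_key}",
         "--numstat", "--format="],
        capture_output=True, text=True, check=True).stdout
    added = deleted = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit():  # "-" marks binary files
            added += int(parts[0])
            deleted += int(parts[1])
    return added + deleted

print(code_churn(".", "SRS-213"))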

5.4 Agility equation

Based on the metrics we were missing (Section 5.3) and the information gained during apprenticing with developers (summarized in Table 20), we see four essential aspects that influence agility. The first thing we found was that developers need domain knowledge and knowledge of the code base to complete some tickets. In other situations it is important that developers have the technical skills to learn new plugins. Furthermore, the inherent complexity of the task and the complexity of the current implementation also play an important role for software agility. By analogy with Lewin's equation11, we propose that Software Agility is a function of a team's Competence and the underlying Complexity of the work:

Agility = f(Competence, Complexity)    (1)

where competence consists of 1) knowledge of the domain and the code, and 2) engineering ability; and complexity consists of 1) complexity of the existing code, and 2) complexity of the change. Our selected metrics primarily covered the complexity of the existing code and knowledge of the code base. So in order to determine software agility, metrics need to cover all four of these aspects.

11 Lewin's equation is a heuristic formula to explain what determines behavior: Behavior is a function of the Person and their Environment, B = f(P, E).
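Spelled out, the proposed decomposition can be written as follows (g and h are unspecified aggregation functions; how the four aspects combine is left open by this research):

\begin{align}
  \mathit{Agility}    &= f(\mathit{Competence},\ \mathit{Complexity}) \\
  \mathit{Competence} &= g(\text{domain and code knowledge},\ \text{engineering ability}) \\
  \mathit{Complexity} &= h(\text{complexity of existing code},\ \text{complexity of the change})
\end{align}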

5.5 Threats to validity

First of all, the candidate metrics are only validated with one use case; the same metrics might yield different results on another project. Secondly, code understandability is based solely on static code analysis. It is possible that, even though the code understandability scores are good, the code is still hard to understand because of the overarching structure.

Furthermore, the metrics are calculated at file level; calculating them at method level will yield different results. Another issue is that some metrics only make sense in terms of tickets rather than within a single commit. Lastly, ownership is a strong determinant of who knows what about a system's source code, but according to a study by [26] not the only one. For example, the frequency and recency of a developer's interaction with the code also play an important role. This is not taken into account when considering the familiarity of a developer.


6 Conclusion

We compiled a list of 9 problems by apprenticing with developers to see if the characteristics of these problems are covered by the 15 metrics we selected based on a literature study. Recurring themes in these 9 problems are required knowledge of the domain, engineering ability, complexity of the existing code and the inherent complexity of the task. We were able to identify 3 of the 9 problems. In these cases, we found that the metrics for code familiarity and code understandability are good indicators for the complexity of the existing code. In 6 of the 9 problems, our metrics were not able to identify the problem. In order to do so, metrics that need to be included are changes to data structures, engineering ability, required domain knowledge and a task's inherent complexity.

Based on the recurring themes found by apprenticing with developers and the missing metrics, we propose a heuristic formula similar to Lewin's equation: software agility can be defined as a function of a team's competence and complexity. Competence consists of knowledge of the domain and engineering ability, and complexity consists of complexity of the existing code and complexity of the change. Further research needs to cover these four aspects of software agility.


References

[1] ISO/IEC 25010:2011. https://www.iso.org/standard/35733.html, 2011. [Online; accessed 2017-04-16].
[2] Ashish Agarwal, Ravi Shankar, and MK Tiwari. Modeling agility of supply chain. Industrial Marketing Management, 36(4):443–457, 2007.
[3] Christopher Alexander. Notes on the Synthesis of Form, volume 5. Harvard University Press, 1964.
[4] Thomas Ball, Jung-Min Kim, Adam A Porter, and Harvey P Siy. If your version control system could talk. In ICSE Workshop on Process Modelling and Empirical Studies of Software Engineering, volume 11. Citeseer, 1997.
[5] Rajiv D Banker, Gordon B Davis, and Sandra A Slaughter. Software development practices, software complexity, and software maintenance performance: A field study. Management Science, 44(4):433–450, 1998.
[6] Victor R Basili and Gianluigi Caldiera. Improve software quality by reusing knowledge and experience. MIT Sloan Management Review, 37(1):55, 1995.
[7] Kent Beck. Extreme Programming Explained: Embrace Change. Addison-Wesley Professional, 2000.
[8] Kent Beck. Implementation Patterns. Pearson Education, 2007.
[9] Kent Beck, Martin Fowler, and Grandma Beck. Bad smells in code. Refactoring: Improving the Design of Existing Code, pages 75–88, 1999.
[10] Christian Bird, Nachiappan Nagappan, Brendan Murphy, Harald Gall, and Premkumar Devanbu. Don't touch my code!: examining the effects of ownership on software quality. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, pages 4–14. ACM, 2011.
[11] Barry W Boehm. Verifying and validating software requirements and design specifications. IEEE Software, 1(1):75, 1984.
[12] Barry W Boehm, John R Brown, and Hans Kaspar. Characteristics of Software Quality. 1978.
[13] Stephen P Borgatti and Inga Carboni. On measuring individual knowledge in organizations. Organizational Research Methods, 10(3):449–462, 2007.
[14] Donald J Campbell. Task complexity: A review and analysis. Academy of Management Review, 13(1):40–52, 1988.
[15] Elizabeth Charters. The use of think-aloud methods in qualitative research: an introduction to think-aloud methods. Brock Education Journal, 12(2), 2003.
[16] Jitender Kumar Chhabra, KK Aggarwal, and Yogesh Singh. Code and data spatial complexity: two important software understandability measures. Information and Software Technology, 45(8):539–546, 2003.
[17] Scott A Crossley, David B Allen, and Danielle S McNamara. Text readability and intuitive simplification: A comparison of readability formulas. Reading in a Foreign Language, 23(1):86, 2011.
[18] Ward Cunningham. The WyCash portfolio management system. In Addendum to Proc. Object-Oriented Programming Systems, Languages and Applications, pages 29–30, 1992.
[19] Marco D'Ambros, Michele Lanza, and Romain Robbes. On the relationship between change coupling and software defects. In Reverse Engineering, 2009. WCRE'09. 16th Working Conference on, pages 135–144. IEEE, 2009.
[20] Eric D Darr, Linda Argote, and Dennis Epple. The acquisition, transfer, and depreciation of knowledge in service organizations: Productivity in franchises. Management Science, 41(11):1750–1762, 1995.
[21] Subhajit Datta. Agility measurement index: a metric for the crossroads of software development methodologies. In Proceedings of the 44th Annual Southeast Regional Conference, pages 271–273. ACM, 2006.
[22] R. Geoff Dromey. A model for software product quality. IEEE Transactions on Software Engineering, 21(2):146–162, 1995.
[23] Karl Anders Ericsson and Herbert Alexander Simon. Protocol Analysis. MIT Press, Cambridge, MA, 1993.
[24] Brian Foote and Joseph Yoder. Big ball of mud. Pattern Languages of Program Design, 4:654–692, 1997.
[25] Martin Fowler and Jim Highsmith. The agile manifesto. Software Development, 9(8):28–35, 2001.
[26] Thomas Fritz, Gail C Murphy, and Emily Hill. Does a programmer's activity indicate knowledge of code? In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 341–350. ACM, 2007.
[27] Harald Gall, Karin Hajek, and Mehdi Jazayeri. Detection of logical coupling based on product release history. In Software Maintenance, 1998. Proceedings., International Conference on, pages 190–198. IEEE, 1998.
[28] Harald Gall, Mehdi Jazayeri, and Jacek Krajewski. CVS release history data for detecting logical couplings. In Software Evolution, 2003. Proceedings. Sixth International Workshop on Principles of, pages 13–23. IEEE, 2003.
[29] Tudor Girba, Adrian Kuhn, Mauricio Seeberger, and Stéphane Ducasse. How developers drive software evolution. In Principles of Software Evolution, Eighth International Workshop on, pages 113–122. IEEE, 2005.
[30] Ilja Heitlager, Tobias Kuipers, and Joost Visser. A practical model for measuring maintainability. In Quality of Information and Communications Technology, 2007. QUATIC 2007. 6th International Conference on the, pages 30–39. IEEE, 2007.
[31] Jim Highsmith. Velocity is killing agility! http://jimhighsmith.com/velocity-is-killing-agility/, 2011. [Online; accessed 2017-06-06].
[32] Mikio Ikoma, Masayuki Ooshima, Takahiro Tanida, Michiko Oba, and Sanshiro Sakai. Using a validation model to measure the agility of software development in a large software development organization. In Software Engineering-Companion Volume, 2009. ICSE-Companion 2009. 31st International Conference on, pages 91–100. IEEE, 2009.
[33] Brian W Kernighan and Phillip James Plauger. The Elements of Programming Style. McGraw-Hill, 1978.
[34] Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda, Cristina Lopes, Jean-Marc Loingtier, and John Irwin. Aspect-oriented programming. ECOOP'97 Object-Oriented Programming, pages 220–242, 1997.
