
UNIVERSITY OF AMSTERDAM MASTER THESIS


Estimated team size

and its relation to software quality

Author: Dominik Šafarić

University: University of Amsterdam

Course: Master of Science in Software Engineering

Host organisation: Software Improvement Group (SIG)

Host organisation supervisor: prof. dr. ir. J. Visser

University supervisor: dr. Magiel Bruntink

July 22, 2015


Abstract

We present a model for estimating the size of a development team over time by mining software repositories of open source and industrial software systems. In addition, this thesis explores the relation between the size of a development team and technical quality.

Technical quality has been measured using the SIG Maintainability Model developed by the Software Improvement Group (SIG). The SIG Maintainability Model maps measured source code properties to the maintainability characteristic and sub-characteristics of the ISO/IEC 25010.

Estimations of team sizes have been produced by the established model for 10 software systems within the data set. The results indicate a strong positive association between estimated and actual team size. For their relation to technical quality, a correlation analysis showed a strong negative relation to several source code properties, but not to the maintainability characteristic and sub-characteristics of the ISO/IEC 25010.

Keywords Team size estimation, mining software repositories, software system quality, SIG Maintainability Model


Contents

Abstract

Introduction
1.1 Research statement
1.2 Research questions
1.3 Research methods and context
1.4 Related work
1.5 Definitions
1.6 Outline

Background information
2.1 Git
2.2 ISO/IEC 25010
2.3 Software Improvement Group
2.4 SIG Maintainability Model
2.5 Software Analysis Toolkit
2.6 Monte Carlo simulation

Data
3.1 Data required
3.2 Challenges
3.3 Data retrieval
3.4 Data selection
3.5 Data cleaning

Team size estimation
4.1 Model introduction
4.2 Commit characterisation
4.3 Identification of the number of developers
4.4 Output of the model
4.5 Results
4.6 Validation
4.7 Limitations
4.8 Discussion
4.9 Conclusion

Team size and software quality
5.1 Context and methods
5.1.1 Source code revisions
5.1.2 Scope file
5.2 Results for source code properties
5.3 Results for sub-characteristics
5.4 Discussion
5.5 Conclusion

Conclusion
6.1 Threats to validity
6.1.1 Internal validity
6.1.2 External validity
6.1.3 Construct validity
6.2 Future work
6.2.1 Extending the data set
6.2.2 Improving the methods
6.2.3 Further investigations

Acknowledgements

A Relation scatter plot matrices
A.1 Scatter plot matrices considering all software systems
A.3 Software Analysis Toolkit (SAT) scatter plot matrices
A.4 SonarQube scatter plot matrices
A.5 ServiceStack scatter plot matrices
A.6 BioFormats scatter plot matrices
A.7 Nova scatter plot matrices
A.8 MongoDB scatter plot matrices
A.9 Git scatter plot matrices
A.10 Gradle scatter plot matrices


Chapter 1

Introduction

Team size estimations are generally produced by companies in the early phases of a software product life cycle. Once a software product becomes operational, the information describing the number of developers maintaining the system is no longer recorded [10]. However, this information becomes of major interest when making estimates of cost and duration for activities similar to those in the past [3].

By examining the current body of knowledge, we conclude that team size estimation models focus on expert judgement using historical data and statistical modelling [3, 7]. Unfortunately, in many cases historical data is unavailable or out of date, meaning that estimations based on this data may be incorrect or biased by expert judgement.

On the other hand, software repositories partially compensate for the lack of historical data by recording the interaction of particular development groups with the source code of a software system.

So far, research on team size estimation models using data from software repositories has mainly focused on particular development team fragments such as core development teams [25, 27, 9]. Consequently, the contributions of less active team members remain unknown.

In our study, we first establish a model for estimating the size of a development team. The model does not disregard the contributions of less active developers. To estimate the size of a development team, the model uses commit history from software repositories.

In the second step of our study, we use the estimation model to explore the relation of estimated team size to technical quality.

So far, team size has been shown to correlate with increased failure-proneness [21], increased organisational complexity [5] and decreased product quality as perceived by customers [20]. However, empirical evidence of its relation to source code metrics remains limited.

Because of this, we seek to provide empirical evidence on how team size expansion might be an indirect threat to technical quality. However, we do not assume that the relation implies causality.

For measuring technical quality, we use the SIG Maintainability Model designed by the Software Improvement Group (SIG) [16]. The SIG Maintainability Model maps measured source code metrics to the maintainability characteristic and sub-characteristics of the ISO/IEC 25010 [18].

1.1 Research statement

We are looking to establish a model for estimating the size of a development team by mining software repositories. The model should be applicable to industrial and open-source development projects. Ideally, the model will provide accurate estimations of team size through the history of a development project.

Next, we seek to investigate the relation of development team size to technical quality. We expect that a significant increase in team size correlates with decreased technical quality. We do not assert that having large teams leads to decreased technical quality, because other factors might have an influence, for example increased communication and coordination overhead [5] or unequal levels of domain-specific and project-specific knowledge [20].

1.2 Research questions

Section 1.1 highlights a number of points that should be further investigated. Hence, this thesis aims to answer the following research questions:

[RQ1] How do we estimate the size of a development team over time by mining software repositories of publicly available open source and industrial software systems?

[RQ2] Does the size of a development team correlate with software system technical quality as measured by the SIG Maintainability Model (described in detail in Section 2.4)?

[RQ2.1] Is the size of a development team associated with software system source code properties, where source code properties ⊆ {volume, unit size, unit complexity, unit interfacing, duplication, module coupling, component balance, component independence}?

[RQ2.2] Is the size of a development team associated with software system product quality, where product quality ⊆ {maintainability, analysability, modifiability, modularity, reusability, testability}?

1.3 Research methods and context

The research questions listed in Section 1.2 will be answered by means of empirical research. The data supporting the research will be gathered by mining Git source code management systems of both open source and industrial software system repositories. The data will be obtained by developing a Java application for automated data retrieval and storage using the GitHub REST API. Chapter 3 describes the data retrieval process in further detail. The projects will be selected based on activity, i.e. frequency of commits and total duration of development. Two software products will be used as subjects of the control group: the Software Analysis Toolkit (SAT) provided by the Software Improvement Group (SIG, https://www.sig.eu), and Rascal by Centrum Wiskunde & Informatica (CWI).

To answer RQ1, we design a development team size estimation model based upon an extensive literature study and interviews with researchers from the mining software repositories (MSR) research domain. We aim to validate the model by means of correlation analysis, comparing the estimated and actual size of a development team.

In order to answer RQ2, we analyse the selected software products for technical quality by means of static source code analysis using the Software Analysis Toolkit (SAT) by the SIG. We then correlate the produced software system quality ratings with the estimated development team sizes by means of correlation analysis. The underlying technical quality metrics have already been validated by the SIG.

1.4 Related work

Robles et al. present an effort estimation model based on translating the activity of developers, as recorded in the source code management system of OpenStack [26]. They describe the effort estimation model as a technique for identifying full-time and non-full-time developers. The distinction is made by setting a unique threshold for the minimum number of commits a developer must have in order to be classified as a full-time collaborator.

Capiluppi et al. studied the characteristics of open source projects in terms of generic characteristics, community of developers and community of users [6]. The authors underline the assumption that open source projects hold a community of core developers if 10 or more developers are contributing to the code base of a development project. Furthermore, the study distinguishes core development from co-development groups by first classifying developers' contributions as either stable or transient.

Robles et al. present a model for tracking the evolution of the core team of developers in open source projects [25]. The methodology is based on a classification of the social structure of open source development projects set by Crowston [9]. The classification by Crowston distinguishes between four types of contribution groups, including core and co-development groups. As a measure of contribution, Robles et al. used the number of commits of a developer during a certain period. They identified a core development group during sequential periods of the observed GIMP project by considering the fraction of the top 10% most active developers. After the identification process, the methodology produces normalised and absolute matrices, where the value of a cell M(i,j) is the number of commits performed by a certain core group i during a period j.

Gonzalez-Barahona et al. extended the study of Robles et al. [25]. In a case study, they analysed changes in the core development team of the Mozilla development project. For the identification of core developers, they used the method proposed in [25].

1.5 Definitions

Commit An action by which an individual submits a set of changes to a version control system, including modifications to the source code. A commit is always assigned a message describing the changes, and a timestamp indicating when the changes were submitted.

Committer An individual performing a commit to a repository. A committer may not necessarily be the author of the changes.

Author An individual contributing the changes. An author may not necessarily be the committer of the changes, because he or she may not have the privileges to directly submit the changes to a repository.

Development team A group of developers contributing to the same software development project by submitting source code changes to a source code management system. Throughout the thesis we will refer to a development team as a team.

LOC Lines of code, a metric expressing the size of a software system artefact. "A line of code is any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements on the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable statements." [7]

1.6 Outline

The structure of this thesis is as follows. Chapter 2 provides background information on the context of the research. Chapter 3 describes the data used in the research and the methods of acquiring it. In Chapter 4, we present the designed team size estimation model and describe its validation. Chapter 5 describes the context and results of exploring the association between team size and technical quality. Finally, Chapter 6 concludes the thesis with threats to the validity of the research, and outlines future research improvements.


Chapter 2

Background information

2.1 Git

Git is a decentralised source code management (DSCM) system. Unlike in centralised source code management (CSCM) systems (e.g. Subversion, http://subversion.apache.org), work output flows sideways and privately among collaborators, rather than via a central repository [2]. Collaborators fork (clone) a central repository and make changes independently of each other [15]. In contrast to centralised source code management systems, collaborators merge changes either by creating a pull request or by directly pushing the changes to a central repository's branch. Pull requests specify a local branch to be merged with the main repository branch [15]. They are subject to a code review process before being merged into the main repository branch. Furthermore, commits in a centralised source code management system are sent directly to a central repository, while in Git they remain local. Consequently, commits may not be visible to all collaborators [2].

A prominent example of a platform built around decentralised source code management is GitHub, which has gained wide acclaim and adoption by both the research community and developers.

2.2 ISO/IEC 25010

The ISO/IEC 25010 standard by the International Organisation for Standardisation specifies that the quality of a software system can be measured and evaluated using eight different quality characteristics [18]. Figure 2.2.1 shows the quality characteristics defined by the standard.

Figure 2.2.1: Product quality characteristics specified by ISO/IEC 25010

In light of RQ2, we focus on the quality characteristic of Maintainability, which is defined as:

"Degree of effectiveness and efficiency with which a product or system can be modified by intended maintainers."

The quality characteristic of Maintainability is further decomposed into five sub-characteristics: Modularity, Reusability, Analysability, Modifiability, and Testability.

2.3 Software Improvement Group

Software Improvement Group (SIG) is a management consultancy firm based in Amsterdam, the Netherlands. The goal of SIG is to provide insight into quality characteristics of software systems, including maintainability. Organisations can use the services of SIG either by requesting a single software system assessment, or by having their software systems monitored over a period of time.

To support these services, the SIG has designed a pragmatic model for measuring and rating the technical quality of software systems in accordance with the ISO/IEC 25010 standard. The model allows for a factual representation of the technical quality of a software system by means of static source code analysis. Further information regarding the model is presented in section 2.4.

2.4 SIG Maintainability Model

The SIG created a model for mapping source code properties to the maintainability characteristic and sub-characteristics of the ISO/IEC 25010 standard [18]. Source code properties are measured on a number of software artefacts, including source code files. These measurements are then compared to other software systems in a benchmark, and assigned a rating between 0.5 and 5.5. Based on these source code properties, a rating in the same range is assigned to maintainability and its sub-characteristics as specified by the ISO/IEC 25010. The quality ratings of the maintainability characteristic and sub-characteristics are determined by measuring the following source code properties:

Volume The overall size of the source code, measured in lines of code of a software product.

Duplication The degree of source code redundancies. It concerns the occurrence of identical fragments of source code throughout the code base.

Unit size The size of source code units. A unit represents the smallest executable and testable fragment of source code; in Java or C# a unit is a method, in C a unit is a procedure.

Unit complexity The degree of complexity in the units of source code. A higher complexity indicates a larger number of paths of execution.


Unit interfacing The overall number of interface parameter declarations in the units of source code.

Module coupling The coupling of modules in terms of the number of incoming dependencies between the modules of the source code.

Component balance The number of first-level components in the system, and the uniformity of component sizes.

Component independence The percentage of source code in modules that have no incoming dependencies from modules in other top-level components.

The definitions were taken from internal documentation of the SIG and from [16] and [4]. After measurement, the source code properties are mapped to the sub-characteristics of maintainability, as graphically depicted in figure 2.4.1. In the final step, the evaluation results for the sub-characteristics are aggregated into a single evaluation value, indicating the level of technical quality, i.e. maintainability.

Figure 2.4.1: Mapping of the source code properties to sub-characteristics of maintainability of the ISO/IEC 25010
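To illustrate the aggregation step only, the sketch below combines benchmarked property ratings into sub-characteristic ratings and a single maintainability value. The property-to-sub-characteristic mapping and the equal weights used here are hypothetical placeholders; the SIG Maintainability Model's actual mapping assigns weights according to technologies and properties, as noted in [8].

# Illustrative sketch only: the mapping and equal weights below are
# hypothetical placeholders, not the SIG Maintainability Model's actual mapping.
property_ratings = {
    "volume": 3.5, "duplication": 4.0, "unit_size": 2.5,
    "unit_complexity": 3.0, "unit_interfacing": 3.5,
    "module_coupling": 4.5, "component_balance": 3.0,
    "component_independence": 2.0,
}

# Hypothetical mapping: each sub-characteristic averages a subset of properties.
mapping = {
    "analysability": ["volume", "duplication", "unit_size"],
    "modifiability": ["duplication", "unit_complexity", "module_coupling"],
    "testability": ["unit_complexity", "unit_size"],
    "modularity": ["module_coupling", "component_balance", "component_independence"],
    "reusability": ["unit_size", "unit_interfacing"],
}

sub_ratings = {
    sub: sum(property_ratings[p] for p in props) / len(props)
    for sub, props in mapping.items()
}

# The final maintainability rating aggregates the sub-characteristic ratings.
maintainability = sum(sub_ratings.values()) / len(sub_ratings)
print(sub_ratings, round(maintainability, 2))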

2.5 Software Analysis Toolkit

The Software Analysis Toolkit (SAT) offers static source code analysis capabilities for a wide range of programming languages. It provides ratings, benchmarked against other industrial software systems, for both the source code properties and the sub-characteristics of maintainability of the ISO/IEC 25010, as mapped by the SIG Maintainability Model.

2.6 Monte Carlo simulation

Monte Carlo simulation is a stochastic method based on repeated random sampling and statistical analysis, used to produce a complete range of the risk associated with each variable of a model [31, 24]. To apply the method, the analyst constructs a mathematical model that simulates a real system. The model requires a domain of possible input variables, each described by a statistical distribution. Random samples are then drawn from these distributions, representing the values of the input variables. The sampling is repeated thousands of times, producing a random outcome for each run, and these results are aggregated over the simulation runs. Finally, the output variables are subjected to a statistical analysis, which produces both descriptive statistics describing the aggregated results (i.e. the output variables) and a range describing the level of (un)certainty of an outcome occurring.
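As a minimal illustration of the method, the sketch below propagates two triangularly distributed input variables (with invented parameters) through a trivial model and reports the resulting uncertainty range.

import random

def simulate(n_runs=10_000):
    """Draw the input variables repeatedly and collect the model outcome."""
    outcomes = []
    for _ in range(n_runs):
        # Each input variable is sampled from a triangular distribution
        # described by (low, high, mode); the parameter values are invented.
        effort = random.triangular(2.0, 10.0, 5.0)
        rework = random.triangular(0.5, 4.0, 1.0)
        outcomes.append(effort + rework)  # the simulated model output
    return sorted(outcomes)

outcomes = simulate()
mean = sum(outcomes) / len(outcomes)
lower = outcomes[int(0.025 * len(outcomes))]
upper = outcomes[int(0.975 * len(outcomes))]
print(f"mean = {mean:.2f}, 95% range = ({lower:.2f}, {upper:.2f})")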


Chapter 3

Data

In this chapter of the thesis we describe the data that will be used by the team size estimation model. The data will be retrieved by mining Git version control systems of 10 development projects. We briefly describe the procedure of retrieving this data, and provide an overview of the selected software systems. These software systems will be used as subjects of all research questions of the thesis.

3.1 Data required

We use the following data for the team size estimation model:

Repository description Descriptive information regarding the repository of a software development project of interest, including information about the owner of the repository and the programming languages used.

Commits to the default branch of a repository Commits are an integral part of the proposed development team size estimation model, further elaborated in Chapter 4. With every commit, we retrieve the following information: the committer of the commit, the author of the commit, the date of the commit, and the commit message.

Commit related information We retrieve a list of files that were modified by a commit, along with the number of added and deleted LOC. The rationale for extracting commit related information, in terms of its associated message and modifications, is that we believe it will provide us with insight into the changes that were made.


3.2 Challenges

When retrieving data from Git version control systems using the GitHub REST API, the following challenges are present:

GitHub API schema The overall schema of GitHub is not documented. A researcher wishing to retrieve and store repository related data from GitHub has to reverse engineer the schema from the corresponding JSON replies [14].

Limited number of requests The GitHub API enforces a limit on the number of requests per hour. The REST API allows five thousand requests per hour per account when using the OAuth2 authentication protocol.

Abuse detection The GitHub API includes an abuse detection mechanism. When an extensive number of API requests is transmitted in sequence, the abuse detection mechanism temporarily prohibits access to the data. The exact rate of requests per second that invokes the abuse detection mechanism is not publicly available.

3.3 Data retrieval

The data used for our study was obtained by developing a Java application for automated data retrieval and storage using the GitHub REST API. The rationale for developing an in-house retrieval tool is that publicly available tools provided by the research community do not provide data such as commit associated information in terms of added LOC, deleted LOC and files modified. The architecture of the developed tool consists of three components.

Given a list of requested GitHub API URLs, the retrieval component executes the requests simultaneously using multiple threads of execution; the rationale for implementing the retrieval component in this way is performance. The GitHub API limitation on the number of requests per hour has been overcome by creating a TokenStack class storing multiple OAuth2 (https://developer.github.com/v3/oauth/) authentication Token object instances.
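A minimal sketch of the token rotation idea behind the TokenStack class, transplanted to Python for brevity; the token values are placeholders, and the actual tool is a multi-threaded Java application.

import itertools
import requests

# Hypothetical pre-registered OAuth2 tokens; rotating over several accounts
# raises the effective request budget beyond 5,000 requests per hour.
TOKENS = ["token-a", "token-b", "token-c"]
token_cycle = itertools.cycle(TOKENS)

def get(url, retries=len(TOKENS)):
    """Issue a GitHub API request, moving to the next token when rate limited."""
    headers = {"Authorization": f"token {next(token_cycle)}"}
    response = requests.get(url, headers=headers)
    if (retries > 0 and response.status_code == 403
            and response.headers.get("X-RateLimit-Remaining") == "0"):
        return get(url, retries - 1)
    return response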

The storage component, upon completion of the retrieval process, creates class objects representing the database entities. An Object/Relational Mapping (ORM) framework (notably Hibernate, http://hibernate.org/orm/) maps the objects to the corresponding database entities and stores them in a relational MySQL database.

The database schema of the application has been created by reverse engineering the MySQL database schema of the GHTorrent mirroring service [14]. GHTorrent is a service that allows for automated retrieval of repository related information, including commits, issues and pull requests [13, 14].

The result is a MySQL database consisting of 11 entities. Some of the entities were imported from the reverse engineered schema, while others were created or modified according to our needs. For instance, the GHTorrent MySQL database schema does not provide an entity storing commit associated information in terms of additions, deletions and files modified. This limitation has been overcome by creating an additional entity cross-referencing the commit table.

A post-processing component of the developed tool filters commit messages for a given list of keywords. For instance, the data set of a certain repository may hold auto-generated files committed to its code base. By defining a list of keywords related to auto-generated files, the component provides us with a list of commits that we might consider as such, as sketched below.
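A minimal sketch of this filtering step; the keyword list and the commit record layout are invented for illustration.

# Invented keyword list; commits whose messages match are flagged for
# manual inspection as potentially touching auto-generated files.
KEYWORDS = ("generated", "auto-generated", "regenerate")

def flag_candidates(commits):
    """Return commits whose message contains any of the given keywords."""
    return [c for c in commits if any(k in c["message"].lower() for k in KEYWORDS)]

commits = [
    {"sha": "ab12", "message": "Regenerate parser tables"},
    {"sha": "cd34", "message": "Fix null check in retrieval component"},
]
print(flag_candidates(commits))  # only the first commit is flagged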


3.4 Data selection

The data we used for our study was obtained from the Git version control systems of 10 projects. Table 3.4.1 provides an overview. These projects were randomly selected from those satisfying the following criteria.

A project must be active during the period from 1 January 2011 until 31 March 2015. We define an active project as having at least one commit in every monthly period. Next, the total percentage of commits without an assigned author identifier must not exceed an arbitrary threshold of 10%. The lack of identifiers might be due to migration from one version control system to another; moreover, if a person is not registered with GitHub, his or her identifier might be missing. A sketch of these criteria is given below Table 3.4.1.

Project name    Main programming language   No. of commits   Weekly no. of commits (mean)   Commits without author identifiers (percentage)
Rascal          Java                        8049             36.09                          0.11%
SAT             Java                        5356             23.91                          0.00%
SonarQube       Java                        13904            63.48                          1.24%
ServiceStack    C#                          4838             22.50                          5.13%
BioFormats      Java                        7906             35.45                          2.63%
Nova            Python                      25705            117.91                         0.00%
MongoDB         C++                         16364            73.05                          0.00%
Git             C                           14914            66.87                          6.85%
Gradle          Java                        22080            99.01                          9.13%
Salt            Python                      50130            231.01                         9.56%

Table 3.4.1: Selected software products
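To make the selection criteria concrete, the following sketch checks both criteria for a single project; the commit record layout (a date and an optional author identifier) is an assumption for illustration.

from datetime import date

def is_eligible(commits, start=date(2011, 1, 1), end=date(2015, 3, 31)):
    """Check the activity and author-identifier criteria for one project.

    Each commit is assumed to be a dict with a 'date' (datetime.date) and
    an 'author_id' that is None when the identifier is missing.
    """
    in_range = [c for c in commits if start <= c["date"] <= end]
    if not in_range:
        return False
    # Criterion 1: at least one commit in every monthly period.
    observed = {(c["date"].year, c["date"].month) for c in in_range}
    required = {(y, m) for y in range(start.year, end.year + 1)
                for m in range(1, 13)
                if start.replace(day=1) <= date(y, m, 1) <= end}
    # Criterion 2: at most 10% of commits lack an author identifier.
    missing = sum(1 for c in in_range if c["author_id"] is None)
    return required <= observed and missing / len(in_range) <= 0.10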

All software products have a non-significant percentage of missing identifiers, with Salt having the largest share (9.56%) and MongoDB, Nova and Git the smallest (0%). Two projects (notably the Software Analysis Toolkit (SAT) and Rascal) were included in the data set as control groups. The software products included in the data set vary in size. MongoDB, the largest software system, accounted in June 2015 for 666,151 LOC, excluding blank lines and comments. In contrast, ServiceStack is the smallest software system within the data set, accounting for 141,711 LOC in June 2015. The figures were extracted by means of an analysis with the SAT.

The total number of retrieved commits is 169,246. For each individual commit we retrieved information related to the modifications made, including added lines of code, deleted lines of code, and the modified files and their status. The status indicates whether a file was added to the project directory, removed, modified or renamed. Furthermore, for each commit we retrieved the associated commit message, which describes the modifications introduced by the commit.

3.5 Data cleaning

Several steps have been taken to reduce the noise in the data set. First, in the control group (notably Rascal and SAT), we manually inspected all commits for missing committer and author identifiers. The inspection was done by querying the database for commits with null values assigned to the committer or author field. We replaced the null values with the corresponding identifiers. The list of corresponding identifiers was created by first manually inspecting each commit for identity traces (including names and e-mail addresses), and then issuing a request via the GitHub REST API for the identifier value. The percentage of missing commit identifiers was reduced by 86.13% in the Rascal data set; in the case of the SAT, the reduction was 100%. This process was not performed for the other software development projects because it would require manual inspection of more than 5739 commits.
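A sketch of this inspection query, assuming hypothetical table and column names in the MySQL schema described in section 3.3.

import mysql.connector  # assumes the MySQL Connector/Python package

# Hypothetical connection parameters, table and column names.
conn = mysql.connector.connect(user="miner", database="ghmirror")
cursor = conn.cursor()
cursor.execute(
    "SELECT sha, message FROM commits "
    "WHERE committer_id IS NULL OR author_id IS NULL"
)
for sha, message in cursor.fetchall():
    # Each hit is inspected manually for identity traces (names, e-mail
    # addresses) before issuing a GitHub API request for the identifier.
    print(sha, message[:60])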

Second, we manually inspected the data set for outliers in terms of auto-generated files. We identified these outliers by using descriptive statistics and quantile-quantile plots describing the amount of added and deleted LOC. Furthermore, in conjunction with the development team of the Rascal project, we identified four files and a directory in the master branch in which all files are auto-generated. These files account for 166,422 lines of code added and 154,741 lines of code deleted. Due to a lack of system knowledge when identifying auto-generated files in the other software systems, we considered information provided by the OpenHub service. OpenHub provides a comprehensive list of auto-generated files per repository. An alternative to the OpenHub service would require eliciting this information from members of the organisation developing each software system within our data set.


Chapter 4

Team size estimation

This chapter of the thesis describes a model aimed at estimating the size of the overall development team over time. The model does not distinguish between core development and co-development groups as presented in [9, 25, 26]. This deliberate choice was made due to the ambiguity of what constitutes a core developer as opposed to a co-developer.

When estimating the size of the overall development team, a limitation that could potentially distort the estimates has to be considered. By mining a single source of information (notably the Git version control system), only the activity of developers contributing to the code base can be extracted. The contributions of development roles such as software architects and requirements analysts are out of the scope of our estimations.

4.1 Model introduction

The model proposed in this thesis is based on retrieving data about the activity of developers from a source code management system of a software development project at hand.

To estimate the size of a development team, we first query the database for the commit history of a given repository, from 1 January 2011 until 31 March 2015. In order to eliminate potential noise in the retrieved commit history, we apply a straightforward commit characterisation procedure, which is explained in section 4.2. Then, the commit history is split into periods of equal duration. The duration of a period is one quarter, i.e. for each quarter we estimate the size of the development team. This is accomplished by applying the following procedure in the R statistical environment.

First, we calculate the number of developers contributing to a repository during each week of an observed quarter. Next, from the resulting distribution describing the number of developers contributing per week in the given quarter, we calculate the minimum, mode, and maximum value. These parameters serve as input to the stochastic Monte Carlo method [31, 24]. The Monte Carlo method is part of the model because it provides us with the likelihood of a certain estimate occurring. Section 4.3 describes the calculation process in detail.

Finally, the model produces estimates of the size of the development team during each quarter in terms of a lower and an upper bound value. Section 4.4 describes the output of the model in further detail.

4.2 Commit characterisation

Our team size estimation model does not assume equal weight of individual commit contributions. By manually inspecting a number of commits and their sets of source code changes, we observed that certain individuals commit 1 LOC changes to the source code. It is our belief that this pattern of behaviour would pose a threat to the overall estimations. Because of this, we classify commits into two categories: meaningful and trivial contributions. We define a trivial commit contribution as having 1 or fewer LOC changed. In contrast, we consider a commit to be meaningful if it has more than 1 LOC changed. Furthermore, the classification does not neglect the importance of small changes dealing with critical bug fixes. Therefore, a commit can only be classified as trivial if it fulfils the size criterion described above and does not contain the following keywords within its message: bug, fix, improvement.
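The classification rule can be summarised in a few lines; the commit record layout below is assumed for illustration.

# Keywords that exempt a small commit from the trivial category, as small
# changes may still deal with critical bug fixes.
EXEMPT_KEYWORDS = ("bug", "fix", "improvement")

def is_trivial(commit):
    """A commit is trivial iff it changes at most 1 LOC and its message
    contains none of the exempt keywords."""
    loc_changed = commit["added_loc"] + commit["deleted_loc"]
    message = commit["message"].lower()
    return loc_changed <= 1 and not any(k in message for k in EXEMPT_KEYWORDS)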

Project name    No. of trivial commits   % of trivial commits
Rascal          199                      2.47%
SAT             189                      3.52%
SonarQube       315                      2.26%
ServiceStack    187                      3.87%
BioFormats      211                      2.67%
Nova            357                      1.14%
MongoDB         476                      2.91%
Git             331                      2.21%
Gradle          689                      3.12%
Salt            2785                     5.56%
Total           5739                     3.39%

Table 4.2.1: Total number of commits identified as trivial

4.3 Identification of the number of developers

For each week during a given quarter, we calculate the number of developers contributing to a repository. This is done by calculating the number of unique author identifiers from the commit associated information. We have chosen to identify developers on the basis of author identifiers, because a committer might not necessarily be the author of certain changes. The output is a distribution containing approximately 14 weekly data points. Figure 4.3.1 shows an example of such a distribution.

From the distribution, we identify the minimum, mode, and maximum number of developers. For example, from the distribution depicted in Figure 4.3.1, Min = 2, Mode = 11, Max = 21. These parameters are used as input to a random triangular distribution in a Monte Carlo simulation. The reasoning behind applying a random triangular distribution is that the identified numbers of developers are distributed non-symmetrically. In fact, a Shapiro-Wilk test [29] yielded a significant p-value for each software system within the data set, allowing us to reject the null hypothesis that the data is normally distributed. We note that the random triangular distribution may not be identical to the identified distribution, since its shape is determined entirely by the three input parameters; to support this choice, a distribution fitting test should be applied.

Figure 4.3.1: Example distribution of the number of developers (y axis) identified per week (x axis) of a quarter

The Monte Carlo simulation randomly samples from the given triangular distribution 10,000 times. This produces a range representing the most and least certain values that may be considered for an estimate. We implement both the Monte Carlo method and the identification of the number of developers in the R statistical environment using the mc2d package. The values are further used when producing the final output of the model.

4.4 Output of the model

The output of our model is an estimate represented by a lower and an upper bound value. The lower bound value represents the most likely estimate, computed as the mode of the randomly sampled distribution. In contrast, the upper bound value is the least certain value, i.e. the value at the 97.5% level of the confidence interval produced by the Monte Carlo simulation. We present our estimates in terms of a lower and an upper bound value because we believe that providing a single point estimate would yield less accuracy in the overall model.
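Bringing sections 4.3 and 4.4 together, the following sketch reproduces the per-quarter estimation procedure in Python rather than in the thesis's R/mc2d setup; the weekly author sets at the end are invented example data.

import random
from statistics import mode

def estimate_team_size(weekly_author_sets, n_samples=10_000):
    """Estimate a (lower, upper) team size bound for one quarter."""
    # Number of unique author identifiers observed in each week.
    weekly_counts = [len(authors) for authors in weekly_author_sets]
    lo, md, hi = min(weekly_counts), mode(weekly_counts), max(weekly_counts)
    # Randomly sample a triangular distribution parameterised by min/mode/max.
    samples = [random.triangular(lo, hi, md) for _ in range(n_samples)]
    samples.sort()
    # Lower bound: the most likely estimate (mode of the sampled values,
    # rounded to whole developers); upper bound: the 97.5th percentile.
    lower = mode(round(s) for s in samples)
    upper = samples[int(0.975 * n_samples)]
    return lower, upper

# Invented example: author identifier sets for 13 weeks of one quarter.
weeks = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d", "e"}] + [{"a", "b", "c"}] * 10
print(estimate_team_size(weeks))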

4.5 Results

Table 4.5.1 shows the results of estimating the size of the development teams for the projects within the data set. The presented descriptive statistics were produced by calculating the mean value of the lower and upper bound estimates throughout all quarterly periods.

Project name   Min    Max    Mean   Median  Std. dev.  IQR
Rascal         3.31   7.25   5.36   5.39    13.62      3.93
SAT            4.65   10.12  7.29   7.29    1.34       5.47
SonarQube      2.53   7.64   4.36   3.76    1.55       5.11
ServiceStack   1.00   7.29   3.04   2.48    4.59       6.29
BioFormats     2.52   6.80   4.35   4.42    1.18       4.16
Nova           1.51   44.52  19.40  22.58   13.21      43.01
MongoDB        5.83   26.54  13.67  12.30   5.90       20.71
Git            8.15   18.17  12.31  11.57   2.33       10.52
Gradle         1.00   12.58  5.80   5.96    2.56       11.43
Salt           1.55   27.37  12.98  13.81   8.29       25.81

Table 4.5.1: Descriptive statistics of the development team size estimations, considering the mean value of the lower and upper bound estimates throughout all periods from 1 January 2011 until 31 March 2015, where the total number of periods N = 17

The results demonstrate that the majority of software systems within the data set are managed by a relatively small group of developers. The Nova software system accounts for the largest development team, with x̄ = 19.40 and max = 44.52. The interquartile range, IQR = 43.01, indicates great variation in the number of developers contributing to the software system throughout the observed periods.


The estimates for ServiceStack, SonarQube, and BioFormats indicate that these projects are managed by the smallest communities of developers. For example, ServiceStack accounts for the smallest community of developers, with x̄ = 3.04. Furthermore, the estimated mean team size of BioFormats is x̄ = 4.35, while for SonarQube x̄ = 4.36.

4.6 Validation

We validate the proposed development team size estimation model by means of correlation analysis. The validation has been performed by applying the proposed model to the control group development projects (notably Rascal and SAT).

For the correlation analysis, a comparison of the estimated and actual size of a development team was made. The actual size was obtained in terms of expert opinion from both the SIG and CWI; hence it is not an exact indication of the size of a team, but rather an approximation. The reason is that neither of the control group development projects enforces full-time-equivalent collaboration, but rather periodical engagement. Figure 4.6.1 shows a comparison of estimated and actual team size per development project. The correlation between the data was calculated by means of Spearman's rank correlation coefficient [30].

Figure 4.6.1: Comparison between estimated and actual team size per development project


For the Rascal development project, the results indicate a statistically non-significant correlation, with ρ = 0.45 and p-value = 0.0727. In the case of the SAT, ρ = 0.71 with p-value = 0.0017, indicating a strong association between the actual and estimated development team size.

4.7 Limitations

By manually inspecting the estimates produced for both control groups, and in con-junction with their development team members, the following limitations of the model were identified:

Infrequent commits Infrequent commits to a source code management system ob-scures an individuals contributions. An individual having a non-trivial amount of contributions will no be identified as a members of a development team if the contributions were not committed frequently. The frequency is yet to be determined. This limitations has been found in the Rascal development project. Inability to distinguish developers For instance, if the composition of a develop-ment team D is equal to D1⊆ {developer1, developer2, developer3} during one

week, and for a subsequent weekly period equal to D2 ⊆ {developer4, developer5,

developer6}, the estimated development team size will be equal to 3.

4.8 Discussion

In this chapter of the thesis we propose a model aimed at estimating the size of a development team per quarter.

An alternative could be to produce monthly estimates of the size of a development team. In fact, the initial model was aimed at estimations over such periods. In our opinion, estimating the size of a development team per month may produce estimates that are too low. The reasoning is that the distribution used for identifying the number of developers would not contain sufficient data points.

For example, based on a distribution d = {3, 7, 5, 2}, a lower bound estimate, i.e. the mode value, could not be calculated, because all values occur with equal probability. In contrast, the number of data points increases when producing quarterly estimates, and thus the likelihood of repeated values is higher.

4.9 Conclusion

In this chapter of the thesis we have proposed a model for estimating the size of a development team. The model produces quarterly estimates of team sizes by using commit history from the Git version control system. We claim the model is generally applicable to development projects independent of the Git version control system, but this requires further refinements (refer to Section 6.1.2).

A first introduction of the model within the SIG and CWI received positive feedback. However, doubts were raised about the importance of the Monte Carlo simulation in producing the final output. Accordingly, further evaluation of the model includes measuring its usefulness.

We evaluated the model in terms of correlation analysis. The results indicated a strong positive relation between estimated and actual team size. In addition, we are very interested in applying the model to software systems managed only by full-time-equivalent developers. This would provide us with further validity evidence, because the systems within the current data set enforce periodical developer engagement. For this, a data set containing more industrial software systems would be required.


Chapter 5

Team size and software quality

This chapter of the thesis seeks to explore the relation between the size of a development team and ratings of technical quality in regard to the SIG Maintainability Model (see Section 2.4). The relation will be explored at an individual software system level and at a pooled level. Technical quality will be measured both in terms of source code properties and in terms of the sub-characteristics of maintainability as defined by the ISO/IEC 25010 standard (see section 2.2). Measurements of the source code properties and the sub-characteristics of maintainability will be performed by means of the Software Analysis Toolkit (SAT) (refer to section 2.5).

We expect that with a significant increase in team size, ratings of technical quality will decrease. The reasoning behind this assertion is that an increase in size requires significant overhead, such as training, increased communication and coordination, and complexity of organisational structure [5, 21]. In addition, new team members have to acquire project-specific and domain-specific knowledge, meaning that initial implementations and system integrations may be inconsistent [5, 20].

However, we do not assume causality implied by the relation. We believe that acquiring empirical evidence of the relation would not unequivocally prove that having large teams leads to decreased technical quality. Rather, the empirical evidence would help researchers and practitioners to comprehend how team expansion can indirectly be a threat to software quality.


5.1 Context and methods

The size of a development team, as the independent variable, was acquired quarterly for each software system within the data set. The estimates were produced by applying the general procedure of the development team size estimation model presented in Chapter 4. For the correlation analysis, we used the mean value of the lower and upper bound estimates. This deliberate choice is not supported by grounded theory; we regard it as an arbitrary but logical choice.

Ratings of technical quality, as the dependent variables, were produced quarterly for each software system by means of the SAT, including ratings of the source code properties and the maintainability characteristic and sub-characteristics. For the calculation of the measurements, two inputs were required: source code revisions and scope files. A further elaboration of the two inputs is provided in sections 5.1.1 and 5.1.2.

The correlation between the two stated variables was explored by means of Spearman's rank correlation coefficient [30]. The Spearman rank correlation coefficient describes the proportion of the variance explained by the independent variable, indicating the strength of the correlation. The rationale behind applying Spearman's rank correlation coefficient is the assumption that it may provide us with more accurate results in the presence of prominent outliers than, for instance, the Pearson product-moment correlation coefficient [22].
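As an illustration of this analysis step, the sketch below computes Spearman's ρ and its p-value with SciPy for one invented series of quarterly observations.

from scipy.stats import spearmanr

# Invented quarterly observations for one system: mean team size estimates
# and the corresponding quality ratings produced by the SAT.
team_size = [3.2, 4.1, 4.8, 5.5, 6.9, 7.4, 8.8, 9.1]
quality_rating = [4.1, 4.0, 3.8, 3.9, 3.5, 3.4, 3.1, 3.2]

rho, p_value = spearmanr(team_size, quality_rating)
print(f"rho = {rho:.2f}, p = {p_value:.4f}")  # significant if p < 0.05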

5.1.1 Source code revisions

In order for the SAT to perform a static source code analysis, the code base of a software system is required. Source code revisions of the software systems within the data set were retrieved by developing a retrieval script in Python. The script clones a repository locally from the default branch of a given software system, and produces revisions of the code base. Because we estimate the size of a development team quarterly, the source code revisions were retrieved for the same period duration. Hence, the total number of revisions retrieved per software system is equal to 17.
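The core idea of the retrieval script can be sketched as follows, driving the git command line from Python; the repository URL, directory names and the assumption that the default branch is called master are placeholders.

import subprocess
from datetime import date

REPO_URL = "https://github.com/example/project.git"  # placeholder
subprocess.run(["git", "clone", REPO_URL, "project"], check=True)

# One revision per quarterly period, 17 periods from 2011 Q1 to 2015 Q1.
boundaries = [date(y, m, 1) for y in range(2011, 2016) for m in (4, 7, 10, 1)]
for boundary in sorted(boundaries)[:17]:
    # Resolve the last commit on the default branch before the period boundary.
    sha = subprocess.run(
        ["git", "rev-list", "-1", f"--before={boundary.isoformat()}", "master"],
        cwd="project", capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Materialise that revision into a separate directory for the SAT analysis.
    subprocess.run(["git", "worktree", "add", f"../rev-{boundary}", sha],
                   cwd="project", check=True)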

5.1.2 Scope file

A scope file specifies the scope of the analysis of a software system, and therefore requires a specification of the programming languages and architectural components to be analysed. This manual specification is required because automated identification of programming languages and architectural components is not possible. Not specifying architectural components for each software system would bias the source code property metrics, leading to inaccurate ratings of the sub-characteristics of maintainability.

In order to specify the programming languages used in the development process of a software system, we used the data acquired by the retrieval tool (see section 3.3). Furthermore, we identified a set of directory paths containing architectural components, based on the last source code revision (notably 1st quarter 2015) of each software system, by means of the expert opinion of technical consultants from the SIG. The disadvantage of identifying a fixed set of directory paths containing architectural components is that the architectural decomposition of a software system might have changed over time. Figure 5.1.1 shows an example scope file.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<system customerName="SIG" name="SAT" projectCode="X2015-0000">
  <scope baseDir="/code">
    <exclude pattern=".*/lib/.*" />
    <exclude pattern=".*/Parser/.*" />
    <components depth="3" />
    <language name="Python">
      <source overrideDefaultFilters="false" />
      <test overrideDefaultFilters="false" />
    </language>
  </scope>
</system>

Figure 5.1.1: Scope file specifying the scope of a system analysis, with Python as the only programming language to be analysed and a component depth equal to 3


5.2 Results for source code properties

Table 5.2.1 shows the results of the correlation analysis between development team size and the source code properties defined by the SIG Maintainability Model. The correlation coefficients were calculated at a pooled system level. In addition, Appendix A presents scatter plots of the relation.

Development team size vs.    ρ       p-value
Volume                      -0.22    0.00
Duplication                  0.20    0.00
Unit size                   -0.25    0.00
Unit complexity             -0.24    0.00
Unit interfacing            -0.50    0.00
Module coupling             -0.37    0.00
Component balance           -0.18    0.02
Component independence      -0.40    0.04

Table 5.2.1: Development team size and source code properties correlation results, considering all software products within the data set, N = 170

The results indicate that all correlations are statistically significant, with p < 0.05. The strongest negative correlation was found between development team size and unit interfacing, with ρ = -0.50, indicating that as the size of a development team increases, the overall unit interfacing rating decreases. Contrary to our assertion, the correlation between team size and duplication is positive, with ρ = 0.20.

Table 5.2.2 presents the results of the correlation analysis between team size and source code properties at an individual system level. The strongest correlations at an individual level were found in the Salt, Nova and Gradle development projects. In contrast, neither Rascal nor Git accounts for statistically significant p-values. A possible explanation of these findings is that the Salt, Nova and Gradle development projects exhibit among the largest changes in their respective team sizes, while Rascal exhibits the least significant changes (see Table 4.5.1). This explanation conforms to the finding of Meneely et al. that a significant change in team size negatively associates with system quality [20].

Project       Volume  Unit size  Unit complexity  Unit interfacing  Duplication  Module coupling  Comp. balance  Comp. independence
Rascal        -       -          -                -                 -            -                -              -
SAT           -       -0.48      -                0.51              -            -                -              -0.44
SonarQube     -0.53   -          -                -0.76             0.82         -0.96            0.68           -0.81
ServiceStack  -       -          -0.57            -0.62             0.58         -                0.51           -
BioFormats    -       -          -0.61            -0.54             0.61         0.71             -              0.54
Nova          -0.81   -          -0.65            -0.80             -            -                -0.82          -
MongoDB       -0.50   0.72       0.75             0.80              -0.40        -                -              -
Git           -       -          -                -                 -            -                -              -
Gradle        -0.95   -0.92      -                -                 -0.94        -                -0.90          -
Salt          -0.99   -0.91      -0.95            -0.94             -0.80        -                -0.98          -

Table 5.2.2: Spearman's rank correlation coefficients between development team size and source code properties. An empty cell indicates a statistically non-significant p-value, with p > 0.05

By examining the underlying metrics determining the source code properties, a few additional patterns associated with an increase in team size were identified across several software systems. The overall percentage of high risk source code units with a McCabe cyclomatic complexity [19] value above 11 increases. The overall percentage of high risk source code units measuring 31 LOC or more increases. The level of encapsulation decreases as the size of a development team increases. The thresholds presented above were defined by the SIG Maintainability Model and Bouwers [4].

5.3 Results for sub-characteristics

Table 5.3.1 shows the results of the correlation analysis between development team size and the ratings of the maintainability characteristic and sub-characteristics. The correlation coefficients were calculated by considering all software systems within the data set. In addition, Appendix A shows the scatter plots of the relation.

Development team size vs.    ρ       p-value
Analysability                -       0.80
Modifiability                -       0.18
Testability                 -0.25    0.00
Modularity                  -0.36    0.00
Reusability                 -0.30    0.00
Maintainability             -0.24    0.00

Table 5.3.1: Development team size and sub-characteristics of maintainability correlation results, considering all software products within the data set, N = 170

Analysability and modifiability ratings are not significantly correlated with the development team size. The ratings of testability, modularity, reusability and maintainability are negatively correlated with the size of a development team.

                             Rascal            SAT               Git
Development team size vs.    ρ      p-value    ρ      p-value    ρ      p-value
Analysability               -0.55   0.01       -      0.75       -      0.16
Modifiability                0.46   0.05       -      0.88       0.59   0.01
Testability                  -      0.36      -0.48   0.02       -      0.18
Modularity                  -0.55   0.01       -      0.21       -      0.17
Reusability                 -0.54   0.02       -      0.90       -      0.64
Maintainability              -      0.06      -0.43   0.04       -      0.06

Table 5.3.2: Results of the correlation analysis between development team size and the maintainability characteristic and sub-characteristics at an individual system level. An empty cell indicates a statistically non-significant p-value

Table 5.3.2 shows the results of the correlation analysis between team size and the maintainability characteristic and sub-characteristics at an individual software system level. Software systems without at least one statistically significant correlation are not included in the table.


As depicted in Table 5.3.2, the strongest significant correlations were found in the Rascal development project. The results indicate that the ratings of analysability, modularity and reusability decreased with an increase in development team size. However, the SAT and Git development projects account for significance only in the cases of testability and modifiability.

5.4 Discussion

The duplication source code property correlated positively with team size in the case of a few software systems, meaning that with an increase in team size the percentage of redundant LOC decreases. In addition, these systems account for uniformly distributed small team sizes. On the other hand, duplication correlated negatively within the systems accounting for the largest changes in their team sizes. A possible explanation is that a significant increase over short periods can lead to inconsistent implementations, due to unequal levels of domain-specific and project-specific knowledge [20].

A few software systems accounted for a significant negative correlation with component balance. This result indicates that with an increase in team size, the overall volume of a system is less uniformly distributed over its architectural components. In the case of component independence, only two software systems accounted for a strong negative correlation. We claim this result is due to an inadequate process of identifying architectural components.

An alternative to identifying architectural components as described in section 5.1.2 would require eliciting this information from members of the developing organisation. However, this would require deliverables from each developing organisation containing a list of architectural components throughout the entire development life cycle.

At a pooled level, the ratings of the maintainability characteristic and sub-characteristics correlated significantly with development team size. We claim the statistical significance was due to the large sample size, with N = 170. This claim is supported by the fact that the majority of software systems account for a non-significant p-value in relation to the ratings of the maintainability characteristic and sub-characteristics.

Furthermore, we claim that the distinct correlation outcomes between ratings of source code properties and sub-characteristics can be explained by the mapping of the SIG Maintainability Model. The model does not imply binary mappings of properties to sub-characteristics, but assigns weights according to certain technologies and properties [8]. Because of this, the ratings of sub-characteristics might not be identical to the ratings of source code properties, hence the distinct correlations.

Finally, the strong negative correlations in regard to a few software systems are explainable by a selection bias. Some of the development projects began at the first period of our observation with an insignificant amount of LOC. For example, the Salt software system accounted for 3079 LOC at the first period of observation. Consequently, its technical quality ratings were extremely high; as the system evolved over time, the ratings decreased, hence the significant negative correlations.

5.5 Conclusion

In support of RQ2.1, this chapter of the thesis has demonstrated that development team size correlates negatively with source code properties. Several source code properties demonstrated a significant negative correlation, whereas duplication correlated positively. At an individual system level, the least significant correlations were found for module coupling and component independence. We claim that the component related source code properties were negatively influenced by the inability to identify architectural components for all revisions of the software systems within the data set.


In support of RQ2.2, section 5.3 showed, at a pooled system level, a significant negative correlation between team size and the maintainability characteristic and sub-characteristics. We claim the statistical significance of the relation at a pooled level was due to sample size, and not the strength of the relation. At an individual system level, a weak significant correlation was found for only a few sub-characteristics of maintainability. Hence, we conclude that development team size does not correlate with ratings of the maintainability characteristic and sub-characteristics.


Chapter 6

Conclusion

In the preceding chapters of this thesis we answered a number of research questions. The main contributions of this thesis are:

Team size estimation model We designed a model for estimating the size of a development team by considering the commit history of a Git version control system. We claim the model is generally applicable to the majority of version control systems, but further refinements would be required (see Section 6.1.2). The model has been validated in terms of correlation analysis on the two development projects of the control group. The results indicated a strong association between the estimates produced by the model and the actual team sizes. Furthermore, we identified certain limitations of the model, concerning the frequency of commits and the inability of the model to distinguish individual developers.

Team size and software quality In support of RQ2, we provided empirical evidence of the relation between development team size and technical quality, as measured by the SIG Maintainability Model. The correlation analysis for RQ2.1 demonstrated a strong negative relation between team size and several source code properties. The correlation analysis for RQ2.2 demonstrated a strong negative relation of team size to the maintainability characteristic and sub-characteristics of the ISO/IEC 25010, but the majority of correlations at an individual system level were not significant. Hence, we conclude that, based on the selected software products, no relation supporting RQ2.2 was found.


6.1 Threats to validity

There are several threats to the validity of our findings. These threats fall into three categories, as proposed by Perry et al. [23]: internal validity, external validity, and construct validity.

6.1.1 Internal validity

Lack of author identifiers The lack of author identifiers may result in the inability to detect a certain number of developers that contributed to a software development project. Hence, the estimates produced by the development team size model may be too low.

Enforcement of technical quality measurements A development project A composed of an identical number of developers as project B may exhibit distinct quality ratings if A or B continuously employs technical quality measurements. Hence, the change in the dependent variable is likely to differ due to internal processes.

Identification of architectural components Incorrect identification of architectural components biases architecture-related source code metrics, and hence the ratings of the sub-characteristics of maintainability. Thus, part of the change in the dependent variable is due to the inability to determine architectural components for each source code revision.

6.1.2 External validity

Generalisation to industrial software systems All software systems within the data set, except for the SAT, are developed as open source projects. To the best of our knowledge, open source development projects, like industry-driven ones, enforce quality assurance techniques such as code reviews, or testing prior to pushing a set of changes to a central repository. Furthermore, the SAT did not contradict the patterns found in the open source projects. Because of this, we expect that the results of RQ2 can, to some extent, be generalised to industrial systems.

Generalisation to version control systems All software systems within the data set are managed via the Git version control system. Unlike Git, version control systems such as Subversion or Concurrent Versions System (CVS) do not distinguish the author of a set of changes from its committer. Consequently, the team size estimation model is applicable to version control systems other than Git only with further refinements.
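The distinction Git makes between the author and the committer of a change can be observed directly in the commit metadata. The following sketch, assuming it is run inside a cloned repository, lists the commits where the two identities differ:

```python
import subprocess

# Git records the author (who wrote the change) separately from the
# committer (who applied it); Subversion and CVS record only one identity.
# %an/%ae = author name/email, %cn/%ce = committer name/email.
log = subprocess.run(
    ["git", "log", "--pretty=format:%an <%ae>|%cn <%ce>"],
    capture_output=True, text=True, check=True,
).stdout

for line in log.splitlines():
    author, committer = line.split("|", 1)
    if author != committer:
        print(f"authored by {author}, committed by {committer}")
```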

6.1.3 Construct validity

Ignorance of pull requests In open source projects, an individual without direct access to a central repository contributes his or her set of source code changes via a pull request. A member of the developing organisation reviews the changes and decides whether to submit them to a central repository branch. The proposed development team size estimation model does not consider the activity of individuals contributing to the development process by reviewing code; hence, estimates might be incorrect.

Distinct development roles The model does not consider development roles such as software architects. The reasoning behind this is that source code management systems do not provide activity traces of development roles other than developers contributing source code changes.

Time window For the correlation analysis we measured technical quality quarterly. Quarters might not be a strong indication of changes in the corresponding dependent or independent variable. Instead, the minimum and maximum time window of the most significant changes in the variables should be considered.


6.2 Future work

This section of the thesis outlines possible future extensions of the current study, categorised as: extending the data set, improving the methods, and further investigations.

6.2.1 Extending the data set

Add software systems with large development teams The strongest significant correlations between team size and technical quality were found in large development projects. In order to generalise these conclusions, more software systems with large development teams are required. As can be observed in Figure 6.2.1, throughout all observed periods the systems were estimated to be managed by small groups of developers.

Increasing the number of software systems More software systems would extend the strength and generalisability of the relation between the size of a development team and technical quality. A further investigation of the relationship could be performed on approximately 100 systems.

[Figure 6.2.1: Frequency distribution of estimated team sizes throughout all periods from 1st of January 2011 to 31st of March 2015 of all software systems, N = 170. Horizontal axis: estimated number of developers (mean lower and upper bound value); vertical axis: frequency.]


6.2.2 Improving the methods

GitHub API independence Instead of retrieving commit history via the GitHub API, a more independent and sufficient method would be to extract the history from the metadata of a cloned (forked) Git repository. This approach would enable the retrieval of commit history without being subject to a limited number of requests per hour. A sketch of this approach is given below.
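As a minimal sketch, assuming a repository has already been cloned into a local directory named repo, commits per author can be counted directly from the local metadata:

```python
import subprocess
from collections import Counter

# Extract the full commit history from a locally cloned repository instead
# of the GitHub API; no rate limit applies to local metadata.
log = subprocess.run(
    ["git", "-C", "repo", "log", "--pretty=format:%ae"],
    capture_output=True, text=True, check=True,
).stdout

commits_per_author = Counter(log.splitlines())
print(commits_per_author.most_common(10))
```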

Weight for contributions Relative weighting of contributions would strengthen the commit characterisation step of the team size estimation model. Instead of classifying commits as trivial, a relative weight of a developer's contribution would determine his or her impact on a development project. Hence, developers with a low weighted impact would not be counted as team members.
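A possible realisation of such weighting, sketched below with illustrative churn figures and a hypothetical 1% cut-off, would express each developer's weight as his or her share of the total churn in a period:

```python
# Hypothetical relative weighting: a developer's weight is his or her share
# of the total churn (added + deleted LOC) in a period. The churn figures
# and the 1% cut-off below are illustrative, not validated thresholds.
churn_per_dev = {"alice": 5200, "bob": 340, "carol": 45, "dave": 12}
total_churn = sum(churn_per_dev.values())

weights = {dev: churn / total_churn for dev, churn in churn_per_dev.items()}
team = [dev for dev, w in weights.items() if w >= 0.01]

print(weights)  # e.g. alice ~0.93, dave ~0.002
print(team)     # developers below the cut-off are not counted as members
```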

Time window The technical quality of the software systems within the data set could be measured by determining the minimum and maximum time window of the most significant changes in technical quality ratings. This time window could be determined by considering systems within the data set of the SIG.

Identification of architectural components Instead of determining the architectural decomposition from a single source code revision, architectural components could be specified for each source code revision. Consequently, the component-related metrics would be more accurate.

Distribution fitting The input of the Monte Carlo simulation is a triangular distribution parameterised by the minimum, mode and maximum value. However, a distribution fitting method should be applied to verify this choice, as sketched below.
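As a sketch of the current input, the following draws team size samples from a triangular distribution with illustrative (minimum, mode, maximum) parameters; a fitted alternative distribution could then be compared against this assumption:

```python
import numpy as np

rng = np.random.default_rng(42)

# Monte Carlo draw from a triangular distribution over the estimated team
# size; the (minimum, mode, maximum) parameters below are illustrative.
low, mode, high = 3, 5, 12
samples = rng.triangular(low, mode, high, size=100_000)

# A fitted alternative (e.g. via scipy.stats goodness-of-fit tests) could
# then be compared against the triangular assumption.
print(samples.mean())               # analytic mean = (3 + 5 + 12) / 3 ≈ 6.67
print(np.percentile(samples, [5, 50, 95]))
```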

6.2.3 Further investigations


A natural further investigation concerns the relation between development team size and changes in productivity. In fact, we performed a preliminary investigation of this relationship. We hypothesised that an increase in the size of a development team beyond a certain point decreases project productivity. Project productivity was measured as:

\text{Project productivity} = \frac{\text{added LOC} + \text{deleted LOC}}{\text{number of active days}}

where the number of active days represents the effort, and added LOC + deleted LOC the output. The results showed no correlation. In our opinion, the reasoning behind this outcome lies in the difficulty of measuring the variables that influence and determine productivity. Based on an extensive literature study, we identified more than 20 such variables.
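For reference, this productivity proxy can be computed from a cloned repository's history; the sketch below assumes a local clone in a directory named repo and skips binary files, which git log --numstat reports with "-" counts:

```python
import subprocess

# Compute (added LOC + deleted LOC) per active day from a local clone.
log = subprocess.run(
    ["git", "-C", "repo", "log", "--numstat",
     "--pretty=format:DATE %ad", "--date=short"],
    capture_output=True, text=True, check=True,
).stdout

active_days, churn = set(), 0
for line in log.splitlines():
    if line.startswith("DATE "):
        active_days.add(line[5:])          # one entry per commit date
    elif line and line[0].isdigit():       # numstat row: added, deleted, path
        added, deleted, _ = line.split("\t", 2)
        churn += int(added) + int(deleted)

print(churn / len(active_days))  # added + deleted LOC per active day
```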


Acknowledgements

Ever since I signed up for an undergraduate course in Zagreb, I have had a great desire to obtain an MSc abroad. And yet, here I am, writing the final words of this thesis within the inspiring environment of the Software Improvement Group (SIG), as a master student of the University of Amsterdam. However, it would have been an even bumpier ride without the support of certain people.

First and foremost, I would like to show my gratitude to Magiel Bruntink for supervising and inspiring this research. His advice, knowledge and professionalism not only kept the research on track, but also shaped my present knowledge and future interests.

Secondly, I would like to thank Dennis Bijlsma from the SIG for his major contribution to the research, and especially for his professional advice, friendliness and willingness to help.

Furthermore, I would like to express my appreciation to Joost Visser, Haiyun Xu, Rob van der Leek and all colleagues from the SIG. Their helpfulness, friendliness and knowledge prompted me throughout the research to always seek perfection and quality. I would also like to thank Gregorio Robles and Georgios Gousios for sharing their personal research experience and knowledge.

Last but not least, I would like to express my gratitude to my caring family, who always supported and believed in me.


References

[1] Abdulkareem Alali, Huzefa H. Kagdi, and Jonathan I. Maletic. What's a typical commit? A characterization of open source software repositories. In The 16th IEEE International Conference on Program Comprehension, ICPC 2008, Amsterdam, The Netherlands, June 10-13, 2008, pages 182–191, 2008.

[2] Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. German, and Prem Devanbu. The promises and perils of mining git. In Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories, MSR '09, pages 1–10, Washington, DC, USA, 2009. IEEE Computer Society.

[3] B.W. Boehm. Software engineering economics. Software Engineering, IEEE Transactions on, SE-10(1):4–21, Jan 1984.

[4] Eric Bouwers. Metric-based Evaluation of Implemented Software Architectures. PhD thesis, Delft University of Technology, 2013.

[5] Frederick P. Brooks, Jr. The Mythical Man-month (Anniversary Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[6] A. Capiluppi, P. Lago, and Maurizio Morisio. Characteristics of open source projects. In Software Maintenance and Reengineering, 2003. Proceedings. Seventh European Conference on, pages 317–327, March 2003.

[7] S. D. Conte, H. E. Dunsmore, and V. Y. Shen. Software Engineering Metrics and Models. Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1986.

[8] José Pedro Correia, Yiannis Kanellopoulos, and Joost Visser. A survey-based study of the mapping of system properties to ISO/IEC 9126 maintainability characteristics. In ICSM, pages 61–70. IEEE Computer Society, 2009.

[9] Kevin Crowston and James Howison. The social structure of free and open source software development. First Monday, 10(2), 2005.

[10] Robert L. Glass. Facts and fallacies of software engineering. Addison-Wesley, 2003.

[11] Jesus M. Gonzalez-Barahona, Gregorio Robles, and Israel Herraiz. Impact of the creation of the mozilla foundation in the activity of developers. In Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR ’07, pages 28–, Washington, DC, USA, 2007. IEEE Computer Society.

[12] Jesus M. Gonzalez-Barahona, Gregorio Robles, and Daniel Izquierdo-Cortazar. The MetricsGrimoire database collection. In MSR ’15: Proceedings of the 12th Working Conference on Mining Software Repositories, pages 478–481. IEEE, 2015.

[13] Georgios Gousios. The GHTorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pages 233–236, May 2013.

[14] Georgios Gousios and Diomidis Spinellis. GHTorrent: GitHub's data from a firehose. In Michael W. Godfrey and Jim Whitehead, editors, MSR '12: Proceedings of the 9th Working Conference on Mining Software Repositories, pages 12–21. IEEE, June 2012.

[15] Georgios Gousios and Andy Zaidman. A dataset for pull-based development research. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pages 368–371, New York, NY, USA, 2014. ACM.

[16] I. Heitlager, T. Kuipers, and J. Visser. A practical model for measuring maintainability. In Quality of Information and Communications Technology, 2007. QUATIC 2007. 6th International Conference on the, pages 30–39, Sept 2007.

[17] Israel Herraiz, Daniel Rodriguez, Gregorio Robles, and Jesus M. Gonzalez-Barahona. The evolution of the laws of software evolution: A discussion based on a systematic literature review. ACM Comput. Surv., 46(2):28:1–28:28, 2013.

[18] ISO/IEC. IEEE systems and software engineering - systems and software quality requirements and evaluation (SQuaRE) - system and software quality models, 2011.

[19] T.J. McCabe. A complexity measure. IEEE Transactions on Software Engineering, 2(4):308–320, 1976.

[20] Andrew Meneely, Pete Rotella, and Laurie Williams. Does adding manpower also affect quality?: An empirical, longitudinal analysis. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE ’11, pages 81–90, New York, NY, USA, 2011. ACM.

[21] Nachiappan Nagappan, Brendan Murphy, and Victor Basili. The influence of organizational structure on software quality: An empirical case study. In Proceedings of the 30th International Conference on Software Engineering, ICSE '08, pages 521–530, New York, NY, USA, 2008. ACM.

[22] Karl Pearson. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58:240–242, 1895.

[23] Dewayne E. Perry, Adam A. Porter, and Lawrence G. Votta. Empirical studies of software engineering: A roadmap. In Proceedings of the Conference on The Future of Software Engineering, ICSE ’00, pages 345–355, New York, NY, USA, 2000. ACM.

[24] Samik Raychaudhuri. Introduction to Monte Carlo simulation. In Simulation Conference, 2008. WSC 2008. Winter, pages 91–100, Dec 2008.

[25] G. Robles, J.M. Gonzalez-Barahona, and I. Herraiz. Evolution of the core team of developers in libre software projects. In Mining Software Repositories, 2009. MSR ’09. 6th IEEE International Working Conference on, pages 167–170, May 2009.

[26] Gregorio Robles, Jesús M. González-Barahona, Carlos Cervigón, Andrea Capiluppi, and Daniel Izquierdo-Cortázar. Estimating development effort in free/open source software projects by mining software repositories: A case study of OpenStack. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pages 222–231, New York, NY, USA, 2014. ACM.

[27] Gregorio Robles and Jesús M. González-Barahona. Contributor turnover in libre software projects. In Ernesto Damiani, Brian Fitzgerald, Walt Scacchi, Marco Scotto, and Giancarlo Succi, editors, Open Source Systems, volume 203 of IFIP International Federation for Information Processing, pages 273–286. Springer US, 2006.

[28] Gregorio Robles, Stefan Koch, and Jesús M. González-Barahona. Remote analysis and measurement of libre software systems by means of the cvsanaly tool. In 2nd ICSE Workshop on Remote Analysis and Measurement of Software Systems (RAMSS), pages 51–55, 2004.

[29] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete samples). Biometrika, 52(3/4):591–611, 1965.

[30] C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.

[31] N.T. Thomopoulos. Essentials of the Monte Carlo Simulation. Springer-Verlag New York, 2013.
