Success in Open Source Projects Exploratory research towards the relation between releases and growth in external pull requests

Academic year: 2021

University of Amsterdam

Master Software Engineering

Success in Open Source Projects

Exploratory research towards the relation between releases and

growth in external pull requests

Author:

Cindy Berghuizen

UVA Supervisor:

Magiel Bruntink

Itude Mobile B.V. Supervisor:

Pjotter Tommassen


Preface

This thesis concludes the Master in Software Engineering at the Faculteit der Natuurwetenschappen, Wiskunde en Informatica of the University of Amsterdam. The work was carried out at Itude Mobile B.V. over a period of five months, in the context of the recently open-sourced MOBBL framework developed by Itude Mobile B.V.

I want to thank the following people.

• Magiel: for the new insights, hints, directions and motivation boosts
• Itude: for providing a place to work and the mobile mondays
• Ingrid: for kicking my ass so that I would finish my thesis on time

Abstract

More and more companies are putting their proprietary software on Open Source platforms like Github. One way to contribute to such a project is by submitting a pull request. A questionnaire was sent to 51 projects, which yielded 21 useful responses. Based on this preliminary research, the hypothesis was formed that there might be a relation between the growth of a project in terms of the external pull requests it receives and the releases the project does. A more extensive study of 314 projects was done to investigate the relation between the total number of releases, major/minor releases, the release frequency and the external growth coefficient of projects. Spearman correlation and ANOVA tests did not show a statistically significant relation between releases and the growth coefficient. Eight projects with a high growth coefficient (>2.5) were then examined in more detail. This analysis also did not indicate a relation between the growth coefficient and the releases of a project.


Contents

1 Introduction
1.1 Context
1.2 Motivation
1.3 Problem description
1.4 Outline
2 Background
2.1 Licenses
2.2 The Open Source project life cycle
2.3 Contribution strategies
2.3.1 Shared repository
2.3.2 Pull-based
2.4 Participation
2.4.1 The Open Source community
2.4.2 Developer motivation
2.5 Releases
2.5.1 Release types
2.5.2 Versioning
2.5.3 Release strategies
2.5.4 Gathering release information
3 Problem description
3.1 Research goal
3.2 Previous research on success factors
3.3 Preliminary research
3.3.1 Data
3.3.2 Research method
3.3.3 Results
3.4 Research question
4 Research method
4.1 Research approach
4.2 Data collection
4.3 Data selection
4.4 Data analysis
5 Results
5.1 Description of the data
5.2 Results of statistical tests
5.2.1 Spearman
5.3 Outliers
6 Discussion
6.1 Statistics
6.2 Outliers
6.3 Related work
6.4 Limits and Validity
7 Conclusion
7.1 Conclusion
7.2 Future work
7.3 Recommendations MOBBL
A Survey questions
B Survey results
B.1 Users
B.1.1 Past
B.1.2 Present
B.2 Contributors
B.2.1 Past
B.2.2 Present
B.3 Pull requests
C Outlier graphs


List of Figures

2.1 The onion model of Open Source communities
2.2 Taxonomy of human motivation [27]
3.1 Example graph of a preliminary research project
4.1 Example of a growing project in terms of external pull requests (cumulative non member)
4.2 Example of a decreasing project in terms of external pull requests (cumulative non member)
5.1 Histogram of number of releases a project does
5.2 Histogram of number of major and minor releases a project does
5.3 Histogram of the average releases per week a project does
5.4 Histogram of the coefficient of external pull requests of a project
5.5 Graph showing in which weeks releases are done
5.6 Showing in which weeks major and minor releases are done
5.7 Scatterplot of External growth coefficient vs Total releases
5.8 Scatterplot of External growth coefficient vs Release frequency
5.9 Scatterplot of External growth coefficient vs total external releases
A.1 Cumulative pull requests
A.2 Pull requests per month
A.3 Distinct committers per month
C.1 Cumulative pull requests of Liferay-portal
C.2 Quadratic regression of Liferay-portal
C.3 Barplot of pull requests per month of Liferay-portal
C.4 Cumulative pull requests of Wordpress-Android
C.5 Quadratic regression of Wordpress-Android
C.6 Barplot of pull requests per month of Wordpress-Android
C.7 Cumulative pull requests of Wordpress-iOS
C.8 Quadratic regression of Wordpress-iOS
C.9 Barplot of pull requests per month of Wordpress-iOS
C.10 Cumulative pull requests of cocos2d-x
C.11 Quadratic regression of cocos2d-x
C.12 Barplot of pull requests per month of cocos2d-x
C.13 Cumulative pull requests of contrail-controller
C.14 Quadratic regression of contrail-controller
C.15 Barplot of pull requests per month of contrail-controller
C.16 Cumulative pull requests of hhvm
C.18 Barplot of pull requests per month of hhvm
C.19 Cumulative pull requests of opencv
C.20 Quadratic regression of opencv
C.21 Barplot of pull requests per month of opencv
C.22 Cumulative pull requests of cphalcon
C.23 Quadratic regression of cphalcon


List of Tables

5.1 Summary of the data
5.2 Spearman correlation for coefficients
5.3 ANOVA results
5.4 Projects with an external growth coefficient of more than 2.5
B.1 Answers given to how users were attracted in the past
B.2 Answers to question how users are currently attracted
B.3 Answers to question how contributors were attracted in the past
B.4 Answers to question how contributors are currently attracted


Chapter 1

Introduction

1.1 Context

This research was performed in the context of the MOBBL framework, developed by Itude Mobile. They recently open-sourced this framework and are looking for ways to increase participation.

Itude was founded in 1992 and originally specialized in the architecture and development of mission-critical systems for large organisations. Itude Mobile spun off in 2005 and became a separate company in 2009.

Currently, the focus of Itude Mobile B.V. lies in creating custom mobile applications in the domains of financial services, logistics and health.

Itude Mobile B.V. develops and uses the MOBBL framework¹, which it built in-house. MOBBL is a software framework that makes cross-platform development of native applications, for example for iOS and Android, easier and faster. MOBBL uses the traditional Model-View-Controller pattern. The code in the Model layer can be reused entirely when developing for another platform. Up to 70% of the Controller layer and up to 30% of the View layer can be reused [16]. Itude Mobile B.V. currently invests time and money in developing and maintaining MOBBL, and wants to reduce these costs by making MOBBL Open Source. This research focuses on gaining insight into Open Source projects and on finding factors that might help MOBBL become a successful Open Source project.

1.2 Motivation

It is becoming increasingly popular for companies to open-source their projects [17]. Open Source software is software distributed under a license that complies with the Open Source Definition², allowing access to its source code, free redistribution, the creation of derived works, and unrestricted use [1].

Of course everyone wants their project to be successful, whether that means having a lot of users, having a lot of contributors or reaching a larger public. However, just putting a project on a website like SourceForge or Github does not make it an instant success. Research into Open Source projects and how to make them a success has been extensive, but also inconclusive.

A few years ago Github introduced the pull-based system, where a developer can fork a project, modify the code and send the modified code back by making a so-called pull request. This makes it easier to cherry-pick code and facilitates code review.

¹ www.mobbl.org
² www.opensource.org


This research looks at the growth in pull requests a project receives in relation to the releases the project does, with a focus on pull requests from external developers who are not team members, hereafter called external pull requests. It is a quantitative exploratory study that looks for a correlation between the releases of a project and the success of the project in terms of growth in external pull requests.

1.3 Problem description

Itude Mobile B.V. has open-sourced MOBBL and put the source code on Github³. External developers can now contribute by forking the project, making changes to the code and submitting a pull request. Although Itude Mobile B.V. keeps developing and using MOBBL, it intends to attract external users and contributors. External users will be valuable in discovering missing features and finding bugs. External developers can help improve MOBBL, whether by fixing bugs or by implementing or improving features. Itude Mobile B.V. can spend fewer resources on MOBBL than it currently does if it is aided by the Open Source community.

Open Sourcing a project is not a magic bullet for attracting tons of users and contributors. The problem addressed in this research is finding out what factors contribute to the success of a project and what measures can be taken to make MOBBL a success.

1.4 Outline

The rest of this thesis is organized as follows. Chapter 2 gives background information about research already done in the domain of Open Source projects. Chapter 3 describes the preliminary research, the research goal and the research question. Chapter 4 describes the research method. The results are presented in chapter 5. Chapter 6 discusses the results and explains the limits and validity of the research. A conclusion is given in chapter 7.


Chapter 2

Background

This chapter provides background information on Open Source projects. Section 2.1 describes the most common licenses used in Open Source software. Section 2.2 gives background information about the life cycle of an Open Source project. Section 2.3 describes the two most common ways to contribute to an Open Source project: via shared repositories or via the pull-based system.

Developer participation is discussed in section 2.4. An overview of how the Open Source community works is given in section 2.4.1 and developer motivation is discussed in section 2.4.2. Section 2.5 provides information about releases in Open Source projects.

2.1 Licenses

Open Source Software is software distributed under a license that complies with the Open Source Definition [24], allowing access to its source code, the right to distribute it freely, the creation of derived works, and unrestricted use [1].

According to some studies, the type of license influences the success of an Open Source project. Subramaniam et al. find that a project with a more restrictive license has fewer developers but attracts more interest from non-developers [31]. Sen et al. find that projects with a semi-restrictive license have fewer subscribers and developers than projects with another license [29]. There are three main types of license an Open Source project can have: GPL, LGPL and BSD. They are described in the following paragraphs.

GPL

GPL stands for GNU General Public License and is the most restrictive. In a nutshell GPL states that you are allowed to use, redistribute and change the software, but any changes made should also be licensed under GPL.

LGPL

The Lesser General Public License is less restrictive than the GPL. When non-GPL software is linked to an LGPL library, for example, it does not have to be distributed under the LGPL license. You still need to provide the original source code under the LGPL license, but you are allowed to link it with private software.

BSD

BSD stands for Berkeley Software Distribution and is the least restrictive license. It basically says: here is the source code, do whatever you want with it. Software under this license may also be taken and turned into proprietary software.

2.2 The Open Source project life cycle

More and more companies are releasing their proprietary software as Open Source software [17]. This involves more than just putting the software on a website like Sourceforge, BitBucket or Github. Kilamo et al. [17] introduce the OSCOMM framework to create an ecosystem for OSS projects. This framework consists of three phases. The first phase evaluates the readiness of the project for being opened; its goal is to identify possible bottlenecks and problems in the software. In practice, phase one is a checklist of things to consider when deciding to make a proprietary project Open Source. The second phase, called Open Source engineering, focuses on removing the bottlenecks and potential problems discovered in the first phase and on setting up the community, for example by creating a bug tracker and documentation. The third and last phase is measuring the ecosystem once the project is open. The company responsible for the project monitors it and takes action when things are going badly or not as expected.

Once the project is Open Source it will probably show a typical four-stage software project life cycle. First comes the initiation stage, where the project is put on a website like Sourceforge. In the initiation stage most of the development is still done by the developers who were involved in the project before it was Open Source. As the project becomes better known and users become aware of it, it moves to the second stage, the growth stage. More users and developers join the project until it reaches critical mass and enters the third stage, the mature stage. In the mature stage the main business of the core team is to sustain the project. More developers join the core team and some of the original developers might leave or have already left. After the third stage comes the decline or revival stage. In this fourth stage people start to lose interest and the number of users may decline. On the other hand, features may be added or changes may be made so that the project starts growing again; this is the revival part [33]. According to Samoladas [28] there is a probability of about 40% that a project will survive for more than five years. This probability is based on the most pessimistic calculations and becomes 80% in the most optimistic results. The research of Samoladas also shows that projects that have been alive for at least ten years are less likely to be abandoned.

A study by English and Schweik [8] shows that 35% of projects fail at the initiation stage, meaning the project never had a release. They also claim that 28% of projects fail at the growth stage. Failure in this case means that the project is either not downloaded or used, or does not reach a fourth release.

According to Nakakoji et al. [22], Open Source projects can be classified into three different categories. Exploration-Oriented projects push software development forward by sharing innovations embedded in freely shared Open Source software systems. Utility-Oriented projects focus on filling voids in functionality, and are typically worked on by developers who cannot find a program that fulfills their needs. Service-Oriented projects aim at providing stable and robust services to all the stakeholders (members and end-users) of OSS systems. Nakakoji et al. claim that Exploration-Oriented and Utility-Oriented ecosystems grow more rapidly than Service-Oriented ones. Exploration- and Utility-Oriented projects are more subject to change because of their evolving nature: they are meant to be innovative and therefore change a lot more. When a project becomes more robust and stable it may evolve into a Service-Oriented project.


2.3 Contribution strategies

Contributing to Open Source projects can broadly be divided into two kinds of strategies: the shared repository and the pull-based strategy. Both are described in the following subsections.

2.3.1 Shared repository

In the shared repository strategy, the repository of a project is shared with its contributors. Contributors can clone the repository, modify it locally and commit the code to the main project [13]. To make it easier to cope with multiple clones and multiple developers, a branching model can be used. Using a branching model, developers can work individually on the software without interference from changes made by others. When the developers have finished their code and the code has been tested for sufficient quality, the branch can be merged into the main branch [4].

2.3.2 Pull-based

The pull-based system does not rely on sharing the repository with potential contributors. Instead, contributors can fork the repository (i.e. create a clone in their own account) and make changes to the code. When they are done coding they can send the changed code by making a pull request. One of the members of the project will review the code and decide whether to merge it into the main repository. If the new code does not satisfy certain standards or a change needs to be made, the reviewer can ask the contributor to add new commits that address these points to their forked branch [13] [1].

2.4 Participation

2.4.1 The Open Source community

Contributors, Committers and Reviewers

Project members in Open Source projects can be divided into three categories: the contributors, the committers and the reviewers. Contributors are members who show interest in and contribute to a project. Contributing can be anything from asking questions and reporting bugs and issues to adding features. Contributors can ask to have their code added by, for example, submitting a pull request. They do not need particular skills or experience to contribute to a project. If a contributor becomes more familiar with a project and shows his or her commitment, a current committer can ask the contributor to become a committer. In a shared repository approach, a committer is given push access to the repository. Committers have a better understanding of the project than contributors. One step higher are the reviewers, who can be seen as the project admins. Their responsibility is to ensure the project runs smoothly, to review code and to approve changes to the code or documents [2] [11] [23].

The Onion model

A more detailed model for representing Open Source communities is the Onion model from Nakakoji et al. [22], shown in figure 2.1.

Figure 2.1: The onion model of Open Source communities

A new member starts as a passive user, who uses the project for his or her own needs. Passive users who want to learn more about the system become readers. As they gain a better understanding of the system, they start to report bugs, fix bugs or add new features. The more contributions they make, the more they are recognized, and they may eventually end up in the core team. Looking back on the model presented earlier in section 2.4.1, the reviewers are in the middle of the onion: these are the project leader and the core team. The committers correspond to the developers in the Onion model, and the contributors correspond to the bug fixers and bug reporters.

Apart from some exceptions like Mozilla or Apache, the total number of developers associated with a project is rather low. Midha and Palvia found that the largest number of developers associated with a project was 26, and state that most Open Source projects are maintained by a small group of developers [21]. After two to three years there are 5.5 developers on average working on a project. Sen et al., using a larger sample of projects, state that the average number of developers on a project is 4.7 [29].

2.4.2 Developer motivation

Developers can be driven by different kinds of motivation to work on an Open Source project. Intrinsic motivation occurs when an activity satisfies the basic human needs for competence, control and autonomy [27]. Contributing for the joy of coding is a good example of intrinsic motivation. Extrinsic motivation, on the other hand, is usually applied by someone other than the person being motivated [27]. An example of extrinsic motivation is being paid for the job. Between intrinsic and extrinsic motivation lies a gray area, which contains motivations such as joining Open Source projects because it might improve one's career opportunities. A full taxonomy of motivation is shown in figure 2.2.


Figure 2.2: Taxonomy of human motivation [27]

Hars and Ou found different groups of developers working on Open Source projects. Every group has a different motivation: students are more intrinsically motivated, while contracted programmers are motivated by money [15]. Because such diverse groups of people work on Open Source projects, each with its own kind of motivation, a specific motivation for joining cannot be pinpointed.

So far, no research has identified a single dominant motivation among developers. However, a few reasons why a developer is motivated show up in multiple papers. These reasons are described in the following paragraphs.

Joy of coding

A large group of developers contribute because they like to code, which can also be seen in the fact that a large group of developers consists of hobbyists [15]. Probably a large portion of professional programmers also have programming as a hobby.

New skills

One of the main reasons for people to help with Open Source projects is to learn and develop new skills [18] [15] [10]. This might mean learning a new programming language, a new framework or how the Open Source community works.

Participate in the OSS community

Some developers answered in the surveys that they wanted to participate in the Open Source community. This can be related to any of the other reasons why someone wants to be involved in Open Source.


Fulfilling a need

Open Source software is also used to "satisfy an itch" [10]. This can mean that a developer starts his or her own Open Source project to make a program that satisfies this need. It can also mean that a developer fixes a bug so that the program does what he or she wants. Another option is that a developer contributes to a project that has the potential to do what he or she needs, for example by adding features.

Against proprietary software and/or large companies

The initial reason to start with an Open Source project can be, for example, that someone likes to code or has a new idea. However, the motivation to stay active in the Open Source community can become more commercial or political. Some people stay active or develop new projects because they believe that software is not meant to be proprietary. Others develop in Open Source projects because they are against large software companies gaining more and more power [10] [18].

Altruism

Another large group of developers are active in the Open Source movement because they want to share their knowledge. They like to help realize good ideas and improve the software of others [10] [15].

2.5 Releases

A common saying in Open Source projects is: release early, release often [25]. Since a lot of changes are made to the code, it is a good idea to make these changes visible to the general public as early as possible. Frequent releases also show software is still being improved and maintained by developers.

2.5.1 Release types

Releases can be divided into three types of releases which are described in the following paragraphs.

Development releases

Development releases are releases aimed at developers who need cutting edge technology. The release might not be very stable. Development releases are used to try new features and see if there are bugs in them [20].

Stable releases

Stable releases are major releases for the users. These are the programs that end-users use. End-users expect a program to be working without any flaws. Therefore it is important that the software is thoroughly tested and well documented before a major release is done [20].

Minor releases and patches

Minor releases and patches are used as updates to existing user releases. These minor releases and patches are often used for bug fixes in the user releases [20].


2.5.2 Versioning

A release can be identified by an associated version number. The typical purpose of versioning is to indicate the stability of the release. Some projects label their versions to signal what can be expected of a release, using labels such as alpha, beta or epsilon. Release candidates are used to gather feedback before an actual release is done [9].

The semantic version specifier (SemVer) is recommended by Github. This is the same way of versioning that the Linux kernel uses [9]. The versioning follows the pattern x.y.z, where x is the major release number, y is the minor release number and z is used for patches. Sometimes an extra number is added when a new build is done.
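The x.y.z pattern described above can be illustrated with a minimal sketch in Python. The function names, the tolerated leading "v" and the treatment of a missing patch number as 0 are assumptions made for this illustration; they do not describe how release tags were actually processed in this research.

```python
import re
from typing import Optional, Tuple

# Assumption: tags look like "1.2.3", "v1.2.3" or "1.2"; an optional fourth
# (build) number and any trailing label are ignored.
VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)(?:\.(\d+))?")


def parse_version(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch), or None if the tag is not x.y[.z]."""
    match = VERSION_RE.match(tag.strip())
    if match is None:
        return None
    major, minor, patch = match.group(1), match.group(2), match.group(3)
    # A missing patch number is treated as 0 (illustrative choice).
    return int(major), int(minor), int(patch or 0)


def release_kind(previous: Tuple[int, int, int],
                 current: Tuple[int, int, int]) -> str:
    """Classify a release by the first version component that changed."""
    if current[0] != previous[0]:
        return "major"
    if current[1] != previous[1]:
        return "minor"
    return "patch"
```

For example, going from version 1.4.2 to 2.0.0 would be classified as a major release, and going from 1.4.2 to 1.4.3 as a patch.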

2.5.3 Release strategies

Besides different types of releases there are also different strategies for releasing software [20].

Feature based

The feature based strategy is based on the features of a piece of software. A release will be done when certain features are fulfilled or goals are attained. This is in line with traditional software development which is feature driven [20].

Time based

The time based strategy focuses on a specific date for the release. There is a cut-off date in which all of the features are evaluated. It will then be decided if a feature will be included, or needs to be postponed to a later release [20].

2.5.4 Gathering release information

Previous research has shown that it is very difficult to gather release information about Open Source projects. This is due to the fact that the information is very scattered. Release information can be found in mailing list archives, source code repositories, web pages, histories and structures, or release artifact listings. Furthermore, not all projects use the same version numbering scheme, which makes it difficult to see which releases are of the same kind [32].


Chapter 3

Problem description

This chapter describes the research goal and the previous work on success factors in Open Source projects. It will also explain the work done prior to the research and contains the research question.

3.1 Research goal

The goal of this research is to find out what factors contribute to a successful project. Success is in this case measured by the number of external pull requests a project receives: the more external pull requests a project gets, the better the project is doing.

3.2 Previous research on success factors

In the past years quite a lot of research has been done on the success factors of OSS projects. Crowston et al. [6] made an Open Source Software success model defining different types of success: system creation and maintenance, system quality, system use, and system consequences. Another model for Open Source software success was made by Lee et al. [19]. They use software quality, community service quality, user satisfaction, Open Source software use and individual net benefits as measures. Both [6] and [19] are based on the Information Systems Success model of DeLone and McLean [7].

In different studies success has been measured as developer interest [21] [30] [29] [31], the current user interest and project activity [30] [31], the development state of the project [5], the subscriber base [29] and the project popularity [21]. Although there are a few exceptions, most studies about Open Source success are about the number of users, the number of developers or project activity, or a success measure related to those.

The characteristics used to explain success are also diverse among the different studies. A commonly used characteristic is the type of license the project uses [21] [30] [29] [31] [5]. The study of Subramaniam et al. shows that a restrictive license has a negative influence on the number of developers [31]. However, the research of Sen et al. shows that a semi-restrictive license has a bad influence on the success of a project [29]. The study of Midha and Palvia [21] shows that the choice of license is only relevant for the first release of a project.

Almost all the research shows that the characteristics are interrelated, meaning for example that if the number of users goes up, the number of developers also goes up, and the other way around [29]. Stewart shows that the vitality of a project also helps its popularity [30].


3.3 Preliminary research

From the literature study in chapter 2 and the previous research described in section 3.2 it becomes clear that success can be measured in multiple ways. MOBBL uses the pull-based system that Github offers, meaning that changes are submitted via pull requests. Itude Mobile B.V. also made it clear that its goal is to attract more external developers to help improve MOBBL. Combined, this makes it interesting to see what factors affect the pull requests coming from external developers. To gain more insight into what those factors may be, a small preliminary study in the form of a survey was done.

3.3.1 Data

The data for the preliminary research was gathered using GHTorrent¹ [14]. GHTorrent is a project set up for collecting Github event data. Projects can easily be extracted using the MySQL interface on its website [12][14].

The projects used for this preliminary research were selected on the criteria described below and resulted in a selection of 426 projects.

• The project had at least 200 pull requests at the moment the data was gathered. This ensures that the projects are active while avoiding an overwhelming number of projects to look at.

• The project received at least one pull request from someone who is not a member of the project. This is to make sure the project is open to the public.

• The project uses the language C, C#, Objective-C, Java or C++. Those languages are among the most popular languages on Github. Furthermore these languages are of the object-oriented type which are also used for MOBBL.
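The three selection criteria above can be expressed as a simple filter. The following Python sketch is illustrative only: the `Project` record and its field names are hypothetical and do not correspond to the GHTorrent schema or to the queries actually used for this research.

```python
from dataclasses import dataclass

# Thresholds and languages taken from the criteria listed above.
MIN_PULL_REQUESTS = 200
ALLOWED_LANGUAGES = {"C", "C#", "Objective-C", "Java", "C++"}


@dataclass
class Project:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    language: str
    total_pull_requests: int
    external_pull_requests: int  # pull requests from non-members


def is_selected(project: Project) -> bool:
    """Apply the three selection criteria to a single project."""
    return (
        project.total_pull_requests >= MIN_PULL_REQUESTS
        and project.external_pull_requests >= 1
        and project.language in ALLOWED_LANGUAGES
    )
```

A project with 350 pull requests in Java, of which 12 came from non-members, would pass this filter; the same project written in Ruby would not.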

3.3.2

Research method

The data gathered was used to make graphs. These graphs show the cumulative internal, external and overall pull requests over time. An example of a graph can be seen in Figure 3.1. All these graphs were manually checked on the following criteria:

• The graph shows a more than linear growth in cumulative external, internal or overall pull requests over time.

• The graph shows a more than linear growth in cumulative internal, external or overall pull requests at some point in time.

These criteria resulted in a subset of 81 interesting projects. A small survey was held among 51 developers from projects that came out as interesting. An example questionnaire that was sent to the developers can be found in Appendix A.

The results are described and explained in section 3.3.3.
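As an illustration of the graphs used in this section, the cumulative internal, external and overall series could be derived from a list of pull requests as sketched below. The input format, a list of (month, is_member) pairs, is an assumption made for this example and is not the format of the GHTorrent data.

```python
from collections import Counter
from itertools import accumulate


def cumulative_by_month(pull_requests):
    """Build cumulative monthly counts for internal, external and overall pull requests.

    `pull_requests` is an iterable of (month, is_member) pairs, where `month`
    is a sortable label such as "2013-04" and `is_member` tells whether the
    author was a project member (internal) or not (external).
    """
    internal, external = Counter(), Counter()
    for month, is_member in pull_requests:
        (internal if is_member else external)[month] += 1

    # Only months that contain at least one pull request appear in the result.
    months = sorted(set(internal) | set(external))
    internal_cum = list(accumulate(internal[m] for m in months))
    external_cum = list(accumulate(external[m] for m in months))
    overall_cum = [i + e for i, e in zip(internal_cum, external_cum)]
    return months, internal_cum, external_cum, overall_cum
```

Months without any pull request are skipped in this sketch; for plotting against a real time axis they would have to be filled in with a count of zero.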


Figure 3.1: Example graph of a preliminary research project

3.3.3 Results

The response rate to the questionnaire was 41%: 21 of the 51 developers who were sent an email responded with a useful answer. The answers were grouped by the kind of answer that was given. Tables with a complete overview of the answers can be found in Appendix B. The answers about users and contributors were very diverse.

The question on why a project showed a peak in pull requests also received a variety of answers. However, two answers stood out. The first is that one developer was working on the project and happened to be very active at the time of the peak, which resulted in more pull requests. We know from literature that most projects only have a few developers, so this answer was not unexpected [21][29].

The second answer was that the number of pull requests increased because people were working on a release, or a release had just been done. For example, the number of pull requests increased because a release had just been done and there were a lot of bug fixes. Another example is that a new release featured a lot of changes and therefore the peak was higher before the release.

9 of the 21 (43%) projects that answered the questions about pull requests mentioned the influence of releases. Since a relatively large group came up with this answer, it is interesting to see if a relationship exists between the releases a project does and the growth of a project in terms of pull requests.

3.4 Research question

43% of the answers attributed an increase in pull requests to a release. This makes it the most frequently given answer. Since we are especially interested in how to attract external developers and therefore external pull requests, it is interesting to see if there is a positive relation between the releases done by a project and the number of external pull requests the project gets.


Research question: Are releases and the growth of external pull requests positively related?

The following hypotheses were formed, based on the fact that the most frequently given answer attributed a higher number of pull requests, or a peak in pull requests, to a release.

Hypothesis 1 If a project has a higher release frequency it will show growth in external pull requests.

Hypothesis 2 If a project does more releases in total it will show growth in external pull requests.


Chapter 4

Research method

This chapter contains the research methods. This includes the research approach and how the data is collected, selected and analyzed. Section 4.1 describes the research approach. Section 4.2 describes the data collection and section 4.3 the selection of this data. Finally, section 4.4 describes how the data was analyzed.

4.1 Research approach

Exploratory research is research to find out what is happening, seeking new insights and generating ideas and hypotheses for new research [26]. Hypotheses are defined in section 3.4 and a positive relation is expected. However, releases can be measured in multiple ways. Therefore different ways of measuring the releases a project does are used, to see whether they are positively related to pull requests.

The following ways of measuring releases will be used:

• Total releases: The total number of releases at the point of gathering (April 2014), including major, minor, patch and build releases. No distinction was made between alpha, beta, stable and unstable versions. The versioning is measured according to the semantic versioning specified in section 2.5.2.

• Major/minor releases: The total number of major and minor releases at the point of gathering (April 2014). The versioning is measured according to the semantic versioning specified in section 2.5.2.

• Release frequency: The average number of total releases per week.
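To make the three measures concrete, the sketch below derives them from a list of release tags and dates. It is an illustration under stated assumptions, not the Rascal program used for this research: the tag pattern, the rule that every distinct major.minor pair counts as one major/minor release, and the use of the project start date and the last release date as the time span are all assumptions of this example.

```python
import re
from datetime import date  # callers use date objects for the release dates

# Matches tags such as "v1.2.3", "2.0" or "1.4.0-beta"; suffixes are ignored,
# so alpha, beta and stable releases are all counted alike.
VERSION_PATTERN = re.compile(r"^v?(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse_version(tag):
    """Return (major, minor, patch) for a tag, or None if it is not a version."""
    match = VERSION_PATTERN.match(tag)
    if match is None:
        return None
    return tuple(int(part) if part else 0 for part in match.groups())


def release_measures(releases, project_start):
    """Compute total releases, major/minor releases and releases per week.

    `releases` is a list of (tag, release_date) pairs.
    """
    parsed = [(parse_version(tag), day) for tag, day in releases]
    parsed = [(version, day) for version, day in parsed if version is not None]
    total = len(parsed)
    if total == 0:
        return {"total": 0, "major_minor": 0, "frequency": 0.0}
    # Assumption: each distinct (major, minor) pair is one major/minor release.
    major_minor = len({version[:2] for version, _ in parsed})
    # Assumption: the project is active from its start until its last release.
    weeks = (max(day for _, day in parsed) - project_start).days / 7
    frequency = total / weeks if weeks > 0 else 0.0
    return {"total": total, "major_minor": major_minor, "frequency": frequency}
```

Tags that do not start with a version number, such as a hypothetical "nightly" tag, are left out of all three measures in this sketch.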

Pull requests will be measured in terms of growth. A quadratic regression line fit will be made on the graphs with the cumulative internal, external and overall pull requests.

• Growth coefficient: The second-degree coefficient (x²) of the quadratic regression line fit on the cumulative internal, external and overall pull requests over time. Successful projects are projects that show a convex curve (growing) and thus their second-degree (x²) coefficient is larger than one. An example of the quadratic regression line of a growing project in terms of external pull requests can be seen in figure 4.1 and of one that shows a concave (decreasing) curve in figure 4.2.
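The growth coefficient can be illustrated with an ordinary least-squares fit of y = a·x² + b·x + c, where a is the second-degree coefficient. The sketch below solves the normal equations directly; it assumes that the time index is simply the position in the series (0, 1, 2, ...), which is not necessarily the time unit used for the figures in this thesis, and the magnitude of the coefficient depends on that unit.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def quadratic_fit(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (a, b, c)."""
    if len(set(xs)) < 3:
        raise ValueError("need at least three distinct x values")
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # sums of y*x^0 .. y*x^2
    # Normal equations with the unknowns ordered as (a, b, c).
    matrix = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(matrix)
    coefficients = []
    for col in range(3):  # Cramer's rule: replace one column by the right-hand side
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coefficients.append(det3(replaced) / d)
    return tuple(coefficients)


def growth_coefficient(cumulative_counts):
    """Second-degree coefficient of the fit, using x = 0, 1, 2, ... as time index."""
    xs = list(range(len(cumulative_counts)))
    return quadratic_fit(xs, cumulative_counts)[0]
```

A positive coefficient corresponds to a convex (accelerating) cumulative curve, a negative one to a concave curve, and a value near zero to linear growth.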


Figure 4.1: Example of a growing project in terms of external pull requests (cumulative non member).

Figure 4.2: Example of a decreasing project in terms of external pull requests (cumulative non member).


4.2 Data collection

Many studies about Open Source projects gather their data from Sourceforge [5][6][8][21][29]. However, Beecher et al. claim that the characteristics of a project depend on the repository it is hosted on [3]. Itude Mobile B.V. has put MOBBL on Github and will make use of the pull-based system Github offers. Github is a web-based hosting service for software development projects that use the Git revision control system. It offers private and public repositories; the service is free when using public repositories. As of April 2014 it has 5.9 million users and holds 12.5 million repositories, making it the largest code host currently available. I will use Github data that is available via GHTorrent, a project set up for collecting Github event data via the Github API [12][14].

The release history of projects is gathered from the Github repository pages using an HTML scraping program written in Rascal.

The data was gathered in the period of April 2014 to June 2014.

4.3 Data selection

The initial data selection is the same as the selection of projects in the preliminary research described in section 3.3, and uses the following criteria:

• Projects with more than 200 pull requests from their first pull request until the moment of data gathering.

• Projects written in the languages C, C#, Objective-C, Java or C++.

• Projects that have at least one pull request from a non team member.

• Projects that have release information available on the Github repository page.

Projects with an average of more than one release per week were left out of the sample because this is an extremely high frequency that only a few projects have. Some projects had their first pull requests before the project was started on Github. This data is obviously faulty, so projects with this kind of fault were left out as well. These criteria resulted in a total of 314 projects that could be used for analysis.

4.4 Data analysis

Quantitative research is research where the data consists of numbers and classes and is analysed using statistics [26]. The data gathered for this research is quantitative data: all the information regarding releases is gathered in numbers (number of releases, averages etc.). The information about pull requests is also in the form of numbers. Therefore statistics will be used to analyse the data.

Spearman rank correlation

The Spearman rank correlation test measures the strength of association between two ranked variables. It can be seen in section 5.1 that the external growth coefficient is roughly normally distributed (figure 5.4) while the release data is skewed (figure 5.3). Also, scatter plots of the two suggest that the relation between releases and the growth of external pull requests is not linear. Therefore the Spearman correlation test, which assumes neither normality nor a linear relation, was chosen to test for a correlation.
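For reference, Spearman's rho is the Pearson correlation computed on the ranks of the two variables, with tied values receiving the average of their ranks. The following sketch shows that computation in plain Python. It is not the statistics software used for the analysis in this thesis, and it leaves out the p-value, for which a statistics package would normally be used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient; undefined when either input is constant."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Because only the ranks enter the computation, any strictly increasing relation between the two variables yields a rho of 1, whether or not the relation is linear.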


Analysis of Variance (ANOVA)

ANOVA is an extension of the t-test, used when comparing more than two groups with each other.

The release information will be split into quartiles. The first group will contain the 25% of projects with the lowest number of releases, and so on up to the fourth group, which contains the 25% of projects with the highest number of releases.

These quartiles will be used in the ANOVA test to see if a difference exists between the groups. This will tell us whether projects in the groups with more total releases, more major/minor releases or a higher release frequency differ from projects with fewer total releases, fewer major/minor releases or a lower release frequency. Although a significant ANOVA test will not give us a specific answer on why the groups are different, we will be able to tell if they are different.
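The procedure described above can be sketched as follows: projects are ordered by a release measure, split into four groups, and the one-way ANOVA F statistic is computed on the growth coefficients of the groups. This is an illustrative sketch only; splitting by rank position (so that ties may be divided over two groups) is an assumption of this example, and the p-value belonging to the F statistic is not computed here.

```python
def split_into_quartiles(projects):
    """Split (release_measure, growth_coefficient) pairs into four groups.

    The pairs are ordered by release measure and cut by rank position, so the
    group sizes differ by at most one. At least four projects are needed.
    """
    ordered = sorted(projects, key=lambda pair: pair[0])
    n = len(ordered)
    bounds = [n * q // 4 for q in range(5)]
    return [
        [growth for _, growth in ordered[bounds[q]:bounds[q + 1]]]
        for q in range(4)
    ]


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (between / df_between) / (within / df_within)
```

A large F value indicates that the group means differ more than would be expected from the variation within the groups; an F value around one indicates no such difference.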


Chapter 5

Results

This chapter describes the results of the analysis. Section 5.1 describes the data in general. Section 5.2 gives an overview of the results of the statistical tests. Section 5.3 describes the outlying projects.

5.1 Description of the data

A project in the dataset has done 27 releases on average; the median is 20. This includes all the releases a project does, including patch, build, alpha, beta, stable and unstable releases. When looking at the major/minor releases only, the mean is 7.8, quite a bit lower than when all the releases are included. The median of major and minor releases is 6. The data for releases, whether total releases or major/minor releases, is skewed, as can be seen in the histograms in figures 5.1 and 5.2.

Figure 5.5 shows in which weeks releases are done by projects. Figure 5.6 shows this for the major and minor releases. The average age of a project is 122 weeks, if we take the last date of a release as the last activity.

The mean release frequency is 0.22 releases per week, with a median of 0.17. The distribution is skewed, as can be seen in figure 5.3.

The mean growth coefficient for external pull requests is 0.08, meaning that not many projects are growing in number of external pull requests. The median is 0.03. As already described in the paper by Gousios et al., the distribution of the number of pull requests projects get is very skewed [13]. However, the external growth coefficient is more normally distributed, as can be seen in figure 5.4.

Variable                      Maximum   Minimum   Mean    Median
Total releases                144       1         27      20
Major/minor releases          101       1         7.8     6
Release frequency             0.97      0.008     0.22    0.17
External growth coefficient   10.50     -6.46     0.08    0.03
Internal growth coefficient   74        -241.14   -1.29   0
Overall growth coefficient    74        -240.29   -1.21   0.05


Figure 5.1: Histogram of number of releases a project does


Figure 5.3: Histogram of the average releases per week a project does


Figure 5.5: Graph showing in which weeks releases are done

Figure 5.6: Showing in which weeks major and minor releases are done

5.2 Results of statistical tests

5.2.1 Spearman

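The results of the Spearman tests are shown in table 5.2. For the external growth coefficient, no significant correlation exists for any of the release measures (all p > 0.05). The only p-value below 0.05 in the table is that of major/minor releases against the overall growth coefficient (rho = 0.12, p = 0.03), and that correlation is weak.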

The accompanying scatterplots for the external growth coefficients are shown in figures 5.7, 5.8 and 5.9.


                       External growth coef.   Internal growth coef.   Overall growth coef.
                       rho       p             rho       p             rho       p
Total releases         0.08      0.14          0.103     0.06          0.077     0.173
Release frequency      -0.03     0.54          0.09      0.087         0.022     0.69
Major/minor releases   0.09      0.096         0.07      0.21          0.12      0.03

Table 5.2: Spearman correlation for coefficients


Figure 5.8: Scatterplot of External growth coefficient vs Release frequency


5.2.2 ANOVA

Table 5.3 shows the results of the ANOVA tests. None of the ANOVA tests shows a statistically significant result.

                       External growth coefficient   Internal growth coefficient   Overall growth coefficient
Total releases         0.801                         0.6451                        0.6343
Average per week       0.6904                        0.9901                        0.9838
Major/minor releases   0.4613                        0.8262                        0.8147

Table 5.3: ANOVA results

5.3 Outliers

Since the statistical tests did not show statistically significant correlations it is interesting to see what happened to the projects that do have a high external growth coefficient.

Projects with an external growth coefficient of more than 2.5 are not very common. These projects are shown in table 5.4 and discussed in section 6.2.

Name                  Total releases   Release frequency   Major/minor releases   External growth coefficient
WordPress-Android     39               0.45                10                     4.98
WordPress-iOS         84               0.71                26                     4.78
cocos2d-x             37               0.33                7                      2.73
contrail-controller   1                0.03                1                      3.88
cphalcon              18               0.15                10                     3.31
hhvm                  15               0.07                7                      4.51
liferay-portal        5                0.19                1                      10.50
opencv                30               0.33                5                      2.96


Chapter 6

Discussion

To answer the research question "Are releases and the growth of external pull requests positively related?", this chapter discusses the results. The statistics are addressed in section 6.1 and the outliers in section 6.2. The related work is discussed in section 6.3 and the limits and validity in section 6.4.

6.1 Statistics

According to the Spearman correlation test there does not seem to be a statistically significant correlation between the growth coefficient and the total releases, major/minor releases or the release frequency. None of the Spearman rho's came close to 1, as is shown in table 5.2. The ANOVA test also did not give any statistically significant results (p < 0.05), as can be seen in table 5.3. Neither of the hypotheses, "If a project has a higher release frequency it will show growth in external pull requests" and "If a project does more releases in total it will show growth in external pull requests", is supported by the statistical tests.

6.2 Outliers

The projects with a high external growth coefficient are discussed here to see if any pointers can be found as to why these projects are growing. The information about these projects was gathered from the project websites and the Github repositories of the projects.

The Contrail-Controller repository contains the code for the configuration management, analytics and control-plane components of the Contrail network virtualization solution. The first pull request of the project was done in September 2013. The data only contains the pull requests of a few months, as can be seen in figures C.13, C.14 and C.15. Therefore the data may not be very stable and may give a distorted view. The same goes for HHVM. HHVM (also known as the HipHop Virtual Machine) is an open-source virtual machine designed for executing programs written in Hack and PHP. The project has a guide on how to contribute and recommends using pull requests. The data from HHVM also runs from September 2013 until the point of gathering (April 2014). This again covers only a short lifespan and is therefore unstable as well, as seen in figures C.16, C.17 and C.18. Contrail-Controller and HHVM are projects that do not have enough data to draw a conclusion.

Liferay-portal is a web platform with features commonly required for the development of websites and portals. Looking at the graphs, it seems the high growth coefficient of 10.50 comes from the fact that there were only a few pull requests in the first months, as can be seen in figures C.1 and C.2. After this flat line the number of pull requests made per month is about the same. The releases of liferay-portal found on Github were all done in the months during the steady growth. There is a peak in the graph in figure C.3 from April until October 2013. In November 2013 there was a big release and after that the pull requests dropped. This peak might therefore be related to the release.

OpenCV is an Open Source Computer Vision Library and shows roughly the same curve as Liferay-portal. This indicates that the high growth coefficient was caused by having only a few pull requests in the first months, showing a flat line, and a linear line afterwards. There were bigger releases in March, April, June, November and December 2013, and in those months the number of pull requests is a bit higher than in the other months, as seen in figure C.13.

Cocos2d-x is a multi-platform framework for building 2D games, interactive books, demos and other graphical applications. It is based on cocos2d-iphone but works with C++ instead of Objective-C. The project provides a guide for contributors telling them to make use of pull requests. There was a release just after the peak in January 2014; the peak can be seen in figure C.12.

For the projects Liferay-portal, OpenCV and Cocos2d-x, peaks in external pull requests occur around the release dates. This indicates that there are more external pull requests during periods of a release. However, the high external growth coefficient of these projects cannot be explained by the total releases, major/minor releases or the release frequency. For Liferay-portal and OpenCV, according to the graphs the high growth coefficient is caused by the few pull requests in the beginning and the linear growth afterwards. For cocos2d-x the graph does show a more than linear growth, as can be seen in figures C.10 and C.11. However, the number of total releases (37) and the release frequency (0.33) are only a little above the mean (27 and 0.22), and the major/minor releases (7) are slightly below the mean (7.8). Therefore it cannot be assumed that the higher than normal growth coefficient is related to the releases.

WordPress-Android and WordPress-iOS are the WordPress applications for Android and iOS devices. WordPress-iOS in particular has a rather high number of releases (84), and it is the only one of the 8 projects described that follows the expected pattern of a high growth coefficient combined with a large number of total releases and a high release frequency. A contributor to WordPress-Android and WordPress-iOS answered the questionnaire that was sent to contributors of projects. According to this contributor, the external growth coefficient is high because it was decided that all contributions should be made via pull requests. This decision was made at the point where the graph shows rapid growth, which can be seen in figures C.4, C.6, C.7 and C.9. The growth coefficient is high in this case because there were only a few pull requests in the beginning, until the decision was made that all contributions should come in as pull requests.

From the analysis of the above-mentioned projects, no support for the hypotheses could be found. Projects that do have a high growth coefficient have this because they decided to start using pull requests, or because they received only a few pull requests in their first months. This suggests that doing more releases does not lead to more pull requests.

6.3 Related work

Although releases are mentioned as a success factor in some papers, they are not taken into consideration in the research done. For example, Crowston et al. write that the time between releases is a measure of process success for Open Source projects [6]. However, they do not measure it in their empirical study. It may be that they did not find it necessary to take releases into consideration in the research. It may also be that releases are not included because release information is difficult to obtain [32]. The findings of this research suggest that it is not necessary to include releases in the models used when success is measured as the growth coefficient of pull requests. It can still be the case that releases are related to other success factors, like the number of users, commits or developers.


6.4 Limits and Validity

Measuring success in terms of pull requests using the growth coefficient is not the most reliable way to measure the success of a project. Pull requests are relatively new: Github has existed since 2008, and the first Github tutorial mentioning pull requests that I found dates from January 2010. Some projects, like WordPress-iOS, show up as growing just because they actively started using pull requests, not because they are actually growing. Also, some projects have been using pull requests for only a few months, which gives a distorted result for the project. The data of those projects can be very unstable.

Also, some projects make external developers team members when they contribute on a regular basis. This influences the results since they are from then on counted as internal pull requests instead of external pull requests. This may cause a decline in external pull requests while the project is still growing. This problem can also be solved if success is measured as the number of distinct committers and not the number of pull requests.

This research made the assumption that projects on Github use the SemVer guidelines for versioning their software. However, some projects do not use this system. This will have caused noise in the data because releases were grouped under the wrong category. The release information was also gathered from Github, while more information may be available on different websites or archives [32]. Therefore the release information may be incomplete. Furthermore, this research was only done with 314 projects while Github alone already has 12.5 million repositories. Also only projects with the object oriented languages were taken into account. The data of this research does not span the complete data available. Therefore the results can be different when different projects are taken.

Moreover, the preliminary research was on a very small scale with only 21 developers responding with useful answers. If a larger survey is held a more concrete direction towards factors influencing external pull requests, or pull requests in general, may be found.


Chapter 7

Conclusion

This chapter provides the conclusion of the thesis in section 7.1. Suggestions for future work can be found in 7.2 and recommendations for MOBBL can be found in 7.3.

7.1 Conclusion

This thesis started with an interest in what factors influence the number of external pull requests a project gets. A small survey among developers of Open Source projects gave pointers towards the influence of releases on the number of pull requests. Information was gathered about the releases a project does and the growth of external pull requests. The Spearman correlation test and ANOVA were used to see if any correlation existed, but the results of these tests were not statistically significant. Taking a closer look at some projects that deviate from the other projects, in that they have a high growth coefficient, does not lead to any conclusions either. Therefore the research question "Are releases and the growth of external pull requests positively related?" can be answered with: there does not seem to be any relation between the external growth coefficient of pull requests and the total releases, major/minor releases or the release frequency of a project.

7.2 Future work

The analysis of the outlying projects did not support the hypotheses. There are, however, projects (liferay, cocos2d, opencv) where it does look like peaks in pull requests are caused by a release, which was also claimed by a number of projects that answered the questionnaire. Future research can focus on the pull requests in the few months before and after a release, instead of on the overall growth coefficient, and see if a relation exists. It would be interesting to see if during those months not only the number of pull requests increases, but also the number of distinct committers. If the number of distinct committers also grows, it means a release attracts new contributors. Otherwise it means that the current contributors just work harder when a release needs to be done. Also, one can look at whether the announcement of a release influences the number of pull requests and distinct committers.

7.3 Recommendations MOBBL

The developers of MOBBL already do most of the promotion mentioned in the answers of the survey. They have a blog, present at conferences and use Facebook and Twitter. They have knowledge on how to reach potential users. It might be a good idea to try and build a community on Stack Overflow so people can easily post questions and find answers. It can also be recommended to provide low-level tutorials as simple as "How do I get a button on a screen" or "How to connect two screens".

From the survey it seemed that most developers joined because they wanted to fix bugs or implement a feature. This indicates that developers need to be users first before they join a project. Furthermore, quite a few projects I saw in this research provided a contribution guide. Some were so extensive that they also gave a guide on how to clone the project with git. These guides were the place where code conventions and regulations for making pull requests could be found. This kind of guide, together with a backlog of what still needs to be done, can help people who want to help with the project but do not know where to start.

Finally, from the survey and literature it also became apparent that most projects only have a very low number of developers working on them. Therefore Itude Mobile B.V. should keep their expectations low, especially because the target audience is very specific: cross-platform developers.


Bibliography

[1] S. Androutsellis-Theotokis, D. Spinellis, M. Kechagia, and G. Gousios. Open source software: A survey from 10,000 feet. Foundations and Trends in Technology, Information and Operations Management, 4(3-4):187–347, 2011.

[2] Apache. Apache contributer model. https://community.apache.org/contributors/.

[3] K. Beecher, A. Capiluppi, and C. Boldyreff. Identifying exogenous drivers and evolutionary stages in floss projects. Journal of Systems and Software, 82:739–750, 2009.

[4] C. Bird and T. Zimmermann. Assessing the value of branches with what-if analysis. In W. Tracz, M. P. Robillard, and T. Bultan, editors, SIGSOFT FSE, page 45. ACM, 2012.

[5] S. Comino, F. M. Manenti, and M. L. Parisi. From planning to mature: On the success of open source projects. Research Policy, 36(10):1575–1586, 2007.

[6] K. Crowston, J. Howison, and H. Annabi. Information systems success in free and open source software development: Theory and measures. Software Process—Improvement and Practice, 11:123–148, 2006.

[7] W. DeLone and E. McLean. Information systems success revisited. In System Sciences, 2002. HICSS. Proceedings of the 35th Annual Hawaii International Conference on, pages 2966–2976, 2002.

[8] R. English and C. M. Schweik. Identifying success and tragedy of floss commons: A preliminary classification of sourceforge. net projects. In Emerging Trends in FLOSS Research and Development, 2007. FLOSS’07. First International Workshop on, pages 11–11. IEEE, 2007.

[9] J. R. Erenkrantz. Release management within open source projects. Proceedings of the 3rd Open Source Software Development Workshop, pages 51–55, 2003.

[10] R. A. Ghosh, R. Glott, B. Krieger, and G. Robles. Free/libre and open source software: Survey and study, 2002.

[11] Github. Github contributer model. https://github.com/yui/yui3/wiki/Contributor-Model.

[12] G. Gousios. The ghtorrent dataset and tool suite. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pages 233–236, Piscataway, NJ, USA, 2013. IEEE Press.

[13] G. Gousios, M. Pinzger, and A. van Deursen. An exploration of the pull-based software development model. Sep. 2013. Submitted to the International Conference on Software Engineering 2014.


[14] G. Gousios and D. Spinellis. Ghtorrent: Github’s data from a firehose. In Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on, pages 12–21. IEEE, 2012.

[15] A. Hars and S. Ou. Working for free? motivations of participating in open source projects. In System Sciences, 2001. Proceedings of the 34th Annual Hawaii International Conference on, pages 9–pp. IEEE, 2001.

[16] Itude. Mobbl framework. http://www.mobbl.org.

[17] T. Kilamo, I. Hammouda, T. Mikkonen, and T. Aaltonen. From proprietary to open source - growing an open source ecosystem. Journal of Systems and Software, 85:1467–1478, 2012.

[18] K. R. Lakhani and R. G. Wolf. Why hackers do what they do: Understanding motivation and effort in free/open source software projects. Perspectives on free and open source software, 1:3–22, 2005.

[19] S.-Y. T. Lee, H.-W. Kim, and S. Gupta. Measuring open source software success. Omega, 37(2):426–438, 2009.

[20] M. Michlmayr, F. Hunt, and D. Probert. Release management in free software projects: Practices and problems. In Open Source Development, Adoption and Innovation, pages 295–300. Springer, 2007.

[21] V. Midha and P. Palvia. Factors affecting the success of open source software. Journal of Systems and Software, 85:895–905, 2012.

[22] K. Nakakoji, Y. Yamamoto, Y. Nishinaka, K. Kishida, and Y. Ye. Evolution patterns of open-source software systems and communities. Proceedings of the International Workshop on Principles of Software Evolution, pages 76–85, 2002.

[23] Opencast.Jira.com. Committers and contributers. https://opencast.jira.com/wiki/display/MH/Committers+and+Contributors.

[24] OSI. Open source initiative. http://opensource.org/definition.

[25] E. Raymond. The cathedral and the bazaar. Knowledge, Technology & Policy, 12(3):23–49, 1999.

[26] P. Runeson and M. Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical software engineering, 14(2):131–164, 2009.

[27] R. M. Ryan and E. L. Deci. Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary educational psychology, 25(1):54–67, 2000.

[28] I. Samoladas, L. Angelis, and I. Stamelos. Survival analysis on the duration of open source projects. Information & Software Technology, 52(9):902–922, 2010.

[29] R. Sen, S. S. Singh, and S. Borle. Open source software success: Measures and analysis. Decision Support Systems, 52:364–372, 2012.

[30] K. J. Stewart and A. P. Ammeter. An exploratory study of factors influencing the level of vitality and popularity of open source projects. In F. Miralles and J. Valor, editors, ICIS, page 88. Association for Information Systems, 2002.

[31] C. Subramaniam, R. Sen, and M. L. Nelson. Determinants of open source software project success: A longitudinal study. Decision Support Systems, 46:576–585, 2009.


[32] J. Tsay, H. K. Wright, and D. E. Perry. Experiences mining open source release histories. In Proceedings of the 2011 International Conference on Software and Systems Process, pages 208–212. ACM, 2011.

[33] D. E. Wynn. Organizational structure of open source projects: A life cycle approach. In Abstract for 7th Annual Conference of the Southern Association for Information Systems, Georgia, 2003.


Appendix A

Survey questions

Dear < name >.

I found your e-mail address on Github and send you this e-mail because of the following: I am doing my master thesis in the direction of pull requests. < Projectname > came out as one of the interesting projects and I would like to ask you a few questions about it. There are three graphs attached that show the cumulative number of pull requests your project received A.1, the coloured barplot shows the pull requests per month A.2 and the black barplot shows the number of distinct committers A.3.

The first questions I have are general questions about < Projectname >:

• How were users attracted when the project started?

• How were contributors attracted when the project started?

• How are new users currently attracted?

• How are new contributors currently attracted?

The next questions are based on the graphs attached to this e-mail:

• The attached graphs show that < Projectname > has been growing rapidly from September 2012 until now. I am very interested in how that happened. Do you know if anything special occurred around September 2012 that may have influenced this growth?

If I should contact another person for answers to these questions, I would like to get in touch with them.

Thank you very much for your time.

Kind regards,

Cindy Berghuizen

Master student Software Engineering
University of Amsterdam

Figure A.1: Cumulative pull requests

Appendix B

Survey results

B.1 Users

B.1.1 Past

Question: How were users attracted when the project just started? The answers can be found in Table B.1.

Category                                Total  Projects
Previous project                        4      Angband, NGX Pagespeed, Denizen, Ogs
Regular posts on blogs of the owners    3      Ceph, ReactiveUI, Wordpress-Android
Promoted in Social Media / IRC / chat   3      Leechcraft, Jedis, Jboss-eap-quickstarts
People just found it                    2      Hazelcast, Flare-game
Only or only good option available      2      Taglib, Jedis
Proof of Concept                        1      Rugged
Posts on tech sites                     1      Ceph
Fit in the upcoming Agile/TDD movement  1      Junit
Talks and courses                       1      Radare2
Mentioned in conferences/papers         1      Shogun
Word of mouth                           1      Jboss-eap-quickstarts
Good SEO                                1      Jboss-eap-quickstarts

Table B.1: Answers given to how users were attracted in the past

B.1.2 Present

Question: How are new users currently attracted? The answers can be found in Table B.2.

B.2 Contributors

B.2.1 Past

Category                      Total  Projects
Website                       5      Shogun, Wordpress-Android, Wordpress-iOS, Denizen, Ogs
Promote on social media       4      Jboss-eap-quickstarts, Flare-game, Radare2, ReactiveUI
Nothing special               3      Angband, NGX Pagespeed, Leechcraft
Mentioned in blogs            3      Hazelcast, Flare-game, ReactiveUI
Conferences                   2      Ceph, Radare2
Commonly known tool           2      Junit, Jedis
Promoted by other companies   2      Ceph, Wordpress-Android
Via other/previous projects   2      Rugged, Denizen
Word of mouth                 2      Denizen, Jboss-eap-quickstarts
Only option available         1      Taglib
It is a good product          1      Hazelcast
Good demos and tutorials      1      Shogun
Online slides                 1      Radare2
Mailing list                  1      Ogs

Table B.2: Answers to the question of how users are currently attracted

Question: How were contributors attracted when the project started? The answers can be found in Table B.3.

Category                              Total  Projects
Found bugs / wanted to add features   4      Angband, Jedis, Radare2, Shogun
Paid / commissioned people            3      Flare-game, NGX Pagespeed, Wordpress-iOS
Posted on websites                    2      Ceph, NGX Pagespeed
Enthusiasm from community             2      Hazelcast, Denizen
Social media / IRC / chat             2      Jboss-eap-quickstarts, Leechcraft
Previous project                      2      Ogs, Taglib
Blog post                             1      Ceph
Colleagues                            1      Radare2
Word of mouth                         1      Jboss-eap-quickstarts

Table B.3: Answers to the question of how contributors were attracted in the past

B.2.2 Present

Question: How are contributors currently attracted? The answers can be found in Table B.4.

B.3 Pull requests

Question: In the graphs attached it can be noted that < insert projectname > has had a lot more pull requests in < insert date >. Do you know if anything happened that might have influenced this growth?

Category                              Total  Projects
Found bugs / wanted to add features   7      Angband, Leechcraft, Taglib, Junit, NGX Pagespeed, Jedis, Rugged
Website designated to attract people  4      OpenMW, Denizen, Wordpress-Android, Wordpress-iOS
Social media / IRC                    4      Denizen, Wordpress-Android, Wordpress-iOS, Jboss-eap-quickstarts
Pay developers                        3      Radare2, Wordpress-Android, Wordpress-iOS
Involve contributors                  2      ReactiveUI, Jedis
Summer of Coding event                2      Shogun, Radare2
Merchandise                           1      Radare2
Mentoring                             1      Radare2
Conferences                           1      Ceph
Change license to be more open        1      Shogun
Call for help on website              1      Flare-game
Announce release                      1      ReactiveUI
Company promotes it                   1      Ceph
Posts on tech websites                1      Leechcraft
Word of mouth                         1      Jboss-eap-quickstarts
Private communication                 1      Ogs

Table B.4: Answers to the question of how contributors are currently attracted

Category                              Total  Projects
Working on release                    6      Antlr4, Flare-game, NGX Pagespeed, ReactiveUI, Jedis, Denizen
One active/enthusiastic developer     6      Taglib, Rugged, Denizen, Flare-game, Leechcraft, Jboss-eap-quickstarts
Maturity                              5      Ceph, Hazelcast, Leechcraft, Radare2, Ogs
After release                         3      Ceph, Railo, Jedis
Event (GSoC / FOA)                    2      Kotlin, Shogun
Switched to PR entirely               2      Wordpress-iOS, Wordpress-Android
Activity cycle                        1      Angband
Became Open Source                    1      Kotlin
No idea                               1      JUnit

Appendix C

Outlier graphs

Figure C.2: Quadratic regression of Liferay-portal

Figure C.4: Cumulative pull requests of Wordpress-Android

Figure C.6: Barplot of pull requests per month of Wordpress-Android

Figure C.8: Quadratic regression of Wordpress-iOS

Figure C.10: Cumulative pull requests of cocos2d-x

Figure C.12: Barplot of pull requests per month of cocos2d-x

Figure C.14: Quadratic regression of contrail-controller

Figure C.16: Cumulative pull requests of hhvm
