Reliability of Planning Poker estimations in scrum based projects


Master Software Engineering Thesis

June 24, 2018, 76 pages

Author:

Floris Velleman

Supervisor:

Drs. Hans Dekkers

Organization:


Contact and organization

Name: Floris Velleman
Email: florisvelleman@hotmail.com
Student number: 11078774

Name: Hans Dekkers
Email: H.L.Dekkers@uva.nl

Name: Ministry of the Interior and Kingdom Relations
Website: https://www.rijksoverheid.nl/ministeries/ministerie-van-binnenlandse-zaken-en-koninkrijksrelaties
Address: Turfmarkt 147, 2511 DP Den Haag

Name: Universiteit van Amsterdam
Faculty: Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Website: http://www.software-engineering-amsterdam.nl/


Abstract

In this case study we examined how planning is done in a Scrum-based environment. According to the organization, this environment suffers from reliability issues concerning estimation and planning. Two teams of three developers were monitored for a period of six sprints (roughly three months). We recorded the different planning activities, tracked progress and compared these with the planning. Perceptions of the reasons for deviations from the planning were collected, as well as hard data on factors that could be of influence.

Planning took place at two levels. Management planned between four and six epics or projects ahead, mostly setting priorities to use the fixed capacity as optimally as possible and trying to plan in units of roughly one or two sprints. The development team planned user stories into the sprints based on priority and availability.

All of the management planning estimations turned out to be realistic: given the available time and the time required to complete the projects, the planning was doable. The development team planned and estimated a total of 119 items. 26 of these items were delivered exactly on time (the best estimate possible). 94 user stories (79% of the total) turned out to be closest to their estimated group (e.g. 15 hours actual with 16 hours estimated).

Despite the estimations being quite reliable, projects and epics still run into delays. One of the reasons was changing priorities (9 times), but the most frequent reason was less development capacity than could realistically be expected (10 times). The data reveals that the biggest challenge in delivering a sprint on time is having enough development capacity. In 4 of the 12 sprints there was less capacity than the 80% norm.


Acknowledgements

I would like to express my sincerest gratitude to my supervisor Hans Dekkers, who has shown continuous support and has provided me with a lot of knowledge and patience.

Besides my supervisor, I would like to thank the organization for providing me with the opportunity and support for this thesis. The involvement that the organization’s supervisor showed has motivated me a lot and allowed me to continue my work.

I would also like to thank the development teams and their respective management for the opportunity to tag along and gather data.


6.1.1 Team 1 - Sprint 10
6.1.2 Team 1 - Sprint 11
6.1.3 Team 1 - Sprint 12
6.1.4 Team 1 - Sprint 13
6.1.5 Team 1 - Sprint 14
6.1.6 Team 1 - Sprint 15
6.1.7 Team 2 - Sprint 23
6.1.8 Team 2 - Sprint 24
6.1.9 Team 2 - Sprint 25
6.1.10 Team 2 - Sprint 26
6.1.11 Team 2 - Sprint 27
6.1.12 Team 2 - Sprint 28

6.2 Appendix B - Data set management
6.2.1 Team 1
6.2.2 Team 2

6.3 Appendix C - Additional literature
6.3.1 Estimations as an integral part of user stories
6.3.2 Bias
6.3.3 Estimation methods
6.3.4 Personal software process

6.4 Appendix D - Analysis
6.4.1 Team performance


Chapter 1

Introduction

1.1 Context

The Ministry of the Interior and Kingdom Relations is a government organization in the Netherlands. The main responsibility of the ministry is to safeguard the core values of democracy. According to its website the ministry stands for public administration and public authorities that the public can trust [31]. The ministry will hereafter be referred to as the organization.

To make sure these tasks are accomplished effectively the organization develops software that assists its employees in completing them. The software that is used by the organization is written in a wide variety of programming languages across different teams.

Recently the organization has started using the Scrum methodology for its software development. This is quite a new step, as most software projects in the organization were using a sequential development methodology or, in some teams, the Rational Unified Process (RUP) [18]. With the change in methodology the organization is confronted with new challenges. This thesis starts roughly one and a half years after the adoption occurred and takes a look at a problem that persists in some development teams and is causing a lot of trouble for the organization.

1.2 Problem

The issue that presents itself as a result of the recent Scrum adoption, as indicated by the organization, is the reliability of estimations according to those involved in the process. Scrum masters, developers and project leaders indicate that sprints are not always progressing as originally planned. Although the previously used methodologies had similar problems, the deviations were not this large according to the organization.


A sprint may be done very early, late or on time (with or without some functionality). While this may be expected, the deviations from the planning are considered to be so large that the planning is not reliable at all. As a result there is confusion as to whether or not a long-term planning will be met. If two out of six sprints were completed well within time, the organization feels there is no guarantee that the other four sprints will follow a similar pattern, because the reliability fluctuates so much (for both under- and overestimations).

This uncertainty has spread to the customers, who are uncertain about how much their budget will deliver and whether the desired goal will be met with a given budget. As such, the customers, scrum masters, project leaders and management are interested in having more reliable estimations.

Sprints that indicate certain functionality can be delivered while it cannot, or in which more could have been made, could originate from poor user story estimations. Because of this the organization has started using Planning Poker [14], but it notes that this did not resolve the issue (although the estimations did become more accurate according to developers). Estimations are still unreliable to the point where middle management feels it cannot provide a reliable planning for more than three months.

1.3 Research questions

The research questions are:

• Are the estimations that are made by the organization reliable?

– How are estimations done in the organization?

– What bandwidth do the current estimates have compared to the actual time that is required?

– How reliable should estimations be for the organization and what is considered a good estimation in general?


Chapter 2

Background

2.1 Necessity of estimations

In this subchapter the need for estimations is questioned and discussed according to where the need comes from. It may originate from the products that estimations deliver or be set as mandatory by the development process.

2.1.1 Causes for estimation

Why do we estimate at all and are estimates truly necessary to deliver a product? From a customer's point of view the necessity may seem obvious: what can this budget get me, when will the product be delivered and is the price right for the product? These are common questions that management will need to deal with during a project.

Shepperd and Schofield write, based on a survey in several software companies, that in most cases the major cost factor in a project is labor and that for that reason estimating development effort is central to the management and control of a software project [37].

2.1.2 Products of estimation

But are estimations really necessary to control labor cost? A budget may very well translate to a certain amount of labor. The amount of labor that is available could also be fixed. Although this offers an alternative, the advantages that estimations offer are not limited to labor cost. Boehm writes that the purpose of estimation is to obtain insight into budgeting, trade-off and risk analysis, project planning and control and software improvement investment analysis [4]. A project planning can provide the customer with an answer as to whether the budget that is available will allow for the creation of a desired product. At the same time it will allow the team to have an indication of how many developers are needed to deliver the product in time.

Lederer et al. state, based on the responses of 116 systems managers and analysts in a nationwide association of information to a questionnaire, that estimations are used for [25]:

1. scheduling projects

2. selecting proposed projects for implementation

3. providing a price for the customer

4. staffing projects

5. auditing project success

6. controlling or monitoring project implementation

7. evaluating project estimators

8. evaluating project developers

Naturally each of these advantages has a certain impact and relevance that can differ based on the project and industry. These advantages however only fully apply if the estimation is reliable, which, by definition, an estimation is not.

Moløkken writes that estimation is directly related to projects running for a longer time than was intended. This is based on a literature review of several articles in which it is found that the average effort overrun of software projects is 30%-40% [28]. Given this information it is remarkable that, despite these inaccuracies, the products that estimation delivers are still considered a necessity in many projects.

The estimations for user stories are typically made before the sprint planning session. The project-wide estimations are often made before the sprints are started or when the stakeholders require answers to questions that can only be answered based on estimates.


2.1.3 Working without estimations

Despite the positives of the products of estimation it is interesting to question whether it is possible to work without the use of estimations. For Agile projects specifically the use of estimations is not required according to the NoEstimates movement [19]. The concept that is proposed by the movement and strongly related literature is to have very small chunks of work, for which an estimation would be unnecessary because such a chunk of work is easy to oversee (and thus always brings the advantages) [19, 42]. Interestingly enough the products that estimation provides are still easily available, but only at a lower level in the process. Most of the literature seems to revolve around this lower level rather than, for example, project-level estimations.

Questions surrounding the project seem to be the primary factor for having estimations, yet most advantages point towards financial aspects. If budget is not a problem for a project, or if the capacity and longevity of a project are fixed, the advantages estimating offers may be outweighed by the overhead estimation introduces.

2.1.4 Estimations and the Scrum method

But what about estimations in a Scrum project: should we estimate in these projects and do estimations provide enough benefit to offset the overhead of estimating? According to the Scrum guide, which only defines estimation for the user story level, the main benefit of using estimations for user stories is that it becomes apparent how many stories will fit into a sprint [36]. Simply picking up stories during the sprint and not scoping the sprint is mentioned to have significant downsides by Abrahamsson [2], though this is more clearly related to not scoping a sprint at all, which then results in having too many or too few tasks for the given time period.

As the amount of story points is directly related to the estimation of a user story the way in which these are assigned has an impact on the estimation. But in what way are user stories assigned story points? According to Hill the effort required, the complexity and the inherent risk in developing a feature are the key factors on which an estimation is based and for which story points should be assigned [15].


2.2 Reliability of estimations

In this subchapter we look at whether estimations should be reliable and to what extent. This subchapter also looks at what influences this reliability and how processes deal with these influences.

2.2.1 Necessity of reliability

If the basis for all the advantages estimations offer is unreliable then is there a reason to use them? In fact Carroll writes, based on a literature study, that the reliability of these estimations has been a major challenge for the software industry [7].

According to Moløkken, who held a survey across multiple companies (123 responses across different IT sectors), most software projects (60-80%) encounter effort and/or schedule problems [28]. Because there is a level of vagueness in estimations (e.g. due to vague requirements) some literature states that there is never a certainty of a budget delivering a product [33, 37].

The need for reliable estimates seems to originate from these types of issues. As these estimates directly influence the products they deliver and the decisions that are made based on these products, one can argue that they should be completely reliable. Most projects however do not assume that the estimations are always right and will have a level of wiggle room in the deadline or product detail that allows for missteps (at least at the project level; a sprint does not). Fehlmann even goes as far as to write that, because software cost estimation is so difficult, it is wise never to trust any initial cost estimate but to take precautions for higher cost [13].

As such the estimations do not have to be completely accurate (which would be contrary to the definition of an estimation). The reliability of estimations should thus be such that the originally planned period combined with the wiggle room provides enough time. The margin of error that can be used to determine the amount of time that should be allocated for the worst case is however unknown due to uncertainty. Determining what amount of deviation in the estimates is acceptable is closely related to the budget. Will the product provide the crucial functionality with a given budget? If this is the main concern of a project it is likely to be the key indicator of how much wiggle room is available. A low or non-existent amount can however cause projects to run overdue, because humans are biased towards optimistic estimation.

So what can be considered a reliable estimate? Literature shows that as long as the maximum deviation is 10% to 25% for 75% of the time, an estimate is considered to be good. The 10% originates from Capers Jones, who noted that such an accuracy is only considered a good estimate if the project is well-controlled [20]. In a more recent literature study into what can be considered a good estimate, Stutzke found that the definition of a 25% deviation for 75% of the time, as posed by Conte and Shen in 1986, can be considered the standard in literature due to its frequent usage [11, 39].
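The criterion above can be checked mechanically against historical data. The following is a minimal sketch, not the method used in this study. It assumes that the deviation is measured relative to the actual effort, which the text does not specify, and the function names are hypothetical.

```python
def relative_deviation(estimate, actual):
    """Deviation of an estimate, relative to the actual effort (assumed denominator)."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - estimate) / actual


def share_within(pairs, max_deviation=0.25):
    """Fraction of (estimate, actual) pairs that deviate at most max_deviation."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (estimate, actual) pair is required")
    hits = sum(1 for est, act in pairs if relative_deviation(est, act) <= max_deviation)
    return hits / len(pairs)


def is_reliable(pairs, max_deviation=0.25, required_share=0.75):
    """True if at least required_share of the estimates stay within max_deviation."""
    return share_within(pairs, max_deviation) >= required_share
```

With `max_deviation=0.25` and `required_share=0.75` this corresponds to the 25% for 75% of the time definition; passing `max_deviation=0.10` gives the stricter 10% bound.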

2.2.2 Visible incidents and how to reduce them

Overestimating user stories can lead to missed opportunities and poor resource allocation [6, 32]. Underestimating however can have detrimental effects on business reputation, competitiveness and performance [2]. A clear preference could not be found but can be resolved logically to underestimation, at least to a certain degree. In Scrum-based projects teams may add user stories to the sprint when all other work has been done. Although the estimated amount still differs from the actual amount, it does so in a way that does not cause cost overruns, which is often a crucial factor in projects. Whether overestimating or underestimating is preferred is highly dependent on the project.

There are several ways to reduce the number of incidents related to estimating.

Reviewing reliability

Reviewing the estimation skills of a team can be a very good idea to increase reliability. That is, if it is done right. The ability to see what amount of deviation occurs for certain estimations allows management to prepare for similar deviations in the future. It also allows the developers to adjust their estimations based on the previously gathered data. Multiple studies have shown however that when people are made aware that estimating is a very important task, or are offered a reward for more reliable estimates, the reliability for difficult-to-estimate tasks actually goes down [21, 24]. It can however lead to more accurate estimations for easier tasks.

Justifying estimates

Estimates can be used by both management and the team to show that the estimates that were made were accurate or inaccurate. This allows the developers and management to have some factual data to support the claims they make in the cat and mouse game that is played when estimating is done. This "software estimation game" is described by Thomsett as playing a major role in estimations [40]. He states that the conflicting interests require historical data so that either side can explain its estimations to the conflicting party.

Domain background and estimation experience

Domain and estimation experience can have quite an impact on the number of incidents that occur. This may show in both a positive and a negative way. Domain knowledge for example can lead to increased accuracy due to overseeing more information that is related to the task. At the same time it may lead to inaccuracies due to being too focused on information that is not too important for the task.

The same concept holds for estimation experience. Having insight into how much time a comparable task took can result in a more accurate estimation but may also lead to a bias.

2.3 Historical data

The use of historical data seems logical when wanting to evaluate how well estimations are made. Common formats for gathering such data often include the original estimation and the actual time that was taken to complete a task. Using this data all parties that are involved in the estimation process can gain insight into the attributes and reliability of the estimations.

The estimation game

The historical data can be used by parties (e.g. developer(s), management, product owner(s)) involved in the estimation process to argue that an estimation should be higher, lower or is correct at a certain point.

Initially the parties may stick to a defensive estimation so that there is a larger certainty of delivery due to having more time. Defensive estimations often lead to overestimations however and can result in detrimental effects on business reputation, competitiveness and performance according to a study done by Abrahamsson [2]. This may still be preferred over overestimating however. If the estimation game is played well the estimations will become more accurate and can be used by the parties to scope the amount of work that is done (e.g. a requirement can be met in X time with decent performance, or in Y time with great performance).

Interestingly enough the organization notes that even after having made estimations for a while there is still a large inaccuracy. Even though this inaccuracy can be explained by wanting too much accuracy, the estimations should also scope the way things are implemented, which should still result in reliable estimates.

Gathering historical data

Though there are those who argue that gathering historical data may involve significant costs [34], most systems that are used for registration of the tasks include the ability to report such facts. The organization uses systems that can provide reports on the accuracy of estimations based on the data that was put into the system. Whether such data is reliable, and to what extent, is discussed in chapter 3.

Historical data does not necessarily have to be available for a project either. With expert opinion it is seen that experts often use analogy, which is the association of tasks or programs that show comparable attributes, to make estimations. As such, historical data from comparable projects can be used in the earlier stages of a new project.

Large or small tasks

Smaller tasks are generally easier to estimate than large tasks [30], with large tasks often being overestimated due to uncertainty. Whether expert opinion results in an overestimation for large tasks because no comparable analogy is available is unclear in literature. Planning Poker tries to adjust for these inaccuracies in large tasks by advising to break a task into smaller tasks when the estimation shows a significant amount of work [14]. If the overestimations are made because experts do not have a suitable analogy, the process of breaking the task into smaller tasks would also help in avoiding these overestimations.


2.4 Ways of estimating

In this subchapter we look at the most common form of estimation and the process that is used by the organization.

2.4.1 Expert estimation

An interesting question is what the most common estimation method is. Across literature the reported shares differ quite a bit, but expert estimation appears to be the clear winner in all studies that were found (72% [35] to 86% [22]). Most of these results seem to be based on just a single organization, which can explain the significant difference in usage (even the lowest figure still represents a majority).

The way expert estimation is performed is quite interesting as there are several ways of using expert estimation. A general sense of the methodology can be found in research, but a single process or structure that is used by all expert estimators could not be found.

But what about the reliability of expert opinion? Why would the usage of expert opinion be so high if models are around that address the issues commonly found when dealing with estimations? There appear to be several reasons for this. Moløkken states, based on a survey of multiple organizations, that companies feel uncomfortable using models they may not fully understand [28]. This seems like an odd reason if the benefits of such a model have a substantial impact on the reliability of estimations. It turns out, however, that it cannot be proven that the use of formal models leads to more accurate estimates than expert estimation, due to their differing structure [28].

2.4.2 Planning Poker

Planning Poker [14] is an expert judgment-based estimation technique that is widely used in agile methods for software development according to survey results from Mahnic [26]. Planning Poker relies on combining expert estimations to come to a more reliable estimation and discussion.

In Planning Poker, an agile team assigns a number of story points to a user story, which represents the estimated time that will be required to create it. This is done by having each individual software engineer make an estimate. These estimates are then discussed, after which a consensus on a certain number of story points can be formed or another round of estimating can be initiated. A story point typically corresponds to a single day of work for a developer. Often a predefined set of values is used for estimating the amount of work (e.g. 0.5, 1, 2, 3, 5, 8, 13, 20, 40 and 100 [9]). A high number may indicate that the user story is too large and should be broken down into smaller user stories so that the estimate becomes more reliable. Cohn writes that using this sequence reflects the fact that estimations for larger products become vaguer (e.g. it is more difficult to estimate whether a user story will be done in 40 or 41 days than in one or two).
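A single estimation round as described above can be sketched as follows. This is an illustrative sketch only: the card values are the example sequence from the text, while the function names, the tie-breaking rule and the `split_threshold` value are assumptions and not part of Planning Poker itself.

```python
# Example card values taken from the text above.
CARDS = (0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100)


def nearest_card(value):
    """Map a raw estimate to the closest card; ties go to the lower card (assumption)."""
    return min(CARDS, key=lambda card: (abs(card - value), card))


def evaluate_round(estimates, split_threshold=20):
    """Evaluate one round of estimates, given as a mapping developer -> card value.

    split_threshold is a hypothetical cut-off above which splitting the story
    is suggested; a team would choose its own value.
    """
    values = set(estimates.values())
    if not values:
        raise ValueError("at least one estimate is required")
    unknown = values - set(CARDS)
    if unknown:
        raise ValueError(f"not valid card values: {sorted(unknown)}")
    if len(values) == 1:
        agreed = values.pop()
        return {
            "consensus": True,
            "agreed": agreed,
            "consider_splitting": agreed >= split_threshold,
        }
    # No consensus: the lowest and highest estimators explain their reasoning
    # and another round follows.
    return {"consensus": False, "lowest": min(values), "highest": max(values)}
```

A round without consensus returns the two extremes, because these are the estimates that are usually discussed before the next round starts.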

Several case studies have concluded that group consensus (as used by Planning Poker) is more accurate than statistically combined individual expert opinions [29, 30]. The measure that is used in the article for comparison is the BRE (balanced relative error), applied to a single project. The way actual effort is measured in the article is remarkable. It is done by having the developer that signed off the user story write down how long it took to complete it.
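The text does not give the formula for the BRE. A commonly cited definition divides the absolute difference between actual and estimated effort by the smaller of the two, so that under- and overestimates of the same factor are penalized equally. The sketch below assumes that definition; the function name is hypothetical.

```python
def balanced_relative_error(estimate, actual):
    """BRE under the assumed definition: |actual - estimate| / min(actual, estimate)."""
    if estimate <= 0 or actual <= 0:
        raise ValueError("estimate and actual effort must both be positive")
    return abs(actual - estimate) / min(actual, estimate)
```

Under this definition an estimate of 10 hours for a task that took 20 hours and an estimate of 20 hours for a task that took 10 hours both yield a BRE of 1.0, whereas an error measured relative to the actual effort alone would rate the first case at 0.5 and the second at 1.0.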

2.5 Related work

One of the most related practical applications of historical data was found in the book Software Estimation: Demystifying the Black Art, in which Steve McConnell notes the significant impact of historical data on Planning Poker, primarily based on case studies within several large companies [27].

Practical examples of using historical data in combination with Planning Poker are quite rare. Jeff Atwood, co-founder of Stack Overflow, notes on his blog that development teams within the company use historical data to:

• Predict when software will be released.

• Determine which developers have the most reliable estimations.

• Reflect on a developer's accuracy in regards to estimations.

Although the article does not say it is limited to these uses they seem to be the primary interest of the author and can be used to answer some of the questions that are typically posed by organizations [3].

Additional literature that was used in this study can be found in appendix C.


Chapter 3

Method

3.1 Type of research

Because estimation deviations can originate from a large number of factors, a lot of very specific data will need to be collected from the teams during their sprints. Clearly defined historical information that may or may not influence estimations ensures that quantitative research can be done towards the main question.

The downside to this type of research is that it may miss out on contextual detail. Due to its objectivity it is, however, more easily generalized towards a broader application. It is also common to define all the expected types of input in quantitative research, which makes it more suitable than for example qualitative research, in which such things typically unfold from the research itself.

3.2 Data gathering at management level

To decide where the estimation problem is really located it is necessary to gather data on a higher level as well. The organization has indicated that plannings are also made at this level (in quite a lot of detail). Gathering this data will happen by noting key attributes during the management sessions concerning the development teams. For each such session the following data will be recorded:

• The session number.

• The people that are present and their respective roles.

• The priority that is given to certain projects or epics.

• General points of interest (soft data).

• A planning of what will be done in the upcoming period.

• Deadlines that have been set by parties.

To assure this is possible the following measures have been taken by the organization:

• Stakeholders will be present so priorities can be worked out.

• There will be a member of the development team present (team lead).

• Projects or epics will not be larger than two sprints.

• Only 144 hours will be estimated for each sprint.

3.3 Data gathering in the teams

The data that is gathered within the organization will focus on the teams that have trouble with estimations. To gain a clear answer to the research question the data set that should be collected should contain for each sprint:

• The sprint number.

• The amount of hours that are available in the sprint.

• The amount of hours that have been estimated.

• For each user story:

– All of the estimation rounds.

– All of the estimations for each developer.

– Any remarks during the estimation session.

– The agreed upon estimation.

– The amount of hours assigned to the estimation.

– The technology the project or epic requires.

– The actual amount of hours that was taken.

– The type of user story (bug or user story).

– The status of the user story (finished, unfinished).

– Any remarks related to the story during the standup.
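The attributes listed above could be captured in a record structure such as the following. This is a sketch of one possible layout and not the format that was used to store the data in this study; all class and field names, including the `story_id` identifier, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserStoryRecord:
    """One user story as observed during planning and during the sprint."""
    story_id: str                               # hypothetical identifier
    estimation_rounds: List[Dict[str, float]]   # per round: developer -> estimate
    agreed_estimate: float                      # the agreed upon estimation
    estimated_hours: float                      # hours assigned to the estimation
    technology: str                             # technology the project or epic requires
    story_type: str                             # "bug" or "user story"
    finished: bool = False                      # status of the user story
    actual_hours: Optional[float] = None        # unknown until the work is logged
    estimation_remarks: List[str] = field(default_factory=list)
    standup_remarks: List[str] = field(default_factory=list)


@dataclass
class SprintRecord:
    """One sprint with its capacity figures and its user stories."""
    sprint_number: int
    available_hours: float
    estimated_hours: float
    stories: List[UserStoryRecord] = field(default_factory=list)

    def finished_stories(self) -> List[UserStoryRecord]:
        """Return the user stories that were finished in this sprint."""
        return [story for story in self.stories if story.finished]
```

Storing every estimation round as a mapping from developer to estimate keeps both the number of rounds and the individual estimates available for later analysis.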


3.3.1 Daily standups

During daily standups team members may discuss issues that they encounter while working on a specific user story. This information may very well prove relevant. At the same time the standup provides another chance to record the amount of time taken (though only on a daily basis) and can be used to gain insight into how accurate the gathered data really is.

3.3.2 Data set size

As for the number of sprints that are analyzed, there should be at least enough sprint data to clearly indicate deviations in estimations that are larger than 25%. Given that the organization is suffering issues on short-term planning (i.e. three to four months) the amount of data should at the very minimum represent such a period within the organization.

3.4 Approach: Analysis

With the exception of some soft data, all of the attributes that are collected can be used to draw comparisons and/or conclusions on the effectiveness of an attribute in regards to making an estimation more reliable.

It would be most logical to first look at what issues may exist on a management level before there is a detailed analysis of the team data. By validating/invalidating the concerns expressed by the management it becomes possible to assess whether a lower level analysis is necessary.

3.4.1 Amount of hours in the sprint

The amount of hours in the sprint can indicate a wide range of situations that may have an impact on productivity. Deviations in the amount of hours that are available in a sprint can point out that there are changing priorities in the work that is assigned to team members. This can have a negative impact on the actual productivity and as such influence the actual amount of work that can be done.
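Deviations in available hours can be flagged with a simple check. The sketch below assumes that the 80% norm mentioned in the abstract means that at least 80% of a nominal, full-capacity number of hours should be available in a sprint; both that interpretation and the `nominal_hours` parameter are assumptions, and the function names are hypothetical.

```python
def capacity_share(available_hours, nominal_hours):
    """Share of the nominal capacity that was actually available in a sprint."""
    if nominal_hours <= 0:
        raise ValueError("nominal hours must be positive")
    return available_hours / nominal_hours


def sprints_below_norm(available_by_sprint, nominal_hours, norm=0.80):
    """Return the sprint numbers whose available capacity is below the norm.

    available_by_sprint maps a sprint number to the hours available in it.
    """
    return sorted(
        sprint
        for sprint, hours in available_by_sprint.items()
        if capacity_share(hours, nominal_hours) < norm
    )
```

Sprints returned by `sprints_below_norm` are the ones for which the standup remarks and management sessions are worth inspecting for changed priorities.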


3.4.2 Estimation rounds, developer estimates and remarks

These are very interesting data points in that they can indicate uncertainty towards an estimation. At the same time they provide the data set with some context as to why a certain estimation was made and what was considered while making the estimation. They can be used in combination with the actual result to figure out whether critical information was left unconsidered.

3.4.3 The agreed upon estimation

The agreed upon estimation may show a very interesting pattern. If this estimation is always the highest or always the lowest it can show a certain bias towards estimation (i.e. defensive estimates). A high estimation may also indicate that uncertainty remains and can show a situation in which it may have been better to split the story into multiple tasks.
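Such a pattern can be made visible by classifying where the agreed upon estimation lies relative to the individual estimates and counting the outcomes. This is a minimal sketch with hypothetical function names and labels, not the analysis that was performed in this study.

```python
from collections import Counter


def agreed_position(individual_estimates, agreed):
    """Classify the agreed estimate relative to the individual estimates."""
    if not individual_estimates:
        raise ValueError("at least one individual estimate is required")
    lowest, highest = min(individual_estimates), max(individual_estimates)
    if lowest == highest == agreed:
        return "unanimous"
    if agreed >= highest:
        return "highest"
    if agreed <= lowest:
        return "lowest"
    return "between"


def position_counts(stories):
    """Count the positions over (individual_estimates, agreed) pairs."""
    return Counter(agreed_position(individual, agreed) for individual, agreed in stories)
```

A count that is dominated by "highest" would be consistent with defensive estimating, while a count dominated by "lowest" would point towards optimism.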

3.4.4 Hours assigned to the story

Due to a team potentially using a complexity based system it is relevant to convert these to hours if possible so a comparison becomes possible to the other data.

Large deviations in this attribute can indicate a number of problems that will likely need to be retrieved based on the context/reason for deviation (e.g. not enough time for refinements, threshold for splitting is too high, etc.).

3.5 Reliability and impact

The reliability of this approach is greatly influenced by the accuracy of the data that is gathered. To this end some measures can be taken within the teams to make sure that this data is reliable. This will be discussed in chapter 4.4.


3.5.1 Reliability of hours

The use of hours for many of the attributes that are collected may not necessarily seem correct for every team. Some teams prefer estimating with complexity. Because of the way the teams within the organization use complexity (translating it to a task of a specific size, which should take a certain amount of time), we can translate the complexity attribute from the respective team to hours. The team that uses the complexity attribute is also using a group-based estimation technique.

3.5.2 Impact

Because of the small number of teams and the fact that the research applies only to this specific organization, the results are not easily generalized to a broader field. The research does however provide a case study that can be used in further research.


Chapter 4

Case description

To answer the questions that the organization has it is important to have an overview of what the organization looks like and how the theory applies to the organization. To do this the organization and the software teams should be looked at, as well as the underlying software process and how it can be approached to find reliable answers.

4.1 Teams and organization

The organization produces a wide variety of software, ranging from typical document management software to crypto applications. A few years ago the organization decided to move to agile methodologies after suffering from problems with sequential development methodologies.

Although the organization wants to adopt agile methodologies widely, some teams indicate that they still use older methodologies or have only partially adopted agile ones. Agile methodologies are not restricted to a single approach either, and the methodologies in use differ quite drastically. A quick survey among the teams revealed that the most prevalent methodology in use is SCRUM.

A large number of teams within the organization develop software. The ones that are especially interesting to this research are the ones that have difficulty meeting their estimates. Two teams in the organization indicate suffering from significant problems with estimation. Both of these teams use SCRUM. As such, any data that is collected will only reflect that methodology. Gathering information from these teams may very well provide answers to the questions posed by the organization.

All of the team members work full-time and are assigned only to their respective team. The level of the developers ranges from mid-level to senior, with a significant majority of mid-level developers.

4.2 First team: Crypto applications

This team consists of three developers and a product owner. Although not clearly stated, the team has appointed one of the developers to handle the tasks that can normally be associated with a scrum master.

Members of this team have indicated that they either do not meet the expected deadline or find themselves out of work well before the sprint has ended. According to the team the deviations range anywhere from a week to a month (in either direction). The team has tried a few things, Planning Poker being one of them, without any big successes.

Most of the software that is being developed by the team is related to cryptography. A point of interest is that the team works on multiple projects at the same time and combines these into a single sprint.

The team members indicate that prioritizing user stories can also be quite troublesome and has had an impact on the sprints before. This is primarily due to existing systems needing bugfixes, which then gain a higher priority than the current items in the sprint. Developers indicate that having to switch from one task to another, and sometimes even removing progress that was made, can take a lot of time.

4.3 Second team: Document management system

The second team is a larger team consisting of six developers, a scrum master, a tester and a product owner. Management for this team has indicated that problems related to estimation are a frequent occurrence in the team and that it is hard to plan around the large deviations.


This team, unlike the first team, only has a single project that needs work, but the project is significantly larger. This team builds a document management system for the organization. Much like the first team, this team has experimented with ways of improving the estimation, one of which is Planning Poker. Recently the team has been building in more room for underestimation: according to some team members, some of the user stories are purposefully estimated at a higher number of hours to make sure that a product can be delivered. This however causes overestimations and gives rise to all the associated downsides.

4.4 Gathering data: Reliability

Data from these particular teams should be as reliable as possible to reflect the actual situation. To ensure that this can be done, some research has been done within the organization to see if this data is available. It was found that both teams used modern tools to register user stories, hours, bugs, sprints and so forth. The first team uses Atlassian for registering sprint related information while the second team uses Team Foundation Server (TFS). Both of these systems are capable of providing the information that is required for the approach this research takes.

Reliability however is not guaranteed by having team members write these numbers down. With the first team indicating that stories are sometimes overestimated on purpose, the need for additional ways of validating the data becomes clear.

To make the actual hours that are registered to a task more reliable, the teams have agreed to include the specific task number and the amount of hours they have just spent on the task in their commits. This allows the registered hours to be compared to the amount of hours that were actually taken (by taking the difference). The two may deviate a bit because people may not check in their code all that often and have numerous reasons for not actually working that many hours on that specific code (e.g. lunch breaks, conferences, days off, etc.). The comparison should however still reveal major deviations from the initial estimation, which can be interesting to take a further look at.
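The comparison described above could be automated along the following lines. This is only a minimal sketch: it assumes a hypothetical commit message convention of the form `<task number> <hours>h <description>` (for example `PROJECT1-172 2h add validation`), since the exact format the teams agreed on is not recorded here, and the function names are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical convention (an assumption, not the teams' documented format):
# "<TASK-ID> <hours>h <free text>", e.g. "PROJECT1-172 2.5h fix tests".
COMMIT_PATTERN = re.compile(r"^(?P<task>[A-Z0-9]+-\d+)\s+(?P<hours>\d+(?:\.\d+)?)h\b")


def hours_per_task(commit_messages):
    """Sum the hours mentioned in commit messages, grouped by task number."""
    totals = defaultdict(float)
    for message in commit_messages:
        match = COMMIT_PATTERN.match(message)
        if match:  # commits that do not follow the convention are skipped
            totals[match["task"]] += float(match["hours"])
    return dict(totals)


def registered_minus_committed(registered_hours, commit_hours):
    """Difference between hours registered on the board and hours in commits."""
    return registered_hours - commit_hours
```

A large positive or negative difference for a task would then be a reason to look at that task in more detail, not proof that the registered hours are wrong.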

Another way of validating the accuracy of this data is to record the state of the scrum board every day. Although this does not improve the accuracy for items that take little time, it will reveal items that took less time than was written down.


Chapter 5

Results

During a period of three months two teams within the organization had their progress and data recorded. An anonymized summary of this data is included in the appendix (Appendix A). A total of 149 estimations have been made. Appendix B includes the reports that were gathered from discussions that were held at the management level. This chapter focuses on analyzing the data that was gathered and drawing conclusions.

5.1 Is there a problem with estimations at a higher level?

Answering this question requires insight into the way that estimations are done at this level. Having established how this process takes place, it then becomes possible to assess how well the estimations at this level work out.

How are estimations done at a higher level?

Team 1 and team 2 both work based on priority instead of hard deadlines. Although there are stakeholders that will demand certain projects or epics to be finished within a given time (sort of a deadline), the assigned priority effectively decides when a project or epic will be picked up and worked on. Both teams have meetings at a management level where a planning for the upcoming sprints is discussed. For team 1 these meetings take place every two weeks while team 2 has a monthly meeting. During these meetings a representative from the team (either a developer or a product owner) informs management and any stakeholders what has been done since the last meeting. Management then assigns projects or epics to specific sprints based on the progression of the last sprint and deadlines that are set by external parties.


The planning is made by management on the basis of deadlines set by stakeholders and any influential party within the organization. The planning is made with the assumption that a project or epic will take a single sprint to complete. This is not always true which is why the lead developer and product owner may have some discussions as to how much can be done and in what timeframe.

To estimate the amount of hours that are available management assumes there to be 144 hours of development time per sprint (3 people * 8 days * 6 hours).
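The arithmetic behind this assumption can be written out as a small sketch. The function and parameter names are illustrative and not part of the organization's tooling; only the figures (3 people, 8 days, 6 hours) come from the text.

```python
def sprint_capacity_hours(developers, working_days, hours_per_day):
    """Nominal development capacity of a single sprint, in hours."""
    return developers * working_days * hours_per_day


# Management's assumption: 3 people * 8 days * 6 hours.
nominal_capacity = sprint_capacity_hours(developers=3, working_days=8, hours_per_day=6)
print(nominal_capacity)  # 144
```

Writing the capacity as a function of its three factors makes it easy to see how a single absent developer or a shortened sprint moves the available hours away from the nominal 144.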

If a project or epic cannot be completed within a single sprint while there is only a single sprint available for its development, management will typically remove stories that are not mandatory from the project or epic in order to fit it into the planning. This is effectively the application of a Lean software development principle (i.e. avoiding gold plating). The same technique is applied whenever something receives a high priority. A bug that has to be fixed quickly will be planned into a sprint at the expense of another user story so the planning can still be met. If a project requires multiple sprints to complete, the removal of stories typically happens in the last sprint.

Effectively the estimations are done by both parties during these discussions. Management will project a planning for an upcoming period of time (four to six sprints) and the team will reply by indicating whether this is achievable (this often leads to some discussion as to how the planning should be done) and provide a guarantee for at least the upcoming sprint.

Data about the process above the teams' management was not collected.

Is the work being underestimated or overestimated?

At the management meetings the planning is made for at least four sprints (with a maximum of six). This seems early enough to avoid major obstacles as well as providing the stakeholders with some insight as to when their respective project or epic will be picked up.


mostly an overestimation. The amount of hours actually needed during the sprint is lower than what was estimated. The detailed planning made by the teams differs from the expected 144 hours in many cases, and so does the time that is actually available. In the cases where more than 144 hours were available it was found that the actual amount of work that was estimated for turned out to be less as well.

When drawing a comparison between the available hours and the estimated time it can be seen that more often than not the available hours are lower than the amount that was estimated for.

Not only does it become clear that the amount of available hours is too low in ten separate sprints (lower than 144 hours), it also becomes apparent that over 6 sprints were estimated by the team to take less than 144 hours.


So is the planning being met despite availability issues?

When looking at the project progress at a higher level it can be seen that the initial planning for team 1 is not always being met:

The initial planning for team 2 however has multiple points where development went much further outside the planning:


an interesting observation concerning these diagrams. It was found that if management demanded the start of a certain project or epic the developers sometimes added a single story for that project or epic so they could say it was being worked on. To visualize the impact of these actions the percentage of work that was done and the amount of stories for a certain project or epic per sprint are shown in the following diagrams:


As can be seen from these diagrams, both teams were doing this at the time to some extent. This may influence the higher level planning when management is confronted with the fact that a certain project is doing well while only a very small subset has been actively developed.

It can however be concluded that the initial planning is not being met and that team 2 is more likely to deliver late than on time. As such we can conclude that despite the defensive estimations made by management the projects are not delivered in time.

One of the major factors, aside from the available time, is that the changes in priority to certain stories and/or projects are quite frequent. Sudden priorities on bugs or different user stories have an impact on sprints. The effect of a project receiving priority is also substantial and can easily set back the original planning. Regardless of the changes, adding just a single story to the sprint so that it can be said that a project has received the priority as requested, with the intent of spending the majority of the time on another project, seems a bit deceptive.

5.2 Is there a problem with estimations in the development teams?

Are the estimations really unreliable?

Since a reliable estimate is the opposite of an unreliable one, the data can be compared to a definition of reliable estimates such as the one by Stutzke, Conte and Shen. This definition states that estimations are reliable if 75% of the estimations are off by a maximum of 25% [39] [11]. Applying this definition to the data, especially the deviations, makes it possible to see whether the estimations are reliable or unreliable.

Estimations deviation

Items at or below 25% deviation    79 items
Items above 25% deviation          40 items

As can be seen from the table the actual reliability is 66.39% (79 of 119 items), which means the estimations cannot be considered reliable according to the definition.
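The check against this definition can be sketched as follows. The function names are illustrative, and the sample list uses placeholder deviations (10% and 50%) that only reproduce the reported counts of 79 and 40 items; they are not the actual per-item deviations, which are listed in Appendix A.

```python
def reliability_share(deviations_pct, threshold_pct=25.0):
    """Fraction of estimates whose absolute deviation is at or below the threshold."""
    within = sum(1 for d in deviations_pct if abs(d) <= threshold_pct)
    return within / len(deviations_pct)


def is_reliable(deviations_pct, threshold_pct=25.0, required_share=0.75):
    """Reliable if at least 75% of the estimates deviate by at most 25%."""
    return reliability_share(deviations_pct, threshold_pct) >= required_share


# Placeholder values that only mirror the counts in the table (79 within, 40 above).
sample = [10.0] * 79 + [50.0] * 40
print(f"{reliability_share(sample):.2%}")  # 66.39%
print(is_reliable(sample))                 # False
```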


Interestingly enough, when plotting these deviations it can be seen that the majority of the estimations that go very far beyond 25% are overestimations. In fact, when looking at all the data it can be seen that, while the majority of overestimations are still near the mean, the overestimations are far less concentrated around the mean than the underestimations.

With all these facts combined it can be interesting to look at how often overestimations are made and for what items they are typically done.

Overestimating or underestimating - defensive estimations

Taking a more detailed look at the data reveals an interesting pattern. For the 119 items the estimations can be summarized in the following tables:

Estimations team 1

Underestimated    19 items
0% deviation       9 items
Overestimated     33 items

Estimations team 2

Underestimated    17 items
0% deviation      17 items
Overestimated     24 items

Estimations total team 1 and team 2

Underestimated    36 items
0% deviation      26 items
Overestimated     57 items

With 57 user stories being overestimated compared to the 36 items that were underestimated it can at least be concluded that there is a strong chance of the teams making defensive estimations.

How does Planning Poker play into the estimations?

When taking a look at how estimations are done in comparison to the grouping system provided by Planning Poker there is a very different conclusion that can be drawn.

A good estimation can be one where the actual hours turn out to be closest to the estimated group (e.g. 15 hours actual with 16 hours estimated could not have had a better estimate, both 8 hours and 32 hours would have been worse). When looking at the data we can see that there are 94 user stories that actually satisfy this condition. This results in nearly 79% of the estimations having the closest estimation that is possible.

But being close to the estimation group does not mean the estimation is actually good. A story that took 23 hours to finish while it was estimated at 16 hours will still be closer to the 16 hour group but has run a day late.
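The notion of the closest estimation group can be made concrete with a short sketch. It assumes that the available groups are 1, 4, 8, 16, 32 and 64 hours, which are the estimate values that occur in the data set; the function names are illustrative.

```python
# Assumed card values: the estimate sizes that occur in the data set.
GROUPS = (1, 4, 8, 16, 32, 64)


def closest_group(actual_hours, groups=GROUPS):
    """Group with the smallest distance to the actual hours.

    On an exact tie (e.g. 12 hours between 8 and 16) the smaller group wins.
    """
    return min(groups, key=lambda g: abs(g - actual_hours))


def is_best_possible(estimate, actual_hours, groups=GROUPS):
    """True if no other group would have been closer to the actual hours."""
    return closest_group(actual_hours, groups) == estimate


print(closest_group(15))  # 16
print(closest_group(23))  # 16
```

Both examples from the text count as the best possible estimate under this measure, even though the 23 hour story overran its 16 hour estimate by almost a full working day, which is why this measure on its own says little about whether a sprint will be delivered on time.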

The data shows that there are 73 user stories that were estimated to take more time than they actually took while also having more actual hours spent than a lower estimation group (e.g. 16 hours estimated while 11 hours were actually used).

What influence does size have for the reliability of an estimation?

It is interesting to look at which stories are being misestimated based on various properties, one of which is the size of a story. If large or small stories are typically misestimated it may indicate a range of factors that deserves additional analysis. Since the teams estimated using Planning Poker, the range for these sizes can be grouped and shown for each category.

regards to their grouping.

When taking a look at this diagram and the underlying data it becomes clear that as the estimations grow larger the deviation which occurs increases as well. This diagram also shows that smaller estimations lean towards being underestimations while larger estimations lean towards overestimations.

The actual deviation from the mean compared to the original estimation can be expressed as percentages (which give an indication of the reliability of a certain estimation group).

Deviation % for estimation groups

Original estimation    Mean          Deviation %
1 hour                 1.5 hours     +50%
4 hours                4.5 hours     +12.5%
8 hours                7 hours       -12.5%
16 hours               15 hours      -6.25%
32 hours               21.5 hours    -32.8125%
64 hours               42 hours      -34.375%

Interestingly enough it becomes apparent that estimations in the 4, 8 and 16 hour estimation groups have a deviation that is lower than 25%. The other groups show quite a strong confirmation of estimation bias, although it can be found in the more reliable groups as well, which can indicate that the grouping may have been badly chosen if a higher reliability is required.
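The percentages for the estimation groups follow from the mean actual hours per group. A minimal sketch of that calculation, using the means reported for each group (the function and variable names are illustrative):

```python
def deviation_pct(estimate_hours, mean_actual_hours):
    """Deviation of the mean actual hours from the estimate, as a percentage.

    Positive values mean the group was underestimated on average,
    negative values mean it was overestimated.
    """
    return (mean_actual_hours - estimate_hours) / estimate_hours * 100


# Mean actual hours per estimation group, as reported for the data set.
GROUP_MEANS = {1: 1.5, 4: 4.5, 8: 7, 16: 15, 32: 21.5, 64: 42}

for estimate, mean in GROUP_MEANS.items():
    dev = deviation_pct(estimate, mean)
    verdict = "below 25%" if abs(dev) < 25 else "25% or more"
    print(f"{estimate:>2} hours: {dev:+.4f}% ({verdict})")

within_bounds = [e for e, m in GROUP_MEANS.items() if abs(deviation_pct(e, m)) < 25]
```

Here `within_bounds` contains the 4, 8 and 16 hour groups, which matches the observation above.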

Do the available hours for a sprint influence the estimation reliability?

The sprints themselves have quite a large deviation in the amount of available hours. The amount of finished user stories compared to unfinished user stories and available time is shown in the following tables:

Team 1: Available hours compared to finished/unfinished user stories

Available hours    Deviation previous sprint %    Sprint    Finished/Unfinished
138 hours          0%                             10        8:2
170 hours          +23.1884%                      11        8:0
142 hours          -16.4706%                      12        8:2
159 hours          +11.9718%                      13        18:1
140 hours          -11.9496%                      14        11:4
116 hours          -17.1428%                      15        9:1

Team 2: Available hours compared to finished/unfinished user stories

Available hours    Deviation previous sprint %    Sprint    Finished/Unfinished
87 hours           0%                             23        14:1
96 hours           +10.3448%                      24        10:2
131 hours          +36.4583%                      25        18:0
27 hours           -79.3893%                      26        2:9
115 hours          +325.9259%                     27        6:5
66 hours           -42.6086%                      28        7:2

Though there is a link between heavy availability changes and the amount of unfinished user stories, a relation with a drastic increase in availability alone is not apparent. The data seems to show that if a decrease in capacity occurs after a sprint that saw an increase, the amount of unfinished user stories increases. The data set however is too small to make a definitive statement on this point.
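The deviation with respect to the previous sprint is the relative change in available hours between two consecutive sprints. A minimal sketch, using the available hours of both teams (the function name is illustrative, and the first sprint is given a change of 0% because it has no predecessor in the data set):

```python
def change_vs_previous(available_hours):
    """Relative change in available hours compared to the previous sprint, in %."""
    changes = [0.0]  # the first recorded sprint has nothing to compare against
    for previous, current in zip(available_hours, available_hours[1:]):
        changes.append((current - previous) / previous * 100)
    return changes


team_1_hours = [138, 170, 142, 159, 140, 116]  # sprints 10 to 15
team_2_hours = [87, 96, 131, 27, 115, 66]      # sprints 23 to 28

team_1_changes = change_vs_previous(team_1_hours)
team_2_changes = change_vs_previous(team_2_hours)
```

Because each change is relative to the previous sprint, a recovery from a very low sprint (27 to 115 hours for team 2) shows up as a very large percentage even though the resulting capacity is still below the nominal 144 hours.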

The ratio of finished to unfinished user stories can also be set against the available hours compared to the estimated hours. This reveals an interesting pattern:


Team 1: Sprint hours compared to finished/unfinished user stories

Available hours    Estimated hours    Sprint    Finished/Unfinished
138 hours          128 hours          10        8:2
170 hours          172 hours          11        8:0
142 hours          156 hours          12        8:2
159 hours          187 hours          13        18:1
140 hours          144 hours          14        11:4
116 hours          113 hours          15        9:1

Team 2: Sprint hours compared to finished/unfinished user stories

Available hours    Estimated hours    Sprint    Finished/Unfinished
87 hours           93 hours           23        14:1
96 hours           96 hours           24        10:2
131 hours          135 hours          25        18:0
27 hours           160 hours          26        2:9
115 hours          168 hours          27        6:5
66 hours           97 hours           28        7:2

With just one small exception the data shows that when the estimated hours are higher than the available hours there will be an unfinished user story. When the estimated hours are adjusted for "reliable" estimates (i.e. a maximum of 25% deviation) it becomes apparent that the combination of these available hours cannot realistically lead to an achievable goal.


5.3 Conclusions

The estimations at a management level are very defensive and not very accurate as a result. Despite these defensive estimations the teams still do not always manage to deliver the sprints on time. The long term planning that is made at the management level is reliable when it comes to capacity (most sprints estimate for this amount and end up being less work).

The teams themselves are dealing with problems related to availability. The quantity of hours that is used by management (144 hours) is not equal to the actual amount that is available. The sprints that are delivered on time are accomplished with fewer hours due to defensive estimations being applied at both the management level and the team level.

Another issue that was found is where high priority work is injected into the sprints. This ranges from bugs to items. The team and the product owner do however exercise a good practice of looking at what the minimum viable product is that can be delivered and suggest user stories to be dropped that are not mandatory.

The detailed planning is not too far off when it comes to reliable estimations. Although a lot of overestimation has taken place in the teams, the amount of items that went past 25% deviation from the estimate was far lower than the amount of estimations within these bounds.

When looking at the most reliable estimations it was found that stories estimated at 4, 8 or 16 hours were the most reliable. The type of a user story seemed to have some influence on what type of deviation would take place. For the items labeled as bugs it was found that most of them were underestimations.

The estimations revealed that 79% had the best estimation possible (closest to the Planning Poker points), which is a high reliability according to definitions of reliable estimates.

The teams may struggle with sudden reductions in sprint hours: the sprints that saw a reduction in the amount of hours available typically had a larger amount of unfinished user stories.


All in all the teams are quite capable of providing reliable estimates. The sprints are very predictable when estimated by the teams. In combination with the lower amount of hours that is actually available and the very defensive estimates from all parties involved it would seem as though the team is underperforming.

5.4 Validity

To assure the reliability of the data several measures were taken to get an accurate representation of the amount of time that has been spent on a particular item. These measures can be used to replicate the study and get to the same results.

All of the data used in the conclusions of this thesis can be found in the appendices, which allows the reader to not only reproduce the analysis but also perform different sorts of analysis.

5.5 Future work

The effect that deviations in sprint capacity have on the amount of unfinished user stories could not be determined from the data due to the small size of the data set. It would be interesting to look at what impact these deviations have.

Whether running multiple projects or epics at once has an influence on the productivity has not been explored yet but could very well prove to be an important factor in a more general case.


References

[1] James A Shepperd, Patrick Carroll, Jodi Grace, and Meredith Terry. Exploring the causes of comparative optimism. 42, 01 2002.

[2] Pekka Abrahamsson, Ilenia Fronza, Raimund Moser, Jelena Vlasenko, and Witold Pedrycz. Predicting development effort from user stories. In Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement, ESEM 11, pages 400–403, Washington, DC, USA, 2011. IEEE Computer Society.

[3] Jeff Atwood. Let’s play planning poker!, 2007.

[4] Barry Boehm, Chris Abts, and Sunita Chulani. Software development cost estimation approaches a survey. Ann. Softw. Eng., 10(1-4):177–205, January 2000.

[5] Barry W. Boehm. Software Engineering Economics. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 1981.

[6] L. C. Briand and I. Wieczorek. Resource modeling in software engineering. J. Marciniak, Wiley, Ed. New York, 2002.

[7] Edward R. Carroll. Estimating software based on use case points. In Companion to the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA ’05, pages 257–265, New York, NY, USA, 2005. ACM.

[8] Mike Cohn. User Stories Applied: For Agile Software Development. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 2004.

[9] Mike Cohn. Agile Estimating and Planning. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2005.

[10] Mike Cohn. Advantages of the as a user, i want user story template, 2008.

[11] S. D. Conte, H. E. Dunsmore, and V. Y. Shen. Software Engineering Metrics and Models. Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1986.

[12] Neil D. Weinstein and William Klein. Resistance of personal risk perceptions to debiasing interventions. 14:132–40, 04 1995.

[13] Thomas Fehlmann and Eberhard Kranich. Quality of estimations - how to assess reliability of cost predictions. pages 8–14, 10 2012.

[14] J. Grenning. Planning poker or how to avoid analysis paralysis while release planning. Hawthorn Woods: Renaissance Software Consulting, 3, 2002.

[15] Peter Hill. Practical Software Project Estimation: A Toolkit for Estimating Software Development Effort and Duration. McGraw Hill Education, 2010.

[16] Robert Hughes. Expert judgment as an estimating method. 38:67–75, 12 1996.

[17] Watts Humphrey. The personal software process (psp). Technical Report CMU/SEI-2000-TR-022, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, 2000.

[18] Ivar Jacobson, Grady Booch, and James Rumbaugh. The Unified Software Development Process. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[19] Ron Jeffries. The noestimates movement, 2013.

[20] C. Jones. Estimating Software Costs. Computing McGraw-Hill. McGraw-Hill, 1998.

[21] M. Jørgensen. A review of studies on expert estimation of software development effort. J. Syst. Softw., 70(1-2):37–60, February 2004.

[22] M. Jørgensen. An empirical evaluation of the mkii fpa estimation model. 1997.

[23] Daniel Kahneman and Amos Tversky. Intuitive prediction: Biases and corrective procedures. 12:44, 06 1977.

[24] Albert L. Lederer, Rajesh Mirani, Boon Siong Neo, Carol Pollard, Jayesh Prasad, and K. Ramamurthy. Information system cost estimating: A management perspective. MIS Q., 14(2):159–176, June 1990.

[25] Albert L. Lederer and Jayesh Prasad. Perceptual congruence and information systems cost estimating. In Proceedings of the 1995 ACM SIGCPR Conference on Supporting Teams, Groups, and Learning Inside and Outside the IS Function Reinventing IS, SIGCPR ’95, pages 50–59, New York, NY, USA, 1995. ACM.

[26] Viljan Mahnič and Tomaž Hovelja. On using planning poker for estimating user stories. J. Syst. Softw., 85(9):2086–2095, September 2012.

[27] S. McConnell. Software Estimation: Demystifying the Black Art. Best practices. Microsoft Press, 2006.

[28] Kjetil Moløkken and Magne Jørgensen. A review of surveys on software effort estimation. In Proceedings of the 2003 International Symposium on Empirical Software Engineering, ISESE ’03, pages 223–, Washington, DC, USA, 2003. IEEE Computer Society.

[29] Kjetil Molokken-Ostvold and Nils Christian Haugen. Combining estimates with planning poker–an empirical study. pages 349–358, 2007.

[30] Kjetil Molokken-Ostvold, Nils Christian Haugen, and Hans Christian Benestad. Using planning poker for combining expert estimates in software projects. J. Syst. Softw., 81(12):2106–2117, December 2008.

[31] Ministry of the Interior and Kingdom Relations. Ministry of the interior and kingdom relations, 2017.

[32] James W. Paulson, Giancarlo Succi, and Armin Eberlein. An empirical study of open-source and closed-source software products. IEEE Trans. Softw. Eng., 30(4):246–256, April 2004.

[33] R. Popli and N. Chauhan. A sprint-point based estimation technique in scrum. In 2013 International Conference on Information Systems and Computer Networks, pages 98–103, March 2013.

[34] R. Popli and N. Chauhan. Cost and effort estimation in agile software development. In 2014 International Conference on Reliability Optimization and Information Technology (ICROIT), pages 57–61, Feb 2014.

[35] J. r. Project estimation using screenflow engineering. In Proceedings of [...] and Practice (SE:EP ’96), SEEP ’96, pages 150–, Washington, DC, USA, 1996. IEEE Computer Society.

[36] Scrum.Org and ScrumInc. Scrum guide, 2017.

[37] Martin Shepperd, Chris Schofield, and Barbara Kitchenham. Effort estimation using analogy. In Proceedings of the 18th International Conference on Software Engineering, ICSE ’96, pages 170–178, Washington, DC, USA, 1996. IEEE Computer Society.

[38] C. Solis and X. Wang. A study of the characteristics of behaviour driven development. In 2011 37th EUROMICRO Conference on Software Engineering and Advanced Applications, pages 383–387, Aug 2011.

[39] Richard Stutzke. Estimating Software-Intensive Systems. Pearson Education US, Boston, United States, 2005.

[40] Rob Thomsett. Double dummy spit and other estimating games. Technical report, 1996.

[41] A. Zeaaraoui, Z. Bougroun, M. G. Belkasmi, and T. Bouchentouf. User stories template for object-oriented applications. In Third International Conference on Innovative Computing Technology (INTECH 2013), pages 407–410, Aug 2013.

[42] Woody Zuill and K Meadows. Mob programming: A whole team approach. In Agile 2014 Conference, Orlando, Florida, 2016.


6.1 Appendix A - Data set sprints

The following table shows which team members were actively working in the team during the time the data set was collected and what amount of hours they have assigned to fulfilling team duties (development, meetings, etc.).

Team 1

Name        Role         Available    Team
Member 1    Developer    6 hours      1
Member 2    Developer    7 hours      1
Member 3    Developer    6 hours      1

Team 2

Name        Role         Available    Team
Member 1    Developer    6 hours      2
Member 2    Developer    7 hours      2
Member 3    Developer    6 hours      2

The next tables display the sprints and their associated user stories. It also displays the amount of total hours of work that are in the sprint (the sum of the hours of estimations that are given for each user story in the sprint).


6.1.1 Team 1 - Sprint 10

Sprint data

Identifier                          Sprint 10
Total hours estimated               128 hours
Total hours available               138 hours
User stories finished               8
User stories unfinished             2
Total hours estimated (finished)    96 hours
Total hours actual (finished)       86 hours
Deviation finished %                10.41%
Total hours actual (unfinished)     52 hours

Availability

Team member    Available daily    Available days    Total hours
Member 1       6 hours            8 days            48 hours
Member 2       7 hours            8 days            48 hours
Member 3       6 hours            7 days            42 hours

User stories and planning

User story          Final estimate    Actual time    Deviation %
PROJECT2-22         32 hours          23 hours       -28.125%
PROJECT1-172        16 hours          14 hours       -12.5%
PROJECT1-181        8 hours           8 hours        0%
PROJECT1-262        16 hours          15 hours       -6.25%
PROJECT1-287 BUG    4 hours           5 hours        +25%
PROJECT1-288        4 hours           6 hours        +50%
PROJECT1-289        16 hours          32 hours       Unfinished >100%
PROJECT1-290        16 hours          20 hours       Unfinished >20%
PROJECT1-295        8 hours           7 hours        -12.5%


Total hours estimated               172 hours
Total hours available               170 hours
User stories finished               8
Total hours estimated (finished)    172 hours
Total hours actual (finished)       141 hours
Deviation finished %                18%
Total hours actual (unfinished)     0 hours

Availability

Team member    Available daily    Available days    Total hours
Member 1       6 hours            9 days            54 hours
Member 2       7 hours            8 days            56 hours
Member 3       6 hours            10 days           60 hours

User story          Final estimate    Actual time    Deviation %
PROJECT3-1          8 hours           4 hours        -50%
PROJECT3-2 BUG      8 hours           14 hours       +75%
PROJECT3-3          4 hours           7 hours        +75%
PROJECT3-4          32 hours          28 hours       -12.5%
PROJECT3-5 BUG      8 hours           15 hours       +46.66%
PROJECT3-6          64 hours          42 hours       -34.38%
PROJECT3-7          32 hours          7 hours        -78.13%
PROJECT1-272        16 hours          24 hours       +50%


Availability

User story          Final estimate    Actual time    Deviation %
PROJECT1-306        32 hours          20 hours       -37.5%
PROJECT3-26         8 hours           6 hours        -25%
PROJECT3-27 BUG     4 hours           4 hours        0%
PROJECT3-28 BUG     8 hours           6 hours        -25%
PROJECT3-29         8 hours           8 hours        0%
PROJECT3-30 BUG     16 hours          18 hours       +11.11%
PROJECT3-31         16 hours          20 hours       Unfinished >20%
PROJECT3-32         16 hours          17 hours       Unfinished >5.9%
PROJECT3-33         16 hours          18 hours       +11.11%


Availability

User story          Final estimate    Actual time    Deviation %
PROJECT3-31         16 hours          13 hours       -18.75%
PROJECT3-32         16 hours          17 hours       +6.25%
PROJECT4-146        8 hours           7 hours        -12.5%
PROJECT4-147        8 hours           7 hours        -12.5%
PROJECT4-148        8 hours           8 hours        0%
PROJECT4-149        8 hours           4 hours        -50%
PROJECT4-150        8 hours           2 hours        -75%
PROJECT4-151        16 hours          16 hours       0%
PROJECT4-165        8 hours           5 hours        -37.5%
PROJECT5-5          8 hours           10 hours       +25%
PROJECT5-6          16 hours          19 hours       +15.79%
PROJECT6-117 BUG    1 hour            3 hours        +200%
PROJECT6-118 BUG    16 hours          15 hours       -6.25%
PROJECT7-1          8 hours           9 hours        Unfinished >12.5%
PROJECT8-1          4 hours           3 hours        -25%
PROJECT8-2          4 hours           3 hours        -25%
PROJECT8-4          1 hour            2 hours        +100%
PROJECT8-5          1 hour            1 hour         0%
PROJECT1-306        32 hours          22 hours       -31.25%


Availability

User story          Final estimate    Actual time    Deviation %
PROJECT7-1          8 hours           9 hours        Unfinished >11.11%
PROJECT8-6          16 hours          14 hours       -12.5%
PROJECT8-7          8 hours           7 hours        -12.5%
PROJECT8-8          8 hours           11 hours       +37.5%
PROJECT8-9          8 hours           8 hours        0%
PROJECT8-10         8 hours           12 hours       Unfinished >50%
PROJECT8-11         8 hours           2 hours        Unfinished -
PROJECT8-12         8 hours           9 hours        +12.5%
PROJECT8-13         8 hours           6 hours        -25%
PROJECT8-14         16 hours          21 hours       +23.8%
PROJECT8-15         8 hours           3 hours        -62.5%
PROJECT8-16         16 hours          12 hours       Unfinished -
PROJECT8-17         8 hours           4 hours        -50%
PROJECT8-18         8 hours           7 hours        -12.5%
PROJECT8-19         8 hours           4 hours        -50%


Availability

User story          Final estimate    Actual time    Deviation %
PROJECT9-89         16 hours          14 hours       Unfinished -
PROJECT7-1          8 hours           7 hours        -12.5%
PROJECT7-2          4 hours           8 hours        +100%
PROJECT8-10         8 hours           5 hours        -37.5%
PROJECT8-11         8 hours           7 hours        -12.5%
PROJECT8-16 BUG     16 hours          23 hours       +43.75%
PROJECT8-20         16 hours          16 hours       0%
PROJECT8-21         1 hour            2 hours        +100%
PROJECT8-24         4 hours           5 hours        +25%
PROJECT8-26 BUG     16 hours          15 hours       -6.25%


Total hours estimated      93 hours
Total hours available      87 hours
User stories finished      14
User stories unfinished    1

Availability

User story          Final estimate    Actual time    Deviation %
PROJECT1-308        8 hours           7 hours        -12.5%
PROJECT2-81 BUG     4 hours           4 hours        0%
PROJECT3-37         16 hours          16 hours       Unfinished -
PROJECT3-38         4 hours           3 hours        -25%
PROJECT3-40         4 hours           6 hours        +50%
PROJECT3-41         16 hours          17 hours       +5.88%
PROJECT3-43         16 hours          12 hours       -25%
PROJECT3-44         4 hours           5 hours        +25%
PROJECT4-31         4 hours           2 hours        -50%
PROJECT4-32         1 hour            3 hours        +200%
PROJECT4-34         1 hour            1 hour         0%
PROJECT4-35         1 hour            1 hour         0%
PROJECT4-36         1 hour            1 hour         0%
PROJECT4-37         1 hour            1 hour         0%
PROJECT4-45         8 hours           4 hours        -50%
PROJECT4-46         4 hours           4 hours        0%
