
Complexity of multi-project scheduling: the quest for accurate measures

Academic year: 2021



COMPLEXITY OF MULTI-PROJECT SCHEDULING: THE QUEST FOR ACCURATE MEASURES

Word count: 16,637

Student number: 01406032

Supervisor: Prof. Dr. Mario Vanhoucke

Master's Dissertation submitted to obtain the degree of:

Master in Business Engineering: Operations Management


confidentiality agreement

Permission

I declare that the content of this Master’s dissertation may be consulted and/or reproduced, provided that the source is referenced.


foreword

I would like to thank several people who have made this master's dissertation possible. Rob Van Eynde has extensive knowledge of the subject matter, and his feedback proved very useful in generating and quickly assessing the validity of proposed ideas and theories. My friend Jonas Nuyttens was a great help during the first year; his contributions mostly consisted of discovery and initial idea generation. Furthermore, I would like to thank my girlfriend and my parents for their input and ideas on the numerous aspects of this work.


COVID-19

The impact of the COVID-19 pandemic on the execution of this thesis remained minimal. The discussions with Rob Van Eynde were held via video calls; they provided as much guidance as the ones we had face-to-face before the pandemic. Since practically all of the work was solitary, the execution could take place as planned, and the same conclusions and results were obtained. Only the uncertainty and rescheduling of exams and other projects can be mentioned, but their impact was negligible.


Contents

foreword
COVID-19
list of figures
list of tables
list of algorithms
Definitions and abbreviations

I Introduction

1 Introduction

II literature review

2 Introduction
3 RCPSP
   3.1 network topology
      3.1.1 network size indicator
      3.1.2 number of non-redundant arcs
      3.1.3 length
      3.1.4 width
      3.1.5 serial or parallel indicator
      3.1.6 measure of network parallelism
      3.1.7 activity distribution indicator
      3.1.8 short arcs indicator
      3.1.9 long arcs indicator
      3.1.10 topological float indicator
   3.2 network complexity
      3.2.1 coefficient of network complexity
      3.2.2 order strength
      3.2.3 complexity index
      3.2.4 total activity density
      3.2.5 cyclomatic number
   3.3 resource availability
      3.3.1 resource factor
      3.3.2 resource strength
   3.4 resource distribution
      3.4.1 average resource loading factor
   3.5 resource contention
      3.5.1 utilization factor
   3.6 objective function
4 RCMPSP
   4.1 network complexity
   4.2 network topology
   4.3 resource distribution
      4.3.1 average resource loading factor
   4.4 resource contention
   4.5 objective function

III Development

5 Introduction
6 Multi-project approach
   6.1 topological complexity
      6.1.1 serialism
      6.1.2 width
      6.1.3 topological float
      6.1.4 connectivity
      6.1.5 combination
7 Single-project approach
   7.1 topological complexity
      7.1.1 serialism
      7.1.2 width
      7.1.3 connectivity
8 Resource complexity
   8.1 resource distribution
   8.2 resource contention

IV Testing

9 Introduction
10 Multi-project approach
   10.1 dependent set
   10.2 independent set
      10.2.1 topological complexity
      10.2.2 resource complexity
11 Single-project approach
   11.1 dependent set
   11.2 independent set
      11.2.1 topological complexity
12 Results
   12.1 topological complexity approach
   12.2 resource complexity

List of Figures

1 maximum I3 case
2 minimum I4 case
3 single-project approach
4 multi-project approach
5 width example 1
6 width example 2
7 float - arcs
8 inclusion dummy nodes
9 maximum A′
10 single project example
11 multi-project example
12 RPRUbase example
13 density ln e(Pj)
14 density Nj
15 Nj - ln e(Pj)
16 density W
17 W - ln e(Pj)
18 density Wrel
19 Wrel - ln e(Pj)
20 density Wrel,loc
21 Wrel,loc - ln e(Pj)
22 density I2
23 I2 - ln e(Pj)
24 density OS
25 OS - ln e(Pj)
26 density OSadj
27 OSadj - ln e(Pj)
28 density S
29 S - ln e(Pj)
30 density Srel
31 Srel - ln e(Pj)
32 correlations multi-project measures
33 density ln e(M)
34 density ln e(M) full set
35 APDPR − APDGEN
36 density J
37 J - ln e(M)
38 density W
39 W - ln e(M)
40 density Wrel
41 Wrel - ln e(M)
42 density Wloc,rel
43 Wloc,rel - ln e(M)
44 density I2
45 I2 - ln e(M)
46 density I2,rel
47 I2,rel - ln e(M)
48 density OS
49 OS - ln e(M)
50 density OSadj
51 OSadj - ln e(M)
52 density ŌSadj
53 ŌSadj - ln e(M)
54 density σ²OSadj
55 σ²OSadj - ln e(M)
56 density S
57 S - ln e(M)
58 density Srel
59 Srel - ln e(M)
60 correlations single-project measures
61 NARLF
62 NARLF - ln(PR - GEN)
63 RPRU
64 RPRU - ln(PR - GEN)
65 RPRUbase
66 RPRUbase - ln(PR - GEN)
67 RPRU − RPRUbase
68 RPRUdiff - ln(PR - GEN)
69 MAUF
70 MAUF - ln(PR - GEN)
71 σ²MAUF
72 σ²MAUF - ln(PR - GEN)
73 MAUFavg,1
74 MAUFavg,1 - ln(PR - GEN)
75 σ²MAUF,avg,1
76 σ²MAUF,avg,1 - ln(PR - GEN)
77 MAUFavg,2
78 MAUFavg,2 - ln(PR - GEN)
79 σ²MAUF,avg,2
80 σ²MAUF,avg,2 - ln(PR - GEN)

List of Tables

1 multi-project single regression 1
2 multi-project single regression 2
3 single-project multiple regression
4 single-project single regression 1
5 single-project single regression 2
6 single-project multiple regression
7 APD single regression 1
8 APD single regression 2
9 APD multiple regression

List of Algorithms

1 algorithm: progressive level
2 algorithm: width
3 single project conversion

Definitions and abbreviations

definitions

J: number of projects in a multi-project instance
j: project index, j ∈ [1, J]
Pj: project j
Nj: number of activities in project j
Aj: number of arcs in project j
A′j: number of non-redundant arcs in project j
i: activity index, i ∈ [1, Nj]
aij: activity i in project j
ESij: earliest start for aij
LSij: latest start for aij
CPj: resource-unconstrained critical path duration of Pj
CPmax: max_{j ∈ [1,J]} CPj
PLij: progressive level for aij
Mj: max_{i ∈ [1,Nj]} PLij for project j
na: number of activities at progressive level a
Wj: width max_{a ∈ [1,Mj]} na for project j
M: max_{j ∈ [1,J]} Mj for the multi-project
K: number of renewable resources in a multi-project instance
k: renewable resource index
Rk: available units of resource type k per time unit
t: time unit index
dij: duration of activity aij
rijk: demand for resource type k by aij per time unit
wijk: total demand for resource type k by aij

abbreviations

AoA: Activity-on-the-arc
AoN: Activity-on-the-node
APR: Activity priority rule
APD: Average percent delay
ARLF: Average resource loading factor
AUF: Average utilization factor
CP: Critical path duration
ESS: Earliest start schedule
MAUF: Modified average utilization factor
NARLF: Normalized average resource loading factor
OS: Order strength
PR: Priority rule
PL: Progressive level
RCMPSP: Resource-constrained multi-project scheduling problem
RCPSP: Resource-constrained project scheduling problem
RL: Regressive level


Part I

1 Introduction

Since the introduction of project management in the 1950s, interest in the field has only grown. This is due both to its increased usefulness, thanks to the expansion of research in this field, and to the shift in company workflows towards a project-based structure. To reach a given goal, a set of activities linked by precedence relations has to be executed. The precedence relations can be interpreted as necessary conditions for an activity to start. Each of these activities needs a certain amount of time to be completed and uses a set amount of one or more resources (money, labour, ...). As resources are not in unlimited supply, and a common goal is to execute projects as early as possible, there are two main constraints on a project: time and resources. Generating feasible and optimal schedules that minimize the project duration is dubbed the Resource-Constrained Project Scheduling Problem, or RCPSP. Within these projects, uncertainty can be introduced by assuming the duration of the activities to be variable, as is often the case in reality. This added dimension is important in analysis, but is omitted in this paper in order to analyse problem structure more accurately. A noticeable trend in recent years is that more and more companies run several projects that can be grouped in a portfolio. The same problem can be posed for the portfolio as a whole, and is then called the Resource-Constrained Multi-Project Scheduling Problem, or RCMPSP. The J projects with Nj activities of such a portfolio do not intertwine in terms of precedence relations (e.g. an activity from one project cannot have a predecessor in another project), but can overlap in their resource use. The definition of the resources per project shifts to a global definition for the portfolio, which implies that each of the resources can, but does not have to, be used by each individual project. There are several variations of the RCMPSP, so a few stipulations have to be made about its general structure. It is assumed that all projects can start at time 0, if the precedence relations and resource constraints allow it. The tasks have no setup times, and they cannot be split. The resource availability is constant over the complete problem, and the resources are renewable. Additionally, each task has only one mode of execution.

Because research in the field of Resource-Constrained Multi-Project Scheduling Problems is still relatively young, there are aspects of these problems that are not accurately described by the existing measures. The main goal in developing these measures is to summarise the important parts of an intricate network of activities and resources, and to be able to draw quick and reasonably accurate conclusions from them. The first problem that arises is quite a general one: which characteristics do RCMPSPs possess? Additionally, and more concretely: which measures can and should be used to describe these characteristics in the greatest amount of detail, so as to give a clear picture of the structure (and, just as well, the complexity) of the stated problem? This area of research is the first step towards the varied generation of datasets that are paramount in developing and optimizing solution methods for these problems. This set of challenges leads to the research question:

“How well do the existing measures describe the complexity of a Resource-Constrained Multi-Project Scheduling Problem? What alternatives are available for these measures and how can we adapt or replace them to account for known pitfalls?”

The problem situation and formulation give rise to a clear first objective: gain a thorough understanding of the current landscape in terms of both characteristics and their corresponding measures. Because of the nature of the RCMPSP, this needs to happen for both RCMPSPs and RCPSPs. Next, these metrics are analyzed in terms of what they represent, and tested on whether they contain useful elements for achieving the goals set by the research question, i.e. describing complexity. From these findings, existing measures are adapted and new measures are presented, and an accurate method for describing project complexity is derived. Finally, remaining pitfalls are identified.


Part II

2 Introduction

Because the existing literature on RCPSPs is extensive, as opposed to the limited research available for RCMPSPs, the RCPSP is discussed first. It is necessary to understand the dynamics of a single project before trying to grasp the intricacies of a multi-project problem. Therefore, not only the measures that were developed with describing complexity in mind are discussed, but also those that were intended to serve as a mere indication of certain features. Since it has been the subject of many studies, and ample ideas have been generated, gauging complexity is in itself a complex matter. Since the existing RCMPSP measures are often derivatives of their single-project counterparts, these are discussed last.

As mentioned in the introduction, the Resource-Constrained Project Scheduling Problem sets the objective of minimizing the makespan of a project consisting of activities, precedence relations and resources. To facilitate analysis, and to create a visual representation of the problem at hand, projects are described as project networks, consisting of nodes and arcs. There are two notations that need to be mentioned: Activity-on-the-Arc (AoA) and Activity-on-the-Node (AoN). In the first representation the activities are represented by arcs, and the nodes signify events. The AoN representation, which is used here, is less ambiguous, since no dummy nodes are necessary apart from the dummy start and end nodes. The RCMPSP can use the same representation, with some stipulations regarding the dummy nodes.

The characteristics of problems and projects are generally divided into four main categories: complexity, topology, resource distribution and resource contention. Complexity measures would seem to be the only category required, but despite their title, these measures do not describe all complexity within a project. The naming of categories and measures in the literature is based more on intent than on actual outcome. As stated in Elmaghraby and Herroelen (1980): “...it seems evident to us that the structure of the network - in whichever way it is measured - will not be sufficient to reflect the difficulty encountered in the resolution of such problems. In particular, the availability of resources must play an important role...”. That is to say, resource measures are necessary for describing complexity, as the logical network structure cannot encompass it by itself.

The measures listed are grouped by likeness, either in their formulas or in the reasoning behind them. By further investigating these measures and their drawbacks, a clear picture of their effectiveness can be obtained. Some measures will appear more effective than others, which aids the development of adaptations of the RCPSP measures to fit a RCMPSP. Such an adaptation can consist of applying the measure to the multi-project instance directly, or of using the same reasoning to develop an alternative.

3 RCPSP

Because these measures do not account for the presence of an overarching multi-project, the j subscripts are dropped for readability. The characteristics of RCPSPs and their existing measures can be classified in the following categories: topology, complexity, resource availability, resource distribution and resource contention. Network topology is introduced first, since it discusses concepts used in the other sections.

3.1 network topology

Network topology measures are used on a more descriptive basis than the complexity measures, and are not specifically designed to describe a problem's complexity. They often use the same variables as the metrics mentioned in the next section, and can serve as indicators for information not included in the current complexity measures. Below are the topology measures currently available in the literature. As defined in Coelho et al. (2008), measures 1, 5 and 8 are exact copies of those defined by Coelho et al. (1999), while measures 7 and 9 are improved versions of the original measures. Measure 4 is used to facilitate the other measures, while the tenth measure is newly introduced. Elmaghraby (as cited in Coelho et al. (2008)) introduces measure 3, and the second measure is defined in Drexl et al. (1995). Measure 6 is defined in Burke et al. (2000). Parameters 1, 2, 3 and 4 are used to define the topological structure of a RCPSP in Browning and Yassine (2010a). It is important to note that these are not classified as measures there, since they are only used to define the bounds that allow the complexity measure C(A′, N), defined in the same paper, to exist. They are however included in our analysis to ensure breadth of understanding.

1. network size indicator N;
2. number of non-redundant arcs A′;
3. length M;
4. width W;
5. serial or parallel indicator I2;
6. measure of network parallelism w;
7. activity distribution indicator I3;
8. short arcs indicator I4;
9. long arcs indicator I5;
10. topological float indicator I6.

3.1.1 network size indicator

The first measure is fairly straightforward: the network size N gives useful information for defining the bounds of the other measures. On its own, it is clear that there is too much variability in other parameters for it to measure complexity accurately.

3.1.2 number of non-redundant arcs

The number of arcs in a network, A, signifies how many precedence relations are present, and as such defines how many possible variations there are in generating a schedule. This number of arcs can be corrected for redundant arcs. A redundant arc is defined in Drexl et al. (1995): an arc (a, b) is called redundant if there exists a path from a to b consisting of two or more arcs. The number of non-redundant arcs in the project network is denoted by A′.
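This redundancy test can be sketched directly from the definition above; the function name and activity indices are illustrative, not taken from the thesis:

```python
def non_redundant_arcs(n, arcs):
    """Keep only the arcs (a, b) for which no alternative path from a
    to b of two or more arcs exists (Drexl et al.'s redundancy test)."""
    succ = {i: set() for i in range(n)}
    for a, b in arcs:
        succ[a].add(b)

    def reachable(a):
        # all nodes reachable from a via one or more arcs
        seen, stack = set(), list(succ[a])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        return seen

    reach = {a: reachable(a) for a in range(n)}
    # (a, b) is redundant if b can also be reached through an
    # intermediate direct successor of a
    return [(a, b) for a, b in arcs
            if not any(b in reach[c] for c in succ[a] if c != b)]
```

For instance, in a three-activity network with arcs (0, 1), (1, 2) and (0, 2), the arc (0, 2) is dropped because the path 0 → 1 → 2 already implies it, so A′ = 2.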


3.1.3 length

The concept of project length is defined as the maximum number of consecutive activities from the beginning to the end of the network. This notion was formalized through the introduction of the progressive level (PL), or tiers. To calculate the length of a network, the PL is calculated for every activity. As stated in Elmaghraby (as cited in Coelho et al. (2008)), the progressive level PL_i of each activity is defined as

\[
PL_i =
\begin{cases}
0 & \text{if } PR_i = \emptyset \\
\max_{k \in PR_i} PL_k + 1 & \text{if } PR_i \neq \emptyset
\end{cases}
\tag{1}
\]

where PR_i is the set of predecessors of activity i. A PL is defined as a subset of the N activities with no arcs between them, and that depend only on activities from lower PLs. The length of a project is simply the maximum PL of its activities:

\[
M = \max_{i \in [1, N]} PL_i
\tag{2}
\]

While the number of PLs does have an effect on the complexity measure C(A′, N), it could in theory serve as a complexity measure on its own. The length is in essence a serial/parallel measure, since a serial network has M = N and a parallel network M = 2. An important remark is that the length on its own is hard to compare between projects of different sizes, since the measure ranges within [2, N].
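Equations (1) and (2) translate almost line for line into code. A minimal sketch (naming is mine; note that with the 0-based recursion of eq. (1), a serial chain of N activities gets a maximum level of N − 1 unless dummy start and end nodes are added):

```python
def progressive_levels(pred):
    """pred[i]: list of immediate predecessors of activity i.
    Returns the PL of every activity (eq. (1)) and the length M (eq. (2))."""
    pl = {}

    def level(i):
        if i not in pl:
            # eq. (1): sources get level 0, otherwise 1 + deepest predecessor
            pl[i] = 0 if not pred[i] else max(level(k) for k in pred[i]) + 1
        return pl[i]

    for i in pred:
        level(i)
    return pl, max(pl.values())
```

A serial chain 0 → 1 → 2 → 3 yields levels 0, 1, 2, 3, while a diamond (0 preceding 1 and 2, both preceding 3) places activities 1 and 2 on the same level.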


3.1.4 width

The width W is defined as the maximum number of activities in parallel. This is essentially the maximum of a vector n denoting the allocation of the activities to the different PLs (Browning and Yassine, 2010a). Measures that fall in this category are also used in Coelho et al. (1999). These measures include: the width W = max_{a ∈ [1,M]} n_a, i.e. the number of activities at the PL with the most activities; the average width W̄ = N/M; the total absolute deviation from the average width, α_W = Σ_{a=1}^{M} |n_a − W̄|; and the maximum value for α_W, (M − 1)(W̄ − 1) + (N − M + 1 − W̄).
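These width statistics can be computed from any PL assignment. A small sketch under my own naming, with the absolute deviation assumed in α_W:

```python
def width_stats(pl):
    """pl[i]: progressive level of activity i.
    Returns the width W, the average width N/M and the total
    absolute deviation alpha_W from the average width."""
    levels = sorted(set(pl.values()))
    M, N = len(levels), len(pl)
    n = [sum(1 for v in pl.values() if v == a) for a in levels]  # n_a vector
    avg = N / M
    return max(n), avg, sum(abs(na - avg) for na in n)
```

For a four-activity network with level vector n = (1, 2, 1), this gives W = 2, W̄ = 4/3 and α_W = 4/3.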

3.1.5 serial or parallel indicator

This measure, I2, is defined in Coelho et al. (2008) as the closeness to a serial or parallel graph:

\[
I_2 = \frac{M - 1}{N - 1}
\tag{3}
\]

It compares the number of tiers in the network to the number of nodes in the project. Values range over [0, 1], from fully parallel (0) to fully serial (1).

3.1.6 measure of network parallelism

The measure of network parallelism w, found in Burke et al. (2000), calculates the portion of parallel activities compared to the total number of activities in the project, and is therefore defined as

\[
w = \frac{N - M}{N}
\tag{4}
\]

w is 0 for a serial network, and asymptotically goes to one as N increases to infinity in a parallel network.

The four previous measures all try to capture solely the topological serialism/parallelism of a project. Similar information and concepts are also present in the complexity measures, but those measures try to define a broader concept. When comparing the measures, the similarities between I2 and w are clear. Both are restricted to the interval [0, 1], and use the same variables. The main difference between the two is the following: the scope of w is in essence a comparison of the parallelism to every project imaginable, whereas I2 compares the serialism of a project to the maximum serialism the project itself can achieve, given the number of tiers and number of nodes.
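The contrast between the two indicators is easiest to see side by side. A sketch; the (N − M)/N form of w is an assumption on my part, since the denominator is garbled in this copy:

```python
def I2(N, M):
    # closeness to a serial graph: 0 = fully parallel, 1 = fully serial
    return (M - 1) / (N - 1)

def w_parallelism(N, M):
    # Burke et al.'s network parallelism; (N - M) / N is an assumed form
    return (N - M) / N
```

A serial network (N = M = 5) gives I2 = 1 and w = 0; a near-parallel network (N = 6, M = 2) gives I2 = 0.2 while w already approaches 1 as N grows.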

3.1.7 activity distribution indicator

\[
I_3 =
\begin{cases}
0 & \text{if } M \in \{1, N\} \\
\dfrac{\alpha_W}{\alpha_{max}} = \dfrac{\sum_{a=1}^{M} |w_a - \bar{w}|}{2(M - 1)(\bar{w} - 1)} & \text{if } M \notin \{1, N\}
\end{cases}
\tag{5}
\]

I3 signifies the distribution of the activities over the progressive levels, based on the width w_a of each progressive level. It takes the value 0 if M ∈ {1, N}, since the network in those cases is either completely parallel or completely serial. I3 is 1 for a network with M − 1 progressive levels of width 1 and one level containing all remaining activities (Figure 1).

Figure 1: maximum I3 case
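A sketch of eq. (5), taking the per-level widths directly as input (naming mine):

```python
def I3(widths):
    """widths[a]: number of activities at progressive level a (a = 1..M).
    Activity distribution indicator of eq. (5)."""
    M, N = len(widths), sum(widths)
    if M in (1, N):          # fully parallel or fully serial network
        return 0.0
    avg = N / M
    return sum(abs(wa - avg) for wa in widths) / (2 * (M - 1) * (avg - 1))
```

The width vector (2, 1, 1), i.e. M − 1 levels of width 1 plus one wider level, indeed reaches the maximum value 1.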

3.1.8 short arcs indicator

For the definition of the short arcs indicator, arc length needs to be introduced. The length l of an arc (i, j) is defined in Coelho et al. (2008) as PL_j − PL_i, and signifies how many PLs the arc spans. n′_l denotes the number of arcs of length l, and D is the maximum number of short arcs in a network, i.e. when each activity of each PL is connected to all activities of the next PL: D = Σ_{a=1}^{M−1} w_a · w_{a+1}. The short arcs indicator I4 was defined in Coelho et al. (1999), but Coelho et al. (2008) uses the same conventions as this paper, so the latter formula is shown:

\[
I_4 =
\begin{cases}
0 & \text{if } D = N - w_1 \\
\dfrac{n'_1 - N + w_1}{D - N + w_1} & \text{if } D > N - w_1
\end{cases}
\tag{6}
\]

I4 is 0 if the network has the minimum number of short arcs possible for the number of nodes, progressive levels and width of the first PL, i.e. N − w_1. It seems to have predictive power in the random network generator RanGen2 (Coelho et al., 2008).


3.1.9 long arcs indicator

\[
I_5 =
\begin{cases}
1 & \text{if } |A| = N - w_1 \\
\dfrac{\sum_{l=2}^{M-1} n'_l \frac{M - l - 1}{M - 2} + n'_1 - N + w_1}{|A| - N + w_1} & \text{if } |A| > N - w_1
\end{cases}
\tag{7}
\]

Similar to the short arcs indicator, the long arcs indicator ranges over [0, 1]. However, the maximum number of long arcs occurs at I5 = 0, when the network is triangular, with every 'extra node' (other than the series necessary to reach the number of progressive levels) parallel to the node at the first level, and connected with a maximum-length arc to the last node of the series before the sink node.

Figure 2: minimum I4 case

The other end of the spectrum signifies a network with only arcs of length 1. Coelho et al. (2008) remark that this measure is highly correlated with the short arcs indicator, which might hamper performance if both are used simultaneously.

3.1.10 topological float indicator

(34)

RLij =      M if Si= ∅ maxk∈SiRLk− 1 if Pi6= ∅ (8)

Based on the difference between the progressive and regressive levels of each ac-tivity, the topological float indicator is defined as the fractional total topological floatPN

i=1(RLi− P Li) in relation to its maximum.

I6=      0 if M ∈ {1, N } PN i=1(RLi−P Li) (M −1)(N −M ) if M /∈ {1, N } (9)

I6 = 0 for completely serial networks with parallel activities that do not span

over multiple levels, 1 for networks with M serial activities with all other ac-tivities connected to final node before end node. This should be equal to the bounds of the long arcs indicator, which suggest a high correlation between these measures as well. The long arcs indicator might be a proxy for the topological float indicator. When multiple arcs leave a certain node, only the shortest will have an impact on I6, since the shortest arc determines its PL and RL, whereas

I5 will count both. A second point of critique in Johnson (1967), is that a path

of serial activities makes all activities contribute to this topological float, which might overestimate the flexibility in the network, since this slack can only be ’used’ once for the entire path.

3.2 network complexity

There is a fair number of dedicated complexity measures for the RCPSP, which are listed below. This list is of course not exhaustive, but shows the most common and useful measures. Most commonly used in practice are the measures that have been adapted from the earlier complexity measures to use only the non-redundant arcs between nodes, since the redundant arcs should not increase complexity. The measures include but are not limited to the following:

1. coefficient of network complexity (CNC)
2. network complexity (C)
3. network complexity (C(A′, N))
4. order strength (OS)
5. density
6. flexibility
7. restrictiveness (R) estimator
8. complexity index (Ci)
9. total activity density (T-density)
10. average activity density (AAD)
11. cyclomatic number (S)
12. measures of network complexity (MNC)

3.2.1 coefficient of network complexity

Pascoe (1966) defines the coefficient of network complexity (CNC) as the average number of non-redundant arcs per node for the AoA representation. Since there are several drawbacks to this notation, Davies (1974) adapted the CNC to the AoN representation, resulting in A/N. Kaimann (1974) stated the measure could be improved by squaring the number of arcs, giving A²/N. These measures relate the number of arcs to the size of the problem, and hence measure connectivity.

The CNC was criticized on the following grounds: a higher complexity (according to the measure) is caused by higher connectivity, but very highly connected problems are easier to solve. This is shown in both Drexl et al. (1995) and Alvarez-Valdés Olagüíbel and Tamarit Goerlich (1989). It is not limited to the interval [0, 1], and the measure also fails to distinguish complexity between problems with equal numbers of nodes and arcs. These findings support the idea that the simple concept of connectivity itself is not what makes a problem instance hard. The reliance on only the number of arcs and nodes is criticized in Elmaghraby and Herroelen (1980): “The suggested measures of Pascoe, Davies and Kaimann rely totally on the count of the activities and nodes in the network. Since it is easy to construct networks of equal number of arcs and nodes but varying degrees of difficulty in analysis, we fail to see how these 'measures' can discriminate among them!”.

Drexl et al. (1995) correct the CNC for redundant arcs by substituting A with A′; the network complexity C is defined as A′/N. This follows the convention that the number of redundant arcs should not influence project complexity. The remarks on the CNC still apply to this measure as well, but it should be more accurate in describing complexity.

In Browning and Yassine (2010a), the network complexity C(A′, N) normalizes the network complexity C over the interval [0, 1] by altering the equation to

\[
C(A', N) = \frac{A' - A'_{min}}{A'_{max} - A'_{min}}
\tag{10}
\]

The minimum and maximum numbers of non-redundant arcs are calculated based on a specified number of activities and progressive levels (called tiers in Browning and Yassine (2010a)), and the vector n of activities at each progressive level. A′_max, for example, occurs when all activities in every tier are connected to all activities in the following PL. The use of the number of tiers in the complexity measure C(A′, N) could be a limiting factor, as pointed out by the remarks in Van Eynde (2017). Furthermore, it is likely to span a narrower range than the order strength.


3.2.2 order strength

The order strength (OS) is regarded in the literature as the most reliable and broad complexity measure developed to date. Mastor (1970) defines the OS as the number of precedence relations, including the transitive ones, divided by the theoretical maximum of such precedence relations:

\[
OS = \frac{|A|}{N(N - 1)/2}
\tag{11}
\]

This equation is easiest understood as explained in Kao and Queyranne (1982) for their density measure: “we denote all precedence relations in a transitive precedence matrix P = p_ij, where p_ij = 1 if task i precedes task j and 0 otherwise. Clearly, P is a square upper-triangular matrix. The maximum number of p_ij's which can assume a value of one is N(N − 1)/2. Let |A| be the number of entries in P with p_ij = 1.”

The maximum number of precedence relations used in the OS measure includes redundant arcs. Even if only non-redundant arcs are used, this number still exceeds a realistic number of precedence relations, causing OS to converge to 0 as the number of jobs increases, as mentioned in Drexl et al. (1995) and Van Eynde (2017).

The flexibility ratio, as defined by Dar-El (as cited in Elmaghraby and Herroelen (1980)), is the number of zeroes in the precedence matrix divided by the total number of entries; hence OS = 1 − flexibility ratio.

The restrictiveness R, as in Thesen (1977), is R = 1 − log(S)/log(S_max), with S the number of feasible sequences within the predecessor relation constraints, and S_max the maximum number of sequences possible if all sequence restrictions were removed; R takes a value in [0, 1]. This measure is criticized in Elmaghraby and Herroelen (1980): “Thesen's 'measure of restrictiveness', while eminently plausible, is non-verifiable because of the impossibility of evaluating S, the number of feasible sequences. True, Thesen offers a number of 'estimators' of S; but unfortunately their precision and validity are open to question.”

The above-mentioned measures are all one and the same, as noted in Herroelen and De Reyck (1999). The different formulas all denote the same concept: as opposed to comparing the number of arcs (connectivity) to the network size, OS measures the connectivity relative to its maximum. This realization gives more insight into the different interpretations of the same concept and helps in analyzing the measure. The original OS formula is easiest to use and receives preference.
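Since OS counts transitive relations, computing it requires a reachability pass over the network. A sketch (naming mine):

```python
def order_strength(n, arcs):
    """OS, eq. (11): all (transitive) precedence relations over N(N-1)/2."""
    succ = {i: set() for i in range(n)}
    for a, b in arcs:
        succ[a].add(b)

    def reachable(a):
        # all activities that (transitively) succeed a
        seen, stack = set(), list(succ[a])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        return seen

    relations = sum(len(reachable(a)) for a in range(n))  # nonzero entries of P
    return relations / (n * (n - 1) / 2)
```

A serial chain reaches the maximum OS = 1, a fully parallel network OS = 0, and intermediate structures fall in between.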

3.2.3 complexity index

Bein et al. (1992) define the complexity index (Ci) as the minimum number of node reductions sufficient to reduce a two-terminal acyclic network to a single edge. It is used in the generators DAGEN (Agrawal et al., 1996) as well as RanGen (Demeulemeester et al., 2003). Its main drawback is the lack of distinction between networks: both fully parallel and fully serial networks are fully series-parallel, so as a measure of complexity the reduction complexity is not very useful. It does however introduce an important aspect that is missing in other metrics. The difference between serial and parallel networks should be explained by the complexity measure(s) used; otherwise too much variation can take place without the measure changing, rendering it less effective. The reduction complexity mentioned in Bein et al. (1992) is the original measure for the AoA representation. Agrawal et al. (1996), as well as De Reyck and Herroelen (1996), use this measure in its original AoA representation. The generated networks on which their analysis is based are AoN, but each network is transformed to the AoA representation using the Bein et al. (1992) algorithm. The relationship between the OS and Ci measures is explored in an experimental fashion in Demeulemeester et al. (2003), where it is hypothesized that Ci has explanatory power regarding complexity. However, given the use of the AoA representation and its limited correlation with computation time, it does not seem suitable. Even if these measures seem less effective and/or usable than others for measuring complexity, they do offer insights into the variations in which connectivity and serialism can be measured.

3.2.4 total activity density

Johnson (1967) defines the T-density as

T\text{-}density = \sum_{i=1}^{N} \max(0, ns_i - np_i)

with ns_i the number of nodes leading into node i, and np_i the number of nodes leading out of node i. The basis for this measure is clearly of a topological nature. The more parallel the network becomes, the more nodes exhibit an imbalance between incoming and outgoing connections, and the larger the measure will be. This metric corresponds to the number of unique paths in the network. Patterson (as cited in Browning and Yassine (2010a)) extends this to the average activity density (AAD) = T-density / N.
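The computation follows directly from the degree counts. The sketch below follows the formula as given above; the function names and arc-list convention are my own.

```python
def t_density(n, arcs):
    """T-density: sum over nodes of max(0, ns_i - np_i), where ns_i counts
    arcs leading into node i and np_i arcs leading out of it."""
    ns = [0] * n
    np_ = [0] * n
    for a, b in arcs:
        np_[a] += 1
        ns[b] += 1
    return sum(max(0, ns[i] - np_[i]) for i in range(n))

def average_activity_density(n, arcs):
    """Patterson's AAD: T-density divided by the number of activities."""
    return t_density(n, arcs) / n
```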

3.2.5 cyclomatic number

The number of paths that are linearly independent is given by the cyclomatic number S, originally developed by McCabe (1976) for use in software complexity analysis. It can also be found in similar forms in Bonchev and Buck (2007) and Elmaghraby and Herroelen (1980). It is defined as follows: the cyclomatic number S of a graph with N vertices, A edges, and p connected components is

S = A - N + 2p (12a)

Elmaghraby and Herroelen (1980) change the formula in their first measure of network complexity to

S = A - N + 1 (12b)

for their application in RCPSPs. The research that has been performed on this measure is limited, while its basis seems sound.

3.3 resource availability

• Resource factor (Rf)
• RU
• Resource strength (RS)
• RC

Important for the following measures is the Earliest Start Schedule (ESS), which is constructed as a resource-unconstrained solution where every activity is scheduled at its earliest starting time. The time it takes to complete this solution is called the Critical Path duration (CP). The set of activities that have to be executed at their earliest possible start make up this ’critical path’. These measures help gain a clearer understanding of the basis for the resource availability characteristic. They show that defining the use of resources in a project needs to touch on more than one aspect.

3.3.1 resource factor

The resource factor (Rf) is defined in Pascoe (1966), and is utilized in numerous studies (Cooper, 1976; Alvarez-Valdés Olaguíbel and Tamarit Goerlich, 1989; Drexl et al., 1995). The Rf measures the density of the array r_{ik}, which signifies the amount of resource type k required by activity a_i. Rf is calculated as

Rf = \frac{1}{N \cdot K} \sum_{i=1}^{N} \sum_{k=1}^{K} \begin{cases} 1 & \text{if } r_{ik} > 0 \\ 0 & \text{otherwise} \end{cases} (13a)

This implies that if RF = 1, each job requests all resources, while RF = 0 indicates that no job requests any resource. This measure only counts whether or not an activity requires a resource, and does not take into account the amount of that resource needed. A very similar measure is proposed in Demeulemeester et al. (2003):

RU_i = \sum_{k=1}^{K} \begin{cases} 1 & \text{if } r_{ik} > 0 \\ 0 & \text{otherwise} \end{cases} (13b)

For a project, the RU is simply a vector of the number of different resources each activity uses.
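Both (13a) and (13b) can be read off an N × K demand matrix. The following is a minimal sketch with my own function names, assuming `r[i][k]` holds the demand of activity i for resource k.

```python
def resource_factor(r):
    """RF (13a): fraction of (activity, resource) pairs with positive demand."""
    n, k = len(r), len(r[0])
    return sum(1 for row in r for d in row if d > 0) / (n * k)

def resource_use(r):
    """RU (13b): per activity, the number of resource types it uses."""
    return [sum(1 for d in row if d > 0) for row in r]
```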

3.3.2 resource strength

Resource Strength (RS_k) is defined in Cooper (1976) as the ratio of the average number of different kinds of resources used per job to the number available. It is calculated as

RS_k = \frac{\sum_{i \in N} r_{ik}}{N \cdot R_k}

with R_k the amount of available resources of type k, and r_{ik} as defined above. Drexl et al. (1995) adapted this measure to combat the following remarks: it is not standardized in the interval [0, 1], determining feasibility from a given resource strength is not straightforward, and it does not take into account the potentially added difficulty imposed by the network topology. The updated formula is

RS_k = \frac{a_k - r_k^{min}}{r_k^{max} - r_k^{min}} (14)

where a_k is the total availability of renewable resource type k, r_k^{min} is the maximum amount of resources of type k required by a single activity (i.e. the minimum availability that keeps the problem feasible), and r_k^{max} is the maximum demand in the Earliest Start Schedule (ESS).

This means that for RS_k = 0 the resource availability is equal to the minimum satisfying feasibility, and RS_k = 1 is the maximum case. It is noted in Demeulemeester et al. (2003) that “Elmaghraby and Herroelen (1980) were the first to conjecture that the relationship between the complexity of a RCPSP and the resource availability varies according to a bell-shaped curve. De Reyck and Herroelen (1996) and Herroelen and De Reyck (1999) confirmed this conjecture and rejected the negative correlation between problem difficulty and the RS found by Drexl et al. (1995).”

An alternative measure to the RS has been introduced by Patterson (as cited in Browning and Yassine (2010a)):

RC_k = \frac{\bar{r}_k}{a_k}, \quad \text{where } \bar{r}_k = \frac{\sum_{i=1}^{N} r_{ik}}{\sum_{i=1}^{N} Y_{ik}} \quad \text{with } Y_{ik} = \begin{cases} 1 & \text{if } r_{ik} > 0 \\ 0 & \text{otherwise} \end{cases}

These findings point to the resource availability as an important parameter, but there is much debate over the performance of these measures.
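A small sketch of (14) and of Patterson's alternative, under my own naming. The ESS peak demand is assumed to be supplied by the caller, since it requires a schedule computation that is outside the scope of this snippet.

```python
def resource_strength(a_k, demands, ess_peak):
    """RS (14) for one resource: (a_k - r_min) / (r_max - r_min), where
    r_min is the largest single-activity demand (the minimum feasible
    availability) and r_max the peak demand in the ESS."""
    r_min = max(demands)
    return (a_k - r_min) / (ess_peak - r_min)

def resource_constrainedness(a_k, demands):
    """Patterson-style RC: average demand among activities that actually
    use the resource, divided by its availability a_k."""
    users = [d for d in demands if d > 0]
    return (sum(users) / len(users)) / a_k
```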

3.4 resource distribution

• Average Resource Loading Factor

The resource distribution is an aspect of resource complexity that has only recently surfaced. It tries to determine where (time-related) the majority of the resources are situated. This analysis is performed on the ESS.

3.4.1 average resource loading factor

The Average Resource Loading Factor (ARLF) (Davis and Kurtulus, 1982) was developed with the RCMPSP in mind, but also finds its application in a single-project setting. Its principles are further elaborated upon in section 4.3. The formulas for the single-project case are as follows:

ARLF = \frac{1}{CP} \sum_{t=1}^{CP} \sum_{k=1}^{K_i} \sum_{i=1}^{N} Z_{it} X_{it} \left( \frac{r_{ik}}{K_i} \right) (15a)

with

Z_{it} = \begin{cases} 1 & \text{if } t \geq \frac{CP}{2} \\ -1 & \text{if } t < \frac{CP}{2} \end{cases} (15b)

and

X_{it} = \begin{cases} 1 & \text{if activity } a_i \text{ is active at time } t \\ 0 & \text{otherwise} \end{cases} (15c)

3.5 resource contention

• utilization factor (UF)

Resource contention finds its use as a supplement to the resource distribution. Because the existing resource distribution measures have trouble distinguishing projects at certain values, the resource contention offers additional information. For the single-project case, it can be regarded as a measure for resource availability as well.

3.5.1 utilization factor

The Utilization factor (UF) was originally developed by Johnson (1967) as the ratio \frac{m}{a \cdot CP}, where a is an overall resource limit, CP a critical path solution, and a \cdot CP a minimum “capacity” that must be provided. If the work content m is less than this capacity then, intuitively, resources are not very constraining. More concretely, Davis and Kurtulus (1982) define it as the ratio of the total amount required to the amount available in each time period, based on the ESS. If UF ≤ 1 for each resource in each time period, then the resources are not constraining, and the project will not encounter any delays.
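The per-period ratio can be tabulated directly from the ESS. A minimal sketch, with my own conventions for start times and the activity matrix:

```python
def utilization_factors(starts, durations, r, avail, cp):
    """Per-period, per-resource UF on the ESS: required / available.
    Returns a cp x K table; values above 1 flag constraining periods."""
    n, k = len(r), len(r[0])
    table = []
    for t in range(1, cp + 1):
        active = [i for i in range(n)
                  if starts[i] < t <= starts[i] + durations[i]]
        table.append([sum(r[i][q] for i in active) / avail[q]
                      for q in range(k)])
    return table
```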

3.6 objective function

There are numerous variations in the objective function that can change the behaviour of a solution method, and make instances more or less complex for the solution method to handle. This warrants the recognition of the objective function as an important feature of a project. These effects might influence certain topologies or resource structures more than others, but these properties should be specifically analyzed when selecting objective functions, which is not the aim of this paper. Therefore, the objective function will not be discussed. Examples can be found in Browning and Yassine (2010b).

4 RCMPSP

The RCMPSP can be analyzed in two ways. The problem can be interpreted as a single project, combining the set of projects into a single network, and analyzed as such. A second option is to combine values at the project level into a metric for the multi-project instance. Note that the first approach facilitates the use of RCPSP measures on the multi-project instance. Since these measures are already discussed in section 3, they will not be mentioned here.

4.1 network complexity

The existing measures for network complexity in a RCMPSP environment are generally a vector of RCPSP measures. Representations can be found in Van Eynde (2017) for OS, and Browning and Yassine (2010b) for Ci.

Despite using them, Van Eynde (2017) criticizes the vector form, due to its limited potential for generalization and difficulty of interpretation. Other than these vectors, Van Eynde (2017) mentions the average OS: “The average OS ( \bar{OS} ) can predict the network complexity of a multi-project to a certain extent. However, it is not a sufficient measure, as for problems with OS = 0,5 (i.e. HHMLL, HMMML, MMMMM), there are still big differences in performance it cannot explain.” This is similar to the pitfalls in the ARLF, where averaging leads to ambiguity at the midpoint.

4.2 network topology

There are no existing measures that are equivalent to the measures used to describe the network topology characteristic of the subprojects the problem consists of. Only some complexity measures that have a topological nature are currently used in a multi-project setting. If the single-network approach is used, of course all topological indicators can be used, but the effectiveness of their analyses then lies in the capabilities of this approach. It is important to note the difference between a large project and a portfolio of projects. The number of connections, for example, is much lower in the multi-project, as there can be no connection between activities of different projects.

4.3 resource distribution

• adjusted normalized average resource loading factor (NARLF_adj)
• variance of NARLF and ARLF (σ²_ARLF)

4.3.1 average resource loading factor

The original design of the Average Resource Loading Factor (ARLF) measure is defined by Davis and Kurtulus (1982): “it identifies whether the peak of total resource requirements is located in the first or second half of the project’s original, critical path duration.” In other words, it determines whether the bulk of the resource load is situated in the front or back half of the project or portfolio, and whether the project/portfolio is respectively front- or back-loaded. In the original application for multi-project settings, the different values for each individual project were simply averaged over the number of projects, resulting in a single value for the project portfolio.

ARLF = \frac{\sum_{j=1}^{J} ARLF_j}{J} (16a)

There are, however, several issues embedded in this original design. First, as pointed out by Browning and Yassine (2010b), it falls victim to the ‘flaw of averages’, where instances can have distinctively different characteristics and yet produce the same ARLF value. Second, it fails to take into account the different lengths of projects in a portfolio. These observations are supposedly solved by normalizing the ARLF over the portfolio’s duration. Here, the CP_j duration used in the ARLF calculation of each project is changed to the CP duration of the problem, e.g. CP_max, or simply CP (16b). According to Van Eynde (2017) this principle also needs to be applied to the calculation of the Z-values, since these values are also calculated based on each project’s individual CP duration, which biases the measure to become asymmetrical. This progression makes for an adapted NARLF, which produces acceptable values for normal projects and portfolios.

NARLF_j = \frac{1}{CP} \sum_{t=1}^{CP} \sum_{k=1}^{K_{ij}} \sum_{i=1}^{N_j} Z_{ijt} X_{ijt} \left( \frac{r_{ijk}}{K_{ij}} \right) (16b)

with X_{ijt} defined similarly to (15c):

X_{ijt} = \begin{cases} 1 & \text{if activity } a_{ij} \text{ is active at time } t \\ 0 & \text{otherwise} \end{cases} (16c)

and

Z_{ijt} = \begin{cases} 1 & \text{if } t > \frac{CP}{2} \\ -1 & \text{if } t < \frac{CP}{2} \end{cases} (16d)

Browning and Yassine (2010a) also introduce the variance between the NARLF and ARLF, to distinguish between portfolios with different spreads of ARLF values:

\sigma^2_{ARLF} = \frac{1}{J} \sum_{j=1}^{J} (ARLF_j - \overline{ARLF})^2 (17)

Remaining are issues that are more inherent to the measure, and that need to be addressed. Firstly, there is no determined range in which the measure is mathematically contained. This means that, for extremely outward-loaded instances, one can theoretically achieve values of any magnitude, which implies losing a sense of understanding and makes the measure hard to judge or interpret. Secondly, and probably the most problematic of the issues, is the innate asymmetry embedded in the measure. As pointed out by Van Eynde (2017), because projects in the portfolio will on average have a shorter critical path than the critical path of the problem, portfolios will tend to be front-loaded.

“In general, the NARLF will also be more negative for projects with a lower complexity. As more activities can be executed in parallel, their earliest start times will be earlier and the resource unconstrained critical path schedule will be more front loaded than for projects with a high complexity.” (Van Eynde, 2017)

4.4 resource contention

• utilization factor (UF)
• average utilization factor (AUF)
• modified average utilization factor (MAUF)

For RCMPSPs, this category is synonymous with resource availability. The original utilization factor of Davis (as cited in Browning and Yassine (2010b)) is mentioned in Davis and Kurtulus (1982), where several issues are brought to light. It is defined as “the ratio of the total amount required to the amount available in each time period based on a time-only analysis over the original CP durations of all projects” (Davis and Kurtulus, 1982). To combat computational problems, the projects are sorted in ascending order of completion, and the UF is averaged over the intervals defined by the end of each project. The MP-problem’s AUF is then found by simply selecting the maximum AUF over the different resources.

Browning and Yassine (2010b) note two problems with the AUF. First, problems with similar CP_j durations for individual projects lead to disproportionate intervals. The modified AUF (MAUF) instead averages the utilization over equal units in time, each interval a fraction of the complete problem’s CP duration.

MAUF = \max(AUF_1, AUF_2, \ldots, AUF_K) (18a)

AUF_k = \frac{1}{S} \sum_{s=1}^{S} \frac{W_{sk}}{R_{kS}} (18b)

with

W_{sk} = \sum_{t=a}^{b} \sum_{j=1}^{J} \sum_{i=1}^{N_j} r_{ijk} X_{ijt} (18c)

where a = CP_{s-1} + 1, b = CP_s, X_{ijt} is defined as in (16c), and R_{kS} is the amount of resources of type k available during the interval, e.g. (b - a) * R_k for any s ∈ S.

A second point of critique is that taking the maximum significantly reduces the possibility to discriminate between projects. An example 3-resource problem could have either AUF_1 = 1.6, AUF_2 = 1.58, AUF_3 = 1.59 or AUF_1 = 1.6, AUF_2 = 0.6, AUF_3 = 0.6, and have an AUF of 1.6 in both cases. To discriminate between these cases, Browning and Yassine (2010a) introduce the measure of the variance in the MAUF_k’s:

\sigma^2_{MAUF} = \frac{1}{K} \sum_{k=1}^{K} (MAUF - MAUF_k)^2 (19)

Note that this is a variance from the maximum, which means the measure of the variance in the MAUF_k’s is actually a proxy for the average utilization, since larger deviations from the maximum correspond to a lower average utilization over the remaining resources.

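The example above can be checked with a few lines of code. This is a minimal sketch of (18a) and (19), with my own function name, taking the per-resource AUF_k values as input:

```python
def mauf_and_variance(auf):
    """MAUF = max over the per-resource AUF_k values; the variance (19)
    is taken from that maximum rather than from the mean."""
    mauf = max(auf)
    var = sum((mauf - a) ** 2 for a in auf) / len(auf)
    return mauf, var
```

Both example portfolios from the text share MAUF = 1.6, but the second has a far larger variance, which is exactly the discrimination the measure is meant to provide.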

4.5 objective function

Similar to the comments in section 3.6, the objective function will not be discussed.

Part III

5 Introduction

In analyzing the previous research, several important aspects of describing complexity came to light.

It is clear that there is a need for a structured overview of both the structure of complexity and the actual complexity of a problem. Several different measures were developed separately, but can be reduced to exactly the same concept or even formula. In trying to cover all aspects of complexity with one or several measures, it is important that the sources of complexity are first identified before attempting to describe them. Recent research, such as Browning and Yassine (2010b), was in part successful in this aspect, but there are still numerous shortcomings that might not be resolved by simply choosing one of the existing measures. The structure of complexity, or its sources, can be split into two main categories: on the one hand the topological structure, on the other hand the resource constraints.

The topological structure is characterized by the number of nodes and their respective precedence relations for each project individually, as well as for the multi-project problem as a whole. These traits are described by the measures termed complexity measures and topological indicators in the literature. It is clear this source is harder to define for multi-project problems, seeing as so few specific measures have been developed in this field.

The resource availability, and the resource use each of the individual nodes exhibits, can create additional complexity. The measures developed still have several drawbacks and ample room for improvement.

One could hypothesize a measure that encapsulates both aspects in a single concept, where an increase in resource complexity could trade off against a decrease in topological complexity. This approach would most likely be harder to develop, but could yield significant results. This would be most beneficial in avoiding the correlation between the two sources of complexity. The difficulties faced in developing such a measure led to the measures that are presented in this paper. Either way, the identification and documentation of the underlying sources of complexity are important.

For both of the mentioned aspects, there is a trade-off between abstracting the information to generalize a certain network into a complexity measure, and the power to discriminate between networks. Therefore, it is essential that the variations causing complexity in project networks are identified. These variations need to be described by accurate measures, existing or new. Next, the connection between the different measures needs to be acknowledged, and overlap in discriminatory power should be avoided in constructing a final complexity measure. Several measures may have a similar topological basis and be hard to use in conjunction with each other, due to their implied correlation and shared predictive power. All this needs to happen internally for both the topological and the resource complexity, as well as for the problem as a whole. One of the main effects of the overlap in discriminatory power is that describing problem complexity is harder when the used measures do not take into account which values are actually possible for the measure, given simple characteristics such as problem size.

Equally as important as developing measures to describe complexity is understanding the complexity the measures are trying to predict. If computing the optimal solution for a problem were only dependent on the size of the network, then no measures would be necessary, since N would be the ultimate metric. For now, topological complexity is defined by the number of variations a solution method can choose from to select its optimal solution. Resource complexity is harder to isolate, even for simple problems, but the total complexity can be approximated by the difference in solution time between advanced solution methods and simple methods. If the difference is fairly small, the problem is relatively easy, and vice versa. These decisions are made to ensure testing is viable and reliable, and they impact the development, so they are made in conjunction with testing; they are further elaborated on in part IV.

Because the subject of evaluation is multi-project networks, there are two ways of developing the desired metrics: the single-project approach, where the complete set of projects is parsed into a single problem, and the multi-project approach, which calculates metrics and an estimated number of variations for each individual project, and then combines them with each other so as to obtain a value for the full problem. Both methods are elaborated on, in order to be compared.

Figure 3: single-project approach

Figure 4: multi-project approach

Through research on the optimality of complexity measures, a multi-project measure can be developed or adapted. There are two options that seem to follow intuition and convention. The first is a single measure in which the complexity of the problem can be captured. The second is a vector of (a) measure(s) calculated for the projects the problem is comprised of. As noted in Van Eynde (2017), it is likely a balance has to be struck between aggregation of the problem and an acceptable loss of information regarding each individual project. One of the alternatives (for both the single measure and each of the measures the vector is comprised of) is to develop a composite measure with a variance from the mean or maximum, to differentiate the complexity of medium-medium vs low-high project portfolios.

The following sections will elaborate on avenues for describing specific problem characteristics, and how these interact with each other. This is done for both the single and the multi-project approach.

6 Multi-project approach

Since the analysis of single-project instances is already more elaborate, these are the first measures analyzed. The same principles will be applied to the single-project approach, with the necessary corrections.

6.1 topological complexity

Looking at a basic network structure, a problem should be easiest when all activities are in series, and hardest when the network is parallel. The parallel configuration offers the most options to schedule, since the time at which an activity can be scheduled is only dependent on its single direct predecessor. For this reason, a parallelism measure that is consistent and reliable is necessary. However, when nodes are added to a project in series, complexity should also increase, albeit less. The impact of the nodes in parallel should be weighted correctly compared to those in series. Intuition says that, on average, larger networks should be harder to solve; there is simply more data to be processed as the network becomes larger. This notion leads to using the number of nodes as the starting point for the analysis. Most important then seems how long the network is compared to the number of nodes, which gives a first view of how dense the network is on average. This also introduces the use of tiers. With a gauge for the parallelism that is present in the network, we can then determine where this parallelism is located. A question to be investigated is whether nodes added in parallel at the beginning of the project increase the complexity more than when they are added at the final node of the serial network. If it appears that the location of the widest part of the network does have an impact on complexity, the absolute and relative width could give us an idea of where the peak width is located and how big it is. This can be supported by connectivity measures for a more detailed approach. The biggest challenge for the development of measures is the abstraction that is made of certain information to compress it into a single measure. There is a fine line between an accurate measure and a computationally expensive one.

6.1.1 serialism

The takeaway from the literature is that there are currently two options to describe a project’s serialism: I2 and W. Other metrics might be influenced by the serialism of a network, but these do not measure this aspect directly, hence the two choices mentioned. If the size of the network is a separate measure used in the analysis, as we are supposing here, I2 is a better fit, since this measure can reach the full spectrum for each given N. Since I2 requires the progressive level for each activity, this can be implemented as follows:

Algorithm 1: progressive level

add all activities to list remaining
while remaining is not empty do
    for each activity in remaining do
        ready = true
        for each predecessor of activity do
            if predecessor.progressivelevel = 0 then
                ready = false
            end if
        end for
        if ready then
            activity.progressivelevel = max over predecessors of progressivelevel, + 1
            remaining.remove(activity)
        end if
    end for
end while

I2 is not the holy grail, however; it still abstracts away the placing of the nodes outside the scope of the main path through the network. Other nodes that are not situated on the longest path to the highest tier (e.g. the end node) can be placed anywhere, and can be connected in any way (other than ways which would lengthen the network). It is clear that this introduces variability within the measure. There are two pieces of information missing to fill in these gaps. The first is where parallel nodes are located, which is discussed in 6.1.2; the second is how much variability is present in these nodes. This last characteristic is discussed in 6.1.3.

6.1.2 width

As can be seen in figures 5 and 6, the I2 metric does not take into account the width of the network, as both examples have the same value of 35 for the measure. The additional width in figure 6 can be a cause for a noticeable increase in possible variations for the network solution.

Figure 5: width example 1

The width (W) of a network leaves little room for improvement, as it is a fairly basic measure. It can however also be calculated as a relative measure. When dividing the metric by the maximum width, W_{rel} = \frac{W}{N-2}, it is normalized over [0, 1], which accounts for the influence of the network size. This does imply that the metric will only asymptotically go to 0 as network size increases. A second option is to calculate W_{rel} as \frac{W-1}{N-3}, so that the network can reach the bounds {0, 1} for any given network size. Both these metrics will be correlated to the I2 measure, since parallel networks are bound to introduce width. The width measures only take into account the maximum width, so different networks with the same I2 and width can still be constructed. Since the second measure has more desirable bounds, it seems more suitable.

The width metrics, absolute or relative, only tell us something about the size of the width, not its location. From a topological standpoint, there is little to no evidence to support that this should have an influence on the number of variations, but it is worth identifying. The proposed metric might not be useful in analyzing topological complexity, but can be a tool for gauging a network’s layout without the need for implied complexity.

The relative width location (WL_{rel}) is defined as the tier at which the maximum width occurs, compared to the total number of PLs in the network (M). If there are multiple PLs that share the maximum width, these can be averaged easily, since the width is equal for these PLs.

WL_{rel} = \frac{a}{M} (20a)

a = \frac{1}{b} \sum_{i=1}^{PL} Q_i (20b)

Q_i = \begin{cases} i & \text{if } W_i = W \\ 0 & \text{otherwise} \end{cases} (20c)

b = \sum_{i=1}^{PL} \begin{cases} 1 & \text{if } W_i = W \\ 0 & \text{otherwise} \end{cases} (20d)

These metrics should give us an idea of where the peak width is located and how big it is.

Algorithm 2: width

define width[PL_max]
W = 0
for each activity do
    width[activity.PL]++
end for
for PL = 1 to PL_max do
    if width[PL] > W then
        W = width[PL]
    end if
end for
create a new empty list widest
for PL = 1 to PL_max do
    if width[PL] = W then
        add PL to widest
    end if
end for
average = 0
for each PL in widest do
    average = average + PL
end for
average = average / widest.size
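Algorithm 2 and formulas (20a)–(20d) can be combined into a single short routine. The sketch below is my own rendering, taking the list of progressive levels (as produced in section 6.1.1) as input:

```python
def width_stats(pl):
    """Given the progressive level of each activity, return the maximum
    width W and the relative width location WL_rel (20a): the average tier
    at which W occurs, divided by the number of tiers M."""
    m = max(pl)                                            # number of tiers M
    counts = [pl.count(level) for level in range(1, m + 1)]
    w = max(counts)
    widest = [tier for tier, c in enumerate(counts, start=1) if c == w]
    return w, (sum(widest) / len(widest)) / m
```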

The activity distribution indicator is based on this same principle, but uses the deviation of the average width compared to its maximum.


6.1.3 topological float

As mentioned in the previous section, the cause of the complexity difference between the two networks in figures 5 and 6 is not definitively known. A second option to explain this phenomenon is either arc length or float. When activities are added in parallel to the project and have a high topological float, they can be scheduled anywhere during the completion of the project, thus increasing complexity. The topological float is inherently connected to the long arcs indicator (and subsequently, the short arcs indicator), since higher float (higher I6) requires more long arcs (lower I5). The short arcs and long arcs metrics are correlated very highly, so only one of the two should be considered. It is questionable whether the long arcs indicator and the topological float measures differ in explaining complexity, since there are numerous instances where the measures are equal for a given network. Only when ≥ 2 arcs with different lengths leave a node should the two differ. However, when no redundant arcs are present (or not counted, A = A'), the arcs that do not count towards topological float actually decrease variability, while decreasing I5. The example in figure 7 shows that the addition of arc (9, 7) lowers I6, but it also lowers I5, contrary to its supposed correlation with complexity.

Figure 7: float - arcs


6.1.4 connectivity

It seems clear that measuring outright connectivity is not very efficient. This is apparent from the remarks made for the CNC and, similarly, the long arcs indicator. A shortcoming that most connectivity measures share is that they generally only use the number of arcs and the number of nodes in the calculation, so there will always be variation within these measures. Using more information (such as the number of tiers in the Ci measure) can cause problems when using it in generation procedures.

The OS is already widely discussed as a measure for network connectivity, but the application of the cyclomatic number (S) in a project scheduling context is not. Although it uses the same variables, the thought process is very different. The cyclomatic number might be a good measure because the number of independent paths is, intuitively, a metric of how many times the network splits up, and a gauge of how many times new options are created for topological complexity to be added. There is a clear connection with parallelism; the serial case has S = 0, the parallel case S = N − 2 (if source and sink are included). The inclusion of these source and sink nodes is necessary, otherwise the metric can become smaller than zero in parallel cases with high connectivity to the source and sink nodes.


Figure 8: inclusion dummy nodes

With a set cyclomatic number, the remaining variation in the measure should be explained by the previously discussed measures. The variation that remains in the measure, and does not affect the other measures, is the addition of leaving arcs with length l ≥ that of the shortest leaving arc, so as not to change the topological float indicator.

The range in which the measure is contained is less clear. The parallel case may be given by N − 2, but this does not mean this is the maximum value, since highly connected, slightly parallel networks show higher values for S. The reasoning of Browning and Yassine (2010a) should hold without the need for specifying PLs: the maximum number of arcs occurs when every node is connected to every node of the next PL, but there must only be 2 PLs (not including the start and end node tiers), as opposed to the preset number in Browning and Yassine (2010a).

Figure 9: maximum A’

When a network is structured as described (Figure 9), then for the same N, moving a node from a progressive level into a third PL removes \frac{N-2}{2} + 1 arcs and adds only arcs from the second PL; since the arcs from the first PL would be redundant, only \frac{N-2}{2} arcs are added. This fairly simple deduction, and research pointing to Drexl et al. (1995), give us the maximum number of non-redundant arcs in a network given a number of nodes, a number that maximizes the cyclomatic number for a given N:

A'_{max} = \begin{cases} N - 2 + \left(\frac{N-2}{2}\right)^2 & \text{if } N \text{ is even} \\ N - 2 + \left(\frac{N-1}{2}\right)\left(\frac{N-3}{2}\right) & \text{if } N \text{ is odd} \end{cases} (21a)

With the maximum number of non-redundant arcs, the measure can be normalized over the interval [0, 1] by dividing by the S-value for this number of arcs, i.e. the maximum number of linearly independent paths. This can be calculated as

S_{max} = A'_{max} - N + 1 (21b)

with the relative cyclomatic complexity

S_{rel} = \frac{S}{S_{max}} (21c)

Note that the formula from Elmaghraby and Herroelen (1980) is used. This is necessary to be consistent with the single-project approach (7.1.3), where the reasoning behind the choice is explained.
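Combining eqs. (21a) through (21c), the normalization can be sketched as follows; the helper names are illustrative and not part of the thesis code.

```python
# Sketch: normalize the cyclomatic number S over [0, 1] by the maximum
# attainable value for the same N, following eqs. (21a)-(21c).

def a_max(n):
    # eq. (21a): maximum number of non-redundant arcs for N nodes
    if n % 2 == 0:
        return n - 2 + ((n - 2) // 2) ** 2
    return n - 2 + ((n - 1) // 2) * ((n - 3) // 2)

def s_rel(n_nodes, n_arcs):
    s = n_arcs - n_nodes + 1               # cyclomatic number of the network
    s_max = a_max(n_nodes) - n_nodes + 1   # eq. (21b)
    return s / s_max                       # eq. (21c)

# A 6-node network with the maximum of 8 non-redundant arcs attains S_rel = 1.0
print(s_rel(6, 8))
```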

There was an attempt to use the principle of the reduction complexity to calculate the number of activities in parallel. Instead of a new measure, a proof for the cyclomatic complexity was obtained.

Lemma 6.1. A network consisting of the 2 dummy nodes with 1 arc between them can be transformed into any network by adding parallel and serial nodes. A parallel node introduces 2 arcs to the network, a serial node only 1.

Proof. The total number of arcs A consists of arcs that connect nodes that can be classified as either parallel or serial. If a parallel node introduces 2 arcs to the network and a serial node only 1, then the number of arcs A for a network with p parallel nodes and s serial nodes can be written as

A = 2p + s + 1

When substituting N = p + s, so that the network contains N + 2 nodes including the two dummies, the cyclomatic number becomes S = (2p + s + 1) − (N + 2) + 1 = p: the cyclomatic number counts exactly the parallel nodes.
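The arithmetic of the lemma can be verified numerically; this is a sketch of the check, not part of the thesis code.

```python
# Numeric check of Lemma 6.1: a network built from the two dummy nodes by
# adding p parallel and s serial nodes has A = 2p + s + 1 arcs and
# p + s + 2 nodes in total, so S = A - (p + s + 2) + 1 = p.

for p in range(6):
    for s in range(6):
        a = 2 * p + s + 1        # arcs after adding p parallel, s serial nodes
        n_total = p + s + 2      # added nodes plus the two dummies
        assert a - n_total + 1 == p

print("S equals the number of parallel nodes for all checked (p, s)")
```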

As a reference, the Order Strength can also be calculated. Since an adapted OS, using A'_max instead of N(N − 1)/2, is suggested, the impact of this adaptation can also be measured.
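A minimal sketch of the classic OS computation is given below: the numerator counts ordered node pairs from the transitive closure, and the denominator N(N − 1)/2 is the one the adapted variant mentioned above would replace by A'_max. All names are illustrative.

```python
# Sketch: Order Strength = ordered node pairs / maximum possible pairs.
# The transitive closure is computed with a Floyd-Warshall-style pass.

def order_strength(n_nodes, arcs):
    reach = [[False] * n_nodes for _ in range(n_nodes)]
    for i, j in arcs:
        reach[i][j] = True
    for k in range(n_nodes):          # closure: i reaches j via intermediate k
        for i in range(n_nodes):
            for j in range(n_nodes):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    ordered_pairs = sum(row.count(True) for row in reach)
    return ordered_pairs / (n_nodes * (n_nodes - 1) / 2)

# Serial chain of 4 nodes: all 6 pairs are ordered, so OS = 1.0
print(order_strength(4, [(0, 1), (1, 2), (2, 3)]))
```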

6.1.5 combination

Once estimates for the complexity of the projects in the problem have been obtained, they can be combined through a simple calculation. This method, relayed to me by Rob Van Eynde (personal communication, April 2020), is explained as follows.

The number of combinations possible for a set of networks comprises 2 elements: the variations within the activities of each network, and the variations between the networks. The variations within the networks can be isolated as the product of the variations of each individual network. The interleaving of the activities of the different networks reduces to the multinomial coefficient, so that the number of permutations for the multi-project, e(M), can be written as

e(M ) =  P j|Ij| I1, I2, . . . , IJ  Y Pj∈M e(Pj) (22)

where $I_j$ is the activity set of project $j$, and $e(P_j)$ the number of permutations in project $j$.
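Eq. (22) can be sketched directly, building the multinomial coefficient as a running product of binomial coefficients; the input sizes and per-project permutation counts are illustrative.

```python
# Sketch of eq. (22): e(M) = multinomial(|I_1|, ..., |I_J|) * prod(e(P_j)).
from math import comb, prod

def e_multi(sizes, e_projects):
    total, multinomial = 0, 1
    for size in sizes:
        total += size
        multinomial *= comb(total, size)  # running product = multinomial coefficient
    return multinomial * prod(e_projects)

# Two serial projects of 2 activities each: e(P_j) = 1, interleavings = 4!/(2!2!)
print(e_multi([2, 2], [1, 1]))  # 6
```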

7 Single-project approach

When analyzing the problem as if it were a single project, several issues arise. There is the choice between including the dummy start and end nodes of each of the underlying projects, or replacing them by new dummy start and end nodes. Since the latter facilitates a better comparison to the multi-project approach, this was implemented. It is also consistent with the philosophy behind the approach. Then there is the choice of what information to abstract. Since another layer is added over which the problem characteristics are averaged, simply applying the single-project measures is probably not the most accurate option.

Algorithm 3 single project conversion

 1: for problem = 1, 2, . . . do
 2:   new project
 3:   calculate total number of activities
 4:   new globalStartNode, globalEndNode
 5:   for project = 1, 2, . . . , J do
 6:     add startNode successors to globalStartNode
 7:     add successor to project activities
 8:     add endNode predecessors to globalEndNode
 9:     if !predecessor.added then
10:       add predecessor to project activities
11:     end if
12:     for activities = 1, 2, . . . , n do
13:       if !activity.added then
14:         add activity to project activities
15:       end if
16:     end for
17:   end for
18: end for
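A minimal sketch of this conversion is given below: each project's dummy start and end nodes are replaced by a single global pair, as the algorithm describes. The dict-based network representation and all names are illustrative assumptions, not the thesis implementation.

```python
# Sketch of Algorithm 3: merge project networks under one global dummy start
# and end node. Each project maps node -> successor list, with "start"/"end"
# as that project's own dummy nodes.

def merge_projects(projects):
    merged = {"globalStart": [], "globalEnd": []}
    for idx, project in enumerate(projects):
        def rename(node):
            return f"p{idx}_{node}"  # keep activities of different projects distinct
        for node, successors in project.items():
            if node == "end":
                continue  # the project end node is replaced by globalEnd
            for succ in successors:
                src = "globalStart" if node == "start" else rename(node)
                dst = "globalEnd" if succ == "end" else rename(succ)
                merged.setdefault(src, []).append(dst)
    return merged

# Two identical single-activity chains: start -> a -> end
two_chains = [{"start": ["a"], "a": ["end"], "end": []}] * 2
merged = merge_projects(two_chains)
print(sorted(merged["globalStart"]))  # ['p0_a', 'p1_a']
```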

7.1 topological complexity

Until now, no dedicated topology measures have been developed for multi-project instances. Some basic summary measures to describe the portfolio topology are the number of projects J and the total number of nodes N. The number of nodes is defined as the sum of the project network sizes, corrected for the change in start and end nodes: $N = 2 + \sum_{j=1}^{J}(N_j - 2)$.
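The node-count correction can be sketched in one line: each project loses its own two dummies and the portfolio gains one global pair.

```python
# Sketch: total node count N = 2 + sum(N_j - 2) over the project sizes N_j.

def total_nodes(project_sizes):
    return 2 + sum(n_j - 2 for n_j in project_sizes)

print(total_nodes([5, 5, 6]))  # 2 + 3 + 3 + 4 = 12
```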

These basic measures should be the first line of analysis. The most important question then seems to be how long the network is compared to the number of nodes; this gives a first view of how dense the network is on average.

