Preventing Overload by Denying Service

Preventing Overload by Denying Service

D.R. Ostendorf

Master thesis Operations Research

Supervisors:
Prof. dr. M.H. van der Vlerk (RUG)
Dr. N.D. van Foreest (RUG)

Preventing Overload by Denying Service

Dennis R. Ostendorf
May 13, 2008

Abstract


Foreword

This thesis is written as a final project for my master in Econometrics, with specialization Operations Research, at the University of Groningen (RUG). It is the result of a seven-month internship at TNO ICT. TNO (Netherlands Organization for Applied Scientific Research) is an organization which aims to develop and apply scientific knowledge to (commercial) problems. TNO is divided into several branches operating in different areas; TNO ICT focuses on the area of Information and Communication Technology. Within TNO ICT a lot of interest is focused on so-called non-functional aspects of web technology, e.g. quality of service, security, availability, etc. This has (for example) resulted in several internal knowledge projects and internships, including the one on which this thesis is based.

Service Oriented Architectures are environments in which applications are composed by linking elementary sub-applications (‘building blocks’) together. If these building blocks have to do more work than they are capable of, customers will experience unacceptable waiting times (and will be unsatisfied). This is called an overload situation. Admission control is a tool which can be used to prevent systems from overloading. Admission control functions like a ‘door’ placed in front of the system. As soon as the system tends to get overloaded, the door is closed to limit the amount of work faced by the system, resulting in acceptable waiting times and prevention of a crash. Hence the title of the thesis: ‘Preventing Overload by Denying Service’. To the best of my knowledge, this research is the first to present an admission control rule which does not depend on a correct configuration (it is therefore called parameter-free) and to implement it in a service oriented architecture. Additionally, an upper bound on the results is presented, which can be used to evaluate the results. As a validation, simulation results of a simple case are compared to ‘real’ experiment results.

The picture on the title page of this thesis is a work of art made solely from LEGO blocks by Nathan Sawaya (see http://brickartist.com). The work of art (in my opinion) represents a service oriented architecture, since it is made entirely from elementary building blocks. Additional blocks flow from the man’s chest, representing ‘overload’.

I would like to thank all who have contributed to this thesis in any way. Especially Pieter Meulenhoff and Bart Gijssen for their intensive supervision from TNO, and Maarten van der Vlerk and Nicky van Foreest for their supervision from the University of Groningen. Additionally I would like to thank Sindo Núñez Queija, with whom I have had some interesting discussions about (processor sharing) queueing models and who inspired me to develop the ‘S rule’.


Summary

For a Dutch translation of this summary, see Appendix A1.

Online applications are becoming more and more popular: think of online email clients, booking flights online, ordering books, and even complete online word processors. Additionally, traditional software often needs a connection with a network like the internet. In many cases Web Services are used in the background of such applications. Web Services is a popular standard for the transaction of information between computer systems. Due to the far-reaching standardization of Web Services, completely different systems are able to exchange information with each other. A Service Oriented Architecture is a network in which composite applications are formed by combining functionality of different Web Services. Several composite applications can use each other’s Web Services. Therefore existing functionality does not have to be developed again, but can be reused.

Each composite application can be used by multiple clients at the same time. If many clients use the same applications, the underlying Web Services may become overloaded. This means that more clients arrive per unit of time than can be served, leading to ever-increasing server queues and therefore increasing sojourn times. If clients have to wait for an unacceptable amount of time, they will be dissatisfied. Moreover, the overload can result in a system crash. Note that not only the clients of one composite application experience the increasing sojourn times, but the clients of all composite applications which use the overloaded Web Service.

In this thesis, known results from queueing theory are used to develop a parameter-free rule to prevent Web Services from overloading. Parameter-free means that no configuration choices need to be made when the rule is implemented. Overloading is prevented by denying some clients service in order to guarantee other clients acceptable sojourn times. In this way at least part of the clients can be successfully served. Additionally, extreme situations are prevented, reducing the risk of a system crash.

Using a simulation model, the effect of the rule is investigated by means of a number of cases. The conclusion is that the rule vastly increases the number of clients which are served in acceptable time. In a specific case the rule even leads to a near optimal situation, in which the Web Services in the network are almost optimally utilized.


Contents

Foreword
Summary
Contents

1 Problem description
  1.1 Background
  1.2 Preventing overload
  1.3 Problem formulation

2 Assumptions and notation
  2.1 Service Oriented Architecture
  2.2 SOA components
  2.3 Requests
  2.4 Quality of Service
  2.5 Broker
  2.6 Admission control

3 Model
  3.1 Modeling approach
  3.2 Queueing theory
  3.3 Simulation
    3.3.1 Client component
    3.3.2 Broker component
    3.3.3 WS component
    3.3.4 Output component

4 Admission control rules
  4.1 WAC rule S
  4.2 WAC rule D
  4.3 Analysis
  4.4 Improvements
    4.4.1 Reservation
    4.4.2 Priorities

5 Upper bound on the goodput
  5.1 Idea

6 Simulation setup
  6.1 Confidence intervals
  6.2 Warmup period
  6.3 Experiments

7 Validation
  7.1 Theoretical verification
  7.2 Empirical validation

8 Cases
  8.1 Case 1
    8.1.1 Results Case 1
    8.1.2 Sensitivity analysis Case 1
    8.1.3 Conclusion Case 1
  8.2 Case 2
    8.2.1 Results Case 2
    8.2.2 Sensitivity analysis Case 2
    8.2.3 Conclusion Case 2
  8.3 Case 3
    8.3.1 Results Case 3
    8.3.2 Sensitivity analysis Case 3
    8.3.3 Conclusion Case 3
  8.4 Case 4
    8.4.1 Results Case 4
    8.4.2 Sensitivity analysis Case 4
    8.4.3 Conclusion Case 4

9 Conclusion and recommendations
  9.1 Conclusion
  9.2 Further research

Chapter 1

Problem description

In this chapter the basic idea of the problem is introduced. Any (implicit) assumptions made will be elucidated in chapter 2.

In section 1.1 the background of the problem of overloading is discussed. In section 1.2 some well-known precautions against overloading are briefly introduced, including admission control. The problem of this thesis is formulated in section 1.3.

1.1 Background

Most people are used to applications running on their own computer, for instance a word processor or a game. In recent years, the development of web based applications has gained much popularity as well. Web based applications are applications which present a service to a user, without the need for installing or running any software locally. Examples are the search engine www.google.com and the online bookstore www.amazon.com. Web based applications are getting more and more complex: nowadays the number of (simple) online games is countless and it is even possible to use an online word processor (see docs.google.com). Besides this ‘man-to-machine’ interaction, ‘machine-to-machine’ interaction has increased as well. A popular technique making ‘machine-to-machine’ interaction possible is called Web Services. Web Services can be viewed as online applications which can be called from any computer, independent of the system specifications (hardware specifications, platform, installed software, etc.).

The next step in the development of online applications is the integration of several online applications into a bigger application. Instead of creating standalone applications, developers are trying to integrate (parts of) other applications into their own. An example is the ‘Google API’, which makes it possible for developers to re-use the functionality of the Google search engine in their own application. Even more rigorous than integrating functionality from other applications is creating a set of reusable and independent components. These standardized building blocks can be reused for each new service. Such a software architecture is called a Service Oriented Architecture (SOA), see also section 2.1. The Dutch version of the online encyclopedia Wikipedia gives (amongst others) the following benefits of the use of SOA¹:

- Agility: Uncoupling the components makes it easier to add or remove components.
- Reuse: Components can be reused for different applications.

An application built from several individual components is called a composite application, and the specific building blocks are often Web Services. The software architecture of British Telecom (www.btplc.com/21cn) is an example of a SOA; see also Levy (2005). Their architecture consists of many loose components. When new (composite) services are created, the old components are reused, decreasing development costs and time-to-market and at the same time improving the customer experience. In his article, Levy (2005) compares the use of reusable (software) components with LEGO, where a small number of elemental components can create a vast number of models. The key is the standard interfaces on the base and top of each building block.

Let us present a simplified example to illustrate the concept of SOA:

Example: Consider the fictitious company MediaX, which allows its customers to watch streamed movies from its website. The company has a contract with a movie studio. When a customer wants to view a specific movie, he goes to the website. There he logs in and makes a request for the movie he wants to see. A moment later the movie starts and the fee for watching the movie is automatically subtracted from the user’s credit. Company MediaX has developed its movie service based on the idea of reusable components. A graphical representation of the architecture of MediaX is given in figure 1.1. First the component UserAuthentication is used to identify the user and verify his password. Then component SearchMovie searches the database of the movie studio for the requested movie. Next, component BandwidthCheck dedicates additional bandwidth to the user such that there is enough bandwidth available to stream the movie. Component Pay subtracts the fee for the movie from the user’s credit account. Finally the connection between the broadcast server and the customer is set up by component Connect. Thus, instead of a single application, there are five components interacting with each other. The benefit of this becomes clear when the company decides to implement a new service on its website: the streaming of music albums. To implement this service, the components UserAuthentication, BandwidthCheck, Pay and Connect can be reused. Thus only one of the five needed components has to be developed. Of course the old components can expect more work, so the available resources of these services must be able to handle this.

Figure 1.1: Architecture of MediaX (clients reach the website, which uses the components UserAuthentication, SearchMovie, BandwidthCheck, Pay and Connect)

Due to the popularity of Web Services, a lot of attention is paid to functional aspects of integrating multiple Web Services (i.e. making SOA possible). Less attention, however, has been paid to non-functional aspects of service oriented architectures, like availability, reliability and security. Some difficulties can therefore be expected in that area. Since many different services may need the same Web Service, it may receive more work than it can handle. For composite applications, an entire chain of Web Services is needed. These Web Services may be used for other (composite) applications as well. Herein lies a problem. The time it takes for a Web Service to complete a job depends on the number of jobs it is working on. A Web Service can be viewed as a queueing system²; the more jobs there are in the queue, the larger the sojourn time of a single job will be.

² Actually, a special kind of queue is needed to properly model the behavior of a Web Service; this will be


A system is called overloaded when new jobs arrive at a higher rate than they can be served. In this case the number of jobs in the system will explode and hence sojourn times will increase to unacceptable levels (unless the overload situation only lasts for a short period of time). Another consequence of overloading is that the system may crash and must be (manually) restored. An overloaded Web Service in a SOA affects all composite applications which use that Web Service. Therefore a single Web Service in overload may effectively shut down the entire architecture.
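The blow-up of sojourn times near and beyond saturation can be made concrete with the standard M/M/1 formula E[S] = 1/(µ − λ), which also holds for the processor-sharing queues used later in this thesis. The sketch below is our own illustration (the parameter values are hypothetical, not taken from the thesis):

```python
# Illustration (not from the thesis): mean sojourn time in an M/M/1 queue,
# E[S] = 1 / (mu - lambda). The same expression holds for M/M/1 processor
# sharing. For lambda >= mu the system is overloaded and E[S] is infinite.

def mean_sojourn(lam: float, mu: float) -> float:
    """Expected sojourn time; infinite once the system is overloaded."""
    if lam >= mu:
        return float("inf")   # queue length grows without bound
    return 1.0 / (mu - lam)

mu = 10.0                      # hypothetical service rate: 10 jobs per second
for lam in (5.0, 9.0, 9.9, 12.0):
    print(f"lambda = {lam:4.1f}  rho = {lam / mu:4.2f}  E[S] = {mean_sojourn(lam, mu):.2f} s")
```

Note how already at ρ = 0.99 the mean sojourn time is a hundred times the unloaded service time of 0.1 s, before becoming infinite in overload.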

Notice that any (IT) system will get overloaded if the arrival rate of new clients increases to a certain level. This thesis focuses on what to do in such an overload situation.

1.2 Preventing overload

Several technologies exist to handle an overload situation. Some popular ones will be briefly introduced. A more thorough discussion is given by Pepping (2008).

Additional resources: One way to reduce the problem of overloading is to assign more resources (processing power, memory) to the Web Service. This may not always be a feasible solution because additional resources may be expensive or simply unavailable. When the system is not in overload these additional resources are not in use. Thus when (severe) overload only occasionally occurs, the costs of additional resources may outweigh the benefits. On the other hand, when overload occurs on a regular basis it is probably worth it to invest in additional resources.

Load Balancing: It is possible to use several hardware platforms which have the same function-ality. A load balancer divides the incoming requests over the platforms. When overload occurs, additional platforms may be borrowed from other services which are not currently in overload. Of course the borrowed platforms must have the needed functionality.

Caching: Caching means that a Web Service ‘remembers’ previously computed results. When a new request asks for the same result it does not have to be computed again. Hence, additional memory is used to spare the processor.

None of these measures, however, can prevent overloading in extreme situations. In such a situation one could deny some clients service to prevent the system from overloading. This idea is called admission control. Admission control may seem harsh at first, but instead of offering unacceptable service times (latency) to all users, some users are denied service so that the others can be served with decent quality. Moreover, the risk of a system crash is avoided. Admission control may not seem to be the perfect solution, but being able to serve at least some clients is better than not being able to serve anyone. In fact, it may be the best possible solution given the circumstances. In the literature admission control is sometimes called access control. If admission control is used in a web based environment, it is sometimes referred to as Web Admission Control (WAC); in case of a network environment, it is also referred to as Network Admission Control (NAC). In this thesis the abbreviation WAC will be used.
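As a minimal illustration of the ‘door’ metaphor, the sketch below implements a naive threshold-based admission controller. The class name and threshold are our own invention; this is not one of the WAC rules developed in chapter 4. Note that it needs a configured parameter (max_jobs), which is exactly the kind of configuration choice a parameter-free rule avoids:

```python
# Hypothetical sketch of a threshold-based admission controller (the 'door').
# Not the parameter-free rule of this thesis; it only illustrates the idea:
# deny new work once the system already holds too many jobs.

class AdmissionController:
    def __init__(self, max_jobs: int):
        self.max_jobs = max_jobs   # configuration parameter (what a parameter-free rule avoids)
        self.in_system = 0

    def try_admit(self) -> bool:
        """Admit the job if the system is below its limit, else deny it."""
        if self.in_system >= self.max_jobs:
            return False           # denied: protects sojourn times of admitted jobs
        self.in_system += 1
        return True

    def depart(self) -> None:
        """Called when an admitted job leaves the system."""
        self.in_system -= 1

door = AdmissionController(max_jobs=2)
print([door.try_admit() for _ in range(3)])  # [True, True, False]
```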

This thesis focuses on the use of admission control as a way to prevent service oriented architectures from overloading. The other proposed measures will not be investigated. In practice it will probably be a good idea to use the other measures to reduce the risk of overloading, while admission control is used to prevent the system from collapsing once an overload situation has occurred. When implementing admission control in the components of a SOA, some components may perform useless activity. The easiest way to explain this is by using an example:


composite applications use this component as well or simply because this component has the least resources). A request which is denied service at component ‘Connect’ has already been served at four other components. Since the request is denied at a later stage, this service is called useless. The useless activity on a component results in larger service times (or perhaps even denial) of other requests; this could potentially become a big problem.

1.3 Problem formulation

A non-functional objective within a service oriented architecture may be to successfully serve as many clients as possible. A client is successfully served if it is not denied by admission control and is completed within acceptable time. The word ‘acceptable’ is rather vague. In practice end users are willing to wait several (8 to 10) seconds for a response from a web page (see for instance Bouch et al. (2000)). Such results are not generally known for more complicated services, like the ones considered in this thesis.

The main problem of this research is:

Main problem: Maximize the percentage of clients which are served within their allowed timeframe in an overloaded Service Oriented Architecture.

In order to solve the main problem the following questions are important:

Q1: Can admission control be used to improve the fraction of successfully served clients in an overload situation?

Q2: Which form of admission control gives the best results?

Q3: Does the use of admission control lead to an optimal solution or is further improvement (theoretically) possible?

Chapter 2

Assumptions and notation

In this chapter all assumptions made are formalized and the notation used is introduced. For a brief summary of the notation, the reader is referred to appendix C, in which a list of symbols is given.

2.1 Service Oriented Architecture

A Service Oriented Architecture (SOA) is an architecture in which each component is responsible for a specific task. Composite applications are constructed by combining several tasks and thus linking the components. The components can communicate with each other using standardized communication protocols (like XML and SOAP¹). A formal definition of SOA is given by OASIS² as follows:

Definition: Service Oriented Architecture: A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations.

As the definition suggests, different components need not run on the same hardware, in the same organization or even on the same continent. In principle all available components in the world can be used to form a composite application. Finding the appropriate components for a new composite application (called discovery) is not trivial. In this research, however, we will assume a given network of Web Services which all have a unique function; hence the discovery of needed components is considered not to be an issue.

2.2 SOA components

In this thesis the components are assumed to be Web Services. In principle the use of Web Services is not necessary in a SOA environment; nevertheless it is the most common choice. We will however keep the specifications of the services as general as possible, keeping in mind that instead of Web Services other techniques could be used as well.

Computer processors (and therefore Web Services) serve jobs ‘simultaneously’. This means that each job is processed for a short amount of time and is then preempted. Subsequently the next job is served for a short while, etc. This concept is called threading. Computer processors work in this way for several reasons. One of these reasons is that threading makes ‘multitasking’ possible.

¹ See appendix B for a list of abbreviations and expressions used in information and communication technology.
² According to their website (www.oasis-open.org), OASIS is a not-for-profit consortium that drives the


Therefore, multiple jobs (computer programs) can be run simultaneously on one computer. For home computers this is important: without threading it would, for example, not be possible to listen to music and browse the internet at the same time. For Web Services this aspect of threading may be less important, but there is another benefit of threading. A computer does not only consist of a processor, but of other elements as well. During processing it might for instance be necessary to look in a database on the hard disk of the computer. If jobs were served sequentially, the processor would have to wait for this. When threading is used, the processor can work on the next job while the hard disk searches for the needed data. Thus, theoretically, a computer does not need twice as much time to serve two jobs, because waiting times are spread more efficiently. For simplicity this will be ignored and it will be assumed that a Web Service simultaneously serves multiple jobs, with the total service rate equally divided over the jobs in service. This could be interpreted as a Web Service which only uses the processor and no other resources, for instance a Web Service which only performs a calculation.

Each Web Service is assumed to have its own resources (CPU, RAM, etc.). Hence we explicitly ignore the fact that multiple Web Services may be installed on the same server, sharing the same resources. This choice is made for the sake of simplicity. If multiple Web Services use the same resources, the system may be viewed as a single system (‘Web Service’) at which different types of jobs are served.

The number of available Web Services is denoted by W, and Web Services are indexed by w ∈ 𝒲 := {1, . . . , W}.

Not all jobs will need the same amount of processing time. Therefore the processing times are assumed to be stochastic, independently and identically distributed with cumulative distribution function F_B.

The time it takes for a job to be served when no other jobs are served by the Web Service (and hence the job is not preempted) is called the required service time. The inverse of the (average) required service time is called the required service rate µ_w of Web Service w. This is the rate at which a job would be served if no other jobs were present. When there are n jobs in service, the effective service rate per job becomes µ_w/n. The service time distribution of Web Service w is assumed to be exponential with rate parameter µ_w. In section 3.2 it will be explained that using another service distribution (with the same expectation) will not give different results. New jobs arrive at Web Service w with rate λ_w; the load on the Web Service is defined by ρ_w = λ_w/µ_w. For notation’s sake the index w will be omitted when it is clear from context which Web Service is meant.

2.3 Requests

A composite application consists of multiple components. An instance of a composite application is called a request, which consists of multiple jobs. Thus a job is a task which has to be performed by a Web Service, and a chain of jobs forms a request.

Requests arrive to the system according to some stochastic arrival process {I(t), t ∈ T := [0, T ]}, where T indicates the length of the period under consideration (if this period is unbounded then T ≡ R+). For simplicity it is assumed that the arrival process is a Poisson Process with rate λ.

Request types are defined by the number of jobs they consist of and the Web Services these jobs need to be performed on. Within a network there is a finite number of Web Services and also a finite number of different request types, say K (assuming that there are no requests with an infinite number of jobs). A new request is of type k ∈ 𝒦 := {1, . . . , K} with probability p_k.


The required service rate a job faces is given by µ_w and depends only on the Web Service the job is served by. Additionally, the notation ν_ij is used to indicate the (expected) required service time of job j in request i. In principle this notation is redundant, because all information in ν follows directly from Y and µ. In some cases, however, it is more convenient to use ν instead of µ. The notation is illustrated by the following example:

Example: Consider a case with four available Web Services, A, B, C and D, and five request types r1, . . . , r5. Requests are defined by:

    p = (0.30, 0.25, 0.25, 0.20, 0.10),

    Y:  r1 = (A, B, C, D)
        r2 = (B, D)
        r3 = (D, A)
        r4 = (C, C, C)
        r5 = (A, B),

    µ = (µ_A, µ_B, µ_C, µ_D) = (5, 10, 3, 4),

    ν:  r1 = (1/5, 1/10, 1/3, 1/4)
        r2 = (1/10, 1/4)
        r3 = (1/4, 1/5)
        r4 = (1/3, 1/3, 1/3)
        r5 = (1/5, 1/10).

There are five different request types in this example. Request type r1 has probability 0.30 and a length of four Web Services: it is served first by Web Service A, subsequently by B and C, and finally by Web Service D. The second request type r2 has probability 0.25 and is served first by Web Service B and second by Web Service D; after Web Service D the request is completed. Jobs which are served by Web Service A have a required service rate of five jobs per second, jobs which are served by Web Service B have a required service rate of ten jobs per second, etc. The required service time of, e.g., the first job from request r3 is denoted by ν_31. Since this job is served on Web Service D, its expected required service time equals ν_31 = 1/4 = 0.25 seconds.

The notation (i, j) is used to refer to job number j ∈ 𝒥_i := {1, . . . , J_i} in request i ∈ ℐ(t) := {1, . . . , I(t)}, with J_i the number of jobs in request i. For ease of notation, I(t) will mostly be abbreviated by I; in this case it is assumed that t = T or that the value of t is clear from context. We would like to stress that the sequence in which the jobs of a request need to be performed (precedence constraints) is fixed and given by Y.

The total time it takes for a request to be served will be called its latency. The latency of request i will be denoted by L_i. The time it takes for a job to be served will be called the sojourn time of the job. The sojourn time of job j from request i will be denoted by S_ij. It is assumed that

    L_i = Σ_j S_ij,   i ∈ ℐ,

hence we ignore possible delay due to network traffic, admission control, broker activity³, etc.

2.4

Quality of Service

Quality of Service (QoS) is a non-functional aspect of a Service Oriented Architecture. It quantifies the user-perceived quality of a service (composite application). The QoS can be measured in various ways, depending on the objective. As mentioned in chapter 1, clients are willing to wait only a limited amount of time for their (composite) application to be completed. A Service Level Agreement (SLA) can be defined to quantify whether a request has been successful or not. A request i is considered successful if its latency L_i is smaller than L_i^max.

To indicate whether request i has been successfully served or not, the variable U_i is introduced. U_i attains value 1 if request i is served successfully, and value 0 if request i is late (L_i > L_i^max) or denied service (in which case the latency can be defined as infinity):

    U_i = 1 if Σ_{j=1}^{J_i} S_ij ≤ L_i^max, and U_i = 0 otherwise,   i ∈ ℐ.

For simplicity it is assumed that L_i^max = L^max = 8 seconds unless otherwise mentioned. The choice of eight seconds is rather arbitrary; it is loosely based on Bouch et al. (2000), who found that users are on average willing to wait eight to ten seconds for the response from a website. Because of the arbitrary choice of L_i^max = 8, sensitivity analysis will be performed, in which other constant values are considered as an alternative. Additionally a variable allowed latency can be considered, in which the maximum allowed latency is a (positive) multiple φ of the required service time of the request:

    L_i^max = φ · Σ_{j=1}^{J_i} ν_ij,   i ∈ ℐ.

The QoS at time t will be measured in two ways: by the percentage of successfully served requests z(t), and by the (weighted) goodput y(t). Goodput is defined as the average number of successfully served requests per second. If some requests are more important than others, weighing parameters ω_i can be used (these can for instance be based on request type). Formally⁴:

    z(t) = (100 / I(t)) · Σ_{i=1}^{I(t)} U_i,   I(t) > 0,  t ∈ 𝒯,
    y(t) = (1 / t) · Σ_{i=1}^{I(t)} ω_i U_i,    t ∈ 𝒯₊ := (0, T],

with t measured in seconds. If t = T, the percentage of successfully served requests will be denoted by z and the goodput by y. In this research it is assumed that ω_i = 1 for all i ∈ ℐ.

In principle z and y are different representations of the same objective (assuming ω_i = 1). Both representations help to gain insight into a solution. One difference in appearance is that the percentage of successful requests z will drop to zero if the overall arrival rate is chosen high enough. In the ideal situation the goodput y, however, remains at its maximum level regardless of the arrival rate (as long as the arrival rate is high enough to reach this level, of course).
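The definitions of U_i, z(t) and y(t) translate directly into code. The sketch below is our own illustration, with ω_i = 1 as assumed in this research; the function names and the sample data are hypothetical:

```python
# Sketch (our own) of the success indicator U_i and the two QoS measures.

def success(sojourn_times, L_max):
    """U_i: 1 if the request's total latency meets its deadline, else 0.
    A denied request has no finite latency and simply counts as 0."""
    return 1 if sum(sojourn_times) <= L_max else 0

def qos(U, t, omega=None):
    """Return (z, y) at time t for success indicators U = [U_1, ..., U_I]."""
    omega = omega if omega is not None else [1.0] * len(U)
    z = 100.0 * sum(U) / len(U)                    # percentage successful
    y = sum(w * u for w, u in zip(omega, U)) / t   # goodput: successes per second
    return z, y

# Hypothetical example: six requests observed over t = 10 seconds.
U = [success(s, L_max=8.0) for s in ([3, 4], [1, 2, 2], [5, 6], [7], [9], [2])]
print(qos(U, t=10.0))   # z = about 66.7 %, y = 0.4 requests per second
```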

The main problem is to maximize the value of z (or y), under the constraints that the Web Services have limited capacities, requests arrive at random points in time according to some stochastic process, and the jobs in a request have specific precedence constraints.

2.5 Broker

Composite requests can be routed through a network in several ways. Requests can be routed through the network by the individual Web Services: the Web Services communicate directly with each other and requests are sent from one Web Service to another. Alternatively, a central point in the network can route requests through the network. Such a point is called a (service) broker. The broker sends the request to the next Web Service and after processing it is returned to the broker. Both methods are illustrated in figure 2.1, in which there are three Web Services Q1, Q2 and Q3. The figure on the left illustrates the situation with the broker and the figure on the right illustrates the situation without a central broker. At first sight, routing without the use of a central broker may seem less complicated. In a larger network, however (with a lot of different composite applications), this results in many communication links between all nodes in the network. If a central broker is used, each node only needs to communicate with the broker. In this research a broker will be used to route requests through the network. This broker is assumed to work infinitely fast, to prevent the broker from becoming a (or the) bottleneck in the system. Because of this, the assumption of a broker does not influence the results of the experiments, and the same results are expected in a model in which no broker is present.


Figure 2.1: Two methods to route requests through a network

In section 2.4 the maximum allowed latency for a request was introduced. On Web Service level the latency of an entire request is of little importance: there may be jobs to be served after the current one before the request is completed. Therefore a limit on the sojourn time of the current job is needed. The broker (which is the only component that ‘knows’ the structure of the request) can divide the total allowed latency over all jobs in the request and hence determine the time at which the next job should be completed. This time is called the due time of the job. When a request enters the broker, the due date for the next job j∗ is calculated. For this calculation the total remaining time for this request, Lmax − Σ_{j=1}^{j∗−1} Sij (the maximum latency minus the sojourn times of the completed jobs), is divided over all remaining jobs in proportion to their service requirements. Let Dij∗ be the due date of job j∗ from request i and let t∗ be the time at which the due date for job j∗ is calculated:

    Dij∗ = t∗ + ( Lmax − Σ_{j=1}^{j∗−1} Sij ) · νij∗ / Σ_{j=j∗}^{Ji} νij .

Recall that νij is the expected required service time of job j from request i. The remaining time for job j from request i at time t is given by Rij(t) = Dij − t. If the total remaining time of a request is less than zero, the request can never be completed in time. Therefore it is immediately sent to the output component by the broker (even if admission control is not used5).
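The due-date calculation above can be sketched as follows (function and argument names are ours, not the thesis's):

```python
def due_date(t_star, L_max, sojourn_completed, nu_remaining):
    """Due date D_{ij*} for the next job j* of a request.

    t_star            -- time t* at which the due date is calculated
    L_max             -- maximum allowed latency of the request
    sojourn_completed -- sojourn times S_{i1}, ..., S_{i,j*-1} of completed jobs
    nu_remaining      -- expected service times nu_{ij*}, ..., nu_{iJ_i}
                         (the first entry belongs to job j* itself)
    """
    remaining_time = L_max - sum(sojourn_completed)
    # Divide the remaining time over the remaining jobs in proportion
    # to their expected service requirements.
    share = nu_remaining[0] / sum(nu_remaining)
    return t_star + remaining_time * share

# Example: 8 s latency budget, 3 s already used, two remaining jobs with
# expected service times 1 s and 2 s, so job j* gets one third of 5 s.
print(due_date(t_star=10.0, L_max=8.0, sojourn_completed=[3.0],
               nu_remaining=[1.0, 2.0]))
```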

There are many available standards which add functionality to Web Services. One of these standards is called WS Reliability (see Iwasa et al. (2004)). Using the WS Reliability standard it is possible to give jobs so called ‘expiry times’, which define the maximum time it may take to receive a response. Therefore implementation of the above due dates seems possible within the (jungle of) existing standards for Web Services.

5 This preemption has in principle nothing to do with the admission control rules, presented in section 4, but it


2.6 Admission control

The basic idea of admission control (WAC) is to deny some clients service in order to serve others with decent quality. The concept is illustrated by the following example:

Example: Consider the special case in which there is only one Web Service, jobs arrive according to a Poisson process with rate λ and the Web Service has a (required) service rate µ = 10. When admission control is turned off, the theoretical goodput will equal λ if ρ < 1 (and the maximum response time is large enough) and otherwise drop to zero because the expected sojourn time explodes. If admission control is used, the effective arrival rate can be capped at a certain level in order to keep the load small enough. Now the theoretical goodput does not drop to zero but remains at its maximum value if the ‘optimal’ admission control parameter is used. This is illustrated in figure 2.2 (in this figure Lmax = 8 seconds, which is ‘large enough’).

Figure 2.2: Theoretical goodput for a single queue

Admission control is already widely used in telecommunications. Research has also been performed on the use of admission control for Web Servers, see for instance Gijsen et al. (2004) and Pepping (2008). The use of WAC to prevent stand-alone Web Services (instead of Web Servers) from overloading has been discussed by Xi (2007) with a more technical approach. This research focuses on the conceptual use of WAC in a SOA, based on Web Services.

Admission control can be implemented in numerous ways. In this research a maximum number, say cw6, of clients is allowed to be served simultaneously by Web Service w. When the (c + 1)th client arrives, it is denied service by admission control. An alternative approach would be to allow a maximum number of jobs to enter the system per time interval.

The effect of the admission control rule depends on the value of c. If c is too large, not enough requests are denied service and jobs will not be completed before their due date. If the value of c is chosen too low, then more requests are denied service than necessary.

One way to find the value of c is to try different values (perhaps by using simulation) and find ‘the optimal’ value by trial and error. In a SOA environment there are many unique Web Services and each Web Service needs its own parameter choice. When there are only ten Web Services and for each Web Service there are five possible values for c, then there are already almost ten million possible combinations! Thus it would be a lot of work to determine the optimal configuration.
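The count in this example is a simple power: five candidate values for each of ten independent parameters.

```python
# Five possible values of c for each of ten Web Services gives
# 5**10 independent combinations -- just under ten million.
combinations = 5 ** 10
print(combinations)  # 9765625
```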


This research has to be done each time admission control is implemented in a new or changed environment. Another drawback is that the parameter choice in this case is fixed and does not depend on the current situation the Web Service faces (e.g. temporary overload). Therefore we feel that implementation of this approach is unrealistic in ‘the real world’. A more favorable approach would be to find a rule which dynamically sets the right parameter value and does not depend on user input (no parameters are needed). When such a rule is found, it can be implemented in each network without making any additional (case specific) parameter choices. From a practical point of view, this would greatly improve the possibilities of implementing the admission control rule in a real world case.

In chapter 4 two admission control rules will be presented. Both rules set the value of c automatically and (perhaps even more importantly) dynamically; thus the value of c is adjusted if needed.


Chapter 3

Model

This chapter starts with some general ideas about how to model admission control in an overloaded service oriented architecture. It is argued that the problem at hand is NP-hard. Furthermore it is explained that Web Services can be modeled using queueing theory. In section 3.2 some basic results from queueing theory are presented, as well as the limitations of the theory. The fact that the problem is NP-hard and that the known results from queueing theory are limited has led to the approach of simulation. In section 3.3 a simulation model is developed in which service oriented architectures can be analyzed.

3.1

Modeling approach

The problem faced is as follows. There is a network of Web Services in which composite applications are formed. Composite applications (requests) are a chain of Web Services which are called in a specific order. The order of the Web Services does not have to be the same for all composite applications. Requests arrive at an unknown time and must be served within a specific timeframe. The objective is to serve as many requests as possible (within their timeframe) in an overload situation. This problem can be seen as a dynamic stochastic job shop problem, with precedence constraints (in the form of a chain), release and due dates, and in which the machines (Web Services) are a special type of queues.

In section 2.2 it was explained that Web Services use ‘threading’ to serve jobs. Threading can be modeled using a round-robin (RR) service discipline in which jobs are served for a small period of time, say δ and are then preempted and returned to the back of the queue. When δ → 0 the round-robin service discipline acts as if all jobs in the queue are served simultaneously and receive a service rate of µ/n (with n the number of jobs in service). This is called an (egalitarian) processor sharing (PS) service discipline. In this thesis the PS service discipline will be used to model Web Services instead of the RR service discipline, because the PS service discipline is easier to analyze. The next example illustrates the different service disciplines:

Example: Suppose one wants to have two cups A and B filled with coffee. If the coffee machine acts as an ordinary (first-come-first-serve, FCFS) queueing system then first cup A is filled and second cup B. A round-robin way of filling the cups is to fill one cup for a short amount of time, then swap the cups and fill the other cup for a short amount of time and repeat this until both cups are filled. Alternatively consider a coffee machine which has not one, but two injectors. Now it is possible to place each cup under its own injector. Both cups are filled at half speed (compared to the FCFS discipline in which a cup was placed under both injectors). The coffee machine acts as a processor sharing queue with capacity two.


In Kendall’s notation, Web Services are modeled using an M/M/1/c − PS queue1. The objective of the job shop problem is to maximize the number of jobs which are completed in time. We have a stochastic variant of the job shop problem since the arrival times and other specifications of new requests are not known in advance.

The (dynamic stochastic) job shop problem is known to be NP-hard in the strong sense, even without processor sharing machines. Therefore it is not likely that an optimal algorithm exists which can solve the problem in polynomial time. The aim is to find a heuristic approach which gives good results in real time. A genetic algorithm approach is presented by Lin et al. (1997) which gives good results for the dynamic stochastic job shop problem without processor sharing machines (thus machines can only serve one job at a time). In the paper only relatively small instances (five machines, a hundred jobs) are considered and computation times are not given.

3.2 Queueing theory

The processor sharing queue (PSQ) is well known in the literature. The PSQ was introduced by L. Kleinrock and he presented some basic results in his famous book, see Kleinrock (1976). For more profound results S.F. Yashkov has written three complementary survey papers in which the state of the art is presented, see Yashkov (1987, 1992) and Yashkov and Yashkova (2007). The finite capacity processor sharing queue however is much less studied.

The expected sojourn time in a PSQ can be found as follows. Since we have exponentially distributed service requirements, the expected remaining required service time of each job equals 1/µ. The expected sojourn time of an arriving job equals its required service time multiplied by the total number of jobs in the system (the E[n] jobs already present plus the arriving job itself), thus E[S] = (1/µ)(E[n] + 1), with E[n] the expected number of jobs in the system. For a standard queue there is a famous expression for E[n], namely E[n] = λE[S], with λ the arrival rate of the system. This equality is called Little’s law and is described in any introduction to queueing theory, see for instance Ross (2003). Because of the finite capacity of the queue, Little’s law is slightly different for the queue under consideration: the arrival rate has to be corrected for the fact that not all arriving jobs receive service, but some are denied instead. Little’s law becomes:

    E[n] = λ(1 − pc) E[S],

with pc the blocking probability, i.e. the probability that an arriving job is rejected by admission control. Combining E[S] = (1/µ)(E[n] + 1) and E[n] = λ(1 − pc) E[S], the following expression is easily found for E[S]:

    E[S] = (1/µ) / (1 − ρ(1 − pc)),    (3.1)

with ρ the server load λ/µ (on Web Service level). The blocking probability can be found by considering a PSQ in which one job never leaves the system. It is not trivial why this is necessary and it is beyond the scope of this thesis to explain it; the reader is referred to Foley and Klutke (1989) instead. The states of the queue become 1, 2, . . . , c. Given these states, the balance equations can be solved and the probability that there are c jobs in the queue becomes:

    pc = ρ^c / Σ_{k=1}^{c} ρ^k.    (3.2)
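Equations (3.1) and (3.2) are easy to evaluate numerically; a small sketch with illustrative parameter values (the choices of λ, µ and c below are ours):

```python
def blocking_probability(rho, c):
    """Equation (3.2): p_c = rho^c / sum_{k=1}^{c} rho^k."""
    return rho ** c / sum(rho ** k for k in range(1, c + 1))

def expected_sojourn_time(lam, mu, c):
    """Equation (3.1): E[S] = (1/mu) / (1 - rho * (1 - p_c))."""
    rho = lam / mu
    p_c = blocking_probability(rho, c)
    return (1.0 / mu) / (1.0 - rho * (1.0 - p_c))

# Overloaded Web Service: lam = 20, mu = 10 (so rho = 2), at most c = 5 jobs.
print(blocking_probability(2.0, 5))       # about 0.516: half the jobs blocked
print(expected_sojourn_time(20.0, 10.0, 5))   # about 3.1 seconds
```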

A well known result for PSQ’s is that the sojourn time in a PSQ is insensitive to the service distribution, see for instance Kleinrock (1976). So the derived results do not specifically hold for an exponential service distribution, but for any service distribution with rate µ (thus we have an M/G/1 − PS queue).

1 Since all jobs are simultaneously served there is no actual queue. Nevertheless the system is still referred to as a queue.

While the expectation of the sojourn time of a job in a (finite capacity) PSQ has a nice expression, the distribution of the sojourn time is not easily found. Since the sojourn time is a key performance measure in many applications, its distribution is the subject of much research, see Yashkov and Yashkova (2007) for a recent overview. In addition there is not just one PSQ but an entire network of finite capacity processor sharing queues. This makes the problem of finding the sojourn time distribution even more complicated.

The sojourn time distribution is of importance to this research since requests must be completed within a given time. The expected sojourn time is not sufficient for this, because the objective is to maximize the percentage of successfully served requests. In general the sojourn time distribution is not known to be symmetric, hence the expected sojourn time cannot be used to say anything about the percentage of successfully served requests. Since it is not possible (or at least very complicated) to use queueing theory as a tool for analyzing the problem, a simulation model will be used instead. This model will be described in the next section.

3.3 Simulation

In this section a discrete-event simulation model will be constructed to analyze the problem presented in chapter 1. The model is implemented in the software package eM-Plant (version 7.0.13), see Tecnomatix (2004).

Discrete-event simulation is defined by Law and Kelton (2000) as follows:

Definition: Discrete-event simulation concerns the modeling of a system as it evolves over time by a representation in which the state variables change instantaneously at separate points in time. These points in time are ones at which an event occurs, where an event is defined as an instantaneous occurrence that may change the state of the system.

The simulation model basically consists of four components, see figure 3.1. Component ‘Client’ generates new requests. Requests are routed through the network by component ‘Broker’. Each Web Service is an instance of component ‘WS’. When requests are completed or denied they arrive at component ‘Output’, where relevant data is collected. We will now explain how each of these components operates.


3.3.1 Client component

The client component generates new requests according to a Poisson process with average rate λ. After a request has been generated, a request type is randomly assigned to indicate which Web Services need to be visited, see also section 2.3. Finally, for each job a required service time is drawn from a distribution with cdf FB. After all specifications are given to a request, it is sent to the Broker component. In section 2.2 the service distribution for Web Service w was assumed to be exponential with required rate µw. As mentioned in section 3.2, the processor sharing queue is insensitive to the service distribution; using another service distribution (with equal mean) should not give different results.
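A minimal sketch of such a request generator (the rates, the request types and the uniform type choice below are illustrative assumptions, not the thesis's experimental settings):

```python
import random

def generate_requests(lam, request_types, horizon, seed=42):
    """Generate requests of a Poisson process with rate `lam` up to time
    `horizon`; each request gets a random type (a chain of Web Services)
    and an exponentially distributed required service time per job."""
    rng = random.Random(seed)
    t, requests = 0.0, []
    while True:
        t += rng.expovariate(lam)        # Poisson process: exponential interarrivals
        if t > horizon:
            return requests
        chain, mu = rng.choice(request_types)
        jobs = [(ws, rng.expovariate(mu[ws])) for ws in chain]
        requests.append({"arrival": t, "jobs": jobs})

# Two request types over three Web Services with (hypothetical) rates mu_w.
mu = {"w1": 10.0, "w2": 5.0, "w3": 8.0}
types = [(["w1", "w2"], mu), (["w1", "w3"], mu)]
reqs = generate_requests(lam=4.0, request_types=types, horizon=100.0)
```

Drawing all randomness up front, as here, is what makes it easy to reuse the same random numbers across experiments.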

We would like to stress that the only source of randomness in the model is the creation of the requests at the client component. An alternative is to let required services times be decided upon at the Web Service itself. The benefit of our approach is that it is easier to synchronize common random numbers over different experiments. See section 6.3 for a discussion on common random numbers.

In the software package eM-Plant, random numbers are drawn as described by L’Ecuyer (1988). This method is based on Multiplicative Linear Congruential Generators (MLCG). These are a well documented class of random number generators, see for instance Law and Kelton (2000).
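For reference, the combined generator of L'Ecuyer (1988) can be sketched as follows. The constants are the well known ones from that paper; whether eM-Plant uses exactly this configuration is an assumption we do not verify here.

```python
def lecuyer_stream(seed1=12345, seed2=67890):
    """Combined MLCG in the style of L'Ecuyer (1988): two multiplicative
    linear congruential generators whose states are combined."""
    s1, s2 = seed1, seed2
    while True:
        s1 = (40014 * s1) % 2147483563   # first MLCG
        s2 = (40692 * s2) % 2147483399   # second MLCG
        z = (s1 - s2) % 2147483562       # combine the two states
        # z == 0 is mapped to the largest value to keep u in (0, 1).
        yield (z if z > 0 else 2147483562) / 2147483563.0

gen = lecuyer_stream()
sample = [next(gen) for _ in range(5)]
```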

3.3.2 Broker component

The broker determines whether the latency of a request has already reached its limit (as defined by the SLA). If that is the case, the request is aborted and sent to the output component. Otherwise the broker determines the next Web Service needed for the request and sends the request to that Web Service. When all jobs in the request are served, the request is also sent to the output component.

When admission control is used, the broker does some additional work. If the job is denied service on the previous Web Service by admission control, it is sent to the Output component. Otherwise the broker calculates the due date for the next job, as described in section 2.5 and then sends the job to the next Web Service.

Figure D.1 in appendix D gives a flowchart of the broker component for the situation in which admission control is used. If admission control is not used step ‘Request denied previously’ must be skipped.

3.3.3 WS component

When a job enters the WS Component, it is checked whether it must be allowed or denied service. In case admission control is not used, all incoming jobs are allowed. The decision rule which decides whether an incoming job is accepted or denied will be thoroughly discussed in chapter 4. For now we assume that there is a rule which decides whether the incoming job must be served or not. If a job is denied service, the job (request) is returned to the Broker component.

Accepted jobs arrive at the Process Sharing Unit (PSU). The PSU acts as a processor sharing queue as described in section 3.2. The PSQ was introduced as a limit case of a round-robin service discipline because it is easier to analyze. For the simulation model this is not an issue; the reason a PSQ is used here is that processor sharing is computationally more efficient than a round-robin service discipline. Hence the performance of the simulation model greatly improves when the PSU is used instead of a RR service discipline.


Jobs in the PSU are sorted by their remaining processing time (RPT) in the sorter. The job with the shortest remaining processing time (SRPT) is selected and moved to the server. The next event in the simulation model will be either the departure of this job or the arrival of a new job. The processing time of the job in the server is equal to its required service time multiplied by the number of jobs in the PSU. At the next event in the PSU, the remaining processing times of all jobs are updated as follows:

    RPTnew = RPTold − (time since last event) / (number of jobs in PSU).

After the update, all completed jobs (RPT ≤ 0) are moved to the Broker. The sorter determines the job with the new smallest RPT and this job is moved to the server.
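The departure logic of the PSU can be sketched as follows (a simplified version that only handles departures; in the real model an arrival may of course occur before the next departure):

```python
def advance_to_next_departure(rpts):
    """Advance the PSU to its next departure event.

    rpts: remaining processing times, in required-service-time units.
    With n jobs sharing the processor, the shortest job finishes after
    min(rpts) * n units of real time; every job's RPT then decreases by
    (time since last event) / n, exactly the update rule above."""
    n = len(rpts)
    dt = min(rpts) * n
    updated = [r - dt / n for r in rpts]
    return dt, [r for r in updated if r > 1e-12]   # drop completed jobs

# Three jobs with required service times 1, 2 and 3: the PSU empties
# after 1 + 2 + 3 = 6 units of real time (processor sharing is work conserving).
rpts, elapsed = [1.0, 2.0, 3.0], 0.0
while rpts:
    dt, rpts = advance_to_next_departure(rpts)
    elapsed += dt
```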

In appendix D flowcharts of both the WS Component and the PSU can be found (Figures D.2 and D.3).

3.3.4 Output component

When a request arrives at the output component, it is first checked whether the simulation time has expired. If so, the simulation is stopped and summary results are calculated. It is then decided whether a new simulation (run) must be started. If a new simulation (run) has to start, all requests are removed from the model and all data (except the summary results) are deleted. Then the new simulation is started.


Chapter 4

Admission control rules

In section 2.6 it was explained that it is difficult to set an optimal static admission control parameter. In this chapter two dynamic admission control rules are presented and discussed. The first rule, called rule S, is derived from the theory on processor sharing queues, see section 3.2. The second rule, called rule D, only uses the expected sojourn time of the jobs given the current situation; hence rule D ignores the fact that jobs may enter or leave the system. In section 4.3 both admission control rules are analyzed and compared with each other. In section 4.4 some possible improvements to the admission control rules are given.

4.1 Dynamic admission control, rule S

The problem is to find a value of c (the number of allowed jobs) which is as large as possible, such that as many jobs as possible are served in time. When all jobs are finished before their due date, the entire request has a latency which is smaller than the maximum allowed latency and hence the request is successful. Conversely, a late job does not necessarily result in a late request: the lost time can be compensated for by the next Web Service. Thus it is not necessary that all allowed jobs are served before their due date. The expected sojourn time E(S) of a job in a Web Service should be less than or equal to the average available time for the jobs ¯R. Therefore the problem on Web Service level becomes:

    max_c { c : E(S) ≤ ¯R },    (4.1)

with E(S) the expected sojourn time of a job in service on the Web Service and ¯R the average remaining available service time of all jobs in the Web Service. The value of ¯R changes over time, so a more correct notation would be ¯R(t) (and therefore also c(t)) but for notation’s sake this is omitted.

Computation of ¯R is straightforward since due dates of all jobs in the Web Service are known. The expected sojourn time is derived in section 3.2 and is given by equality (3.1). Substituting equality (3.1) in problem (4.1) yields:

    max_c { c : (1/µ) / (1 − ρ(1 − pc)) ≤ ¯R },

with µ the required service rate of the Web Service, ρ = λ/µ the current load on the Web Service and λ the arrival rate of new jobs to the Web Service. An expression for the blocking probability pc is given by equality (3.2). Substituting pc with equality (3.2) and after some algebra, the following equivalent of problem (4.1) is found:

    max_c { c : c ≤ log_ρ(1 + µ¯R(ρ − 1)) }.


The solution equals c = log_ρ(1 + µ¯R(ρ − 1)). The admission control rule, called rule S, follows:

Rule S: Allow arriving jobs service if ρ ≤ 1, or if n ≤ log_ρ(1 + µ¯R(ρ − 1)) still holds after the new job is allowed.
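Rule S above can be sketched as an admission check (variable names are ours; ρ, µ and ¯R are treated as given measurements):

```python
import math

def admit_rule_s(n, rho, mu, R_bar):
    """Rule S: admit the arriving job if rho <= 1, or if after admission
    the job count n + 1 still satisfies n + 1 <= log_rho(1 + mu*R_bar*(rho-1)).

    n     -- number of jobs in service before the arrival
    rho   -- measured load lambda/mu (before admission control)
    R_bar -- average remaining available time of the jobs in service
    """
    if rho <= 1:
        return True
    if R_bar <= 0:
        return False              # no slack left: c is effectively zero
    c = math.log(1 + mu * R_bar * (rho - 1), rho)
    return n + 1 <= c
```

With µ = 10, ρ = 2 and ¯R = 1, the threshold is log2(11) ≈ 3.46, so a third concurrent job is admitted but a sixth is not.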

To compute the value of c, the value of ρ is needed and thus the values of λ and µ as well. It is assumed that the service rate µ is known, but the value of λ is not. The arrival process (of a Web Service) will in reality not be known (or even constant) but will fluctuate and thus must be estimated. Because of this fluctuating nature, an appropriate time period must be determined over which to estimate λ. Thus we still have a (hidden) parameter choice to make when this form of admission control is implemented. Another (perhaps more important) objection to this method is the fact that the arrival rate is explicitly used to estimate the value of c. Intuitively, the number of jobs which can be simultaneously served does not depend on the number of jobs which arrive at the system. The Web Service is capable of simultaneously serving c jobs; for this value it should not matter whether there are five or a hundred jobs which need serving. Of course the blocking probability corrects for this fact, but still it seems a bit awkward. Additionally, jobs are always accepted if ρ < 1. The idea is to deny jobs service if the sojourn times become too large, which may (theoretically) happen while ρ < 1. In the next section an alternative dynamic admission control rule will be derived, in which the arrival rate (and hence the load) is not used to estimate the appropriate admission control parameter.

4.2 Dynamic admission control, rule D

When the number of jobs n in the queue is assumed constant, the expected sojourn time for a job equals n/µ. Thus after c has been chosen, the maximum sojourn time for a job equals c/µ. When all jobs in service must be served before their due dates, the problem becomes:

    max { c : c/µ ≤ Rij for all jobs in service }.

As mentioned before, not all jobs need to be completed before their due date; relaxing the constraint yields:

    max { c : c/µ ≤ ¯R },

with ¯R(t) the average remaining available service time for all jobs in service at time t. The solution of this problem equals c = µ ¯R. Admission control rule D follows:

Rule D: Allow arriving jobs service if n ≤ µ¯R still holds after the arriving job is allowed service.

Note that for the calculation of the admission control parameter, the arrival rate (and thus the load ρ = λ/µ) is not used by rule D.
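Rule D can be sketched as an admission check (variable names are ours):

```python
def admit_rule_d(n, mu, R_bar):
    """Rule D: admit the arriving job if n + 1 <= mu * R_bar, i.e. if the
    expected sojourn time (n + 1)/mu, with the new job counted, does not
    exceed the average remaining available time R_bar of the jobs in service.

    n     -- number of jobs in service before the arrival
    R_bar -- average remaining available time of the jobs in service
    """
    return n + 1 <= mu * R_bar
```

Note that, in contrast with rule S, no estimate of the arrival rate is needed here.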


4.3 Analysis of both admission control rules

The value of the admission control parameter in rule S depends on the load ρ (measured before admission control is initiated), the service rate µ and the average remaining available time for all jobs in service ¯R. Rule D only depends on µ and ¯R. It was assumed that µ is constant; thus the WAC parameter of rule S is a function fS of ρ and ¯R, and the WAC parameter of rule D is a function fD of ¯R only:

    fS(ρ, ¯R) = log_ρ(1 + µ¯R(ρ − 1)),
    fD(¯R) = µ¯R.

In figure 4.1 a plot of both functions is given for different values of ¯R and ρ (µ = 5 in this plot). The colored plane corresponds to fS and the transparent plane to fD. The horizontal axes show the values of ρ and ¯R respectively and the vertical axis shows the value of the admission control parameter.

Figure 4.1: Plots of fS(ρ, ¯R) (colored plane) and fD( ¯R) (transparent plane)

Values for ρ < 1 are not plotted since fS attains the value infinity in that case. If ¯R < 0 then both functions attain negative values. These have no interpretation (other than the value zero), therefore values of ¯R < 0 are omitted from the plot.

It can easily be seen that fS ≤ fD for most parameter values. If ¯R ∈ (0, 0.2) then fS attains a (slightly) larger value. In practice however the value will be rounded down to an integer, hence fS is considered equal to fD in this case.
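The comparison above can be checked numerically for sample parameter values (µ = 5 as in the plot; the grid of ρ and ¯R values is our own choice):

```python
import math

mu = 5.0
f_s = lambda rho, r: math.log(1 + mu * r * (rho - 1), rho)  # rule S parameter
f_d = lambda r: mu * r                                      # rule D parameter

# For moderate-to-large R_bar, rule S is the stricter rule ...
for rho in (1.5, 2.0, 5.0):
    for r in (0.5, 1.0, 2.0):
        assert f_s(rho, r) <= f_d(r)

# ... while for R_bar in (0, 0.2) f_S is slightly larger, but the
# integer parts (the values actually used) coincide.
assert f_s(2.0, 0.1) > f_d(0.1)
assert math.floor(f_s(2.0, 0.1)) == math.floor(f_d(0.1))
```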


It is expected that both forms of admission control behave approximately the same when ρ is not much larger than one or ¯R is close to zero (this is difficult to predict, however). Large differences can be expected, however, when ρ becomes (significantly) larger than one. Moreover, the admission control parameter c will fluctuate much less for rule S than for rule D, since the influence of ¯R becomes rather small when ρ becomes larger.

In chapter 1 it was explained that using admission control in a SOA environment could lead to inefficiencies which are not a problem in traditional architectures. When a request is denied service by admission control at a specific Web Service, the already served jobs in the request were served for nothing. This is called useless activity. The proposed admission control rules do not solve this potential problem. In the next sections, adaptations of the admission control rules are presented which may solve (or at least reduce) the problem of useless activity.

Both admission control rules depend on the value of the expected service time of the jobs in service, which equals 1/µ. Since service times are stochastic, the actual service time of a job will in general not be 1/µ. When enough jobs are in service this is not a problem: the deviations cancel each other out and the average observed service time will be close to the expected service time. If the number of jobs in the system is small, the difference might be too large, resulting in bad performance. There is a way to prevent this from happening, but additional information is needed: if not only the expected service time per job is known, but also the remaining required service time of all jobs in service, then the current value of µ can be calculated.

4.4 Improvements of the admission control rules

4.4.1 Reservation

Admission control implemented in a SOA may result in useless activity. To prevent this, it must in some way be guaranteed that a request is either denied before any of its jobs are served or not denied at all. The fact that the system is in overload does not mean that the first Web Service is in overload, thus it is possible that a request is denied at a later stage, resulting in useless activity. A solution to this might be to reserve a path for a request, such that there is room for the request at each Web Service without exploding sojourn times.

This idea has some difficulties. For instance, it is unknown at which time the request will arrive at a specific Web Service. Furthermore, it is very difficult to determine the value of c when there are not only jobs in service, but also reservations. For these reasons it seems impossible to implement a reservation scheme in combination with the dynamic admission control rules described in the previous sections. As an alternative, it is quite easy to implement a reservation scheme with a static admission control rule (in which the value of c is fixed). In this case the following admission control rule holds: accept an incoming job without reservation, or a new reservation, if n + r ≤ c, with n the number of jobs in service, r the current number of reservations and c the admission control parameter. The (optimal) choice of the value of c for the static admission control rule was already difficult without reservations and the reservation scheme only makes it more difficult. Therefore we have little hope that this approach could ever be used in real life situations and it was decided not to investigate this idea any further.


simulation model as in reality) may be difficult. Due to a limited amount of time this idea is not further developed.

4.4.2 Priorities

Instead of making reservations, another idea is to give some requests a higher priority. Requests with a higher priority can preempt requests with a lower priority and thus have a higher chance of success. It might be possible to decrease the useless activity by assigning priorities in a smart way, although such a priority rule will probably not prevent useless activity altogether. Several approaches to determine priorities might work. A request with a higher number of completed jobs might be given a higher priority, because the potential loss is higher if this request is denied service. Another approach is to look at the remaining number of jobs in the request: the more remaining jobs, the higher the probability of denial at some Web Service and thus the lower the priority; in addition this can be weighted by the remaining available time. Summarizing, priorities can be based on:

• Number of completed jobs in the request;
• Number of remaining jobs in the request;
• Number of remaining jobs in the request, weighted by the total remaining time for the request.

The drawback of this idea is that preempting might result in more useless activity instead of less (the preempted job was already partially served) and even after preemption, jobs should only be allowed service if the admission control rule still holds.

Based on test experiments, the third rule seems to perform best. Due to a limited amount of time, only this priority rule will be investigated. Thus requests are given a priority equal to the remaining number of jobs in the request, divided by the remaining available time of the request. Priorities of all jobs in service are recalculated when a new request enters the Web Service. Let ji∗ be the number of the current job of request i. Mathematically the priority πi(t) of request i ∈ I at time t ∈ T is calculated as follows:

    πi(t) = ( (Ji − ji∗) / (Dij − t) )^(−1),    i ∈ I, t ∈ T.
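A direct transcription of the priority formula above (variable names are ours; note that, as printed, πi(t) equals the remaining available time per remaining job):

```python
def priority(J_i, j_star, D_ij, t):
    """pi_i(t) = ((J_i - j_i*) / (D_ij - t))^(-1): the remaining
    available time D_ij - t divided by the number of remaining jobs."""
    return (D_ij - t) / (J_i - j_star)

# A request with 4 of its 6 jobs remaining and 2 s of slack left.
print(priority(J_i=6, j_star=2, D_ij=12.0, t=10.0))  # 0.5
```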


Chapter 5

Upper bound on the goodput

The problem at hand is too complex to solve to optimality. Nevertheless we would like to know how well an admission control rule, yielding objective value ŷ, performs. Although the true optimal value y is unknown, it might be possible to find an approximation ỹ of the optimal value. This approximation should be an upper bound for the true optimal value. In that case, if ỹ − ŷ = κ then y − ŷ ≤ κ. If the value of κ is considered small enough, then the admission control rule performs well and the upper bound is tight.

The challenge is to find an upper bound which is both tight and fairly easy to calculate. In this chapter a model is presented which can be used to find such an upper bound for the goodput. In section 5.1 the idea behind the model is explained, while the model itself is presented in section 5.2.

5.1 Idea of the upper bound

If the maximum allowed latency is ignored, then the goodput of the system equals the throughput γ (the number of jobs which leave the system per second, successful or not), which is easier to calculate. A simple upper bound on the throughput is the external arrival rate λ: if on average λ jobs arrive at the system per second, then on average no more than λ jobs will leave the system per second, so γ ≤ λ. The gap between this upper bound and the real value of the throughput may become arbitrarily large, because each Web Service w has a limited service rate µ_w, while λ can always be increased. To find a better upper bound, consider the maximum number of jobs which leave Web Service w per second. This is called the flow from Web Service w and is denoted by f_w (measured in jobs per second). The flow from Web Service w is bounded by the arrival rate and the service rate:

f_w ≤ min(µ_w, λ_w),    w ∈ W.

Because requests are formed by a chain of Web Services, the throughput of a request type is bounded by the smallest maximum flow over all Web Services in the chain:

γ_i ≤ min_{w ∈ W_i} f_w,    i ∈ I,

with W_i the set of Web Services used by requests of type i. The throughput of the entire system is equal to the summed throughput of all request types. Thus the summed maximum throughput (over all request types) is an upper bound for the goodput:

y ≤ Σ_{i∈I} γ_i.

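To make this bound concrete, a hedged numeric sketch (all rates and chains below are invented for illustration, not taken from the thesis example):

```python
# Flow bound per Web Service: f_w <= min(mu_w, lambda_w).
mu = {"A": 4.0, "B": 6.0, "C": 5.0}       # service rates (jobs/second)
arrival = {"A": 7.0, "B": 3.0, "C": 8.0}  # arrival rates lambda_w
flow = {w: min(mu[w], arrival[w]) for w in mu}

# Each request type is limited by the tightest flow bound in its chain.
chains = {0: ["A", "B"], 1: ["B", "C"]}
gamma_bound = {i: min(flow[w] for w in chain) for i, chain in chains.items()}

# Summing over request types gives an upper bound on the goodput.
goodput_bound = sum(gamma_bound.values())  # here: 3.0 + 3.0 = 6.0
```

Note that this simple bound lets both request types claim Web Service B's full flow simultaneously; the LP of section 5.2 removes exactly this overcounting.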


If, for example, ten request types all used the same Web Service, the throughput would be bounded by the service rate of that Web Service, not by ten times that service rate. Therefore the flow from each Web Service must be divided over multiple request types.

The problem can be analyzed using graph theory. Let each node be a Web Service from a specific request type. Additionally, requests start in dummy node 'source' and end in dummy node 'drain'. The problem of finding the maximum throughput resembles the well-known Maximum-Flow-Minimal-Cut (MFMC) problem (for a discussion of the MFMC problem, see any introduction to combinatorial optimization, for instance Hillier and Lieberman (2005)), in which one wants to find the maximum flow from source to drain. The difference with our problem is that in the MFMC problem the nodes (arcs) all have their own capacity, while in our problem a fixed capacity is shared by multiple nodes. In figure 5.1 the graph of the example from section 2.3 is given to illustrate this idea; nodes on the same horizontal line form a request type i, with p_i the fraction of all requests which are of type i, while nodes on the same vertical line form a Web Service w, with service rate µ_w.

To solve the flow problem, a Linear Program (LP) is constructed in the next section, which can easily be solved using for instance the Simplex Algorithm (see for instance Hillier and Lieberman (2005)). It may be possible (using some modeling tricks) to model the problem as a standard MFMC problem (or perhaps another well-known graph problem); in that case a special purpose algorithm could solve the problem more efficiently than a general purpose algorithm like the Simplex Algorithm. However, the LP model will be quite small and hence the computation time is already nearly zero. Furthermore, (small) LP problems can easily be solved using a spreadsheet package like Microsoft Excel. Therefore no special software is needed to calculate the upper bound for a small network¹.

[Figure 5.1: a layered graph with dummy source S and drain D, one row per request type (fractions p_1, ..., p_5) and one column per Web Service A–D (service rates µ_A, ..., µ_D); requests arrive at rate λ.]

Figure 5.1: Example of a graph used for finding an upper bound for the goodput.

¹The LP solver included in Microsoft Excel is only capable of solving models with at most 200 variables.



5.2 LP formulation

Before the model can be constructed, some notation is needed in addition to the notation introduced in chapter 2 (for a quick review of the notation used, the reader is referred to appendix C):

τ_{ijw}  binary parameter: equals 1 if job j from request type i is served by Web Service w, and zero otherwise;

x_{ij}  real-valued decision variable: the flow on Web Service j from request type i.

The objective of the LP model is to maximize the flow to dummy Web Service D from all request types:

max Σ_{i∈I} x_{iD}.

The total flow to Web Service w (summed over all request types) is at most the service rate µ_w of this Web Service; this is modeled using the following constraints:

Σ_{i∈I} Σ_{j∈J_i} τ_{ijw} x_{ij} ≤ µ_w,    w ∈ W.

The flow on a node (a Web Service from a specific request type) is at most the flow on the preceding node in the same chain. Note that the first node in a chain is always the source S; therefore this node does not have a preceding node:

x_{ij} ≤ x_{i,j−1},    j ∈ J_i \ {S}, i ∈ I.

The flow of request type i from the source (to the first node) is at most the overall arrival rate λ multiplied by the fraction p_i of requests which are of type i:

x_{iS} ≤ λ p_i,    i ∈ I.

Flows cannot become negative:

x_{ij} ≥ 0,    j ∈ J_i, i ∈ I.
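As a sketch of how an LP of this form can be solved in practice, the fragment below builds and solves a small hypothetical instance with `scipy.optimize.linprog`; the two chains, rates, and fractions are invented for illustration and are not the thesis example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical instance: two request types sharing Web Services A and B.
chains = {0: ["A", "B", "D"], 1: ["B", "A", "D"]}  # D = dummy drain
mu = {"A": 4.0, "B": 6.0}                          # service rates mu_w
lam, p = 10.0, {0: 0.5, 1: 0.5}                    # lambda and fractions p_i

# One decision variable x_{ij} per node (request type i, position j).
index = {}
for i, chain in chains.items():
    for j in range(len(chain)):
        index[i, j] = len(index)
n = len(index)

A_ub, b_ub = [], []
# Capacity: total flow through Web Service w is at most mu_w.
for w, rate in mu.items():
    row = np.zeros(n)
    for (i, j), k in index.items():
        if chains[i][j] == w:
            row[k] = 1.0
    A_ub.append(row)
    b_ub.append(rate)
# Chain: x_{ij} <= x_{i,j-1} for every node after the first.
for (i, j), k in index.items():
    if j > 0:
        row = np.zeros(n)
        row[k] = 1.0
        row[index[i, j - 1]] = -1.0
        A_ub.append(row)
        b_ub.append(0.0)
# Source: flow of type i into its first node is at most lambda * p_i.
for i in chains:
    row = np.zeros(n)
    row[index[i, 0]] = 1.0
    A_ub.append(row)
    b_ub.append(lam * p[i])

# Maximize total flow into the drain D (linprog minimizes, so negate).
c = np.zeros(n)
for i, chain in chains.items():
    c[index[i, len(chain) - 1]] = -1.0

res = linprog(c, A_ub=np.vstack(A_ub), b_ub=b_ub, bounds=(0, None))
upper_bound = -res.fun  # here: 4.0, since both chains share Web Service A
```

In this instance the shared capacity of Web Service A (µ_A = 4) limits the total goodput to 4 jobs per second, even though λ = 10; the naive bound min over each chain separately would have been larger.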


Chapter 6

Simulation setup

In section 3.3 a model was presented with which a Service Oriented Architecture can be simulated. The simulation model in itself does not answer any question: it uses (pseudo-)random numbers to model uncertainty, so the generated data is stochastic in nature and does not constitute an absolute answer. In this chapter the setup of the simulation model is discussed.

6.1 Confidence intervals

Statistical analysis can be used to evaluate the data and draw statistically meaningful conclusions. We are not only interested in the estimate ẑ of E[z] (with z the fraction of successfully served requests), but also in its variance σ_z. Using the variance it is possible to construct confidence intervals. The confidence intervals constructed will all have an overall confidence level of 0.99, meaning that with 99% certainty all true values lie within the confidence bounds. When multiple results are compared to each other, the Bonferroni inequality (see for instance Law and Kelton (2000)) can be used to determine the required confidence level of each individual result. In each experiment a comparison will be made between the two admission control rules and the case without admission control. Hence an individual confidence level of

1 − 0.01/3 = 299/300 ≈ 0.997

is needed to obtain an overall confidence level of 0.99. Based on these confidence levels only the different admission control rules can be compared for every simulated arrival rate; a comparison between different arrival rates cannot be made with the same level of confidence.
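The Bonferroni correction described above can be sketched as follows (a minimal sketch using a normal approximation; the estimates and standard errors below are hypothetical):

```python
from statistics import NormalDist

def bonferroni_cis(estimates, std_errors, overall_alpha=0.01):
    """Simultaneous two-sided confidence intervals via the Bonferroni
    inequality: each of the m intervals gets individual level
    1 - overall_alpha / m, so that the overall confidence level is at
    least 1 - overall_alpha (here 0.99)."""
    m = len(estimates)
    z = NormalDist().inv_cdf(1 - overall_alpha / (2 * m))
    return [(est - z * se, est + z * se)
            for est, se in zip(estimates, std_errors)]

# Three compared alternatives: rule 1, rule 2, and no admission control.
cis = bonferroni_cis([0.92, 0.88, 0.60], [0.01, 0.01, 0.02])
```

Each individual interval here uses the z-value for level 1 − 0.01/3 ≈ 0.997, which makes the intervals slightly wider than three separate 99% intervals would be.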

6.2 Warmup period

When the simulation is started the system is empty. It takes some time to fill the system with requests and to reach some sort of equilibrium. The warmup period reflects the time it takes to reach this equilibrium; results from this period must be neglected in the analysis. If these results were taken into account, the estimator for z would be biased (E[ẑ] ≠ z), because the first requests find a (nearly) empty system and are thus served faster than when the system is in overload. Without a warmup period these results would be included in the calculation of ẑ. This problem is known in the literature as the problem of the initial transient; see for instance Law and Kelton (2000).
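The effect of discarding the warmup observations can be illustrated with a small sketch (the success probabilities, trace length, and warmup length below are invented):

```python
import random

def goodput_estimate(successes, warmup):
    """Estimate z from per-request success indicators, discarding the
    observations collected during the warmup period to reduce the
    initial-transient bias."""
    steady = successes[warmup:]
    return sum(steady) / len(steady)

random.seed(42)
# Hypothetical trace: the first 50 requests find a (nearly) empty system
# and all succeed; afterwards the overloaded system succeeds ~60% of the time.
trace = [1] * 50 + [1 if random.random() < 0.6 else 0 for _ in range(950)]
biased = sum(trace) / len(trace)              # includes the warmup
unbiased = goodput_estimate(trace, warmup=50)
```

Including the warmup observations pulls the estimate upward toward the favorable early results, which is exactly the bias E[ẑ] ≠ z described above.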
