
Web Server Loads under Visitor Surges: A Model-Based Prediction

G. Bouma

April 6, 2021


Committee chair:

dr. ir. R. Langerak

Formal Methods and Tools (FMT), University of Twente

Committee member:

dr. ir. P.T. de Boer

Design and Analysis of Communication Systems (DACS), University of Twente

Committee member:

dr. A. Hartmanns

Formal Methods and Tools (FMT)

University of Twente


Abstract

Cognito Concepts is a marketing agency that also hosts websites for most of its clients. They struggle with distributing these websites in a balanced way over multiple web servers. Since some websites only have seasonal visitors and are heavily influenced by marketing campaigns, they do not have a steady load throughout the year. To make optimum use of the web servers it is necessary to predict fairly accurately the load a website will put on a web server.

The web servers can log measurements that give a good insight into CPU, RAM and network usage, as well as which pages are requested, in which order, how long they took to be processed and how long it took for the next page to be requested. This data is extracted from the log files and put into a database, making it possible to easily fetch data for a specific date, time, site and/or page. From the selected data, different model variants can be generated for the Modest toolset. Modest makes it relatively easy to fit this data into Markov automata that simulate visitor behavior, which can then be influenced with data for planned marketing campaigns to predict their impact. Having these models gave some extra insights, such as the possibility to answer questions like how long it would take for a server to recover from an overload under normal conditions.

The results showed that we could do a near-perfect simulation of actual scenarios. Prediction was a bit trickier and somewhat off, and there are still areas to investigate to improve accuracy, but in the end the predictions and the new possibilities look promising.


Contents

1 Introduction 1

1.1 Concepts and technical background . . . . 2

1.2 Marketing websites . . . . 5

1.2.1 Sleeper . . . . 7

1.2.2 Continuously active . . . . 7

1.2.3 Specific targeted . . . . 7

1.2.4 Event driven . . . . 7

1.3 Problem statement . . . . 9

1.4 Scope of the research . . . . 10

1.5 Research questions . . . . 11

1.6 Related work . . . . 11

1.7 Approach . . . . 11

2 Web server data 13

2.1 Measuring requests . . . . 13

2.2 Processing the measurements . . . . 14

2.3 Interpreting . . . . 15

2.3.1 Identifying Visitors . . . . 15

2.3.2 Visitor trace . . . . 16

2.3.3 Behavioral visitor patterns . . . . 16

2.3.4 Visitor entering rate and leaving percentages . . . . 16

2.4 Bottleneck of the server . . . . 17

2.4.1 Network usage . . . . 17

2.4.2 Disk usage . . . . 19

2.4.3 Memory usage . . . . 20

2.4.4 CPU usage . . . . 21

2.5 Wrapping up . . . . 24

3 Model framework 25

3.1 Requirements . . . . 25

3.1.1 Serialized processing of parallel requests . . . . 26

3.2 Uppaal and Timed Automata . . . . 26

3.2.1 Concept model in Uppaal . . . . 26

3.2.2 Extended Uppaal model . . . . 28


3.2.3 SMC extension . . . . 29

3.3 PRISM and Continuous-Time Markov Chains . . . . 29

3.3.1 Uppaal model in PRISM . . . . 29

3.3.2 Extended PRISM model . . . . 32

3.4 Modest and Markov Automata . . . . 33

3.4.1 PRISM model in Modest . . . . 33

3.4.2 Final model . . . . 35

3.5 Model variants . . . . 40

3.5.1 Generating . . . . 41

3.5.2 Manual editing . . . . 41

3.5.3 Modest constants . . . . 42

3.6 Properties . . . . 42

4 Combining model and data 44

4.1 Filter . . . . 45

4.2 Trace graph generator . . . . 46

4.3 Optimizer . . . . 48

4.4 Template engine . . . . 50

4.4.1 Visitor rate . . . . 51

4.4.2 Page and waiting rates . . . . 52

4.4.3 Next page probability distribution . . . . 52

4.5 Manual editing . . . . 52

4.6 Modest model checker . . . . 53

5 Evaluation 54

5.1 Comparing the model to reality . . . . 54

5.2 Prediction . . . . 55

5.2.1 Marketing campaign . . . . 56

5.2.2 Back of the envelope prediction . . . . 56

5.2.3 Model prediction . . . . 57

5.2.4 Comparison . . . . 58

5.3 Bonus . . . . 59

5.4 Tool limitations . . . . 61

6 Conclusion 62

6.1 Limitations of the work . . . . 63

6.2 Outcome . . . . 63

6.3 Future work . . . . 64

6.4 Final word . . . . 65

Bibliography 67

Appendices 70

A PRISM model 71

B Extended PRISM model 72


C Modest model 74

D Modest final model 76


1 Introduction

Cognito Concepts is a marketing agency specialized in online marketing, although they also do paper marketing and events. Marketing comes down to attracting and getting the interest of the correct group of people. To give some examples from their portfolio:

• Local municipalities attracting (local) visitors to social events or parks.
• Stores wanting to be found on the Internet and attracting physical customers.
• Foreign real estate agents reaching out to potential new home owners abroad.
• Museums reaching out to all that are interested.
• Webshops optimizing sales.

Marketing can be seen as fishing, as owner Wijtze de Groot often puts it: you need the correct net (advertisement channel), bait (keywords/image), spot (category) and time to catch a specific fish (group of people). This holds not only for paid advertisement but also for SEO [1] (search engine optimization). For paid advertisement, such as banners or paid links in search engines, you need a catchy title, matching keywords and a fitting image. SEO relies on using the correct wording in titles, content and so on, and also on adding data such as review summaries and product properties to pages; this data cannot be seen by visitors but is picked up by search engines. If this is done correctly it helps the targeted group of visitors find the page. When the correct visitor has found the site, the next step is to deliver the right content to keep their interest.

This will increase the chances of them visiting the shop, making an online purchase, and so on.

To create a viable solution to the problem, we first need to investigate what data is available or can be made available. At the start of the project a minimum of information was present to calculate a possible load on the servers. There was a limited amount of access logs containing the requested page and the exact time that page was requested. This was combined with a table of average loads of a server in relation to the number of requests. The assumption here was that every page would put an equal load on the system. This data did not suffice for our work; thus, further investigation was needed to find additional data.

What makes Cognito Concepts stand out from its competitors is that they (re)design sites themselves, whereas normally this kind of work is outsourced or only minor changes are made. Designing can be seen quite broadly here: they focus, for example, on appearance, content, SEO and the visitor journey. This gives an increased chance to not only attract the right person but also keep them engaged.

Our ultimate goal in this research is to predict the load on the server, so that server performance can be optimized and evaluated. This will help to determine if servers need to be upgraded before they are overloaded. Identifying the visitor journey helps this prediction by uncovering the ripple effect: if a visitor visits a page, he or she will likely visit related pages, and each of these pages along the journey contributes to the server load. Every page, however, has a different impact. Therefore it is important to understand the visitor's journey.

1.1 Concepts and technical background

To help understand this thesis, a few concepts are explained in the context of the research, along with their relation to each other and the technical background in which they come into play.

Visitor A visitor is a person or a program that uses a computer, tablet, phone etc. to view or visit a (web)page that is requested over the Internet from a (web)server. When we talk of a program as a visitor we refer to it as a bot.

Bot A computerized, non-human visitor. These bots are for example used to index websites for search engines or to find security vulnerabilities to steal visitor data.

Resource Resources can be pages, images, styling scripts, etc.

Request A request is the action where the visitor opens a (web)page via, for example, a web browser. For this research we consider requests a subset of resources: only script requests count as "a request".

Processing queue When a request reaches the web server it is placed in the processing queue to await processing.

Visit When a visitor requests a page, this is seen as a visit to that page.

Traffic Traffic is the amount of requests that reach the server per time unit.

Rate A rate is how many times an event happens per time unit. For example the request rate is the number of requests that reach the server per time unit. This also works for the "page rate", which defines how many pages can be processed per time unit.

Bandwidth The bandwidth defines how many bytes/bits can be transferred per time unit. This is mostly expressed in Mbps (megabits per second) or Gbps (gigabits per second).

DDoS attack A Distributed Denial of Service (DDoS) attack is a malicious attempt by an attacker to overload a server. This is done by using multiple devices to continuously request resources from one server till it cannot handle the requests anymore.

Trace A trace is a chronological list of pages the visitor has requested. There is one trace per visit per visitor. Traces consist purely of the visited pages; additional resources such as images are excluded. Traces will be discussed in more detail in section 2.3.2.

(Web)page A (web)page consists of HTML code, which can be seen as a recipe containing all the ingredients to build the page, such as the images, styling and functional scripts.

(Web)site A (web)site is a collection of pages connected via links. Websites are visited over the Internet by requesting pages from a (web)server.

(Web)server In the case of this research the (web)server is a physical computer placed in a server park in Amsterdam. This server uses PHP to process data from the database into HTML pages which are transported via HTTP using Apache.

HTML HyperText Markup Language is a markup language for web browsers that defines which content should be displayed and how.

PHP PHP [10] (PHP: Hypertext Preprocessor, originally Personal Home Page) is a scripting language that uses code to generate HTML pages. PHP can do more than generate HTML, but that is not relevant for this thesis.

HTTP HyperText Transfer Protocol [8] is a protocol to transfer data over the Internet.

Apache Apache [9] is software that processes requests. For example, Apache can fetch images/files or ask PHP to process a page, and delivers the result to the visitor.

Database A database is a medium that can store large quantities of data and make them easily accessible.

(Server)load The (server)load determines how occupied a server is. For example, a 90% load means the server has only 10% capacity available to process new requests. With higher rates/multiple requests at a time the load will increase. The maximum load is determined by the bottleneck of the server. Section 2.4 goes more into depth on this.

Campaigns, ad campaigns or advertisement campaigns Campaigns, ad campaigns or advertisement campaigns are planned strategies to draw visitors to a certain page via links in search results, banners on webpages, articles on social media and such.

Figure 1.1: Concepts visualized

Ripple effect When a page is visited, there is a big chance more pages on the same website will be visited as well. We call this the ripple effect.

Model In the case of this research a model is a representation of requests to a web server, used to calculate a prediction of the server load.

CPU The CPU (Central Processing Unit) is the brain of a computer, doing all the calculations.

CPU core A CPU can have several cores that execute calculations individually in parallel. A well written concurrent program would in theory finish twice as fast on a CPU with twice the cores.

CPU time This is the amount of time a CPU core takes to process something; it is usually expressed in milliseconds or microseconds. If a task fully utilizes 4 cores for 2 seconds then the CPU time is 8 seconds.

SQL query A Structured Query Language (SQL) query is a request to the database to fetch data matching the criteria defined in the domain specific language for the used database.

Figure 1.1 shows the relation between some of the concepts. A visitor uses a device to display various pages that are requested over the Internet from a web server. The server uses the database and PHP to generate a page, which is delivered via Apache. A server runs multiple websites at the same time, and multiple visitors can request multiple pages at once.

The load on a web server comes from the hosted websites, which contain pages mostly generated by PHP scripts. These scripts gather the requested content and translate it into HTML documents that web browsers can understand. Some sites generate specific content tailored to the current visitor, e.g. by showing more of what seems interesting to them or their region. Pages that contain a large amount of content, specifically generated content, or both can take a lot more time to process than a simple contact page containing only contact details. Not just the pages themselves define the load, but also the ripple effect they cause to other pages. The ripple effect can be revealed by investigating requests to define traces. Combining these traces can reveal the patterns of website visitors. These patterns can help in predicting the impact of attracting extra visitors to a website.

Figure 1.2, in combination with the explanation below, shows how a request to the server works.¹

• A visitor requests a page via a web browser.
• Using HTTP the request is transmitted to the server.
• Apache receives the request and requests a free PHP worker to process the page.
• The PHP manager assigns a worker to the request.
• The PHP worker queries the database for the page content (mostly text).
• The database returns the content and the PHP worker generates an HTML page.
• The HTML page is delivered to Apache, which returns it to the visitor.
• The browser of the visitor processes the HTML page and requests additional resources such as images (img), styling sheets (css) and JavaScript files (js).
  – These resources are not requested in a particular order and can be requested in parallel, serially and with different delays between them per request.
• The requested resources are fetched and delivered by Apache.
  – Again this can be in parallel, serially and with different intervals. Some resources can also take longer than others to be transferred to the visitor.

¹ There are more steps taken, such as agreeing on a port to use, but they can be neglected since they take considerably less time to process than the rest or are generalized into another step.

1.2 Marketing websites

Website marketing is done mainly via advertisement campaigns, which come in many forms. Let us focus on a few to explain their influence on the number of requests:

• Sleeper
• Continuously active
• Specific targeted
• Event driven

Figure 1.3 shows actual data we have gathered during such campaigns; it will help us explain them in the subsections below.

Figure 1.2: Request visualized

1.2.1 Sleeper

There is no active marketing done for these sites, although there are also passive ways to market sites, such as SEO. These sites mostly have a steady flow of visitors per day of the week: every Monday in the year will have about the same traffic, as will all the Tuesdays, and so on. This can of course vary per season; a site publishing summer walks will have more traffic in the summer. Generally speaking, the visitor rate is steady.

1.2.2 Continuously active

Continuously active campaigns are in principle always active and are never cancelled. For example, fashion shops that have to attract customers all year round use these campaigns, although on occasions such as a new fashion line coming out it can be interesting to raise the campaign budget to attract more visitors, and they can be paused when a shop is temporarily closed for vacation. Such a site usually has a steady visitor rate over the year. The difference with the sleeper campaign is that these sites attract a lot more traffic. Figure 1.3a shows the traffic for a fashion store in 2019 and 2020 in the same time frame: in 2019 campaigns were continuously active, whereas in 2020 all campaigns were stopped due to the COVID-19 lockdown (sleeper). It can be seen that both years have fairly regular patterns and that the active campaigns generate a lot more traffic.

1.2.3 Specific targeted

There are active ads for these sites, such as sponsored keywords within search engines, banners, shopping ads, etc. These ads are targeted at a very specific audience at specific times of the day, week or month. Figure 1.3b shows such a campaign, where a museum heavily advertised its activity during a certain city-wide event to attract as much attention as possible. When a week's budget is spent on just that day and the audience on the targeted website is very engaged, hefty traffic spikes can be expected, not only on the first requested page but also on all the follow-up pages (the visitor journey).

1.2.4 Event driven

Event driven campaigns are targeted at a single event only, such as a sports event or a seasonal event; the rest of the time no campaigns are active at all. Figure 1.3c shows a local marathon being advertised. The campaign slowly started gathering interest, and after a few days the budget was doubled. The huge spike on the event day itself, which is cut off in the graph, reached over 400 requests per hour. This was probably caused by visitors looking for location information and such. After the event the campaigns are off and interest slowly fades away; the requests per hour a few days after the event will stay about the same until the next event. These kinds of campaigns usually take a long time to promote and have active ads, but only in a certain time frame of a few weeks or, mostly, months.

Figure 1.3: Requests per hour related to different kinds of campaigns. (a) Sleeper (green) and continuously active (blue) campaigns; (b) specific targeted campaign; (c) event driven.

Page        CPU time   G1 visits   G2 visits   Subtotal 2G1   Subtotal G1+2
Home        20         100         0           4,000          2,000
Article     20         100         0           4,000          2,000
Collection  20         100         100         4,000          4,000
Product     20         100         100         4,000          4,000
Checkout    20         100         100         4,000          4,000
Total                                          20,000         16,000

Table 1.1: Comparison of server load with and without visitor pattern insight

This means a steady, low visitor rate off season and a high visitor rate in season. Spikes can be expected just before the event, but not as heavy as with the specific targeted campaigns.

1.3 Problem statement

At the moment of writing, Cognito Concepts hosts well over 200 sites for various clients. The main server handles around 150 of those. The server has to generate the pages on the fly for all these sites, which generates a load. Ideally the server would only host sites with sleeper and continuously active campaigns, having almost the same traffic year in, year out. When there is a steady increase or decline in visitor numbers, the server can be reassessed to determine if it is still adequate to handle all visitor requests in the future. If, for example, the load on the server were 30% on average throughout the year, more sites could be placed on this server or it could be downsized. If the load were slowly reaching 80%–90%, the server should be relieved of some sites or be upgraded/replaced. Unfortunately the sites with the other kinds of campaigns also need a place to stay. Luckily the winter and summer event driven campaigns balance things out a bit, but never perfectly, and specific targeted campaigns still stand in the way of a good balance. This means that at certain times some sites with specific targeted campaigns will put an extra strain on the server.

This could cause an overload on the server. When a server is overloaded it takes longer to fulfil requests or, even worse, requests could be dropped. This could cost a potential customer, who gives up and looks for the next website offering the same. This is extra painful when a paid ad brought in that visitor. If this happened rarely it would not be a problem, but on a structural basis it is a real problem: not only are possibly valuable customers lost, but some search engines also lower a page's rank in search results when it takes too long to load [2]. Therefore it is important to keep servers running smoothly without over-dimensioning them too much, for the obvious reasons: the investment, running costs and maintenance costs. The current back-of-the-envelope solution is to assign percentages of resources to websites based on their request rate. Via the logs of the servers it can be seen how much traffic each website receives.


When a server runs at 70% capacity with 10 million requests per day in total and one website handles 1 million of those, it is assumed that 7% of the server performance is claimed by that website. When for a certain period double the number of visitors is expected, due to for example a marketing campaign, the expected server load becomes 77%. This assumption is often off because of the ripple effect caused by the visitor journey and the varying impact of pages. To illustrate this, consider a simple example. Imagine 100 visitors finding a webshop via a search engine. Being interested in the topic, they click on an article and are convinced to visit the product-collection page containing all products. From the collection they pick a product and purchase it at the checkout. Table 1.1 shows this visitor journey, or trace, in the first column. Now imagine another 100 visitors being drawn in via a social media campaign, already interested in a purchase. The campaign links them directly to the product-collection page, where they click on a desired product and purchase it straight away. Assume all pages take the server the same amount of processing time and fill in columns 2, 3 and 4 in the table. G1 is the first group of 100 visitors, G2 is the second group drawn in by the social media campaign. The subtotals are calculated by multiplying the visits by the amount of CPU time needed for that page. We see that 2G1 (group 1 doubled) takes a total of 20,000 time units to process while G1+2 (group 1 and 2 combined) takes 16,000 time units to process. This is a difference of 20%! It could mean that a new server would needlessly be added while the current setup would have been sufficient. To prevent overloads or unnecessary server costs, an improved, more accurate method has to be found. We aim to do this by modeling the visitor journey and separate page loads. This should help in identifying where the load comes from and how we can predict the impact of visitor surges caused by marketing.
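To make the arithmetic behind Table 1.1 explicit, the short sketch below recomputes both totals; the page names and the uniform cost of 20 CPU time units per page are taken from the table, while the code itself is only an illustration.

    # Recompute the Table 1.1 comparison: doubling group 1 (2G1) versus
    # adding the campaign group on top of group 1 (G1+2).
    cpu_time = 20  # CPU time units per page (the same for every page in the example)

    g1 = {"Home": 100, "Article": 100, "Collection": 100, "Product": 100, "Checkout": 100}
    g2 = {"Home": 0, "Article": 0, "Collection": 100, "Product": 100, "Checkout": 100}

    total_2g1 = sum(2 * visits * cpu_time for visits in g1.values())
    total_g1_g2 = sum((g1[page] + g2[page]) * cpu_time for page in g1)

    print(total_2g1, total_g1_g2)       # 20000 16000
    print(1 - total_g1_g2 / total_2g1)  # 0.2, i.e. a 20% difference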

1.4 Scope of the research

This research focuses on a single server running PHP scripts to generate the page content. This server hosts multiple websites at the same time. The main focus for solving the problem is the use of models. These models could be used for every site on the server individually, but only one site is used consistently in this research. Measurements have been taken on the server for roughly a year and can be used. These measurements contain the CPU load and data of requests made by visitors. These requests mostly originate from the Netherlands, Belgium, Germany and France. Bots are considered the same as regular visitors because they add load to the server in the same way, and they are also drawn in by ad campaigns. The research is entirely focused on the value models could have for a more precise server load prediction.


1.5 Research questions

The challenge we now face is to set up more detailed measurements, investigate the visitor journey and find more refined ways to calculate the impact of visitors.

This comes down to the following research questions:

1. How can models assist in accurately predicting the influence of marketing campaigns on the server load?

(a) What data is needed for such a prediction model?

(b) What model type would be suited?

(c) What tool would be suited for running the model?

2. How accurately can models predict the server load under visitor surges?

3. How can various scenarios such as shifting visitor behavior or initiated marketing campaigns be implemented dynamically into the model?

1.6 Related work

Much research has been done on (web) server performance in the last few decades. [3] and [4] propose queueing models to predict web server performance. [5] describes an SQL query engine that can work relatively fast with gigantic data sets within Google's network. [6] works on performance prediction for large multi-node, multi-core systems used for long-running scientific applications. [7] investigates the effect of running multiple virtual machines on a server. Although interesting and helpful to fathom the problem, they are all focused on a specific system or on overall performance. An overall performance prediction can be really helpful for deciding when a server needs to be upgraded or another server needs to be added. Unfortunately these works use data that is too generalized for our problem: we want to predict the impact of a single website within the web server, and especially the impact of increasing traffic to a certain page. This requires more detail about the load caused by the pages that receive extra traffic and about visitor behavioral patterns for the targeted website.

1.7 Approach

Chapter 2 will identify which measurements are possible and what the bottleneck of the server is. Section 2.1 defines how a web page request works, what can be measured and how to make the measurements easily accessible. Section 2.2 explains how the measurements are processed. Section 2.3 goes into more detail on how to interpret the data and define traces. Section 2.4 identifies the bottleneck of the server, so we know what to focus on. Section 2.5 wraps up the chapter with a short summary.

Chapter 3 will go into more detail on how the model works and what considerations have been made. The considered tools and modeling techniques will also be discussed. In section 3.1 the model requirements are stated. We start off with a concept model created in Uppaal in section 3.2, extend it with PRISM in section 3.3 and finalize it with Modest in section 3.4. Section 3.5 will discuss the model variants used to accommodate a variety of properties and improve model checking performance. The properties themselves will be discussed in section 3.6.

Chapter 4 will discuss how model and data come together. The chapter starts with how the data is filtered in section 4.1 and continues with explaining trace graphs and how they are generated in section 4.2. Section 4.3 explains how the trace graph is optimized. Section 4.4, on the template engine, explains how the trace graph helps generate the model. To use the model for predictions, manual editing is explained in section 4.5. We wrap up by explaining how constants can be used to influence the model in section 4.6.

Chapter 5 will show and evaluate the results from the model output, starting with a comparison of how well the model can mimic reality in section 5.1, followed by section 5.2, which evaluates a prediction of the impact of an actual marketing campaign. The model can also be used for more than we intended in this research; this bonus is discussed in section 5.3. The tool has some limitations, which are discussed in section 5.4.

Chapter 6 concludes the work with a summary, followed by the limitations of the work in section 6.1. The outcome of the research is discussed in section 6.2 by answering the research questions. Future work which could possibly improve prediction accuracy is discussed in section 6.3. The complete work ends with a final word in section 6.4.


2 Web server data

This chapter will identify the measurements that are possible, how to interpret them, and the bottleneck of the server. Section 2.1 explains the measurements that are taken during a request. Section 2.2 gives a short summary of how this data is processed and stored. Section 2.3 explains the criteria for identifying visitors and visits, and how traces can enrich the data. Section 2.4 investigates the bottleneck of the server, so that we know what to base the prediction model on. After this chapter we should have everything needed to create a prediction model.

2.1 Measuring requests

Before we can attempt to create a viable solution to the problem, we need to investigate what data is available or can be made available. At the start of the project a minimum of information was present, which made it hard to calculate a possible load on the servers. There was a limited amount of access logs containing the requested page and the time that page was requested. This was combined with a table of average loads of the server in relation to the number of requests. The assumption here was that every page would put an equal load on the server. This is not enough data for an accurate prediction; a further investigation is needed into which data we can gather.

Section 1.1 explained what happens during a request. The following steps are relevant for us:

• The request reaches the processing queue.
• The PHP processing starts.
• The PHP processing ends.
• The transfer to the client is completed.

Apache allows logging of the entry time into the processing queue and the time it takes until the request is completed, i.e. when the client has confirmed that the last bit of the page has arrived. PHP does the same but does not depend on the connection to the client, so it is done when the page has been processed. Apart from that, PHP also allows logging of the CPU time and memory used to process the page.

Figure 2.1: Timing of a request in ms (queue 0–50, PHP 50–370, transfer 370–400)

For example, if a client requests a page, the following data could be logged¹:

• Apache queues the request at timestamp: 0 ms.
• The time it takes the server to fulfill the request: 400 ms.
• PHP starts processing at timestamp: 50 ms.
• The time it takes PHP to process the page: 320 ms.
• The amount of memory PHP needed: 16 MB.
• The amount of CPU time PHP needed to process the request: 200 ms.

¹ For simplicity of the example, low and rounded numbers are used; the timestamp is normally the Unix Epoch Time, i.e. the time that has passed since January 1st, 1970 at midnight.

This can be translated into the diagram seen in figure 2.1. Subtracting the request timestamp from the PHP timestamp (yellow), we can deduce that 50 ms was spent waiting for the request to start being processed. This could be because the server was busy, a new PHP thread had to be started, Apache had to process some rules first, or various other reasons. Adding the 320 ms processing time (green) brings us to the 370 ms marker, leaving the last 30 ms to finish the transfer to the client. Depending on the PHP script, the transfer could start during processing of the page or after PHP is done.
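As a small illustration of how these logged values relate to the diagram, the sketch below derives the waiting, processing and transfer components from the rounded example numbers above (variable names are ours):

    # Derive the timing components of figure 2.1 from the logged example values.
    queue_entry = 0      # ms, Apache queues the request
    total_request = 400  # ms, until the client confirmed the last byte
    php_start = 50       # ms, PHP starts processing
    php_duration = 320   # ms, PHP processing time

    queue_wait = php_start - queue_entry     # 50 ms spent waiting in the queue
    php_end = php_start + php_duration       # 370 ms, PHP is done
    transfer_tail = total_request - php_end  # 30 ms to finish the transfer

    print(queue_wait, php_end, transfer_tail)  # 50 370 30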

2.2 Processing the measurements

PHP and Apache log everything in plain text, one line per event. Although such a line of text only takes a few hundred bytes of space, over 2 million of them are generated per day. Apart from that there is also a script that logs the CPU activity, which adds another million lines of plain text a day. To prevent the disk from filling up with log files, they are compressed and deleted after a few days. For the purpose of this research a backup of these files has been stored in another location where they can be processed. It is possible to simply process all the files to get all the data we want, such as the visitor rate or the request timing.

However, if halfway through the project another variable is needed, all the files would have to be processed again. This would be very time consuming considering the vast amount of log files. We have written a program to combine all these logs into a database, which makes the measurements easily accessible. The program processes the log files one by one and executes the following steps:

• The log file is uncompressed.
• The data is read line by line and stored in a database.
  – Every line contains multiple columns of data.
  – Every column is stored separately in the database.
• The log file is compressed again.

After a year there are over a billion lines of logs. We do not have enough disk space to store them all uncompressed; that is why they are kept compressed and only a few of them are processed at a time. They also have to be read line by line because of memory limitations. The data does fit into a database because there is a lot of duplicate and unnecessary data in the logs. For example, if a page is visited a million times in one year, the complete URL of the page is logged a million times into the log files. In the database, the URL is stored once and a million references are created to it. The same goes for visitors, websites, etcetera. An example of unnecessary data is a log line that completely writes out all the stats of the CPU per core, while we just need 9 numbers: the CPU time of each core and the total. Although the database grows to several GB in size, it stays small enough to contain all the data.

As partly explained above, the other big advantage of a database is how easily it can be queried for data. If we need another variable halfway through the project, the database can be queried and returns it within minutes or even seconds; processing the log files again would take hours. A query is also easier to formulate than adjusting a program to gather another variable. It is even possible to combine data: a request log entry and a CPU log entry, for example, both have a timestamp. We could write a query combining the number of requests and the CPU usage per hour, matching the data by timestamp.
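A query of the kind described could look roughly like the sketch below. The table and column names (requests, cpu_stats, total_cpu_ms) are illustrative assumptions rather than the actual schema, and SQLite is used only to keep the example self-contained.

    import sqlite3

    # Hypothetical schema: a `requests` table and a `cpu_stats` table that both
    # carry a Unix timestamp, matched here per hour.
    conn = sqlite3.connect("measurements.db")

    query = """
    SELECT req.hour, req.request_count, cpu.avg_cpu_ms
    FROM (SELECT strftime('%Y-%m-%d %H:00', timestamp, 'unixepoch') AS hour,
                 COUNT(*) AS request_count
          FROM requests GROUP BY hour) AS req
    JOIN (SELECT strftime('%Y-%m-%d %H:00', timestamp, 'unixepoch') AS hour,
                 AVG(total_cpu_ms) AS avg_cpu_ms
          FROM cpu_stats GROUP BY hour) AS cpu
      ON cpu.hour = req.hour
    ORDER BY req.hour;
    """

    for hour, request_count, avg_cpu_ms in conn.execute(query):
        print(hour, request_count, avg_cpu_ms)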

2.3 Interpreting

Taking a close look at the measurements reveals extra information we can use. The order of requests can reveal the path a visitor takes through a website, and the timestamps of requests reveal how much time passes between them. Together this gives a good insight into how websites are used. A good interpretation of the available information can assist in creating a prediction model for server loads. This section takes a closer look at some of the more important aspects.

2.3.1 Identifying Visitors

To create a detailed model based on behavioral visitor patterns we first have to identify the visitor itself. A visitor does not have to be a person per se. A visitor could as well be a bot following links. The load on the server will be the same for both.

A unique visitor is determined by hashing the IP address and the User-Agent together. The User-Agent contains various pieces of information about the visitor, the device and the browser. All of the following conditions have to hold for requests to be considered to come from the same unique visitor:

• The IP addresses have to be the same.
• The requests have to originate from the same device type.
• The requests have to be made from the same type of browser.
• The browser settings have to be the same.
• Device and browser have to have the same updates/versions.

A household or office often shares the same IP address, and it is possible that two visitors in the same household or office have the same kind of device, browser and browser updates/versions. This would mean two visitors share the same IP and User-Agent. However, the additional chance of them being on the same website at the same time makes such a collision very unlikely.
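A minimal sketch of this identification step is shown below. The thesis only states that IP and User-Agent are hashed together, so the particular hash function and the input format are assumptions.

    import hashlib

    def visitor_id(ip: str, user_agent: str) -> str:
        # Hash IP and User-Agent together into a pseudonymous visitor id.
        return hashlib.sha256(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()

    # Two requests with the same IP and User-Agent map to the same visitor.
    a = visitor_id("192.0.2.10", "Mozilla/5.0 (X11; Linux x86_64) Firefox/85.0")
    b = visitor_id("192.0.2.10", "Mozilla/5.0 (X11; Linux x86_64) Firefox/85.0")
    assert a == b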

2.3.2 Visitor trace

A trace, as discussed in section 1.1, is a chronological list of requests during a single visit. Traces can be queried from the database using the requirements listed below (a small sketch of this grouping follows the list). Since we have the exact time of each request stored, we can deduce the time between requests from it. A trace helps us understand how visitors navigate through websites and how long they wait before requesting another page.

• A single visit is done by one unique visitor.
• A visit consists of one or more requests.
• Requests cannot be more than 30 minutes apart.
• Requests must be to the same website.
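A minimal sketch of this grouping, under the assumption that requests are available as (visitor, site, timestamp, page) tuples rather than in the actual database layout:

    from collections import defaultdict

    SESSION_GAP = 30 * 60  # seconds; requests further apart start a new visit

    def build_traces(requests):
        # Group (visitor_id, site, unix_timestamp, page) tuples into traces.
        per_visitor = defaultdict(list)
        for visitor, site, ts, page in sorted(requests, key=lambda r: r[2]):
            per_visitor[(visitor, site)].append((ts, page))

        traces = []
        for (_visitor, _site), entries in per_visitor.items():
            current = [entries[0]]
            for prev, cur in zip(entries, entries[1:]):
                if cur[0] - prev[0] > SESSION_GAP:
                    traces.append([page for _, page in current])
                    current = []
                current.append(cur)
            traces.append([page for _, page in current])
        return traces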

2.3.3 Behavioral visitor patterns

Some pages cost more of the server's resources to process than others, so when predicting the impact it is good to know the probability that a certain page is requested. Also, when interest in a certain page increases, the ripple effect on other pages should be considered. With the help of many traces and simple statistics we can determine:

• On which page visitors enter/leave the website.
• How many visitors visit the website in a certain time span.
• The probability that a certain page is requested after the current one.
• How long visitors stay on a page.
• The probability that the current page is the last requested page.

Putting this all together gives us a behavioral visitor pattern. Each website has its own patterns, and different time windows can also show different patterns; think, for example, of cafe websites where visitors look for breakfast in the morning and dinner in the evening.

2.3.4 Visitor entering rate and leaving percentages

We can determine the number of visitors that start browsing a website (the entering rate) by counting how many traces start within a certain time span. The last page seen in a trace is the page where the visitor left the website. We can calculate the leaving percentage per page by counting how often that page was the last one. If a page occurs as the last page half of the time, and the other half of the time appears first or in the middle of a trace, then its leaving percentage is 50%.
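As a small illustration, the leaving percentage per page can be computed from a set of traces as follows (the trace format matches the previous sketch; the numbers are made up):

    from collections import Counter

    def leaving_percentages(traces):
        # Per page: the fraction of its occurrences that were the last page of a trace.
        occurrences, last_occurrences = Counter(), Counter()
        for trace in traces:
            occurrences.update(trace)
            last_occurrences[trace[-1]] += 1
        return {page: last_occurrences[page] / occurrences[page] for page in occurrences}

    traces = [["Home", "Article"], ["Home"], ["Article", "Home"], ["Home", "Article"]]
    print(leaving_percentages(traces))  # {'Home': 0.5, 'Article': 0.666...}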

2.4 Bottleneck of the server

The next step is to find the bottleneck of the server. There are four main points to a server that could slow it down:

• Network usage
• Disk usage
• Memory usage
• CPU usage

If any of these is exhausted, we speak of an overloaded server, i.e. requests are delayed. Determining the bottleneck is an important step in this research because it sets the focus for the prediction. For the calculations in this section we use the data from the server's request logs, containing the requests of over 150 sites over a time span of several weeks. The following sections discuss the above points in detail.

2.4.1 Network usage

The network usage or bandwidth determines how many bytes can be transferred per time unit. The server has a limited amount of bandwidth available, meaning only a certain number of visitors can make use of it at a time; how many depends on the visitors' available bandwidth and the resource size. Cable.co.uk has done several internet speed tests throughout the world [21]. The two countries generating the most visits for the site under test are the Netherlands and Belgium, with 46.18% and 27.19% of the requests respectively. Numbers 3 and 4 combined account for 18% of the requests; to keep things simple, 3 areas are considered: the Netherlands, Belgium and the rest of the world.

We put this and other data in table 2.1, and the results of the calculations are put in this table as well. The values used in this section all come from this table and the related formulas. When a value from the table is used, its description is set in bold.

Average visitor bandwidth = (Bandwidth_NL × Requests_NL) + (Bandwidth_BE × Requests_BE) + (Bandwidth_rest × Requests_rest)
                          = (95.60 × 46.18%) + (66.49 × 27.19%) + (24.32 × 26.63%) = 68.70 Mbps    (2.1)

Maximum resources per second = Server bandwidth / Average resource size = 1 Gbps / 318,439.47 bits = 3140.31 resources/s    (2.2)

                                     Value              Origin          Assumption
Average bandwidth Netherlands        95.60 Mbps         Cable.co.uk
Average bandwidth Belgium            66.49 Mbps         Cable.co.uk
Average bandwidth of the rest        24.32 Mbps         Cable.co.uk
Requests from the Netherlands        46.18%             Server log
Requests from Belgium                27.19%             Server log
Requests from the rest               26.63%             Server log
Average visitor bandwidth            68.70 Mbps         Formula 2.1     No packet/HTTP headers, mobile visitors, errors or restrictions after the router
Server bandwidth                     1 Gbps             Hardware limit
Average resource size                318,439.47 bits    Server log
Maximum resources per second         3140.31            Formula 2.2     Clients with infinite bandwidth
Maximum concurrent connections       512                Configuration
Minimum bandwidth per connection     1.95 Mbps          Formula 2.3     Maximum number of connections in use
Resources per week                   6,634,450          Server log
Average time between resources       0.091 s            Formula 2.4     None (this is the actual average over 7 days)
Bits downloaded per week             1,845,598,049,040  Server log
Average used bandwidth               3.05 Mbps          Formula 2.5

Table 2.1: Summary of calculations and data

Minimum bandwidth per connection = Server bandwidth / Maximum concurrent connections = 1 Gbps / 512 = 1.95 Mbps    (2.3)

Average time between resources = Time window / Resources per week = 7 days / 6,634,450 = 0.091 s    (2.4)

Average used bandwidth = Bits downloaded per week / Time window = 1,845,598,049,040 bits / 7 days = 3.05 Mbps    (2.5)

Let us start by investigating a possible network bottleneck. We assume that there is no overhead such as lost packets, packet headers or bad connections for the transfers. [21] also mentions that the speed is only measured up to the household router, which means that delays due to slow wireless connections and such are disregarded as well.

The average resource size is 318,439.47 bits²; with a server bandwidth of 1 Gbps this means it is possible to serve on average 3140.31 resources per second, assuming an infinitely fast internet connection on the client side. A resource can be, for example, an HTML page, an image, a video, a PDF or anything else downloadable. Considering the average visitor bandwidth, assuming an infinitely large resource to be downloaded and every client demanding the highest possible download speed, it is still possible to serve 1 Gbps / 68.70 Mbps ≈ 14.56 clients at once.

Considering that on average a resource is requested only every 0.091 seconds, with an average used bandwidth of 3.05 Mbps, this is nowhere near being a problem. The server is limited to a maximum of 512 concurrent connections. Assuming all clients get an equal share of the server's bandwidth and one connection at a time, this results in an acceptable minimum bandwidth of 1.95 Mbps per connection. A visitor would then still be able to receive an average resource within a decent time.

² Often file and page sizes are given in bytes; note that in this case the size is in bits.
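For readers who want to re-check these numbers, the sketch below redoes the calculations of formulas 2.1–2.3 with the values from table 2.1 (the variable names are ours):

    # Rough re-calculation of the network figures in table 2.1.
    MBPS = 1e6
    bandwidth = {"NL": 95.60 * MBPS, "BE": 66.49 * MBPS, "rest": 24.32 * MBPS}
    share = {"NL": 0.4618, "BE": 0.2719, "rest": 0.2663}

    avg_visitor_bandwidth = sum(bandwidth[k] * share[k] for k in share)   # ~68.7 Mbps (2.1)
    server_bandwidth = 1e9                                                # 1 Gbps
    avg_resource_bits = 318_439.47

    max_resources_per_s = server_bandwidth / avg_resource_bits            # ~3140 (2.2)
    min_bw_per_connection = server_bandwidth / 512                        # ~1.95 Mbps (2.3)
    max_full_speed_clients = server_bandwidth / avg_visitor_bandwidth     # ~14.56 clients

    print(round(avg_visitor_bandwidth / MBPS, 2), round(max_resources_per_s, 2),
          round(min_bw_per_connection / MBPS, 2), round(max_full_speed_clients, 2))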

2.4.2 Disk usage

In the past, disk usage posed a problem and an upgrade was made to NVMe SSDs. These are disks without moving parts, directly connected to the PCIe bus. This means there is a very fast disk equipped with a very direct and fast link to the motherboard, making disk reads of over 1 GBps possible. This is 8 times faster³ than the 1 Gbps network connection. A lot of websites also load a template from disk into memory and use this template multiple times while building a page, for example a category page featuring multiple blogs with a certain styling, or a webshop using the template for each product on a category page. This could mean that a template of 1 KB is loaded from disk once and used 100 times in memory for a page with 100 products, creating a page or network load of 100 KB. This is a simplified example, because in reality templates are used within other templates to various depths, but it does give a good representation of how a small file can quickly grow after leaving the disk.

³ A byte consists of 8 bits, which means the 1 Gbps network connection is at least 8 times slower than the 1 GBps disk reading speed.

2.4.3 Memory usage

The next possible bottleneck is memory consumption. The server has dedicated parts of its memory to various tasks, such as the database or processing PHP scripts. The only thing to keep an eye on is the memory that is left for serving resources. With 16 GB of RAM dedicated to that, an average resource fits well over 431,000 times in memory (Formula 2.6). Even assuming a client bandwidth nearing 0 and every client downloading the biggest resource found in the database, it is still possible to serve 685 clients at the same time (Formula 2.7). In practice we would not even reach that many because of the limit of 512 concurrent connections. This means memory is not an issue either.

Average resources in memory = Available memory / Average resource size = 16 GB / 318,439.47 bits = 431,601.50    (2.6)

Biggest resource in memory = Available memory / Biggest resource size = 16 GB / 200,359,944 bits = 685.96    (2.7)

Although perhaps not as relevant, it is nevertheless interesting that, having all the data at hand, it is easy to generate a plot showing a possible relation between server traffic and memory usage. Figure 2.2 has multiple plots showing the relation between memory usage and various aspects of requests. The title of each plot shows the comparison and, in parentheses, the Pearson and Spearman correlation coefficients [11] respectively. The Pearson correlation coefficient states the linear relation between two variables, whereas Spearman states a monotonic relationship between two variables. Both indicate a total positive relation by the value 1 and a total negative relation by -1; negative means here that the variables mirror each other, so if one increases the other decreases and vice versa. A value of 0 indicates that no relation between the two variables was found at all. The Spearman correlation coefficient is added because it handles spikes in the data better. Looking at the plots and correlation coefficients we see some very faint relations, but nothing to go on. Memory is not the bottleneck and no relation can be found between memory and the requests, which means we cannot use this data for our model. This is a good point to not dwell any longer on memory usage.
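Both coefficients are straightforward to compute; the sketch below does so on made-up hourly series using SciPy, which is an assumption since the thesis does not state which tool was used:

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    # Hypothetical hourly series: requests per hour and average request memory (bytes).
    rng = np.random.default_rng(0)
    requests_per_hour = rng.integers(20_000, 80_000, size=100)
    avg_memory_bytes = rng.normal(5e6, 5e5, size=100)  # essentially unrelated to requests

    pearson_r, _ = pearsonr(requests_per_hour, avg_memory_bytes)
    spearman_r, _ = spearmanr(requests_per_hour, avg_memory_bytes)
    print(round(pearson_r, 2), round(spearman_r, 2))  # both close to 0: no relation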

2.4.4 CPU usage

Last but not least, the CPU. Intuitively it was already suspected that this would be the bottleneck, because during several DDoS attacks in the past it could be seen that network, memory⁴ and disk usage were still within limits but the CPU load had values from 40 up to 80. This means 40 to 80 cores would be needed to process the demand on the system at that moment; having only 8 cores, this comes down to an overload of 5-10 times per core. Let us put this into a more concrete example by taking the number of bits downloaded in one day and comparing it to the CPU time used that same day. If we calculate how much time it would take to upload that amount of bits and scale the CPU time accordingly, we can see how many times the CPU would be overloaded. In this calculation we assume that 50% of the network traffic is overhead, to keep things simple again.

In one day the server uploaded 31,891,247,864 B of data and needed 206,362,080 ms of CPU time to process it. Formula 2.8 shows us that it would take about 510 seconds to upload all the data of one day consecutively. If we give the CPU the same 510 seconds to do 206,362,080 ms of work, this comes down to an overload of roughly 51 times (Formula 2.9) per core.

Minimum time consecutive uploading = Transferred data in one day / (Bandwidth × 0.5) = 31,891,247,864 B / (1 Gbps × 0.5) = 510.26 s    (2.8)

Overload = Total CPU time used / (Time available × Cores) = 206,362,080 ms / (510.26 s × 8) = 50.55    (2.9)

Where figure 2.2 shows only a vague relation between requests and memory usage, figure 2.3 shows a much clearer relation between various request aspects (green) and the CPU load (red). Again the title of each plot shows the comparison and, in parentheses, the Pearson and Spearman correlation coefficients respectively; all plots compare against the CPU load. Figure 2.3 (a) shows there is a faint relation to the number of requests the server handles per hour, which is logical since more requests simply means more work to process. Figure 2.3 (b) shows there is barely any relation to the time requests take to be handled; as discussed earlier, this time mostly depends on the quality and bandwidth of the visitor's connection. It gets interesting when comparing the (PHP) scripts to the CPU load. Figure 2.3 (c) shows a close relation to the number of requested scripts processed per hour. This can not only be seen from the plot: both correlation coefficients also indicate a strong relation of 0.8 between the two.

⁴ It could be useful to note that no swap space was used, since swapping puts a serious strain on the CPU as well.
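The overload estimate of formulas 2.8 and 2.9 can be re-run with a few lines of code (the numbers come from the text above; the code is only an illustration):

    # Re-run the overload estimate of formulas 2.8 and 2.9.
    uploaded_bytes = 31_891_247_864   # bytes uploaded in one day
    cpu_time_ms = 206_362_080         # CPU time spent that day, in ms
    bandwidth_bps = 1e9               # 1 Gbps
    cores = 8

    # Formula 2.8: minimum time to upload one day of data consecutively,
    # assuming 50% of the traffic is overhead.
    upload_seconds = uploaded_bytes * 8 / (bandwidth_bps * 0.5)   # ~510.26 s

    # Formula 2.9: overload factor per core if the CPU got only that much time.
    overload = (cpu_time_ms / 1000) / (upload_seconds * cores)    # ~50.5

    print(round(upload_seconds, 2), round(overload, 2))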

Figure 2.2: Average request memory per hour (red) compared with various request aspects (green): (a) requests per hour (cc -0.04/-0.08); (b) request time per hour (cc 0.04/-0.06); (c) script requests per hour (cc -0.16/-0.09); (d) script execution time per hour (cc 0.11/0.16).

Figure 2.3: Server CPU load per hour (red) compared with various request aspects (green): (a) requests per hour (cc 0.25/0.36); (b) request time per hour (cc -0.03/0.13); (c) script requests per hour (cc 0.80/0.79); (d) script execution time per hour (cc 0.82/0.88).

This can only be surpassed by the even closer relation with the script execution time, with coefficients of 0.82 and 0.88. The difference between plot (c) and plot (d) is mainly the extra dimension of time: where plot (c) assumes all scripts take equal time to execute, plot (d) considers the actual time spent processing. This gives a bit more detail, hence the closer relation. We now know that the bottleneck of the server is the CPU and that there is a close relation between the CPU load and the execution time of scripts.

2.5 Wrapping up

This chapter has shown that several points of a request are measured and that the measurements are stored in a database for easy use. It has also shown how to interpret the data so that we can find traces and combine those to map behavioral visitor patterns. With the help of these patterns we can determine:

• The visitor entering rate.
• The first visited page.
• Leaving percentages per page.
• Probability distributions per page on which page will be next.
• Average visiting time per page before a next page is requested.

The next chapter will use this information to create a prediction model.


3 Model framework

This chapter will gradually evolve a base model from a simple concept. The considered modeling techniques and tools will be discussed along the way. The final base model will have variants that can each be used for different performance predictions. When investigating modeling tools, they were selected on usability, whether they are still maintained, user friendliness and how suited they would be for our purposes. We start by defining the model requirements in section 3.1.

Next are the tools and the modeling techniques used. Section 3.2 explains Timed Automata (TA) with Uppaal, section 3.3 explains Continuous-Time Markov Chains (CTMC) with PRISM and section 3.4 explains Markov Automata (MA) with Modest. Section 3.5 will discuss the base model variants that are needed for the various predictions. These predictions are expressed in the form of Modest properties and discussed in further detail in section 3.6.

3.1 Requirements

We want a certain level of detail within our model to be able to make an accurate prediction, as well as the possibility to make certain modifications. If we want to increase, for example, the traffic to a certain page, that page must be in the model in the first place in order to modify it. To make our expectations more concrete, it is good to list the requirements for our model. We expect our model to be able to:

• Define multiple pages.
  – Attribute processing time to these pages.
  – Attribute "waiting" time between pages.
  – Set up the relations between pages (which page could be visited next after the current one).
  – Attribute the possible next pages with a probability of being next.
• Define a rate at which new visitors appear.
• Keep track of the number of visitors waiting to be serviced. (This way we know if the server is overloaded.)


3.1.1 Serialized processing of parallel requests

For our model we consider available and required processing time. The available processing time is the actual time multiplied by the number of processor cores. This means that a single-core server has 1 second of processing time available every second, while a 10-core server has 10 seconds of processing time available every second. The required processing time is the time it takes a core to process a request. If we want to relate this to the actual time, we have to scale everything accordingly: we divide both the available and the required processing time by the number of cores. Now we can directly relate the number of processed requests to the actual time. Let us clarify this with an example. Assume a processor with 10 cores and a single-core processor which is 10 times as fast. With 10 requests taking 1 second each to complete, the 10-core processor will be busy for 1 second, because it processes the 10 requests in parallel. The 10-times-as-fast single-core processor will process the 10 requests sequentially; this takes 1/10 of a second each, totalling 1 second of processing time.

With fewer requests than cores this equivalence does not hold. The single-core processor will, for example, complete one request 10 times as fast as the 10-core processor, because only one core can be used per request. Having so few requests also means that the server is over-dimensioned for the work, and a clearly over-dimensioned server is not the scenario we are interested in.

We will model requests in a serialized way, which has the advantage that the model can be less complex. This is not only beneficial for understanding and tuning the model but also for the modeling tool's performance. If we process just one request at a time and store the other requests in a queue, we only have to do bookkeeping on that one request. When processing multiple requests at a time, we would have to keep track of how far along each request is, requiring extra memory and bookkeeping from the model checker.
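The equivalence behind this serialization can be checked with a tiny sketch using the hypothetical numbers from the example above:

    # Serialization argument: N cores working in parallel versus one "virtual"
    # core that is N times as fast working through the same requests sequentially.
    cores = 10
    requests = 10
    cpu_time_per_request = 1.0  # seconds of single-core work per request

    parallel_wall_time = cpu_time_per_request * requests / cores
    serialized_wall_time = sum(cpu_time_per_request / cores for _ in range(requests))

    assert abs(parallel_wall_time - serialized_wall_time) < 1e-9  # both are 1 s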

3.2 Uppaal and Timed Automata

We start with a simple concept model (section 3.2.1) made in Uppaal [12]. This concept is extended in section 3.2.2 by introducing a choice between pages. Section 3.2.3 will explain some of the problems with Uppaal / Timed Automata and a possible solution.

3.2.1 Concept model in Uppaal

Uppaal allows the user to build a model with the aid of a drag-and-drop interface, although some coding is still needed; the amount depends on the complexity of the model. Uppaal uses Timed Automata (TA), which are (as the name suggests) automata attributed with time. TA work with clocks, which can be reset and associated with guards. Guards enforce that certain conditions are satisfied before a transition can be taken. Clocks cannot be set to a certain value or (temporarily) stored into another variable. Clocks are initialized with zero when the system is started, and all clocks increase synchronously with time. These properties are described in more detail in [15], [16] and [17].

Figure 3.1: Concept model in Uppaal

Figure 3.1 shows our concept model in Uppaal containing 3 processes. pVisit represents visitors arriving at the queue (q). pTrace represents a visitor trace which will be processed per visitor. pDummy will assist in forcing the first transition in pTrace when there are visitors waiting in the queue.

Let us take a deeper look into this model and into Uppaal. pVisit has 2 states, where the top double lined state is the initial state (the process starts here) and the bottom state marked with a U is an urgent state. Urgent means that after entering this state a next transition has to be chosen directly, without time passing. The green text indicates guards, which means that the conditions specified there have to be met to reach the next state. This guard states that c (a clock) has to be 100 to pass (c==100). TA have to take a transition within a lower and upper bound; since we specified both by comparing to an exact value, the transition from top to bottom has to be taken every 100 time units. ([18] goes more in depth on this.) The blue text indicates updates; in this case q (a queue) is increased by one, indicating a visitor has made a request. The bottom state is urgent and has to decide on another transition straight away, and since there is only one transition it has to take the one back to the top state. Here another update is done: the clock c is reset to 0 so we can count back up to 100 again and repeat the cycle.

pTrace depicts a trace of a visitor. Start is the initial state; here we see a guard (green) checking if there is anything in the queue and an update/reset on clock c (blue). The new thing here is a channel (cyan) named go, which we can see in pDummy as well. The go! gives a signal to pDummy via go? that the transition should be taken; both have to synchronize on the channel. The trick here is that the channel go is made urgent, which means it has to be taken directly.

pDummy is just there to make sure there is something to synchronize with, allowing the channel and thus the transition to be urgent. So when there is something in the queue, it has to be processed right away. Now we enter the Home state, where we see a transition containing c>5, which means at least 5 time units have to pass to get past this state, since the clock was reset (c:=0) in the previous transition. This means it takes 5 time units to process this page and 0 to infinity time units until the visitor decides to click the next page (Article), which works in the same way but with 10 time units. Here we see that the queue (q) is decreased by 1, since the visitor is done and not returning. Since the state End is urgent again, the whole process repeats when q>0.

3.2.2 Extended Uppaal model

Of course not every visitor will visit the same pages, so let us extend our concept model a bit. We now offer a choice to visit an article or the contact page after the home page. Figure 3.2 implements this by adding a Decision state. The clock will be reset at Home as in the previous example, so we can enforce a processing time again. This processing time will be different per page, as can be seen in the model as well. This means the request can take longer depending on the chosen page. For example, the path from Start to End via Article would take 15 (5 + 10) to infinite time units to travel. This could take till infinity because we do not have an upper bound. Via Contact it would take 20 (5 + 15) to infinite time units to travel to End. q is now decreased after the End state instead of after the page. This way we only have to declare it once instead of after each last page.

Figure 3.2: Extended Uppaal model


3.2.3 SMC extension

TA are nondeterministic/non-probabilistic, which means that decisions are arbitrary. In our extended example (figure 3.2) there is no preference to go to the state Article or the state Contact. In reality it could be, for example, that 1/10 of the visitors is interested in reading the article, while the other 9/10 of the visitors just visited the website to get the contact details. If the chances for both pages were equal, the average server impact would be 12.5 ((10 + 15)/2) time units of processing time. But in reality it would have been 14.5 ((10 ∗ 0.1) + (15 ∗ 0.9)) time units.
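
Written out as an expected value (our own notation; p_A and p_C are not symbols used in this text), with probability p_A for Article (10 time units) and p_C for Contact (15 time units):

\[
\mathbb{E}[T] = p_A \cdot 10 + p_C \cdot 15,
\]

so a uniform choice gives 0.5 ∗ 10 + 0.5 ∗ 15 = 12.5 time units, while the observed behavior gives 0.1 ∗ 10 + 0.9 ∗ 15 = 14.5 time units.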

For an accurate representation of reality we would need a probabilistic model, i.e. a model that does prefer one transition over the other. Another problem is the nondeterminism on the time between states: since we do not have a real upper bound, we might have to wait till infinity for a state transition. Uppaal supports multiple extensions for extra functionality. The SMC extension, for example, makes it possible to attribute transitions with probabilities. An attempt has also been made at introducing rates for transitions; these rates help in defining upper and lower bounds for state transitions [23]. This extension is even integrated in the newer development packages of Uppaal. However, the Uppaal model checker does not use the SMC extension [19] [20], which means that it will not be possible to attribute probabilities and rates to transitions. Since we want to use the model checker with probabilities, we need to start looking for another tool.

3.3 PRISM and Continuous-Time Markov Chains

Section 3.3.1 will describe the implementation of the Uppaal model in PRISM. Section 3.3.2 will extend this model with more freedom in page requests: initial pages and their next pages can be defined from the start, instead of an almost static trace. The section will end with a description of the problems encountered.

3.3.1 Uppaal model in PRISM

PRISM [13] has a graphical user interface with a code highlighting text editor, a simulator and an interface for creating properties. Models are expected to be fully coded. The model checker allows result values to be shown in a graph. PRISM supports, amongst others, Continuous-Time Markov Chains (CTMC) [24] [25] [26] for its models. CTMC can be used to create models where the transitions are attributed with rates. Imagine one state with a transition looping back to itself. Attributing this transition with a rate of 10 would mean that on average this transition is traveled every 1/10 of a time unit. This could be extended by integrating a probability into this as well. When there is a probability of 50% that this transition will be taken and 50% that the system stays in the current state, the rate could be halved, which comes down to a rate of 5. So originally we traveled the transition 10 times every time unit, and with 50% probability only 5 times per time unit.
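
Put as a formula (our own summary; λ and p are not notation used in this text): a transition attributed with rate λ fires on average every 1/λ time units, and restricting it to a branch that is taken with probability p gives an effective rate of p · λ:

\[
\mathbb{E}[\text{time between firings}] = \frac{1}{\lambda},
\qquad
\lambda_{\text{effective}} = p \cdot \lambda,
\qquad \text{e.g. } 0.5 \cdot 10 = 5.
\]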


1   ctmc
2
3   const qMax = 100;
4   rate request_rate = 1;
5
6   module request
7     r : [0..1] init 0;
8
9     [request] true -> request_rate : (r' = r);
10  endmodule

Figure 3.3: Top of PRISM model

Our extended Uppaal model will be translated to a PRISM model. Where Uppaal has processes, PRISM defines these as modules. The queue (q) from Uppaal has been added as a module in PRISM called queue. The dummy process is no longer needed and has been removed. This leaves us again with 3 main parts or modules in the model.

Module request simulates new requests reaching the server, module page processes the requests, and module queue keeps track of the number of pending requests. Module request will increase the queue and module page will decrease it. The new model will be explained in more detail via multiple code fragments, as PRISM does not give a graphical representation of the model.

The complete model can be found in appendix A. The first line of code in figure 3.3 defines the model type (CTMC). After that qMax (the maximum queue size) and request_rate (the rate at which visitors request pages) are defined. PRISM works with modules, which can be seen as the parallel running automata in the Uppaal examples. Module request has a state variable r which can be 0 or 1, where only state 0 is used. On line 9 we see 4 things of interest:

• [request]: which means another module can synchronize on the label request.

• true: which is a guard as in Uppaal which has to be passed to continue.

– There could be a comparison here, but in this case the guard is always passed.

• request_rate: which is the rate at which this transition is taken.

• (r' = r): which is the transition from state r to the same state r.

– This would be (r' = 1) when the next state should be 1.

The module queue (figure 3.4) keeps track of the incoming and processed requests. This works by synchronizing with the above-mentioned transition from the module request on the likewise named action request. This synchronization can be seen in lines 4 and 5, where in line 4 the maximum queue size has already been reached, while in line 5 there is still room and the queue is increased by 1. Line 6 decreases the queue on the action done, when a request is processed.
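
As an illustration, a queue module consistent with this description could look roughly as follows. This is only a sketch and not the literal code of figure 3.4 (the exact model is given in appendix A); the guards, the rates of 1 and the comments are our own.

module queue
  q : [0..qMax] init 0;

  // queue already full: still synchronize on request, but drop the new request
  [request] q = qMax -> 1 : (q' = q);
  // still room in the queue: enqueue the incoming request
  [request] q < qMax -> 1 : (q' = q + 1);
  // a request has been processed: remove it from the queue
  [done] q > 0 -> 1 : (q' = q - 1);
endmodule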

The module page (figure 3.5) has 4 states:
