
Bachelor Informatica

Reducing the energy consumption and costs of the WLCG by predicting job CPU efficiency

Ruben L. Janssen

June 19, 2014

Supervisor(s): Dr. Paola Grosso (University of Amsterdam) and Dr. Peter Elmer (Princeton University)

Informatica, Universiteit van Amsterdam


Abstract

A large part of the costs of running the computing grid of the LHC comes from its energy consumption. With the expected increase of computational power in the near future, the energy efficiency of the grid has to be improved in order to keep energy costs affordable. In this thesis, we present a prediction model which estimates the characteristics of a to-be-executed job. Using this information, we propose and evaluate schedulers that try to reduce energy consumption, improve performance and lower energy costs. In addition, we have investigated the impact of several parameters in order to get an overview of which factors play an important role in energy efficient scheduling. From our experiments, we conclude that job characteristics vary greatly over time and that prediction models should therefore focus mainly on the most recently executed jobs when making predictions. Using a large weight on the most recently executed jobs, we estimate that the WLCG could reduce the energy consumption of its processors by 6.74% to 7.07%, depending on which scheduler is selected. Similarly, the performance could be improved by 2.95% to 3.44%. In addition, our results indicate that scheduling based on energy price could reduce the energy costs by up to 3.92%.


Contents

1 Introduction
  1.1 Worldwide LHC Computing Grid
  1.2 Chapter Outline
2 Related Work and Background
3 WLCG Data
  3.1 Sites used for experiments
  3.2 Data
    3.2.1 Description of Used Data
    3.2.2 Failed Jobs
4 WLCG Data Analysis
  4.1 CPU Efficiency Distribution
  4.2 Making and Using Predictions
5 Simulation
  5.1 Implementation
  5.2 Processor Set
  5.3 Parameters and Design Choices
6 Improving the WLCG efficiency
  6.1 The Synergy Between Performance and Energy Efficiency
  6.2 Experimental Setup
    6.2.1 Definitions
      6.2.1.1 Defining an Energy Efficient Processor
      6.2.1.2 Calculating the Energy Costs
      6.2.1.3 Defining a High Performance Processor
      6.2.1.4 Measuring the Performance
  6.3 Parameter Settings
    6.3.1 Experiments
      6.3.1.1 Job weighting
      6.3.1.2 Ignoring failed jobs
    6.3.2 Results
      6.3.2.1 Job weighting
      6.3.2.2 Ignoring failed jobs
  6.4 Results
7 Reducing Energy Costs
  7.1 Experiment
    7.1.1 Price Difference Range
      7.1.1.1 Price Difference Range Results
  7.2 Results
8 Discussion
  8.1 Evaluation of the Simulation
  8.2 Evaluation of Results
9 Conclusions and Future Work


CHAPTER 1

Introduction

With the slowing performance gains of Moore's Law over the last decade, different solutions have to be investigated to improve the performance of processors. This slowdown cannot be compensated for by simply scaling up the hardware of large projects, due to the cost of that hardware. Next to the cost of the hardware itself, the energy costs of running the hardware have become a large part of the total bill [10] in recent years. In addition to energy provisioning costs, cooling costs are also proportional to the energy consumed. In order to prevent extreme increases of energy costs, energy efficiency has to be improved at a rate similar to the increase in computing performance [12].

As a result of their enormous energy consumption, the energy costs of large scale projects such as the Worldwide LHC Computing Grid (WLCG) are extremely high. In this thesis, we examine the possibilities of improving the energy efficiency of large scale projects, and of the WLCG in particular. Prior research has shown that different processor architectures such as ARM are interesting options to improve energy efficiency [6]. It is likely that the WLCG will move to heterogeneous sites that consist of different types of hardware [7]. However, not only do the different types of hardware differ, multiple generations of processors of the same type may differ as well. The main focus of this thesis lies on optimisations that exploit the heterogeneous aspects of the grid. To do so, the main research question is the following:

Can scheduling using the heterogeneity of the architecture of the WLCG lead to an increase of energy efficiency?

To support this question and explore different possibilities of improving the energy efficiency, we pose the following sub-questions:

• How can predictions based on computing patterns be used to improve the energy efficiency of heterogeneous architectures, and is this improvement significant?

• Is it worthwhile to transfer workload to data centres located where energy costs are low?

Answering these research questions should give a clear view of the possibilities and effectiveness of different solutions. The goal is to propose suggestions that would benefit the energy efficiency of the WLCG, while not obstructing the performance of the grid.

1.1

Worldwide LHC Computing Grid

The WLCG project is a global collaboration to distribute, store and analyse data that is generated by the Large Hadron Collider (LHC) at the European Organization for Nuclear Research (CERN) in Switzerland. The WLCG is used by over 8 000 physicists spread across the world, who use its computational power to analyse data for their research.

Figure 1.1: WLCG Tier Structure (2013)

The WLCG consists of four different tiers, two of which are illustrated in figure 1.1.

• Tier 0 is the CERN Computer Centre, which is responsible for the first pass reconstruction after the data is recorded from the LHC. In addition, it distributes the reconstructed data to the Tier 1 sites.

• At the 12 Tier 1 sites, large-scale reprocessing and analysis is done. In addition, a proportional amount of data is permanently stored at each site and is accessible to Tier 2 sites around the clock.

• Currently, there are 140 Tier 2 sites spread across the globe. They are typically research institutes and universities, capable of storing small portions of data for specific analysis used for small-scale simulation and reconstruction tasks.

• The individual scientists access the Tier 2 facilities through local computing resources, also referred to as Tier 3. These local computing resources are often local clusters at a University, or even an individual PC.


Altogether, the grid consists of about 355 000 cores and has about 200 petabytes of storage. The WLCG processes over 2 million jobs a day, consuming on the order of 10 megawatts [5] of power. This enormous energy consumption results in high energy bills, which cannot keep scaling with the performance of the grid. To ensure energy consumption that remains affordable while the grid's performance grows, we need to optimise the WLCG to significantly reduce its energy consumption.

Currently, every site has its own way of managing the execution of jobs. For instance, some schedule jobs based on the number of hyper-threads, while others do not. As there is no consistency among the sites, some sites are better optimised and more efficient than others. By developing an energy efficient and well performing scheduling algorithm, an enormous increase in efficiency could be achieved if such a method were used at all sites of the WLCG. As most sites of the WLCG are Tier 2, we focus on improving the energy efficiency of the Tier 2 sites in this thesis.

1.2

Chapter Outline

In this thesis, we explore the possibilities of predicting job characteristics. Knowing the characteristics in advance, we can base scheduling algorithms on this information to optimally utilize the resources of the grid. Chapter 2 covers the current state of research in this field and what has led to this work. In chapter 3, we discuss the source of the data used for the experiments in this thesis. In addition, we give a description of the data and the choices we made on removing corrupt data. Afterwards, we analyse the initial data in chapter 4. Here, we look into the composition of the jobs that are executed in the WLCG, as well as making predictions based on this composition. In order to test our hypothesis, we developed a simulation [4] to estimate the difference in performance and energy consumption, which is discussed in chapter 5. Chapter 6 focuses on the experiments done with the provided data and the aforementioned simulation in order to evaluate the impact of the scheduling. Chapter 7 extends the scheduling algorithm to multiple sites in order to lower the costs. Finally, we discuss the results and draw conclusions.


CHAPTER 2

Related Work and Background

This thesis is highly influenced by a paper written by Abdurachmanov et al. [7], in which the need for and possibilities of improving the energy efficiency and performance of the WLCG are stated. According to Abdurachmanov et al., the compute needs of the WLCG could increase by a factor of 10^3 to 10^4 in the upcoming years, which is significantly more than Moore's Law provides. Keeping up with this need by relying only on Moore's Law is impossible without extreme funding for both hardware and power costs. However, Abdurachmanov et al. provide three possible ways to improve the performance while retaining affordable energy costs:

• Transition to multi-core aware software applications: By allowing applications to utilize multiple cores, opportunity is created for optimizations of processor use at the application level with respect to the current ”application per core” model.

• Processor Technology: In addition to conventional CPUs, alternative architectures are also of interest to incorporate in the WLCG in the form of coprocessors (such as GPUs). These architectures typically have improved energy efficiency and are often specialised in specific types of computation.

• Data Federations: Via the data federation that is currently being deployed, an application in center A can access a file in center B. This allows for additional scheduling options and greatly reduces storage requirements.

With the increased likelihood that the WLCG will consist of multiple types of hardware, research into various options such as ARM processors [8] has been conducted. The growing research effort in energy efficient computing for mobile devices has led to an enormous development of energy efficient architectures. With these improvements, their use is no longer limited to mobile devices, but might extend to PCs as well as supercomputers [8]. As the ARM architecture is leading in the mobile device industry, research has been done to investigate the possibility of using ARM for scientific computing. Although its performance is lower than that of the compared Intel architectures, the performance per watt is much higher [8] in most cases [9], as shown in table 2.

Type | Cores | Power (TDP) | Events/minute/core | Events/minute/Watt
Exynos4412 Prime @ 1.704 GHz | 4 | 4 W | 1.14 | 1.14
dual Xeon L5520 @ 2.27 GHz | 2 x 4 | 120 W | 3.50 | 0.23
dual Xeon E5-2630L @ 2.0 GHz | 2 x 6 | 120 W | 3.33 | 0.33

Table 2: Performance and performance per watt of an ARM processor compared with two Intel Xeon processors.


The table shows that the performance of the Intel architectures is roughly three times that of the ARM architecture (Exynos4412), while the performance per watt is about four times as high for the ARM architecture. From these statistics we can conclude that the ARM architecture offers a significant reduction in energy consumption, although its performance is lower. Given that different hardware architectures are of interest for scientific computing and have proven to be useful in such scenarios, it is plausible that different types of architectures will be incorporated in large computers in the future. By using different types of hardware, the set of processors of the supercomputer becomes heterogeneous, which provides the opportunity to schedule jobs differently in order to maximize the utility of each processor type.

However, not only do different types of architectures differ, different generations of the same architecture are not exactly the same either. As no previous research has considered scheduling based on the heterogeneity of the processor set, and especially not based on the heterogeneity within the same type, it is very interesting to explore the possibilities of improving the energy efficiency and/or performance of supercomputers by scheduling based on processor heterogeneity.

In addition to improving efficiency by scheduling based on the heterogeneity of processors, the computing model of the WLCG is now evolving in different ways to improve its performance [7]. For instance, High Energy Physics (HEP) applications are in the process of transitioning into multi-core aware software applications. With respect to the current "application per core" model, this provides a significant opportunity to improve the energy efficiency and performance of the WLCG.


CHAPTER 3

WLCG Data

In this chapter, we discuss the data that we use for our experiments. The data consist of job logs from several sites, whose execution we replay in our simulation. We discuss the exact content of these job logs and their flaws.

3.1

Sites used for experiments

For all the analysis and simulations in this thesis, job logs from a set of 7 different Tier 2 sites located in the USA are used. The sites are the following:

1. T2 US Caltech
2. T2 US Florida
3. T2 US MIT
4. T2 US Nebraska
5. T2 US Purdue
6. T2 US UCSD
7. T2 US Wisconsin

The sites differ greatly regarding their hardware, size, software and settings. Some sites are larger with more up-to-date hardware, while others use hardware up to eight years old. The settings regarding queueing and scheduling also differ greatly. For instance, some sites schedule jobs based on the number of hyper-threads, while others only schedule based on the number of processors. In our experiments, the simulation takes the different job sets into account, but uses the same scheduling algorithm and processor set for all sites in order to compare the impact on each job set.

3.2

Data

3.2.1

Description of Used Data

For our experiments, we used job logs from all seven sites. With a simple curl call, we can retrieve the job logs between given times at a specific site. The jobs in the logs are those which finished between the two timestamps, which means that jobs started days before the starting timestamp are also included. Table 3.1 shows the information a job log contains. Entries that also have a description are used in our simulation.


Information | Description
WINIp |
VOName |
WNHostName |
WrapCPU | The number of seconds the CPU has been utilized for this job.
NEventsProcessed |
WrapWC | The total time in seconds that the job was running.
STATE |
FinishedTimeStamp |
DURATION | Duration of the job in seconds.
GridName | The name of the author of the job. For production jobs, this is always Cmspilotjob. On the contrary, analysis jobs are initiated by researchers themselves.
TaskName |
StartedRunningTimeStamp |
TYPE |
SchedulerJobId |
JOBTYPE | Indicates the type of job: production, analysis, relval and reprocessing. As relval and reprocessing are almost only executed at Tier 1 sites, they form a very small portion (rarely higher than 0.05%) of these job logs, because the logs are from Tier 2 sites. Therefore, we ignore these types and focus on production and analysis jobs.

Table 3.1: Information contained in the job logs. A description is only given for the information that is used in the simulation.

In this thesis, we only consider jobs of two types: analysis and production. The other types of jobs are mostly executed at Tier 1 sites and form only about 0.05% of the jobs executed at Tier 2 sites. The production jobs are simulation jobs that are normally run by a team of people for the entire experiment. The objective of these simulation jobs is to produce simulated (Monte Carlo) data that looks like the kind of data that comes from the actual CMS experiment.

On the contrary, the analysis jobs are initiated by physicists themselves. The analysis jobs are applications written by the researchers themselves and are most often used for Monte Carlo simulation. In addition, the analysis type jobs also have a minimum duration of 10 minutes. This minimum duration is set to prevent ’black hole’ nodes from causing too much damage. These ’black hole’ nodes happen when an error occurs that causes the application to crash, while the node is still functional enough to fetch a new job from the queue. Once a new job is assigned, the node crashes immediately due to the error and as a result, the queue is flushed.

3.2.2

Failed Jobs

A significant portion of the executed jobs fail during their execution. Some jobs fail and crash, while others execute no work and terminate after some time. In the dataset, we have identified at least three different types of errors:

• Failed jobs: the log contains the status of an executed job. Jobs that are not successfully terminated, for whatever reason, are marked as failed. Jobs with the failed status are therefore not included in the dataset.

• Zero CPU time: according to the log, the CPU was used for 0 seconds during the execution of that job. This means that the job has effectively done no work and is very likely to be an error.

• Wrong starting time stamp: by default, the starting time stamp of a job is 1970-01-01. Jobs with this starting time stamp are also removed from the dataset, as some error must have occurred since it is impossible for a job to have started in 1970.

These errors occur in the job logs of the sites. However, some of these errors might not have occurred at all, but may be errors in the logging itself.

Even though these errors may reduce the effectiveness of the improvements proposed in this thesis, they form a significant portion of the data and contribute substantially to any analysis of the data. To maintain a realistic dataset for the simulation and tests, we have chosen not to eliminate the failed jobs and corrupted data. The failed jobs do not interfere with the mechanics of the simulation, while they do provide a more realistic environment.


CHAPTER 4

WLCG Data Analysis

Before doing experiments and simulations on the dataset, we analyse the actual data in order to get an overview of its exact characteristics. In this chapter, we look into making predictions regarding a job's CPU efficiency. The CPU efficiency is defined as the percentage of the time a job occupied a processor during which the processor was actually working. Based on the estimated CPU efficiency of a job, the scheduler could improve the energy efficiency of the site and the grid.
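The exact formula is not spelled out in the text, but the job logs described in chapter 3 contain both the CPU time (WrapCPU) and the wall-clock time (WrapWC) of a job, so a minimal sketch of the metric, assuming it is derived from those two fields, is:

```python
def cpu_efficiency(wrap_cpu: float, wrap_wc: float) -> float:
    """CPU efficiency: fraction of the occupied wall-clock time spent actually computing."""
    if wrap_wc <= 0:
        return 0.0  # guard against corrupt log entries
    return min(wrap_cpu / wrap_wc, 1.0)

# Example: a job that used 2700 CPU seconds during a 3600 second run is 75% efficient
print(cpu_efficiency(2700.0, 3600.0))  # 0.75
```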

4.1

CPU Efficiency Distribution

For energy consumption research purposes, it is very useful to know how much power specific jobs consume. In order to estimate the power used for running a job, the CPU efficiency of the job has to be known in addition to the time that the job was running. Because we only look at production and analysis jobs in this thesis, only two graphs of the CPU efficiency distribution are provided. The following graphs are generated from the job log of MIT over a one month period.

(a) CPU efficiency distribution with an average of 75.23%

(b) CPU efficiency distribution with an average of 73.97%

Figure 4.1: The CPU efficiency distribution of production and analysis jobs

These distributions are representative of most sites; the larger part of the jobs executed at a site have a CPU efficiency of 90% or above. The peak at 0% CPU efficiency is caused by jobs that fail and do not use the CPU. In addition, the distribution of the production jobs is always skewed towards a higher CPU efficiency compared with the analysis jobs. As a result, the distribution of the analysis jobs is relatively more even and therefore more heterogeneous.


From the perspective of energy efficiency, the heterogeneity of the distributions may be used to maximize the utilization of the hardware. This can be achieved by scheduling the CPU intensive jobs on the better performing processors, while scheduling the less CPU intensive jobs on the relatively worse performing processors, in order to utilize the power of the better processors as much as possible.

4.2

Making and Using Predictions

By making predictions about the characteristics of a job, scheduling algorithms can improve the performance and the energy efficiency of the WLCG. Although it is impossible to know these characteristics in advance, it is possible to make an estimate. Each job is initiated by an author, and these authors are most often researchers working on a specific project for a long time. The applications they run are developed by the researchers themselves and are executed many times with different parameters. As a result, jobs initiated by the same author should be very similar. By looking at the job history of a given author, it is therefore possible to make predictions regarding the CPU efficiency and the duration of a specific job. Unfortunately, production jobs are all initiated by the same author, so all production jobs have the same estimate in our prediction model. However, as the CPU efficiency of production jobs is almost always 95% or higher if the job does not fail, as shown in figure 4.1, the CPU efficiency estimate will still be useful.

When filtering the CPU efficiency distribution on specific authors, very clear job characteristics become apparent. Three examples of the CPU efficiency distribution of specific authors at the Caltech site over a one month period are shown in figure 4.2.


(a) Author 1 with an average of 94.21% (b) Author 2 with an average of 50.07%

(c) Author 3 with an average of 30.78%

Figure 4.2: The CPU efficiency distribution of jobs initiated by three different authors

We can see a distinct peak in the CPU efficiency distribution in each graph. This means that for that specific author, most jobs have a CPU efficiency in the range of that peak. Given this, we can estimate the CPU efficiency of a job in advance based on these graphs, because in most cases the CPU efficiency will be in the range of the peak. To get a better picture of the actual accuracy of these predictions, we can calculate the standard deviation. For all analysis jobs, the standard deviation at all sites ranged from 0.21 to 0.30, while the average standard deviation per author ranged from 0.08 to 0.12. The decreased standard deviation means that it is possible to make predictions for each author that are more accurate than predictions for the whole set of jobs.
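A sketch of this comparison, assuming the parsed job logs are available as dictionaries with the author (GridName) and a precomputed CPU efficiency per job; the field names and example values are illustrative:

```python
from collections import defaultdict
from statistics import mean, pstdev

def efficiency_spread(jobs):
    """Compare the overall spread of CPU efficiency with the average per-author spread."""
    overall = pstdev([j["efficiency"] for j in jobs])

    by_author = defaultdict(list)
    for job in jobs:
        by_author[job["GridName"]].append(job["efficiency"])

    per_author = [pstdev(effs) for effs in by_author.values() if len(effs) > 1]
    return overall, mean(per_author)

jobs = [{"GridName": "author1", "efficiency": e} for e in (0.93, 0.95, 0.94)] + \
       [{"GridName": "author2", "efficiency": e} for e in (0.30, 0.55, 0.42)]
overall_std, author_std = efficiency_spread(jobs)
print(overall_std > author_std)  # True: per-author predictions are tighter than global ones
```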


CHAPTER 5

Simulation

In order to examine the possibilities of improving the energy efficiency of the WLCG, we built a simulation to test the performance of the various algorithms. The simulation runs a given set of jobs on a set of processors. Each processor and job has its own characteristics, and the simulation mainly focuses on the total performance and energy consumption of the sites. In this chapter, we discuss the implementation of the simulation, its design choices and its parameters.

5.1

Implementation

An overview of how the program works is given in figure 5.1. In order to create a representative simulation, the simulation makes use of a processor pool and a job pool. The processor pool is the set of processors that are able to run a job. Once a processor becomes available, the scheduling algorithm finds the best job in the job pool for that specific processor. Based on the predictions of the characteristics of a given job, the algorithm decides which job is the best available. We have implemented this as a Python program in order to keep it high level and keep the simulation as simple as possible.

Figure 5.1: Overview of the simulation

Once a processor is freed, the information of the finished job is used to update the prediction for jobs from the same author. Then, the scheduler selects a new job using the predicted characteristics of the jobs. After doing so, the job is executed and this cycle repeats until all jobs are processed.

In order to represent a realistic situation, the processors should be occupied with a job once the simulation starts. Therefore, the processors in the processor pool are initiated with jobs from the job pool in a FIFO manner, in order to make sure the initial state of the site is consistent. In addition to the starting state of the processors, we also have to establish the initial author history. To do so, the simulation uses job logs from one month prior to the ones that are run in the simulation. Based on these job logs, the average CPU efficiency of each author is calculated and is set as the estimated CPU efficiency.

Once the processors are assigned a job and the author history is set up, the simulation can begin. The simulation enters a loop that continues until all jobs from the job pool are executed. Given the set of occupied processors, the simulation finds the processor whose job will finish first of all running jobs (note that we have the actual running time from the job logs). As this job finishes first, its processor is freed. The remaining time of the jobs on the other processors is updated with the elapsed time, because they will require that much less time to finish from that point. For the freed processor, a new job is selected from the queue and the loop continues.
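A condensed sketch of this event loop. It assumes each job is a dictionary whose actual runtime is taken from the logs, and it leaves the scheduler (select_job) and the prediction update (update_prediction) abstract; keeping absolute finish times in a heap replaces the explicit bookkeeping of remaining runtimes described above.

```python
import heapq
from itertools import count

def simulate(initial_assignments, job_pool, select_job, update_prediction):
    """Event-driven loop: free the processor whose job finishes first, feed the finished
    job back into the prediction model, then assign that processor a new job."""
    tie = count()  # tie-breaker so the heap never has to compare processor/job objects
    running = [(job["runtime"], next(tie), proc, job) for proc, job in initial_assignments]
    heapq.heapify(running)
    clock = 0.0

    while running:
        clock, _, proc, finished = heapq.heappop(running)
        update_prediction(finished)                  # update the author's history
        if job_pool:
            nxt = select_job(proc, job_pool)         # scheduler picks a job for this processor
            job_pool.remove(nxt)
            heapq.heappush(running, (clock + nxt["runtime"], next(tie), proc, nxt))
    return clock  # simulated time at which the last job finishes
```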

Figure 5.2: Illustration of how a job is selected (jobs grouped per author)

The job selection is based on the ranking of the processors and jobs. In figure 5.2, an example is given with processors ranked by performance and jobs ranked by CPU efficiency. Once a processor is available, a job is found whose rank is proportional to the processor's rank. In the example, a relatively high performance processor is assigned a CPU intensive job, in order to maximize the potential of the processor's performance.
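The proportional matching of figure 5.2 can be sketched as follows: the freed processor's position in the processor ranking is mapped to the same relative position in the queued jobs, ordered by predicted CPU efficiency. The names and the rounding rule are illustrative choices.

```python
def select_proportional(proc_rank, n_procs, queued_jobs, predicted_eff):
    """Pick the job whose predicted CPU efficiency rank matches the processor's rank.
    proc_rank 0 is the best (fastest / most efficient) processor."""
    ordered = sorted(queued_jobs, key=predicted_eff, reverse=True)  # most CPU intensive first
    position = proc_rank / max(n_procs - 1, 1)        # 0.0 .. 1.0 within the processor ranking
    index = round(position * (len(ordered) - 1))      # same relative position among the jobs
    return ordered[index]

# The best of four processors receives the most CPU intensive queued job
jobs = [{"id": 1, "eff": 0.95}, {"id": 2, "eff": 0.60}, {"id": 3, "eff": 0.20}]
print(select_proportional(0, 4, jobs, lambda j: j["eff"])["id"])  # 1
```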

5.2

Processor Set

For the simulation, we used a representative processor set to assign the jobs to, which is shown in table 5.1. The set comprises some of the most common processor types of the seven sites, while retaining a representative distribution of the generation and performance of the processors in the actual Tier 2 sites. As this set is representative of all sites in the WLCG, we expect that the results would be very similar for any other site. However, we expect that the main source of variation in results would be a difference in the heterogeneity of the performance of the processors. A set of processors that vary greatly in performance would benefit more from scheduling than a set of very similar processors.


Processor | HS06 | Cores | Nodes | HS06 per Core | Idle Power (W) | 100% Power (W)
Xeon 5160 3.00GHz | 38.47 | 2 | 35 | 19.235 | 216 | 315
Xeon E5345 2.33GHz | 58.98 | 4 | 21 | 14.745 | 220 | 334
Xeon E5345 2.33GHz | 29.49 | 4 | 4 | 7.3725 | 220 | 334
Xeon L5420 2.50GHz | 70.66 | 4 | 75 | 17.665 | 179 | 279
Xeon L5630 2.53GHz | 94.75 | 4 | 32 | 23.6875 | 60.7 | 138
Xeon L5640 2.26GHz | 142.125 | 6 | 44 | 23.6875 | 120 | 300.7
Xeon E5 2660 2.20GHz | 308.52 | 8 | 30 | 38.565 | 56.1 | 263
Xeon E5 2670 2.60GHz | 263.02 | 8 | 48 | 32.8775 | 99.3 | 337

Table 5.1: Processor information of the processor set used in the experiments. Processor benchmarks are taken from the CMS Tier 2 spreadsheet [2], energy characteristics from SPEC [3].

5.3

Parameters and Design Choices

As the estimates of the job characteristics change over time, we need an accurate method to keep them up-to-date. The easiest way would be to take the mean of the characteristics of all prior jobs, updating it once a new job has executed. However, the weight of the prior jobs would grow over time as more jobs are executed. Due to this increased weight, it would become very difficult to modify the average values, because new jobs would have almost no impact. To prevent the estimates from becoming almost static, we have to give more weight to new jobs so that they have a significant impact on the average value. In the simulation, a static weight of 1% is given to new jobs, making sure that the job characteristic estimate stays up-to-date.
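This update rule amounts to an exponentially weighted moving average in which every new job contributes a fixed fraction to the author's estimate; a minimal sketch with the 1% weight used here (the experiments in section 6.3 later settle on 20%):

```python
def update_estimate(current, observed, new_job_weight=0.01):
    """Blend the CPU efficiency of the newly finished job into the author's estimate."""
    if current is None:
        return observed  # first job seen for this author
    return (1.0 - new_job_weight) * current + new_job_weight * observed

estimate = None
for efficiency in (0.90, 0.92, 0.30):   # one deviating job barely moves a 1%-weighted estimate
    estimate = update_estimate(estimate, efficiency)
print(round(estimate, 3))
```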

Another problem is encountering jobs from an unknown author. As there is no history for the author, it is impossible to make a prediction regarding the characteristics of the job. In order to maximize the scheduling potential, jobs with unfamiliar authors are given priority, because that allows later jobs initiated by that specific author to be scheduled properly.

From the job logs, we can derive the exact starting time of the jobs. Unfortunately, it is impossible to know what the queue consisted of, because we do not have enough information about the jobs. In reality, it is most likely that the size of the queue varies greatly over time, mainly due to the day-night cycle. As it is impossible to replicate the job queue, we made a batch system. The batch system splits the job pool into batches of 2000 jobs, executing and rescheduling only the jobs in the batch before moving to the next batch. Not only does this decrease the queue to a more realistic size, it also prevents starvation.

For some scheduling algorithms, the problem of starvation can occur: jobs might never be selected and therefore never be executed. To prevent this and to ensure a relatively quick response time, the set of jobs is split into batches of 2000 jobs. On average, around 2000 jobs are processed at each site every 8 hours. This means that upon starting a new batch, a job will be running within ±8 hours regardless of the selected algorithm. The scheduler therefore rearranges the execution order of the 2000 jobs within a batch to improve the overall performance and/or energy efficiency.


CHAPTER 6

Improving the WLCG efficiency

In this chapter, we discuss the three scheduling algorithms that we will test and explain why they are of interest. Furthermore, we go into the details of our proposed experiments and define our methods of measurement. Lastly, we show and discuss the results of our experiments.

6.1

The Synergy Between Performance and Energy Efficiency

Until the last few years, the performance of processors has been increasing exponentially according to Moore's Law. In addition to the performance, the energy efficiency has been increasing as well. On average, the computations per kWh have doubled every 1.57 years over the past 60 years [13]. With this steady increase, it is safe to assume that new generations of processors are more energy efficient than previous generations, especially with the increased attention the scientific community has given to energy efficient computing in recent years.

In the simulation used in this thesis, we propose a scheduling algorithm that uses a ranking of processors by energy efficiency to reduce the energy consumption of the WLCG. Based on this ranking, jobs are scheduled, depending on their CPU intensity, to proportionally energy efficient processors. While this improves energy efficiency, it also boosts the performance of the grid. The main reason for this is that the ranking of processors by energy efficiency is very similar to the ranking by performance. In other words, scheduling based on performance is almost the same as scheduling based on energy efficiency, because both schedule jobs based on their CPU intensity.

As we expect that both the energy consumption and the performance will improve, we look at both types of scheduling in the experiments. The difference between the two scheduling types should be minor, and should not exist at all if the rankings of the processors are equal. Because most sites consist of only one type of processor (Intel, for instance), the only difference between their processors is the generation. As a result, the ranking of the processors based on energy efficiency and based on performance is most likely the same, as new generations of hardware are both more energy efficient and higher performing. Sites with multiple types of hardware that have different architectures, such as Intel and ARM, are less likely to show a similar outcome for performance based and energy based scheduling. This is because the ranking of processors based on performance and on energy efficiency tends to differ, since the architectures scale differently. For instance, ARM processors are in most cases more energy efficient, but sacrifice performance.

6.2

Experimental Setup

During the development of the WLCG, the performance of the computation grid was of the utmost importance in order to handle the enormous number of jobs handled every day. For this reason, energy efficiency improvements are only acceptable if the performance of the WLCG does not suffer significantly. Therefore, it is necessary to get a clear picture of the performance of the grid with different scheduling algorithms. To maintain the grid's performance, any future algorithms that focus on energy efficiency should aim for a throughput similar to or better than the normal algorithms. To do so, we provide two different processor rankings: an energy efficiency ranking and a performance ranking. The rankings of the processors are based on the data shown in table 5.1.

However, for both processor rankings, a ranking of jobs is required as well. In these experiments, this ranking is based on the average CPU efficiency of the jobs of the specific author. The reason for choosing CPU efficiency is that, for both processor rankings, the processor with the highest ranking (whether highest performing or most energy efficient) benefits the most if the job is very CPU intensive.

For the experiment, we used three different types of scheduling algorithms:

• First In First Out (FIFO): FIFO makes sure that the first initiated job is executed first. As the order is based on the job logs, which are ordered by chronological finish time, it is very comparable to a random scheduler.

• Energy Efficiency: the Energy Efficiency scheduler aims to schedule CPU intensive jobs on relatively energy efficient processors. By doing so, the total energy consumption is decreased, because the CPU intensive jobs benefit more from the improved energy efficiency than the less CPU intensive jobs.

• CPU Performance: the CPU Performance scheduler aims to schedule jobs to processors with a performance that is proportional to the relative CPU efficiency of the job. By doing so, the total performance is increased, because the CPU intensive jobs benefit more from the improved performance than the less CPU intensive jobs.

6.2.1

Definitions

6.2.1.1 Defining an Energy Efficient Processor

In order to schedule a job on the most energy efficient processor, we need to define what an energy efficient processor is. The easiest way of doing so would be to look at the maximum energy consumption of the processor and schedule the job on the processor with the lowest value. While this is an easy and practical approach, this type of scheduling is far from desirable, because it does not consider how the energy consumption increases when the CPU load increases.

Given that all processors of the grid are occupied 100% of the time, it is possible to discard the idle energy consumption of all processors. Because the processors are always occupied, the total idle energy consumption of all processors is a static value that is not very interesting. This allows us to ignore idle power consumption and normalize all power functions of the processors to 0, as done in figure 6.1.


(a) Energy consumption (b) Increase of energy consumption compared to idle consumption against CPU load

(c) Relative energy consumption by performance normalization

Figure 6.1: Steps of translating processor energy consumption into an energy efficiency ranking (per core)

Based on these energy consumption functions, we can rank the processors by energy efficiency and schedule according to that ranking. The scheduling is done based on the correlated ranking of the energy efficiency of the processor and the CPU efficiency of the job. For instance, a job with a relatively high CPU efficiency is scheduled on an energy efficient processor, while a job with a medium CPU efficiency is scheduled on a moderately energy efficient processor.

However, these functions do not take processor performance into account. To do so, the values of the functions should be multiplied by the difference in performance of the processors. For example, a processor that is twice as slow as another will have increased power consumption for twice as long. Therefore, the actual energy increase is twice as much as normal, because it lasts twice as long. In essence, this represents the energy cost of running a job on a processor. A ranking of processors based on this ratio is therefore a proper way to search for the best processor for the job.
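A sketch of such a ranking using two entries from table 5.1. Splitting node power evenly over the cores and dividing the dynamic (above idle) power by the per-core HS06 performance are our own assumptions that mirror the reasoning above: a slower processor draws its extra power for longer, so it pays more energy per unit of work.

```python
# (cores, HS06 per core, idle W, full-load W) per node, taken from table 5.1
PROCESSORS = {
    "Xeon L5420 2.50GHz": (4, 17.665, 179.0, 279.0),
    "Xeon E5 2660 2.20GHz": (8, 38.565, 56.1, 263.0),
}

def efficiency_score(cores, hs06_per_core, idle_w, full_w):
    """Dynamic power per core divided by per-core performance; lower means more efficient."""
    dynamic_per_core = (full_w - idle_w) / cores
    return dynamic_per_core / hs06_per_core

ranking = sorted(PROCESSORS, key=lambda name: efficiency_score(*PROCESSORS[name]))
print(ranking)  # the newer E5 2660 ranks ahead of the older L5420
```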

6.2.1.2 Calculating the Energy Costs

Once a job is executed, the simulation determines the energy it cost to execute. To do so, the processor running time is multiplied by the energy consumption of the processor at full power. For this calculation, we assume that the processor always runs at 100% CPU load when it is in use. Although this may introduce errors, the main reason for this assumption is that it is impossible to know the actual CPU efficiency from the logs. However, as the increase in energy consumption is very close to linear, this is not very prone to error, because the ratio of energy consumption between two processors remains very similar.
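A minimal sketch of this bookkeeping, under the stated assumption that an occupied processor draws its full-load power for the whole runtime:

```python
def job_energy_joules(runtime_seconds: float, full_load_power_watts: float) -> float:
    """Energy attributed to a job: runtime multiplied by the processor's full-load power."""
    return runtime_seconds * full_load_power_watts

# A 2 hour job on a node drawing 263 W at full load uses about 1.89 MJ (~0.53 kWh)
print(job_energy_joules(2 * 3600, 263.0))
```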

6.2.1.3 Defining a High Performance Processor

For the CPU Performance scheduler, a job is selected based on its CPU efficiency ranking, which is matched to the performance ranking of the freed processor. To do so, the simulation orders the processors by performance. HepSpec06 (HS06) is a benchmark derived from the SPEC2006 benchmark, with an attempt to get the right weighted mix to correspond to HEP applications [1]. By using the HS06 benchmarks, a performance ranking is established per site. The benchmarks of the processors we used for this experiment are shown in table 5.1.

6.2.1.4 Measuring the Performance

To compare the performance of the three scheduling algorithms, we need to define how the performance of an algorithm is measured. As the sites consist of many different processors, we choose to define the performance as the number of seconds a job has occupied a processor, summed over all jobs. By doing so, we count only the exact number of seconds it took to process the job set. This is far more accurate than measuring the time it took to finish all jobs, as that would also include idle time for a large part of the processors while the last jobs are still running.

6.3

Parameter Settings

For the experiment described in section 6.2, we have to set some parameters. As these parameters might have a significant impact on the effectiveness of the proposed scheduling algorithms, we have to find the optimal values of the following parameters for our experiments:

• Job weighting: In order to update the job characteristic estimate, the weighted average of the characteristics of the prior jobs and the new job is taken. Previously, weights of 99% and 1% were used for prior and new jobs respectively. However, changing this weight ratio might improve the accuracy of the job characteristic estimate. A high weight for prior jobs would mean a more precise prediction, while a lower weight would adapt more quickly to changes in job characteristic patterns.

• Ignoring failed jobs: as some jobs fail and therefore influence the predictions, the prediction model might be improved if failed jobs are not taken into account when determining the estimate.

6.3.1

Experiments

6.3.1.1 Job weighting

An interesting range for the job weight is 0.1% to 50%. A weight lower than 0.1% would have no impact on the estimate at all, while a weight higher than 50% would barely take previous jobs into consideration, negating the whole prediction model. In between, we use weights of 0.5%, 1% and every multiple of 5% up to 50%.

For each weight, we determine the increase of performance and energy efficiency in order to compare all weights. As we expect the outcome to be very similar for all sites, we only use the job logs from Caltech over a one month period. Because there is no apparent difference between the sites in how predictable the job characteristics are, we can assume that the outcomes for Caltech will be very similar to those of the other sites.

6.3.1.2 Ignoring failed jobs

For this experiment, we use exactly the same setup as for the job weighting experiment. The only difference is that jobs that fail are not used to update the prediction for a specific author. We chose this possible optimisation because failed jobs are likely to have a very distinct and deviating CPU efficiency. As a result, failed jobs are likely to have a negative impact on the job estimate. We already defined failed jobs in section 3.2.2, and we consider a job to be failed if it matches one of those three characteristics.

6.3.2

Results

6.3.2.1 Job weighting

In this section, we discuss the results of the job weighting experiment. The results of the optimisation are split into two graphs: in figure 6.2, the reduction of the energy consumption is shown for both scheduling algorithms. Similarly, figure 6.3 illustrates the improvement of the performance. The exact details of the graphs are shown in table 6.1.

Figure 6.2: Graph of energy improvement for different weights

Figure 6.2 demonstrates that the improvements of both performance and energy efficiency are very small if the weight of a new job is low. The improvement rapidly increases up to a weight of 10% and then slightly increases until 20%. From there, it remains steady with minimal fluctuation.


Figure 6.3: Graph of performance improvement for different weights

Similarly to figure 6.2, figure 6.3 shows a significant increase up to a job weight of 10%. From there, the improvement remains relatively equal for the other weights.

Weight | Performance Improvement (Performance Scheduling) | Energy Consumption Improvement (Performance Scheduling) | Performance Improvement (Energy Scheduling) | Energy Consumption Improvement (Energy Scheduling)
0.1% | 2.56% | 4.74% | 2.25% | 5.00%
0.5% | 3.27% | 5.91% | 2.69% | 6.00%
1% | 3.39% | 6.47% | 2.90% | 6.79%
5% | 3.84% | 7.44% | 3.31% | 7.82%
10% | 3.99% | 7.76% | 3.42% | 8.17%
15% | 3.94% | 7.77% | 3.36% | 8.07%
20% | 4.07% | 8.03% | 3.48% | 8.34%
25% | 4.05% | 7.95% | 3.54% | 8.53%
30% | 4.00% | 7.89% | 3.44% | 8.31%
35% | 3.97% | 7.74% | 3.41% | 8.27%
40% | 4.00% | 7.90% | 3.50% | 8.39%
45% | 3.98% | 7.84% | 3.40% | 8.22%
50% | 4.02% | 7.92% | 3.45% | 8.30%

Table 6.1: Improvement details of optimisations

These plots demonstrate that the characteristics of jobs fluctuate greatly, because an extremely high weight still provides a reasonably good estimate. The optimal weight lies in the range of 20% or higher, increasing the energy efficiency and performance by up to 8% and 3.5% respectively. Accordingly, we use a job weight of 20% in our main experiment in order to achieve optimal efficiency.


6.3.2.2 Ignoring failed jobs

Here we present and discuss the results of the optimisation experiment described in section 6.3.1.2. In figure 6.4, we see the percentage of energy improvement for each weight when failed jobs are not taken into account when updating a job's estimate. We clearly see a steep increase in improvement up to a job weight of 1%. However, there appears to be no consistency for weights higher than 1%, as the improvement fluctuates greatly afterwards.

Figure 6.4: Graph of performance improvement for different weights without using failed jobs to update the job predictions

Figure 6.5 shows the results of scheduling based on CPU performance, also for different weights and without using failed jobs to update the job estimate. Similarly to figure 6.4, there is a large increase up to a weight of 1%, while there is no consistency for weights higher than 1%. The exact improvements for both graphs are shown in table 6.2.


Figure 6.5: Graph of performance improvement for different weights without using failed jobs to update the job predictions

Weight | Performance Improvement (Performance Scheduling) | Energy Consumption Improvement (Performance Scheduling) | Performance Improvement (Energy Scheduling) | Energy Consumption Improvement (Energy Scheduling)
0.1% | 2.07% | 3.69% | 1.18% | 3.97%
0.5% | 2.45% | 4.52% | 2.03% | 4.65%
1% | 2.51% | 4.65% | 2.20% | 5.03%
5% | 2.54% | 4.69% | 2.24% | 5.04%
10% | 2.60% | 4.72% | 2.22% | 5.04%
15% | 2.36% | 4.37% | 1.96% | 4.62%
20% | 2.56% | 4.75% | 2.18% | 4.97%
25% | 2.56% | 4.75% | 2.27% | 5.00%
30% | 2.38% | 4.42% | 2.05% | 4.76%
35% | 2.33% | 4.34% | 2.05% | 4.76%
40% | 2.45% | 4.48% | 2.16% | 4.94%
45% | 2.38% | 4.38% | 2.05% | 4.70%
50% | 2.58% | 4.74% | 2.24% | 5.07%

Table 6.2: Improvement details of optimisations

Even though failed jobs are very likely to have deviating CPU efficiencies and therefore might, in theory, have a negative impact on the job prediction model, this is clearly not the case in practice. While the energy efficiency improvement of normal scheduling is in the range of 7.5% to 8.5%, the energy consumption is only reduced by about 4.8% when failed jobs are not taken into account. Similarly, the performance gain decreases from the range of 3.5% to 4%, to only 2.0% to 2.5%. Because of this reduction in improvement, it is plausible that the failed jobs are beneficial to the prediction of job characteristics. Therefore, we use all jobs to update the job predictions in our main experiment.

6.4

Results

In this section, we discuss the results of the experiment with the parameters determined in the previous section. The results of the energy consumption experiment are shown in figure 6.6, with more specific details in table 6.3. Similarly, figure 6.7 illustrates the total number of seconds the cores were occupied. The exact details of this graph are shown in table 6.4.

Figure 6.6: Comparison of energy consumption by each algorithm for all sites

Figure 6.6 demonstrates that both scheduling based on CPU performance and scheduling based on energy efficiency significantly reduce energy consumption, with an improvement ranging from 4.15% to 6.98%. However, the energy efficiency improvement is on average slightly better when scheduling specifically for it, by 0.20%.

Site | FIFO (joule) | Performance Scheduling (joule) | Performance Scheduling Improvement | Energy Scheduling (joule) | Energy Scheduling Improvement
Caltech | 3.15E+11 | 2.92E+11 | 8.03% | 2.91E+11 | 8.34%
Florida | 3.35E+11 | 3.19E+11 | 5.13% | 3.17E+11 | 5.57%
MIT | 3.79E+11 | 3.50E+11 | 8.36% | 3.48E+11 | 8.74%
Nebraska | 4.76E+11 | 4.45E+11 | 7.09% | 4.44E+11 | 7.25%
Purdue | 5.89E+11 | 5.49E+11 | 8.06% | 5.42E+11 | 8.59%
UCSD | 1.99E+11 | 1.89E+11 | 4.93% | 1.89E+11 | 5.15%
Wisconsin | 4.69E+11 | 4.45E+11 | 5.56% | 4.44E+11 | 5.84%
Average | | | 6.74% | | 7.07%

Table 6.3: Energy consumption per site for each scheduling algorithm


Figure 6.7: Comparison of performance by each algorithm for all sites

Similarly to figure 6.6, figure 6.7 also shows a clear improvement for both scheduling algorithms compared to the FIFO strategy. The difference between the two proposed algorithms is about twice that in table 6.3, yet still very small.

Site | FIFO (seconds) | CPU Scheduling (seconds) | CPU Scheduling Improvement | Energy Scheduling (seconds) | Energy Scheduling Improvement
Caltech | 4.73E+09 | 4.54E+09 | 4.07% | 4.57E+09 | 3.48%
Florida | 4.99E+09 | 4.86E+09 | 2.63% | 4.88E+09 | 2.26%
MIT | 5.72E+09 | 5.49E+09 | 4.13% | 5.52E+09 | 3.58%
Nebraska | 7.10E+09 | 6.85E+09 | 3.64% | 6.89E+09 | 3.04%
Purdue | 9.07E+09 | 8.72E+09 | 4.01% | 8.76E+09 | 3.50%
UCSD | 2.93E+09 | 2.85E+09 | 2.65% | 2.87E+09 | 2.22%
Wisconsin | 6.85E+09 | 6.67E+09 | 2.96% | 6.68E+09 | 2.56%
Average | | | 3.44% | | 2.95%

Table 6.4: Total processor occupation time per site for each scheduling algorithm


CHAPTER 7

Reducing Energy Costs

In addition to improving energy efficiency, we also investigated reducing the energy costs of the WLCG. This can be achieved not only by reducing the energy consumption with the methods previously proposed, but also by making use of the difference in energy price between multiple locations. We test three different scheduling options on two fictional sites to compare the results:

• First In First Out (FIFO): FIFO makes sure that the first initiated job is executed first. As the order is based on the job logs, which are ordered by chronological finish time, it is very comparable to a random scheduler.

• Proportional Difference: the Proportional Difference scheduler tries to select jobs based on the energy price difference between the two sites.

Figure 7.1: Selection procedure overview of proportional difference scheduling

To do so, the energy price of the site of the freed processor is subtracted from the energy price of the other site. This results in a price difference; CPU intensive jobs are assigned to the processor if this difference is a relatively high positive value. Similarly, less CPU intensive jobs are scheduled if the energy price of the site of the freed processor is high, which results in a negative value.

For instance, if the energy price is much lower at the site of the freed processor, a CPU intensive job is selected, as shown in figure 7.1. However, if the energy price is high, a job with a low CPU intensity is selected. When the price is the same at both sites, a job with an average CPU intensity is executed.


• Current Cost: the Current Cost scheduler incorporates the current energy price of the sites into the processor ranking. The energy consumption of the processors at each site is multiplied by the site's energy price, creating a new ranking of processors. According to this ranking, the corresponding job is selected in the same way as shown in figure 5.2. A sketch of both price-aware selection rules is given below.
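A sketch of both price-aware selection rules. The clamping of the price difference, the linear mapping onto the job ranking and the use of dynamic (above idle) power in the Current Cost ranking are illustrative choices; the 0.005 dollar scale matches the range chosen in section 7.1.1.

```python
def pick_by_price_difference(other_price, own_price, queued_jobs, predicted_eff, scale=0.005):
    """Proportional Difference: a relatively cheap site (positive difference) receives the
    most CPU intensive jobs, an expensive site the least CPU intensive ones."""
    diff = max(-scale, min(scale, other_price - own_price))  # clamp to the price scale
    position = (diff + scale) / (2 * scale)                  # 0.0 (expensive) .. 1.0 (cheap)
    ordered = sorted(queued_jobs, key=predicted_eff)         # least CPU intensive first
    return ordered[round(position * (len(ordered) - 1))]

def rank_by_current_cost(processors, price_per_site):
    """Current Cost: re-rank processors by dynamic power multiplied by their site's price."""
    return sorted(processors, key=lambda p: (p["full_w"] - p["idle_w"]) * price_per_site[p["site"]])

jobs = [{"id": 1, "eff": 0.2}, {"id": 2, "eff": 0.6}, {"id": 3, "eff": 0.9}]
# The freed processor's site is 0.004 $/kWh cheaper, so the most CPU intensive job is picked
print(pick_by_price_difference(0.054, 0.050, jobs, lambda j: j["eff"])["id"])  # 3

procs = [{"site": "A", "idle_w": 179.0, "full_w": 279.0},
         {"site": "B", "idle_w": 56.1, "full_w": 263.0}]
print([p["site"] for p in rank_by_current_cost(procs, {"A": 0.05, "B": 0.06})])  # cheapest to run first
```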

7.1

Experiment

To simulate and predict the potential energy cost reduction, we use two identical sites with different energy prices. In addition, both sites have access to the same job pool.

Figure 7.2: Energy price per hour over a 1500 hour period at Princeton.

For our experiment, we make use of a dataset of the energy costs at Princeton, of which figure 7.2 shows a portion. The dataset is a log of the energy consumption and costs per hour, over multiple years, of the Princeton cluster. For each hour, we can calculate the energy price, and the algorithms schedule according to that. As the energy price only changes every hour, we do not interpolate it to a per-minute granularity, but calculate the energy costs per hour.

As we have only one dataset of energy prices but multiple sites, we have chosen to set a time difference between the two sites. In addition to practical reasons, this also represents the energy price fluctuation over the course of the day. To implement this in our simulation, we have set a time difference of 12 hours. With a 12 hour shift, the energy price difference will be very close to maximal, because the energy price fluctuates greatly as a result of day and night.
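A sketch of the hourly price lookup with the 12 hour offset for the second site, assuming the Princeton dataset is reduced to a plain list of prices per hour; the numbers are made up for illustration:

```python
def site_price(hourly_prices, sim_hour, offset_hours=0):
    """Energy price for a site at a given simulated hour; the second site reads the
    same series shifted by 12 hours to mimic a location in an opposite time zone."""
    return hourly_prices[(sim_hour + offset_hours) % len(hourly_prices)]

# Two days of synthetic prices: more expensive between 08:00 and 20:00
prices = [0.052 if 8 <= h % 24 < 20 else 0.048 for h in range(48)]
print(site_price(prices, 10), site_price(prices, 10, offset_hours=12))  # day vs. night price
```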

7.1.1

Price Difference Range

For the Proportional Difference scheduler, we need a scale to express the difference between the energy prices of the two sites. Ideally, this scale would range up to the maximum difference between the energy prices. However, as the price differs greatly over time, it is very difficult to set a maximum difference. On a daily basis, the maximum difference according to our data [11] is about 0.005 dollar per kWh. In order to find the optimal scale size, we run the Proportional Difference scheduler for multiple scales ranging from 0.00125 to 0.015. As the job sets are very similar and the results should be similar, we do this for only one job set.


7.1.1.1 Price Difference Range Results

Figure 7.3: Comparison of energy costs for different scales, running jobs from a one month period at Caltech.

In figure 7.3 the energy costs are shown for different scales. From this graph, it is very clear that the difference is insignificant, because the total price difference is less than 4 dollars over a one month period. As the results show no significant improvement at any scale, we use a scale of -0.005 to 0.005 dollar difference for our other experiments.

7.2

Results

Here we present the results of our experiment on the job logs of a one month period. Figure 7.4 shows the plot of the results on all different job sets. In table 7.1 the exact details of the plot are given.


Figure 7.4: Comparison of energy costs by each algorithm for all sites

From figure 7.4 we can clearly see that Proportional Difference scheduling does not reduce the energy costs. This also explains why the scale sizes tested in section 7.1.1.1 have no impact on the costs at all: the algorithm itself has no impact.

However, figure 7.4 also clearly shows a significant reduction in energy costs for Current Cost scheduling, ranging from 2.80% to 5.54% for the different job sets.

Site | FIFO (dollar) | Proportional Difference Scheduling (dollar) | Proportional Difference Improvement | Current Cost Scheduling (dollar) | Current Cost Improvement
Caltech | 2196.11 | 2195.31 | 0.04% | 2080.90 | 5.54%
Florida | 2390.26 | 2389.41 | 0.04% | 2315.11 | 3.25%
MIT | 2682.98 | 2688.42 | -0.20% | 2565.85 | 4.57%
Nebraska | 3508.92 | 3512.12 | -0.09% | 3393.11 | 3.41%
Purdue | 4265.40 | 4270.06 | -0.11% | 4070.84 | 4.78%
UCSD | 1276.82 | 1279.18 | -0.18% | 1242.05 | 2.80%
Wisconsin | 3311.29 | 3317.11 | -0.18% | 3212.40 | 3.08%
Average | | | -0.10% | | 3.92%

Table 7.1: Energy costs per site for each scheduling algorithm


CHAPTER 8

Discussion

8.1

Evaluation of the Simulation

When using a simulation to investigate the impact of certain factors, parameters or algorithms, the main question that arises is how the results should be interpreted. We have made some assumptions that might influence the outcome of the experiments, and we should analyse their impact before drawing any conclusions from the results.

The most important assumption is that the energy consumption of a CPU increases linearly with its load. Even though this might not always be exactly the case, in reality the increase is very close to linear. Similarly, the statistics of the processor set we used might also not be very accurate. Although we chose a representative set containing processors commonly used at different sites, it deviates from each actual site, as all sites differ, and this may have an impact on our test results. These assumptions could have a significant effect on the absolute numbers. However, they have a similar impact on all experiments, making the relative results still realistic. For instance, the actual power consumption of a site might be twice as high because the site contains more processors, but that would still mean that an energy consumption reduction of about 7.0% could be achieved. In addition, the only thing that matters for the scheduler is the ranking of processors and jobs. As the scheduler schedules based on this ranking, only the relative performance and energy consumption matter, rather than the processors' exact characteristics.

Another issue that might impact our results is that we implemented a batch system to process the queue, rather than a realistic dynamic queue. Unfortunately, it is impossible to create a queue whose size changes dynamically with the number of initiated jobs, because the logs only contain jobs that have already been executed. By using a static queue, the scheduler may in some cases have had a different number of scheduling options and could therefore perform differently than it would with a dynamic queue. As a result, our results might be slightly off, but this should not make a significant difference, because the number of scheduling options should not differ too much: the scheduling options of a dynamic queue can also be limited in order to prevent starvation. Although there are some inaccuracies in our experiments, these have little to no impact on our conclusions, as they do not influence the relative improvements of our proposed algorithms.

8.2

Evaluation of Results

The energy consumption experiment, described in chapter 6, calculates the performance and energy efficiency improvement when using the CPU scheduling and the energy scheduling algorithm, compared to standard FIFO scheduling. From both figure 6.7 and figure 6.6, it becomes very clear that scheduling based on CPU performance and scheduling based on energy consumption both significantly improve the performance and energy consumption of a site. An overview of this improvement is given in table 8.1.

                                           CPU Scheduling    Energy Scheduling
Average Energy Consumption Improvement     6.74%             7.07%
Average Performance Improvement            3.44%             2.95%

Table 8.1: Average energy and performance improvement compared to FIFO scheduling

Even though it is clear that FIFO scheduling is far from optimal, it is less obvious which of the other two algorithms performs best. In general, the best scheduling algorithm depends on the priority of the site administrator, whether that is energy efficiency or performance. However, as the difference in energy improvement between CPU and energy scheduling is smaller than the difference in performance improvement, the combined gain is larger for CPU scheduling: 10.18%, versus 10.02% for energy scheduling. In addition, the sites of the WLCG prioritise performance, which makes CPU scheduling the best choice in most cases. The small difference between CPU and energy scheduling is in line with our expectations, as it indicates that the processor rankings produced by both algorithms are very similar. The main reason is that the ranking of processors by performance and by energy efficiency is nearly identical, because both factors have increased together, as mentioned in chapter 6. Therefore, the most energy efficient processor is in general also the best performing one.
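For clarity, the combined gains quoted here are simply the sums of the two averages from table 8.1:

    \[ 6.74\% + 3.44\% = 10.18\% \quad \text{(CPU scheduling)} \]
    \[ 7.07\% + 2.95\% = 10.02\% \quad \text{(energy scheduling)} \]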

                                 Proportional Difference Scheduling    Current Cost Scheduling
Average Energy Cost Reduction    -0.10%                                3.92%

Table 8.2: Average energy cost reduction compared to FIFO scheduling

In addition to efficiency, we also looked at reducing energy costs. Table 8.2 makes it very clear that current cost scheduling has a positive impact on the energy costs, while proportional difference scheduling does not. However, the energy cost reduction is only 3.92% compared to normal FIFO scheduling. As this is less than the energy saved by scheduling for energy efficiency in our previous experiment, scheduling according to energy price alone is not worthwhile. A larger difference in energy price between sites would most likely increase the cost reduction and make it more useful, however.

These experiments demonstrate that it is possible to make sufficiently accurate predictions regarding a job's characteristics, based on prior jobs executed by a specific author. Even though the predictions are not very precise, they are accurate enough to provide an opportunity to improve the performance and energy efficiency of the WLCG. By improving the prediction algorithms, the performance and energy efficiency gains could be enhanced further in the future. Tests of several possible optimisations indicate that all jobs should be taken into account when calculating the estimate, and that recent jobs should have a major impact on the prediction.
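As a sketch of what such a recency-weighted estimate could look like, the snippet below predicts the CPU efficiency of an author's next job as an exponentially weighted average over that author's job history; the weighting factor and the names used are illustrative and do not correspond to the exact prediction model used in this thesis.

    def predict_cpu_efficiency(previous_efficiencies, recency_weight=0.7):
        """Estimate the CPU efficiency of an author's next job.

        previous_efficiencies: CPU efficiencies of the author's earlier jobs,
        ordered from oldest to newest. Every job contributes to the estimate,
        but more recent jobs dominate (exponentially weighted moving average).
        """
        if not previous_efficiencies:
            return None                    # no history: fall back to a site-wide default
        estimate = previous_efficiencies[0]
        for efficiency in previous_efficiencies[1:]:
            estimate = recency_weight * efficiency + (1 - recency_weight) * estimate
        return estimate

    # Example: an author whose recent jobs were far less CPU efficient than older ones.
    print(predict_cpu_efficiency([0.95, 0.90, 0.40, 0.35]))   # close to the recent values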


CHAPTER 9

Conclusions and Future Work

It is very clear that an improvement of energy efficiency is needed in order to maintain affordable energy costs while keeping up with the compute demand for the WLCG. In our introduction, we stated the following research question and sub-questions:

Can scheduling using the heterogeneity of the architecture of the WLCG lead to an increase of energy efficiency?

1. How can predictions based on computing patterns be used to improve energy efficiency of heterogeneous architectures and, if so, is this improvement significant?

2. Is it worth to transfer workload to data centres located where energy costs are low?

With regard to the first sub-question, we showed that by using the job history of a specific author we can build a prediction model that makes sufficiently accurate predictions about a job's characteristics. Our experiments indicate that the characteristics of a job fluctuate considerably, and any job prediction model for the WLCG should therefore focus mainly on the most recently executed jobs. Using this prediction model, we showed that it is possible to significantly reduce the energy consumption of the processors, by 6.74% to 7.07%, and to improve the performance of the WLCG, by 3.44% to 2.95%, for CPU and energy scheduling respectively.

Regarding the second sub-question, we explored the possibility of reducing the energy costs by scheduling CPU intensive jobs to sites with lower energy prices. Although our experiments show that this could reduce the energy costs by almost 4%, the reduction heavily depends on the actual energy prices of the sites. Adding sites with a larger difference in price would therefore give more potential to reduce the energy costs.

Overall, it is clear that treating the jobs and processors as heterogeneous leads to an increase of energy efficiency and performance. We have so far only considered the existing WLCG as heterogeneous, consisting solely of Intel x86 processors. With the possibility of new types of hardware, such as GPUs and ARM processors, being added to the grid in the future, we expect that the differences in performance and energy efficiency between the types of hardware will grow considerably. This will increase the benefit of any scheduler that exploits the heterogeneity of jobs and processors, and more research should therefore be done into implementing such schedulers.

For more accurate and realistic results, further studies should use the actual processor sets of multiple sites. In addition, energy prices should be acquired for all sites and used in the simulations, instead of the energy price of only one site. With real data, a comparison of scheduling algorithms can be made in order to optimise the efficiency of the WLCG by running the best performing one. In this thesis, we presented a prediction model, and further development of this model will increase the efficiency improvement of the algorithms. Next to the prediction model, the scheduling algorithms proposed in this thesis should also be developed further.


Additional opportunities for improving the energy efficiency and performance of the WLCG lie in scheduling not only according to CPU efficiency, but also according to other types of efficiency, such as I/O. Jobs that depend heavily on I/O could be scheduled together with less I/O intensive jobs, so that I/O utilisation is maximised while the jobs do not block each other too much. Similarly to CPU efficiency, I/O performance also plays a role: it would be better to schedule a read intensive job on a CPU with access to a solid state drive rather than a hard disk drive.


Bibliography

[1] Hepspec06 benchmark. http://w3.hepix.org/benchmarks/doku.php/.

[2] Processor spreadsheet. https://docs.google.com/spreadsheet/ccc?key=0AvE7aiWBwKzWdHl4MVpSZTRBcjktdXBqWlFhcnZrVmc&usp=sharing#gid=0.

[3] Processor statistics. http://www.spec.org/power_ssj2008/results/power_ssj2008.html.

[4] Simulation source code. https://github.com/rubenjanssen/WLCG-Simulation.

[5] WLCG website. http://wlcg.web.cern.ch/.

[6] David Abdurachmanov, Kapil Arya, Josh Bendavid, Tommaso Boccali, Gene Cooperman, Andrea Dotti, Peter Elmer, Giulio Eulisse, Francesco Giacomini, Christopher D. Jones, Matteo Manzali, and Shahzad Muzaffar. Explorations of the viability of ARM and Xeon Phi for physics processing.

[7] David Abdurachmanov, Peter Elmer, Giulio Eulisse, Paola Grosso, Curtis Hillegas, Burt Holzman, Sander Klous, Robert Knight, and Shahzad Muzaffar. Power-aware applications for scientific cluster and distributed computing.

[8] David Abdurachmanov, Peter Elmer, Giulio Eulisse, and Shahzad Muzaffar. Initial explorations of ARM processors for scientific computing.

[9] Rafael Vidal Aroca and Luiz Marcos Garcia Gonçalves. Towards green data centers: A comparison of x86 and ARM architectures power efficiency.

[10] Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing. 2007.

[11] Peter Elmer. Internal energy consumption and cost spreadsheet of the Princeton power plant. Email communication.

[12] Peter Elmer, Robert Knight, Curt Hillegas, Paola Grosso, Sander Klous, Giulio Eulisse, Burt Holzman, and David Abdurachmanov. Power-aware applications for scientific cluster and distributed computing. http://indico3.twgrid.org/indico/getFile.py/access?contribId=163&sessionId=46&resId=0&materialId=slides&confId=513.

[13] Jonathan G. Koomey, Stephen Berard, Marla Sanchez, and Henry Wong. Implications of historical trends in the electrical efficiency of computing. IEEE Annals of the History of Computing, 2011.
