
Bachelor Informatica

Worst Case Execution Time

on Real-Time Systems

Dimitri Belfor, 10004234

June 8, 2016

Supervisor(s): Andy Pimentel and Ademar Lacerda

Informatica, Universiteit van Amsterdam


Abstract

Design space exploration is an engineering tool used by designers to develop their prod-ucts. The Sesame framework is a design space exploration tool for embedded systems. It is developed at the UvA and is currently not worst case execution time aware and executes its simulations with arbitrary program input.

The plugin described in this thesis attempts to extend the framework by generating worst case input for the applications simulated by the framework. This input can then be used to simulate the worst case execution time. Thereby making the framework worst case execution time aware and allowing developers to draw more informed conclusions from their analysis.

The plugin is fine-tuned for a specific application and uses heuristic methods to estimate its worst case input. More research is needed in order to automatically adapt it to general applications and improve overall performance.


Contents

1 Introduction
    1.1 Real-Time Systems
    1.2 Design Space Exploration
    1.3 Research Question
    1.4 Thesis Outline

2 Background
    2.1 Worst Case Execution Time Research
    2.2 Static Analysis Methods
        2.2.1 Data Flow Analysis
    2.3 Measurement Based Techniques
    2.4 The Sesame Framework
    2.5 Heuristic Search Space Exploration

3 Implementation
    3.1 Data Flow Analysis
    3.2 Input Generating Plugin
    3.3 Genetic Algorithm
    3.4 Sesame Output Parser
    3.5 Application Specific Implementations

4 Experiments
    4.1 Validity Tests
    4.2 Results
    4.3 WCET Input Generating
    4.4 Exploration

5 Conclusions and Further Research
    5.1 Automatic Chromosome Construction


CHAPTER 1

Introduction

In this day and age, embedded systems play a critical role in our day-to-day lives. For example, over ten billion CPU cores were sold in 2011 [8]. The design process of these systems is not to be underestimated and entails many difficulties. The focus of this work will be on the development of a worst case execution time (WCET) input generating plugin for the Sesame framework. The Sesame framework is an embedded systems design space exploration (DSE) tool developed at the UvA. The tool discussed in this work will focus on real-time systems.

1.1

Real-Time Systems

Real-time systems are a class of embedded system where deadlines play a crucial part in the performance and successful execution of the system. In the domain of real-time systems there is a distinction between hard real-time deadlines and soft deadlines. A system with hard real-time deadlines will consider a violation of a deadline a critical failure of the system. For example, if the anti-lock braking system (ABS) of a car fails to meet its deadline, this could result in a car crash.

On the other hand, soft deadlines can withstand not being met for a few iterations. The performance of the system will dwindle but the system will not consider missing the deadline a critical failure. For example, if a video streaming service fails to fetch enough data to construct a frame this frame will simply not be shown. This results in a frame rate drop but the system can still fetch new data and continue streaming.

Real-time systems have to be designed with hard real-time deadline constraints in mind. The systems designers have multiple ways of analysing the performance of a real-time system. One such way to analyze the performance of a real-time system is by approximating the worst case execution time (WCET) [10]. This is usually done by attempting to determine an upper bound of the execution time with either static analysis or measurement based techniques.

1.2

Design Space Exploration

Embedded systems designers have to keep many objectives like the performance, power, production costs, battery lifetime and surface area in mind when designing a system [8]. Most of these objectives are conflicting because they come at the expense of each other. For example, higher performance will usually come at the price of higher power consumption. The designer will attempt to construct a system with the lowest possible production and energy costs while still being able to optimize the other objectives.

The full set of possible system designs is one part of the design space. The possible mappings of applications and processes on the components of the architecture encompass the other part of the design space [8]. By keeping the design space as small as possible, the process of designing embedded systems can be accelerated. Systems designers can use the WCET while designing a real-time system to ensure the required deadlines can be met. If they use the WCET, they can prune the design space by not taking into consideration the systems that would not be able to meet the deadline. An example of a framework constructed for DSE is the Sesame framework. This framework is currently unaware of the WCET of the applications it is simulating because it only uses arbitrarily chosen data. The framework will be described in depth in Chapter 2.

1.3

Research Question

This leads to the following research question: How can real-time awareness be implemented in the Sesame DSE framework?

1.4

Thesis Outline

First, the current state of the art on WCET analysis and the Sesame framework will be described in Chapter 2. Chapter 3 will describe the implementation of our WCET input generating approach for the Sesame framework. The experiments and results will be described in Chapter 4. We will draw conclusions and motivate further research in Chapter 5.


CHAPTER 2

Background

This chapter will describe the state of the art of worst case execution time research and discuss the analysis methods currently in use.

2.1

Worst Case Execution Time Research

Worst case execution time research can be split into two categories: (i) static analysis of the source code (src) and (ii) measurement based techniques. Both research fields try to determine an upper bound on the execution time, but they each have different strengths and drawbacks. This chapter will first describe the categories in more depth and then give an example of each.

2.2

Static Analysis Methods

Static analysis methods attempt to estimate the WCET by analyzing the src instead of executing it on the hardware. These methods require an abstract model of the architecture and will produce an upper bound to the execution time of the application on the given architecture. This will be a safe estimation because the execution time is guaranteed to not exceed the calculated upper bound.

While these methods have desirable results, implementing them has a few drawbacks. Some of these drawbacks are: (i) the implementation of a static analysis tool takes a long time because it is a complex process, (ii) existing tools cost money or aren’t suited for the research in this paper, (iii) they require a lot of information directly from the developer, for example the upper bounds of loops, and (iv) they require a lot of computing power. Upper bounds of loops are hard to determine beforehand[2] and using additional tools would take a long time to set up.

2.2.1

Data Flow Analysis

An example of a static analysis method is data flow analysis. The goal of this analysis is to gather information about possible execution paths and use this information to calculate the execution time [10].

Data flow analysis consists of three phases: (i) a flow analysis of the src, (ii) an analysis of the hardware and (iii) an execution time calculation phase[6]. In the flow analysis phase the application will be broken down into instructions or basic blocks and ordered in a directed graph. The edges in this graph represent the paths from an instruction or basic block to another instruction or basic block which will be considered the successor. Branch instructions will have edges to multiple successor nodes and loops will have edges to themselves. The src usually has to be annotated by the programmer so the information about the control flow can be extracted afterwards.

After constructing the flow graph, the atomic machine instructions are simulated on a model of the target architecture [7]. The cost of the instructions can then be linked to the instructions specified in the flow graph and will be used to deduce the critical path in the graph. Most commercial static analysis tools use data flow analysis in combination with other static analysis methods to give a safe upper bound with a low error margin.
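Once per-block costs are known, deducing the critical path in a loop-free flow graph amounts to a longest-path computation. The following is a minimal sketch of that last phase (the block names, costs and graph layout are invented for illustration; a real tool would also handle loops via annotated bounds):

```python
# Sketch: longest-path cost through a DAG of basic blocks.
# blocks maps each block name to its cost; edges maps a block
# to its successors. All names and costs here are hypothetical.
import functools

def critical_path_cost(blocks, edges, entry, exit_):
    @functools.lru_cache(maxsize=None)
    def cost_from(node):
        succs = edges.get(node, [])
        if node == exit_ or not succs:
            return blocks[node]
        # the critical path always continues via the costliest successor
        return blocks[node] + max(cost_from(s) for s in succs)

    return cost_from(entry)

# A branch where the 'else' arm is the more expensive path.
blocks = {"entry": 1, "then": 4, "else": 9, "exit": 1}
edges = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"]}
print(critical_path_cost(blocks, edges, "entry", "exit"))  # 11
```

The memoisation mirrors how a real calculation phase avoids re-evaluating shared suffixes of paths.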

2.3

Measurement Based Techniques

Measurement based techniques are the counterpart of static analysis. These techniques generate a subset of the application's execution times by executing an application on an architecture and measuring the elapsed time or total clock cycles. The code can either be run on a physical representation of the architecture or it can be simulated. These methods will produce a non-safe estimate of the WCET because they only consider a subset of the possible program inputs, selected with heuristics and statistics.

2.4

The Sesame Framework

The Sesame framework is a DSE simulator developed at the UvA[4].

The original goal of the Sesame framework was to explore the design space of Multiprocessor System-on-Chips (MPSoCs) using generic input. The Sesame framework can simulate the performance of a given architecture and simulate an application by trying out different mappings of processes on the hardware.

The Sesame framework uses Y-chart exploration to traverse the design space. The Y-chart exploration method is an iterative way of refining the design of an embedded system [8].

This exploration method breaks the mapping up in three different layers along three axes forming a Y-shape, see Figure 2.1. The system model is separated into an application model, an architecture model and a mapping of the application onto the architecture components; the DSE is performed based on an analysis of this model.


Figure 2.1: Y-chart design method[8].

The Sesame framework will use the information provided in these layers to simulate the execution of an application on specific hardware. By splitting the application, architecture and mapping into different layers, changing a component in one of the layers can be done independently without affecting the other layers. See Figure 2.2 for a possible mapping of a simple application on a simple architecture [8]. The framework analyses the performance of an application with trace-driven simulation. This means the application model will generate traces of the annotated functions in the processes and calculate the number of cycles by accounting for the performance consequences of the trace events in the architecture model.


Figure 2.2: (a) Mapping an Application onto an architecture model. (b) Sesame’s three-layered structure as described above[4].

Application Model

The application model contains a description of the functions and behavior of an application. This is done with Kahn Process Networks (KPN). A KPN is a network of concurrently running processes that can only communicate over Kahn channels[8]. These Kahn channels are unbounded one-directional FIFO buffers that are operated upon by read and write operations.

There is no shared memory representation available in the application model which means all communication goes over the Kahn channels. However, these connections can be mapped to other communication mechanisms (e.g. shared memory, Network-on-Chip, etc.) in the architecture model.

Because the Kahn channels are unbounded the write operations are non-blocking. The read operations are blocking because the data required for the reads might not have been written to the channel yet.
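These Kahn channel semantics can be illustrated with a small Python sketch (hypothetical and not part of Sesame itself): an unbounded `queue.Queue` gives exactly the non-blocking writes and blocking reads described above.

```python
# Sketch of Kahn-channel semantics: the consumer's get() blocks until
# data arrives, while the producer's put() on an unbounded queue
# always completes immediately.
import queue
import threading

channel = queue.Queue()   # unbounded FIFO: put() never blocks
results = []

def consumer():
    for _ in range(3):
        results.append(channel.get())  # blocking read

t = threading.Thread(target=consumer)
t.start()                 # consumer blocks, waiting for data
for token in ["r", "g", "b"]:
    channel.put(token)    # non-blocking writes
t.join()
print(results)            # ['r', 'g', 'b']
```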


The Sesame framework uses KPNs because the applications simulated by the framework are mostly multimedia oriented applications with a data-flow style of processing. This means the data will be passed from one process to another and the processes will manipulate the data before passing it to the next process[8].

The src has to be annotated to map the read, write and execute operations to the architecture model. Sesame supports four different event annotations in the src, namely:

• Read(channel, size)
• Write(channel, size)
• Execute(opcode)
• Quit

The arguments specified in parentheses are used to describe the details of the operation. The read and write annotations take both the Kahn channel the data is read from or written to and the size of the read/write operation. The execute annotation takes the name of the operation that has been performed. The quit annotation signals the end of a data stream.
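As an illustration only, a process annotated this way would emit a trace along the following lines (the event tuple layout and process body here are invented, not Sesame's real trace format):

```python
# Hypothetical sketch of the trace events an annotated process emits.
# The tuple layout mirrors the four annotations described above.
def mjpeg_like_process(emit):
    emit(("Read", "in_channel", 64))    # read 64 units from a Kahn channel
    emit(("Execute", "DCT"))            # record a named computation
    emit(("Write", "out_channel", 64))  # write the result downstream
    emit(("Quit",))                     # signal the end of the data stream

trace = []
mjpeg_like_process(trace.append)
print([event[0] for event in trace])  # ['Read', 'Execute', 'Write', 'Quit']
```

The architecture model would then look up the cost of each such event to accumulate the cycle count.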

Architecture Model

The architecture model is used to describe the hardware components of the system. These components can communicate with remote procedure calls that are resolved in the Pearl programming language. This is a C-based discrete event simulation language with integrated communication and synchronization primitives [8].

The Pearl compiler will generate a binary that can process the event traces generated by the application model. The result of processing the event traces will be the system wide statistics (e.g. the execution time)[8]. This means the traces generated in the application model will be assigned to event times specified in a look up table in the architecture model.

Mapping Layer

The mapping layer describes the allocation of processes on the available hardware components. The mapping layer also schedules all events on the architecture model (i.e. when there are multiple processes mapped to a single processor in the architecture model, these processes have to be scheduled).

The Pearl binary described in the architecture model uses both the mapping layer and application model to determine the system-wide statistics of a specific mapping [8].

Simulation Analysis

The framework will return multiple metrics after running the simulation. Currently, the number of clock cycles each process spent executing and the energy cost of each process are returned, but the cost of each hardware component can also be taken into account.

These metrics are then processed by a genetic algorithm (GA) in order to find the best mapping of processes on hardware components.

Optimization

Optimizing the mapping is considered a Multi-Objective Optimization Problem (MOP) because we have to keep track of three performance measures, namely the execution time, the energy consumption and the hardware cost. The Sesame framework uses GAs to explore the space of mappings of processes on architecture components.

The framework currently only uses arbitrary input when it is simulating the performance of an application, which means the framework is not WCET aware yet. To make the framework WCET aware we could either find the critical-path using data flow analysis and calculate the WCET or generate program inputs that execute the critical-path and run the simulation with the WCET input.


2.5

Heuristic Search Space Exploration

Various methods have already been constructed, mainly in the domain of software engineering, to generate test data or input that attempts to traverse all possible execution paths in an application. These methods can also be used to determine the WCET path or critical path of an application [1].

One such method is using GAs to search a large number of possible combinations of input. GAs are able to maintain a diverse population while still being able to search through a huge number of possibilities. GAs are incremental optimization algorithms that utilize evolutionary based mechanisms.

A GA starts with an initial population consisting of individuals which are described by chromosomes. These chromosomes consist of a set of genes which are used to describe the specific properties of the individual. The initial population will be considered the first generation and the GA will attempt to generate successive generations consisting of fitter individuals by applying heuristics also found in nature.

The heuristics in the optimization process are based on the evolutionary principles of selection, recombination and mutation [9]. Each successive generation is evolved from the chromosomes of the previous generation with the highest fitness.

The basic procedure of a GA can be seen in Figure 2.3.

Figure 2.3: A flowchart with the phases in a GA.

There are three different Genetic Operators (GOs) that are based on evolutionary mechanisms in nature. These GOs are: (i) crossover, (ii) mutation and (iii) selection.

Crossover is the GO used to diversify the orderings of genes in the chromosomes in a population. For example, in one-point crossover a random position is selected within two chromosomes.


Every gene before this position is swapped between the chromosomes to form the offspring, resulting in two new chromosomes; see Figure 2.4. No new genes are introduced to the population by this GO.


Figure 2.4: One-point crossover with parents in the top row and offspring in the bottom row.

To introduce new genes to the population, there is also a chance one of the chromosomes mutates. This means part of the genome is randomly altered, as can be seen in Figure 2.5.


Figure 2.5: Mutation with the parent on the left and the offspring on the right.
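Both operators can be sketched in a few lines of Python (an illustrative sketch, not the plugin's actual code; the gene values and value range are invented):

```python
# One-point crossover and per-gene mutation, as in Figures 2.4 and 2.5.
import random

def one_point_crossover(p1, p2, point=None):
    """Swap the genes before a random cut point between two parents."""
    if point is None:
        point = random.randrange(1, len(p1))
    return p2[:point] + p1[point:], p1[:point] + p2[point:]

def mutate(chrom, low, high, rate=0.1):
    """Replace each gene with a random value in [low, high] with prob. rate."""
    return [random.randint(low, high) if random.random() < rate else g
            for g in chrom]

c1, c2 = one_point_crossover([1, 2, 3, 4, 5], [6, 7, 8, 9, 10], point=2)
print(c1, c2)  # [6, 7, 3, 4, 5] [1, 2, 8, 9, 10]
```

With `point=2` the first two genes are exchanged, reproducing the example of Figure 2.4.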

The GA can either terminate when the local maximum converges, which means the changes between generations are not significant anymore, or it can stop running after a preset number of generations.


CHAPTER 3

Implementation

This chapter describes the implementation choices that had to be made and the final implementation of the WCET input generating plugin.

Before the WCET generating plugin for the Sesame framework could be constructed the analysis method first had to be decided upon. A decision had to be made between using the data flow analysis of the application or using GAs to approach worst case input.

3.1

Data Flow Analysis

To find the WCET with the critical-path we could have used data flow analysis at a higher abstraction level (i.e. at the src level). Data flow analysis is usually done at the machine instruction level. The data flow analysis of the application would be combined with the traces generated by the annotations in the code. The Sesame framework can relate the annotations in the code to execution times specified in XML-files.

These could then be used to calculate the total execution time of a path in the flow graph. This would limit the estimation of the WCET to the granularity of the annotations in the src. If the programmer decided a few of the instructions weren’t costly enough they would not annotate them. This would result in the information not being used in the data flow analysis.

By doing a data flow analysis of the annotated src we would only get the critical paths per process of the application. Figuring out the critical path of the whole application would be a non-trivial problem because we would also have to take the communication between the processes into account. There is no way to determine the time a process waits for data to read before we know the kind of input that is being processed.

Using a machine instruction level implementation of data flow analysis would be the best choice for finding the critical path in an application because it would always generate a safe estimate of the upper bound of the execution time per process. However, this means we would still have to find out a way to take the cost of communications per process into account.

It was a time consuming task to get the data flow analysis working together with the Sesame framework because most tools only do their analysis on the machine instruction level. Various free tools were tested but none were suited for the flow analysis of multi-process real-time systems out of the box. The various open-source tools did not produce the desired output that could be related with the annotations in the src and they were too complex to rewrite.

Writing a tool from scratch would have been a task too complex for the intended workload. We decided to apply GAs as a more feasible alternative.

3.2

Input Generating Plugin

The WCET input generating plugin implemented in the Sesame framework will use GAs to automatically determine input that produces the WCET.



Figure 3.1: A diagram of the plugin with the user generated part in red.

Because the GA uses heuristics to approach the WCET there is a chance it will converge to a local maximum after a few generations. To lower the chances of finding a local maximum the NSGA2 selection method was chosen. This is an elitist, tournament based selection method aimed at also keeping a diverse population of chromosomes[3].

The input stream of the application will be mapped to the chromosomes and the individual input data will be mapped to the genes. The fitness of a chromosome is defined as the total execution time of the application with the specified input stream.

The plugin was written in a modular fashion so that it can easily be altered manually to work with other applications.

To achieve this, the plugin consists of three separate parts: (i) the main input generating plugin, (ii) the chromosome template and (iii) the application specific methods. The input generating tool and chromosome template stay the same for all applications. The chromosome template is used to specify what methods have to be implemented for the specific application. The user will write application specific methods which will then be used by the main genetic algorithm to estimate the WCET and return statistics, see Figure 3.1.

3.3

Genetic Algorithm

The genetic algorithm used in the plugin is based on the DEAP framework[5]. This framework was designed to be explicit and give the user enough tools to design their own GAs. This framework consists of two structures: the creator and the toolbox.

The creator module is a meta-factory that allows the user to create their own classes via inheritance and composition[5]. This allows us to construct our own chromosomes and specify the type of genes in the chromosomes depending on the required input for the application. It also allows the user to define their own fitness criteria which means the plugin described in this paper can be extended to also generate the best case execution time input.

The toolbox is a container for the operators the user wants to apply to the chromosomes. This means the user specifies the crossover, mutation and selection method required beforehand. The operators used in the plugin are described below.

The GA used in the plugin is called a (µ + λ) algorithm. In this case µ is the total size of the population and λ is the number of offspring generated each iteration of the algorithm. The total population is kept at µ with the elitist tournament selection method, which removes the individuals with the lowest fitness [3].
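The core of a (µ + λ) generation step can be sketched in plain Python (this is a simplified illustration, not DEAP's implementation; the toy fitness function and gene range are invented):

```python
# One (mu + lambda) step: parents and offspring compete together
# and only the mu fittest individuals survive (elitism).
import random

def mu_plus_lambda_step(population, fitness, mu, lam, vary):
    offspring = [vary(random.choice(population)) for _ in range(lam)]
    combined = population + offspring
    combined.sort(key=fitness, reverse=True)  # fittest first
    return combined[:mu]

# Toy problem: maximise the sum of five genes, each clamped to 0..9.
random.seed(1)
pop = [[random.randint(0, 9) for _ in range(5)] for _ in range(20)]
best0 = max(sum(c) for c in pop)
nudge = lambda c: [max(0, min(9, g + random.choice([-1, 1]))) for g in c]
for _ in range(30):
    pop = mu_plus_lambda_step(pop, sum, mu=20, lam=16, vary=nudge)
best = max(sum(c) for c in pop)
print(best0, "->", best)
```

Because the parents stay in the pool, the best fitness can never decrease between generations, which is the elitist property relied on in the experiments.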

3.4

Sesame Output Parser

To analyze the fitness of a chromosome the output of the Sesame framework first has to be parsed. We are currently only interested in the total execution time of the application. The total execution time in clock cycles is specified in the last line of the output so this has to be parsed and used as the fitness of a test run.
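A minimal parser for this step could look as follows (the sample output text is invented; Sesame's real output format may differ, but the approach of scanning the last line is the same):

```python
# Sketch: extract the total cycle count from the last line of the
# simulator's textual output and use it as the fitness value.
import re

def parse_total_cycles(output):
    last = output.strip().splitlines()[-1]
    match = re.search(r"(\d+)", last)
    if match is None:
        raise ValueError("no cycle count on last line: " + last)
    return int(match.group(1))

sample = ("process A: ...\n"
          "process B: ...\n"
          "total execution time: 93900000 cycles")
print(parse_total_cycles(sample))  # 93900000
```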


3.5

Application Specific Implementations

At the moment, only the Motion JPEG (MJPEG) example application of the Sesame framework is supported by the plugin. This application reads in a stream of input frames split in their R, G and B values and constructs the separate JPEG frames from this stream; see Figure 3.2 for the interaction between this application's processes.

The application tries to maintain a steady bitrate by tweaking the quality of the frames while it is converting the stream of frames. The Control process can put the whole application on hold if the bitrate gets too low and tweak the quality of the frames which ensures the bitrate stays high enough in the long term.

Figure 3.2: A diagram of the interaction between the MJPEG processes.

The specific plugin methods for this application describe an interaction process as can be seen in the diagram in Figure 3.3.


Figure 3.3: Interaction between input, chromosomes and genetic operators.

Chromosome to Input Mapping

The input is a database of possible input images separated in their R, G and B channels. An MJPEG input chromosome consists of a preset number of genes depending on the number of input frames. These genes are integers which range from 0 through the maximum number of images. Each integer is mapped to the index of an image in the database.

The plugin will copy all the required images to the input folder of the Sesame application. The Sesame simulation will then be executed and the fitness will be calculated.
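The gene-to-image mapping itself is straightforward, as the following sketch shows (the file names are hypothetical; the real plugin works on a database of R, G, B channel files):

```python
# Sketch of the chromosome-to-input mapping: each gene is an index
# into the image database, one gene per input frame.
def chromosome_to_frames(chromosome, database):
    return [database[g] for g in chromosome]

database = ["img0.rgb", "img1.rgb", "img2.rgb"]  # hypothetical file names
chromosome = [1, 2, 2, 2]  # four frames drawn from three images
print(chromosome_to_frames(chromosome, database))
# ['img1.rgb', 'img2.rgb', 'img2.rgb', 'img2.rgb']
```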

Genetic Operators

The operators folder specifies the genetic operators that will be set in the DEAP toolbox. The implemented crossover operator is a one-point crossover, which switches the genes of two chromosomes from a random point in the chromosomes.


The implemented mutation operator will replace one or more genes in the chromosome with a random number from the minimum image index through the maximum image index. By mutating just one gene the convergence will be slower initially but we expect the algorithm to converge faster when it is close to an optimum as opposed to changing a large number of genes at a time.


CHAPTER 4

Experiments

This chapter describes the experiments done to test the GA and the experiments done to estimate the WCET input.

4.1

Validity Tests

To estimate the expected number of chromosome changes in one execution of the GA we constructed the following formulae:

Ncross = Ngens · λ · Pcross (4.1)

Nmut = Ngens · λ · Pmut (4.2)

We expect the number of changes per execution to depend on the number of generations (Ngens) the algorithm runs for, times the number of individuals either crossed over or mutated each generation (λ · Pcross and λ · Pmut).

We assume the changes made by crossing over two chromosomes (4.1) and by mutating one chromosome (4.2) are decoupled because they don't influence each other. Because crossing over and mutating the chromosomes influence the changes in a similar way, we arbitrarily fixed both probabilities at 0.5 throughout the experiments.

Combining the formulae gives the total number of expected changes in one execution of the genetic algorithm:

Nchanged = Ncross + Nmut (4.3)
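Plugging in the parameter choices used later in this chapter (λ = 16, Pcross = Pmut = 0.5, and Ngens = 50 as an example run length) gives concrete expected counts:

```python
# Equations (4.1)-(4.3) with the experiment's parameter choices.
n_gens, lam = 50, 16        # generations, offspring per generation
p_cross = p_mut = 0.5       # fixed probabilities

n_cross = n_gens * lam * p_cross   # (4.1) expected crossovers
n_mut = n_gens * lam * p_mut       # (4.2) expected mutations
n_changed = n_cross + n_mut        # (4.3) total expected changes
print(n_cross, n_mut, n_changed)   # 400.0 400.0 800.0
```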

We first tested whether the GA would be able to find the worst case input on a small subset of the images: the stream of eleven frames and a database consisting of just two images. In this instance, the space consists of 11^2 = 121 different combinations, a relatively small number that could even be fully explored with an exhaustive search. Therefore, the population is expected to converge quickly. If the maximum and average execution time become equal and the standard deviation becomes zero, the population has converged to a local maximum of the worst execution time.

At the next step, we experimented with a database consisting of eleven images. In this instance, there were 11^11 ≈ 2.85 · 10^11 different combinations to be explored. We expect the population to converge more slowly in these experiments.

The final test was run on a database consisting of 38 images, with a total of 11^38 ≈ 3.74 · 10^39 different combinations. Exploring this space of combinations would not be practical with an exhaustive search.

This plugin's execution is computationally expensive. Optimizing the parameters used in the GA (i.e. experimenting to find the optimal λ, Pcross and Pmut) could speed up the convergence of the method. In view of timing constraints, these parameters could not be properly calibrated. We arbitrarily decided upon an Npop of 20 individuals and a λ of Npop · 0.8 = 16. This choice keeps individual generation steps cheap, allowing a larger number of generation steps to be executed in the same amount of time.

4.2

Results

The results were found to be promising. We were able to find an input set resulting in a local maximum of the WCET in most of the experiments. The graphs in this chapter show the maximum execution time per generation in blue and the average execution time per generation in red (with the standard deviation as error bars).

First we looked at the tests with the database of two images. This showed us our initial suspicions were right, the algorithm was able to find a WCET input. As can be seen in Figure 4.1, it took ten generations to find an optimal solution when we ran the plugin for ten generations. In this instance we could not yet observe convergence of the population.

Even though the WCET was found in fewer generation steps with a higher Ngens, running the plugin with a larger Ngens had no influence on finding the WCET faster. This behaviour was attributed to the stochastic behaviour of the GA: the larger-Ngens trials were lucky and randomly started with a fitter individual in their initial population. Running for a larger Ngens did give the algorithm more time to converge to a local maximum. When running the plugin for twenty and fifty generations we managed to find the local maximum at two and three generations respectively, see Figure 4.1. In these trials, the algorithm converged to a local maximum of 9.39 · 10^7 at 8 generations. This is comparable to an exhaustive search for the solution. The WCET input consisted of the sequence [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]; see Figure 4.2 for the images.


(a) Ngens = 10, max (9.39 · 10^7 cycles) at 10 gens. (b) Ngens = 20, max (9.39 · 10^7 cycles) at 2 gens.

(c) Ngens = 50, max (9.39 · 10^7 cycles) at 3 gens.

Figure 4.1: The performance of the plugin with a database of two images. The number of generations is plotted on the x-axis and the performance of the system (in total cycles) is plotted on the y-axis.


(a) Image 1 (b) Image 2

(c) Image 3 (d) Image 4

Figure 4.2: The images resulting in a WCET.

We expect the population to converge more slowly with the larger database of eleven images due to the expanded space of possibilities.


(a) Ngens = 10, max (6.89 · 10^7 cycles) at 8 gens. (b) Ngens = 20, max (8.10 · 10^7 cycles) at 17 gens.

(c) Ngens = 50, max (9.03 · 10^7 cycles) at 42 gens.

Figure 4.3: The performance of the plugin with a database of eleven images. The number of generations is plotted on the x-axis and the performance of the system (in total cycles) is plotted on the y-axis.

As can be seen in Figure 4.3, the population doesn't converge when the GA is run for ten or twenty generations. In the fifty-generation trial, the population still did not converge. The maximum execution time input found in the fifty-generation trial consisted of the sequence [3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]; see Figure 4.2 for the images. This follows the same pattern as the one found in the experiments with a smaller database, which might indicate it is close to the actual WCET input.

4.3

WCET Input Generating

Alongside these tests of the general performance of the plugin, we also ran one large trial in the background: an experiment with a database of 38 images for a hundred generations. We ran this only once because an experiment of this many generations is very computationally intensive and took over sixty hours. We did not expect the plugin to find a local maximum, but we hoped it would approach a WCET larger than the ones we had previously found.

As can be seen in Figure 4.4, the GA found a local maximum at 74 generations and the population even converged to this local maximum of 1.01 · 10^8 cycles at 82 generations, showing that the algorithm can find a maximum execution time on medium-sized databases within reasonable time. The WCET input consisted of the sequence [4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], see Figure 4.2 for the images.

Figure 4.4: Results of running the GA with 38 images, Npop = 40, Ngens = 100, λ = 32.

4.4 Exploration

The generated WCET inputs also revealed some quirks in the performance of the MJPEG application. We expected the WCET input to be a series of the same image, because these would be equally hard to process for the application. However, the WCET inputs we found each consisted of one image followed by repetitions of a different image. For larger sets of images the results might have been different.

We investigated the output logs of the Sesame framework to search for the cause of this worst case performance. We found that the quality control process caused the whole application to halt. This process attempts to maintain a steady bitrate; however, with a stream of only eleven frames, halting the entire application to tweak the parameters worsens performance. At the moment the output of the Sesame framework is hard to read, so it was unclear how many times the quality control stopped the whole application to tweak the bitrates.


CHAPTER 5

Conclusions and Further Research

As the experiments show, the plugin can find the WCET input for the MJPEG application when a database of images is supplied beforehand. This is a step in the right direction towards making the Sesame framework WCET aware. At this moment only the MJPEG application is WCET aware; users are required to manually create chromosomes and fine-tune the plugin before it works with other applications.

The results of our experiments showed that the plugin could be used in the DSE pipeline of the framework. With it, designers could determine the WCET input of an application and its performance on their system. Embedded systems designers could then prune their design space by ignoring the architectures and mappings that do not meet a predetermined deadline. While the plugin described in this thesis can successfully find the WCET input for the MJPEG application, there is room for improvement.
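The deadline-based pruning described above can be sketched as follows. The design points and their cycle counts here are hypothetical; in practice each WCET would come from running the Sesame simulation with the input generated by the plugin:

```python
# Sketch of deadline-based pruning of a design space. Design point names and
# cycle counts are hypothetical; real numbers would come from Sesame runs.
DEADLINE_CYCLES = 9.5e7

design_points = {
    "2-core, mapping A": 9.39e7,   # simulated WCET in cycles (illustrative)
    "2-core, mapping B": 1.01e8,
    "4-core, mapping A": 6.89e7,
}

def prune(points, deadline):
    """Keep only the design points whose simulated WCET meets the deadline."""
    return {name: wcet for name, wcet in points.items() if wcet <= deadline}

feasible = prune(design_points, DEADLINE_CYCLES)
print(sorted(feasible))  # design points that meet the deadline
```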

The plugin was computationally intensive, which means it took a long time for the population to converge with a medium-sized database of images. Designers in a hurry would either have to use a small database of images or run the plugin for fewer iterations. Running the plugin for fewer iterations could result in the population not converging, which in turn could mean the local maximum has not been found. In that case, the designers would have to settle for an approximation of the WCET input.

5.1 Automatic Chromosome Construction

To ensure the plugin can be used with any application without manually reconstructing the chromosomes and helper functions, we need a way to define and specify the input accepted by the application. This would allow us to extend the plugin with functionality to automatically construct chromosomes for the input data.

This could be achieved by designing an XML document with specific fields for the type of input, the number of inputs, and so on. This document style was chosen because the Sesame framework already uses XML files to specify the Y-chart analysis.
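As a rough illustration, such a specification could be parsed with the Python standard library. The schema below (element and attribute names) is an assumption made for this sketch, not an existing Sesame schema:

```python
# Sketch of a hypothetical input-specification document and a parser for it.
# The schema is an illustrative assumption; Sesame's actual XML files describe
# the Y-chart analysis, not plugin input.
import xml.etree.ElementTree as ET

SPEC = """
<input-spec application="mjpeg">
  <input type="image" count="11" source="database"/>
</input-spec>
"""

def parse_spec(text):
    """Return (application name, list of input descriptions)."""
    root = ET.fromstring(text)
    inputs = [
        {"type": el.get("type"),
         "count": int(el.get("count")),
         "source": el.get("source")}
        for el in root.findall("input")
    ]
    return root.get("application"), inputs

app, inputs = parse_spec(SPEC)
print(app, inputs[0]["count"])  # application name and stream length
```

From such a description the plugin could derive the chromosome length (here, eleven genes) and the gene type (an index into an image database) automatically.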

5.2 Multiple Levels of Genetic Operators

The plugin could also be expanded to use multiple levels of genetic operators. This would allow the plugin to evaluate the performance with a larger set of inputs than the one initially provided in the database, which in turn would result in a more accurate estimation of the WCET.

In the MJPEG example this would mean the stream of frames would be mapped to a first-level chromosome and the individual images would be mapped to a second-level chromosome.

The first-level chromosome would behave in the same way as described in Chapter 3, but the second-level chromosome would manipulate the information in the images themselves. In the MJPEG example, crossing over two frames would leave us with a frame whose top half consists of pixels from the first image and whose bottom half consists of pixels from the second image. Mutation would mean a subset of the pixels in an image would get random R, G and B values, thereby generating new input images, see Figure 5.1.
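The second-level operators described above can be sketched as follows. Images are modelled here as 2D lists of (R, G, B) tuples for illustration; a real implementation would operate on decoded frame data, and the function names are assumptions:

```python
# Sketch of second-level genetic operators on raw image data. Images are
# modelled as 2D lists of (R, G, B) tuples; names are illustrative.
import random

def crossover(img_a, img_b):
    """New frame: top half of img_a stacked on the bottom half of img_b."""
    half = len(img_a) // 2
    return img_a[:half] + img_b[half:]

def mutate(img, rate=0.05):
    """Give a random subset of pixels random R, G and B values."""
    out = []
    for row in img:
        new_row = []
        for px in row:
            if random.random() < rate:
                px = tuple(random.randrange(256) for _ in range(3))
            new_row.append(px)
        out.append(new_row)
    return out

# Example: crossing an all-black and an all-white 4x4 image yields a frame
# whose top two rows are black and whose bottom two rows are white.
black = [[(0, 0, 0)] * 4 for _ in range(4)]
white = [[(255, 255, 255)] * 4 for _ in range(4)]
child = crossover(black, white)
```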

Figure 5.1: Two-level operators showing two images from the initial database forming a new one in the extended database (initial input, chromosome, operators, extended input).

Scalability

Generating new images would affect the scalability of the plugin, because the image database could grow too large. A garbage-collecting module would therefore also have to be constructed, letting the user specify when to remove an image. For example, the user could specify that images may only be removed if they have not been used for a number of generations, thereby ensuring the results of the WCET input generating plugin still make sense.
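Such a removal policy can be sketched in a few lines. The bookkeeping scheme (tracking the generation in which each image was last used) is an illustrative assumption:

```python
# Sketch of a simple garbage-collection policy for the image database: an
# image is removable only if it has not been used for more than max_idle
# generations. The bookkeeping scheme is an illustrative assumption.
MAX_IDLE = 5

def collect(last_used, current_gen, max_idle=MAX_IDLE):
    """Return images whose last use lies more than max_idle generations back."""
    return [img for img, gen in last_used.items()
            if current_gen - gen > max_idle]

last_used = {"img_a": 40, "img_b": 33, "img_c": 12}  # generation of last use
print(collect(last_used, current_gen=40))  # stale images eligible for removal
```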


