
Using Automated Trace Generation for Trace-based Design-space exploration

Abel Pelser

June 9, 2017

Informatica
Universiteit van Amsterdam

Abstract

Finding a suitable hardware architecture for an application can be supported by examining the application's execution traces - this is also known as trace-based design-space exploration. The traces used for this purpose should include those belonging to the program's corner cases, to ensure that the system will always meet its requirements, even in the most extreme situations possible. These corner cases should be with respect to various metrics, related to the different ways the application can generate loads on the different hardware components.

This project was not aimed at design-space exploration itself, but rather at the generation of the aforementioned execution traces. A tool chain was written to generate these traces in an automated fashion. Using various third party tools, the software is first analysed to obtain (an estimate of) its corner cases and their corresponding input values, using an approach based on path coverage. These input values are then used to run the software in a simulator, on a virtual ARM configuration. The result is a Pareto front of the application's corner cases, including the corresponding execution traces. Users need to define which metrics to consider for this Pareto front themselves; they can select from a large list of statistics generated by the simulator.

We demonstrate that the resulting tool chain works as expected, as long as the program under analysis complies with a certain set of restrictions, and we propose various ways to improve it.


Contents

1 Introduction
2 Background
2.1 Metrics to consider
2.2 Program analysis
2.2.1 Program analysis for this project
2.2.2 Path coverage
3 Reducing a program's input-value space
3.1 Modifying SWEET
3.2 PathCrawler
4 Implementation
4.1 User input
4.2 Obtaining inputs from PathCrawler
4.2.1 Extra source file
4.2.2 Analysis/instrumentation and generation
4.2.3 Translating the parameters to console arguments
4.3 Cross-compilation
4.4 Generating the execution traces
4.5 Collecting and processing the results
5 Evaluation
5.1 Accuracy
5.1.1 Accuracy of finding a program's corner cases
5.2 Ease of use
5.2.1 The user interface
5.2.2 The environment
5.3 The script
5.3.1 Robustness
5.3.2 Portability
5.3.3 Modularity
5.4 Limits of the tool chain
5.4.1 Limits imposed by PathCrawler
6 Conclusions
6.1 The goal
6.2 Use cases
7 Future work
7.1 Automating the installation process
7.2 Lowering the required user input
7.2.1 Automating argument communication
7.2.2 Automating code rewrites
7.3 Making the script less restrictive


1 Introduction

Embedded systems are an important part of the IT industry. They may not get as much attention as consumer laptops and desktops, but they have been ubiquitous in our everyday lives for many years now.

When designing an application for embedded systems, its predictability on various hardware configurations is essential. Not only is it vital to know if a system can actually execute the ap-plication, it may also be important to know its response time. Many embedded systems fulfil an important role in their direct environment, and long waiting times are typically not acceptable. For example, when the Collision Avoidance System in a car has to decide whether or not to brake, the decision has to be made quickly and reliably.

Finding or designing an optimal hardware architecture for an application is known as design-space exploration. A number of tools exist specifically for this purpose. Many of them use execution traces of the application to determine what hardware requirements the program has, and how it will behave on various hardware architectures. The execution traces used for this purpose should not be chosen randomly, however. If we want to guarantee reliability, and a certain upper bound on the response time of the program, the execution traces used for this purpose should include the corner cases of the application.

These corner cases are defined by means of a Pareto front [16]. Given a set of data vectors, its Pareto front is the subset in which none of its members are 'dominated' by any other members. A data vector 'dominates' another data vector if, for every one of its dimensions, it contains values that are bigger than or equal to the other vector's values (and strictly bigger in at least one dimension). (As a very brief example, the Pareto front of the set {(1, 0, 2), (0, 1, 0), (0, 2, 0), (1, 0, 1)} is {(1, 0, 2), (0, 2, 0)}.)
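To make the definition concrete, here is a dominance check written out in C (a minimal sketch of our own; the function name and the use of double vectors are illustrative, and we include the conventional requirement of strict inequality in at least one dimension):

    #include <stddef.h>

    /* Returns 1 if vector a dominates vector b: a >= b in every
       dimension, and a > b in at least one of them. */
    int dominates(const double *a, const double *b, size_t dims) {
        int strictly_greater = 0;
        for (size_t i = 0; i < dims; i++) {
            if (a[i] < b[i])
                return 0;             /* a is worse somewhere: no dominance */
            if (a[i] > b[i])
                strictly_greater = 1;
        }
        return strictly_greater;
    }

Applied to the example set above, this check finds that (0, 1, 0) is dominated by (0, 2, 0) and that (1, 0, 1) is dominated by (1, 0, 2), leaving those two dominating vectors as the Pareto front.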

To describe a program's corner cases, the aforementioned Pareto front is taken with respect to various predefined metrics, related to what types of instructions a program may execute, and how often. Examples of these metrics are the number of possible memory accesses, or the total number of executed instructions. If an application meets all of its requirements even in these corner cases, it will also meet them in all other possible circumstances. A program's behaviour is usually dependent on the input given to it, so to generate the execution traces that belong to these corner cases, we first have to derive the input-value combinations that cause them to occur. This thesis describes how to automate the process of deriving these input-value combinations, and generating their corresponding execution traces. The tools that are used to perform the various sub-processes are connected through a program written in Python [18]. The tool chain takes the source file of a C program, and requires the user to define which metrics (the 'predefined metrics' from the previous paragraph) should be used. The result consists of the input-value combinations and execution traces belonging to (an estimate of) the corner cases of the application (best-case or worst-case), together with the Pareto front of the resulting values.


2 Background

In this chapter we discuss the theoretical aspects of the project. The two most prominent topics in this regard are the metrics on which to base the Pareto Front mentioned in chapter 1, and how to find a program’s corner cases.

2.1 Metrics to consider

In chapter 1 it was mentioned that this project focused on finding corner cases with respect to various metrics. There are many metrics we may want to consider, to evaluate a program’s behaviour. Examples of these metrics include (but are not limited to) the numbers of:

• Total instructions
• Memory accesses
• Memory accesses in specific parts of the memory
• Floating point instructions
• Writes to a shared buffer

The goal of using these different metrics is being able to analyse a program's behaviour on a certain hardware architecture in great detail. The goal of design-space exploration is finding the cheapest hardware architecture suitable for an application. Because applications can place different kinds of loads on hardware, and all hardware components involved should be able to handle these loads even in the program's corner cases, these different aspects of its behaviour all need to be considered separately. For instance, we need to know if the execution will finish in time, whether or not a shared buffer will overflow, and if we have enough memory available at all times. Fortunately, the simulator used in this project collects these statistics (even the more detailed ones) by default, without requiring any extra user input. Users only have to select which statistics they want to consider for a specific run; more about this in section 4.1.

2.2 Program analysis

The total set of possible valid input combinations for a program is known as a program’s input-value space or input space. In theory, finding a program’s corner cases can be done by executing the program with all input combinations in its input space. For most programs however, this is not a feasible approach, because their input space is far too big.

A more intelligent approach is needed. By analysing a program, we can try to reduce its input space to a much smaller subset of candidate input combinations. There are two main approaches to program analysis: static analysis and dynamic analysis. Static analysis means analysing a program without executing it, while its counterpart, dynamic analysis, involves repeatedly executing a program and studying its behaviour on the fly. A significant amount of research has been done on static analysis methods for deriving Worst-Case Execution Times (WCET). This topic is related to the analysis required for this project, because it, too, involves finding an application's corner cases. However, it is different in the sense that we want to consider more metrics than just the WCET. Static analysis is often a desirable choice in this context; measurement-based (dynamic) WCET analysis of embedded software has become very challenging (see [11], Chapter 2).

2.2.1 Program analysis for this project

Ideally, a reduced set of inputs, obtained from whichever type of program analysis is used, should still include the application's corner cases. Unfortunately this can, by definition, not be guaranteed for all programs: the problem of finding safe upper bounds for a program's WCET (or other related metrics) can be reduced to the Halting problem. After all, if it is impossible to determine whether or not an arbitrary program will run forever, it is also impossible to derive a Worst-Case Execution Time. Tools that derive this kind of information can therefore never be 'perfect'. However, the type of software typically used on embedded systems is relatively simple, and for these applications static analysis tools tend to work well.

There are quite a few program analysis tools available; many of them are aimed at deriving an estimate for a program's WCET. For this project, that was insufficient, because as mentioned before, execution time is not the only metric we want to consider. However, one possible solution that was considered (and briefly attempted) was modifying an existing static WCET analysis tool to make it suit our needs. This turned out to be infeasible within the constraints of a Bachelor's thesis.

2.2.2 Path coverage

The eventual solution used for the tool chain was based on path coverage: the concept of using a set of inputs that cover all feasible execution paths. The difference between typical software for embedded systems (especially real-time systems), and regular applications was important when making this decision. Corner cases are often linked to the execution path within the program. But if the program contains loops that are parametric in some input values, the number of possible execution paths is vast, and the computational complexity of deriving the corner cases becomes too high. However, in programs for embedded systems, loop bounds are typically known beforehand. This lowers the number of possible execution paths, making path coverage a usable approach in this context.

Another important difference in this regard is in handling external files. For example, imagine a program that does nothing but take a string of characters, open a file with that name, and print whatever is in that file to the screen. The execution path within the C code will clearly be the same for each file. The application's corner cases, however, depend on how big the files are that it can find/open. Applications for embedded systems typically do not perform this kind of task, which makes their corner cases more predictable.
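A minimal sketch of such a program (our own illustration) makes the issue concrete: the branch structure of the C code is essentially fixed, but the length of its execution trace depends entirely on the size of whatever file the input happens to name:

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        if (argc < 2)
            return 1;
        FILE *f = fopen(argv[1], "r");  /* the input value only names a file */
        if (f == NULL)
            return 1;
        int c;
        while ((c = fgetc(f)) != EOF)   /* the iteration count depends on the
                                           file's contents, not on the input
                                           values an analysis tool can see */
            putchar(c);
        fclose(f);
        return 0;
    }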


3 Reducing a program's input-value space

We investigated two possible solutions for the problem of reducing the input-value space. The first approach, modifying an existing static WCET analysis tool, was not achievable within the scope of this project. The second one, using a code coverage tool, was successful enough to warrant its use in the end result, although it, too, has its limitations.

3.1 Modifying SWEET

The SWEdish Execution time Tool, or SWEET, is a static WCET analysis tool, built by researchers from Mälardalen University in Sweden [6]. The tool uses a combination of methods to derive a safe, yet tight WCET estimate. It uses Abstract Execution [10] to perform flow analysis and derive various kinds of information about the (ANSI-C) source file, without executing it. What makes this tool especially interesting is the fact that it does not just return a WCET estimate; it is also capable of finding the input values that lead to this WCET estimate [7].

When deriving a static WCET estimate, at some point SWEET determines what types of instructions will be executed by a program, and it derives bounds on how often. This information is then combined with data on how many clock cycles the various instruction types take (which is hardware-dependent), to form a final WCET estimate. It should be possible to modify the tool in such a way that it treats this information differently, and no longer just outputs a WCET (or BCET) estimate. Instead, it should return a set (the Pareto front) of the worst-case estimates of the various instruction type counts, combined with the input values that result in these (possible) corner cases. These instruction type counts are closely related to the various metrics we may want to consider (see section 2.1), although they do not necessarily cover all of them. (For instance, SWEET cannot keep track of which part of the memory is used by a certain instruction.) Here follows an overview of the steps needed to implement these modifications, based on the algorithm and ideas outlined in [7]:

• Instead of WCET or BCET estimates, the tool should keep track of counts per instruction type, and find the Pareto Front of corner cases with respect to these counts.

• During the tree traversal, the ‘priority queue’ should rank tree nodes ([7], section III A) based on whether or not they are part of the current Pareto front.

• All tree nodes that are part of the current Pareto Front should be traversed ([7], section III C), unless their value space partition only holds one input combination.

• When the current Pareto front only consists of nodes whose value space partition holds just one input-value combination, the search algorithm terminates and returns this set of nodes.


In theory, changing SWEET to implement this algorithm should be possible. If done properly, the resulting tool might have been a good solution for the problem outlined in section 2.2. Unfortunately, implementing these changes turned out to be infeasible within the scope of this project. SWEET is a complex tool that requires a significant amount of time to get used to. Becoming familiar with its source code takes even more time - this can be illustrated by the fact that the entire project consists of around 88,000 lines of pure code (comments not included). Therefore, implementing these changes would have required a lot more time than what was available to us, and/or more people.

3.2 PathCrawler

When it became clear that modifying SWEET was not going to be a usable solution, an alternative was needed. PathCrawler [17] is a tool mainly designed for structural software testing. It uses a combination of static and dynamic analysis techniques [19]. Like dynamic analysis tools, it does compile and execute the program, but it does not use heuristic function minimisation. Instead, it uses a form of constraint logic programming to find all feasible execution paths, and their corresponding input-value domains.

PathCrawler's goal is to find a set of input combinations that cover all feasible execution paths in the code. In other words, when the program is executed with all of these inputs, all possible paths through the program will have been executed, unless they are unreachable. As discussed in subsection 2.2.2, this approach, 'path coverage', results in a good estimate of a program's corner cases, especially for the type of software this project was focused on. It should be noted, however, that whereas SWEET was designed to find safe estimates, PathCrawler was not. Therefore, the obtained results should be seen as an approximation, rather than a safe upper bound.


4 Implementation

The primary goal of this project was to create a tool chain as described in chapter 1. The full process consists of several parts, for which different third party tools were used. To generate inputs that cover (an estimate of) the program's corner cases, a path coverage tool (PathCrawler) first analyses the program. Those inputs are then used to run the program in a simulator (Gem5), which results in the traces and other statistics we are interested in. Before this simulator can run the program, however, the program first needs to be compiled - and because the simulator used for this thesis simulates an ARM architecture, the compilation process must target that architecture as well.

A Python script, which executes the third party tools in order, was written to automate the entire process. It does so by means of shell commands passed to the (Linux-based) operating system, collecting the output from every step and using it to generate the input for the next one.

Here follows a step-by-step overview of the process, as implemented by the tool chain:

1. User input
2. Generating input-value combinations using PathCrawler:
   • Program analysis/instrumentation
   • Changing the default parameter values (for arrays)
   • Generation of input-value combinations
3. Cross-compilation
4. Execution in Gem5 (while keeping track of the statistics it generates for every run)
5. Processing the results


Figure 4.1: An overview of the steps in the tool chain, and what they yield.

4.1 User input

Before the script can be run, users need to configure it for their purposes. First of all, a file named <basename>_input.py needs to be created, in which <basename> is the name of the C file under analysis. This file needs to be placed in a designated directory, and it will contain certain Python functions users need to define themselves - more about these functions in section 4.2. The next step is selecting the metrics to be used; this is done by defining counters in a Python dictionary, in the main file of the script. Each counter is defined by a description and a list of strings, indicating which statistics should contribute to the counter. These statistics are referred to by means of their identifiers, as defined by the simulator that generates them (Gem5 - see section 4.4).

The aforementioned strings do not have to match the statistics' identifiers exactly - a statistic will contribute to a counter as long as at least one of the strings in the counter's list is a substring of the statistic's identifier. This was a deliberate design choice: the number of statistics available is rather large, but most of them can be considered to be part of bigger, more general categories (such as classes of CPU instructions, or memory-related data). The identifiers of statistics within such categories tend to have large parts in common, meaning that this common part can be used to select all statistics within a category at once.
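Expressed in C, the matching rule amounts to the following (an illustrative sketch of the rule only; the script itself implements it in Python):

    #include <stddef.h>
    #include <string.h>

    /* A statistic contributes to a counter when at least one of the
       counter's strings occurs as a substring of the statistic's
       identifier. */
    int matches_counter(const char *stat_id,
                        const char *patterns[], size_t n) {
        for (size_t i = 0; i < n; i++)
            if (strstr(stat_id, patterns[i]) != NULL)
                return 1;
        return 0;
    }

For example, a counter defined with the single string "dcache" would select every data-cache-related statistic at once, assuming Gem5's usual identifier naming.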

4.2 Obtaining inputs from PathCrawler

As discussed in chapter 3, the tool used for generating the input combinations was PathCrawler. This section discusses how the script interacts with PathCrawler.

4.2.1 Extra source file

The user provides two C source files: the original file, which implements the main function, and a separate file for PathCrawler. This separate file will not be compiled or executed by the tool chain; its sole purpose is to be analysed by PathCrawler. It does not need a main function, but it has to comply with all of PathCrawler's restrictions. It also needs to contain two extra, user-defined functions to define which code needs to be analysed, and what parameter values it expects.

Extra functions

The extra functions to be implemented in the separate source file are pathcrawler_func and pathcrawler_func_precond. The first of the two is the central function; it should take all relevant input parameters and execute any code the user wishes to analyse.

pathcrawler_func_precond should define which parameter values are valid. It should do so by accepting the same parameters as pathcrawler_func and returning a non-zero value if all parameters have valid values, and zero otherwise.
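For a function that sums the first n elements of an array, the pair could look as follows (a minimal sketch; the parameter names, types and bounds are our own illustration, not prescribed by the tool chain):

    /* The code the user wishes to analyse. */
    int pathcrawler_func(int n, int data[10]) {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += data[i];
        return sum;
    }

    /* Non-zero iff the parameter values are valid: here, n must be a
       valid element count for data. */
    int pathcrawler_func_precond(int n, int data[10]) {
        return n >= 0 && n <= 10;
    }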

These two functions can also be implemented in the original source file, but this is not mandatory. In the case of pathcrawler_func, doing so may be desirable, since both the original and the modified source file should eventually execute the same code. However, pathcrawler_func_precond is best left out, because it would not serve any purpose there.

It should be noted that the names of the two aforementioned functions could have been different: PathCrawler will accept any function for analysis. Its corresponding precondition function, however, should follow the convention <name_of_analysed_function>_precond. The names used in this project were chosen because they are descriptive, and eliminate the need for the user to pass a function name every time the script is run.

Filtering execution paths

PathCrawler provides several functions that can be used to tweak which execution paths it generates inputs for. These functions are useful if one only wants to test a specific part of the code. They were not used during this project, because the aim was always to obtain input values that cover all (feasible) paths.

4.2.2 Analysis/instrumentation and generation

The first step of PathCrawler's input generation procedure is analysing the source file. During this process, it also instruments the code, and it derives default bounds for the values of each parameter. These default bounds only depend on the type of each parameter - for example, for integers they are [−2147483648..2147483647]. These bounds are stored in a file.

Before PathCrawler can generate input values, these default bounds must be changed. For the most part, this is done by the aforementioned precondition function pathcrawler_func_precond, which essentially overrides whatever bounds are defined in the aforementioned file. However, for array parameters, an extra step is needed: the bounds for its dimensions need to be set separately. This is done by the tool chain, in an automated fashion. The user needs to provide the array’s desired dimension range in a comment, in the C file passed to PathCrawler. This comment should have the following syntax: //__DIM_<array_name>[<lower_bound>..<upper_bound>] (note: both the upper and lower bounds are inclusive).

In a similar fashion, users can also opt to limit what values the array may contain, using a similar syntax: //__DOM_<array_name>[<lower_bound>..<upper_bound>]. Bounds for array domains can also be defined in the precondition function; this works equally well, but typically requires more code and effort.
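Put together, constraining an array parameter might look like this (an illustrative sketch; the array name is our own, and we assume the script simply searches the file for comments of this form):

    /* data has between 1 and 10 elements, each between 0 and 255: */
    //__DIM_data[1..10]
    //__DOM_data[0..255]
    int pathcrawler_func(int data[], int n);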

The script finds and extracts these user-defined bounds, and uses them to produce a modified version of the default parameters file. This file is then passed to PathCrawler for the next step: generating the 'test cases', the resulting input-value combinations, which are stored in XML files. The tool chain reads these files, extracts the input combinations and stores them in a Python dictionary.

4.2.3 Translating the parameters to console arguments

The final step is to turn this dictionary into a string containing the console arguments. The main function in the (original) C file has to accept these arguments, and turn them into the parameters that are used by the function under analysis. Users need to implement a function format_pc_input in the aforementioned Python file <basename>_input.py. It should take a dictionary, and return a string. Needless to say, the output generated by this function should match the way the program’s main function parses console arguments.
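As a sketch of what the two sides might look like for the summing function used earlier: if format_pc_input produces the string "<n> <data values...>", the corresponding main could be written as follows (illustrative only; the argument layout is entirely the user's own choice):

    #include <stdlib.h>

    int pathcrawler_func(int n, int data[]);   /* the code under analysis */

    int main(int argc, char *argv[]) {
        int n = atoi(argv[1]);                 /* first argument: n */
        int *data = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++)            /* remaining arguments: data */
            data[i] = atoi(argv[2 + i]);
        pathcrawler_func(n, data);
        free(data);
        return 0;
    }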

4.3 Cross-compilation

When a program is executed under normal conditions, there is no direct way to track exactly which instructions are being executed by the CPU. To be able to extract execution traces, the software has to be run in a simulator, capable of executing it while keeping track of this information. The simulator used for this project runs the program in a simulated environment, which is different from the actual physical hardware present. To be able to run programs in this different environment, the source code needs to be compiled differently. This is known as cross-compiling: compiling software for a hardware environment that is not the same as the hardware present in the system used to run the compiler. For this project, a GCC (Gnu Compiler Collection, [9]) cross-compiler was used, targeting the ARM architecture used in our simulator. The ARM architecture was chosen because it is commonly used in embedded systems. The cross-compiler was always run with the -O0 option, which turns off all optimisations. This is a common practice when compiling for embedded systems: compiler optimisations modify the code, causing it to behave differently from the original. Our code analysis, performed on the original code, would then no longer apply, and the program's behaviour could no longer be trusted.
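As an illustration, an invocation might look like arm-linux-gnueabi-gcc -O0 -static -o test1 test1.c; both the compiler name and the need for static linking depend on the installation and on the simulated environment, so this example should be taken as an assumption on our part rather than the exact command used.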

4.4 Generating the execution traces

The simulator used to generate the execution traces was Gem5 [2]. Gem5 is capable of running programs on virtual hardware configurations, including ones that are typical for embedded systems. During simulations, it also keeps track of a large quantity of statistics, describing the program's behaviour in great detail, and it can generate full execution traces of processor-level instructions if desired.


By default, the generated statistics and execution traces include some overhead from starting the simulator and loading the program. This may not be desirable, so Gem5 provides a C interface, which can be used to define exactly at what points (in the C code) Gem5 should start and stop its measurements.
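A sketch of how this interface could be used (hedged: the header name, its location and the exact function signatures depend on the Gem5 version in use; m5_reset_stats and m5_dump_stats are the conventional pair):

    #include "m5ops.h"        /* Gem5's m5ops interface; the path and the
                                 required linking are installation-dependent */

    extern void work(void);   /* hypothetical code under measurement */

    int main(void) {
        m5_reset_stats(0, 0); /* start here: zero all statistics */
        work();               /* only this region contributes to the stats */
        m5_dump_stats(0, 0);  /* stop: write the statistics out */
        return 0;
    }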

4.5 Collecting and processing the results

When Gem5 has finished running the application, the Python script reads all relevant data (as defined by the user, see section 4.1) from the generated statistics file, and stores it internally. This data is then added to the ‘current’ Pareto Front (the Pareto Front of all data that has been seen so far). If the statistics of a certain run are not part of the current Pareto Front, they are discarded immediately; otherwise they are added. If they are added, and they ‘dominate’ the statistics of an earlier run, both the new and the old (now obsolete) statistics are kept. Obsolete entries are not removed on the fly, because that would be unnecessarily expensive, especially if the Pareto Front is large. When the script is done running simulations, the Pareto Front is ‘filtered’ to remove those obsolete entries, before it is presented to the user.
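The final filtering pass then amounts to a single dominance scan over the collected entries. A sketch in C (the script itself does this in Python; dominates is the helper sketched in chapter 1):

    #include <stddef.h>

    int dominates(const double *a, const double *b, size_t dims);

    /* Copy the non-dominated entries of runs[] into front[] and return
       how many survive; runs[i] points to the dims statistics of run i. */
    size_t filter_front(double *runs[], size_t count,
                        double *front[], size_t dims) {
        size_t kept = 0;
        for (size_t i = 0; i < count; i++) {
            int dominated = 0;
            for (size_t j = 0; j < count && !dominated; j++)
                if (j != i && dominates(runs[j], runs[i], dims))
                    dominated = 1;
            if (!dominated)
                front[kept++] = runs[i];   /* run i is a corner case */
        }
        return kept;
    }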


5 Evaluation

The main aim of this project was building a script/tool chain that makes it easier and faster for a user to execute the process described in the previous chapters. Here follows an evaluation of this tool chain, explaining its strengths and weaknesses.

During the development process, tests and experiments were done using self-written programs and applications from the TACLe benchmark collection [8]. Initially, the plan was to use more programs from the TACLe set than were eventually used; this turned out to be difficult, because every TACLe program had to be rewritten completely to be compatible with PathCrawler.

5.1 Accuracy

5.1.1 Accuracy of finding a program's corner cases

The method used to estimate the corner cases of a program was based on path coverage. As discussed in subsection 2.2.2, the accuracy and usability of this method largely depends on the type of programs it is used for. But even if we assume that the application under analysis meets the criteria described in that section, we still need to examine the accuracy and precision of this approach. In order to do so, a program was written of which the corner cases were known beforehand. The program was designed to have 10 execution paths, and every execution path was designed in such a way that it was part of the Pareto Front of corner cases. This program was analysed by the tool chain, and the results are shown in Figure 5.1.


Figure 5.1: The results (a Pareto Front) of using the script on the test program. Note: some bars may look equal to each other, but they represent slightly different values.

As shown by the figure, PathCrawler did indeed generate inputs that covered all 10 execution paths. The script returned the statistics of all of them, since none of them ‘dominate’ any of the others, meaning all paths are part of the Pareto Front. (It should be noted that some of the differences between the statistics are very small; values that appear to be equal to each other, are, in fact, slightly different.) To verify whether or not the tool chain performs consistently, this experiment was repeated multiple times, and the results were the same for every run. The same consistency was also seen when analysing other programs. This demonstrates that PathCrawler successfully finds inputs for all execution paths.

One interesting thing to note with regard to this experiment is that most tested programs (both TACLe benchmarks and self-written files) tend to have only one corner case, rather than multiple (let alone 10) - at least when considering the metrics used in this particular experiment. This can be seen as a confirmation of the simplicity of the tested software; this simplicity also applies to most embedded software, as discussed in subsection 2.2.2.

5.2 Ease of use

5.2.1 The user interface

The script is invoked from the command line, and accepts a number of options, including the following:

• -o OUTPUT: Write results to a file, rather than to the console.

• -p, --pathcrawler: Set this flag to use PathCrawler for generating inputs, instead of a user-defined function.

• -f PATHCRAWLER_INPUTFILE: Used to provide the separate source file to be evaluated by PathCrawler.

• -n NRUNS: If a custom function is used to generate inputs, instead of PathCrawler, use this to set the number of runs.

• -l LOOPPATHLIMIT: When using PathCrawler, this limits the number of times PathCrawler will enter a loop in the code. Limiting this can be important to prevent PathCrawler from generating inputs that result in lots of fairly useless execution paths. It can be set to -1 to have no limit at all.

• -c CFLAGS: Used to pass flags to the cross-compiler. An example use would be to select the correct C standard for a certain file. This is not used by PathCrawler, only by the cross-compiler when compiling the original source code.

An example run of the script could look like this:

python script.py c_files/test1.c -p -f c_files/test1_pc.c -c="-std=gnu99" -l 2

This command would use test1.c as the original source file, and use PathCrawler to generate inputs. The file that will be analysed by PathCrawler will be test1_pc.c, the number of loop iterations for PathCrawler is 2, and the C standard used to compile test1.c is gnu99.

5.2.2 The environment

The tool chain built during this project depends heavily on its environment. The cross-compiler, PathCrawler and Gem5 need to be installed before the script can be executed. Also, when these tools are installed, the commands used to invoke them need to be set/modified, because the names and locations of files and programs may vary between systems. This can be done by changing certain global variables in the script.

5.3 The script

5.3.1 Robustness

Making software robust should always be a key aspect of its development process. In this case, the individual steps of the tool chain have to succeed in order for the script to be able to continue; if they crash, the script should print an error message and terminate. With that in mind, the script is about as robust as it can be. Most error messages that are printed are the ones generated by Python itself when an error is encountered, because those contain the exact information that the user should know.

An exception to this rule is the process of running the program in Gem5. It is possible that some Gem5 runs may crash, while others do not, depending on the input values used. In the case of a crash, an error message will be printed, and the statistics of the run will be ignored (because they will not be available), but the script will not terminate. Instead, it will keep trying to run the program with the remaining input-value combinations, collecting statistics when possible, and produce results based only on the statistics it obtained from the successful runs.

One potential point of failure is the interaction with PathCrawler. When the script modifies PathCrawler's parameters file, or extracts the generated inputs, it expects to find this data in specific locations and in a specific syntax. In theory, it is possible that if PathCrawler produces something completely unexpected, the script will crash at a later stage, or (in hypothetical, and highly unlikely circumstances) return meaningless and flawed results. No syntax check is performed on what PathCrawler produces, because that would be far too complicated a routine to implement. However, at no point during the testing of the tool chain did PathCrawler return the kind of nonsensical results that would cause these problems to occur.

5.3.2 Portability

As discussed earlier, the script heavily depends on its environment. This means that portability is very limited, because users need to install the third party tools themselves. For some of these tools, the installation processes are not trivial, and may vary across systems. Automating these installations was therefore also far from trivial, and this functionality is not included in the script.

It should also be noted that the script is only compatible with Linux operating systems, and will not work on Windows or Mac OS X. One reason for this is that it uses Linux commands to communicate with the OS, in order to start the various external processes. Another reason is that PathCrawler also only runs on Linux. If the tool chain has to be run on a Windows machine, this can be done by running a virtual machine with a Linux OS.

5.3.3 Modularity

The script is divided into multiple modules, each performing a different task:

• The script itself, which reads user input, calls the other modules and collects/prints the results.

• The ParetoFront class, used to store and maintain the Pareto Front.

• The input module, used to communicate with PathCrawler or user-defined input functions. It either calls a user-defined function, or the pc_handler module. The latter invokes PathCrawler to analyse the source code and generate the resulting input-value combinations; it, in turn, calls the following other modules:

  – define_array_limits, used to extract the user input for PathCrawler regarding input-array dimensions and domains. This module also modifies the default parameters file.

  – extract_pc_args, which reads and interprets the input combinations generated by PathCrawler, and stores this data in a Python dictionary.

This division across multiple modules means the code is nicely structured, and therefore easy to read or change. Also, the individual modules could potentially be used for other applications, although their use cases are fairly specific to this project.

5.4 Limits of the tool chain

5.4.1 Limits imposed by PathCrawler

PathCrawler imposes a number of restrictions on the source files it can analyse. Among them:

• Reading data from the console or external files is not permitted.

• Passing multidimensional arrays to functions can only be done using pointer notation, except for the first dimension:

    int func(int **my_array[]) { /* ... */ }  // For a 3D array.

• All pointers are treated as arrays, regardless of what they actually point to.

• Allocating arrays of variable size, using the array notation, results in PathCrawler generating a compile-time error, even if the cross-compiler does not. Using malloc to allocate the array still works, however:

    void test(int x) {
        int a[x];                          // Does not work
        int *b = malloc(x * sizeof(int));  // Works
        (...)
    }

Given the type of applications this project was focused on (as discussed in subsection 2.2.2), many of these restrictions are not detrimental to the tool chain's functioning. Some of PathCrawler's restrictions overlap with standards that are commonly used when writing C applications for embedded systems. An example of this is MISRA C, a set of rules and guidelines published by the Motor Industry Software Reliability Association [15], which is used when developing software for electronic components in the automotive industry. For example, MISRA C also forbids the use of many standard library functions, including the full stdio.h library.

Overall, its restrictions do have a severe impact on the type of files PathCrawler can analyse. Adapting a source file to be compatible with PathCrawler may require a lot of code rewriting - most of which cannot be automated easily (more on that subject in subsection 7.2.2).

Another issue with PathCrawler is that it treats all global variables as input variables. In practice, this was by far the most problematic aspect of its behaviour, because many (if not all) test files in the TACLe benchmark use global variables. It should be noted that later versions of PathCrawler accept a flag that turns this behaviour off, but this option was not yet available in the version used for this project.

6 Conclusions

6.1 The goal

The goal of this project was to create a tool that automatically derives the Pareto front of a program's best-case or worst-case corner cases with respect to one or more predefined metrics, and returns the execution traces corresponding to these corner cases. By splitting up this process into several parts, using third party tools for the individual steps and connecting these tools in a Python program, this has been achieved. Within the constraints outlined in subsection 2.2.2 and section 5.4, the script meets the expectations.

6.2 Use cases

The tool chain is useful in situations in which there is a need for calculating the corner cases of software, and/or the execution traces related to these corner cases. Given the effort it takes to set up its environment, it is especially beneficial if these situations occur frequently - otherwise, other solutions might be more time-efficient.


7 Future work

The tool chain can certainly be called functional, and when a user needs the functionality it provides, it can save them a significant amount of time. However, there are certain improvements that could be made to make it more useful. In particular, we detail three improvements for future work: automating the installation process, lowering the amount of work a user needs to put in to run the script on a certain file, and making the script less restrictive in its use. In this chapter we will propose specific improvements, and discuss why they have not yet been developed.

7.1 Automating the installation process

Currently, users need to install all third party tools manually, which can be cumbersome. Automating this process by means of an installation script would definitely enhance the user experience. However, this is not necessarily trivial.

First of all, one of the most important tools, PathCrawler, is not freely available. On top of that, the installation process of this tool varies, depending on what kind of distribution is provided. For instance, the version of PathCrawler used for this thesis encapsulates Frama-C, a framework for C program analysis. Because of these factors, and because the manual installation itself is actually quite trivial, it does not make much sense to try to automate PathCrawler's installation process.

Installing a suitable cross-compiler is not trivial either. For certain Linux distributions, packages from the standard Apt package management system can be used, making it easy to get a cross-compiler working. However, for distributions using a relatively new kernel, these cross-compilers do not work - the kernel they target is too new for Gem5's standard ARM environment. To tackle this problem, Crosstool-NG (CT-NG) [5] can be used to build a cross-compiler from scratch. However, this takes a long time, and it is not nearly as easy as installing the aforementioned packages. Perhaps automating this process could be achieved by using a predefined CT-NG configuration file, in combination with calls to the OS for installing and running CT-NG itself.

The remaining software requirements, like Gem5, Python and various packages, should not be difficult to install in an automated fashion.

7.2 Lowering the required user input

Two tasks in particular still require manual effort from the user:

• Argument communication: providing a function to translate a Python dictionary of parameters to a string of console arguments, and ensuring that the main function of the C file parses these arguments properly.

• Providing a separate C file for PathCrawler.

Naturally, it would be beneficial for the user if one or both of these tasks can be automated. However, this is a considerable challenge, as we shall now discuss.

7.2.1 Automating argument communication

To automate this part of the script, the first step would be to create a standard syntax for all possible console arguments, and write a Python function that translates parameter dictionaries to console arguments according to this syntax (from now on called ‘translator’). This would take a significant amount of time, especially for nested argument structures (arrays/structs containing other arrays/structs).

The next part, however, is even more difficult: writing a program (in Python, C or any other suitable language) that generates C code for parsing arguments, according to the aforementioned syntax, combined with information on what parameters the program expects (from now on: 'parser'). This is challenging for several reasons.

One issue is that PathCrawler does not preserve the order of parameters - the input values it generates are just combinations of parameter identifiers (as defined in the function header of pathcrawler_func) and values. This raises the question: in what order should the parameters be passed, in the list of console arguments? A possible solution would be to define a standard rule for this, based on the parameter identifiers themselves (alphabetical order, for instance). Another possibility would be to use keyword console arguments.

The next challenge is the order in which the main function passes the parameters to the function under analysis. This problem could potentially be solved by using keyword arguments within C - C does not support these natively, but it is possible to create similar functionality by using various features introduced with C99 [12]. An alternative solution would be to ‘extract’ information about the order of parameters automatically, by analysing the source code.
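The C99 trick mentioned here combines a struct, designated initialisers and a variadic macro, so that call sites can name their arguments in any order (a sketch under the assumption that this is the approach [12] describes; all names are illustrative):

    #include <stdio.h>

    typedef struct { int n; const int *data; } run_args;

    static int run_impl(run_args a) {
        int sum = 0;
        for (int i = 0; i < a.n; i++)
            sum += a.data[i];
        return sum;
    }

    /* The macro builds the struct from whatever fields the caller names. */
    #define run(...) run_impl((run_args){ __VA_ARGS__ })

    int main(void) {
        int values[] = { 1, 2, 3 };
        printf("%d\n", run(.data = values, .n = 3));  /* order-independent */
        return 0;
    }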

Another, related issue is dealing with parameter types. The parser would need to know these types beforehand, to use the correct routines for parsing and storing them. This information would also need to be extracted from the source code, in some automated fashion.

One final thing to note is that PathCrawler treats input struct parameters rather oddly: in the 'test case' files it generates, they are defined using the same syntax as arrays, and their members' identifiers are omitted completely. However, for real arrays, PathCrawler also defines their dimensions in a separate section of the file, which it does not do for structs. In other words, PathCrawler treats structs like arrays without predefined dimensions. What makes this behaviour even more peculiar is that nested structs (struct members that are structs themselves) are treated differently: their values are declared in a different syntax altogether, one that makes much more sense. This strange difference makes it very hard to parse structs in an automated fashion.

7.2.2 Automating code rewrites

As noted in section 5.4, adapting a source file for PathCrawler may require extensive rewriting. The modifications that need to be made can be divided into two categories: syntactic and logic modifications. Syntactic modifications are changes that do not affect the behaviour or functionality of the program. Examples of these include rewriting array declarations and parameters, removing pragmas and changing float variables into doubles (since PathCrawler has issues with treating floats). These modifications are relatively easy to automate, because they are context-independent: they can be made without knowing what the program under analysis does, or how. Logic modifications are changes that do affect a program's behaviour. These include adjustments like removing/replacing calls to functions from standard C libraries, moving functional code to pathcrawler_func and removing global variables while replacing them with function parameters. These modifications are much harder (if not impossible) to automate, because they are not context-independent: the exact changes that need to be made depend on what a user wants the program to do.

To summarise, logic modifications require a programmer to think about the problem at hand, while syntactic modifications do not. Unfortunately, these two types of modifications are not really independent of each other. Rewriting a program is one coherent process; splitting it up into two steps (and automating one of them) usually does not make much sense. Unfortunately this means that, as long as these intensive code rewrites are necessary, users will have to perform them manually.

7.3 Making the script less restrictive

As explained in section 5.4, the main cause of the script’s restrictions is PathCrawler. This means that if we want to remove/alleviate these restrictions, we should focus on that tool. There are two possible solutions for this problem: using/waiting for a newer version of PathCrawler, or trying to find a different tool.

For this project, an older PathCrawler version was provided. This version is, according to its developers, almost certainly more robust than the newest version. However, the latter does provide more functionality: it handles multidimensional arrays better, and is capable of dealing with recursive functions, to name two examples. It also comes with an option that prevents PathCrawler from treating global variables as input variables. These benefits are very significant, and would improve the tool chain's usability considerably - although users will have to accept the possibility of PathCrawler being less stable and robust. Of course, since PathCrawler is still in development, newer versions might be released in the future, which could be even better than the current newest version.

If even the latest PathCrawler versions do not suit a user’s needs, the only remaining option is to find an alternative tool for reducing a program’s input-value space. With enough time/people, one might try the solution that was already briefly attempted during this project: modifying a tool like SWEET, as described in section 3.1. Alternatively, different path coverage tools could be tried, such as CAUT [3], CREST [4] or KLEE [13]. Like PathCrawler, these tools are aimed at generating test input values, based on ‘covering’ the source code.


Bibliography

[1] Argparse Library. url: https://docs.python.org/2/library/argparse.html#module-argparse.

[2] Nathan Binkert et al. "The Gem5 Simulator". In: SIGARCH Comput. Archit. News 39.2 (Aug. 2011), pp. 1–7. issn: 0163-5964. doi: 10.1145/2024716.2024718. url: http://doi.acm.org/10.1145/2024716.2024718.

[3] CAUT. url: https://github.com/tingsu/caut-lib.

[4] CREST. url: https://github.com/jburnim/crest.

[5] Crosstool-NG. url: http://crosstool-ng.github.io/.

[6] Andreas Ermedahl. A Modular Tool Architecture for Worst-Case Execution Time Analysis. 2003.

[7] Andreas Ermedahl et al. "Deriving the Worst-Case Execution Time Input Values". In: Proceedings of the 2009 21st Euromicro Conference on Real-Time Systems. ECRTS '09. Washington, DC, USA: IEEE Computer Society, 2009, pp. 45–54. isbn: 978-0-7695-3724-5. doi: 10.1109/ECRTS.2009.32. url: http://dx.doi.org/10.1109/ECRTS.2009.32.

[8] Heiko Falk et al. "TACLeBench: A Benchmark Collection to Support Worst-Case Execution Time Research". In: 16th International Workshop on Worst-Case Execution Time Analysis (WCET 2016). Ed. by Martin Schoeberl. Vol. 55. OpenAccess Series in Informatics (OASIcs). Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2016, 2:1–2:10.

[9] Gnu Compiler Collection. url: http://gcc.gnu.org/.

[10] Jan Gustafsson et al. "Automatic Derivation of Loop Bounds and Infeasible Paths for WCET Analysis Using Abstract Execution". In: RTSS.

[11] Reinhold Heckmann et al. "Worst-Case Execution Time Prediction by Static Program Analysis". In: IEEE International Symposium on Parallel and Distributed Processing. 2004.

[12] Keyword arguments in C. url: https://www.darkcoding.net/software/keyword-arguments-in-c/.

[13] KLEE. url: http://klee.github.io/.

[14] Peter Marwedel. Embedded System Design: Embedded Systems Foundations of Cyber-Physical Systems. 2013. url: http://ls12-www.cs.tu-dortmund.de/daes/en/daes/mitarbeiter/prof-dr-peter-marwedel/embedded-system-text-book/slides/slides-2013.html.

[15] MISRA. url: http://misra.org.uk/.

[16] Pareto efficiency on Wikipedia. url: https://en.wikipedia.org/wiki/Pareto_efficiency.

[17] PathCrawler. url: http://frama-c.com/pathcrawler.html.

[18] Python. url: http://www.python.org/.

[19] Nicky Williams et al. "PathCrawler: Automatic Generation of Path Tests by Combining Static and Dynamic Analysis". In: Dependable Computing - EDCC 5: 5th European Dependable Computing Conference, Budapest, Hungary, April 20-22, 2005. Proceedings. Ed. by Mario Dal Cin, Mohamed Kaâniche, and András Pataricza. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 281–292. isbn: 978-3-540-32019-7. doi: 10.1007/11408901_21. url: http://dx.doi.org/10.1007/11408901_21.
