From concurrent to sequential Go programs


Bachelor Informatica


Tijs de Vries

June 15, 2020

Informatica, Universiteit van Amsterdam


Abstract

The Go programming language provides a lightweight alternative to threads in the form of goroutines. Besides being lightweight, they are also first-class citizens in the programming language. Therefore, it is rather easy to create concurrent programs in Go. However, concurrent programs are difficult to test due to the many possible different orders of execution. In this project, we design a framework that transforms concurrent Go programs into sequential ones, and create a prototype of this framework, called GoSquash. Sequentialising with GoSquash results in a multitude of Go programs where goroutines have been eliminated; these programs can be tested in a deterministic manner.

We validate the correctness of the results empirically with a set of five programs, and demonstrate that GoSquash is able to sequentialise concurrent programs into sequential ones that generate correct output. It works for many simple cases, but support of more Go features should be implemented to be useful for more complex applications.


Introduction

During the "Programmeertalen" course in the first year of the bachelor "Informatica" at the University of Amsterdam, one of the programming languages explored is Go. The primary reason for discussing Go is to explore the concept of concurrency. Goroutines, the alternative to threads that Go provides, are first-class citizens, and thus enable a more natural model of concurrency. This makes Go a great option compared to other languages like C or C++, where the use of libraries like pthreads [7], TBB [10], or OpenMP [1] is required to achieve concurrency.

1.1 Problem statement

As useful as concurrency is, one of its well-known challenges is testing concurrent programs for correctness [12]. Concurrent programs are inherently non-deterministic, so the outcome may not always be the same. For example, for a program with two threads performing separate tasks, one execution can observe thread 1 finishing first, while another execution can observe thread 1 finishing last. Of course, such behaviour can alter the program execution further, and result in different outcomes later on in the program.

This non-deterministic behaviour of concurrent programs makes their empirical testing for detecting concurrency-related errors difficult, because such errors only manifest themselves at random. This means that passing a test on a concurrent program in one run on one machine does not automatically mean that the code will pass the same test on other machines; in fact, it might not even pass again on the same machine.

One possible way to make tests more reliable is to transform concurrent programs into sequential ones. That way, there are no threads that perform their tasks in different orders. However, it is the test that will now need to take all different cases into account, which means that one concurrent program is very likely to result in multiple, slightly different, yet (possibly) correct sequential programs with equivalent functionality.

The goal in this project is to simplify the testing of concurrent Go programs through sequentialisation. To this end, we aim to design and implement a framework that transforms concurrent Go programs into sequential ones, thus ensuring deterministic behaviour, at the cost of having multiple possible sequential variants for the same Go program. All resulting sequential versions must preserve the original functionality of the Go program. The only change in behaviour should be the fixed order of execution; thus, none of the sequential versions should cause outcomes that the concurrent version cannot possibly generate, nor should the set of sequential programs leave out any possible outcomes of the original.


1.2 Research questions and approach

To address our goal to improve the effectiveness of tests for concurrent Go programs, we aim to answer the following research question: How can we systematically transform concurrent Go programs into sequential ones, while maintaining their functionality?

To formulate an answer to this question, there are a few specific challenges that need solving. The solutions to these questions will form the basis of our framework:

How can we make the concurrent-sequential transition in a systematic way?

Our goal is to build a systematic method to translate concurrent Go programs into sequential ones. This method should be generic (i.e., applicable for any Go program) and produce a sequential output (i.e., show no concurrent behaviour). This method will form the foundation of the transition framework.

How can we verify transition correctness?

To be able to use the framework to test a concurrent program by testing its sequential version(s), all the sequential versions’ outcomes need to be a possible outcome of the concurrent program. It needs to be possible to confirm that this is the case.

What are representative applications for testing?

To test the applicability of our framework, we must provide a set of representative test applications. We will select those from existing examples and/or benchmarks, ensuring the tests cover the selected concurrent primitives.

1.3 Ethics and impact

We made the framework open-source at https://gitlab-fnwi.uva.nl/11026189/go-sequentialiser. This way, all tests can be reproduced in order to verify that the framework can indeed do all the things that are promised.

In terms of impact, GoSquash, and potential projects that build upon the framework, make it easier to test and debug concurrent Go programs and find out in advance what kind of outputs one might expect from their program. This should increase productivity and allow programmers to find potential bugs in their code in advance, saving some time by knowing something is wrong before the bug occurs in practice.

1.4 Thesis outline

In Chapter 2 we identify the concurrency primitives in Go, and mention some interesting behaviour in the Go scheduler. Some related work is also summarised. Chapter 3 introduces the design of GoSquash, and goes into the steps necessary to produce all sequential versions of a concurrent input program. In Chapter 4 the tool is validated empirically, showing that the framework manages to produce sequential programs that give the expected outputs. Finally, we conclude this thesis with our main findings and suggestions for future work in Chapter 5.


CHAPTER 2

Theoretical background and related work

In this chapter we summarise some relevant information on Go and its concurrency primitives, as well as some interesting behaviour in the Go scheduler that needs to be taken into account in the sequentialiser itself.

2.1 Concurrency primitives in Go

Concurrent programs are programs where multiple things are calculated at the same time, for example with the use of threads. Programs that benefit from concurrency, if programmed correctly, are faster than sequential ones, where the program only does one thing at a time.

Go is a programming language that is great for creating these concurrent programs. One of the most important things for this project is to identify and understand the concurrency primitives of Go. This section serves to point out which aspects of Go classify as concurrency primitives and give some background information on them.

2.1.1 Goroutines

As mentioned in the introduction, Go's way of making programs concurrent is through the use of so-called goroutines [2]. Goroutines are not the same as threads in, for example, Java: they are not scheduled by the operating system. Instead, the Go scheduler multiplexes goroutines onto OS threads.
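As a minimal illustration (not taken from GoSquash or its test set), the sketch below starts a few goroutines with the go keyword and waits for them with sync.WaitGroup. The function name runWorkers and its signature are assumptions made for this example. The order in which the goroutines record their id is not fixed and may differ between runs.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n goroutines and returns the order in which they
// recorded their id. Name and signature are illustrative assumptions.
func runWorkers(n int) []int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		order []int
	)
	for id := 0; id < n; id++ {
		wg.Add(1)
		// id is passed as an argument, so every goroutine gets its own copy.
		go func(id int) {
			defer wg.Done()
			mu.Lock()
			order = append(order, id)
			mu.Unlock()
		}(id)
	}
	wg.Wait() // block until every goroutine has called Done
	return order
}

func main() {
	// Prints some permutation of 0, 1, 2; the order is not deterministic.
	fmt.Println(runWorkers(3))
}
```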

2.1.2 Channels

Go encourages programmers not to communicate by sharing memory, but instead to share memory by communicating [3]. Go achieves this communication through channels, which goroutines can send data into and receive data from.

There are two types of channels in Go, which change the behaviour of the sending goroutine: buffered and unbuffered.

Unbuffered channels are blocking, meaning that sending data into one will halt the further execution of the rest of the current goroutine until it is retrieved in another goroutine. These channels can be useful to keep goroutines synchronised.

When sending values into a buffered channel, the sending goroutine will just continue executing the rest of its tasks, not caring whether the value is retrieved elsewhere. If the channel's buffer is full, however, the sender will block further execution, as if it were using an unbuffered channel, until the buffer has space available.


A goroutine that attempts to receive data from a channel will block until data becomes available, regardless of whether the channel is buffered or unbuffered.
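The difference between the two channel types can be probed with a small sketch. The helper sendWithoutReceiver is hypothetical; it uses a select with a default case (see Section 2.1.3) to check whether a send could complete immediately, without any other goroutine receiving.

```go
package main

import "fmt"

// sendWithoutReceiver reports whether a send on ch completes immediately,
// that is, without another goroutine receiving. Illustrative helper only.
func sendWithoutReceiver(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default: // the send would block
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 1)

	fmt.Println(sendWithoutReceiver(unbuffered, 1)) // false: no receiver is waiting
	fmt.Println(sendWithoutReceiver(buffered, 1))   // true: the buffer has room
	fmt.Println(sendWithoutReceiver(buffered, 2))   // false: the buffer is full
	fmt.Println(<-buffered)                         // 1
}
```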

2.1.3 Selects

In case there are multiple channels that a programmer expects data from, Go provides something called a select construct. This construct looks like a regular switch statement, but it is meant to interact with the multiple channels that the programmer may have set up. Unless a default case is specified, select will block until one of its cases can be run. If multiple cases are ready, one is chosen at random.
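A short sketch of this behaviour follows; the function firstReady and the channel names are assumptions made for the example. With a default case the select never blocks, and when both channels hold a value either case may be taken.

```go
package main

import "fmt"

// firstReady returns the name of a channel that has a value ready,
// or "none" if neither has. Names are hypothetical.
func firstReady(a, b chan string) string {
	select {
	case <-a:
		return "a"
	case <-b:
		return "b"
	default:
		return "none"
	}
}

func main() {
	a := make(chan string, 1)
	b := make(chan string, 1)

	fmt.Println(firstReady(a, b)) // none: both channels are empty

	b <- "hello"
	fmt.Println(firstReady(a, b)) // b: only b is ready

	a <- "x"
	b <- "y"
	fmt.Println(firstReady(a, b)) // a or b: both are ready, the choice is random
}
```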

GoSquash does not support selects, and leaves implementation of this for future work.

2.2 The Go scheduler

At runtime, the Go scheduler aims to prevent (some) deadlocks from happening, essentially scheduling ready-to-run tasks for as long as possible. For example, when a goroutine is trying to send a value through a full or unbuffered channel, or when it tries to receive a value from an empty channel, the goroutine will block. When a goroutine blocks, the scheduler decides to switch to another goroutine that still has work to be done. This behaviour makes for some interesting cases to take into account in our framework, which we will discuss in Chapter 3.

2.3 Related work

There are many directions of research that are relevant for this work, including work on serialisation and/or "sequentialisation" for different programming models, concurrent program analysis for causality detection, and methods and tools for detecting concurrency errors. In this section, we discuss a few examples of such related work.

One example of research work that looks at serialisation is presented in [5]. Their goal is similar to that of our project, but the work focuses on programs written in C. They first compile the C program to a BoogiePL program, to get rid of some of the complexities of C, before transforming it into a sequential program. They tested their tool on Windows driver benchmarks, and managed to find a bug in one of them, which has since been fixed.

Similar to [5], the work presented in [9] focuses on transforming multithreaded C programs into sequential ones in order to check programs for race conditions. The sequentialised versions are checked by SLAM, a model checker for sequential C programs. Like [5], they tested their tool on several Windows drivers, and found some race conditions in a few of them. One small issue was that some warnings were benign, and that was left for future work.

Yet another solution to serialise concurrent programs is presented in [4]. The paper describes two possible ways to reduce a concurrent program to a sequential one: the eager and the lazy approach. The eager approach executes threads one by one, guessing values of shared variables when there is supposed to be a context switch. This approach risks outcomes that are not reachable by the concurrent program. The lazy approach will not execute threads one by one, but instead switch contexts. When switching contexts, the shared variables that have been computed until that point are saved and passed on to the next thread, and the local state of the original thread is dropped. The execution continues with the second thread, from scratch, until the next context switch, where the same procedure happens again. If the context switches back to the first thread, this thread will have to be evaluated from the beginning, since the local state is lost. A lot more computing is necessary in the lazy approach compared to the eager one, but this method guarantees that the outcome it reaches is also reachable by the concurrent program.


An example of research that focuses on determining variable causality is presented in [11]. This paper describes a solution to keep track of which variables are causally dependent on other variables. For example, if thread A writes to variable x, and thread B writes to a variable y making use of the value of variable x, we can say that y causally depends on x. We could potentially use this approach as a way to reduce the number of statements to be interleaved within our framework: variables that do not causally depend on variables in another goroutine could potentially be ignored, significantly reducing the output of the sequentialiser.

Finally, an example of research focusing on identifying concurrency errors is presented in [6], where the authors aim to detect deadlocks in Go programs at compile time. Usually, Go can detect communication errors/mismatches such as deadlocks only at runtime, where it can no longer avoid the application crashing. The tool presented in this work enables earlier detection, at compile time. After extracting communication operations as session types, the proposed method converts these sessions into a communicating finite state machine (CFSM) and applies choreography synthesis. If the synthesis succeeds, no communication errors will occur in the program, and therefore no deadlocks are to be expected.


CHAPTER 3

Method

This chapter presents the method we adopt for our systematic sequentialisation of Go programs, and highlights the main implementation challenges and solutions we propose for our current prototype.

3.1 Design

There is a lot of preparation that needs to take place before we can transform a concurrent Go program into all its sequential versions. The way the transformation is done is by interleaving goroutines.

Ideally, prior to interleaving, every goroutine consists of a single basic block. The statements in these basic blocks should accurately reflect the order of execution, in order to be able to generate the sequential versions of the input program. We will call blocks that accurately reflect the order of execution "flat", and the process of transforming goroutines into flat blocks "flattening". Examples of steps necessary to achieve flat blocks are loop unrolling and function inlining.

Another important challenge to take into account is that any actions that may result in multiple possible programs (such as interleaving) should be done as late as possible, to avoid having to repeat actions such as loop unrolling or applying unique variable names to almost the same, but slightly different programs. In most cases, we would be doing the exact same thing multiple times, and that is inefficient: we want to avoid these cases as much as possible.

Figure 3.1 presents the (ideal) design of a complete sequentialiser, and it acts as a blueprint for our first GoSquash prototype. In the following sections, we briefly describe each component of this framework.

GoSquash itself is also programmed in Go, because interleaving can safely make use of goroutines, and also to get more familiar with the language. GoSquash uses the go/ast [8] package to analyse and alter the concurrent program.


3.2 Input and output of the sequentialiser

In order to effectively use GoSquash, it is a good idea to first determine whether sequentialisation is necessary at all. A program might not be concurrent, which means it would not need sequentialisation in the first place.

Some concurrent programs may result in deadlocks. GoSquash will also produce sequential versions with deadlocks, as shown in Section 3.7. Deadlocks in sequential programs can be identified and removed from the output of our framework, but it is a lot harder to determine whether these deadlocks are the programmer's fault, or are correct interleavings that the Go scheduler does not allow (which is explained in more detail in Section 3.7). Programs that risk running into a deadlock may hence not get a representative result.

In order to get the most reliable results, we recommend running the input program through the tool proposed by [6] first. The Go scheduler will only detect deadlocks at runtime, and our framework only detects deadlocks in sequentialised versions. To see if there are any risks of running into a deadlock in the original program, use the tool from [6] before using GoSquash.

Once the concurrent program has been assessed, and it is determined that it can benefit from sequentialisation, running a concurrent program through GoSquash will result in a folder containing all (deemed) possible sequential versions of it.

3.3 Statement transformation

This phase includes all transformations needed to properly flatten the program for further steps. Ideally, every possible type of statement that could impact the order of execution of the program in any way should be taken into account in this step.

3.3.1 For-loops

An important type of statement that must be flattened is the For-loop. The body of the For-loop typically gets executed many times, which means that the For-loop effectively consists of a large number of statements, rather than the single statement it appears to be at first glance.

In order to flatten For-loops, the sequentialiser uses loop unrolling: the body of the For-loop, as well as the post statement, is copied once per iteration.

Loop unrolling prior to runtime requires knowledge of the loop trip count (the number of iterations). However, there are several reasons for this number not to be available (yet):

• The sequentialiser does not evaluate the program, it merely inspects it. This means that it does not keep track of variable values.

• Even if we want to keep track of values of variables relevant to the loop bounds, they could potentially be changed inside the body of the For-loop as well, meaning the For-loop would effectively have to be run in order to actually know the loop bounds.

• The body could contain statements such as break, continue, goto and return. These all interrupt the loop in their own way.


To deal with these complications, there are certain assumptions that we make in order to enable some form of loop unrolling:

• The only expected/accepted type of For-loop is one with an init statement, condition, and post statement. The other forms that For-loops can have, like ranges or single conditions, are not supported.

• The init statement must set a variable to an integer.

• The condition always compares the same variable on the left-hand side to an integer on the right-hand side, with the operator being <, ≤, > or ≥.

• The post statement is always an increment (++) or decrement (--) of the same variable.

• The variable is not written to after the init statement, other than by the post statement.

• No statements such as continue or break occur.
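Under these assumptions the trip count follows directly from the init value, the comparison operator, the bound, and the direction of the post statement. The sketch below shows one way to compute it; tripCount and its parameters are hypothetical and do not describe GoSquash's actual code. Combinations in which the variable moves away from the bound are simply reported as unsupported.

```go
package main

import "fmt"

// tripCount returns the number of iterations of a loop of the form
//
//	for v := init; v <op> bound; v++   (step = +1)
//	for v := init; v <op> bound; v--   (step = -1)
//
// It returns -1 for combinations this sketch does not handle, for example
// an increment combined with ">" (the variable moves away from the bound).
func tripCount(init int, op string, bound int, step int) int {
	var n int
	switch {
	case op == "<" && step == 1:
		n = bound - init
	case op == "<=" && step == 1:
		n = bound - init + 1
	case op == ">" && step == -1:
		n = init - bound
	case op == ">=" && step == -1:
		n = init - bound + 1
	default:
		return -1
	}
	if n < 0 {
		return 0 // the condition is false from the start
	}
	return n
}

func main() {
	fmt.Println(tripCount(1, "<", 3, 1))   // 2
	fmt.Println(tripCount(1, "<=", 3, 1))  // 3
	fmt.Println(tripCount(5, ">=", 0, -1)) // 6
	fmt.Println(tripCount(0, ">", 5, 1))   // -1
}
```

With the trip count known, the unrolled code is the init statement followed by that many copies of the body and the post statement.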

3.3.2 Function calls

Another important challenge to address when flattening the program is inlining function calls. The goal of this transformation is to move the contents of function calls to the same sequence of statements as the rest of the program, thereby having the entire program in a single function, the main function.

Although goroutines execute functions, the framework handles these functions differently, and does not inline them. Functions run as separate goroutines do not have a set order of execution relative to the goroutine they were started in, which means inlining them will not help us in creating a flat basic block. Instead, we interleave goroutines later on in the program to generate all possible orders of execution.

Inlining function calls comes with the potential risk that a function calls itself. If we blindly substitute every function call, our inlining will break for recursive calls, where we would potentially create an infinite program. A possible solution is capping the maximum recursion depth.

3.4 Applying unique variable names

Because For-loops, functions, and conditional statements have their own scopes that have been or will be removed, variable scopes will be altered, too. Some functions may use the same variable names, and interleaving them will result in them interfering with each other.

In order to prevent this from happening, the sequentialiser gives all variables a unique name. Specifically, every distinct variable gets its own number appended to its name. For example, a variable originally named foo may be renamed to foo_0. The counter does not care about the variable name, so the next variable bar would be renamed bar_1 and not bar_0, even though it is the first bar in the program.

In the go/ast package, variables can be identified by where they are declared. Variables that share the same name and place of declaration are the same, hence are given the same unique name. If a variable originally had the same name as another variable, but was declared elsewhere, the new, unique, variable name will be different from its lookalike. For example, a variable named foo that originates from another scope than foo_0 may be renamed to foo_1.


One example where an unrolled For-loop would interfere with the main function without unique variable names would be:

Input program:

    package main

    import "fmt"

    func main() {
        foo := 0
        for i := 1; i < 3; i++ {
            foo := i
            fmt.Println(foo)
        }
        fmt.Println(foo)
    }

Output:

    1
    2
    0

Flattened program (no unique variables):

    package main

    import "fmt"

    func main() {
        foo := 0
        i := 1
        foo := i
        fmt.Println(foo)
        i++
        foo := i
        fmt.Println(foo)
        i++
        fmt.Println(foo)
    }

Output:

    1
    2
    3

Flattened program (with unique variables):

    package main

    import "fmt"

    func main() {
        foo_0 := 0
        i_1 := 1
        foo_2 := i_1
        fmt.Println(foo_2)
        i_1++
        foo_3 := i_1
        fmt.Println(foo_3)
        i_1++
        fmt.Println(foo_0)
    }

Output:

    1
    2
    0

A problem with this approach, however, is that, for example, unrolling For-loops would result in the exact same body multiple times over. This would result in iterations interfering with each other, because the copied variables would still point to the same original declaration, even if that declaration was copied as well. To prevent such interference, we need to make sure that, if a variable declaration is copied, usage of the variable in other copied statements makes use of the new variable declaration, rather than the original.


3.5 Removing conditionals

Conditionals result in different possible ways of executing the program. Only one of the two blocks will be relevant to one specific execution. We note that when addressing conditional statements, we assume that the rest of the program has already been flattened: For-loops have been unrolled and function calls have been inlined, so all that remains are Go-statements and conditionals.

As nothing else will change in the flattened concurrent program at this point, we can simply look at conditionals as generating two possible ways to execute a program: the if branch and the else branch.

Thus, to remove conditionals, we can export two programs: one containing the If-block, and one containing the Else-block. Removing the conditionals resulting in multiple programs is also the reason this operation is not done when the other statements are transformed, but later.

3.6 Interleaving Goroutines

The first step to finding all possible executions of the program is to simply interleave the code passed through to a Go statement with the rest of the program that comes after the Go statement. Just interleaving the two sequences of statements as they are upon finding them is not going to be enough, however: if either of these sequences contains yet another Go statement, this means that they, individually, also have different possible orders of execution.

In order to handle new goroutines properly, we evaluate them the same way as the entire program, as seen in Algorithm 1. The main function itself is a goroutine, after all. Both the evaluation of the goroutine and the rest of the program that comes after the Go-statement will result in a list of possible orders of execution for that particular part of the program. Every one of the possible orders of execution of one list needs to be interleaved with every possibility of the other to find all possible orders of execution of the program.

    Data: goRoutine // list of statements in the current goroutine
    for i, currentStmt in goRoutine do
        if currentStmt is a goStmt then
            newGoroutinePossibilities = interleaveOnGoStmt(currentStmt);
            allPossibilities = new list;
            if currentStmt is last statement in sequence then
                allPossibilities = newGoroutinePossibilities;
            else
                restOfCurrentGoroutinePossibilities = interleaveOnGoStmt(goRoutine[i+1:]);
                for newGoroutinePossibility in newGoroutinePossibilities do
                    for restOfCurrentGoroutinePossibility in restOfCurrentGoroutinePossibilities do
                        allInterleavings = interleaveAllPossibilities(newGoroutinePossibility, restOfCurrentGoroutinePossibility);
                        append(allPossibilities, allInterleavings);
                    end
                end
            end
            return allPossibilities
        end
    end

Algorithm 1: Function interleaveOnGoStmt (goRoutine) - Interleave on Go-statement encounter


Note that the order in which statements occur within a single scope needs to be preserved. This is one aspect of relevant causality, which we discuss in more detail in Section 3.8.

3.7 Handling channels

Channels are a useful tool to communicate between Goroutines, but interacting with one can result in the goroutine having to wait for another goroutine to interact with the channel. After sequentialising the concurrent program, we only have one single goroutine, meaning that once the program is blocking, it will always result in a deadlock.

For example, given a simple program with the following main function:

    func main() {
        channel := make(chan int)
        go func() {
            channel <- 2
        }()
        a := <-channel
    }

There are two possible interleavings for this program. One example of a possible interleaving would be this main function:

    func main() {
        channel := make(chan int)
        a := <-channel
        channel <- 2
    }

Although the example above is a correct interleaving, it will result in a deadlock: it tries to read from the channel before it contains any values. The Go scheduler will attempt to switch to another Goroutine since this one is blocking. In this simple sequentialised program, however, there are none. As such, this interleaving will result in a deadlock, although the original program would not have.

With the other possible interleaving, we see another problem:

    func main() {
        channel := make(chan int)
        channel <- 2
        a := <-channel
    }

The channel in the above example is unbuffered, so sending a value through it will cause the current goroutine to block, and the scheduler to attempt to switch to another goroutine. However, as mentioned in the last example, there are no other goroutines, because this is a sequentialised version.

A solution to this is to artificially set the buffer size to 1 for unbuffered channels. This allows other statements that originally were in another goroutine to still be run after the send.
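Applied to the second interleaving, the adjustment looks as follows. The wrapper function sequentialised is an assumption added so that the received value can be returned and printed.

```go
package main

import "fmt"

// sequentialised mimics the second interleaving, with the channel's
// buffer size artificially set to 1.
func sequentialised() int {
	channel := make(chan int, 1) // was: make(chan int)
	channel <- 2                 // does not block: the buffer has room
	a := <-channel               // the value is already waiting in the buffer
	return a
}

func main() {
	fmt.Println(sequentialised()) // 2
}
```

The first interleaving, which receives before anything has been sent, still blocks with a buffer of size 1, so it remains a deadlock in the sequential version.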


3.8 Determining causal dependency and relevance

A big problem with finding all possible sequential versions of a concurrent program is state explosion. The number of possibilities grows exponentially with every extra line in the program, and with every goroutine introduced. This results in an unmanageable set of sequential programs.

The explosion of state can be managed by (1) excluding irrelevant statements from interleaving, and (2) reducing interleaving by using causal dependency.

Irrelevant statements are statements that only affect the state of the local goroutine. For example, a statement that initialises a local variable with a constant value does not affect other goroutines, so interleaving it with other goroutines is useless. Therefore, such an irrelevant statement can be coupled with the next statement. The interleaving is then done with the coupled statements, instead of the two statements separately, resulting in a lower number of total interleavings.

Causal dependency can be used to identify blocks of statements that cannot be interleaved. For example, two goroutines communicating over unbuffered channels form a causal dependency, because the statement after the write depends on the read and the statement after the read depends on the write. Therefore, the statements before the read cannot be interleaved with the statements after the write, and the same holds for the statements before the write and after the read. Applying causal dependency therefore results in fewer interleavings.

However, implementing this solution in our prototype is left for future work, as it relies on more static analysis than we currently have available. A method for determining causal dependency is given in [11] and should form the basis for the implementation.


CHAPTER 4

Empirical validation

In order to validate GoSquash, we present four example programs, aiming to test different aspects of the framework and/or interesting case studies from the users' perspective. We also present one example of a sequentialised version for each of these programs. For each program we only show the parts that may change, so package names and imports are left out.

The tests are run with Go version 1.13.4 on Ubuntu 18.04.4 LTS.

4.1 Validation goals

The following features of GoSquash are validated:

1. Loop unrolling.
2. Channels not breaking a sequential program.
3. Goroutine interleaving.
4. Correctly merged scopes (unique variable names).
5. Removed conditionals.

The steps we go through for these tests are:

1. Run the original program 10 times, collecting the execution times and results.
2. Sequentialise the program.
3. Run every sequential output version 10 times, again collecting execution times and results.
4. Compare the minimum, average and maximum execution times.
5. Compare the code size, measured in bytes using the ls -l command in bash.
6. Compare the execution results.


4.2 Program 1

Program 4.1 showcases features 1 and 4 as described in Section 4.1. The program prints the squares of all numbers from 1 to 3. There are no other goroutines in this program, which means there is only one "sequentialised" version.

Listing 4.1: test_program1.go

    package main

    import (
        "fmt"
    )

    func main() {
        for i := 1; i <= 3; i++ {
            value := i * i
            fmt.Println(value)
        }
    }

Results

After the transformation, the resulting sequential program is:

    func main() {
        i_0 := 1
        value_1 := i_0 * i_0
        fmt.Println(value_1)
        i_0++
        value_2 := i_0 * i_0
        fmt.Println(value_2)
        i_0++
        value_3 := i_0 * i_0
        fmt.Println(value_3)
        i_0++
    }

The output of the sequentialised program matches that of the original. In terms of execution statistics, we present the collected data in Table 4.1. We observe that the execution time of the sequential code is very stable. Moreover, we also point out that, as expected, the code of the sequentialised version has increased in size.

                               Original    Sequentialised
    Number of programs         1           1
    Correct results            1           1
    Min. execution time (s)    0.384       0.389
    Max. execution time (s)    0.558       0.473
    Avg. execution time (s)    0.445       0.448
    Code size (B)              117         213

Table 4.1: Test results of program 1


4.3 Program 2

Program 4.2 showcases features 1, 3 and 4 as described in Section 4.1. The program starts two goroutines using a For-loop and prints 42 + i. Notice how i is not a local variable in the goroutine, but rather a shared one. The value of i used here depends on what its value is when the statement in the goroutine is executed, not when the goroutine was started. This means that this value could also be out of bounds. When i equals 2, the For-loop stops, but the goroutines may still be running. Hence, we expect 7 different outputs: all pairs of 42, 43 and 44. This excludes the pair (42,42), because by the time the second goroutine is started, i will never be 0 anymore.

Listing 4.2: test_program2.go

    func main() {
        var num int = 42
        for i := 0; i < 2; i++ {
            go func() {
                localNum := num + i
                fmt.Println(localNum)
            }()
        }
    }
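The late binding of i in Listing 4.2 can be reproduced deterministically without goroutines. The following is a minimal sketch and not part of GoSquash; the function name lateRead is an assumption made for illustration. It creates a closure over a shared variable, changes the variable, and only then runs the closure.

```go
package main

import "fmt"

// lateRead builds a closure over the shared variable i, changes i
// afterwards, and only then calls the closure. (Hypothetical helper.)
func lateRead() int {
	num := 42
	i := 0
	// The closure captures the variable i itself, not its current value 0.
	read := func() int { return num + i }
	// i changes after the closure has been created.
	i = 2
	// The closure observes the value i has at the time of the call.
	return read()
}

func main() {
	fmt.Println(lateRead()) // prints 44
}
```

The sketch declares i outside any loop on purpose. In Go 1.22 and later, each iteration of a for loop has its own copy of the loop variable (for modules that declare go 1.22 or newer), so the sharing described for Listing 4.2 relies on the loop semantics of earlier Go versions.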

Results

Sequentialisation results in 45 possible orders of execution. An example of a sequentialised result is presented in the code below.

    func main() {
        var num_0 int = 42
        i_1 := 0
        localNum_2 := num_0 + i_1
        i_1++
        i_1++
        localNum_3 := num_0 + i_1
        fmt.Println(localNum_3)
        fmt.Println(localNum_2)
    }

Table 4.2 presents the execution statistics.

                   Original   Sequentialised
Correct results    1          45

Table 4.2: Test results of program 2

Based on these results, we note that the range of execution times has increased significantly, with some of the sequentialised programs running almost 3x slower than the original. However, the average execution time indicates only a marginal slowdown for the sequentialised versions. Most importantly, all sequentialised versions of the code produced correct results.


4.4 Program 3

Program 4.3 showcases features 2 and 3 as described in Section 4.1.

The program sends a value through the channel from a goroutine, and waits for this value in the main goroutine.

Listing 4.3: test_program3.go

    func main() {
        channel := make(chan int)
        go func() {
            channel <- 42
        }()
        fmt.Println(<-channel)
    }

Results

Sequentialisation results in 2 possible orders of execution.

    func main() {
        var num_0 int = 42
        i_1 := 0
        localNum_2 := num_0 + i_1
        i_1++
        fmt.Println(localNum_2)
        i_1++
        localNum_3 := num_0 + i_1
        fmt.Println(localNum_3)
    }

The statistics of the two sequentialised versions are presented in Table 4.3.

                     Original   Sequentialised
Correct responses    1          2

Table 4.3: Test results of program 3

As in the other cases, this transformation is successful. The resulting code is very stable in terms of execution time, although, on average, it is about 14% slower. More remarkably, the code of the sequentialised version has decreased in size compared with the original.


4.5 Program 4

Program 4.4 showcases features 1 and 3 as described in Section 4.1.

The program creates 3 goroutines and prints a value in each of them. Each goroutine adds the current value of the shared loop variable i to the shared variable a and prints the resulting running total.

Listing 4.4: race_condition2.go

    func main() {
        var a int
        for i := 0; i < 3; i++ {
            go func() {
                a += i
                fmt.Println(a)
            }()
        }
    }

Results

Sequentialisation results in 1680 possible orders of execution, and 52 unique outputs.

    func main() {
        var a_0 int
        i_1 := 0
        a_0 += i_1
        fmt.Println(a_0)
        i_1++
        i_1++
        i_1++
        a_0 += i_1
        fmt.Println(a_0)
        a_0 += i_1
        fmt.Println(a_0)
    }

The statistics of the transformation and execution of Program 4 are presented in Table 4.4.

                          Original   Sequentialised
Correct outputs           1          1680
Min. execution time (s)   0.371      0.345
Max. execution time (s)   0.439      2.823
Avg. execution time (s)   0.408      0.547
Code size (B)             136        186-189

Table 4.4: Test results of program 4

We again observe an increase in the size of the sequentialised code. Moreover, the peak execution time is very high: at 2.823 s it is more than 6x the maximum of the original (0.439 s). On average, the sequentialised code takes about 34% longer to execute than the original (0.547 s versus 0.408 s).


4.6 Program 5

Program 4.5 showcases feature 5 as described in Section 4.1.

The program declares a new variable and then assigns it a value based on an If-statement. Although the condition is constant, so that only one branch can ever be taken, GoSquash does not evaluate the program and hence does not know this.

Listing 4.5: if_program.go

    func main() {
        var a int
        if true {
            a = 0
        } else {
            a = 1
        }
        fmt.Println(a)
    }

Results

Sequentialisation results in 2 possible orders of execution, and 2 unique outputs.

    func main() {
        var a_0 int
        a_0 = 0
        fmt.Println(a_0)
    }

    func main() {
        var a_0 int
        a_0 = 1
        fmt.Println(a_0)
    }

The statistics of the transformation and execution of Program 4.5 are presented in Table 4.5.

                   Original   Sequentialised
Correct outputs    1          2

Table 4.5: Test results of program 5

We observe that GoSquash correctly splits the If-statement into two possible programs, one with the Then-block and one with the Else-block. The code size is smaller because the only transformation was the removal of some code, which also makes the program slightly faster.
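Splitting an If-statement into one program per branch can be sketched with the standard go/parser, go/ast and go/printer packages. This is an illustration under simplifying assumptions and not GoSquash's actual implementation: the helper names pickBranch and splitIf are hypothetical, only If-statements at the top level of a function body are handled, and "else if" chains and init statements are ignored.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// src is the input program of Listing 4.5, with its package clause.
const src = `package main

import "fmt"

func main() {
	var a int
	if true {
		a = 0
	} else {
		a = 1
	}
	fmt.Println(a)
}
`

// pickBranch replaces every top-level if statement in stmts by the
// statements of its then-block (takeThen) or of its else-block.
// Hypothetical helper: nested ifs and "else if" chains are not handled.
func pickBranch(stmts []ast.Stmt, takeThen bool) []ast.Stmt {
	var out []ast.Stmt
	for _, s := range stmts {
		ifStmt, ok := s.(*ast.IfStmt)
		switch {
		case !ok:
			out = append(out, s)
		case takeThen:
			out = append(out, ifStmt.Body.List...)
		default:
			if block, ok := ifStmt.Else.(*ast.BlockStmt); ok {
				out = append(out, block.List...)
			}
		}
	}
	return out
}

// splitIf returns two variants of source: the first keeps the
// then-branches, the second keeps the else-branches.
func splitIf(source string) ([]string, error) {
	var variants []string
	for _, takeThen := range []bool{true, false} {
		// Parse afresh for each variant so the two ASTs stay independent.
		fset := token.NewFileSet()
		file, err := parser.ParseFile(fset, "input.go", source, 0)
		if err != nil {
			return nil, err
		}
		for _, decl := range file.Decls {
			if fn, ok := decl.(*ast.FuncDecl); ok && fn.Body != nil {
				fn.Body.List = pickBranch(fn.Body.List, takeThen)
			}
		}
		var buf bytes.Buffer
		if err := printer.Fprint(&buf, fset, file); err != nil {
			return nil, err
		}
		variants = append(variants, buf.String())
	}
	return variants, nil
}

func main() {
	variants, err := splitIf(src)
	if err != nil {
		panic(err)
	}
	for i, v := range variants {
		fmt.Printf("--- variant %d ---\n%s\n", i+1, v)
	}
}
```

Like the transformation described above, the sketch never evaluates the condition, so it emits both variants even though the condition is the constant true.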


4.7 Summary

Overall, these five validation programs illustrate that the execution time of sequentialised programs increases on average, and that the minimum and maximum execution times are much further apart than those of the input program. Sequentialising programs also results in larger files as a result of actions such as loop unrolling. Programs that did not have For-loops but did have goroutines to interleave actually become smaller, since the Go-statement gets removed.

From the results we can clearly see the problem of state explosion: Programs 2 and 4 have the same number of statements in their original versions, but the latter's loop iterates once more, resulting in 3 extra statements in the sequentialised version. This increase results in over 37 times as many output programs (1680 versus 45).
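The growth behind this state explosion can be made concrete with a standard counting argument: k fully independent statement sequences of lengths n1, ..., nk can be merged in (n1 + ... + nk)! / (n1! * ... * nk!) ways. The sketch below computes this number; the function names are assumptions made for illustration. It ignores additional ordering constraints, such as a goroutine not being able to run before its Go-statement, so it is not meant to reproduce the counts reported by GoSquash.

```go
package main

import "fmt"

// binomial returns C(n, k) using exact integer arithmetic.
func binomial(n, k int) int {
	result := 1
	for i := 1; i <= k; i++ {
		// After this step result equals C(n-k+i, i), so the division is exact.
		result = result * (n - k + i) / i
	}
	return result
}

// interleavings returns the number of ways to merge independent
// statement sequences of the given lengths while keeping the order
// within each sequence. (Hypothetical helper.)
func interleavings(lengths ...int) int {
	total, count := 0, 1
	for _, n := range lengths {
		total += n
		// Choose which of the positions so far belong to this sequence.
		count *= binomial(total, n)
	}
	return count
}

func main() {
	for n := 1; n <= 4; n++ {
		fmt.Printf("2 sequences of %d statements: %d interleavings\n", n, interleavings(n, n))
	}
}
```

Even for two sequences the count grows quickly with their length, which matches the observation that a single extra loop iteration multiplies the number of output programs.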


CHAPTER 5

Conclusion

Go is one of the programming languages designed to enable easy development of concurrent programs. However, easy development does not come with guaranteed correctness: testing is still needed to detect potential concurrency-related bugs.

Testing concurrent programs, and by extension concurrent Go programs, is very difficult due to their inherent non-determinism: running the same program multiple times can lead to different, yet possibly all correct, outcomes.

The space of possible outcomes grows significantly as programs grow larger and use more concurrency primitives, such as goroutines or channels in Go.

In order to simplify testing, we propose in this thesis to transform concurrent Go programs into functionally equivalent sequential versions, which can be analysed much more easily.

5.1 Main findings

The research in this project was driven by the following research question: "How can we systematically transform concurrent Go programs into sequential ones, while maintaining their functionality?"

To accomplish the transformation of concurrent Go programs, we design a framework that flattens the concurrent program as much as possible, by using techniques such as loop unrolling and function inlining. The flattening of the concurrent program allows for the framework to then interleave the goroutines, which results in all possible sequential configurations of the concurrent program.
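The interleaving step can be illustrated for two already flattened statement lists. The following is a minimal sketch, not the GoSquash implementation: statements are represented as plain strings, and the function name interleave and the example statements are assumptions made for illustration.

```go
package main

import "fmt"

// interleave returns every merge of a and b that preserves the
// relative order of the statements within a and within b.
// (Hypothetical helper; statements are modelled as strings.)
func interleave(a, b []string) [][]string {
	if len(a) == 0 {
		return [][]string{append([]string{}, b...)}
	}
	if len(b) == 0 {
		return [][]string{append([]string{}, a...)}
	}
	var result [][]string
	// Either the next statement comes from a ...
	for _, rest := range interleave(a[1:], b) {
		result = append(result, append([]string{a[0]}, rest...))
	}
	// ... or it comes from b.
	for _, rest := range interleave(a, b[1:]) {
		result = append(result, append([]string{b[0]}, rest...))
	}
	return result
}

func main() {
	// Made-up flattened bodies of the main goroutine and one other goroutine.
	mainBody := []string{"x := 1", "x++"}
	goroutine := []string{"y := 2", "fmt.Println(y)"}
	for i, order := range interleave(mainBody, goroutine) {
		fmt.Printf("program %d: %v\n", i+1, order)
	}
}
```

Each returned slice corresponds to one candidate sequential program. A real sequentialiser would additionally have to respect constraints that this sketch omits, for example that the statements of a goroutine cannot precede the Go-statement that starts it.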

Based on this design we produced a prototype called GoSquash that is able to sequentialise simple programs. The prototype expects the target program to have been run through the tool of Ng and Yoshida [6] to determine whether it needs to be sequentialised in the first place.

The prototype supports most of the important types of statements that Go has to offer. There are some limitations to the prototype, however: some statements that change the flow of the program are not supported, which limits the number of programs that are eligible for sequentialisation with GoSquash.


5.2 Future work

We suggest the following directions of future research.

Firstly, to avoid running the sequentialiser needlessly, an efficient pruning of the space of programs to be analysed is needed. This can be done by checking whether truly concurrent behaviour exists: the more in-depth the checks, the more aggressive the pruning. In this thesis, we have not addressed this aspect other than suggesting a potential tool specifically designed for Go.

Secondly, we note that our current approach is limited in applicability by the limited number of concurrency primitives and statements it supports. Thus, support for more statements is required. Currently, there is no support for statements such as goto, break, continue and select. Expanding the set of supported statements would allow more programs to be sequentialised.

Thirdly, relevant causality could be exploited more effectively. Currently, it is only used to make sure statements from the same scope remain in the right order while interleaving. A more in-depth approach could significantly reduce the state explosion that the interleaving results in, by ignoring statements that have no impact on other goroutines.

Lastly, the validation of the approach could be extended further, both to support the potential expansion of the framework and to cover more real-life Go programs.
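The third direction above hinges on deciding when two statements from different goroutines cannot influence each other. One possible criterion, sketched below, compares the variables each statement reads and writes. The stmt type, its fields and the example statements are hypothetical and do not reflect how GoSquash represents statements; statements with external side effects, such as printing, would have to be treated as dependent on each other.

```go
package main

import "fmt"

// stmt is a hypothetical summary of one flattened statement: its
// source text, the variables it reads and the variables it writes.
type stmt struct {
	text   string
	reads  []string
	writes []string
}

// overlaps reports whether the two name lists share an element.
func overlaps(xs, ys []string) bool {
	for _, x := range xs {
		for _, y := range ys {
			if x == y {
				return true
			}
		}
	}
	return false
}

// independent reports whether s and t can be swapped without changing
// the result: neither writes a variable the other reads or writes.
func independent(s, t stmt) bool {
	return !overlaps(s.writes, t.reads) &&
		!overlaps(s.writes, t.writes) &&
		!overlaps(s.reads, t.writes)
}

func main() {
	inc := stmt{"i_1++", []string{"i_1"}, []string{"i_1"}}
	add := stmt{"a_0 += i_1", []string{"a_0", "i_1"}, []string{"a_0"}}
	local := stmt{"tmp_2 := 7", nil, []string{"tmp_2"}}

	fmt.Println(independent(inc, add))   // false: add reads i_1, which inc writes
	fmt.Println(independent(inc, local)) // true: no shared variables
}
```

When two adjacent statements are independent in this sense, both of their orders lead to the same state, so only one of the two interleavings would need to be generated.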


Bibliography

[1] Leonardo Dagum and Ramesh Menon. “OpenMP: An Industry-Standard API for Shared-Memory Programming”. In: IEEE Comput. Sci. Eng. 5.1 (Jan. 1998), pp. 46–55. issn: 1070-9924. doi: 10.1109/99.660313. url: https://doi.org/10.1109/99.660313.

[2] Effective Go. https://golang.org/doc/effective_go.html.

[3] Andrew Gerrand. Share Memory By Communicating. 2010. url: https://blog.golang.org/codelab-share.

[4] Salvatore La Torre, P. Madhusudan, and Gennaro Parlato. “Reducing Context-Bounded Concurrent Reachability to Sequential Reachability”. In: Computer Aided Verification. Ed. by Ahmed Bouajjani and Oded Maler. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 477–492. isbn: 978-3-642-02658-4.

[5] Shuvendu K. Lahiri, Shaz Qadeer, and Zvonimir Rakamarić. “Static and Precise Detection of Concurrency Errors in Systems Code Using SMT Solvers”. In: Computer Aided Verification. Ed. by Ahmed Bouajjani and Oded Maler. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 509–524. isbn: 978-3-642-02658-4.

[6] Nicholas Ng and Nobuko Yoshida. “Static Deadlock Detection for Concurrent Go by Global Session Graph Synthesis”. In: Proceedings of the 25th International Conference on Compiler Construction. CC 2016. Barcelona, Spain: Association for Computing Machinery, 2016, pp. 174–184. isbn: 9781450342414. doi: 10.1145/2892208.2892232. url: https://doi.org/10.1145/2892208.2892232.

[7] Bradford Nichols, Dick Buttlar, and Jacqueline Proulx Farrell. Pthreads programming. O’Reilly & Associates, Inc., 1996.

[8] Package ast. https://golang.org/pkg/go/ast/.

[9] Shaz Qadeer and Dinghao Wu. “KISS: Keep It Simple and Sequential”. In: SIGPLAN Not. 39.6 (June 2004), pp. 14–24. issn: 0362-1340. doi: 10.1145/996893.996845. url: https://doi.org/10.1145/996893.996845.

[10] Arch D. Robison. “Intel Threading Building Blocks (TBB)”. In: Encyclopedia of Parallel Computing. Ed. by David Padua. Boston, MA: Springer US, 2011, pp. 955–964. isbn: 978-0-387-09766-4. doi: 10.1007/978-0-387-09766-4_51. url: https://doi.org/10.1007/978-0-387-09766-4_51.

[11] Koushik Sen, Grigore Rosu, and Gul Agha. “Runtime Safety Analysis of Multithreaded Programs”. In: ACM SIGSOFT Software Engineering Notes 28 (Nov. 2003). doi: 10.1145/949952.940116.

[12] Richard N. Taylor, David L. Levine, and Cheryl D. Kelly. “Structural Testing of Concurrent Programs”. In: IEEE Trans. Softw. Eng. 18.3 (Mar. 1992), pp. 206–215. issn: 0098-5589. doi: 10.1109/32.126769. url: https://doi.org/10.1109/32.126769.
