Enabling GoCART to detect incorrectness in concurrent Go programs containing buffered channels


Bachelor Informatica


Ravi Mohanlal

January 29, 2021

Supervisor(s): dr. A.M. Oprescu & J. Postema BSc

Informatica, Universiteit van Amsterdam


Abstract

While programming languages such as Go are useful for writing concurrent programs, it is still easy for programmers to make subtle mistakes when doing so. These mistakes can, for example, cause deadlocks or data races during execution. GoCART was developed to predict these types of unwanted behaviour in Go programs by means of Predictive Trace Analysis. In this thesis we extend GoCART to support buffered channels, a concurrency primitive that was not yet supported by its causality model. We demonstrate that the new version of GoCART is able to detect incorrectness in programs where buffered channels are used. We show that in cases where Go’s built-in data race detector reports false positives, GoCART does not.

CHAPTER 1

Introduction

One way to solve certain problems efficiently is through parallel computing, in which multiple processors carry out computations simultaneously. This technique has become increasingly popular over the last 20 or so years, because physical limits are being reached with regard to increasing processor speed [11]. For an algorithm or program to take advantage of parallel computing, it needs to be separated into units which can be executed out of order without affecting the final result. This concept is called concurrency [8].

Concurrent programming languages allow developers to specify and control concurrency within their programs. One such language is Go, whose development started at Google in 2007 and whose first stable version was released in 2012 [15]. The main concurrency constructs Go offers are Goroutines and channels. Goroutines are Go’s version of threads, and channels are used for communication between them. Goroutines can also communicate with each other through shared variables, though this is not advised.

Unfortunately, concurrent programming comes with its own unique challenges. Concurrent programs are prone to different types of incorrectness, such as data races or deadlocks that can occur during execution. To efficiently detect these types of issues, a technique called Predictive Trace Analysis (PTA) was developed [12]. PTA runs the program once and subsequently analyses all thread schedules which are consistent with that run. This makes it more efficient than model checking, which analyses every possible execution and is infeasible in practice.

GoCART is the first PTA tool that is specifically designed for the Go language [10]. It supports a variety of Go’s concurrency primitives, namely dynamic Goroutine creation, access to shared variables, unbuffered channels, sync.WaitGroup and sync.Mutex locks.

1.1 Research Question

Although GoCART supports several concurrency primitives, including unbuffered channels, it does not yet support buffered channels. Implementing this adds a level of complexity for two reasons: a) a channel receive no longer depends simply on the last send over the channel, but on a specific corresponding send over the channel; b) whether a Goroutine blocks after a send or receive operation is no longer a given, but depends on the state of the channel.
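Point b) can be made concrete with a small Go program. The following is a minimal sketch and not part of GoCART: the helper names trySend and tryRecv are hypothetical, and they use a select statement with a default case to report whether a plain send or receive would have blocked at that moment.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
// A false result means a plain `ch <- v` would have blocked at this point.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// tryRecv attempts a non-blocking receive and reports the received value
// and whether a value was available.
func tryRecv(ch chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 2) // buffered channel with capacity 2

	fmt.Println(trySend(ch, 1)) // true: the buffer has room
	fmt.Println(trySend(ch, 2)) // true: the buffer is now full
	fmt.Println(trySend(ch, 3)) // false: a blocking send would wait here

	v, ok := tryRecv(ch)
	fmt.Println(v, ok)          // 1 true: values leave in FIFO order
	fmt.Println(trySend(ch, 3)) // true: the receive freed a slot
}
```

The same send statement succeeds or fails depending only on how many values are currently buffered, which is the channel state that a predictive analysis has to track.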

Nevertheless, adding support for buffered channels to GoCART will bring it closer to being a more universal PTA tool for Go. Therefore the research question of this project is:

"How can PTA-based tools for Go (such as GoCART) be extended to include support for buffered channels?"


The following subquestions help answer this question:

• What are the challenges that buffered channels introduce to predictive analysis?

• How can we update the definition of consistent permutations when considering buffered channels?

• How accurate is the updated GoCART at detecting incorrectness related to buffered channels?

Our contributions are an updated version of GoCART which supports buffered channels, as well as a performance analysis of GoCART’s ability to detect incorrectness in programs utilizing buffered channels.

1.2 Thesis outline

Chapter 2 explains the foundational techniques and algorithms used in Predictive Trace Analysis. These are the algorithms that GoCART is based on. Subsequently, chapter 3 documents how we extend these algorithms in order to support buffered channels. Chapter 4 then investigates the accuracy of this new version of GoCART. The experiments investigate whether GoCART correctly deals with buffered channels, and the results are compared to other Go tools. Chapters 5, 6 and 7 present a discussion of the results, an overview of related work and a conclusion, respectively.


CHAPTER 2

Theoretical background

This chapter lays out the fundamental algorithms and theory behind Predictive Trace Analysis, on which GoCART is based. The chapter starts out with some general terminology and notation that will be used throughout. After that, the theory and reasoning behind two algorithms, the vector clock algorithm and the computation lattice algorithm, are explained.

2.1 General terminology and notation

We consider multithreaded programs consisting of n threads, which are denoted by $t_1, t_2, \ldots, t_n$ [13]. These threads can be executed concurrently and they communicate with each other through shared variables. Each thread consists of a sequence of events. These could be read or write events of shared variables, or simply internal events. Events are generally referred to as $e$, or $e'$ to make a distinction between events. Additionally, the notation $e_i^j$ may be used to signify that $e_i^j$ is the $j$th event of thread $t_i$. The set of all events is denoted by $E$.

2.2 Property monitors

Property monitors can be used to detect incorrectness in multithreaded programs. They offer a very general way of defining incorrectness. A specific property monitor can be constructed to detect data races for example.

We define a property monitor as a nondeterministic automaton [13]. It consists of a tuple $(M, m_0, b, \rho)$. Here $M$ is a set of states, $m_0 \in M$ is the initial state and $b \in M$ is the bad state. $\rho$ is a nondeterministic function mapping a state and an event to a set of states. The monitor starts out with the set of current states $M = \{m_0\}$. Every time the monitor is fed an event $e$, the new states are determined by

$$M = \bigcup_{m \in M} \rho(m, e).$$

If at any point the bad state is encountered, a property violation has been detected as defined by that particular property monitor.
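The monitor definition above can be sketched in Go as follows. This is an illustrative sketch under stated assumptions, not GoCART's implementation: the type names, the Violated helper and the toy property ("two consecutive write events are forbidden") are all hypothetical and only serve to show how the set of current states is advanced with the union of ρ(m, e).

```go
package main

import "fmt"

// State identifies a monitor state; Event is a simplified event label.
type State int
type Event string

// Monitor is a nondeterministic automaton (M, m0, b, rho). The state set M
// is left implicit in the State type.
type Monitor struct {
	Initial State
	Bad     State
	Rho     func(State, Event) []State
}

// Step computes the union of rho(m, e) over all current states m.
func (mon Monitor) Step(current map[State]bool, e Event) map[State]bool {
	next := make(map[State]bool)
	for m := range current {
		for _, s := range mon.Rho(m, e) {
			next[s] = true
		}
	}
	return next
}

// Violated feeds the events to the monitor one by one and reports whether
// the bad state is ever reached.
func (mon Monitor) Violated(events []Event) bool {
	current := map[State]bool{mon.Initial: true}
	for _, e := range events {
		current = mon.Step(current, e)
		if current[mon.Bad] {
			return true
		}
	}
	return false
}

// Toy property (hypothetical): two "write" events in a row are forbidden.
const (
	idle State = iota
	wrote
	bad
)

func toyRho(m State, e Event) []State {
	switch {
	case m == bad:
		return []State{bad}
	case e == "write" && m == wrote:
		return []State{bad}
	case e == "write":
		return []State{wrote}
	default:
		return []State{idle}
	}
}

func main() {
	mon := Monitor{Initial: idle, Bad: bad, Rho: toyRho}
	fmt.Println(mon.Violated([]Event{"write", "read", "write"}))  // false
	fmt.Println(mon.Violated([]Event{"write", "write", "read"})) // true
}
```

Because the current states are kept as a set, the same code handles transition functions that return several successor states for one event.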

2.3 Consistent permutations

PTA works by observing a single execution of a multithreaded program and subsequently analyzing possible permutations of that execution [13]. Not all permutations are considered, but only those that could actually have been produced by the program. The challenge is determining which permutations those are. A set of rules is needed to define which permutations are allowed and which are not. First of all, it is easy to see that events within the same thread cannot be permuted, because these always happen in the same order. A more nuanced case is when one thread writes to a shared variable and another thread then reads from that variable. These events should not be permuted either, because this could cause a different value to be read. If, for example, the execution of an if statement depends on that variable, the program could have been executed along an entirely different path, creating a whole new set of events.

These examples illustrate the importance of causality. To define this explicitly we start with the definition that $e \lessdot e'$ if either of these is true:

• $e'$ directly follows $e$ in the same thread

• $e$ is the latest write to shared variable $x$ before $e'$ reads from $x$

We define $\prec$ as the transitive closure of $\lessdot$ (meaning that $e \lessdot e' \lessdot e''$ implies $e \prec e''$). The relation $\prec$, in essence, signifies causal dependence between events.

The relation $\prec$ is sufficient for preserving the order of causally dependent events. However, it should also be ensured that no new causal dependencies are introduced. For example, a read event from $x$ that used to be causally unrelated to a write event to $x$ should not be put right after that write event. More precisely, a write to $x$ followed by all reads from $x$ until the next write should be viewed as an atomic set, meaning that no reads or writes to $x$ from outside that set can be interleaved into it. If $e$ and $e'$ belong to the same atomic set for variable $x$, this is denoted by $e \Updownarrow_x e'$. The atomic set of variable $x$ that an event $e$ belongs to is denoted by $[e]_x$.

If a permutation of an observed execution preserves the order of causally dependent events and respects the atomicity rules, then that permutation is said to be consistent with the execution.
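The causal half of this definition can be illustrated with a short Go sketch. It is a simplification under stated assumptions: the Event type and both function names are hypothetical, the atomicity rule is deliberately not checked, and the argument perm is assumed to be a valid permutation of the trace indices.

```go
package main

import "fmt"

// Event is a simplified trace entry: a read ("r"), write ("w") or internal
// ("i") event of some thread, optionally on a shared variable.
type Event struct {
	Thread int
	Kind   string
	Var    string
}

// directEdges returns index pairs (a, b) for which trace[a] directly
// precedes trace[b]: either b follows a in the same thread, or a is the
// latest write to the variable that b reads.
func directEdges(trace []Event) [][2]int {
	var edges [][2]int
	lastOfThread := map[int]int{} // thread -> index of its latest event
	lastWrite := map[string]int{} // variable -> index of its latest write
	for idx, e := range trace {
		if prev, ok := lastOfThread[e.Thread]; ok {
			edges = append(edges, [2]int{prev, idx})
		}
		lastOfThread[e.Thread] = idx
		switch e.Kind {
		case "r":
			if w, ok := lastWrite[e.Var]; ok {
				edges = append(edges, [2]int{w, idx})
			}
		case "w":
			lastWrite[e.Var] = idx
		}
	}
	return edges
}

// preservesCausality reports whether perm keeps every direct edge in order.
// Preserving all direct edges also preserves their transitive closure.
func preservesCausality(trace []Event, perm []int) bool {
	pos := make([]int, len(trace))
	for p, idx := range perm {
		pos[idx] = p
	}
	for _, edge := range directEdges(trace) {
		if pos[edge[0]] >= pos[edge[1]] {
			return false
		}
	}
	return true
}

func main() {
	trace := []Event{
		{Thread: 1, Kind: "w", Var: "x"}, // index 0
		{Thread: 2, Kind: "i"},           // index 1
		{Thread: 2, Kind: "r", Var: "x"}, // index 2
	}
	fmt.Println(preservesCausality(trace, []int{1, 0, 2})) // true
	fmt.Println(preservesCausality(trace, []int{1, 2, 0})) // false
}
```

The second permutation is rejected because it moves the read of x in front of the write it originally read from.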

2.4 Vector clock algorithm

Algorithm 1 is used to indicate the causality and atomicity relations between events of a given trace [13]. For every event $e$ this algorithm outputs a vector $V$. The values in this vector signify that $e_i^{V[i]}$ is the last event $e'$ of thread $t_i$ satisfying $e' \prec e$. This is useful because, given an event $e$ of thread $t_i$ and an event $e'$ of thread $t_j$ with their output vectors $V$ and $V'$, it is then the case that:

$$V[i] \le V'[i] \iff e \prec e'$$

To specify the atomicity relations between events, a simple counter $c_x$ can be maintained for every variable $x$. This counter is incremented on every write to that variable. If two events $e$ and $e'$ both access the same variable $x$ and they output atomicity identifiers $c_x$ and $c'_x$, then:

$$(c_x = c'_x) \iff (e \Updownarrow_x e')$$

The algorithm works by having a vector associated with every thread $t_i$, denoted by $V_i$. These vectors are referred to as vector clocks (VCs) and are continually updated. In addition to threads, every shared variable $x$ also has its own VC, denoted by $V_x$. The max function and the $\le$ relation are defined on VCs as:

• $\max(V, V')[i] = \max(V[i], V'[i])$

• $V \le V'$ iff $V[i] \le V'[i]$ for all $i$

Algorithm 1: Vector clock algorithm

    for e_i^k in E do
        V_i[i] = V_i[i] + 1
        if e_i^k is a write of shared variable x then
            V_x = V_i
            c_x = c_x + 1
            Output {e_i^k, i, V_i, x, c_x}
        else if e_i^k is a read of shared variable x then
            V_i = max(V_i, V_x)
            Output {e_i^k, i, V_i, x, c_x}
        else
            Output {e_i^k, i, V_i, ∅, −1}
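As an illustration, Algorithm 1 can be written out in Go roughly as follows. This is a sketch under stated assumptions rather than GoCART's code: threads are numbered from 0, the Event and Message types are hypothetical stand-ins for the tuple {e, i, V, x, c}, and a read that occurs before any write simply keeps the thread's own clock.

```go
package main

import "fmt"

// VC is a vector clock indexed by thread number (0-based in this sketch).
type VC []int

// maxVC returns the component-wise maximum of a and b.
func maxVC(a, b VC) VC {
	out := make(VC, len(a))
	for i := range a {
		out[i] = a[i]
		if b[i] > out[i] {
			out[i] = b[i]
		}
	}
	return out
}

// Event is a simplified trace entry; Kind is "r", "w" or "" (internal).
type Event struct {
	Thread int
	Kind   string
	Var    string
}

// Message mirrors the tuple {e, i, V, x, c}; C is -1 for internal events.
type Message struct {
	Event Event
	V     VC
	C     int
}

// vectorClocks processes the trace in observed order and emits one message
// per event, following the three cases of Algorithm 1.
func vectorClocks(trace []Event, threads int) []Message {
	threadVC := make([]VC, threads)
	for i := range threadVC {
		threadVC[i] = make(VC, threads)
	}
	varVC := map[string]VC{}    // V_x per shared variable
	counter := map[string]int{} // c_x per shared variable
	var out []Message
	for _, e := range trace {
		i := e.Thread
		threadVC[i][i]++
		c := -1
		switch e.Kind {
		case "w":
			varVC[e.Var] = append(VC(nil), threadVC[i]...) // V_x = V_i (copy)
			counter[e.Var]++
			c = counter[e.Var]
		case "r":
			if vx, ok := varVC[e.Var]; ok {
				threadVC[i] = maxVC(threadVC[i], vx)
			}
			c = counter[e.Var]
		}
		out = append(out, Message{Event: e, V: append(VC(nil), threadVC[i]...), C: c})
	}
	return out
}

func main() {
	trace := []Event{
		{Thread: 0, Kind: "w", Var: "x"},
		{Thread: 1},
		{Thread: 1, Kind: "r", Var: "x"},
	}
	for _, m := range vectorClocks(trace, 2) {
		fmt.Println(m.Event.Thread, m.V, m.C)
	}
}
```

For this trace the read ends up with vector [1 2]: its first component shows that the write by thread 0 causally precedes it, and the write and the read share atomicity identifier 1.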

2.5 Computation lattice

Consider picking events one by one from $E$ to construct a consistent permutation. To do this, at each step, it has to be known which events can be safely picked without violating the consistency rules. This depends on which events have already been chosen. Clearly an event $e$ cannot be added unless all events $e' \prec e$ have been added, because otherwise there is an $e'$ that happens after $e$, violating $e' \prec e$. Additionally, if an atomic set of $x$ has not fully been added yet, an event belonging to a different atomic set of $x$ cannot be added.

More formally, we say that an event $e$ is enabled for a set of events $\Sigma$ (called a cut) iff the following conditions are met [13]:

• For all $e' \in E$, if $e' \prec e$ then $e' \in \Sigma$.

• For all $e' \in \Sigma$, if $e$ and $e'$ both access $x$ and $e \not\Updownarrow_x e'$ then $[e']_x \subseteq \Sigma$.

Let $\Sigma \leadsto \Sigma'$ denote that there exists an $e \in E - \Sigma$ which is enabled for $\Sigma$ and satisfies $\Sigma \cup \{e\} = \Sigma'$. Let $\leadsto^*$ be the reflexive and transitive closure of $\leadsto$. Every cut $\Sigma$ for which $\emptyset \leadsto^* \Sigma$, together with the relation $\leadsto$, then forms a structure called the computation lattice. The level of a cut $\Sigma$ in this lattice is simply $|\Sigma|$. Every path in the lattice starting from $\emptyset$ and ending at $E$ corresponds to a consistent execution.

2.6 Computation lattice traversal

To analyze all consistent runs, the computation lattice has to be constructed and traversed [13]. The messages that were created by the vector clock algorithm are stored in Q. Algorithm 2 then maintains a set of cuts (CurrLevel) that are present in the current level of the computation lattice. For each event e in Q it tries to create new cuts from the cuts in CurrLevel and e. The cuts which are successfully created are then added to NextLevel. After this, CurrLevel is set to NextLevel, NextLevel is emptied and the algorithm repeats to construct the next level.

Algorithm 2: Main traversal loop

    while Q ≠ ∅ do
        for m ∈ Q and Σ ∈ CurrLevel do
            if enabled(Σ, m) then
                NextLevel = NextLevel ⊎ newCut(Σ, m, Q)
        Q = removeMessages(CurrLevel, Q)
        CurrLevel = NextLevel
        NextLevel = ∅


The data structure Σ does not actually contain events. Instead, it is restricted to only contain necessary information. It consists of:

• A vector clock VC(Σ), representing the latest event of each thread belonging to Σ.

• An atomic identifier map AI(Σ) that maps every shared variable x to the atomic identifier of the corresponding unfinished atomic set in Σ.

• A set of monitor states M(Σ), used to detect property violations.

The function enabled(Σ, m) that is used above checks if the event e contained in message m is enabled for Σ. It does this by first checking if e′ ≺ e implies that e′ ∈ Σ by comparing VCs. It also checks if the atomic identifiers match, if these exist. If both these conditions are satisfied, the function returns true. This function is defined in Algorithm 3.

Algorithm 3: enabled(Σ, m)

    boolean enabled(Σ, m):
        let m consist of {e, i, V, x, c}
        if VC(Σ)[i] + 1 ≠ V[i] then return false
        for j ∈ [1 ... |V|] do
            if j ≠ i and V[j] > VC(Σ)[j] then return false
        if c ≥ 0 and AI(Σ)[x] ≥ 0 and AI(Σ)[x] ≠ c then return false
        return true
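The three checks of Algorithm 3 translate fairly directly into Go. The sketch below is an assumption-laden illustration, not GoCART's source: the Cut and Message types are hypothetical, threads are 0-based, monitor states are left out, and a variable that is missing from the AI map is treated as having no unfinished atomic set.

```go
package main

import "fmt"

// Cut holds the per-cut data described above (monitor states omitted).
type Cut struct {
	VC []int          // index of the latest event of each thread in the cut
	AI map[string]int // identifier of the unfinished atomic set per variable
}

// Message mirrors {e, i, V, x, c}; the event itself is not needed here.
type Message struct {
	Thread int
	V      []int
	Var    string
	C      int // -1 when the event accesses no shared variable
}

// enabled reports whether the event described by m may be added to cut.
func enabled(cut Cut, m Message) bool {
	i := m.Thread
	// The event must be the next event of its own thread.
	if cut.VC[i]+1 != m.V[i] {
		return false
	}
	// Every causal predecessor in other threads must already be in the cut.
	for j := range m.V {
		if j != i && m.V[j] > cut.VC[j] {
			return false
		}
	}
	// An unfinished atomic set with a different identifier blocks the event.
	if m.C >= 0 {
		if open, ok := cut.AI[m.Var]; ok && open >= 0 && open != m.C {
			return false
		}
	}
	return true
}

func main() {
	empty := Cut{VC: []int{0, 0}, AI: map[string]int{}}
	write := Message{Thread: 0, V: []int{1, 0}, Var: "x", C: 1}
	read := Message{Thread: 1, V: []int{1, 1}, Var: "x", C: 1}

	fmt.Println(enabled(empty, write)) // true
	fmt.Println(enabled(empty, read))  // false: the write is not in the cut

	after := Cut{VC: []int{1, 0}, AI: map[string]int{"x": 1}}
	fmt.Println(enabled(after, read)) // true
}
```

Note that only vector clocks and identifiers are compared; no event set is ever materialised, which is the point of restricting Σ to the data listed above.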

Messages m are of the form {e, i, V, x, c}, as defined by Algorithm 1. The function newCut(Σ, m, Q) creates a new cut Σ′ from Σ and e. First the VC and AI from Σ are copied to Σ′ and VC(Σ′)[i] is incremented by one. Then it sets AI(Σ′)[x] to c if Σ′ still contains an incomplete atomic set for x. Finally, the new monitor states are determined based on M(Σ) and e. If the bad state b is among these states, the algorithm has detected a property violation. The pseudo-code of the newCut(Σ, m, Q) function is shown in Algorithm 4.

Algorithm 4: newCut(Σ, m, Q)

    cut newCut(Σ, m, Q):
        let m consist of {e, i, V, x, c}
        VC(Σ′) = VC(Σ)
        VC(Σ′)[i]++
        AI(Σ′) = AI(Σ)
        if c ≥ 0 then
            for {e′, i′, V′, x′, c′} ∈ Q do
                if x == x′ and c == c′ and V′ ≰ VC(Σ′) then
                    AI(Σ′)[x] = c
                    break
        for s ∈ M(Σ) do
            M(Σ′) = M(Σ′) ∪ ρ(s, e)
        if b ∈ M(Σ′) then output "property violated"
        return Σ′


CHAPTER 3

Extending GoCART to support buffered channels

The PTA techniques expressed in chapter 2 are based on a concurrency model in which threads communicate with each other through shared variables. GoCART extended this concurrency model to also include unbuffered channels, waitgroups and mutex locks. In this chapter, a broader model is considered in which threads can communicate through buffered channels as well. The PTA techniques and algorithms are extended in order to support this concurrency model.

3.1 The GoCART architecture

A diagram of GoCART’s architecture is shown in Figure 1. First, logging statements are added to the input program. This happens in the program instrumentation part. The resulting instrumented program is executed in order to produce the execution trace. This trace information is formatted into events, from which the redundant events are filtered. It is also ensured that the events appear in the order in which they happened. The causality deduction and property checking sections are where the vector clock algorithm and computation lattice are implemented, respectively. GoCART comes with two standard property monitors for detecting data races and leaking Goroutines. The highlighted parts are modified in order to support buffered channels. The program instrumentation part is modified to make sure the creation of buffered channels is logged in the trace. A small change is also made to the event formatting part to allow buffered channel creation events to contain a size component. The causality deduction and property checking sections are also modified in order to support buffered channels.

Figure 1: GoCART’s architecture [10]

3.2 Consistent permutations

The introduction of buffered channels requires the definition of consistent permutations to be adjusted. This is because certain causality relations exist between sends to and receives from buffered channels. To capture these, it should be noted that a buffered channel b works as a FIFO queue, from which it follows that the value of the nth send to b is always read by the nth receive from b. With that in mind, to ensure that the receive events still read the same values, the following two rules are added to the existing consistency rules:

• If $e$ is the $n$th send to buffered channel $b$ and $e'$ is the $n$th receive from $b$, then $e \lessdot e'$.

• If the sends to buffered channel $b$ are permuted, then the receives have to be similarly permuted.

3.3 Vector clock algorithm

The first rule can be easily implemented into the vector clock algorithm, since it is just an extension of the $\lessdot$ relation. As discussed in chapter 2, when it is the case that $e \lessdot e'$, the vector of $e$ should be passed on to $e'$. In the case of shared variables this was done by maintaining a vector $V_x$ for every shared variable $x$. A write event to $x$ then copies its vector to $V_x$ and the following read events read this vector from $V_x$. A different dynamic is at play with buffered channels, since the vector of the $n$th send to $b$ should only be passed on to the $n$th receive from $b$. To achieve this, every buffered channel has a queue of vectors associated with it, denoted by $Q_b$. A send to $b$ then adds its vector to $Q_b$, whereas a receive from $b$ pops a vector from $Q_b$.
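The queue mechanism just described can be sketched in Go as follows. The sketch is illustrative only and rests on assumptions: the names chanClocks, onSend and onReceive are hypothetical, threads are 0-based, and the trace is assumed to be well formed, so every receive finds the vector of its matching send in the queue.

```go
package main

import "fmt"

// VC is a vector clock indexed by thread number (0-based in this sketch).
type VC []int

// maxVC returns the component-wise maximum of a and b.
func maxVC(a, b VC) VC {
	out := make(VC, len(a))
	for i := range a {
		out[i] = a[i]
		if b[i] > out[i] {
			out[i] = b[i]
		}
	}
	return out
}

// chanClocks plays the role of Q_b: a FIFO queue holding the vector clocks
// of sends whose matching receive has not been processed yet.
type chanClocks struct{ queue []VC }

// push stores a copy of v at the back of the queue.
func (q *chanClocks) push(v VC) {
	q.queue = append(q.queue, append(VC(nil), v...))
}

// pop removes and returns the oldest stored vector clock.
func (q *chanClocks) pop() VC {
	v := q.queue[0]
	q.queue = q.queue[1:]
	return v
}

// onSend ticks the sender's clock and stores a copy of it in Q_b.
func onSend(sender VC, thread int, q *chanClocks) VC {
	sender[thread]++
	q.push(sender)
	return sender
}

// onReceive ticks the receiver's clock and merges in the clock of the
// matching send, which is the oldest entry of Q_b.
func onReceive(receiver VC, thread int, q *chanClocks) VC {
	receiver[thread]++
	return maxVC(receiver, q.pop())
}

func main() {
	q := &chanClocks{}
	t0, t1, t2 := VC{0, 0, 0}, VC{0, 0, 0}, VC{0, 0, 0}

	t0 = onSend(t0, 0, q)    // 1st send, by thread 0
	t1 = onSend(t1, 1, q)    // 2nd send, by thread 1
	t2 = onReceive(t2, 2, q) // 1st receive: depends on thread 0 only
	fmt.Println(t2)          // [1 0 1]
	t2 = onReceive(t2, 2, q) // 2nd receive: now also depends on thread 1
	fmt.Println(t2)          // [1 1 2]
}
```

With a single shared vector per channel, as used for shared variables, the first receive would already have picked up a dependency on the second send; the queue avoids this.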


sends and receives have happened to b.

Secondly, every receive event from a buffered channel outputs an additional vector $U$. This vector is somewhat similar to $V$. To reiterate, the vector $V$ of event $e$ indicates the latest events $e'$ from each thread for which $e' \prec e$. The difference is that $U$ indicates the latest receive events instead of events in general. More precisely, for a receive event $e$ from a buffered channel $b$, the vector $U$ indicates the latest events $e'$ from each thread for which $e' \prec e$ and $e'$ is a receive from $b$. In order to produce these vectors $U$, the algorithm has to maintain matrices in addition to the regular vectors. Matrices are needed because the latest receives from every buffered channel have to be tracked, not just those from a single buffered channel. Therefore, in addition to the vector $V_i$, a matrix $M_i$ is maintained. For a buffered channel $b$, the relevant vector in that matrix can then be accessed by $M_i[b]$. Because of the introduction of these matrices, matrix-holding queues $Q'_b$ are also needed in addition to the vector-holding queues $Q_b$.

With the addition of these two attributes, the algorithm now outputs messages of the form {e, i, V, x, c, U, n} instead of {e, i, V, x, c}.

Algorithm 5: Adjusted vector clock algorithm

    for e_i^k in E do
        V_i[i] = V_i[i] + 1
        if e_i^k is a send over buffered channel b then
            Q′_b.push(M_i)
            Q_b.push(V_i)
            n_b^send = n_b^send + 1
            Output {e_i^k, i, V_i, b, −1, ∅, n_b^send}
        else if e_i^k is a receive from buffered channel b then
            M_i[b][i] = V_i[i]
            M_i = max(M_i, Q′_b.pop())
            V_i = max(V_i, Q_b.pop())
            n_b^rec = n_b^rec + 1
            Output {e_i^k, i, V_i, b, −1, M_i[b], n_b^rec}
        else if e_i^k is a write of shared variable x then
            M_x = M_i
            V_x = V_i
            c_x = c_x + 1
            Output {e_i^k, i, V_i, x, c_x, ∅, −1}
        else if e_i^k is a read of shared variable x then
            M_i = max(M_i, M_x)
            V_i = max(V_i, V_x)
            Output {e_i^k, i, V_i, x, c_x, ∅, −1}
        else
            Output {e_i^k, i, V_i, ∅, −1, ∅, −1}

3.4 Computation lattice

To enforce the rule that the receives from a buffered channel have to be similarly permuted as the sends are, the computation lattice algorithm is modified in two ways.

First of all, to determine whether a send event is enabled, it is not only checked whether adding this event itself violates the consistency rules, but also whether the associated reordering of the receive events would violate them. To achieve this, a vector clock is associated with every buffered channel b. Together these are denoted by VCS(Σ), and a particular one of them is accessed by VCS(Σ)[b]. This vector clock keeps track of each thread’s latest receive event from b. To check if a send to b is enabled, it is then checked whether the vector U of the corresponding receive event satisfies U ≤ VCS(Σ)[b].

Secondly, to determine whether a receive event from buffered channel b is enabled it is also checked whether that is the receive event that was expected from b at that time. This is done by maintaining a queue for every buffered channel. These queues contain the identifiers of sends whose corresponding receives have not happened yet and are denoted by QS(Σ). To determine if a receive from buffered channel b is enabled, it is checked if this event’s identifier n is next in line in the queue QS(Σ)[b].

Algorithm 6 shows the modified enabled function of the computation lattice algorithm.

Algorithm 6: enabled(Σ, m)

    boolean enabled(Σ, m):
        let m consist of {e, i, V, x, c, U, n}
        if c ≥ 0 and AI(Σ)[x] ≥ 0 and AI(Σ)[x] ≠ c then return false
        if e is a send over a buffered channel b then
            if QS(Σ)[b].size() == b_capacity then
                return false
            for m′ ∈ Q do
                let m′ consist of {e′, i′, V′, b′, c′, U′, n′}
                if e ≠ e′ and b == b′ and n == n′ then
                    if U′ ≤ VCS(Σ)[b] then
                        VCS(Σ)[b][i′] = V′[i′]
                        QS(Σ)[b].push(n)
                        return true
                    else
                        return false
        if e is a receive from a buffered channel b then
            if QS(Σ)[b].peek() ≠ n then
                return false
            QS(Σ)[b].pop()
        return true

3.5 Simplified algorithm

The algorithm can be simplified in two ways. The causality rules will remain the same, but they are implemented differently for the sake of simplicity and efficiency. As a reminder the added causality rules are:

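• If $e$ is the $n$th send to buffered channel $b$ and $e'$ is the $n$th receive from $b$, then $e \lessdot e'$.

• If the sends to buffered channel $b$ are permuted, then the receives have to be similarly permuted.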

enforces the first rule. This is because a receive will always be made to happen after its corresponding send, since it is checked if the id is in the queue. This means that it is not necessary to implement the first rule into the vector clock algorithm, so this can be removed.

In the computation lattice algorithm, when considering whether a send is enabled or not, we preemptively check whether the resulting reordering of receives would violate any of the causality rules. This is done in order to only analyze consistent permutations of the trace. This preemptive checking is not strictly necessary, however. Instead, whether a receive event is enabled can simply be checked once the algorithm actually gets to this event. It should be noted that with this method it is inaccurate to say that only consistent permutations of the entire trace are analyzed. It is more accurate to say that the consistent permutations of any prefix of the trace are analyzed. The important thing is that the algorithm still only analyzes sequences of events that could actually have been produced by the program. Everything that has to do with the preemptive checking can therefore be removed, including the production of the vectors U in the vector clock algorithm and the VCS(Σ) data structure in the computation lattice.

With all these elements removed, the simplified algorithms are shown in Algorithm 7 and Algorithm 8.

Algorithm 7: Simplified vector clock algorithm

    for e_i^k in E do
        V_i[i] = V_i[i] + 1
        if e_i^k is a send over buffered channel b then
            n_b^send = n_b^send + 1
            Output {e_i^k, i, V_i, b, −1, n_b^send}
        else if e_i^k is a receive from buffered channel b then
            n_b^rec = n_b^rec + 1
            Output {e_i^k, i, V_i, b, −1, n_b^rec}
        else if e_i^k is a write of shared variable x then
            V_x = V_i
            c_x = c_x + 1
            Output {e_i^k, i, V_i, x, c_x, −1}
        else if e_i^k is a read of shared variable x then
            V_i = max(V_i, V_x)
            Output {e_i^k, i, V_i, x, c_x, −1}
        else
            Output {e_i^k, i, V_i, ∅, −1, −1}

Algorithm 8: Simplified enabled function

    boolean enabled(Σ, m):
        let m consist of {e, i, V, x, c, n}
        if c ≥ 0 and AI(Σ)[x] ≥ 0 and AI(Σ)[x] ≠ c then return false
        if e is a send over a buffered channel b then
            if QS(Σ)[b].size() == b_capacity then
                return false
            QS(Σ)[b].push(n)
        if e is a receive from a buffered channel b then
            if QS(Σ)[b].peek() ≠ n then
                return false
            QS(Σ)[b].pop()
        return true
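The channel-specific part of the simplified enabled function can be sketched in Go as follows. This is a hedged illustration, not GoCART's implementation: the types Kind, Message and Cut and the function channelEnabled are hypothetical, the atomicity check is left out, and the explicit test for an empty queue is an added assumption about how a peek on an empty queue should behave.

```go
package main

import "fmt"

// Kind distinguishes the events that matter for buffered channels.
type Kind int

const (
	Other Kind = iota
	Send
	Recv
)

// Message carries only the fields of {e, i, V, x, c, n} that the channel
// checks need: the kind of event, the channel and the identifier n.
type Message struct {
	Kind Kind
	Chan string
	N    int
}

// Cut keeps QS: for every channel, the identifiers of sends whose matching
// receive has not happened yet, plus the channel capacities.
type Cut struct {
	QS       map[string][]int
	Capacity map[string]int
}

// channelEnabled performs the buffered channel checks and updates the queue
// when the event is accepted.
func channelEnabled(cut *Cut, m Message) bool {
	q := cut.QS[m.Chan]
	switch m.Kind {
	case Send:
		if len(q) == cut.Capacity[m.Chan] {
			return false // the channel is full, so the send would block
		}
		cut.QS[m.Chan] = append(q, m.N)
	case Recv:
		// Assumption: an empty queue means no matching send is pending.
		if len(q) == 0 || q[0] != m.N {
			return false
		}
		cut.QS[m.Chan] = q[1:]
	}
	return true
}

func main() {
	cut := &Cut{QS: map[string][]int{}, Capacity: map[string]int{"b": 2}}
	fmt.Println(channelEnabled(cut, Message{Recv, "b", 1})) // false: nothing sent yet
	fmt.Println(channelEnabled(cut, Message{Send, "b", 1})) // true
	fmt.Println(channelEnabled(cut, Message{Send, "b", 2})) // true
	fmt.Println(channelEnabled(cut, Message{Send, "b", 3})) // false: buffer is full
	fmt.Println(channelEnabled(cut, Message{Recv, "b", 2})) // false: send 1 is next in line
	fmt.Println(channelEnabled(cut, Message{Recv, "b", 1})) // true
}
```

The sketch updates the queue of the cut it is given. In a lattice traversal where several successor cuts are derived from one cut, the queue would presumably have to be copied into each new cut first.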


CHAPTER 4

Experiments and Results

This chapter describes the experiments used to test GoCART’s ability to detect incorrectness in programs utilizing buffered channels. Two different types of incorrectness are investigated: data races and leaking Goroutines. Both of these are implemented through the use of property monitors. A data race is defined as two events from different threads accessing the same shared variable simultaneously, where at least one of them is a write. Leaking Goroutines are Goroutines that have not finished executing yet when the main Goroutine finishes.

The test cases are constructed such that the buffered channels play a critical role in the generation of the permutation space. The experiments test whether incorrectness occurring in consistent permutations of the recorded run is detected. Furthermore, it is tested whether incorrectness in inconsistent permutations of the recorded run is correctly left undetected. This is done to rule out false positives.

The results are compared to the Go race detector [3] and goleak [4] tools for data races and leaking Goroutines respectively.

4.1 Data race experiments

For the detection of data races, the experiments are divided into three categories, each dealing with a different aspect of buffered channels. The first category deals with the general functioning of buffered channels and concerns which sends correspond to which receives. The second category explores the fact that a receive from an empty channel blocks until a send happens. The last category explores the fact that a send to a full channel blocks until a receive happens. For each of these categories, one test case is constructed in which a data race exists and one in which no data races exist.

4.1.1 General

Listing 1 shows a program where no data races can occur on the race variable. This is because if line 8 and line 14 happened simultaneously, it would mean that line 13 happened before line 9. But in that case the receive event on line 13 would have received the value 1, causing line 14 to not be executed in the first place.

     1  var race int
     2  var channel = make(chan int, 10)
     3
     4  channel <- 1
     5  channel <- 2
     6
     7  go func() {
     8      race = 7
     9      <-channel
    10  }()
    11
    12  go func() {
    13      if (<-channel == 2) {
    14          race = 8
    15      }
    16  }()

Listing 1: No data race

Listing 2 shows a modified version of listing 1. The only difference here is that the two sends to the channel now happen in separate Goroutines. Because of this, a data race on the race variable now does exist in the program. This is because if line 9 happens before line 5, it now is possible for line 18 to happen before line 14 and still evaluate to true.

     1  var race int
     2  var channel = make(chan int, 10)
     3
     4  go func() {
     5      channel <- 1
     6  }()
     7
     8  go func() {
     9      channel <- 2
    10  }()
    11
    12  go func() {
    13      race = 7
    14      <-channel
    15  }()
    16
    17  go func() {
    18      if (<-channel == 2) {
    19          race = 8
    20      }
    21  }()

Listing 2: Data race

Because of the nature of PTA, whether a data race is detected can depend on the observed trace. Since we want deterministic results, a fixed trace must be chosen. The situation of interest here is when the race = 8 statement is actually executed and appears in the trace. Therefore, in these two test cases it is assumed that the statements happen from top to bottom. This is enforced in practice through the use of time.Sleep statements.

4.1.2 Blocking receive

    var race int
    var channel = make(chan int, 10)

    go func() {
        race = 7
        channel <- 2
    }()

    go func() {
        <-channel
        race = 8
    }()

Listing 3: No data race

Listing 4 shows a modified version of Listing 3. The only difference is the added send statement at the beginning of the program. Because of this, the buffered channel will be non-empty and so the receive event will not block. Therefore a data race is possible in this test case.

    var race int
    var channel = make(chan int, 10)

    channel <- 1

    go func() {
        race = 7
        channel <- 2
    }()

    go func() {
        <-channel
        race = 8
    }()

Listing 4: Data race

4.1.3 Blocking send

The buffered channel in Listing 5 is full after the two sends. Therefore the third send will block until the receive at line 14 happens. Because of this, no data races exist in the program.

     1  var race int
     2  var channel = make(chan int, 2)
     3
     4  channel <- 1
     5  channel <- 2
     6
     7  go func() {
     8      channel <- 3
     9      race = 7
    10  }()
    11
    12  go func() {
    13      race = 8
    14      <-channel
    15  }()

Listing 5: No data race

Listing 6 shows a modified version of Listing 5. The difference is that the size of the buffered channel has been increased to 3. Because of this, the third send will not block and a data race is possible in this program.

    var race int
    var channel = make(chan int, 3)

    channel <- 1
    channel <- 2

    go func() {
        channel <- 3
        race = 7
    }()

    go func() {
        race = 8
        <-channel
    }()

Listing 6: Data race

4.2 Leaking Goroutine experiments

For the detection of leaking Goroutines, the experiments are divided into similar categories. In the first category, whether a Goroutine leaks is determined by whether a receive operation blocks or not. In the other category the same is investigated for blocking send operations.

4.2.1 Blocking receive

In Listing 7 the outer routine has to wait for the inner function to finish. This is because the receive blocks until the send happens, which is the last event in the inner function. So this program will not have any leaking Goroutines.

    var channel = make(chan int, 10)

    go func() {
        channel <- 2
    }()

    <-channel

Listing 7: No leaking Goroutines

In Listing 8, on the other hand, the receive will not block, since it immediately receives the value 1 that was already sent through the channel. A leaking Goroutine is therefore possible in this example, because the outer routine does not have to wait for the inner function to finish.

    var channel = make(chan int, 10)

    channel <- 1

    go func() {
        channel <- 2
    }()

    <-channel

Listing 8: Leaking Goroutine

4.2.2 Blocking send

    var channel = make(chan int, 2)

    channel <- 1
    channel <- 2

    go func() {
        <-channel
    }()

    channel <- 3

Listing 9: No leaking Goroutines

Listing 10 is the same as Listing 9, except for the fact that the buffered channel size is now 3 instead of 2. Because of this the third send does not have to wait for the receive to happen. Therefore a possible leaking Goroutine does exist in this program.

    var channel = make(chan int, 3)

    channel <- 1
    channel <- 2

    go func() {
        <-channel
    }()

    channel <- 3

Listing 10: Leaking Goroutine

4.3 Experimental setup

The following versions and settings are used for our experiments:

• Go version go1.15.5 linux/amd64 including the built-in Go race detector

• The experiments are run on a machine with 4 cores and the GOMAXPROCS setting is set to the default value, which equals the number of cores

• goleak version 1.1.0

4.4 Results

The tests are run on both versions of GoCART presented in Chapter 3, including the simplified version. Since the results are the same, they are presented in a single column. In order to give an overview of the results, the test cases are divided into two categories: those in which no possible violations exist and those in which possible violations do exist. Table 1 compares the responses of GoCART and the Go race detector to the test cases in which no data races are possible, and Table 2 shows their responses to the test cases in which data races do exist.

Test case    Data race reported by...
             Go race detector    GoCART
Listing 1    YES                 NO
Listing 3    NO                  NO
Listing 5    NO                  NO

Table 1: Results of tests where data races are not possible

Test case    Data race reported by...
             Go race detector    GoCART
Listing 2    YES                 YES
Listing 4    YES                 YES
Listing 6    YES                 YES

Table 2: Results of tests where data races are possible

Similarly, for the detection of leaking Goroutines, Table 3 shows goleak’s and GoCART’s responses to the test cases in which no leaking Goroutines are possible. Table 4 shows their responses to the test cases in which leaking Goroutines are possible. In this table each test case is additionally divided into a version in which the leak actually happens in the execution and one in which it does not.

Test case    Leaking Goroutine reported by...
             goleak    GoCART
Listing 7    NO        NO
Listing 9    NO        NO

Table 3: Results of tests where leaking Goroutines are not possible

Test case     Leak occurs in execution    Leaking Goroutine reported by...
                                          goleak    GoCART
Listing 8     YES                         YES       NO
Listing 10    YES                         YES       NO
Listing 8     NO                          NO        YES
Listing 10    NO                          NO        YES

Table 4: Results of tests where leaking Goroutines are possible

CHAPTER 5

Discussion

In this chapter the results of the data race and leaking Goroutine experiments are discussed. The differences between the tools are discussed in relation to these results. Finally, we reflect on some of the ethical aspects of this work.

5.1 Data race detection

The results of the data race experiments show that in each of the cases where data races are possible, both GoCART and the Go race detector detected them. For the cases where no data races are possible, GoCART reports no false positives. The Go race detector, however, gives a false positive in the case of Listing 1. The other experiments show that the Go race detector correctly recognizes the blocking behaviour of buffered channels and distinguishes between programs in which data races are or are not possible as a result of this blocking behaviour. This shows that the detector at least partially supports the use of buffered channels as a synchronization construct. The false positive is likely due to the ThreadSanitizer API that the Go race detector makes use of, which is known to report false positives in some cases [14].

Regardless, the results imply that GoCART offers more comprehensive support for buffered channels. It is more accurate than the Go race detector in situations where buffered channels are used in the manner shown in Listing 1.
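As a generic illustration of a buffered channel acting as a synchronization construct, consider the sketch below. It is not one of the numbered listings and says nothing about how either tool responds to it; the helper name countWithSemaphore is hypothetical. A channel of capacity 1 serves as a lock: a send acquires the single slot and a receive releases it. The Go memory model orders the k-th receive on a channel of capacity C before the completion of the (k+C)-th send, so the accesses to the shared counter cannot race.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithSemaphore increments a shared counter from several Goroutines.
// The buffered channel of capacity 1 acts as a lock: a send acquires it and
// a receive releases it. The helper name is hypothetical.
func countWithSemaphore(workers int) int {
	sem := make(chan struct{}, 1)
	var wg sync.WaitGroup
	counter := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{} // blocks while another Goroutine holds the slot
			counter++         // protected access to the shared variable
			<-sem             // frees the slot for the next Goroutine
		}()
	}

	wg.Wait() // all increments are ordered before this returns
	return counter
}

func main() {
	fmt.Println(countWithSemaphore(100))
}
```

Removing the send and receive on sem turns the increment into an unsynchronized write from multiple Goroutines, which is the kind of difference a data race detector has to tell apart.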

5.2 Leaking Goroutine detection

GoCART’s ability to detect leaking Goroutines was compared to that of goleak. In the cases where no possible leaking Goroutines exist, both GoCART and goleak reported no false positives. In the test cases where possible leaking Goroutines do exist, whether the leak actually occurs when executing the program depends on the thread schedule and is nondeterministic. That is why two versions of these tests were investigated. By manipulating the thread schedule through the use of time.Sleep statements, it is ensured that in one version the Goroutine will leak and in the other it will not.

The results show that when the Goroutine actually does leak when executing, goleak is able to detect it while GoCART is not. Conversely, in the versions where the Goroutine does not leak when executing, GoCART is able to predict that it could leak in a different execution, while goleak is not.
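How time.Sleep can steer the schedule in both directions is sketched below for a program with the structure of Listing 10. This is an assumption about how such test versions could look, not the code used in the experiments: the helper name runListing10, the delay values and the atomic flag used to observe the outcome are all hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// runListing10 mirrors the structure of Listing 10 and reports whether the
// child Goroutine had finished at the point where the outer routine would
// return. The two delays (in milliseconds) stand in for time.Sleep
// statements that steer the schedule; their values are arbitrary.
func runListing10(childDelayMs, outerDelayMs int) (childFinished bool) {
	var done int32
	channel := make(chan int, 3)

	channel <- 1
	channel <- 2

	go func() {
		time.Sleep(time.Duration(childDelayMs) * time.Millisecond)
		<-channel
		atomic.StoreInt32(&done, 1) // records that the Goroutine completed
	}()

	channel <- 3 // capacity 3: this send does not wait for the receive
	time.Sleep(time.Duration(outerDelayMs) * time.Millisecond)

	return atomic.LoadInt32(&done) == 1
}

func main() {
	// Delaying the child makes the outer routine finish first: a leak.
	fmt.Println("child delayed, finished in time:", runListing10(200, 0))
	// Delaying the outer routine gives the child time to finish: no leak.
	fmt.Println("outer delayed, finished in time:", runListing10(0, 200))
}
```

Sleeping does not make the outcome strictly deterministic, but with delays far larger than the scheduling latency the intended interleaving is the one observed in practice.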

The reason GoCART is unable to detect the leaking Goroutine when it actually occurs has to do with logging. When the main routine finishes, the program exits and logging stops. Because of this the leaking events go undetected and as a result GoCART might not be able to detect the leaking Goroutine. The reason goleak is unable to predict leaking Goroutines that did not actually occur in the execution is of course because goleak does not employ predictive trace analysis.

These results suggest that it might be useful to use GoCART and goleak in tandem. This will result in more coverage, as these tools tackle each other’s weaknesses.

5.3 Ethics

The purpose of GoCART is to detect potential errors in concurrent computer programs. Our contribution of added support for buffered channels makes it a more universal detection tool that can be applied to a wider variety of computer programs. This makes it more effective in reducing unwanted behaviour exhibited by computer programs. This also means that these programs will be less vulnerable to attacks which aim to exploit this unwanted behaviour. So it will lead to safer and more secure systems for society to rely on. On the other hand, malicious software might also be strengthened using this testing, making it harder to take down. We conclude that the tool we developed is as ethical as its user.


CHAPTER 6

Related Work

Runtime verification has been an ongoing area of research over the past 20 or so years [9]. A certain subset of this research concerns itself with Predictive Trace Analysis. In this section a number of PTA tools will be discussed, as well as some standard testing tools for the Go language.

6.1 Java MultiPathExplorer

The work that introduced the concept of PTA did so by creating a prototype tool called Java MultiPathExplorer (JMPaX) based on their foundational techniques [12]. JMPaX allows for the prediction of property violations in multithreaded Java programs.

The causality model used in JMPaX defines $e \lessdot e'$ as follows:

• $e'$ directly follows $e$ in the same thread

• or $e'$ accesses shared variable $x$ after $e$ does and at least one of these events is a write

A permutation of the recorded trace is considered to be consistent with that trace if the $\lessdot$ relation is preserved.
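The two rules above and the resulting consistency check can be sketched as follows. This is an illustration under assumptions, not JMPaX code: the Event type, its field names and the functions directlyCausal and consistent are hypothetical, and a permutation is represented as a reordering of the trace indices.

```go
package main

import "fmt"

// Event is one entry of a recorded trace. The type and its field names are
// illustrative and are not taken from JMPaX.
type Event struct {
	Thread   int
	Variable string // empty for events that touch no shared variable
	Write    bool
}

// directlyCausal returns the set of index pairs (i, j) for which trace[i]
// is directly causally before trace[j] under the two rules.
func directlyCausal(trace []Event) map[[2]int]bool {
	rel := make(map[[2]int]bool)
	lastInThread := make(map[int]int)
	for j, later := range trace {
		// Rule 1: later directly follows an event in the same thread.
		if i, ok := lastInThread[later.Thread]; ok {
			rel[[2]int{i, j}] = true
		}
		lastInThread[later.Thread] = j
		if later.Variable == "" {
			continue
		}
		// Rule 2: same shared variable, earlier access, at least one write.
		for i := 0; i < j; i++ {
			earlier := trace[i]
			if earlier.Variable == later.Variable && (earlier.Write || later.Write) {
				rel[[2]int{i, j}] = true
			}
		}
	}
	return rel
}

// consistent reports whether the permutation order (a reordering of the
// trace indices) keeps every related pair in its original relative order.
func consistent(order []int, rel map[[2]int]bool) bool {
	position := make(map[int]int, len(order))
	for pos, idx := range order {
		position[idx] = pos
	}
	for pair := range rel {
		if position[pair[0]] >= position[pair[1]] {
			return false
		}
	}
	return true
}

func main() {
	trace := []Event{
		{Thread: 1, Variable: "x", Write: true}, // 0
		{Thread: 2, Variable: "x"},              // 1
		{Thread: 1, Variable: "y"},              // 2
		{Thread: 2, Variable: "y", Write: true}, // 3
	}
	rel := directlyCausal(trace)
	fmt.Println(consistent([]int{0, 2, 1, 3}, rel)) // swaps two unrelated events
	fmt.Println(consistent([]int{1, 0, 2, 3}, rel)) // moves the read of x before its write
}
```

In this example trace the first permutation is accepted and the second is rejected, because the read of x in thread 2 may not be moved before the write of x in thread 1.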

JMPaX lets users define incorrectness through the use of past time linear temporal logic.

6.2 Java MultiPathExplorer 3.0

In later research the predictive capability of JMPaX was strengthened in order to increase coverage [13]. This was done by adjusting the causality model, which allows for more valid runs to be predicted.

The new causality model defines $e \lessdot e'$ as follows:

• $e'$ directly follows $e$ in the same thread

• or $e$ is the latest write to shared variable $x$ before $e'$ reads from $x$

The relation $\prec$ is then defined as the transitive closure of $\lessdot$. In addition to this, a write to $x$ followed by all reads from $x$ until the next write should be viewed as an atomic set, meaning that no reads or writes to $x$ from outside that set can be interleaved into it. If $e$ and $e'$ belong to the same atomic set for variable $x$, this is denoted by $e \Updownarrow_x e'$. A permutation of the recorded


Whereas the previous version used temporal logic to define incorrectness, here nondeterministic automata are used. This allows for a more general way of expressing incorrectness, since temporal logics and regular expressions are just special cases of this.
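The causality relation of this section, with its direct relation and the transitive closure ≺, can be sketched in a few lines. The code below is an illustration under assumptions and not the JMPaX implementation: the Event type and the function happensBefore are hypothetical, and the atomic-set condition is deliberately left out.

```go
package main

import "fmt"

// Event is one entry of a recorded trace. The type and its field names are
// illustrative and are not taken from JMPaX.
type Event struct {
	Thread   int
	Variable string // empty for events that touch no shared variable
	Write    bool
}

// happensBefore returns a matrix rel where rel[i][j] is true if trace[i]
// precedes trace[j] in the transitive closure of the direct relation.
// Atomic sets are not modelled in this sketch.
func happensBefore(trace []Event) [][]bool {
	n := len(trace)
	rel := make([][]bool, n)
	for i := range rel {
		rel[i] = make([]bool, n)
	}

	lastInThread := make(map[int]int) // latest event index per thread
	lastWrite := make(map[string]int) // latest write index per variable
	for j, e := range trace {
		// Rule 1: e directly follows an event in the same thread.
		if i, ok := lastInThread[e.Thread]; ok {
			rel[i][j] = true
		}
		lastInThread[e.Thread] = j
		if e.Variable == "" {
			continue
		}
		// Rule 2: a read is related to the latest preceding write only.
		if e.Write {
			lastWrite[e.Variable] = j
		} else if i, ok := lastWrite[e.Variable]; ok {
			rel[i][j] = true
		}
	}

	// Warshall's algorithm computes the transitive closure in place.
	for k := 0; k < n; k++ {
		for i := 0; i < n; i++ {
			if !rel[i][k] {
				continue
			}
			for j := 0; j < n; j++ {
				if rel[k][j] {
					rel[i][j] = true
				}
			}
		}
	}
	return rel
}

func main() {
	trace := []Event{
		{Thread: 1, Variable: "x", Write: true}, // 0
		{Thread: 2, Variable: "x"},              // 1
		{Thread: 2, Variable: "y", Write: true}, // 2
		{Thread: 3, Variable: "y"},              // 3
	}
	rel := happensBefore(trace)
	fmt.Println("event 0 precedes event 3:", rel[0][3])
}
```

Because two writes to the same variable are no longer related directly, fewer pairs are ordered than in the earlier model, which leaves more permutations available for prediction.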

6.3 TraceFilter

A challenge for PTA techniques is scaling to larger traces. TraceFilter addresses this by identifying and removing redundant events from the trace, thereby improving the scalability of PTA without affecting the quality of the results [7]. The number of redundant events that exist in the trace varies highly across different programs, but in one case TraceFilter was able to identify 99.9% of events as redundant and remove them.

6.4 jPredictor

jPredictor is another PTA tool for detecting concurrency errors in Java programs [1]. Like the other tools, it also allows for the generic definition of properties to be checked. It uses a concept called sliced causality to achieve more coverage. This works by additionally analyzing parts of inconsistent runs, namely the parts unaffected by any inconsistencies. jPredictor was able to discover previously unknown data races and atomicity violations in popular open source programs.

6.5 PECAN

PECAN is a PTA tool that focuses on a criterion which they call persuasiveness [6]. This means that the tool not only detects errors, but also helps the programmer in understanding how these errors occurred. This is done by producing a concrete execution in which the predicted violation occurs. PECAN uses a graph based approach, where the nodes of the graph represent events and the edges represent causal relations between them.

6.6 GPredict

PTA techniques are often designed to detect low-level errors like data races. GPredict, on the other hand, is able to detect more high-level errors, such as a collection being modified while being iterated over [5]. Another difference compared to other PTA approaches is that GPredict does not require a global trace of events as input but only the set of traces produced by each thread. This reduces the runtime overhead.

6.7 Go Race Detector

Go has a built-in tool called Go Race Detector that identifies data races in Go programs [3]. It is based on the C/C++ ThreadSanitizer runtime library [14]. ThreadSanitizer is a sort of data race detection API that allows the user to inform the detector about synchronization used in the program. It is specifically tailored to detect data races, in contrast to other more general PTA

garbage collection. This does lower average memory consumption but is not helpful in worst-case scenarios.


CHAPTER 7

Conclusion

While concurrent programming languages such as Go offer great support for writing concurrent programs, it is still easy for programmers to make subtle mistakes when doing so. GoCART set out to predict unwanted behaviour in Go programs by means of Predictive Trace Analysis. While GoCART was able to achieve this, it still lacked support for some concurrency primitives, such as buffered channels.

In this work GoCART is extended in order to support buffered channels. This is done by modifying the definition of consistent permutations to take buffered channels into account. With this new definition in mind, the algorithms used in GoCART are modified to effectively generate these permutations and analyze them simultaneously.

To test whether GoCART is effective in detecting incorrectness in programs utilizing buffered channels, various test cases demonstrating different mechanisms of buffered channels are considered. GoCART’s ability to detect data races in these instances is compared to that of the Go race detector. The results show that GoCART works as intended. Furthermore, it turns out that in a particular type of case, the Go race detector gives a false positive while GoCART does not, i.e. the Go race detector reports a data race even though it is impossible to occur. Additionally GoCART’s ability to detect leaking Goroutines in these cases is compared to that of goleak. The results confirm that GoCART has trouble detecting leaking Goroutines when they actually leak in the recorded execution, but is able to predict them when they do not [10]. Conversely, goleak only detects leaking Goroutines when they occur and has no predictive ability.

7.1 Future work

Even though GoCART’s causality model was extended to support buffered channels, there are still some concurrency constructs left which are not yet supported by GoCART. These are a part of Go’s sync standard library. While the sync.Mutex and sync.WaitGroup primitives are already supported, the sync library also contains a range of other concurrency primitives, such as sync.Cond or sync.Locker.
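To give an impression of what supporting such a primitive would involve, the sketch below shows a typical use of sync.Cond from the standard library. The helper name waitForReady is hypothetical and the example makes no claim about how GoCART would model it. Note that sync.Locker is an interface with Lock and Unlock methods, implemented by sync.Mutex among others, and that sync.NewCond takes such a Locker.

```go
package main

import (
	"fmt"
	"sync"
)

// waitForReady starts a Goroutine that sets a flag and signals a condition
// variable, then blocks until the flag is observed. The helper name is
// hypothetical.
func waitForReady() bool {
	var mu sync.Mutex
	cond := sync.NewCond(&mu) // *sync.Mutex implements sync.Locker
	ready := false

	go func() {
		mu.Lock()
		ready = true
		mu.Unlock()
		cond.Signal() // wakes one waiter, if any is waiting
	}()

	mu.Lock()
	defer mu.Unlock()
	for !ready {
		cond.Wait() // releases mu while waiting, reacquires it before returning
	}
	return ready
}

func main() {
	fmt.Println("ready:", waitForReady())
}
```

The loop around Wait matters: if the signal arrives before the outer routine starts waiting, the flag is already set and Wait is never called, so no wake-up can be lost.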

Another area of improvement is the filtering of redundant events. GoCART already does this to some extent, but there are several ways to expand on it [7]. This can aid in reducing computation time, because there will be fewer events to permute and therefore fewer permutations to explore.


Bibliography

[1] Feng Chen, Traian Florin Serbanuta, and Grigore Rosu. “jPredictor”. In: Proceedings of the 13th international conference on Software engineering - ICSE ’08 (2008). doi: 10.1145/1368088.1368119.

[2] Daniel Schnetzer Fava and Martin Steffen. “Ready, set, Go!: Data-race detection and the Go language”. In: Science of Computer Programming 195 (2020), p. 102473. doi: 10.1016/j.scico.2020.102473.

[3] Go Race Detector. url: https://golang.org/doc/articles/race_detector.html.

[4] goleak. url: https://github.com/uber-go/goleak.

[5] Jeff Huang, Qingzhou Luo, and Grigore Rosu. “GPredict: Generic Predictive Concurrency Analysis”. In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering (2015). doi: 10.1109/icse.2015.96.

[6] Jeff Huang and Charles Zhang. “Persuasive prediction of concurrency access anomalies”. In: Proceedings of the 2011 International Symposium on Software Testing and Analysis - ISSTA ’11 (2011). doi: 10.1145/2001420.2001438.

[7] Jeff Huang, Jinguo Zhou, and Charles Zhang. “Scaling predictive analysis of concurrent programs by removing trace redundancy”. In: ACM Transactions on Software Engineering and Methodology 22.1 (2013), pp. 1–21. doi: 10.1145/2430536.2430542.

[8] Leslie Lamport. “Time, clocks, and the ordering of events in a distributed system”. In: Communications of the ACM 21.7 (1978), pp. 558–565. doi: 10.1145/359545.359563.

[9] Martin Leucker and Christian Schallhart. “A brief account of runtime verification”. In: The Journal of Logic and Algebraic Programming 78.5 (2009), pp. 293–303. doi: 10.1016/j.jlap.2008.08.004.

[10] Jesse Postema. “GoCART: Determining incorrectness in concurrent Go programs”. BSc Thesis. UvA, 2020. url: https://scripties.uba.uva.nl/search?id=714682.

[11] Thomas Rauber and Gudula Rünger. Parallel programming: for multicore and cluster systems. Springer-Verlag, 2010. doi: 10.1007/978-3-642-37801-0.

[12] Koushik Sen, Grigore Rosu, and Gul Agha. “Runtime safety analysis of multithreaded programs”. In: ACM SIGSOFT Software Engineering Notes 28.5 (2003), pp. 337–346. doi: 10.1145/940071.940116.

[13] Koushik Sen, Grigore Roșu, and Gul Agha. “Detecting Errors in Multithreaded Programs by Generalized Predictive Analysis of Executions”. In: Lecture Notes in Computer Science (2005), pp. 211–226. doi: 10.1007/11494881_14.

[14] Konstantin Serebryany and Timur Iskhodzhanov. “ThreadSanitizer”. In: Proceedings of the Workshop on Binary Instrumentation and Applications - WBIA ’09 (2009). doi: 10.1145/1791194.1791203.
