
Bachelor Informatica

GoCART: Determining incorrectness in concurrent Go programs

Jesse Postema

June 15, 2020

Supervisor(s): dr. A.M. Oprescu & D. Frölich BSc

Informatica
Universiteit van Amsterdam


Abstract

While modern technologies and languages such as Go have made it easier for programmers to write concurrent programs, concurrency bugs such as data races remain notoriously hard to find: traditional coverage testing often falls short in exposing such issues, and model checking requires too many resources due to the state space explosion problem. Predictive Trace Analysis (PTA) was created as a compromise between the two. While plenty of PTA techniques exist for traditional languages such as Java, no such tool exists yet for Go. We present GoCART: a PTA tool for Go with a causality model specific to Go's concurrency primitives. By default, it supports detecting data races as well as leaking Goroutines, a type of concurrency bug specific to Go, and it is extendable through user-defined property monitors. Through experiments we show that GoCART is capable of detecting incorrectness that does not necessarily appear in the recorded trace. Due to the nature of PTA, however, we are unable to detect leaking Goroutines that do appear in the recorded trace. GoCART, like all PTA techniques, excels especially in scenarios where multiple Goroutines concurrently access the same data.


Acknowledgement

First and foremost I would like to thank Ana Oprescu and Damian Frölich, for their supervision, positivity and advice throughout this entire project. Their guidance has been of great help to me, and this thesis certainly would not have been possible if it were not for them.

Secondly I want to thank Nima Motamed, who despite the limited time in his schedule was able to provide me with ample feedback and several tips that I will be able to put to good use in all my future academic endeavours.

Finally I express my thanks to my loving girlfriend Hester Verdenius, who not only helped me through advice and feedback, but also looked after me and kept my spirits up during the past few months.


Contents

1 Introduction
  1.1 Research question
  1.2 Contributions
  1.3 Thesis outline

2 Theoretical background
  2.1 Go concurrency model
    2.1.1 Goroutines
    2.1.2 Channels
    2.1.3 Sync library
  2.2 Go trace package
  2.3 Causality
    2.3.1 Consistent permutations
    2.3.2 Vector clock algorithm
  2.4 Computation lattice
    2.4.1 Analysis
    2.4.2 Computation lattice algorithm
  2.5 Types of incorrectness

3 Related work
  3.1 Causality models
  3.2 Permutation calculation
  3.3 Types of incorrectness

4 Approach
  4.1 Trace generation
    4.1.1 Instrumentation
  4.2 Trace formatting
    4.2.1 Event filtering
    4.2.2 Channel event ordering
  4.3 Causal dependency deduction
    4.3.1 Causality model
    4.3.2 Consistent permutations
    4.3.3 Vector clock algorithm
  4.4 Predictive property checking
    4.4.1 Computation lattice algorithm
    4.4.2 Property violations

5 Experiments and results
  5.1 Experimental setup
  5.2 Data race detection experiment
    5.2.1 No synchronization primitives test case
    5.2.2 sync.WaitGroup controlled test case
    5.2.3 sync.Mutex controlled test case
    5.2.4 Reversed access order
    5.2.5 No data races to be detected
  5.3 Leaking Goroutine detection experiment
    5.3.1 Shared variable access
    5.3.2 Added causal dependency
    5.3.3 Infinite loops
    5.3.4 Waiting on a channel

6 Discussion
  6.1 Data race detection experiment
  6.2 Leaking Goroutine detection experiment
  6.3 Ethical considerations
  6.4 Threats to validity
    6.4.1 Complete instrumentation
    6.4.2 Correct ordering of trace events
    6.4.3 Correct notion of causality
  6.5 Real world applicability

7 Conclusion
  7.1 Future work
    7.1.1 Missing causality definitions
    7.1.2 Relaxed causality definitions
    7.1.3 Additional property monitors


CHAPTER 1

Introduction

Go (also known as Golang) is an open source programming language developed over the last ten years by a team at Google and others [1]. It is a compiled language featuring type safety (through static typing), memory safety (by disallowing pointer arithmetic), and garbage collection, and is sometimes referred to as "C for the 21st century" [2]. In this modern language, concurrency is treated as a first-class citizen and is achieved through Goroutines, which are a lightweight alternative to threads. This makes writing concurrent programs in Go an easy task relative to other, more traditional languages.

However, one of the biggest challenges in writing concurrent programs is finding concurrency bugs, or incorrectness. This is mainly due to the non-determinism caused by thread scheduling: with traditional testing methods that focus on code coverage, concurrency bugs such as race conditions might not be exposed at all, while still being present in the production environment [3].

To tackle this problem, model checking was developed: it explores the program under all possible thread schedules, thus completely eliminating the problem of non-determinism. However, due to the state space explosion [4] inherent in larger concurrent programs, model checking quickly becomes an expensive technique requiring a lot of resources. To reduce the time and resources consumed by testing, a compromise between traditional testing and model checking was developed: Predictive Trace Analysis (PTA) [5]. This technique requires the test to run only a single time, just like traditional testing, but analyses a subset of all possible thread schedules, namely those that are consistent with the recorded run. This makes it faster and thus more scalable than model checking, at the cost of lower coverage in terms of interleavings and of the ability to prove a system correct.

1.1 Research question

Multiple tools for PTA have been developed [5][3][6][7], but none support Go at the time of writing. The research question of this project is: "How can we leverage trace analysis to detect incorrectness in concurrent Go programs?" Along with answering this question, we focus on creating GoCART, a framework that uses this technique, adapted to the Go language. To help answer this question, smaller subquestions have been defined:

1. Which events in the trace are required for finding incorrectness in concurrent programs? An execution trace is full of all kinds of events, for instance those related to Go’s garbage collector, which are not necessarily helpful when trying to find incorrectness. Filtering out the unneeded events provides a clearer search space.

2. How can we define consistent permutations in the context of Go’s concurrency model? PTA uses the consistent permutations as the search space for detecting incorrectness. A causality model, which is used to define what consistent permutations are, will need to be constructed based on Go’s concurrency primitives.


3. How effective is GoCART at finding incorrectness? To determine the usefulness of the framework, its effectiveness at finding incorrectness needs to be tested. We can achieve this, for example, by serving it sources that have a known incorrectness, and checking whether or not GoCART correctly reports them.

1.2 Contributions

The following contributions are presented in this work:

First and foremost, a causality model is presented that captures (a subset of) Go’s concurrency primitives, thereby making this the first PTA approach that, to our knowledge, is applicable to the Go language. This model encapsulates dynamic Goroutine creation, access to shared variables, unbuffered channels, sync.WaitGroups and sync.Mutex locks.

Secondly, we present GoCART, a framework that implements this causality model into a functional PTA tool. GoCART comes with data race and leaking Goroutine property monitors, and is extendable through user-defined custom property monitors.

1.3 Thesis outline

Chapter 2 explains the concurrency model of Go and its trace package, as well as the foundations of PTA and the types of incorrectness commonly found in concurrent programs. Related approaches in the field of PTA are discussed in Chapter 3. Chapter 4 goes into detail about the design and implementation of GoCART, including its causality model and the incorrectness it can detect. Experiments that show the workings and limitations of GoCART are presented in Chapter 5, along with comparisons with current state-of-the-art tools. Chapter 6 contains the discussion, and we end with a conclusion as well as future work in Chapter 7.


CHAPTER 2

Theoretical background

This chapter describes the Go concurrency model (and trace package), the fundamental algorithms and theory behind PTA, and the most commonly found incorrectness in concurrent (Go) programs. Together this forms all required background knowledge for this project.

2.1 Go concurrency model

Go was designed with concurrency in mind, and that is very clear from the language syntax. The keyword go, for example, is used to start a new Goroutine. Here is a rundown of Go's most important concurrency-related syntax.

2.1.1 Goroutines

Goroutines are a lightweight alternative to traditional threads and can be started using the keyword go:

go foo(bar, baz)

starts a Goroutine that runs foo(bar, baz).
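Goroutines are commonly started from anonymous functions as well. A variable captured by such a closure becomes shared between the Goroutines involved, as in this minimal sketch:

count := 0
go func() {
    count++ // count is captured by the closure and is now shared state
}()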

Note that memory space is shared between Goroutines, allowing for shared variables. Communication through shared variables is not recommended, however, as Go embraces the motto: 'Do not communicate by sharing memory; instead, share memory by communicating.' [8] To achieve this, Go introduces another keyword, chan, used for channels.

2.1.2 Channels

Channels in Go are used to communicate between Goroutines.

ch := make(chan int)

creates a channel that accepts only communication of integer values. To send a value v over a channel, the '<-' operator is used:

ch <- v

To receive a value from the channel, the same '<-' operator is used, but the positions of the channel and the variable are switched:

v := <- ch

Channels block until both sides of the transaction are ready, meaning that execution on the 'send' side of the communication cannot continue until a valid 'receive' side presents itself, and vice versa. Channels can be extended with a buffer:

ch := make(chan int, 5)

The buffer works as a FIFO queue: a buffered channel only blocks a 'send' when its buffer is full and a 'receive' when the buffer is empty.
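Putting send, receive and blocking behaviour together, a minimal sketch of two Goroutines communicating over an unbuffered channel could look like this:

ch := make(chan int)
go func() {
    ch <- 42 // blocks until the main Goroutine is ready to receive
}()
v := <-ch      // blocks until the spawned Goroutine sends
fmt.Println(v) // prints 42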

2.1.3 Sync library

For more concurrency control, Go provides a package called sync that offers additional tools, such as mutual exclusion locks. For the scope of this project we consider only two tools provided by the sync package, WaitGroup and Mutex.

sync.WaitGroup

A WaitGroup is designed to wait for a group of Goroutines to finish. This is achieved using the Add, Done and Wait functions:

var wg sync.WaitGroup

creates a new WaitGroup. The Add function then determines how many Goroutines we are waiting for:

wg.Add(5)

The Done function signals to the WaitGroup that a Goroutine has finished:

wg.Done()

The Wait function, finally, blocks the Goroutine it is called in until the number of Done calls on the WaitGroup equals the number determined by the Add call(s):

wg.Wait()
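Combined, a typical usage pattern looks as follows (a minimal sketch):

var wg sync.WaitGroup
for i := 0; i < 5; i++ {
    wg.Add(1) // one Add per Goroutine we wait for
    go func() {
        defer wg.Done() // signal completion when the Goroutine returns
        // ... do work ...
    }()
}
wg.Wait() // blocks until all five Done calls have happened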

sync.Mutex

A Mutex (mutual exclusion) lock is a concurrency control device more commonly found in other languages and therefore usually more familiar to programmers new to Go. A Mutex lock can be locked or unlocked:

var m sync.Mutex

creates a new unlocked Mutex lock. To lock a Mutex lock, the Lock function is used:

m.Lock()

If the Mutex lock is already in a locked state when the Lock method is called, this call blocks until the lock is released. To unlock a Mutex lock, the Unlock method is called:

m.Unlock()

If the Mutex lock is not in a locked state when the Unlock method is called, a runtime error is thrown.
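A typical pattern is to guard every access to a shared variable with the same lock, as in this minimal sketch:

var m sync.Mutex
counter := 0
go func() {
    m.Lock()
    counter++ // at most one Goroutine is between Lock and Unlock
    m.Unlock()
}()
m.Lock()
fmt.Println(counter)
m.Unlock()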

2.2 Go trace package

The Go trace package [9] generates traces of execution events for go tool trace to interpret. It captures a wide range of events, including Goroutine creation, various forms of blocking and unblocking (such as blocking on a channel receive or on the network), system call related events, and garbage collection. These events receive a nanosecond-precision timestamp and a stack trace. go tool trace visualizes these events (as seen in Figure 2.1) so the user can perform performance analysis on them. Additionally, the package offers support for user annotation in the form of manually added log statements, which generate UserLog events in the trace.

Figure 2.1: go tool trace output
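As a minimal sketch, a program can produce such a trace, including one user annotation, as follows:

package main

import (
    "context"
    "os"
    "runtime/trace"
)

func main() {
    f, err := os.Create("trace.out") // the trace is written to this file
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if err := trace.Start(f); err != nil {
        panic(err)
    }
    defer trace.Stop()

    // Log emits a UserLog event carrying a category and a message.
    trace.Log(context.Background(), "write", "c")
}

The resulting file can then be inspected with go tool trace trace.out.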

2.3 Causality

A PTA algorithm performs its analysis over all possible permutations that are consistent with the recorded run, meaning those permutations cannot violate the observed causal dependency partial order. So, in order to determine which permutations can be analysed, a causal dependency must be established. Consider a programming language that achieves concurrency using shared variables and Mutex locks. Then, the causal dependency can be defined as follows [3]:

A multithreaded program is a sequence of events e ∈ E and consists of threads t_1, t_2, ..., t_n. We use e_i^j to denote the jth event in thread t_i. We say e_i^j happens before e_{i'}^{j'} if:

• i = i' and j + 1 = j', meaning both events happen in the same thread and e is the last event that happened in that thread before e'.

• For a shared variable x, e is the last write of x before e' and e' is a read of x.

The partial order ≺ is then defined as the transitive closure over all happens-before relations. Additionally, any write of a shared variable x followed by reads of that variable must be regarded as atomic with respect to other reads and writes of x that are not part of that sequence. This ensures that all reads of x have the same value as in the recorded trace. Mutex locks can simply be considered as shared variables, with an acquired lock being modeled by a write of that variable, and a release being modeled by a read, so that they cannot be permuted (which is exactly the goal of a lock). These atomicity rules are defined as the relation ⋈.

2.3.1 Consistent permutations

Now that we have defined ≺ and ⋈, we can define consistent permutations as follows [3]: Any linearization of events e ∈ E is called a consistent permutation, or consistent multithreaded run, iff it does not violate ≺ or ⋈.

2.3.2 Vector clock algorithm

To calculate all happens before relations of a given trace, an algorithm based on vector clocks [10] can be used [5][3].

Let a vector clock V : ThreadId → ℕ be a total map from thread identifiers to natural numbers, where V[t] = 0 when thread t does not (yet) exist for V. For every thread t_i we define a vector clock V_i and for every shared variable x we define a vector clock V_x. For two vector clocks V and V' we define the following:

• V ≤ V' iff V[i] ≤ V'[i] for all i

• V < V' iff V ≤ V' and V[i] < V'[i] for some i

• max(V, V')[i] = max(V[i], V'[i]) for all i

Additionally, let an atomicity identifier be a counter c_x for shared variable x, with c_x = 0 being the initial value. Then, an atomic set is defined as all operations on x that have the same value of c_x.

Consider a trace of events e, with e_i^j denoting the jth event in thread t_i. The following algorithm is defined that processes the events in order:

Algorithm 1: Vector clock algorithm
V_i[i] = V_i[i] + 1;
if e_i^j is a write of shared variable x then
    V_x = V_i;
    c_x = c_x + 1;
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else if e_i^j is a read of shared variable x then
    V_i = max(V_i, V_x);
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else
    return message ⟨e_i^j, i, V_i, ⊥, −1⟩;

So, on a write of a shared variable, the vector clock of the variable is updated to that of the current thread, the atomicity identifier is incremented (creating a new atomic set), and a message is emitted that includes the current event, thread, timestamp (vector clock), variable and atomic set. On a read of a shared variable, the thread's vector clock is updated with the maximum of its own clock and that of the variable, and again a message is emitted that includes the current event, thread, timestamp (vector clock), variable and atomic set. Any other event emits a similar message, without the variable and atomic set.

The manipulation of the atomicity identifier ensures that all reads following a certain write of a shared variable are in the same atomic set. The manipulation of the vector clocks ensures that events within the same thread are ordered. Additionally, it ensures that reads and writes of a shared variable are ordered, using the following definitions:

For two messages m = ⟨e, i, V, x, c⟩ and m' = ⟨e', j, V', x', c'⟩ we say e ≺ e' iff V[i] ≤ V'[i]. Next to that we say e ⋈_x e', meaning e and e' are in the same atomic set, iff x = x' and c = c'.
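A vector clock with these operations can be sketched in Go as follows (a minimal sketch; this is not GoCART's actual representation):

// VectorClock maps a thread (Goroutine) identifier to a logical time.
// Absent keys are implicitly zero, matching V[t] = 0 for unknown threads.
type VectorClock map[int]int

// LessEq reports whether v ≤ w, i.e. v[i] ≤ w[i] for all i.
func (v VectorClock) LessEq(w VectorClock) bool {
    for id, t := range v {
        if t > w[id] {
            return false
        }
    }
    return true
}

// Merge sets v to the pointwise maximum max(v, w).
func (v VectorClock) Merge(w VectorClock) {
    for id, t := range w {
        if t > v[id] {
            v[id] = t
        }
    }
}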


2.4 Computation lattice

To check for property violations in all consistent permutations, we can define a computation lattice [3] whose paths all represent a consistent execution. We define the following:

• A cut Σ ⊆ E s.t. for every t_i, if e_i^j ∈ Σ then e_i^{j'} ∈ Σ for all j' < j. Σ^{j_1 j_2 ... j_n} denotes the cut containing the latest events e_1^{j_1}, e_2^{j_2}, ..., e_n^{j_n} of all threads. If no events exist (yet) for a given thread t_n, j_n is 0.

• A cut Σ is consistent if for all e, e' ∈ E:
  • if e ∈ Σ and e' ≺ e then e' ∈ Σ;
  • if e, e' ∈ Σ are both events on shared variable x within different atomic sets, one of the two atomic sets must be a subset of Σ.

• An event e_i^j is enabled for a consistent cut Σ iff Σ ∪ {e_i^j} is also consistent.

• Let Σ^0 = Σ^{00...0} be the consistent cut before starting computation. Then a consistent multithreaded run, or consistent permutation, e^1 e^2 ... e^n generates a sequence of consistent cuts Σ^0 Σ^1 ... Σ^n s.t. for all 1 ≤ r ≤ n, Σ^{r−1} is consistent, e^r is enabled for Σ^{r−1} and Σ^r = Σ^{r−1} ∪ {e^r}.

• Σ ⇝ Σ' when there exists a consistent multithreaded run in which Σ and Σ' are consecutive cuts. The set of all consistent cuts with the relation ⇝*, the reflexive transitive closure of ⇝, forms a computation lattice, with all consistent runs forming paths starting from Σ^0 and ending in the final cut, where all events from the trace have been executed.

2.4.1 Analysis

To analyse the computation lattice, we define a monitor [3] as a tuple ⟨M, m_0, b, p⟩:

• M is the set of states.

• m_0 ∈ M is the initial state.

• b ∈ M is the final state, or 'bad state'.

• p : M × E → 2^M is a non-deterministic transition relation s.t. p(b, e) = {b} for all e ∈ E.

Such monitors can be defined for all properties that need to be checked (for instance data races) and are evaluated by 'stepping through' the computation lattice, updating them for every new event added to the cuts.

2.4.2 Computation lattice algorithm

To calculate all consistent permutations, we can define a computation lattice algorithm [3] as follows. Let Q be the set of messages we are analysing. We generate the computation lattice level by level, starting with a level that only contains Σ^0. While Q ≠ ∅, for every cut c in the current level and for every message m ∈ Q, check whether m is enabled for c. If that is the case, generate a new cut c ∪ {m} for the next level and save all property violations it produces. After all cuts in the current level have been checked, filter out all messages m ∈ Q that have happened in every cut. This process is shown in Algorithm 2.


Algorithm 2: Main computation lattice algorithm [3]
currentLevel = {Σ^0}; nextLevel = ∅; violations = ∅;
while Q ≠ ∅ do
    for c ∈ currentLevel do
        for m ∈ Q do
            if enabled(c, m) then
                newCut, newViolations = createCut(c, m);
                nextLevel = nextLevel ∪ {newCut};
                violations = violations ∪ newViolations;
            end
        end
    end
    Q = filter(Q, currentLevel);
    currentLevel = nextLevel;
    nextLevel = ∅;
end
return violations;

To check whether a message is enabled for a certain cut, we use an enabled function, based on the definition of enabledness (see Section 2.4). This process is shown in Algorithm 3. Let V_c be the current vector clock of cut c and let V_m be the timestamp of message m, carrying event e_i^j. Additionally, let AI_c be a map Object → ℕ of atomicity identifiers per object for c, and let o be the object of m, with atomicity identifier s.

Algorithm 3: Enabled function [3]
for ThreadId, Value ∈ V_c do
    if ThreadId == i then
        if j ≠ Value + 1 then
            return false;
        end
    else
        if Value < V_m[ThreadId] then
            return false;
        end
    end
end
if AI_c[o] > 0 ∧ s > 0 ∧ AI_c[o] ≠ s then
    return false;
end
return true;

The createCut function creates a new cut based on the old cut. It takes the old cut's vector clock and advances it by one for ThreadId, the thread of the new message m_i. It takes over the old cut's atomicity identifier map as well. If there is still a message left that this cut has not gotten to yet that operates on the same object o as the current message, and both of them have the same atomicity identifier s, the atomicity identifier of this object in the map gets updated to the value of the current message's atomicity identifier. Otherwise, it gets set to zero. Finally, the new cut gathers all monitors of the old cut, updates them using the current message, and returns all created property violations. This process is shown in Algorithm 4.


Algorithm 4: createCut function [3]
newCut.VC = oldCut.VC;
newCut.VC[ThreadId] = newCut.VC[ThreadId] + 1;
newCut.AI = oldCut.AI;
violations = ∅;
if s > 0 then
    setIncomplete = false;
    for m ∈ Q do
        if m.o == m_i.o ∧ m.s == m_i.s ∧ ¬(m.timestamp ≤ m_i.timestamp) then
            setIncomplete = true;
        end
    end
    if setIncomplete then
        newCut.AI[o] = m_i.s;
    else
        newCut.AI[o] = 0;
    end
end
for monitor ∈ oldCut.Monitors do
    newMonitor = monitor.step(m_i);
    newCut.Monitors = newCut.Monitors ∪ {newMonitor};
    if newMonitor.error then
        violations = violations ∪ {newMonitor.error};
    end
end
return newCut, violations;

filter function

The filter function filters out all messages m ∈ Q that have been traversed by all cuts in the current level. This is done by creating a minimal timestamp t = min(V_{c_1}, V_{c_2}, ..., V_{c_n}) over all c_i ∈ currentLevel. All messages m s.t. m ≤ t are filtered out.
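A sketch of this filter in Go, reusing the VectorClock type sketched in Section 2.3.2 (the Message and Cut types and their fields are assumptions, not GoCART's actual code):

// filter drops every message whose timestamp is dominated by the
// pointwise minimum of the vector clocks of all cuts in the level.
func filter(queue []Message, level []Cut) []Message {
    if len(level) == 0 {
        return queue
    }
    // Compute t = min(V_c1, ..., V_cn) pointwise.
    min := VectorClock{}
    for id, t := range level[0].VC {
        min[id] = t
    }
    for _, c := range level[1:] {
        for id := range min {
            if c.VC[id] < min[id] {
                min[id] = c.VC[id]
            }
        }
    }
    var remaining []Message
    for _, m := range queue {
        if !m.Timestamp.LessEq(min) { // keep messages some cut has not reached
            remaining = append(remaining, m)
        }
    }
    return remaining
}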

2.5 Types of incorrectness

The most notable access anomalies in concurrent programs are data races, atomicity violations and atomic-set serializability violations [11]. In addition to deadlocks, these seem to be the most common concurrency bugs detected by PTA techniques [7]. This section describes them, as well as Goroutine leaking, a type of incorrectness specific to Go programs.

A data race occurs when two directly subsequent events e_i^j and e_{i'}^{j'} with i ≠ i' both access the same variable x, with at least one of them being a write of x.

An atomicity violation happens when one thread accesses variable x through events e_i^j and e_i^{j''}, but between their executions another event e_{i'}^{j'} with i ≠ i' accesses x, and of both the pairs e, e' and e', e'' at least one event is a write of x.

Atomic-set serializability [12] is a notion designed as a more advanced definition of a data race that is able to detect problems with concurrent data access beyond what a traditional data race definition is capable of detecting. It is split into 11 problematic interleaving scenarios, each able to detect a different type of access anomaly.

When for a set of threads T with |T| > 1 it holds that every t_i ∈ T is waiting for the release of a lock l, held by some t_{i'} ∈ T with i ≠ i', there is a deadlock on all t ∈ T.

Due to the nature of Go's design, the execution of a Go program ends when the main Goroutine exits. Leaking Goroutines are those that have not finished execution when the main Goroutine exits. These leaking Goroutines take up memory that cannot be released until the program finishes.


CHAPTER 3

Related work

In the research done on Predictive Trace Analysis (PTA), the main differences in approach lie in the causality model used, the way permutations are calculated, and what kinds of incorrectness are considered. We describe several approaches to each of those facets in the following sections.

3.1 Causality models

Different causality models have been proposed in various works on PTA.

The research [5] that introduced the concept of PTA uses the following causality model: A multithreaded program is a sequence of events e ∈ E and consists of threads t_1, t_2, ..., t_n. Let e_i^j denote the jth event in thread t_i. Then e_i^j happens before e_{i'}^{j'} if:

• i = i' and j < j', meaning both events happen in the same thread and e happens in that thread before e'.

• For a shared variable x, both e and e' access x, with at least one of them being a write of x, and in the recorded execution, e happens before e'.

The work on Universal Causality Graphs [13] considers, next to thread creation and destruction (through fork and join), also wait_pre, wait_post and notify operations. It uses a more relaxed notion of causality: A multithreaded program is a sequence of events e ∈ E and consists of threads t_1, t_2, ..., t_n. Let e_i^j denote the jth event in thread t_i. Then e_i^j happens before e_{i'}^{j'} if:

• i = i' and j < j', meaning both events happen in the same thread and e happens in that thread before e'.

• e has action fork(t_{i'}).

• e' has action join(t_i).

• e has action wait_pre for some variable x and e' has the matching notify on x.

• e' has action wait_post for some variable x and e has the matching notify on x.

Note that this approach purposely disregards variable read and write operations and therefore overestimates the set of permutations. They tackle this problem by checking every found property violation with a satisfiability solver, to make sure they only report actual violations and no false positives.

An interesting differentiation in the causality model is made in GPredict [7], which allows permutation of read and write events in any order, as long as the read events still get the same value. This means that read events are no longer bound to a certain write event, as in our design for GoCART, but rather to all write events that can provide the right value. The advantage is that this model provides a larger permutation space compared to our approach.


Disadvantages are having to keep track of the actual values being read and written in different actions and having a more complicated causality definition, which is harder to translate to other uses, such as Mutex locks.

3.2 Permutation calculation

A Universal Causality Graph [13] is introduced as an alternative to the computation lattice algorithm described in Section 2.4 and used in the design of GoCART. In this graph, events are the nodes and the causal relation forms the edges. To check whether a certain property holds, additional edges are added to the graph, which is then checked for acyclicity. If the graph is indeed acyclic, the property is said to hold. A similar graph-centered approach is taken in PECAN [6].

3.3 Types of incorrectness

The initial work on PTA [5] did not focus on finding specific types of incorrectness such as data races. Instead, it allows user-defined properties to be specified in past-time Linear Temporal Logic [14].

Later work [3] by the same authors introduces the concept of property monitors, which are defined as non-deterministic automata that contain a ’bad’ state indicating that the property has been violated. As an example of such an automaton, they define one that is able to detect data races.

The study on Universal Causality Graphs [13] defines properties as added edges to the graph. Provided are properties for two types of violations, data races and atomicity violations.

Data races, atomicity violations and atomic-set serializability violations are considered in research [11] that focuses on the scalability of PTA algorithms, where the properties to be checked are defined as groups of equal-length sequences of the following:

• The characteristic event sequence of the property.

• The thread IDs of the events.

• The shared variables accessed by the events.

• The atomic region of the events.

• The access types of the events (either read or write).

The same property definition is also used for PECAN [6].

Finally, GPredict [7] introduces its own property specification language, which is based on MOP [15]. The main advantage of this comparatively complex syntax is the ability to define higher-level properties, for example 'a resource must be authenticated before use', in addition to lower-level properties such as data races.

For the implementation of GoCART, we chose not to restrict properties to a certain formal specification, but rather to allow users to define their own property monitors, more in line with the earlier described automata approach. Monitors only have to comply with the interface described in Section 4.4.2, and monitors for data race detection and Goroutine leaking are included. The advantage is that users are free to choose what type of monitors they want to use. A disadvantage is that a user who would like to, for instance, define properties in MOP would have to construct a monitor type that can interpret MOP formulas themselves.


CHAPTER 4

Approach

In this chapter we describe the approach we took for the design of GoCART, as well as its implementation, starting with creating a trace from a program execution to finally reporting all property violations found.

Figure 4.1: Overview of GoCART


GoCART's pipeline consists of four steps, which are described in detail in the following sections:

Trace generation takes as input any Go file and instruments it so that a trace file will be generated upon execution. trace.Log calls are added throughout the file to supplement the standard set of events logged by the Go trace package. Afterwards, an instrumented version of the program is executed, and the trace file is given as output.

Trace formatting takes as input the trace file generated in the previous step, filters out unneeded events and transforms the remaining ones into objects that are easier to work with. To prevent mistakes in the causality calculation, some reordering is done on the objects. The output is a list of event objects. In the next step, causality deduction, the happens-before relation between these event objects is calculated using an adapted vector clock algorithm. A list of messages is output, each containing an event as well as a vector clock timestamp and an atomicity identifier.

The last step, property checking, takes as input these messages in addition to a list of property monitors. Using the messages, a computation lattice is computed, and the property monitors check every step through the lattice. The output consists of all violations found by the various monitors.

4.1 Trace generation

To trace a program we used the Go trace package [9]. Unfortunately, most events logged by default are not directly applicable to our PTA analysis. This is because the events are geared towards performance analysis, rather than telling the exact story of what was executed when. For instance, there are no events for when something was sent or received over a channel, only for when sending or receiving blocked the execution of a specific Goroutine. We had to use the package's user logs to provide most of the events that we used for analysis.

4.1.1 Instrumentation

To get the desired resulting trace, we instrument a program using the Go AST package [16] as well as the Go astutil package [17]. The program’s main function is made responsible for starting and ending the trace and creating the file to write the trace to. Furthermore user logs were added for the following cases:

• Assignment, increment and decrement statements. These are considered the 'write' operations to variables and are logged as such.

• Any use of a variable identifier, as long as it is not of type channel, sync.WaitGroup or sync.Mutex, as those are handled separately. Variable types are calculated using the Go types package [18]. These are considered the 'read' operations.

• Send statements get a user log both before and after the statement, a ’pre-send’ and ’post-send’ one. These are later used to correctly implement the causality with the corresponding receive event.

• Go does not have receive statements in the same way it implements send statements. Rather, a channel receive is only an expression, and can thus be used more flexibly than send statements. For the scope of this project, we decided to only support singular channel receive operations on the right-hand side of an assignment statement. This means that programs might have to be slightly altered manually (for instance using a temporary variable) in order to be instrumented properly. These are the 'receive' operations.

• add(), done() and wait() operations on a sync.WaitGroup object.

• lock() and unlock() operations on a sync.Mutex object.

• Range and For statements. A range statement in Go is a for statement with a range clause. Both types of statements write to a loop variable in every iteration, and the for statement might need to read a variable for its condition as well.


• End of main Goroutine. A defer statement is added to the main function that logs an event denoting the end of execution of the main Goroutine. This is added in order to determine when the main Goroutine has ended, and assists in finding leaking Goroutines.

This instrumentation can increase the size of the source program significantly. Some of the source programs used for testing more than doubled in size, but it needs to be said that we consider them to be worst-case examples when it comes to size expansion, as they are designed in a way that practically every single line in the program requires some kind of added logging. After instrumentation is done, an instrumented version of the source program is built and run, which generates a trace file that we can then format and analyse.
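To illustrate, the instrumentation of a single write could conceptually transform the source as follows (a hypothetical sketch; the exact form of the generated code may differ):

// before instrumentation:
c = val

// after instrumentation, where ctx is a context.Context set up by the
// added tracing boilerplate in main:
trace.Log(ctx, "write", "c")
c = val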

4.2 Trace formatting

Using the source code of the Go internal trace package [19] (as Go forbids importing internal packages) we parse the generated trace and loop over the events. Since most of the information we are interested in is logged using UserLog events, we only consider the EvGoCreate and EvUserLog event types that the parsed trace gives us. For each of these events we create our own event object that consists of the following properties:

• Goroutine identifier. We save the Goroutine that the event executed in here.

• Timestamp. We order the events by timestamp before we feed them to our vector clock algorithm.

• Category. This determines the type of event. It can be one of the following:

  • Goroutine creation.
  • read and write for operations on shared variables.
  • presend, postsend and receive for operations on channels.
  • add, done and wait for operations on WaitGroups.
  • lock and unlock for operations on Mutex locks.
  • end of main Goroutine.

• Object. This holds the Goroutine id (in case of Goroutine creation) or address of what was affected by the operation.

• Position. The position in the source code of what operation we are logging.

4.2.1 Event filtering

By this point of the process we have a list of event objects. However, not all of these events are relevant for our analysis. For instance, access events on a local variable cannot be responsible for concurrency bugs, because local variables do not get accessed concurrently. To speed up the analysis, we filter out the irrelevant events first. For this we apply two types of filters: non-shared object filtering and irrelevant Goroutine creation filtering.

Non-shared object filtering. It is not a simple task to determine a priori which objects are shared between different Goroutines. A local variable could for instance be used by a Goroutine started through an anonymous function without being explicitly passed as a parameter. Therefore we opted for the approach of first logging all object accesses, and filtering out the irrelevant ones later. We do this by sorting all events by object they access (except for Goroutine creation and end of main Goroutine events) and then determine per object if all events on it are executed in the same Goroutine. If they are, this means that we are dealing with a local variable as opposed to a shared one and we filter these events out. For example, if we have two Goroutines, 1 (shown in Listing 1) and 2 (shown in Listing 2), and they execute the following code:


...
var a int
a = 5
...

Listing 1: Goroutine 1

...
var c int
c = a
...

Listing 2: Goroutine 2

Then, assuming they access the same a, before filtering we would have a write event on a from Goroutine 1, a read on a from Goroutine 2 and a write on c from Goroutine 2. Because in this subset of events all accesses of a are shared between Goroutine 1 and 2, we do not filter those out. However, the only access on c is from Goroutine 2, which means it will get filtered out.

Irrelevant Goroutine creation filtering. Goroutine creation events are the only events that we use for analysis that are not added user logs, but created by default by the trace package. This also has a drawback: events are also created for Goroutines that we do not care about for our analysis, for instance the Goroutine(s) that the trace package spawns for tracing, or the Goroutine that is responsible for garbage collection. We filter these irrelevant Goroutines out by sorting all events by Goroutine in which they execute. If there are no user logged events executed on a Goroutine, we filter it out.

4.2.2 Channel event ordering

Due to the nature of how user logs work, it is impossible to log the exact moment a channel receive (or any possibly blocking event) happens. That is why for channels we adopted a presend → receive → postsend logging form. In reality, the receive event is logged after the actual event happens, which means that in our logging we might end up with an order of presend → postsend → receive. To fix this problem, we look for any presend event and find the next postsend and receive events on that channel. If the timestamp of the receive is after that of the postsend event, we modify the timestamp of the receive to that of the postsend event minus one.
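A sketch of this reordering pass, assuming the event objects described in Section 4.2 with fields Category, Object and Timestamp and using the standard sort package (this is not GoCART's actual code):

// fixReceiveTimestamps restores the presend → receive → postsend order per
// channel by moving a late-logged receive to just before its postsend.
func fixReceiveTimestamps(events []Event) {
    for i, e := range events {
        if e.Category != "presend" {
            continue
        }
        // Find the next postsend and receive events on the same channel.
        post, recv := -1, -1
        for j := i + 1; j < len(events) && (post == -1 || recv == -1); j++ {
            if events[j].Object != e.Object {
                continue
            }
            switch events[j].Category {
            case "postsend":
                if post == -1 {
                    post = j
                }
            case "receive":
                if recv == -1 {
                    recv = j
                }
            }
        }
        if post != -1 && recv != -1 && events[recv].Timestamp > events[post].Timestamp {
            events[recv].Timestamp = events[post].Timestamp - 1
        }
    }
    // Restore a globally timestamp-sorted order after the adjustments.
    sort.Slice(events, func(a, b int) bool {
        return events[a].Timestamp < events[b].Timestamp
    })
}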

4.3 Causal dependency deduction

In order to compute the happens before relations that exist in a given execution trace, we need a theoretical causality model as well as a definition of consistent permutations and a vector clock algorithm that implements it. In this section we describe for all of those the design we used for GoCART.

4.3.1 Causality model

For this project we use an adapted causality definition that covers the following Go primitives for concurrency:

Goroutine creation. One of the staples of the language is the dynamic creation of Goroutines. Without this primitive, no concurrent Go programs would be covered and this analysis would make little sense.

read/write to variables that are shared between Goroutines. Although this approach is not encouraged in Go, it is possible to use and a likely candidate for concurrency errors.

send/receive over channels. The de facto way of sharing information between Goroutines, channels form an important primitive within the Go language. For the scope of this project we only consider unbuffered channels. This is mainly due to the fact that a causality model that supports buffered channels would be significantly more complicated to design and would likely also require modification on the computation lattice algorithm.

sync.Waitgroups & sync.Mutex. We chose this subset of the sync standard library because of their inclusion in A tour of Go [20].

Considering these events, we can define a partial order ≺ on them as follows:

A concurrent Go program is a sequence of events e ∈ E and consists of Goroutines t_1, t_2, ..., t_n. We use e_i^j to denote the jth event in Goroutine t_i. We say e_i^j happens before e_{i'}^{j'} if:

• i = i' and j + 1 = j', meaning both events happen in the same Goroutine and e is the last event that happened in that Goroutine before e'.

• e is the creation of Goroutine t_i, and e' is e_i^1, the first event in that Goroutine.

• For a shared variable x, e is the last write of x before e' and e' is a read of x.

• For a channel c, e is the last send over c before e' and e' is a receive over c.

• For a channel c, e is a receive over c and e_{i'}^{j'−1} is the last send over c before e.

• For a sync.WaitGroup wg, e is the last add() on wg before e' and e' is a done() on wg.

• For a sync.WaitGroup wg, e is a done() on wg since the last add() and e' is an add() or wait() on wg.

• For a sync.Mutex m, e is the last lock() on m before e' and e' is an unlock() on m.

The partial order ≺ is then defined as the transitive closure over all happens-before relations. Additionally, any write of a shared variable x followed by reads of that variable must be regarded as atomic with respect to other reads and writes of x that are not part of the set. This ensures that all reads of x have the same value as that same read operation in the original trace. The same goes for any send over a channel c followed by its corresponding receive, any lock() on m followed by its corresponding unlock(), as well as any number of add() and done() calls followed by a wait() on any wg. These atomicity rules are defined as the relation ⋈.

4.3.2 Consistent permutations

Now that we have defined our causality model and the relations ≺ and ⋈, we can define consistent permutations as follows: Any linearization of events e ∈ E is called a consistent permutation, or consistent multithreaded run, iff it does not violate ≺ or ⋈.

4.3.3 Vector clock algorithm

To implement this adapted causality model, we use a similarly adapted vector clock algorithm. For every Goroutine t_i we define a vector clock V_i, for every shared variable x we define a vector clock V_x, and for every WaitGroup x we define an additional vector clock V'_x. Additionally, let an atomicity identifier be a counter c_x for shared variable x, with c_x = 0 being the initial value. Then, an atomic set is defined as all operations on shared variable x that have the same value of c_x.

Consider a trace of events e, with e_i^j denoting the jth event in Goroutine t_i. We define an algorithm, shown as Algorithm 5, that processes the events in order.


Algorithm 5: Vector clock algorithm
V_i[i] = V_i[i] + 1;
if e_i^j is a write of shared variable x OR a lock of Mutex lock x OR a presend on channel x then
    V_x = V_i;
    c_x = c_x + 1;
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else if e_i^j is a read of shared variable x OR an unlock of Mutex lock x OR a postsend on channel x then
    V_i = max(V_i, V_x);
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else if e_i^j is a receive on channel x then
    V_i = max(V_i, V_x);
    V_x = max(V_i, V_x);
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else if e_i^j is an add on WaitGroup x then
    V_i = max(V_i, V'_x);
    V_x = max(V_i, V'_x);
    if the last operation on x was a wait or this is the first operation on x then
        c_x = c_x + 1;
    end
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else if e_i^j is a done on WaitGroup x then
    V_i = max(V_i, V_x);
    V'_x = max(V_i, V'_x);
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else if e_i^j is a wait on WaitGroup x then
    V_i = max(V_i, V'_x);
    return message ⟨e_i^j, i, V_i, x, c_x⟩;
else if e_i^j is the creation of Goroutine i' then
    V_{i'} = V_i;
    return message ⟨e_i^j, i, V_i, ⊥, −1⟩;
else
    return message ⟨e_i^j, i, V_i, ⊥, −1⟩;

Compared to the ’simple’ vector clock algorithm found in Algorithm 1, these are the most notable additions:

Firstly, for channels, a channel receive operates as if it were both a read (by updating its local clock) and a write (by updating the object clock). In conjunction with having the presend model a write and the postsend model a read, this ensures that for any send/receive pair we have: presend ≺ receive ≺ postsend.

Secondly, for WaitGroups, an atomic set comprises all events since the last wait() operation up to and including the next wait() operation. WaitGroups use two different object clocks, V_x and V'_x, so that an add() models a read on V'_x and a write on V_x, a done() models a read on V_x and a write on V'_x, and a wait() models a read on V'_x. This ensures that done() operations relate to the last add() operation the same way that variable reads relate to the last write, and that an add() or wait() operation only happens once all done() operations of the previous add() have been executed.

Finally, a Goroutine creation overwrites the local clock of the Goroutine it creates with its own local clock, to ensure that all operations on that new Goroutine happen after its creation.


4.4 Predictive property checking

With causality calculated, we can create and analyse the computation lattice to find property violations.

4.4.1 Computation lattice algorithm

Because the computation lattice algorithm defined in Section 2.4 does not depend on the causality model considered or the types of property monitors used, we were able to use that definition without having to modify it in any way. Important to note is that the algorithm computes all consistent permutations.

4.4.2 Property violations

In order to find property violations, we need to define property monitors. A property monitor gets as input a message per level in the computation lattice, until all messages have been received. After a message is received the monitor checks whether its property is violated by that message. If it is, it reports a violation. Because all monitors need to operate in this way and to make it easier to build more types of monitors, we define a monitor interface as follows:

type Monitor interface {
    step(m Message) (Monitor, Violation, bool)
    copy() Monitor
    equals(m Monitor) bool
}

This means that any type of monitor needs to implement a step, copy and equals function. The step function is used to 'step' the monitor through the computation lattice and takes a message as input. It updates the state of the monitor with the message and returns the updated monitor. If the message violates the monitor's property, it also returns a violation and the Boolean value true. Otherwise, an empty violation and the Boolean false are returned.

The copy function is meant to return an exact copy of the current monitor. Note that this function will no longer have to be included in the interface when/if Go eventually releases support for Generics [21].

The equals function takes as parameter another monitor and returns whether the two mon-itors are exactly the same, save for location in memory. This function is used to reduce the total number of monitors in the computation lattice, as there is no need to raise the exact same violation multiple times.

For the scope of this project we implemented two different monitors: a data race detector and a leaking Goroutine detector. We chose these two because data race detection is the most common application of PTA techniques and Goroutine leaking is a problem unique to Go. However, due to the way property monitors are set up, implementing other monitors, such as an atomicity violation detector or a custom monitor, should be a relatively simple task.

The data race detector is a very simple monitor that only keeps track of the last message it received. Because of how PTA works, if there exists a data race in a consistent run, the two accesses that make up the race will be processed consecutively by the monitor. So, for every message that it gets, the monitor simply has to check whether the current message, together with the previous message it received, constitutes a data race. If so, the violation is returned.
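Following that logic, the data race monitor could be sketched like this (the Message and Violation types and their fields are assumptions here; GoCART's internal definitions are not reproduced):

// raceMonitor remembers only the previous message it processed.
type raceMonitor struct {
    prev Message
}

func (r raceMonitor) step(m Message) (Monitor, Violation, bool) {
    isAccess := func(c string) bool { return c == "read" || c == "write" }
    // Two consecutive accesses to the same object from different
    // Goroutines, at least one of them a write, form a data race.
    if isAccess(r.prev.Category) && isAccess(m.Category) &&
        r.prev.Object == m.Object && r.prev.Goroutine != m.Goroutine &&
        (r.prev.Category == "write" || m.Category == "write") {
        return raceMonitor{prev: m}, Violation{First: r.prev, Second: m}, true
    }
    return raceMonitor{prev: m}, Violation{}, false
}

func (r raceMonitor) copy() Monitor { return raceMonitor{prev: r.prev} }

func (r raceMonitor) equals(o Monitor) bool {
    other, ok := o.(raceMonitor)
    return ok && other.prev == r.prev
}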

For the leaking Goroutine detector, a special log was added to the trace that indicates the end of the main function. Once that message gets processed by the monitor, every following message it receives contains events that come from leaking Goroutines, and for every leaking Goroutine a violation is returned.


CHAPTER 5

Experiments and results

To test GoCART we split our experiments into two parts: one for data race detection, and the other for leaking Goroutine detection. For each part we use a test set made up of cases for which we determine beforehand the total number of violations they contain, and then run them both on GoCART and on current state-of-the-art solutions. For data race detection, we use the built-in Go race detector [22] for this purpose. For leaking Goroutine detection, we use goleak [23], an open source tool developed by Uber.

5.1 Experimental setup

For our experiments the following versions and settings were used:

• Go version go1.14.2 linux/amd64, including the Go race detector version that comes standard with this version.

• The GOMAXPROCS setting was left at the default (equal to the number of cores in the system), and experiments were run on a machine with 12 cores.

• goleak version 1.0.0 [23], which is the latest version at the time of writing.

5.2 Data race detection experiment

For our data race detection experiment, we made a test set containing five cases: one where no synchronization primitives are used to control the flow of the program, one where a sync.WaitGroup is used, another where a sync.Mutex lock is used, one in which two shared variables are accessed in reverse order by two Goroutines and finally one that has no potential data races. They are described below. All results are collected in Table 5.1.

5.2.1 No synchronization primitives test case

Our first test case consists of the main Goroutine, which spawns two different Goroutines that both access a shared variable c. After spawning the two Goroutines, the main Goroutine also accesses c. The code for this example is shown in Listing 3. Note that the time.Sleep call was added here and in other experiments to prevent the program from terminating before both spawned Goroutines could execute. 100 milliseconds proved to be enough for this purpose during our testing, but this is of course dependent on the system that the tests are run on.


var c int

for _, val := range []int{4, 5} {
    go func() {
        c = val
    }()
}

time.Sleep(100 * time.Millisecond)
fmt.Println(c)

Listing 3: Unsynchronized shared variable access

This code contains three Goroutines that all access variable c concurrently once, of which only one access is a read. Additionally, variable val gets written twice by the main Goroutine, and read once by each spawned Goroutine. This is shown in Listings 4, 5 and 6.

1 ...
2 val = 4
3 go 2
4 val = 5
5 go 3
6 fmt.Println(c)
7 ...

Listing 4: Goroutine 1

1 ...
2 c = val
3 ...

Listing 5: Goroutine 2

1 ...
2 c = val
3 ...

Listing 6: Goroutine 3

Of these accesses, we can make three access pairs that can make up a data race on c:

1. Line 6 in Goroutine 1 and line 2 in Goroutine 2 (RW).
2. Line 6 in Goroutine 1 and line 2 in Goroutine 3 (RW).
3. Line 2 in Goroutine 2 and line 2 in Goroutine 3 (WW).

Furthermore, we can make a single access pair that makes up a data race on val:

1. Line 4 in Goroutine 1 and line 2 in Goroutine 2 (RW).

This makes for a total of four potential data races. Note that although an access pair exists on val between line 2 in Goroutine 2 and line 2 in Goroutine 3, both of those accesses are reads (RR), which does not constitute a data race. Additionally, other access pairs cannot exist on val, as Goroutines 2 and 3 are only created after one or both of the writes on val by Goroutine 1.

Running the program using the Go race detector, the program prints out ’5’ and it reports two data races: one on val, between line 4 in Goroutine 1 and line 2 in Goroutine 2 (RW), and one on c, between line 2 in Goroutine 2 and line 2 in Goroutine 3 (WW).

Running the program using GoCART, the program prints out ’5’ and it reports all four data races.

5.2.2 sync.WaitGroup controlled test case

Our second test case looks very similar to the first, but uses a sync.WaitGroup to control the flow of the program. This also enables us to remove the time.Sleep call required in the last case. The code for this example is shown in Listing 7.


var c int
var wg sync.WaitGroup

for _, val := range []int{4, 5} {
    wg.Add(1)
    go func() {
        c = val
        wg.Done()
    }()
}

wg.Wait()
fmt.Println(c)

Listing 7: sync.WaitGroup controlled shared variable access

Even though in this case we still have three Goroutines accessing the same two shared variables, val and c, due to the sync.WaitGroup we no longer read c in Goroutine 1 concurrently with the writes in Goroutines 2 and 3. The concurrent part of the program is shown in Listings 8, 9 and 10.

1 ...
2 val = 4
3 go 2
4 val = 5
5 go 3
6 ...

Listing 8: Goroutine 1

1 ...
2 c = val
3 ...

Listing 9: Goroutine 2

1 ...
2 c = val
3 ...

Listing 10: Goroutine 3

Of these accesses, we can make two access pairs that can make up a data race:

1. On c, line 2 in Goroutine 2 and line 2 in Goroutine 3 (WW).
2. On val, line 4 in Goroutine 1 and line 2 in Goroutine 2 (RW).

Note that although an access pair exists on val between line 2 in Goroutine 2 and line 2 in Goroutine 3, both of those accesses are reads (RR), which does not constitute a data race.

Running the program using the Go race detector, the program prints out ’5’ and reports both data races.

Running the program using GoCART, the program prints out ’5’ and also reports both data races.

5.2.3 sync.Mutex controlled test case

For our third test case, we again use code very similar to that of the first test case. This time, however, we use a sync.Mutex lock to control when each Goroutine can access shared variable c. The code for this example is shown in Listing 11.


var c int
var mux sync.Mutex

for _, val := range []int{4, 5} {
    go func() {
        mux.Lock()
        c = val
        mux.Unlock()
    }()
}

time.Sleep(100 * time.Millisecond)
mux.Lock()
fmt.Println(c)
mux.Unlock()

Listing 11: sync.Mutex controlled shared variable access

At first sight, one might be tempted to think that there are no concurrent access pairs in this example. However, Listings 12, 13 and 14 show that there is still a possible data race: line 4 of Goroutine 1 and line 3 of Goroutine 2.

1 ...
2 val = 4
3 go 2
4 val = 5
5 go 3
6 mux.Lock()
7 fmt.Println(c)
8 mux.Unlock()
9 ...

Listing 12: Goroutine 1

1 ...
2 mux.Lock()
3 c = val
4 mux.Unlock()
5 ...

Listing 13: Goroutine 2

1 ...
2 mux.Lock()
3 c = val
4 mux.Unlock()
5 ...

Listing 14: Goroutine 3

Running this test case on the Go race detector, the program once again prints '5' and the data race is reported successfully. The same behaviour is seen when running it on GoCART.

5.2.4 Reversed access order

The next test case consists of only two Goroutines, both accessing two shared variables. The code for this is shown in Listing 15.

var c int
var d int

go func() {
    c = 4
    d = 5
}()

time.Sleep(100 * time.Millisecond)
fmt.Println(d)
fmt.Println(c)

Listing 15: Two shared variables accessed

As shown by Listings 16 and 17, two possible data races exist: one between line 2 in Goroutine 1 and line 3 in Goroutine 2 (RW) and another between line 3 in Goroutine 1 and line 2 in Goroutine 2 (RW).


1 ...
2 fmt.Println(d)
3 fmt.Println(c)
4 ...

Listing 16: Goroutine 1

1 ...
2 c = 4
3 d = 5
4 ...

Listing 17: Goroutine 2

Running this test case on the Go race detector, the program prints ’5’ followed by ’4’ and both data races get reported correctly.

When running the same case on GoCART, after the program prints '5' followed by '4', only the data race between line 2 in Goroutine 1 and line 3 in Goroutine 2 gets reported.

5.2.5 No data races to be detected

Finally, we constructed a test case that contains no data races at all. The code for this case is shown in Listing 18.

var c int
var mux sync.Mutex

go func() {
    mux.Lock()
    c = 4
    mux.Unlock()
}()

go func() {
    mux.Lock()
    c = 5
    mux.Unlock()
}()

time.Sleep(100 * time.Millisecond)
mux.Lock()
fmt.Println(c)
mux.Unlock()

Listing 18: No data races

In this case, only a single shared variable, c, gets accessed. However, each access is protected by a sync.Mutex lock, thus eliminating any possibility for data races to occur.

Running this test case on the Go race detector, the program prints ’5’ and correctly no data races get reported.

When running the same case on GoCART, the program prints ’4’ and also reports no data races.

        real data races   Go race detector   GoCART
case 1         4                 2              4
case 2         2                 2              2
case 3         1                 1              1
case 4         2                 2              1
case 5         0                 0              0
total          9                 7              8

Table 5.1: Results of the data race detection experiment.


5.3

Leaking Goroutine detection experiment

For our leaking Goroutine detection experiment, we constructed a test suite containing four cases: one where two spawned Goroutines access a shared variable; a second, based on the first, that adds a causal dependency between the main Goroutine and the spawned Goroutines; a third in which two spawned Goroutines execute an infinite loop; and a fourth where two spawned Goroutines wait for a channel send that never comes. A summary of the results can be found in Table 5.2 at the end of this section.

5.3.1

Shared variable access

For our first case we use a test that spawns two Goroutines, both of which access a shared variable. It looks very similar to the first test case of our data race detection experiment in Section 5.2.1 and the code is shown in Listing 19.

var c int

for i := 0; i < 2; i++ {
    go func() {
        c = 4
        fmt.Println(c)
    }()
}

time.Sleep(100 * time.Millisecond)

Listing 19: Shared variable access

Since the main Goroutine that spawned the two other Goroutines does not wait for them to finish, we have two potentially leaking Goroutines.

Running the program using the goleak tool, the program prints ’4’ twice and reports no leaking Goroutines.

Running the program using GoCART, the program prints ’4’ twice and reports both leaking Goroutines.
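For reference, a minimal sketch (our own illustration) of how these leaks could be prevented by having the main Goroutine wait for the spawned Goroutines with a sync.WaitGroup; note that this only addresses the leak, as the data races on c remain:

var c int
var wg sync.WaitGroup

for i := 0; i < 2; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        c = 4
        fmt.Println(c)
    }()
}

wg.Wait() // the main Goroutine now outlives both spawned Goroutines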

5.3.2

Added causal dependency

The second test case is very similar to the first, but includes a read of variable c at the end. The code for this case is shown in Listing 20.

var c int

for i := 0; i < 2; i++ {
    go func() {
        c = 4
    }()
}

time.Sleep(100 * time.Millisecond)
fmt.Println(c)

Listing 20: Added causal dependency

Similarly to the first case, there are two potentially leaking Goroutines found in this example.

Running the program using the goleak tool, the program prints ’4’ and reports no leaking Goroutines.

Running the program using GoCART, the program prints ’4’ and reports a single leaking Goroutine.


5.3.3

Infinite loops

This test case consists once more of a main Goroutine spawning two additional Goroutines. In this case, however, the spawned Goroutines do not access any shared variables, channels or other synchronization primitives, and instead execute infinite loops. The code for this case is shown in Listing 21.

var c int

for i := 0; i < 2; i++ {
    go func() {
        for true {
            time.Sleep(1 * time.Millisecond)
        }
    }()
}

time.Sleep(100 * time.Millisecond)
fmt.Println(c)

Listing 21: Infinite loops

Similarly to the first and second case, there are two potentially leaking Goroutines found in this example.

Running the program using the goleak tool, the program prints ’0’ and reports both leaking Goroutines. Using GoCART, the program prints ’0’ and reports no leaking Goroutines.
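For completeness, a minimal sketch (our own illustration) of how such loops can be made terminable, with the main Goroutine signalling the spawned Goroutines to stop by closing a channel:

done := make(chan struct{})

for i := 0; i < 2; i++ {
    go func() {
        for {
            select {
            case <-done:
                return // stop looping once done is closed
            default:
                time.Sleep(1 * time.Millisecond)
            }
        }
    }()
}

time.Sleep(100 * time.Millisecond)
close(done) // signals both Goroutines to return

Combined with a sync.WaitGroup, as in the sketch of Section 5.3.1, this would also guarantee that both Goroutines have returned before the main Goroutine exits.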

5.3.4

Waiting on a channel

The final test case is similar to the previous one, but instead of executing infinite loops, the spawned Goroutines now attempt to receive over a channel; however, no corresponding sends over the channel are included. The code for this case is shown in Listing 22.

var c int
ch := make(chan int)

for i := 0; i < 2; i++ {
    go func(ch chan int) {
        c := <-ch
        fmt.Println(c)
    }(ch)
}

time.Sleep(100 * time.Millisecond)
fmt.Println(c)

Listing 22: Waiting on a channel

Just like in all previous cases, there are two potentially leaking Goroutines found in this example.

Running the program using the goleak tool, the program prints ’0’ and reports both leaking Goroutines. Using GoCART, the program prints ’0’ and reports no leaking Goroutines.
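For completeness, a minimal sketch (our own illustration) of how the blocked receivers could be released by closing the channel:

var c int
ch := make(chan int)

for i := 0; i < 2; i++ {
    go func(ch chan int) {
        v, ok := <-ch // a receive on a closed channel returns immediately, with ok == false
        if ok {
            fmt.Println(v)
        }
    }(ch)
}

time.Sleep(100 * time.Millisecond)
close(ch) // unblocks both receivers
fmt.Println(c)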

        real leaking Goroutines   goleak   GoCART
case 1            2                 0        2
case 2            2                 0        1
case 3            2                 2        0
case 4            2                 2        0
total             8                 4        3

Table 5.2: Results of the leaking Goroutine detection experiment.


CHAPTER 6

Discussion

In this chapter we discuss several subjects important to this work. We start with an evaluation of the experiments conducted in the previous chapter. Then, we discuss the ethical implications of our work. Following that, we describe possible threats to the validity of our work, both technical and theoretical in nature. Finally, we consider the real-world applicability of GoCART.

6.1

Data race detection experiment

In this section we discuss the results of the data race detection experiment. We look at where and why GoCART and the Go race detector behave differently, and in which scenarios each tool is better suited.

In the first test case, GoCART was able to identify two additional data races compared to the Go race detector. The permutation space allows for permutations in which those events happen consecutively, even though they do not in the recorded execution, which is why the Go race detector cannot identify them. Despite the permutation constraint that the read of c in Goroutine 1 must read the value written by the write of c in Goroutine 2, there exists a consistent run in which line 6 in Goroutine 1 and line 2 in Goroutine 3 happen consecutively: namely, when the write by Goroutine 3 happens directly after the read by Goroutine 1.

Both GoCART and the Go race detector report all data races present in the second test case, as both races show up in the recorded trace. Interestingly, GoCART no longer reports any of the data races that are no longer possible, as the happens-before relation introduced by the sync.WaitGroup appropriately limits the searched permutation space.

In the third test case both tools again report the same result: a single data race. Note that although no data races on c are reported anymore, this does not mean that no potential bugs on that variable remain. Depending on the thread schedule, the program can still print ’0’ (the default value for an integer), ’4’ or ’5’. And while there are no more data races on that variable, a new potential problem arises: lock contention on mux.
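To make the latter point concrete, a sketch (our own illustration, not part of the test suite) that replaces the time.Sleep with a sync.WaitGroup; this rules out printing the default value ’0’, although whether ’4’ or ’5’ is printed still depends on the order of the two writes:

var c int
var mux sync.Mutex
var wg sync.WaitGroup

for _, val := range []int{4, 5} {
    wg.Add(1)
    go func(v int) {
        defer wg.Done()
        mux.Lock()
        c = v
        mux.Unlock()
    }(val)
}

wg.Wait() // both writes have completed before the read below
mux.Lock()
fmt.Println(c) // prints 4 or 5, never 0
mux.Unlock()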

The fourth test case is the only one in which the Go race detector detects more data races than GoCART. This is likely due to the former detecting races at a lower (assembly) level. Moreover, in order for GoCART to construct a schedule in which line 3 in Goroutine 1 and line 2 in Goroutine 2 happen consecutively, line 3 in Goroutine 2 would need to happen after line 2 in Goroutine 1. This would violate the happens-before relation between those two events, making such a run inconsistent with the recorded execution. This illustrates the limits of what a PTA approach can and cannot detect.

Neither tool reports false positives in the fifth and final test case, although their printed outputs differ. While no data races exist in this code, it suffers from the same problem as the test case in Section 5.2.3: lock contention on mux.

Table 5.1 shows a summary of all test cases. Overall, GoCART has a detection rate of approximately 89%, compared to the Go race detector's roughly 78%. The results show that GoCART thrives when a large permutation space can be explored, such as in Section 5.2.1, but yields suboptimal results when the permutation space is very small, such as in Section 5.2.4.

6.2

Leaking Goroutine detection experiment

In this section we discuss the results of the leaking Goroutine detection experiment. We look at where and why GoCART and goleak behave differently, and in which scenarios each tool is better suited for detecting this kind of incorrectness.

The first test case functions as a best-case scenario for GoCART, as there are very few happens-before constraints, leaving a large permutation space to be explored. Because both Goroutines finish before the main Goroutine does, goleak is unable to detect any leaking Goroutines here. GoCART, on the other hand, was able to construct permutations in which both Goroutines finish after the main Goroutine does, creating leaking Goroutines, since no happens-before relation exists between the spawned Goroutines and the main Goroutine.

With the added causal dependency between the spawned Goroutines and the main one, more happens-before relations are introduced in the second test case, shrinking the permutation space. As a result, GoCART was only able to construct permutations in which one of the two Goroutines finishes after the main Goroutine does, yielding a single leaking Goroutine. It still outperforms goleak in this case: again, both Goroutines finish before the main Goroutine does, so goleak reports no leaking Goroutines.

Because neither spawned Goroutine accesses any shared variables, channels or other synchronization primitives in the third test case, no events of these Goroutines are logged in the trace, making this a worst-case example for GoCART. It is therefore unable to construct any permutations in which they still execute after the main Goroutine ends, as it is simply unaware of their existence. For goleak, however, both spawned Goroutines are still executing their infinite loops when the main Goroutine ends and the program terminates, creating two leaking Goroutines that it duly reports.

The fourth and final test case shows results similar to the previous one. Although both spawned Goroutines attempt to receive over a channel as well as access a shared variable, they never actually operate on either: no value is received over the channel, and as a result the shared variable is never written. GoCART is therefore again unable to construct any permutations in which they still execute after the main Goroutine ends, as it remains unaware of their existence. For goleak, however, both spawned Goroutines are still waiting to receive over the channel when the main Goroutine ends and the program terminates, creating two leaking Goroutines that it reports.

Table 5.2 shows a summary of all test cases. Overall, GoCART has a detection rate of 37.5%, compared to goleak's 50%. The table clearly shows the strength of GoCART compared to goleak: predicting leaking Goroutines that fail to show up in the recorded run. Its weakness, however, is its inability to detect leaking Goroutines that do appear in the recorded trace, which is precisely where goleak shines. This makes using the two tools in tandem an interesting proposition. Together they still will not report every possible leaking Goroutine, but their combined coverage is much larger than that of either tool in isolation.
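As an impression of such tandem use, a minimal sketch of a Go test in which goleak verifies that no Goroutines remain at the end of the recorded run; the workload function is hypothetical, and GoCART would separately analyze the trace of the same run:

package gocart_test

import (
    "testing"

    "go.uber.org/goleak"
)

func TestNoLeaks(t *testing.T) {
    // Fails the test if any Goroutines are still running when the test ends.
    defer goleak.VerifyNone(t)

    // Hypothetical function exercising the concurrent code under test.
    // GoCART would analyze the trace recorded during this run to predict
    // leaks that do not manifest in this particular schedule.
    runWorkload()
}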

6.3

Ethical considerations

At first sight it might seem that there are no clear ethical consequences to the work presented here. However, a few things have to be considered.

The most prominent ethical consequence of reducing incorrectness in programs, which this work aims to assist in, is that it should reduce the vulnerabilities of those programs. This contributes to a society where systems can be trusted more and where, for instance, data breaches happen less frequently, which helps protect people's privacy at a time when that is becoming increasingly important. There is, of course, also a drawback: systems that are harder to breach also mean that malicious systems can be more difficult to stop or take down.

One of the most important aspects of PTA is that it improves the ability to find incorrectness, compared to traditional coverage testing, at a fraction of the resource cost of model testing.
