Understanding Real-‐World Concurrency Bugs in Go
Tengfei Tu1, Xiaoyu Liu2, Linhai Song1, and Yiying Zhang21Pennsylvania State University
2Purdue University
1
Golang
• A young but widely-‐used programming lang.
• Designed for efficient and reliable concurrency– Provide lightweight threads, called goroutines– Support both message passing and shared memory
Lightweight threads (goroutines) Memory
message
2
Massage Passing vs. Shared Memory
Message Passing Shared Memory
Thread 1 Thread 2 Thread 1 Thread 2
Memory
Concurrency Bug
3
Does Go Do Better?
• Message passing better than shared memory?
• How well does Go prevent concurrency bugs?
4
• Collect 171 Go concurrency bugs from 6 apps– through manually inspecting GitHub commit log
• How we conduct the study?– Taxonomy based on two orthogonal dimensions• Root causes and fixing strategies
– Evaluate two built-‐in concurrency bug detectors
The 1st Empirical Study
BlotDB
171 Real-‐World Go Concurrency Bugs
Behavior
××××
Cause
shared memory
message passing
blocking non-‐blocking5
• Message passing can make a lot of bugs– sometimes even more than shared memory
• 9 observations for developers’ references• 8 insights to guide future research in Go
Highlighted Results
6
Outline
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
7
Outline
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
8
Message Passing in Go
• How to pass messages across goroutines?– Channel: unbuffered channel vs. buffered channel– Select: waiting for multiple channel operations
ch <-‐ mm <-‐ ch
Goroutine 1 Goroutine 2
ch <-‐ mm <-‐ ch
Goroutine 1 Goroutine 2
unbuffered channel buffered channel
select {case <-‐ ch1:…
case <-‐ ch2:…
}
select
Non-‐deterministic
9
An Example of Go Concurrency Bug
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
Parent Goroutine
10
An Example of Go Concurrency Bug
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
Parent Goroutine
11
An Example of Go Concurrency Bug
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
go func() result := fn() ch <-‐ result
}()
Child GoroutineParent Goroutine
12
An Example of Go Concurrency Bug
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
go func() result := fn() ch <-‐ result
}()
timeout signal
blocking and goroutine leak
Child GoroutineParent Goroutine
13
An Example of Go Concurrency Bug
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
go func() result := fn() ch <-‐ result
}()
, 1)
not blocking any more
Child GoroutineParent Goroutine
14
New Concurrency Features in Go
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
anonymousfunction
buffered channel vs.unbuffered channel
use select to wait for multiple channels
15
New Concurrency Features in Go
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
anonymousfunction
buffered channel vs.unbuffered channel
use select to wait for multiple channels
16
Outline
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
17
• Categorize bugs based on two dimensions– Root cause: shared memory vs. message passing
message passing
Bug Taxonomy
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
channel
Cause
shared memory
18
• Categorize bugs based on two dimensions– Root cause: shared memory vs. message passing– Behavior: blocking vs. non-‐blocking
Bug Taxonomy
Behavior
××××
Cause
shared memory
message passing
blocking non-‐blocking
func finishRequest(t sec) r object { ch := make(chan object)go func() result := fn() ch <-‐ result
}()select {case result = <-‐ ch:return result
case <-‐ time.timeout(t):return nil
}} //Kubernetes#5316
blocking
19
• Categorize bugs based on two dimensions– Root cause: shared memory vs. message passing– Behavior: blocking vs. non-‐blocking
Bug Taxonomy
CauseBehavior
blocking non-‐blocking
sharedmemory 36 69
message passing 49 17
85 86
105
66 Behavior
××××
Cause
shared memory
message passing
blocking non-‐blocking
20
Concurrency Usage Study
0%
20%
40%
60%
80%
100%
Docker Kubernetes etcd CockroachDB gRPC BoltDB
Shared Memory Message Passing
Observation: Share memory synchronizations are used more often in Go applications.
21
Outline
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
22
• Conducting blocking operations– to protect shared memory accesses– to pass message across goroutines
0
5
10
15
20
25
30
Mutex Wait RWMutex Chan Chan w/ Lib
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
Root Causes
Shared Memory Message Passing
23
(mis)Protecting Shared Memory
0
5
10
15
20
25
30
Mutex Wait RWMutex Chan Chan w/ Lib
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
Observation: Most blocking bugs caused by shared memory synchronizations have the same causes as traditional languages.
24
Misuse of Channel
0
5
10
15
20
25
30
Mutex Wait RWMutex Chan Chan w/ Lib
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
ch <-‐ mm <-‐ ch
Goroutine 1 Goroutine 2
blocking
25
Misuse of Channel with Lock
0
5
10
15
20
25
30
Mutex Wait RWMutex Chan Chan w/ Lib
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
func goroutine1() {m.Lock()ch <-‐ request
m.Unlock()}
func goroutine2() {for {m.Lock()m.Unlock()request <-‐ ch
}}
26
Observation
0
5
10
15
20
25
30
Mutex Wait RWMutex Chan Chan w/ Lib
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
20% More
Observation: more blocking bugs in our studied Go applications are caused by wrong message passing.
27
Implication
0
5
10
15
20
25
30
Mutex Wait RWMutex Chan Chan w/ Lib
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
20% More
Implication: we call for attention to the potential danger in programming with message passing.
28
Outline
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
• Introduction• A real bug example• Go concurrency bug study– Taxonomy– Blocking Bug– Non-‐blocking Bug
• Conclusions
29
0
10
20
30
40
50
traditional anon. waitgroup lib chan misc
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
• Failing to protect shared memory• Errors during message passing
Root Causes
Shared Memory
Message Passing
30
0
10
20
30
40
50
traditional anon. waitgroup lib chan misc
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
Traditional Bugs
> 50%
31
0
10
20
30
40
50
traditional anon. waitgroup lib chan misc
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
Misusing Channel
close(ch)ch <-‐ m
Thread 1 Thread 2
panic!
close(ch)close(ch)
Thread 1 Thread 2
panic!
32
Implication
Implication: new concurrency mechanisms Go introduced can themselves be the reasons of more concurrency bugs.
330
10
20
30
40
50
traditional anon. waitgroup lib chan misc
BoltDB
gRPC
CockroachDB
etcd
Kubernetes
Docker
• 1st empirical study on go concurrency bugs– shared memory vs. message passing– blocking bugs vs. non-‐blocking bugspaper contains more details (contact us for more)
• Future works– Statically detecting go concurrency bugs• checkers built based on identified buggy patterns• Already found concurrency bugs in real applications
Conclusions
34
35
Thanks a lot!
Questions?
Behavior
××××
Cause
shared memory
message passing
blocking non-‐blocking
BlotDB
171 Real-‐World Go Concurrency Bugs
Data Set: https://github.com/system-‐pclub/go-‐concurrency-‐bugs
36
Top Related