CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of...

60
CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, Madan Musuvathi, Shaz Qadeer Microsoft Research Joint work with Tom Ball, Peli de Halleux, and interns Gerard Basler (ETH Zurich), Katie Coons (U. T. Austin), P. Arumuga Nainar (U. Wisc. Madison), Iulian Neamtiu (U. Maryland, U.C. Riverside) Adjusted by Maria Christakis

Transcript of CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of...

Page 1: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS:

Analysis and Testing of

Concurrent Programs

Sebastian Burckhardt, Madan Musuvathi, Shaz Qadeer

Microsoft Research

Joint work with

Tom Ball, Peli de Halleux, and interns

Gerard Basler (ETH Zurich),

Katie Coons (U. T. Austin),

P. Arumuga Nainar (U. Wisc. Madison),

Iulian Neamtiu (U. Maryland, U.C. Riverside)

Adjusted by

Maria Christakis

Page 2: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Concurrent Programming is HARD

� Concurrent executions are highly nondeterminisitic

� Rare thread interleavings result in Heisenbugs

� Difficult to find, reproduce, and debug

� Observing the bug can “fix” it

� Likelihood of interleavings changes, say, when you add printfs

� A huge productivity problem

� Developers and testers can spend weeks chasing a single

Heisenbug

Page 3: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Main Takeaways

� You can find and reproduce Heisenbugs

� new automatic tool called CHESS

� for Win32 and .NET

� CHESS used extensively inside Microsoft

� Parallel Computing Platform (PCP)

� Singularity

� Dryad/Cosmos

� Released by DevLabs

Page 4: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS in a nutshell

� CHESS is a user-mode scheduler

� Controls all scheduling nondeterminism

� Guarantees:

� Every program run takes a different thread interleaving

� Reproduce the interleaving for every run

� Provides monitors for analyzing each execution

Page 5: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Architecture

CHESS

Scheduler

CHESS

Scheduler

Unmanaged

Program

Unmanaged

Program

WindowsWindows

Managed

Program

Managed

Program

CLRCLR

• Every run takes a different interleaving

• Reproduce the interleaving for every run

CHESS

Exploration

Engine

CHESS

Exploration

Engine

Win32

Wrappers

.NET

Wrappers

Concurrency

Analysis

Monitors

Concurrency

Analysis

Monitors

Page 6: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Specifics

� Ability to explore all interleavings

� Need to understand complex concurrency APIs (Win32,

System.Threading)

� Threads, threadpools, locks, semaphores, async I/O, APCs,

timers, …

� Does not introduce false behaviours

� Any interleaving produced by CHESS is possible on the real

scheduler

Page 7: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Demo

• Find a simple Heisenbug

CHESS Demo

• Find a simple Heisenbug

Page 8: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS: Find and Reproduce Heisenbugs

Kernel:

Threads, Scheduler,

Synchronization Objects

Kernel:

Threads, Scheduler,

Synchronization Objects

While(not done) {

TestScenario()

}

While(not done) {

TestScenario()

}

TestScenario() {

}

Program

CHESSCHESS runs the scenario in a loop

• Every run takes a different interleaving

• Every run is repeatable

Win32/.NET

Uses the CHESS scheduler • To control and direct interleavings

Detect• Assertion violations

• Deadlocks

• Dataraces

• Livelocks

CHESS

scheduler

CHESS

scheduler

Page 9: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

The Design Space for CHESS� Scale

� Apply to large programs

� Precision� Any error found by CHESS is possible in the wild

� CHESS should not introduce any new behaviors

� Coverage� Any error found in the wild can be found by CHESS

� Capture all sources of nondeterminism

� Exhaustively explore the nondeterminism

Page 10: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Scheduler

Page 11: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Concurrent Executions are Nondeterministic

x = 1;

y = 1;

x = 1;

y = 1;

x = 2;

y = 2;

x = 2;

y = 2;

2,12,1

1,01,0

0,00,0

1,11,1

2,22,2

2,22,22,12,1

2,02,0

2,12,12,22,2

1,21,2

2,02,0

2,22,2

1,11,1

1,11,1 1,21,2

1,01,0

1,21,2 1,11,1

y = 1;y = 1;

x = 1;x = 1;

y = 2;y = 2;

x = 2;x = 2;

Page 12: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

High level goals of the scheduler

� Enable CHESS on real-world applications

� IE, Firefox, Office, Apache, …

� Capture all sources of nondeterminism

� Required for reliably reproducing errors

� Ability to explore these nondeterministic choices

� Required for finding errors

Page 13: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Sources of Nondeterminism

1. Scheduling Nondeterminism

� Interleaving nondeterminism

� Threads can race to access shared variables or monitors

� OS can preempt threads at arbitrary points

� Timing nondeterminism

� Timers can fire in different orders

� Sleeping threads wake up at an arbitrary time in the

future

� Asynchronous calls to the file system complete at an

arbitrary time in the future

Page 14: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Sources of Nondeterminism

1. Scheduling Nondeterminism

� Interleaving nondeterminism

� Threads can race to access shared variables or monitors

� OS can preempt threads at arbitrary points

� Timing nondeterminism

� Timers can fire in different orders

� Sleeping threads wake up at an arbitrary time in the

future

� Asynchronous calls to the file system complete at an

arbitrary time in the future

� CHESS captures and explores this nondeterminism

Page 15: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Sources of Nondeterminism

2. Input nondeterminism

� User Inputs

� User can provide different inputs

� The program can receive network packets with different

contents

� Nondeterministic system calls

� Calls to gettimeofday(), random()

� ReadFile can either finish synchronously or

asynchronously

Page 16: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Sources of Nondeterminism

2. Input nondeterminism

� User Inputs

� User can provide different inputs

� The program can receive network packets with different

contents

� CHESS relies on the user to provide a scenario

� Nondeterministic system calls

� Calls to gettimeofday(), random()

� ReadFile can either finish synchronously or

asynchronously

� CHESS provides wrappers for such system calls

Page 17: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Sources of Nondeterminism

3. Memory Model Effects

� Hardware relaxations

� The processor can reorder memory instructions

� Can potentially introduce new behavior in a concurrent

program

� Compiler relaxations

� Compiler can reorder memory instructions

� Can potentially introduce new behavior in a concurrent

program (with data races)

Page 18: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Sources of Nondeterminism

3. Memory Model Effects

� Hardware relaxations

� The processor can reorder memory instructions

� Can potentially introduce new behavior in a concurrent

program

� CHESS contains a monitor for detecting such relaxations

� Compiler relaxations

� Compiler can reorder memory instructions

� Can potentially introduce new behavior in a concurrent

program (with data races)

� Future Work

Page 19: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Interleaving Nondeterminism: Example

void Deposit100(){

EnterCriticalSection(&cs);

balance += 100;

LeaveCriticalSection(&cs);

}

void Deposit100(){

EnterCriticalSection(&cs);

balance += 100;

LeaveCriticalSection(&cs);

}

Deposit Thread

void Withdraw100(){

int t;

EnterCriticalSection(&cs);

t = balance;

LeaveCriticalSection(&cs);

EnterCriticalSection(&cs);

balance = t - 100;

LeaveCriticalSection(&cs);

}

void Withdraw100(){

int t;

EnterCriticalSection(&cs);

t = balance;

LeaveCriticalSection(&cs);

EnterCriticalSection(&cs);

balance = t - 100;

LeaveCriticalSection(&cs);

}

Withdraw Thread

init:

balance = 100;

init:

balance = 100;

final:

assert(balance = 100);

final:

assert(balance = 100);

Page 20: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Invoke the Scheduler at Preemption Points

void Deposit100(){

ChessSchedule();

EnterCriticalSection(&cs);

balance += 100;

ChessSchedule();

LeaveCriticalSection(&cs);

}

void Deposit100(){

ChessSchedule();

EnterCriticalSection(&cs);

balance += 100;

ChessSchedule();

LeaveCriticalSection(&cs);

}

Deposit Thread

void Withdraw100(){

int t;

ChessSchedule();

EnterCriticalSection(&cs);

t = balance;

ChessSchedule();

LeaveCriticalSection(&cs);

ChessSchedule();

EnterCriticalSection(&cs);

balance = t - 100;

ChessSchedule();

LeaveCriticalSection(&cs);

}

void Withdraw100(){

int t;

ChessSchedule();

EnterCriticalSection(&cs);

t = balance;

ChessSchedule();

LeaveCriticalSection(&cs);

ChessSchedule();

EnterCriticalSection(&cs);

balance = t - 100;

ChessSchedule();

LeaveCriticalSection(&cs);

}

Withdraw Thread

Page 21: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Introducing Unpredictable Delays

void Deposit100(){

Sleep( rand () );

EnterCriticalSection(&cs);

balance += 100;

Sleep( rand() );

LeaveCriticalSection(&cs);

}

void Deposit100(){

Sleep( rand () );

EnterCriticalSection(&cs);

balance += 100;

Sleep( rand() );

LeaveCriticalSection(&cs);

}

Deposit Thread

void Withdraw100(){

int t;

Sleep( rand() );

EnterCriticalSection(&cs);

t = balance;

Sleep( rand() );

LeaveCriticalSection(&cs);

Sleep( rand() );

EnterCriticalSection(&cs);

balance = t - 100;

Sleep( rand() );

LeaveCriticalSection(&cs);

}

void Withdraw100(){

int t;

Sleep( rand() );

EnterCriticalSection(&cs);

t = balance;

Sleep( rand() );

LeaveCriticalSection(&cs);

Sleep( rand() );

EnterCriticalSection(&cs);

balance = t - 100;

Sleep( rand() );

LeaveCriticalSection(&cs);

}

Withdraw Thread

Page 22: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Introduce Predictable Delays with Additional

Synchronization

void Deposit100(){

WaitEvent( e1 );

EnterCriticalSection(&cs);

balance += 100;

LeaveCriticalSection(&cs);

SetEvent( e2 );

}

void Deposit100(){

WaitEvent( e1 );

EnterCriticalSection(&cs);

balance += 100;

LeaveCriticalSection(&cs);

SetEvent( e2 );

}

Deposit Thread

void Withdraw100(){

int t;

EnterCriticalSection(&cs);

t = balance;

LeaveCriticalSection(&cs);

SetEvent( e1 );

WaitEvent( e2 );

EnterCriticalSection(&cs);

balance = t - 100;

LeaveCriticalSection(&cs);

}

void Withdraw100(){

int t;

EnterCriticalSection(&cs);

t = balance;

LeaveCriticalSection(&cs);

SetEvent( e1 );

WaitEvent( e2 );

EnterCriticalSection(&cs);

balance = t - 100;

LeaveCriticalSection(&cs);

}

Withdraw Thread

Page 23: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Blindly Inserting Synchronization Can Cause Deadlocks

void Deposit100(){

EnterCriticalSection(&cs);

balance += 100;

WaitEvent( e1 );

LeaveCriticalSection(&cs);

}

void Deposit100(){

EnterCriticalSection(&cs);

balance += 100;

WaitEvent( e1 );

LeaveCriticalSection(&cs);

}

Deposit Thread

void Withdraw100(){

int t;

EnterCriticalSection(&cs);

t = balance;

LeaveCriticalSection(&cs);

SetEvent( e1 );

EnterCriticalSection(&cs);

balance = t - 100;

LeaveCriticalSection(&cs);

}

void Withdraw100(){

int t;

EnterCriticalSection(&cs);

t = balance;

LeaveCriticalSection(&cs);

SetEvent( e1 );

EnterCriticalSection(&cs);

balance = t - 100;

LeaveCriticalSection(&cs);

}

Withdraw Thread

Page 24: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Scheduler Basics

� Introduce an event per thread

� Every thread blocks on its event

� The scheduler wakes one thread at a time by enabling

the corresponding event

� The scheduler does not wake up a disabled thread

� Need to know when a thread can make progress

� Wrappers for synchronization provide this information

� The scheduler has to pick one of the enabled threads

� The exploration engine decides for the scheduler

Page 25: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Algorithms

Page 26: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

x = 1;

y = k;

x = 1;

y = k;

State space explosion

x = 1;

y = k;

x = 1;

y = k;

n threads

k steps

each

� Number of executions

= O( nnk )

� Exponential in both n and k

� Typically: n < 10 k > 100

� Limits scalability to large

programs

Goal: Scale CHESS to large programs (large k)

Page 27: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

x = 1;

if (p != 0) {

x = p->f;

}

x = 1;

if (p != 0) {

x = p->f;

}

Preemption bounding� CHESS, by default, is a non-preemptive, starvation-free scheduler

� Execute huge chunks of code atomically

� Systematically insert a small number preemptions� Preemptions are context switches forced by the scheduler

� e.g. Time-slice expiration

� Non-preemptions – a thread voluntarily yields� e.g. Blocking on an unavailable lock, thread end

x = p->f;

}

x = p->f;

}

x = 1;

if (p != 0) {

x = 1;

if (p != 0) {

p = 0;p = 0;

preemption

non-preemption

Page 28: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Polynomial state space� Terminating program with fixed inputs and deterministic

threads

� n threads, k steps each, c preemptions

� Number of executions <= nkCc . (n+c)!

= O( (n2k)c. n! )

Exponential in n and c, but not in k

x = 1;

y = k;

x = 1;

y = k;

x = 1;

y = k;

x = 1;

y = k;

x = 1;

x = 1;

x = 1;

x = 1;

y = k;

y = k;

y = k;y = k;

• Choose c preemption points

• Permute n+c atomic blocks

Page 29: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Advantages of preemption bounding� Most errors are caused by few (<2) preemptions

� Generates an easy to understand error trace

� Preemption points almost always point to the root-cause of the bug

� Leads to good heuristics

� Insert more preemptions in code that needs to be tested

� Avoid preemptions in libraries

� Insert preemptions in recently modified code

� A good coverage guarantee to the user

� When CHESS finishes exploration with 2 preemptions, any remaining bug requires 3 preemptions or more

Page 30: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Demo

• Finding and reproducing CCR heisenbug

CHESS Demo

Page 31: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Concurrent programs have cyclic state spaces

L1: while( ! done) {

L2: Sleep();

}

L1: while( ! done) {

L2: Sleep();

}

M1: done = 1;M1: done = 1;

! done

L2

! done

L2

! done

L1

! done

L1

done

L2

done

L2

done

L1

done

L1

Page 32: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

A demonic scheduler unrolls any cycle

ad-infinitum

! done! done

donedone! done! done

donedone! done! done

donedone

while( ! done)

{

Sleep();

}

while( ! done)

{

Sleep();

}

done = 1;done = 1;

! done! done

Page 33: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Depth bounding

! done! done

donedone! done! done

donedone! done! done

donedone! done! done

� Prune executions beyond a bounded number of steps

Depth bound

Page 34: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Problem 1: Ineffective state coverage

! done! done

! done! done

! done! done

! done! done

� Bound has to be large enough to

reach the deepest bug

� Typically, greater than 100

synchronization operations

� Every unrolling of a cycle

redundantly explores reachable

state space

Depth bound

Page 35: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Problem 2: Cannot find livelocks

� Livelocks : lack of progress in a program

temp = done;

while( ! temp)

{

Sleep();

}

temp = done;

while( ! temp)

{

Sleep();

}

done = 1;done = 1;

Page 36: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Key idea

� This test terminates only when the scheduler is fair

� Fairness is assumed by programmers

All cycles in correct programs are unfair

A fair cycle is a livelock

while( ! done)

{

Sleep();

}

while( ! done)

{

Sleep();

}

done = 1;done = 1;! done! done! done! done

donedonedonedone

Page 37: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

We need a fair scheduler

� Avoid unrolling unfair cycles

� Effective state coverage

� Detect fair cycles

� Find livelocks

Concurrent

Program

Concurrent

Program

Test

Harness

Test

Harness

Win32 API

Demonic

Scheduler

Fair

Demonic

Scheduler

Page 38: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

� What notion of “fairness” do we use?

Page 39: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Weak fairness� A thread that remains enabled should eventually be

scheduled

� A weakly-fair scheduler will eventually schedule Thread 2

� Example: round-robin

while( ! done)

{

Sleep();

}

while( ! done)

{

Sleep();

}

done = 1;done = 1;

Page 40: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Weak fairness does not suffice

Lock( l );

While( ! done)

{

Unlock( l );

Sleep();

Lock( l );

}

Unlock( l );

Lock( l );

While( ! done)

{

Unlock( l );

Sleep();

Lock( l );

}

Unlock( l );

Lock( l );

done = 1;

Unlock( l );

Lock( l );

done = 1;

Unlock( l );

en = {T1, T2}en = {T1, T2}

T1: Sleep()

T2: Lock( l )

en = {T1, T2}en = {T1, T2}

T1: Lock( l )

T2: Lock( l )

en = { T1 }en = { T1 }

T1: Unlock( l )

T2: Lock( l )

en = {T1, T2}en = {T1, T2}

T1: Sleep()

T2: Lock( l )

Page 41: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Strong Fairness� A thread that is enabled infinitely often is scheduled

infinitely often

� Thread 2 is enabled and competes for the lock infinitely often

Lock( l );

While( ! done)

{

Unlock( l );

Sleep();

Lock( l );

}

Unlock( l );

Lock( l );

While( ! done)

{

Unlock( l );

Sleep();

Lock( l );

}

Unlock( l );

Lock( l );

done = 1;

Unlock( l );

Lock( l );

done = 1;

Unlock( l );

Page 42: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Implementing a strongly-fair scheduler

� A round-robin scheduler with priorities

� Operating system schedulers

� Priority boosting of threads

Page 43: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

We also need to be demonic

� Cannot generate all fair schedules

� There are infinitely many, even for simple programs

� It is sufficient to generate enough fair schedules to

� Explore all states (safety coverage)

� Explore at least one fair cycle, if any (livelock coverage)

Page 44: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

(Good) Programs indicate lack of progress

� Good Samaritan assumption:� A thread when scheduled infinitely often yields the processor

infinitely often

� Examples of yield:� Sleep()

� Blocking on synchronization operation

� Thread completion

while( ! done)

{

Sleep();

}

while( ! done)

{

Sleep();

}

done = 1;done = 1;

Page 45: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Fair demonic scheduler

� Maintain a priority-order (a partial-order) on threads

� t < u : t will not be scheduled when u is enabled

� Threads get a lower priority only when they yield

� When t yields, add t < u if

� Thread u was continuously enabled since last yield of t, or

� Thread u was disabled by t since the last yield of t

� A thread loses its priority once it executes

� Remove all edges t < u when u executes

Page 46: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Data Races

Page 47: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

What is a Data Race?

� If two conflicting memory accesses happen

concurrently, we have a data race.

� Two memory accesses conflict if

� They target the same location

� They are not both reads

� They are not both synchronization operations

� Best practice: write “correctly synchronized“

programs that do not contain data races.

Page 48: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

What Makes Data Races significant?

� Data races may reveal synchronization errors

� Most typically, programmer forgot to take a lock, or

declare a variable volatile.

� Race-free programs are easier to verify

� If a program is race-free, it is enough to consider

schedules that preempt on synchronizations only

� CHESS heavily relies on this reduction

Page 49: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

How do we find races?

� Remember: races are concurrent conflicting accesses.

� But what does concurrent actually mean?

� Two general approaches to do race-detection

Lockset-Based(heuristic)

Concurrent ≈

“Disjoint locksets”

Happens-Before-Based(precise)

Concurrent =

“Not ordered by happens-

before”

Page 50: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Synchronization = Locks ???

� This C# code contains neither locks nor a data race:

� CHESS is precise: does not report this as a race. But does

report a race if you remove the ‘volatile’ qualifier.

data = 1;

flag = true;

data = 1;

flag = true;

while (!flag)

yield();

int x = data;

while (!flag)

yield();

int x = data;

Thread 1 Thread 2

int data;

volatile bool flag;

int data;

volatile bool flag;

Page 51: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Happens-Before Order [Lamport]

� Use logical clocks and timestamps to define a partial

order called happens-before on events in a concurrent

system

� States precisely when two events are logically

concurrent (abstracting away real time)

1

2

3

1

2

3

1

2

3

(0,0,1)� Cross-edges from send

events to receive events

� (a1, a2, a3) happens before (b1, b2, b3) iff a1 ≤ b1 and a2

≤ b2 and a3 ≤ b3

(2,1,0)(1,0,0)

(0,0,2)(2,2,2)(2,0,0)

(0,0,3)(2,3,2)(3,3,2)

Page 52: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Happens-Before for Shared Memory

� Distributed Systems:

Cross-edges from send to receive events

� Shared Memory systems:

Cross-edges represent ordering effect of synchronization

� Edges from lock release to subsequent lock acquire

� Edges from volatile writes to subsequent volatile reads

� Long list of primitives that may create edges

� Semaphores

� Waithandles

� Rendezvous

� System calls (asynchronous IO)

� Etc.

Page 53: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Example

1

2

1

2

3

(1,0)

(2,4)

data = 1;

flag = true;

while (!flag)

yield();

int x = data;

Thread 1 Thread 2

int data;

volatile bool flag;data = 1;

flag = true;

(!flag)->true

yield()

(!flag)->false

4x = data

� Not a data race because (1,0) ≤ (2,4)

� If flag were not declared volatile, we would not add a

cross-edge, and this would be a data race.

Page 54: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Demo

• Find a simple data race in a toy example

CHESS Demo

Page 55: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Refinement Checking

Page 56: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Concurrent Data Types

� Frequently used building blocks for parallel or

concurrent applications.

� Typical examples:

� Concurrent stack

� Concurrent queue

� Concurrent deque

� Concurrent hashtable

� ….

� Many slightly different scenarios, implementations,

and operations

Page 57: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Correctness Criteria

� Say we are verifying concurrent X

(for X ∈ queue, stack, deque, hashtable …)

� Typically, concurrent X is expected to behave like

atomically interleaved sequential X

� We can check this without knowing the semantics of X

Page 58: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Observation Enumeration Method [CheckFence, PLDI07]

� Given concurrent test, e.g.

� (Step 1 : Enumerate Observations) Enumerate coarse-grained interleavings and record observations

1. b1=true i1=1 b2=false i2=0

2. b1=false i1=0 b2=true i2=1

3. b1=false i1=0 b2=false i2=0

� (Step 2 : Check Observations) Check refinement: all concurrent executions must look like one of the recorded observations

Stack s = new ConcurrentStack();

s.Push(1); b1 = s.Pop(out i1);

b2 = s.Pop(out i2);

Page 59: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

CHESS Demo

• Show refinement checking on simple stack

example

CHESS Demo

Page 60: CHESS: Analysis and Testing of Concurrent Programs · 2009-11-29 · CHESS: Analysis and Testing of Concurrent Programs Sebastian Burckhardt, MadanMusuvathi, ShazQadeer Microsoft

Conclusion

� CHESS is a tool for

� Systematically enumerating thread interleavings

� Reliably reproducing concurrent executions

� Coverage of Win32 and .NET API

� Isolates the search & monitor algorithms from their

complexity

� CHESS is extensible

� Monitors for analyzing concurrent executions