CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.
![Page 1: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/1.jpg)
CHESS : Systematic Testing of Concurrent Programs
Madan Musuvathi
Shaz Qadeer
Microsoft Research
![Page 2: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/2.jpg)
Testing multithreaded programs is HARD
Specific thread interleavings expose subtle errors
• Testing often misses these errors
Even when found, errors are hard to debug
• No repeatable trace
• Source of the bug is far away from where it manifests
![Page 3: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/3.jpg)
Concurrency is a real problem
Windows 2000 hot fixes
• Concurrency errors most common defects among “detectable errors”
• Incorrect synchronization and protocol errors most common defects among all coding errors
Windows Server 2003 late cycle defects
• Synchronization errors second in the list, next to buffer overruns
Race conditions can result in security exploits
![Page 4: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/4.jpg)
Current practice
Concurrency testing == Stress testing
Example: testing a concurrent queue
• Create 100 threads performing queue operations
• Run for days/weeks
• Pepper the code with sleep( random() )
Stress increases the likelihood of rare interleavings
• Makes any error found hard to debug
![Page 5: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/5.jpg)
CHESS: Unit testing for concurrency
Example: testing a concurrent queue
• Create 1 reader thread and 1 writer thread
• Exhaustively try all thread interleavings
Run the test repeatedly on a specialized scheduler
• Explore a different thread interleaving each time
• Use model checking techniques to avoid redundancy
Check for assertions and deadlocks in every run
• The error-trace is repeatable
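To make "exhaustively try all thread interleavings" concrete, here is a minimal sketch that models the reader and the writer as fixed sequences of atomic steps and runs the test once per interleaving. It is a toy model, not the CHESS implementation (which drives real threads); the one-slot queue, the `Shared` struct, and every function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical one-slot "queue" shared by the two threads of the test.
struct Shared {
    int slot = 0;            // storage written by the writer
    bool full = false;       // publication flag set by the writer
    bool seen_full = false;  // reader-local: what the reader observed in `full`
    int value = -1;          // reader-local: value taken from `slot`
};

using Step = std::function<void(Shared&)>;  // one atomic step of a thread
using Thread = std::vector<Step>;           // a thread is a fixed sequence of steps

struct Result {
    int runs = 0;      // number of complete interleavings executed
    int failures = 0;  // interleavings in which the test assertion failed
};

// Depth-first enumeration: at every point, try running the next step of
// either thread, each time on its own copy of the shared state.
void explore_all(const Thread& a, const Thread& b, std::size_t ia,
                 std::size_t ib, Shared state, Result& out) {
    if (ia == a.size() && ib == b.size()) {
        ++out.runs;
        // Test assertion: a reader that saw `full` must have read 42.
        if (state.seen_full && state.value != 42) ++out.failures;
        return;
    }
    if (ia < a.size()) {
        Shared next = state;
        a[ia](next);
        explore_all(a, b, ia + 1, ib, next, out);
    }
    if (ib < b.size()) {
        Shared next = state;
        b[ib](next);
        explore_all(a, b, ia, ib + 1, next, out);
    }
}

Result run_all_interleavings(const Thread& writer, const Thread& reader) {
    assert(!writer.empty() && !reader.empty());
    Result result;
    explore_all(writer, reader, 0, 0, Shared{}, result);
    return result;
}

// Reader: poll the flag, then take the value if the flag was set.
Thread make_reader() {
    return {[](Shared& s) { s.seen_full = s.full; },
            [](Shared& s) { if (s.seen_full) s.value = s.slot; }};
}

// Correct writer: fill the slot, then publish it.
Thread make_good_writer() {
    return {[](Shared& s) { s.slot = 42; }, [](Shared& s) { s.full = true; }};
}

// Buggy writer: publishes the slot before filling it.
Thread make_buggy_writer() {
    return {[](Shared& s) { s.full = true; }, [](Shared& s) { s.slot = 42; }};
}
```

With two steps per thread there are 6 interleavings. `run_all_interleavings(make_good_writer(), make_reader())` reports no failures, while the buggy writer fails in exactly one of the 6 runs, the one in which the reader executes both of its steps between the writer's two steps. Because each run is identified by its schedule, that failing run can be replayed.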
![Page 6: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/6.jpg)
Systematic Stress Testing Using CHESS
[Diagram: layers labeled Program, CHESS, Win32 API, and Kernel (Threads, Scheduler, Synchronization Objects)]
Tester provides a test scenario
• TestScenario() { … }
CHESS runs the scenario in a loop
• While(not done) { TestScenario() }
• Every run takes a different interleaving
• Every run is repeatable
![Page 7: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/7.jpg)
Conditions on Test Scenario
Test scenario should terminate in all interleavings
Test scenario should be idempotent
• Free all resources (handles, memory, …)
• Clear the hardware state
Key observation:
• Existing stress tests already have these properties
• Because they repeatedly run for ever
![Page 8: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/8.jpg)
Perturb the System as Little as Possible
[Diagram: layers labeled Program, CHESS, Win32 API, and Kernel (Threads, Scheduler, Synchronization Objects); the program runs While(not done) { TestScenario() }]
Detour Win32 API calls
• To control and introduce nondeterminism
Run the system as is
• On the actual OS, hardware
• Using system threads, synchronization
Advantages
• Avoid reporting false errors
• Easy to add to existing test frameworks
• Use existing debuggers
![Page 9: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/9.jpg)
Implementation details
Handle all the Win32 synchronization mechanisms
• Critical sections, locks, semaphores, events, …
• Threadpools
• Asynchronous procedure calls
• Timers
• IO Completions
No modification to the kernel scheduler / Win32 library
CHESS drives the system along a desired interleaving by ‘hijacking’ the scheduler
![Page 10: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/10.jpg)
Controlling the Scheduling Nondeterminism
Nondeterministic choices for the scheduler
• Determine when to context switch
• On context switch, pick the next runnable thread to run
• On resource release, wake up one of the waiting threads
Hijack these choices from the scheduler
• Ensure at most one thread is runnable
• No thread is waiting on a resource
• At chosen schedule points, block the current thread while waking the next thread
• Emulate program execution on a uniprocessor with context switches only at synchronization points
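The idea of taking over the scheduler's choices can be sketched on a small model in which each thread is a list of lock operations and the explorer alone decides which enabled thread takes the next step. This is an illustrative simulation under assumed names (`Instr`, `Machine`, `check_for_deadlocks`), not the Win32 detouring CHESS performs, and it assumes well-formed programs that only release locks they hold. A state in which unfinished threads remain but none is enabled is reported as a deadlock.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { Acquire, Release };
struct Instr { Op op; std::size_t lock; };
using Program = std::vector<Instr>;  // one thread: a sequence of lock operations

struct Machine {
    std::vector<std::size_t> pc;  // next instruction of each thread
    std::vector<int> owner;       // owner[l]: thread holding lock l, or -1
};

struct Outcome {
    int completed = 0;   // executions in which every thread finished
    int deadlocked = 0;  // executions that ended with all live threads blocked
};

// A thread is enabled if it has work left and its next step would not block.
bool enabled(const std::vector<Program>& progs, const Machine& m, std::size_t t) {
    if (m.pc[t] == progs[t].size()) return false;
    const Instr& in = progs[t][m.pc[t]];
    return in.op == Op::Release || m.owner[in.lock] == -1;
}

// Every step is a synchronization point, so the explorer branches on each
// enabled thread here; exactly one thread runs at a time.
void explore_schedules(const std::vector<Program>& progs, const Machine& m,
                       Outcome& out) {
    bool all_done = true;
    bool any_enabled = false;
    for (std::size_t t = 0; t < progs.size(); ++t) {
        if (m.pc[t] < progs[t].size()) all_done = false;
        if (!enabled(progs, m, t)) continue;
        any_enabled = true;
        const Instr& in = progs[t][m.pc[t]];
        Machine next = m;
        next.owner[in.lock] = (in.op == Op::Acquire) ? static_cast<int>(t) : -1;
        ++next.pc[t];
        explore_schedules(progs, next, out);
    }
    if (all_done) ++out.completed;
    else if (!any_enabled) ++out.deadlocked;
}

Outcome check_for_deadlocks(const std::vector<Program>& progs,
                            std::size_t num_locks) {
    assert(num_locks > 0);
    Machine start{std::vector<std::size_t>(progs.size(), 0),
                  std::vector<int>(num_locks, -1)};
    Outcome out;
    explore_schedules(progs, start, out);
    return out;
}
```

For two threads that take locks 0 and 1 in opposite orders, this exploration finds 4 completed executions and 2 deadlocked ones; when both threads use the same order, it finds 2 completed executions and no deadlock.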
![Page 11: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/11.jpg)
Partial-order reduction
Many thread interleavings are equivalent
• Accesses to separate memory locations by different threads can be reordered
Avoid exploring equivalent thread interleavings
![Page 12: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/12.jpg)
Partial-order reduction in CHESS
Algorithm:
• Assume the program is data-race free
• Context switch only at synchronization points
• Check for data-races in each execution
Theorem:
• If the algorithm terminates without reporting races, then the program has no assertion failures
![Page 13: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/13.jpg)
Executions on Multi-cores
CHESS checks for data-races
If a Test Scenario manifests a bug on a multi-core machine, then CHESS will
• Either report a data-race
• Or the bug
CHESS systematically enumerates all sequentially consistent executions
• Any data-race free multi-core execution is equivalent to a sequentially consistent execution
![Page 14: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/14.jpg)
State space explosion
Thread 1: x = 1; y = 1;
Thread 2: x = 2; y = 2;
[Diagram: tree of all interleavings of the two threads, each node labeled with the state (x, y), starting from 0,0]
![Page 15: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/15.jpg)
State space explosion
[Diagram: n threads, k steps each; one thread runs x = 1; … y = 1;, another runs x = 2; … y = 2;, and so on]
Number of executions = $O(n^{nk})$
Exponential in both n and k
• Typically: n < 10, k > 100
Limits scalability to large programs (large k)
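To get a feel for the growth, the exact number of interleavings of n threads with k atomic steps each is the multinomial coefficient (nk)! / (k!)^n, which stays within the $O(n^{nk})$ bound. The sketch below computes it for small inputs; the function names are illustrative, and the arithmetic overflows `std::uint64_t` quickly, which is itself a reminder of how fast the count grows.

```cpp
#include <cassert>
#include <cstdint>

// C(n, r), built up incrementally. After iteration i the running value is
// C(n - r + i, i), so every division is exact. Intended for small inputs
// only: the intermediate product overflows for large n.
std::uint64_t binomial(unsigned n, unsigned r) {
    assert(r <= n);
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= r; ++i) {
        result = result * (n - r + i) / i;
    }
    return result;
}

// Number of ways to interleave n threads of k steps each:
// (nk)! / (k!)^n = C(k, k) * C(2k, k) * ... * C(nk, k).
std::uint64_t count_interleavings(unsigned n, unsigned k) {
    std::uint64_t total = 1;
    for (unsigned t = 1; t <= n; ++t) {
        // Choose which k of the first t*k slots belong to thread t.
        total *= binomial(t * k, k);
    }
    return total;
}
```

`count_interleavings(2, 2)` is 6, the number of schedules for two threads of two steps each. Raising k to 10 already gives `count_interleavings(2, 10) == 184756`, and k > 100 is far beyond exhaustive enumeration.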
![Page 16: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/16.jpg)
Bounding execution depth
Works very well for message-passing programs
• Limit the number of message exchanges
• Message processing code executed atomically
• Can go ‘deep’ in the state space
Does not work for multithreaded programs
• Even toy programs can have large number of steps (shared-variable accesses)
![Page 17: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/17.jpg)
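Iterative context bounding
Thread 1: x = 1; if (p != 0) { x = p->f; }
Thread 2: p = 0;
[Diagram: Thread 2's p = 0; runs between Thread 1's check if (p != 0) and its dereference x = p->f;. The switch away from Thread 1 while it could still run is a preemption; the switch back after Thread 2 finishes is a non-preemption]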
![Page 18: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/18.jpg)
Iterative context-bounding algorithm
The scheduler has a budget of c preemptions
• Nondeterministically choose the preemption points
• Resort to non-preemptive scheduling after c preemptions
Once all executions explored with c preemptions
• Try with c+1 preemptions
Iterative context-bounding has desirable properties
• Property 0: Easy to implement
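A minimal sketch of this search, applied to the two-thread example from the previous slide (`x = 1; if (p != 0) { x = p->f; }` against `p = 0;`). The state encoding, the step functions, and all names are assumptions for illustration: Thread 0 is modeled as three atomic steps, Thread 1 as one, and a switch counts as a preemption only when the thread being switched away from could still run.

```cpp
#include <cassert>

// Model state for the example. Thread 0: x = 1; if (p != 0) { x = p->f; }
// Thread 1: p = 0;
struct State {
    bool p_valid = true;   // models p != 0
    bool checked = false;  // Thread 0's result of evaluating `p != 0`
    int pc[2] = {0, 0};    // next step of each thread
    bool crashed = false;  // set when p->f is evaluated with p == 0
};

const int kSteps[2] = {3, 1};  // number of atomic steps per thread

bool finished(const State& s, int t) { return s.pc[t] >= kSteps[t]; }

void step(State& s, int t) {
    if (t == 0) {
        switch (s.pc[0]) {
            case 0: break;                         // x = 1;
            case 1: s.checked = s.p_valid; break;  // if (p != 0)
            case 2:                                // x = p->f;
                if (s.checked && !s.p_valid) s.crashed = true;
                break;
        }
    } else {
        s.p_valid = false;                         // p = 0;
    }
    ++s.pc[t];
}

// Depth-first search over schedules that use at most `budget` preemptions.
// `current` is the thread that ran last, or -1 before the first step.
bool explore_bounded(State s, int current, int budget) {
    if (s.crashed) return true;
    for (int t = 0; t < 2; ++t) {
        if (finished(s, t)) continue;
        // Leaving a thread that could still run costs one preemption.
        bool preempt = current != -1 && t != current && !finished(s, current);
        if (preempt && budget == 0) continue;
        State next = s;
        step(next, t);
        if (explore_bounded(next, t, budget - (preempt ? 1 : 0))) return true;
    }
    return false;
}

// Iterative deepening on the bound: returns the smallest number of
// preemptions that exposes the crash, or -1 if none does up to max_bound.
int min_preemptions_to_bug(int max_bound) {
    assert(max_bound >= 0);
    for (int c = 0; c <= max_bound; ++c) {
        if (explore_bounded(State{}, -1, c)) return c;
    }
    return -1;
}
```

With a bound of 0 only the two run-to-completion orders are explored and neither crashes. With a bound of 1 the search preempts Thread 0 right after its check, runs `p = 0;`, and then reaches the null dereference, so `min_preemptions_to_bug(3)` returns 1.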
![Page 19: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/19.jpg)
Property 1: Polynomial state space
Terminating program with fixed inputs and deterministic threads
• n threads, k steps each, c preemptions
• Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O((n^2 k)^c \cdot n!)$
• Exponential in n and c, but not in k
[Diagram: the threads' step sequences (x = 1; … y = 1; and x = 2; … y = 2;) cut at the chosen preemption points into atomic blocks]
• Choose c preemption points
• Permute n+c atomic blocks
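One way to see where the big-O form of this bound comes from, treating c as a constant: bound the two factors separately, then multiply. The numeric example (n = 2, k = 100, c = 2) is chosen here purely for illustration.

```latex
\begin{align*}
\binom{nk}{c} &\le (nk)^c, \\
(n+c)! &= n! \prod_{i=1}^{c} (n+i) \le n!\,(n+c)^c = n! \cdot O(n^c), \\
\binom{nk}{c}\,(n+c)! &\le (nk)^c \cdot O(n^c) \cdot n! = O\!\left((n^2 k)^c \cdot n!\right).
\end{align*}
% Example: n = 2, k = 100, c = 2
% \binom{200}{2} \cdot 4! = 19900 \cdot 24 = 477600
```

For comparison, the unbounded number of interleavings for the same n and k is $\binom{200}{100}$, roughly $9 \times 10^{58}$.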
![Page 20: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/20.jpg)
Property 2: Deep exploration possible with small bounds
A context-bounded execution has unbounded depth
• a thread may execute unbounded number of steps within each context
Even a context-bound of zero yields complete terminating executions
![Page 21: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/21.jpg)
Property 3: Finds the ‘simplest’ error trace
Finds smallest number of preemptions to the error
Number of preemptions better metric of error complexity than execution length
![Page 22: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/22.jpg)
Property 4: Coverage metric
If search terminates with context-bound of c, then any remaining error must require at least c+1 preemptions
Intuitive estimate for
• The complexity of the bugs remaining in the program
• The chance of their occurrence in practice
![Page 23: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/23.jpg)
Property 5: Lots of bugs with small number of preemptions
A non-blocking implementation of the work-stealing queue algorithm
• bounded circular buffer accessed concurrently by readers and stealers
Developer provided
• test harness
• three buggy variations of the program
Each bug found with at most 2 preemptions
• executions with 35 preemptions are possible!
![Page 24: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/24.jpg)
Context-bounding + Partial-order reduction
Algorithm:
• Assume the program is data-race free
• Context switch only at synchronization points
• Explore executions with c preemptions
• Check for data-races in each execution
Theorem:
• If the algorithm terminates without reporting races, then the program has no assertion failures reachable with c preemptions
Requires that a thread can block only at synchronization points
• Proof (Musuvathi-Q, PLDI 2007)
![Page 25: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/25.jpg)
Bugs found
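Bugs reachable with each preemption count:

| Program | KLOC | Max Num Threads | 0 | 1 | 2 | 3 | Total |
|---|---|---|---|---|---|---|---|
| Bluetooth | 0.4 | 3 | 0 | 1 | 0 | 0 | 1 |
| Work-Stealing Queue | 1.3 | 3 | 0 | 1 | 2 | 0 | 3 |
| Transaction Manager | 7.0 | 2 | 0 | 0 | 2 | 1 | 3 |
| APE | 18.9 | 4 | 2 | 1 | 1 | - | 4 |
| Dryad Channels | 16.0 | 5 | 1 | 5 | 1 | - | 7 |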
![Page 26: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/26.jpg)
// Function called by a worker thread
// of RChannelReaderImpl
void RChannelReaderImpl::AlertApplication(RChannelItem* item)
{
    // Notify Application

    // XXX: Preempt here for the bug
    EnterCriticalSection(&m_baseCS);
    // process before exit
    LeaveCriticalSection(&m_baseCS);
}

// Function called by the main thread
void TestChannel(WorkQueue* workQueue, ...)
{
    // Creating a channel
    // allocates worker threads
    RChannelReader* channel = new RChannelReaderImpl(..., workQueue);

    // ... do work here

    channel->Close();
    // wrong assumption that channel->Close()
    // waits for worker threads to be finished

    delete channel;
    // BUG: deleting the channel when
    // worker threads still have a valid
    // reference to the channel
}
![Page 31: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/31.jpg)
Facts about Dryad error trace
Long error trace but requires only one preemption
• Depth-bounding cannot find it without a lot of luck
The error trace has 6 non-preempting context switches
• It is important to leave unbounded the number of non-preempting context switches
This error (and the other 6 errors in Dryad) remained in spite of careful regression testing and months of production use
![Page 33: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/33.jpg)
Coverage vs. Context-bound
![Page 34: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/34.jpg)
Dryad (coverage vs. time)
![Page 35: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/35.jpg)
Current CHESS applications (work in progress)
Dryad (library for distributed dataflow programming)
Singularity/Midori (OS in managed code)
User-mode drivers
Cosmos (distributed file system)
SQL database
![Page 36: CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.](https://reader033.fdocuments.in/reader033/viewer/2022061304/5513cadb55034674748b49fd/html5/thumbnails/36.jpg)
Conclusion
Concurrency is important
Building robust concurrent software is still a challenge
• Lack of debugging and testing tools
CHESS: Concurrency unit-testing
• Exhaustively try all interleavings
• Attempt to seamlessly integrate with existing test frameworks
• Provide replay capability
Iterative context-bounding algorithm key to the design