An efficient data race detector for DIOTA

Michiel Ronsse, Bastiaan Stougie, Jonas Maebe, Frank Cornelis, Koen De Bosschere

Department of Electronics and Information Systems, Ghent University, Belgium
Computer Engineering Lab, Delft University of Technology, The Netherlands

Parco2003, September 2-5, Dresden



Contents

• Introduction
  • Non-determinism & data races
  • DIOTA
• On-the-fly data race detection using DIOTA
  • Method
  • Implementation
• Data Race Detection Example
• Experimental Evaluation
• Conclusions


Introduction

Developing parallel programs for multiprocessors with shared memory is considered difficult:
• number of threads running simultaneously
• co-operation & synchronisation through shared memory

Data races occur when:
• two threads access the same shared variable (memory location) in an unsynchronised way, and
• at least one thread modifies the variable


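Example code

#include <pthread.h>
#include <stdio.h>

unsigned global = 5;

/* Both threads perform an unsynchronised read-modify-write of global. */
void *thread2(void *arg) { global = global + 6; return NULL; }
void *thread3(void *arg) { global = global + 7; return NULL; }

int main(void) {
    pthread_t t2, t3;
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    printf("global=%u\n", global);
    return 0;
}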


Possible executions

[Figure: three interleavings of the loads (L) and stores (S) of the +6 and +7 threads, starting from global=5]
• L(5), L(5), S(11), S(12): global=12
• L(5), S(11), L(11), S(18): global=18
• L(5), L(5), S(12), S(11): global=11
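The three outcomes can be reproduced deterministically by replaying the individual load and store steps by hand instead of relying on the scheduler. The sketch below is only an illustration: `run_schedule` and its schedule encoding are hypothetical and not part of the slides.

```c
#include <stddef.h>

/* Replays one interleaving of the two unsynchronised updates.
 * Schedule characters (a hypothetical encoding, not from the slides):
 *   'a' = the +6 thread loads global,  'A' = the +6 thread stores its result
 *   'b' = the +7 thread loads global,  'B' = the +7 thread stores its result */
unsigned run_schedule(const char *schedule)
{
    unsigned global = 5;   /* initial value used in the example */
    unsigned reg_a = 0;    /* private copy held by the +6 thread */
    unsigned reg_b = 0;    /* private copy held by the +7 thread */

    for (size_t i = 0; schedule[i] != '\0'; i++) {
        switch (schedule[i]) {
        case 'a': reg_a = global;     break;
        case 'A': global = reg_a + 6; break;
        case 'b': reg_b = global;     break;
        case 'B': global = reg_b + 7; break;
        default:  break;   /* unknown steps are ignored */
        }
    }
    return global;
}
```

With this encoding, `run_schedule("abAB")` returns 12, `run_schedule("aAbB")` returns 18 and `run_schedule("abBA")` returns 11.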


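Example code II

#include <pthread.h>
#include <stdio.h>

unsigned global = 5;

/* The slide does not define lock()/unlock(); a pthread mutex is assumed here. */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static void lock(void)   { pthread_mutex_lock(&mutex); }
static void unlock(void) { pthread_mutex_unlock(&mutex); }

void *thread2(void *arg) { lock(); global = global + 6; unlock(); return NULL; }
void *thread3(void *arg) { lock(); global = global + 7; unlock(); return NULL; }

int main(void) {
    pthread_t t2, t3;
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_create(&t3, NULL, thread3, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    printf("global=%u\n", global);
    return 0;
}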


Detecting Data Races

Automatic data race detection is possible:
• collect all memory references
• check parallel references

Static methods: checking the source code for all possible executions with all possible input values
• NP-complete, not feasible

Dynamic methods: detect data races during one particular execution
• post mortem (not feasible)
• on-the-fly


Dynamic data race detection

Piece of code between two consecutive synchronisation operations: a segment

We collect two sets for all segments a of all threads: L(a) and S(a), with the addresses of all load and store operations

For all parallel segments a and b,

$(L(a) \cap S(b)) \cup (S(a) \cap L(b)) \cup (S(a) \cap S(b))$

gives the list of conflicting addresses.
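As a minimal sketch of this conflict check, the load and store sets can be modelled as bit masks over a toy address space of 64 addresses. The type `segment_sets` and the function `conflicting_addresses` are illustrative names, and the 64-bit mask is an assumption made to keep the example short; it is not the data structure used by the tool.

```c
#include <stdint.h>

/* Toy model: bit i of a mask is set when address i was accessed.
 * This representation is an assumption made for the sketch. */
typedef struct {
    uint64_t loads;    /* L(a): addresses loaded in the segment */
    uint64_t stores;   /* S(a): addresses stored in the segment */
} segment_sets;

/* Computes (L(a) & S(b)) | (S(a) & L(b)) | (S(a) & S(b)), i.e. the
 * addresses touched by both segments where at least one access is a store. */
uint64_t conflicting_addresses(segment_sets a, segment_sets b)
{
    return (a.loads & b.stores) | (a.stores & b.loads) | (a.stores & b.stores);
}
```

Two segments that only load the same address yield an empty result, which matches the definition of a data race: at least one of the two accesses has to modify the variable.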


Logical Clocks

A logical clock C( ) attaches a timestamp C(a) to an event a

Used for tracing the causal order of events

Clock condition:

$a \rightarrow b \Rightarrow C(a) < C(b)$

Clocks are strongly consistent if

$a \rightarrow b \Leftrightarrow C(a) < C(b)$


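Scalar Clocks

Lamport Clocks

Simple and fast update algorithm:

$SC(b_i) = \max\{SC(a) : a \rightarrow b_i\} + 1$

Provides only limited information:

$SC(a) < SC(b) \Rightarrow a \rightarrow b$ or $a \parallel b$
$SC(a) = SC(b) \Rightarrow a \parallel b$
$SC(a) > SC(b) \Rightarrow b \rightarrow a$ or $a \parallel b$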
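The scalar update rule and the limited conclusions it allows can be sketched in a few lines of C. The names `sc_next`, `sc_compare` and `sc_relation` are hypothetical, and giving timestamp 1 to an event without predecessors is an assumption of this sketch.

```c
/* Timestamp for a new event whose direct predecessors carry the given
 * timestamps: the maximum over the predecessors, plus one. An event
 * without predecessors gets timestamp 1 (an assumption of this sketch). */
unsigned sc_next(const unsigned *pred, int n)
{
    unsigned max = 0;
    for (int i = 0; i < n; i++)
        if (pred[i] > max)
            max = pred[i];
    return max + 1;
}

/* What two scalar timestamps of distinct events allow us to conclude. */
typedef enum {
    SC_BEFORE_OR_PARALLEL,   /* SC(a) < SC(b): a -> b or a || b */
    SC_PARALLEL,             /* SC(a) = SC(b): a || b           */
    SC_AFTER_OR_PARALLEL     /* SC(a) > SC(b): b -> a or a || b */
} sc_relation;

sc_relation sc_compare(unsigned sc_a, unsigned sc_b)
{
    if (sc_a < sc_b) return SC_BEFORE_OR_PARALLEL;
    if (sc_a > sc_b) return SC_AFTER_OR_PARALLEL;
    return SC_PARALLEL;
}
```

Only equal timestamps give a definite answer; for unequal timestamps a scalar clock cannot tell an ordered pair from a parallel one, which is why it is not sufficient on its own for detecting parallel segments.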


Scalar Clocks: example

[Figure: example execution annotated with scalar (Lamport) timestamps]


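Vector Clocks

A vector clock for a program using N processes consists of N scalar values

$VC(b_i) = \max\{VC(a) : a \rightarrow b_i\} + (0, \ldots, 0, 1, 0, \ldots, 0)$

Such a clock is strongly consistent

$VC(a) < VC(b) \Leftrightarrow a \rightarrow b$
$VC(a) > VC(b) \Leftrightarrow b \rightarrow a$
otherwise $\Leftrightarrow a \parallel b$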
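A small C sketch of the vector clock update and comparison is given below. It assumes a fixed number of threads (`VC_THREADS`), the usual component-wise ordering of vectors, and that the single 1 in the increment vector sits in the slot of the executing thread; all names are illustrative.

```c
#define VC_THREADS 3   /* fixed thread count, chosen only for the sketch */

typedef struct { unsigned v[VC_THREADS]; } vclock;

/* Component-wise maximum of the predecessors' clocks, followed by a tick
 * in the slot of the thread that executes the new event. */
vclock vc_next(const vclock *pred, int npred, int thread)
{
    vclock r = { {0} };
    for (int p = 0; p < npred; p++)
        for (int k = 0; k < VC_THREADS; k++)
            if (pred[p].v[k] > r.v[k])
                r.v[k] = pred[p].v[k];
    r.v[thread] += 1;
    return r;
}

/* Returns 1 if a < b: no component larger, at least one strictly smaller. */
int vc_less(vclock a, vclock b)
{
    int strict = 0;
    for (int k = 0; k < VC_THREADS; k++) {
        if (a.v[k] > b.v[k]) return 0;
        if (a.v[k] < b.v[k]) strict = 1;
    }
    return strict;
}

/* Two distinct events are parallel when neither clock is smaller. */
int vc_parallel(vclock a, vclock b)
{
    return !vc_less(a, b) && !vc_less(b, a);
}
```

For instance, an event on the second thread with predecessors stamped (10,2,4) and (3,7,5) receives (10,8,5); the two predecessors themselves are parallel because neither vector is smaller than the other.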


Vector Clocks: example

[Figure: example execution annotated with three-component vector timestamps]



DIOTA

DIOTA (Dynamic Instrumentation, Optimization and Transformation of Applications) is a generic instrumentation tool.

Backends use DIOTA to
• instrument memory operations
• intercept synchronisation functions
• ...

Deals correctly with data in code, code in data, and self-modifying code.

Clones processes: the original process is used for the data and the instrumented clone is used for the code.

No need for recompilation, relinking or instrumentation of files.


Execution replay

ROLT (Reconstruction of Lamport Timestamps) is used for tracing/replaying the synchronisation operations

Attaches a scalar Lamport timestamp to each synchronisation operation

Delaying each synchronisation operation until all operations with a smaller timestamp have executed suffices for a correct replay

We only need to log a small subset of all operations
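The delaying rule can be illustrated with a small predicate that decides whether a thread may execute its next synchronisation operation during replay. This is a simplified sketch and not ROLT itself: the function `may_proceed`, the array `next_ts` and the use of `UINT_MAX` as an end marker are assumptions, and the reduced logging that ROLT performs is not modelled.

```c
#include <limits.h>

/* next_ts[j] is the Lamport timestamp of the next synchronisation
 * operation thread j has to replay, or UINT_MAX once thread j has no
 * operations left (a convention assumed for this sketch).
 * Returns 1 if the given thread may execute its next operation now. */
int may_proceed(int thread, const unsigned *next_ts, int nthreads)
{
    for (int j = 0; j < nthreads; j++)
        if (j != thread && next_ts[j] < next_ts[thread])
            return 0;   /* an operation with a smaller timestamp is outstanding */
    return 1;
}
```

Operations with equal timestamps are allowed to run in any order here, which is consistent with scalar clocks: equal timestamps imply that the operations are parallel.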


Collecting memory operations

We need two lists of addresses per segment a: L(a) and S(a)

A multilevel bitmap is used:
• takes spatial locality into account
• low memory consumption
• comparing two bitmaps is easy

We lose information: two accesses to the same variable are counted once. This is however no problem for data race detection.


Multilevel Memory bitmap

[Figure: multilevel bitmap for S(a); the address is split into fields of 9, 9 and 14 bits (32 bits in total)]
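A three-level bitmap with the field widths shown on the slide could look as follows. The slide only gives the widths 9, 9 and 14; the assignment of those fields to the high, middle and low address bits, the use of one bit per address, and all type and function names are assumptions of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: bits 31..23 index the root table, bits 22..14 index a
 * middle table and bits 13..0 select one bit in a leaf bitmap. */
enum { ROOT_BITS = 9, MID_BITS = 9, LEAF_BITS = 14 };

typedef struct { uint8_t bits[(1u << LEAF_BITS) / 8]; } leaf;
typedef struct { leaf *leaves[1u << MID_BITS]; } mid;
typedef struct { mid *mids[1u << ROOT_BITS]; } bitmap;

/* Marks addr as accessed. Tables are allocated on first use, so regions
 * of the address space that are never touched cost no memory.
 * Returns 0 on success and -1 if an allocation fails. */
int bitmap_set(bitmap *bm, uint32_t addr)
{
    uint32_t r = addr >> (MID_BITS + LEAF_BITS);
    uint32_t m = (addr >> LEAF_BITS) & ((1u << MID_BITS) - 1);
    uint32_t l = addr & ((1u << LEAF_BITS) - 1);

    if (!bm->mids[r] && !(bm->mids[r] = calloc(1, sizeof(mid))))
        return -1;
    if (!bm->mids[r]->leaves[m] &&
        !(bm->mids[r]->leaves[m] = calloc(1, sizeof(leaf))))
        return -1;
    bm->mids[r]->leaves[m]->bits[l / 8] |= (uint8_t)(1u << (l % 8));
    return 0;
}

/* Returns 1 if addr was marked earlier, 0 otherwise. */
int bitmap_test(const bitmap *bm, uint32_t addr)
{
    uint32_t r = addr >> (MID_BITS + LEAF_BITS);
    uint32_t m = (addr >> LEAF_BITS) & ((1u << MID_BITS) - 1);
    uint32_t l = addr & ((1u << LEAF_BITS) - 1);

    if (!bm->mids[r] || !bm->mids[r]->leaves[m])
        return 0;
    return (bm->mids[r]->leaves[m]->bits[l / 8] >> (l % 8)) & 1;
}
```

Setting the same address twice leaves the bitmap unchanged, which is the information loss mentioned in the slides: repeated accesses to one location are recorded once. Addresses that lie close together share a leaf, so programs with good spatial locality allocate few tables.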


Detecting parallel segments

A vector timestamp is attached to each segment.

All segment information (two bitmaps+vector timestamps) is kept on a list L.

Each new segment is compared against the segments on list L.


Detecting obsolete segments

Obsolete segments should be removed from list L as soon as possible.

An obsolete segment is a segment that can no longer be parallel with new segments.

We use a snooped matrix clock to detect these segments.
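One way to picture the obsolescence test is to treat the matrix clock as the collection of the vector clocks that the threads currently hold. The sketch below rests on assumptions that are not spelled out in the slides: a fixed number of threads, the convention that each thread ticks its own vector component when it starts a new segment, and the hypothetical names `segment`, `vclock` and `segment_is_obsolete`.

```c
#define THREADS 3   /* fixed thread count, chosen only for the sketch */

typedef struct { unsigned v[THREADS]; } vclock;

/* A finished segment on list L: the thread that executed it and its
 * vector timestamp. */
typedef struct {
    int    thread;
    vclock vc;
} segment;

/* current[j] is the vector clock of the segment that thread j executes
 * right now (one row of the matrix clock). A finished segment s of thread t
 * is ordered before everything thread j does from now on as soon as
 * current[j].v[t] >= s->vc.v[t]. When that holds for every thread, no
 * running or future segment can be parallel with s. */
int segment_is_obsolete(const segment *s, const vclock current[THREADS])
{
    for (int j = 0; j < THREADS; j++)
        if (current[j].v[s->thread] < s->vc.v[s->thread])
            return 0;
    return 1;
}
```

A segment for which the function returns 1 can be dropped from list L together with its two bitmaps, which keeps both the list and the number of comparisons per new segment small.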


Detecting obsolete segments

[Figure: timeline showing the segments on list L, the segments in execution, the point of execution and the future]


Detecting obsolete segments

[Figure: timeline showing the segments on list L, the obsolete segments, the segments in execution, the point of execution and the future]


Comparing parallel segments

[Figure: timeline showing the segments on list L, the obsolete segments, the segments in execution, the point of execution and the future]


Overview

[Figure: flowchart of the debugging cycle with the steps Choose input, Record, Replay+detect, Replay+ident., Replay+debug, Choose new input and The end; edges labelled "race"; legend: Automatic / Requires user intervention]


Experimental Evaluation

Implementation for Linux running on Intel multiprocessors.

Tested on a dual 500MHz Celeron PC.

SPLASH-2 was used as a benchmark: a number of multithreaded numeric applications, such as fast Fourier transform, a raytracer, ...

Several data races were found, including in SPLASH-2.


Performance of RecPlay

Slowdown:

program                      normal exec.  DIOTA, no instrument.  DIOTA, memory instrum.  DIOTA, data race detection
mozilla                      7.50          35.00 (4.67x)          169.00 (22.53x)         401.00 (53.47x)
LU.cont -p4                  8.06          9.59 (1.19x)           54.15 (6.72x)           85.74 (10.64x)
fft -p4 -m22                 11.47         27.59 (2.41x)          200.37 (17.47x)         393.36 (34.29x)
radix -p4 -n4194304          6.96          11.74 (1.69x)          137.39 (19.74x)         244.18 (35.08x)
cholesky -p4 inputs/tk29.o   10.43         12.84 (1.23x)          310.74 (29.79x)         581.97 (55.80x)
ocean -p4 -n514              15.59         17.56 (1.13x)          339.06 (21.75x)         667.14 (42.79x)
radiosity -p 4 -batch -room  27.50         90.14 (3.28x)          1157.45 (42.09x)        6805.61 (247.48x)
water-spatial < input4       30.70         52.51 (1.71x)          742.27 (24.18x)         1566.04 (51.01x)

Memory consumption: <3.4x


Conclusions

DIOTA is a practical and efficient tool for detecting and removing data races.

Three types of clocks (scalar, vector and matrix) are used to enable a fast and memory-efficient implementation.

Data races have been found.