Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool...

16
Interfaces for Runtime Correctness Checking of Parallel Programs Joachim Protze ([email protected])

Transcript of Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool...

Page 1: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Interfaces for Runtime Correctness Checking of Parallel ProgramsJoachim Protze ([email protected])

Page 2: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

2

Motivation

• OpenMP 3 introduced tasks (2008)

• Several data race detection tools for OpenMP tasks popped up just last year

• How can we effectively reduce the porting effort for new programming paradigms?

Memory accesses

Concurrency

Synchronization

Page 3: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

3

Synchronization in OpenMP

Parallel region

• Encountering a parallel directive

happens before execution of the

parallel region

• Encountering a barrier directive

happens before execution of code

following the barrier region

• Encountering the implicit barrier

happens before the master

continues code following the

parallel region

parallel-begin

implicit-task-begin

implicit-task-end !

parallel-end

barrier-begin

barrier-end

barrier-begin

Page 4: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

4

Synchronization in OpenMP

Task region

• Encountering a task directive

happens before execution of the

task region

• Finishing execution of a child task

happens before execution of code

following a taskwait, barrier, or

taskgroup region

• Finishing a predecessor task

happens before a dependent task

starts execution

• Deferring a task happens before

scheduling the task again

task-create

task-begin

taskwait-end

task-end

task-begin

task-end

+task-dependencies

task depend(out:a)

task depend(in:a)

taskwait

task

task

Page 5: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

5

Archer based on ThreadSanitizer

• ThreadSanitizer comes with clang and gcc (-fsanitize=thread)

• Compiler instrumentation of memory accesses− Less overhead than binary instrumentation (e.g., PIN, valgrind)

• ThreadSanitizer is not aware of OpenMP synchronization

• Happens before analysis with simplified fast track algorithm.− 4 records of memory access to a word, storing (epoch,tid,r/w)

• Archer annotates OpenMP synchronization− Initially instrumentation of the LLVM/OpenMP runtime

− Now based on OMPT events

Page 6: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

6

Data race analysis overhead for SPEC OMP 2012 (train)

• Expected overhead according to base tool: 2-20x

• 359.botsspar and 370.mgrid331 > 20x − Both run <1 second with high synchronization rate

▪ 359.botsspar: 353400 task switches

▪ 370.mgrid331: 6383 parallel regions

0

10

20

30

40

50

350 351 352 357 358 359 360 362 363 367 370 371 372 376

To

ol

Slo

wd

ow

n

2 Threads

4 Threads

12 Threads

99.8

Page 7: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

7

Concurrency for OpenMP Tasksth

read

• Execution of

the thread as

observed by a

tool

• Lamport

happens-

before

• Separating

the logical

slices

Wallclock time

• Observed actual

execution with

HB

Happens-before

Observed execution order

Wallclock timeWallclock timeLogical clock

Page 8: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

8

TLC: Marking execution within a thread as concurrentth

read

• Execution of

the thread as

observed by

the tool

• Lamport

happens-

before

• Separating

the logical

slices

• Observed actual

execution with

HB

Happens-before

Observed execution order

Fork / spawn

Wallclock time Wallclock timeWallclock timeLogical clock

Page 9: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

9

Generic events

• Fork(curr, *new)− Fork(curr, *new, *msg)

• Join(curr, next)

• Switch(curr, next)− Switch(curr, next, msg)

• Send(curr, *msg)

• Recv(curr, msg)

Page 10: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

10

Concurrency / Synchronization in Shared Memory

Parallel, Tasks, Loops

• Fork → P2P synchronization, concurrency

• Join → P2P synchronization

• Barrier → global synchronization− Can translate into N2N synchronization

• Dependencies → P2P synchronization

• Locks → ?− Should be flexible to enable lock-set and HB analysis

• Parallel loop → concurrency for each iteration

• Doacross loops → P2P synchronization

Fork(curr, *new)

Join(curr, next)

Switch(curr, next)

Send(curr, *msg)

Recv(curr, msg)

Page 11: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

11

Applying this semantics to MPI

MPI Non-Blocking

• MPI_Isend / MPI_Irecv → concurrency, P2P synchronization− Bind the new execution unit handle to the request

• MPI_Wait → synchronize

• Buffer access → read/write

thre

ad

task

taskMPI_Irecv

MPI_Wait

Page 12: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

12

Applying this semantics to MPI

MPI One-sided

• MPI One-sided epochs → concurrency, P2P synchronization

• MPI One-sided target completion → synchronize

• Remote memory access → read/write

Page 13: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Device Offloading

Page 14: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Generic Tool Interface for Runtime Correctness Checking

Joachim Protze

14

Basic memory operations in device offloading

• Memory access

• Alloc/release memory

• (Dis-)Associate memory

• Update memory (memcopy)

OpenMP mapping semantics:

• Alloc alloc + associate

• Map-to ((alloc +) associate +) update to device

• Map-from update from device (+ disassociate (+ release))

• Update-to/from update to/from device

• Release disassociate + release

Challenge: semantics of global/static memory

Page 15: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Distributed Memory ?

Page 16: Interfaces for Runtime Correctness Checking of Parallel Programs · 2020. 8. 11. · Generic Tool Interface for Runtime Correctness Checking Joachim Protze 3 Synchronization in OpenMP

Thank you for your attention.