SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’...

69
Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric School of Computing, University of Utah, Salt Lake City , UT 84112 Presented at IPDPS 2018 See paper for details Ignacio Laguna, Greg L. Lee, Dong H. Ahn Lawrence Livermore National Laboratory, Livermore, CA Github.com / PRUNERS SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs Courtesy Pinterest

Transcript of SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’...

Page 1: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir RakamaricSchool of Computing, University of Utah, Salt Lake City, UT 84112

Presented at IPDPS 2018

See paper for details

Ignacio Laguna, Greg L. Lee, Dong H. AhnLawrence Livermore National Laboratory, Livermore, CA

Github.com / PRUNERS

SWORD: A Bounded Memory-Overhead Detectorof OpenMP Data Races

in Production Runs

Courtesy Pinterest

Page 2: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

What is a data race?

Page 3: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

What is a data race?

Thread 1 Thread 2

Page 4: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

What is a data race?

Thread 1 Thread 2

WR/W

Page 5: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

What is a data race?

Thread 1 Thread 2

WR/W

No synchronizations

Page 6: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

T0 T1

W R/W

One way to eliminate this race

Page 7: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

T0 T1

W R/W

One way to eliminate this race

UNLOCK

LOCK

UNLOCK

LOCK

Page 8: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

T0 T1

W R/W

One way to eliminate this race

UNLOCK

LOCK

UNLOCK

LOCK

Page 9: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Another way to eliminate this race

T0 T1

W R/W

Page 10: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Another way to eliminate this race

T0 T1

W R/W

RELEASE

ACQUIRE

Signal using `special’ variables

• Java ‘volatile’ annotations• NOT C ‘volatiles’ !

• C++11 ’atomic’ annotations

Page 11: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

A third way

T0 T1

W R/W

Page 12: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

A third way

T0 T1

W R/W

Put a barrier

Page 13: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Why eliminate races?

Page 14: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Popular answer: “avoid nondeterminism”

T0 T1

X = 0 t = X

Page 15: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Unclear what “nondeterminism” means..

Page 16: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Execution Order is Still Nondeterministic

T0 T1

X = 0 t = X

UNLOCK

LOCK

UNLOCK

LOCK

Page 17: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

More relevant: Avoid “pink elephants” !

Page 18: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

More relevant: Avoid “pink elephants” !

Pink elephant (Sutter) : “A value you never wrote but managed to read”

Aka ”out of thin air” value

Page 19: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

The birth of a pink elephant…

T0 T1

X = 0 t = X

T0 T1

X = 24 t = X

Compiler Optimizations

t is 0 here 24

read here

You may

never have

written “24”

in your program

Page 20: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Details of how a pink elephant is made!

T0 T1

X = 0 t = X

Y = 23

X = Y + 1

T0 T1

t = X

Y = 23

The compiler has NO IDEA thatthe user meant tocommunicate here !!

Compiler optimizationscreate thesepink-elephant

values…

24read here

X = 24

Page 21: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

This is why code containing data races

often fail (only) when optimized!

Page 22: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Race-freedom ensures intended communications

T0 T1

W R/W

• You don’t observe

“half baked” values

• Code does not reorder

around sync. points

• No “word tearing”

• Pending writes flushed

(fences inserted)nly

Page 23: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Exploding a myth!

There is no

such thing as a

benign race !!

Page 24: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Races in OpenMP programs are hard to spot

• See#tinyurl.com/ompRaces if#you#wish#• but$later$!

• Static#analysis#tools#never#shown#to#work#well

• First#usable#OpenMPdynamic#race#checker#(afaik)• Archer$[Atzeni,$IPDPS’16]• More$on$that$soon

• This$talk$will#present#the#second#usable#dynamic#race#checker• Sword

Page 25: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

This talk: Why and how of another OMP race checker

Page 26: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

• HYDRA&porting&on&Sequoia&at&LLNL

• Large&multiphysicsMPI/OpenMPapplication

• Non@deterministic&crashes&in&OpenMP region

• Only&when&the&code&was&optimized!

• Suspected&data&race

• Emergency&hack:

• Disabled&OpenMP&in&Hypre

• Root@cause&found&by&Archer&:• two&threads&writing&0 to&a&common&location&without&synchronization

The Pink Elephant Actually Struck Us!

Page 27: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer to the rescue!

Page 28: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer [IPDPS’16]• Utah: Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric

• LLNL: Dong H. Ahn, Ignacio Laguna, Martin Schulz, Gregory L. Lee• RWTH: Joachim Protze, Matthias S. Muller

– In production use at LLNL

Part of the “PRUNERS” tool suite

PRUNERS was a finalist of the 2017 R&D 100 Award Selection

Archer to the rescue!

Page 29: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer’s “find”

Two$threads$writing$0to$the$same$location$without$synchronization

Page 30: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer’s “find”

Two$threads$writing$0to$the$same$location$without$synchronization

Page 31: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Did we live “happily ever after?”

Page 32: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

No !

Page 33: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer has “memory-outs”; also misses races

Page 34: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

• Archer&increases&memory&500%• It&also&misses&races!

• These&were&known&issues• Finally'surfaced'with'the'”right'large'example”

• Root9cause'found'by'Archer':• two'threads'writing'0 to'a'common'location'without'synchronization

Archer has “memory-outs”; also misses races

Page 35: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Reason: Archer employs “shadow cells”

Core 0 Core 1 Core 2 Core 3

ss0

ss1

ss2

ss3

A0

ss0

ss1

ss2

ss3

A1

ss0

ss1

ss2

ss3

Amax

….

A programmable

number of cells

per address

(4 shown, and is

typical)

Page 36: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

~4 shadow cells per application location

Core 0 Core 1 Core 2 Core 3

ss0

ss1

ss2

ss3

A0

ss0

ss1

ss2

ss3

A1

ss0

ss1

ss2

ss3

Amax

….

A programmable

number of cells

per address

(4 shown, and is

typical)

Shadow-cells immediately increase memory demand by a factor of four

Page 37: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer misses races due to shadow cell eviction

Page 38: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer misses races due to shadow cell eviction

Core

0

Core

1

Core

2

Core

3

ss0

ss1

ss2

ss3

A0

ss0

ss1

ss2

ss3

A1

ss0

ss1

ss2

ss3

Amax

….

Page 39: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Core

0

Core

1

Core

2

Core

3

ss0

ss1

ss2

ss3

A0

ss0

ss1

ss2

ss3

A1

ss0

ss1

ss2

ss3

Amax

….

All threads read a[3]Thread 3 writes a[3]All threads read A[3]

Thread 3 writes A[3]

Archer misses races due to shadow cell eviction

Page 40: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Capacity conflict ! evict shadow cell

Core 0 Core 1 Core 2 Core 3

ss0

ss1

ss2

ss3

A0

ss0

ss1

ss2

ss3

A1

ss0

ss1

ss2

ss3

Amax

….

With shadow-cell evicted, races are missed

Page 41: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer misses races due to HB-masking

Page 42: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Archer misses races due to HB-masking

These are

concurrent;

there are two

races here!

These races

are missed

in this

interleaving!

Page 43: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Solution : Get rid of shadow cells !!

Page 44: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Offline Analysis

Core 0 Core 1 Core 2 Core 3

Need New Approach with Online/Offline split

RaceReports

Compression Compression Compression Compression

Page 45: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Details of the online phase

Core 0 Core 1 Core 2 Core 3

• Collect'traces'per'core'un#coordinated• Trace-collection-speeds-increased;-we-use-the-OMPT-tracing-method

• Employ-data-compression-to-bring-FULL-traces-out• Only'2.5'MB'compression'buffer'per'thread'(fits-in-L3-cache)

Compression Compression Compression Compression

Page 46: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Consequences for the offline phase

Core 0 Core 1 Core 2 Core 3

•We#would#have#lost#all#the#synchronization#information• We#only#know#what#each#thread#is#doing

•We#must#recover#the#concurrency#structure• And#in#the#context#of#its#happens;before#order,#detect#races!

Compression Compression Compression Compression

Page 47: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Offline synchronization recovery and analysis

0 - [0,1]

1 - [0,1][0,2] 2 - [0,1][1,2]

3 - [0,1][0,2][0,2] 4 - [0,1][0,2][1,2]

7 - [0,1][2,2]

5 - [0,1][1,2][0,2] 6 - [0,1][1,2][1,2]

11 - [0,1][3,2]

12 - [1,1]

8 - [0,1][2,2][0,2] 9 - [0,1][2,2][1,2]

10 - [0,1][4,2]

IBarrier(3)

Barrier(1)read(x)write(y)

write(x)m_acq()

m_rel()

read(y)m_acq(M1)

m_rel(M1)IBarrier(4)

Barrier(2)

write(y)m_acq(M1)

m_rel(M1)

write(x)m_acq()

m_rel()

IBarrier(6)

FOR-LOOP

IBarrier(7)

R1: race on y

R2: race on y

R3: race on x

IBarrier(5)

Core 0 Core 1 Core 2 Core 3

Compression Compression Compression Compression

OpSem(HIPS’18)

Page 48: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Offset-Span Labels: How we record concurrency(Mellor-Crummey, 1991)

Page 49: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Key state in OpSem: Maintain Barrier Intervals

0 - [0,1]

1 - [0,1][0,2] 2 - [0,1][1,2]

3 - [0,1][0,2][0,2] 4 - [0,1][0,2][1,2]

7 - [0,1][2,2]

5 - [0,1][1,2][0,2] 6 - [0,1][1,2][1,2]

11 - [0,1][3,2]

12 - [1,1]

8 - [0,1][2,2][0,2] 9 - [0,1][2,2][1,2]

10 - [0,1][4,2]

IBarrier(3)

Barrier(1)read(x)write(y)

write(x)m_acq()

m_rel()

read(y)m_acq(M1)

m_rel(M1)IBarrier(4)

Barrier(2)

write(y)m_acq(M1)

m_rel(M1)

write(x)m_acq()

m_rel()

IBarrier(6)

FOR-LOOP

IBarrier(7)

IBarrier(5)

Barrier&Interval&1

Barrier&Interval&3

Barrier&Interval&2

Barrier&Interval&5

Page 50: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Examples of Races Reported

0 - [0,1]

1 - [0,1][0,2] 2 - [0,1][1,2]

3 - [0,1][0,2][0,2] 4 - [0,1][0,2][1,2]

7 - [0,1][2,2]

5 - [0,1][1,2][0,2] 6 - [0,1][1,2][1,2]

11 - [0,1][3,2]

12 - [1,1]

8 - [0,1][2,2][0,2] 9 - [0,1][2,2][1,2]

10 - [0,1][4,2]

IBarrier(3)

Barrier(1)read(x)write(y)

write(x)m_acq()

m_rel()

read(y)m_acq(M1)

m_rel(M1)IBarrier(4)

Barrier(2)

write(y)m_acq(M1)

m_rel(M1)

write(x)m_acq()

m_rel()

IBarrier(6)

FOR-LOOP

IBarrier(7)

R1: race on y

R2: race on y

R3: race on x

IBarrier(5)

Barrier&Interval&3

Race&within&same&barrier&interval

Page 51: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Examples of Races Reported

0 - [0,1]

1 - [0,1][0,2] 2 - [0,1][1,2]

3 - [0,1][0,2][0,2] 4 - [0,1][0,2][1,2]

7 - [0,1][2,2]

5 - [0,1][1,2][0,2] 6 - [0,1][1,2][1,2]

11 - [0,1][3,2]

12 - [1,1]

8 - [0,1][2,2][0,2] 9 - [0,1][2,2][1,2]

10 - [0,1][4,2]

IBarrier(3)

Barrier(1)read(x)write(y)

write(x)m_acq()

m_rel()

read(y)m_acq(M1)

m_rel(M1)IBarrier(4)

Barrier(2)

write(y)m_acq(M1)

m_rel(M1)

write(x)m_acq()

m_rel()

IBarrier(6)

FOR-LOOP

IBarrier(7)

R1: race on y

R2: race on y

R3: race on x

IBarrier(5)

Barrier&Interval&3

Races&across&parallel&regions

Barrier&Interval&2

Barrier&Interval&5

Page 52: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Good news

•Online&analysis&proved&really&good•No#memory#pressure#!!

Page 53: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Bad news

Offline'analysis'took$a$day$to$$finish$on“medium$sized”$examples

Page 54: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Two Key Innovations Saved the Approach

• Self%balancing,red%black interval,trees

• On%the%fly,generation,of,Integer,Linear,Programs

Page 55: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

• Decompress,*record*strided accesses)in*self0balancing*red0black interval*trees

• Generate*Integer*Linear*Programs*on0the0fly,*and*check*for*overlaps• Handles)bursts)of)accesses)efficiently

Core 0 Core 1 Core 2 Core 3

Reducing “a day” to “under a minute”

Compression Compression Compression Compression

Page 56: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

OMP read/writes are bursty with strides!

Page 57: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

OMP read/writes are bursty with strides!

Each of this is a multi-word access

Build Integer Linear Programs for each constant-stride intervalILP system encodes accessed byte-addresses in each “burst”

Page 58: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Overlap of Access Bursts: ILP Generation!

Page 59: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Interval Trees to record accesses[335820,335820],1R,4,4208860

[335820,335820],1W,4,4208658

[335824,335824],1W,4,4208639

[335816,335816],1W,4,4208677

[335820,335820],1R,4,4208822

[335820,335820],1W,4,4208884

[335920,335920],1R,4,4208736

[335812,335812],1W,4,4208696

[335820,335820],1R,4,4208926

[335824,335824],1R,4,4208902

[337888,339884],500R,4,4208985

[337892,339888],500W,4,4209028

• Recorded info is: [Begin, End], #Accesses, Kind, Stride, AtWhichPCValue

• Allows efficient comparison of access bursts across threads

• These Red-Black trees are highly tuned

• Used within Linux to realize fair scheduling methods

Page 60: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Concluding Remarks: Sword is now practical!

Both%Archer%and%Sword%are%available

Github.com /%PRUNERS

Page 61: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Conclusions: Time for “Medium” Examples

Online Offline Total Efficacy

Archer 1 0 1 Misses races

Sword 1 10* 11Finds all races within the execution**

* : can be brought down to 1 by using an MPI cluster** : we define the formal semantics of OMP race checking [HIPS’18]

Page 62: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Online Offline Total Efficacy

Archer 1 0 1 Misses races

Sword 1 10* 11

Finds all

races within

the

execution**

Conclusions: Time for Larger Examples

Memory

Page 63: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

• Sword&works&well&;&finds&more&races&than&Archer• Applied&to&realistic&benchmarks

• Archer&test&suite• RaceBench from&LLNL

• Offline&analysis&can&be&parallelized• Still&“decent”&on&standard&multicore&platforms

• It&took&many%ideas%working&together&to&realize&Sword• Formal&semantics&of&OpenMPConcurrency• Online&/&Offline&checking&split• Data&compression• SelfAbalancing&interval&trees• ILPAsystems&to&compress&traces

• Employs&standard&tracing&methods&based&on&OMPT

More Concluding Remarks

Page 64: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

• Continue to(debug(/(tune(Sword• Incorporate ideas(from(upcoming(pubs• GPU(race checking

Future Work

Page 65: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Group Credits

Simone Zvonimir Dong Ignacio Greg

Page 66: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

Extras

Page 67: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

• High-level code is just “fiction”• Code%optimizations%are%done%on%a%PER%THREAD%basis• Races%occur%if%you%don’t%tell%a%compiler%what’s%shared

while(!f)%{}%%%%%! r%=%f;%%while%(!r)%{}%%%%%:%this%is%OK%if%“f”%is%purely%local

while(!f)%{}%%%%%! r%=%f;%%while%(!r)%{}%%%%%:%not%OK%if%f is%shared%and%you%don’t%tellthis%to%the%compiler

• How to inform a compiler• Put%the%variables%inside%a%mutex (or%other%synchronization%block)• Declare%them%to%be%a%Java%volatile%or%C++11%atomic

• C-volatiles won’t do (they don’t have a definite concurrency semantics)

Data Races: Gist

Page 68: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

• High-level code is just “fiction”• Code%optimizations%are%done%on%a%PER%THREAD%basis• Races%occur%if%you%don’t%tell%a%compiler%what’s%shared

while(!f)%{}%%%%%! r%=%f;%%while%(!r)%{}%%%%%:%this%is%OK%if%“f”%is%purely%local

while(!f)%{}%%%%%! r%=%f;%%while%(!r)%{}%%%%%:%not%OK%if%f is%shared%and%you%don’t%tellthis%to%the%compiler

Data Races: Gist

Page 69: SWORD: A Bounded Memory -Overhead Detector of OpenMP Data …€¦ · • Java ‘volatile’ annotations • NOT C ‘volatiles’ ! • C++11 ’atomic’ annotations. A third way

GPUs races also can lead to “pink-elephants”

Analogy due to Herb Sutter

__global__'void'kernel(int*'x,'int*'y)'{'''int'index'='threadIdx.x;'

''y[index]'='x[index]'+'y[index];'

''if'(index'!='63'&&'index'!='31)'''''y[index+1]'='1111;'

}'

Ini$ally(:(x[i](==(y[i](==(i(

Warp1size(=(32(

The'hardware'schedules'these'instrucKons'in'“warps”'(SIMD'groups).''

However,'this'“warp'view”'oSen'appears'to'be'lost'

E.g.'When'compiling'with'opKmizaKons'

Expected(Answer:(0,(1111,(1111,(…,(1111,(64,(1111,(…('

New(Answer:(0,(2,(4,(6,(8,(…'