A Memory Consistency Model For RISC-V Formally Evaluated with TriCheck

Caroline Trippel
Princeton University
November 29, 2016

Caroline Trippel, Yatin Manerkar, Daniel Lustig, Michael Pellauer, and Margaret Martonosi. “TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA”. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17).



Role of the Instruction Set Architecture (ISA)

• Introduced in 1964 by IBM
• 1 set of software
• >1 hardware implementations
• Definitive spec. of hardware as seen by software:
  • Specification of what hardware must implement
  • Target for compiler translation

[Diagram: Software/HLL, ISA, and Hardware stack, shown once for weak PPO and once for strong PPO]
• Weak PPO (e.g., ARM, POWER): more ordering primitives (e.g., fences/barriers) inserted by compiler
• Strong PPO (e.g., SC, TSO): fewer ordering primitives (e.g., fences/barriers) inserted by compiler


Our Work: Memory Consistency Model Verification

[Diagram: verification tools placed across the layers of the stack]
• Layers: Software/HLL Memory Model; ISA Memory Model; Hardware Memory Model
• Other labels: Compilation; Microarchitectural Implementation; Operating System
• Tools:
  • PipeCheck [Lustig et al. MICRO-47]
  • CCICheck [Manerkar et al. MICRO-48]
  • COATCheck [Lustig et al. ASPLOS '16]
  • ArMOR [Lustig et al. ISCA '15]
  • TriCheck [Trippel et al. ASPLOS '17]


Memory Model Bugs Observed in Practice

ARM Read-after-Read Hazard [Alglave et al. TOPLAS '14]
• Ambiguous ISA spec. regarding same-address Ld→Ld ordering
• ARM compilers did not insert synchronization primitives (e.g., fences/barriers)
• Some ARM implementations relaxed same-address Ld→Ld ordering (e.g., Cortex-A9, Snapdragon 805)
• C/C++ atomics require same-address Ld→Ld ordering
• ARM issued errata¹: Rewrite compilers to insert fences (with performance penalties)

We've identified and characterized flaws in the current RISC-V memory model (i.e., the memory model defined in the current manual) [Trippel et al. ASPLOS '17]

Note that the modifications to fix these issues will be mostly compatible with current implementations.

¹ ARM. Cortex-A9 MPCore, programmer advice notice, read-after-read hazards. ARM Reference 761319, 2011. http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a/UAN0004A_a9_read_read.pdf.


Outline

• Role of Memory Models in ISAs
• What Should We Require From the Hardware?
• What Fences/Barriers Do We Need to Support C/C++?
• TriCheck Framework for Full-Stack Memory Model Verification
• On-Going Work & Conclusions


Sequential Consistency

• Memory models specify the allowed behavior of a multithreaded program executing with shared memory
• First defined by [Lamport 1979], execution is the same as if:
  (R1) Memory ops of each processor appear in program order
  (R2) Memory ops of all processors were executed in some global sequential order

Program:
Thread 0: x=1; y=1
Thread 1: r1=y; r2=x

Legal Executions:
x=1; y=1; r1=y; r2=x
x=1; r1=y; y=1; r2=x
x=1; r1=y; r2=x; y=1
r1=y; r2=x; x=1; y=1
r1=y; x=1; r2=x; y=1
r1=y; x=1; y=1; r2=x
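The six legal executions above can be reproduced mechanically. The following is a minimal sketch, assuming both locations start at 0 (the slide does not state initial values here) and using illustrative names that are not from the talk: it enumerates every merge of the two threads that keeps each thread's program order (R1), runs each merge as one global order (R2), and collects the resulting register values.

```python
from itertools import combinations

# Each op is (kind, location, value-or-register). Names are illustrative.
THREAD0 = [("st", "x", 1), ("st", "y", 1)]
THREAD1 = [("ld", "y", "r1"), ("ld", "x", "r2")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order (R1)."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one global sequential order (R2) against a single shared memory."""
    mem = {"x": 0, "y": 0}  # assumed initial values
    regs = {}
    for kind, loc, arg in schedule:
        if kind == "st":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs["r1"], regs["r2"]

schedules = list(interleavings(THREAD0, THREAD1))
outcomes = {run(s) for s in schedules}
print(len(schedules), sorted(outcomes))
```

Under these assumptions the outcome r1=1, r2=0 never appears: seeing the new y while still seeing the old x would require the two stores or the two loads to be reordered.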


Two Categories of Memory Model Relaxation

Preserved Program Order: Defines program orderings that hardware must preserve by default

Store Atomicity: Defines order in which stores become visible to cores
• Multiple-copy atomic (e.g., monolithic memory):
  • All cores see store simultaneously
• Read-Own-Write-Early multiple-copy atomic (e.g., private store buffer):
  • Storing core can read its own store before other cores
  • Stores made visible to all remote cores simultaneously
• Non-multiple-copy atomic (e.g., shared store buffer):
  • Storing core can read its own store before other cores
  • Store is made visible to some remote cores before others
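As a worked illustration of the three store-atomicity categories, the sketch below classifies a single store by the times at which it becomes visible to the storing core and to each remote core. The function name and the integer timestamps are hypothetical, chosen only to make the definitions concrete.

```python
def store_atomicity(own_time, remote_times):
    """Classify one store by when it becomes visible to each core.

    own_time:     time at which the storing core can read its own store
    remote_times: times at which each remote core can read the store
    """
    if len(set(remote_times)) > 1:
        # Some remote cores see the store before others.
        return "non-multiple-copy atomic"
    if own_time < remote_times[0]:
        # Storing core reads early; remote cores all see it together.
        return "read-own-write-early multiple-copy atomic"
    return "multiple-copy atomic"

print(store_atomicity(5, [5, 5]))  # monolithic memory
print(store_atomicity(1, [5, 5]))  # private store buffer
print(store_atomicity(1, [3, 7]))  # shared store buffer
```

A single observation like this only shows which behavior a store exhibited; a memory model is named after the weakest behavior it permits.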


RISC-V Proposed Preserved Program Order and Store Atomicity

Preserved Program Order:

Store Atomicity: Non-multiple-copy atomic:
• Storing core can read its own store before other cores
• Store is made visible to some remote cores before others


Effects of Non-Multiple-Copy Atomic Stores

Initial conditions: x=0, y=0

T0            T1            T2             T3
st [x] ← 1    st [y] ← 1    ld x → [r0]    ld y → [r2]
                            F R, R         F R, R
                            ld y → [r1]    ld x → [r3]

Non-SC Outcome: r0=1, r1=0, r2=1, r3=0

This outcome corresponds to the case in which the stores on threads T0 and T1 arrive at threads T2 and T3 in different orders.
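The effect can be checked with a deliberately simplified model, sketched below under stated assumptions: each reader sees the two stores arrive in some order, its two loads stay in program order (the job of the F R, R fence), and "multiple-copy atomic" is approximated by forcing both readers to agree on one arrival order. The function names are illustrative and not part of TriCheck.

```python
from itertools import permutations, product

STORES = ("x", "y")  # T0 writes x=1, T1 writes y=1

def observe(arrival, first, second):
    """Values one reader can load when stores arrive in `arrival` order.

    The reader performs ld `first`, then ld `second`; seen1 and seen2 count
    how many stores have arrived before each load (seen1 <= seen2).
    """
    results = set()
    for seen1 in range(3):
        for seen2 in range(seen1, 3):
            v1 = int(first in arrival[:seen1])
            v2 = int(second in arrival[:seen2])
            results.add((v1, v2))
    return results

def iriw_outcomes(multi_copy_atomic):
    """All (r0, r1, r2, r3) outcomes reachable in this simplified model."""
    outcomes = set()
    orders = list(permutations(STORES))
    for order_t2, order_t3 in product(orders, orders):
        if multi_copy_atomic and order_t2 != order_t3:
            continue  # all readers must agree on a single store order
        t2 = observe(order_t2, "x", "y")  # T2: ld x, then ld y
        t3 = observe(order_t3, "y", "x")  # T3: ld y, then ld x
        for (r0, r1), (r2, r3) in product(t2, t3):
            outcomes.add((r0, r1, r2, r3))
    return outcomes

NON_SC = (1, 0, 1, 0)
print(NON_SC in iriw_outcomes(multi_copy_atomic=True))
print(NON_SC in iriw_outcomes(multi_copy_atomic=False))
```

In this model the non-SC outcome is reachable only when T2 and T3 are allowed to see the two stores in different orders.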


Why Allow Non-Multiple-Copy Atomic Stores?

• Commercial ISAs allow non-multiple-copy atomic stores (e.g., ARM, POWER)
• RISC-V is intended to be integrated with other vendor ISAs
  • Potential deployment in non-multiple-copy atomic memory systems
  • If sharing memory system, awareness that stores may be observed in orders that differ from other cores



Fences to Restore Multiple-Copy Atomicity

Initial conditions: x=0, y=0

T0            T1            T2              T3
st [x] ← 1    st [y] ← 1    ld x → [r0]     ld y → [r2]
                            pscF RW, RW     pscF RW, RW
                            ld y → [r1]     ld x → [r3]

Non-SC Outcome: r0=1, r1=0, r2=1, r3=0

Predecessor-/Successor-Cumulative Fence: Necessary to Restore SC for Non-Multiple-Copy Atomic Memory Systems


Other Fences/Barriers/Ordering Primitives

• Baseline Memory Model
  • PPO requires same-address R-R order to be maintained
  • PPO requires order to be maintained between most dependent instructions
  • Predecessor-/Successor-Cumulative F RW,RW; F IO,IO; F IORW,IORW
• Baseline + Atomics Extension
  • Predecessor-Cumulative F RW,W



TriCheck Full-Stack Verification Framework

Suite of C/C++ Litmus Tests
Suite of Small C/C++ Programs
Compiler Mappings from C/C++ to RISC-V
C/C++ Herd Model
RISC-V Check Model
C/C++ Outcome Forbidden implies ISA-Level Outcome Forbidden

TriCheck compares HLL outcomes to ISA-level outcomes for a spectrum of legal ISA microarchitectures.


TriCheck Full-Stack Verification Framework

Suite of C/C++ Litmus Tests
Suite of Small C/C++ Programs
Compiler Mappings from C/C++ to RISC-V
C/C++ Herd Model
RISC-V Check Model
C/C++ Outcome Forbidden implies ISA-Level Outcome Forbidden

ISA DOES NOT ALLOW outcomes prohibited by C/C++


TriCheck Full-Stack Verification Framework

Suite of C/C++ Litmus Tests
Suite of Small C/C++ Programs
Compiler Mappings from C/C++ to RISC-V
C/C++ Herd Model
RISC-V Check Model
C/C++ Outcome Forbidden implies ISA-Level Outcome Forbidden

ISA ALLOWS outcomes prohibited by C/C++
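The comparison step on these slides, combined with the Bugs / Overly Strict / Equivalent legend used in the results charts, can be sketched as a small classifier. This is an illustration only: the function name and the sample results are hypothetical, and the real tool derives both inputs from formal models rather than from hand-written booleans.

```python
from collections import Counter

def classify(hll_allows, isa_allows):
    """Compare the HLL verdict on an outcome with the ISA-level verdict."""
    if not hll_allows and isa_allows:
        return "Bug"            # ISA permits what C/C++ forbids
    if hll_allows and not isa_allows:
        return "Overly Strict"  # ISA forbids what C/C++ permits
    return "Equivalent"

# Hypothetical verdicts for four test variations: (HLL allows, ISA allows).
results = [(False, False), (False, True), (True, False), (True, True)]
tally = Counter(classify(h, i) for h, i in results)
print(dict(tally))
```

Only the "Bug" case breaks the implication "C/C++ outcome forbidden implies ISA-level outcome forbidden"; "Overly Strict" is safe but leaves performance on the table.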


RISC-V Base: Lack of Cumulative Fences

[Figure: bar chart for the wrc litmus test on the RISC-V Baseline (Base) ISA. y-axis: Test Variations (0 to 250), classified as Bugs, Overly Strict, or Equivalent. x-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, for the riscv-curr and riscv-ours variations.]

• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write
• Base RISC-V ISA lacks cumulative fences
  • Minimally, the ISA requires a Predecessor-/Successor-Cumulative F RW,RW
  • Cannot fix bugs by modifying compiler currently

Our current RISC-V proposal requires only a P-/S-Cumulative F RW,RW in the RISC-V Base ISA, and includes a weaker P-Cumulative F RW,W Fence in the Base+Atomics extension.
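The transitivity requirement can be illustrated with a toy model of the wrc test. The slides name the test but do not list it, so the three-thread shape assumed here is the usual one: T0 stores x; T1 loads x, fences, then stores y; T2 loads y, then loads x (kept in order). The event names and the propagation model are assumptions made for this sketch.

```python
from itertools import permutations

# x_to_T1 / x_to_T2: T0's store to x becomes visible to T1 / T2.
# y_to_T2: T1's store to y becomes visible to T2.
EVENTS = ("x_to_T1", "x_to_T2", "T1_ld_x", "y_to_T2", "T2_ld_y", "T2_ld_x")

def wrc_outcomes(cumulative_fence):
    """All (r0, r1, r2) = (T1's x, T2's y, T2's x) reachable in this model."""
    outcomes = set()
    for order in permutations(EVENTS):
        t = {event: i for i, event in enumerate(order)}
        # Program order enforced locally on T1 and on T2.
        if not (t["T1_ld_x"] < t["y_to_T2"] and t["T2_ld_y"] < t["T2_ld_x"]):
            continue
        r0 = int(t["x_to_T1"] < t["T1_ld_x"])
        # Cumulativity: a store T1 observed before its fence must reach T2
        # before T1's post-fence store does.
        if cumulative_fence and r0 == 1 and not t["x_to_T2"] < t["y_to_T2"]:
            continue
        r1 = int(t["y_to_T2"] < t["T2_ld_y"])
        r2 = int(t["x_to_T2"] < t["T2_ld_x"])
        outcomes.add((r0, r1, r2))
    return outcomes

FORBIDDEN = (1, 1, 0)  # T1 saw x, T2 saw y, yet T2 missed x
print(FORBIDDEN in wrc_outcomes(cumulative_fence=False))
print(FORBIDDEN in wrc_outcomes(cumulative_fence=True))
```

Without cumulativity the fence on T1 orders only T1's own accesses, so on a non-multiple-copy atomic system x can still reach T2 after y; no compiler mapping can repair that if the ISA offers no cumulative fence.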



On-Going Work & Conclusions

• We have formulated an English language diff. of the current spec. with our proposed changes
• Currently we are constructing a formal model in Herd [Alglave et al., TOPLAS '14] of our proposed memory model modifications
• Memory model design choices are complicated and involve reasoning about the subtle interplay between many diverse features
• Defining an ISA specification in light of the evaluation of a single microarchitecture is not sufficient
• TriCheck is generalizable to any ISA and uncovered/quantified flaws in the RISC-V memory model.


http://check.cs.princeton.edu/


RISC-V Base+A: Lack of Transitive Releases

• C/C++ acquire/release synchronization is transitive:
  • Accesses before a release write in program order, and observed by the releasing core prior to the release write, must be ordered before the release from the viewpoint of an acquire read that reads from the release write

[Figure: bar chart for the wrc litmus test on the RISC-V Baseline+Atomics (Base+A) ISA. y-axis: Test Variations (0 to 250), classified as Bugs, Overly Strict, or Equivalent. x-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, for the riscv-curr and riscv-ours variations.]

• Base+A RISC-V ISA lacks transitive releases
  • i.e., RISC-V acquires do not synchronize with RISC-V releases as required by C/C++
  • AMO.rl and stronger AMO.aq.rl are both insufficient
  • Cannot fix bugs by modifying compiler
• Our solution: redefine release operations in the Base+A RISC-V ISA to be transitive


RISC-V Base+A: No Roach-Motel Movement for SC Atomics

[Figure: bar charts for the mp and sb litmus tests on the RISC-V Baseline+Atomics (Base+A) ISA. y-axis: Test Variations (0 to 90), classified as Bugs, Overly Strict, or Equivalent. x-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9like, for the riscv-curr and riscv-ours variations.]

• Roach-motel movement = expansion of acquire-release critical section
  • C++ SC loads have C++ Acquire semantics
  • C++ SC stores have C++ Release semantics
• RISC-V SC loads and stores require both aq and rl bits set on AMOs
  • Operation has acquire and release semantics
  • Prohibits roach-motel movement
• Our solution: add an sc bit for implementing AMO.aq.sc and AMO.rl.sc instructions which are capable of implementing C/C++ SC loads and stores
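Roach-motel movement can be made concrete with a toy reordering predicate. The sketch below is an approximation under stated assumptions: it treats acquire and release as one-way barriers, keeps SC operations ordered among themselves, and ignores same-address and dependency constraints. The `Op` class and `may_reorder` function are hypothetical names, and the AMO.aq.sc / AMO.rl.sc semantics are modeled only as far as the slide describes them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    acquire: bool = False
    release: bool = False
    sc: bool = False

def may_reorder(first, second):
    """True if `second` may be performed before `first` (program order: first, second)."""
    if first.acquire:            # later accesses may not move above an acquire
        return False
    if second.release:           # earlier accesses may not move below a release
        return False
    if first.sc and second.sc:   # SC operations keep a single total order
        return False
    return True

plain = Op("plain access")
curr_load = Op("AMO.aq.rl", acquire=True, release=True, sc=True)  # current SC mapping
prop_load = Op("AMO.aq.sc", acquire=True, sc=True)                # proposed SC load
prop_store = Op("AMO.rl.sc", release=True, sc=True)               # proposed SC store

print(may_reorder(plain, curr_load))   # earlier access sinking below the SC load
print(may_reorder(plain, prop_load))
print(may_reorder(prop_store, plain))  # later access hoisting above the SC store
```

With both aq and rl set, neither direction of movement is possible; with the proposed encodings, accesses may move into the critical section but never out of it, and SC operations stay ordered with respect to each other.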


RISC-V Base: Same Address Ld→Ld Re-Ordering

• C/C++ forbids same-address Ld→Ld reordering
• Bugs always when C/C++ loads are mapped to regular RISC-V loads

[Figure: bar chart for the corr litmus test on the RISC-V Baseline (Base) ISA. y-axis: Test Variations (0 to 90), classified as Bugs, Overly Strict, or Equivalent. x-axis: μSpec models WR, rWR, rWM, rMM, nWR, nMM, A9, for the riscv-curr and riscv-ours variations.]

Initial conditions: x=0, y=0

T0                 T1
a: sw x1, (x5)     c: lw x3, (x5)
b: sw x2, (x5)     d: lw x4, (x5)

Forbidden HLL Outcome: x1=1, x2=2, x3=2, x4=1

• Base RISC-V ISA includes F R,R
  • Possible to fix bugs by modifying compiler with potential performance penalty
  • 20.3% preliminary estimate of fence insertion performance penalty for ARM
• Our solution: modify Base RISC-V memory model to require same-address Ld→Ld ordering

Our current RISC-V proposal eliminates F R,R from the RISC-V Base ISA, and requires hardware to enforce same-address Ld→Ld order by default.
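The forbidden outcome of the corr test amounts to two program-ordered loads of one address observing values that go backwards in that address's write order. A minimal sketch of that check follows, assuming the writes to the location at (x5) take effect in the order 0 (initial), 1 (store a), 2 (store b); the function name is illustrative.

```python
def corr_violation(coherence_order, reads):
    """True if successive same-address reads step backwards in coherence order."""
    position = {value: i for i, value in enumerate(coherence_order)}
    return any(position[later] < position[earlier]
               for earlier, later in zip(reads, reads[1:]))

# Assumed write order for the location at (x5): initial 0, store a (1), store b (2).
CO = [0, 1, 2]

print(corr_violation(CO, [2, 1]))  # x3=2, x4=1: the forbidden HLL outcome
print(corr_violation(CO, [1, 2]))  # loads observe the stores in order
```

Hardware that preserves same-address Ld→Ld order never produces a read sequence for which this check returns True, so no compiler-inserted fence is needed between loads c and d.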


Re-ordering Dependent Operations

• RISC-V does not require ordering for dependent instructions
• Many commercial ISAs (x86, ARM, Power) respect dependencies
  • Can also be used as lightweight synchronization
• Explicit synchronization/fences needed when dependency ordering is required but not enforced by default, e.g., Linux
  • Macro read_barrier_depends() optionally inserts a barrier
    • Inserts a fence for Alpha, which does not respect dependencies¹
    • Inserts nothing for RISC-V, which does not respect dependencies²
• Our solution: modify Base RISC-V memory model to require the preservation of dependency orderings.
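Why dependency ordering matters can be sketched with a pointer-publication pattern: a writer makes data visible, then publishes a pointer to it, and a reader loads the pointer and then loads through it. The model below is an assumption-laden toy (one reader, one visibility time per store, hypothetical event names), not a description of any particular core.

```python
from itertools import permutations

EVENTS = ("data_visible", "ptr_visible", "ld_ptr", "ld_data")

def mp_outcomes(preserve_dependency):
    """All (saw_new_ptr, saw_new_data) pairs reachable by the reader."""
    outcomes = set()
    for order in permutations(EVENTS):
        t = {event: i for i, event in enumerate(order)}
        # Writer side: data is made visible before the pointer (writer's fence).
        if not t["data_visible"] < t["ptr_visible"]:
            continue
        # Reader side: ld_data takes its address from ld_ptr.
        if preserve_dependency and not t["ld_ptr"] < t["ld_data"]:
            continue
        saw_ptr = t["ptr_visible"] < t["ld_ptr"]
        saw_data = t["data_visible"] < t["ld_data"]
        outcomes.add((saw_ptr, saw_data))
    return outcomes

STALE = (True, False)  # reader sees the new pointer but old data
print(STALE in mp_outcomes(preserve_dependency=True))
print(STALE in mp_outcomes(preserve_dependency=False))
```

When the dependency is not preserved, the reader needs an explicit barrier between the two loads, which is the role read_barrier_depends() plays on Alpha.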

¹ Linus Torvalds et al. Linux kernel, 2016. https://github.com/torvalds/linux/blob/master/arch/alpha/include/asm/barrier.h
² RISC-V Foundation. RISC-V port of Linux kernel, 2016. https://github.com/riscv/riscv-linux/blob/master/rch/riscv/include/asm/barrier.h
