RISC-Vweb.cecs.pdx.edu/~harry/riscv/RISCV-Summary.pdf · 2018-01-26 · RISC-V: An Overview of the...

RISC-V:AnOverviewofthe

InstructionSetArchitecture

HarryH.PorterIIIPortlandStateUniversity

[email protected]

January26,2018

TheRISC-Vprojectde2inesanddescribesastandardizedInstructionSetArchitecture(ISA).RISC-Visanopen-sourcespeci2icationforcomputerprocessorarchitectures,notaparticularchiporimplementation.Todate,severaldifferentgroupshavedesignedandfabricatedsiliconimplementationsoftheRISC-Vspeci2ications.Basedontheperformanceoftheseimplementationsandthegrowingneedforinteroperabilityamongvendors,itappearsthattheRISC-Vstandardwillincreaseinimportance.

ThisdocumentintroducesandexplainstheRISC-Vstandard,givinganinformaloverviewofthearchitectureofaRISC-Vprocessor.Thisdocumentdoesnotdiscussaparticularimplementation;insteadwediscusssomeofthevariousdesignoptionsthatareincludedintheformalRISC-Vspeci2ication.

ThisdocumentgivesthereaderaninitialintroductiontotheRISC-Vdesign.

DateLastModi=ied:January26,2018 AvailableOnline: www.cs.pdx.edu/~harry/riscv

http://www.cs.pdx.edu/~harry/riscv

TableofContentsListofInstructions 6

Chapter1:Introduction 11Introduction 11TheRISC-VNamingConventions 13TheUsualDisclaimer/RequestForCorrections 15DocumentRevisionHistory/PermissiontoCopy 16BasicTerminology 17

Chapter2:BasicOrganization 20MainMemory 20LittleEndian 20TheRegisters 23ControlandStatusRegisters(CSRs) 25Alignment 26

Chapter3:Instructions 28Instructions 28TheCompressedInstructionExtension 28InstructionEncoding 31User-LevelInstructions 36ArithmeticInstructions(ADD,SUB,…) 38LogicalInstructions(AND,OR,XOR,…) 49ShiftingInstructions(SLL,SRL,SRA,…) 52MiscellaneousInstructions 57BranchandJumpInstructions 65LoadandStoreInstructions 78IntegerMultiplyandDivide 89

Chapter4:FloatingPointInstructions 103FloatingPointing–ReviewofBasicConcepts 103TheFloatingPointExtensions 119FloatingPointLoadandStoreInstructions 124BasicFloatingPointArithmeticInstructions 127FloatingPointConversionInstructions 133

RISC-VArchitectureSummary/Porter Page� of�2 323

TableofContents

FloatingPointMoveInstructions 137FloatingPointCompareandClassifyInstructions 142

Chapter5:RegisterandCallingConventions 146StandardUsageofGeneralPurposeRegisters 146SavingRegistersAcrossCalls 148Registerx0-TheZeroRegister(“zero”) 150Registerx1-TheReturnAddress(“ra”) 150Registerx2-TheStackPointer(“sp”) 151Registerx3-TheGlobalPointer(“gp”) 152Registerx4-TheThreadBasePointer(“tp”) 152Registerx5-x7,x28-x31-TempRegisters(“t0-t6”) 153Registerx8,x9,x18-x27-SavedRegisters(“s0-s11”) 154Registerx8-FramePointer(“fp”) 154Registerx10-x17-ArgumentRegisters(“a0-a7”) 155CallingConventionsforArguments 156FloatingPointRegisters 158

Chapter6:CompressedInstructions 160CompressedInstructions 160InstructionFormats 161LoadInstructions 162StoreInstructions 168Jump,Call,andConditionalBranchInstructions 175LoadConstantintoaRegister 178Arithmetic,Shift,andLogicInstructions 180MiscellaneousInstructions 189InstructionEncoding 191

Chapter7:ConcurrencyandAtomicInstructions 195HardwareThreads(HARTs) 195TheRISC-VMemoryModel 196ConcurrencyControl-ReviewandBackground 198AReviewofCachingConcepts 200RISC-VConcurrencyControl 205TheFENCEInstructions 205Load-Reserved/Store-ConditionalSemantics 210


TableofContents

AtomicMemoryControlBits:“aq”and“rl” 213UsingLRandSCInstructions 217DeadlockandStarvation–Background 219StarvationandLR/SCSequences 220TheAtomicMemoryOperation(AMO)Instructions 221

Chapter8:PrivilegeModes 227PrivilegeModes 227InterfaceTerminology 230Exceptions,Interrupts,TrapsandTrapHandlers 234ControlandStatusRegisters(CSRs) 235CSRListing 237InstructionstoRead/WritetheCSRs 241BasicCSRs:CYCLE,TIME,INSTRET 246CSRRegisterMirroring 249AnOverviewofImportantCSRs 252ExceptionProcessingandInvokingaTrapHandler 257TheStatusRegister 268AdditionalCSRs 280SystemCallandReturnInstructions 284

Chapter9:VirtualMemory 288PhysicalMemoryAttributes(PMAs) 288CachePMAs 291PMAEnforcement 291VirtualMemory 296Sv32(Two-LevelPageTables) 302TheSv32PageTableAlgorithm 309Sv39(Three-LevelPageTables) 311Sv48(Four-LevelPageTables) 313

Chapter10:WhatisNotCovered 316TheDecimalFloatingPointExtension(“L”) 316TheBitManipulationExtension(“B”) 316TheDynamicallyTranslatedLanguagesExtension(“J”) 316TheTransactionalMemoryExtension(“T”) 316ThePackedSIMDExtension(“P”) 317


TableofContents

TheVectorSIMDExtension(“V”) 317PerformanceMonitoring 317Debug/TraceMode 318

AcronymList 319

AbouttheAuthor 323


ListofInstructions

AddImmediate 38AddImmediateWord 39AddImmediateDouble 40Add 41AddWord 42AddDouble 42Subtract 45SubtractWord 46SubtractDouble 46SignExtendWordtoDoubleword 47SignExtendDoublewordtoQuadword 47Negate 48NegateWord 48NegateDoubleword 49AndImmediate 49And 49OrImmediate 50Or 50XorImmediate 50Xor 51Not 51ShiftLeftLogicalImmediate 52ShiftLeftLogical 52ShiftRightLogicalImmediate 53ShiftRightLogical 53ShiftRightArithmeticImmediate 54ShiftRightArithmetic 54ShiftInstructionsforRV64andRV128 56Nop 57Move(RegistertoRegister) 57LoadUpperImmediate 58LoadImmediate 58AddUpperImmediatetoPC 59LoadAddress 60SetIfLessThan(Signed) 61SetLessThanImmediate(Signed) 61SetIfGreaterThan(Signed) 62


ListofInstructions

SetIfLessThan(Unsigned) 62SetLessThanImmediate(Unsigned) 62SetIfGreaterThan(Unsigned) 63SetIfEqualToZero 63SetIfNotEqualToZero 64SetIfLessThanZero(signed) 64SetIfGreaterThanZero(signed) 65BranchifEqual 65BranchifNotEqual 67BranchifLessThan(Signed) 67BranchifLessThanOrEqual(Signed) 67BranchifGreaterThan(Signed) 68BranchifGreaterThanOrEqual(Signed) 68BranchifLessThan(Unsigned) 69BranchifLessThanOrEqual(Unsigned) 69BranchifGreaterThan(Unsigned) 70BranchifGreaterThanOrEqual(Unsigned) 70JumpAndLink(Short-DistanceCALL) 71Jump(Short-Distance) 73JumpAndLinkRegister 73JumpRegister 74Return 74CallFarawaySubroutine 75TailCall(FarawaySubroutine)/Long-DistanceJump 77LoadByte(Signed) 78LoadByte(Unsigned) 78LoadHalfword(Signed) 79LoadHalfword(Unsigned) 79LoadWord(Signed) 80LoadWord(Unsigned) 81LoadDoubleword(Signed) 81LoadDoubleword(Unsigned) 82LoadQuadword 83StoreByte 83StoreHalfword 84StoreWord 85StoreDoubleword 85StoreQuadword 86Multiply 89Multiply–HighBits(Signed) 91

RISC-VArchitectureSummary/Porter Page� of� 7 323

ListofInstructions

Multiply–HighBits(Unsigned) 91Multiply–HighBits(SignedandUnsigned) 92MultiplyWord 94Divide(Signed) 98Divide(Unsigned) 98Remainder(Signed) 99Remainder(Unsigned) 99DivideWord(Signed) 100DivideWord(Unsigned) 100RemainderWord(Signed) 101RemainderWord(Unsigned) 101FloatingLoad(Word) 124FloatingLoad(Double) 125FloatingLoad(Quad) 125FloatingStore(Word) 126FloatingStore(Double) 126FloatingStore(Quad) 127FloatingAdd 127FloatingSubtract 128FloatingMultiply 129FloatingDivide 129FloatingMinimum 130FloatingMaximum 130FloatingSquareRoot 131FloatingFusedMultiply-Add 132FloatingFusedMultiply-Subtract 132FloatingFusedMultiply-Add/Subtract(Quad) 133FloatingPointConversion(Integerto/fromSinglePrecision) 134FloatingPointConversion(Integerto/fromDoublePrecision) 135FloatingPointConversion(Integerto/fromQuadPrecision) 136FloatingPointConversion(Single/Doubleto/fromQuadPrecision) 136FloatingSignInjection(SinglePrecision) 137FloatingSignInjection(DoublePrecision) 138FloatingSignInjection(DoublePrecision) 138FloatingMove 139FloatingNegate 139FloatingAbsoluteValue 140FloatingMoveTo/FromIntegerRegister(SinglePrecision) 140FloatingMoveTo/FromIntegerRegister(DoublePrecision) 141FloatingPointComparison 142


ListofInstructions

FloatingPointClassify 144C.LW-LoadWord 162C.LW-LoadDoubleword 162C.LQ-LoadQuadword 163C.FLW-LoadSingleFloat 164C.FLD-LoadDoubleFloat 164C.LWSP-LoadWordfromStackFrame 165C.LDSP-LoadDoublewordfromStackFrame 166C.LQSP-LoadQuadwordfromStackFrame 166C.FLWSP-LoadSingleFloatfromStackFrame 167C.FLDSP-LoadDoubleFloatfromStackFrame 168C.SW-StoreWord 168C.SD-StoreDoubleword 169C.SQ-StoreQuadword 170C.FSW-StoreSingleFloat 170C.FSD-StoreDoubleFloat 171C.SWSP-StoreWordtoStackFrame 172C.SDSP-StoreDoublewordtoStackFrame 172C.SQSP-StoreQuadwordtoStackFrame 173C.FSWSP-StoreSingleFloattoStackFrame 173C.FSDSP-StoreDoubleFloattoStackFrame 174C.J-Jump(PC-relative) 175C.JAL-JumpandLink/Call(PC-relative) 175C.JR-JumpRegister 176C.JALR-JumpRegisterandLink/Call 176C.BEQZ-BranchifZero(PC-relative) 177C.BNEQZ-BranchifNotZero(PC-relative) 177C.LI-LoadImmediate 178C.LUI-LoadUpperImmediate 179C.ADDI-AddImmediate 180C.ADDIW-AddImmediate(Word) 180C.ADDI16SP-AddImmediatetoStackPointer 181C.ADDI4SPN-AddImmediatetoStackPointer 182C.SLLI-ShiftLeftLogical(Immediate) 182C.SRLI-ShiftRightLogical(Immediate) 183C.SRAI-ShiftRightArithmetic(Immediate) 184C.ANDI-LogicalAND(Immediate) 185C.MV-MoveRegistertoRegister 185C.ADD-AddRegistertoRegister 186C.AND-AddRegistertoRegister 186


ListofInstructions

C.OR-AddRegistertoRegister 187C.XOR-AddRegistertoRegister 187C.SUB-AddRegistertoRegister 188C.ADDW-AddRegistertoRegister 188C.SUBW-AddRegistertoRegister 189C.NOP-NopInstruction 189C.EBREAK-DebuggingBreakInstruction 190C.ILLEGAL-IllegalInstruction 190FENCE 207FENCE.I–(InstructionCacheFlushing) 208LoadReserved(Word) 211LoadReserved(Doubleword) 212StoreConditional(Word) 212StoreConditional(Doubleword) 213AtomicSwap 221AtomicAdd/AND/OR/XOR/Max/Min(Word) 223AtomicAdd/AND/OR/XOR/Max/Min(Doubleword) 224CSRRead/Write 242CSRReadandSetBits 243CSRReadandClearBits 243CSRRead/WriteImmediate 244CSRReadandSetBitsImmediate 245CSRReadandClearBitsImmediate 245Read“CYCLE” 247Read“CYCLEH” 247Read“TIME” 247Read“TIMEH” 248Read“INSTRET” 248Read“INSTRETH” 248EnvironmentCall(SystemCall) 284Break(InvokeDebugger) 285MRET:MachineModeTrapHandlerReturn 285SRET:SupervisorModeTrapHandlerReturn 285URET:UserModeTrapHandlerReturn 286WFI:WaitForInterrupt 286SFENCE.VMA:SupervisorFenceforVirtualMemory 301


Chapter1:Introduction

Introduction

AnInstructionSetArchitecture(ISA)de2ines,describes,andspeci2ieshowaparticularcomputerprocessorcoreworks.TheISAdescribestheregistersanddescribeseachmachine-levelinstruction.TheISAtellsexactlywhateachinstructiondoesandhowitisencodedintobits.

TheISAformstheinterfacebetweenhardwareandsoftware.HardwareengineersdesigndigitalcircuitstoimplementagivenISAspeci2ication.Softwareengineerswritecode(operatingsystems,compilers,etc.)basedonagivenISAspeci2ication.

ThereareanumberofInstructionSetArchitecturesinwidespreaduse,forexample:

x86-64(AMD,Intel) ARM(ARMHoldings) SPARC(Sun/Oracle)

EachoftheseISAsisproprietaryandverycomplex.ThedetailsareoftenobscuredinlengthymanualsandsomedetailsoftheISAarenotmadepublicatall.Furthermore,thewidelyusedISAshavebeenaroundforyearsandtheirdesignscarrybaggageasaresult,e.g.,forbackwardcompatibility.Sincetheselegacydesignswere2irstcreated,we’velearnedmoreabouthowtodesigncomputers.Changesinsiliconhardwaretechnologyhavealsohadanimpactonwhichdesignchoicesarenowoptimal.

TheRISC-VprojectcameoutofUCBerkeleytoaddresssomeoftheseissues.RISC-Vispronounced“RISC-2ive”.

OnegoalwastocreateamodernISAincorporatingthebestcurrentideasinprocessordesign.TheystrovetocreateanISAthatwasmuchsimplerthanthelegacyISAs,butatthesametime,wasalsopracticalandintendedtoaccommodatereallyfasthardwareimplementations.



AnothergoalwastocreateapureReducedInstructionSet(RISC)architecture.Thegoalistobeabletoexecuteoneinstructionperclockcycleandtoachievethis,eachinstructionneedstobesimpleandlimited.

Anothergoalwastocreateanopen-sourceISA.ExistingISAsareproprietary.Theyareowned,managed,andcontrolledbycorporateentitieslikeIntel,SunMicrosystems,andARMHoldings.Theopen-sourceapproachtakenbyRISC-VmeansthatmanydifferentcompaniescanprovidehardwareimplementationsoftheRISC-Varchitecture.CreatinganecosysteminwhichmultiplevendorscancompeteinimplementingasingleISAshouldresultinmanyofthebene2itsseeninotheropen-sourceprojects.

TheRISC-Vdesignisnotasingle,fullyspeci2ied,concreteISA.Instead,RISC-Visasomewhatgeneralizedspeci2icationwhichcanbeinstantiatedor2leshed-outtodescribeanactualsiliconpart.

Thedesignersunderstoodthattherearemanydifferentmarkets,manydifferentapplicationareasforcomputers,manydifferentdesignconstraints,andsoon.Forexample,anembeddedcomputerforadishwasherneedstobecheap,reliable,andsimple,butdoesn’trequirespeed,supportforanoperatingsystem,multiplecores,orsupportfor64-bitoperations.Ontheotherhand,othercomputerswillhavemultiplecores,64-bitoperations,etc.

TheRISC-VprojectapproachesthisplethoraofdesignchoicesbyintroducinganumberofoptionsintotheISA.Inthisrespect,RISC-VisreallyasingleInstructionSetArchitecture;itisacollectionofrelatedISAs.Perhapsitcanbeviewedasamenu,fromwhichaparticularimplementationwillchoosesomeitems,butnotothers.ThedocumentationprovidedbyRISC-Visincomplete;anactualprocessorcorewillpresumablycomewithdocumentationsayingexactlyhowandwhichpartsoftheRISC-Vspeci2icationitimplements.

Forexample,theRISC-Vspeci2icationdoesnotanswerthesequestions:

Howwidearetheregisters?Optionsare:32bits,64bits,and128bits

Howmanyregistersarethere? Optionsare:16or32 Howlongareinstructions? 32bitinstructionsaremandatory;16bitformsareoptional



Ishardwaremultiply/divideincluded? Is9loatingpointsupported?

Optionsare:notsupported,single,ordoubleprecision Issupportforanoperatingsystem(i.e.,SupervisorMode)included? Arepagetablessupportedand,ifso,whichoptionisselected?

Asaresultofallthisparameterization,theRISC-Vspeci2icationisdif2iculttoread.Forexample,thelengthofregistersisgivenasavariableXLEN.Soinsteadofsayingsomethinglike

“…loadsa32bitvalueintobits[31:0]”thedocumentationsays

“…loadsanXLENbitvalueintobits[XLEN-1:0]”.

ThisdocumentisanattempttodescribeRISC-Vusingamoretraditionalapproach.WeproceedbychoosingaparticularISAthatmeetstheRISC-Vspeci2icationsanddescribethat.Tomakethingsmoreconcrete,wewilldescribethe32bitRISC-Vvariant,butmakeadditionalcommentsaboutthe64-bitand128-bitvariants.

TheRISC-VNamingConventions

Aspreviouslydescribed,theRISC-Vspeci2icationisnotasingleISA.Instead,itisacollectionofISAoptions.AnamingconventionisusedinwhichaparticularISAvariationisgivenacodednametellingwhichISAoptionsarepresentandsupported.Aparticularhardware(chip)canbedescribedorsummarizedwithsuchacodedname,indicatingwhichRISC-Vfeaturesareimplementedbythechip.

Forexample,considerthefollowingRISC-Vname:

RV32IMAFD

The“RV”standsfor“RISC-V”andallcodednamesbeginwith“RV”.

The“32”indicatesthatregistersare32bitswide.Otheroptionsare64bitsand128bits:

RV32 32-bitmachines RV64 64-bitmachines RV128 128-bitmachines



Theremaininglettershavethesemeanings:

I–Basicintegerarithmeticissupported M–Multiplyanddividearesupportedinhardware A–Theinstructionsimplementingatomicsynchronizationaresupported F–Singleprecision(32bit)2loatingpointissupported D–Doubleprecision(64bit)2loatingpointissupported

Eachoftheseisconsideredtobean“extension”ofthebaseISA,exceptforthe“I”(basicintegerinstructions),whichisalwaysrequired.

Theletter“G”isusedasanabbreviationfor“IMAFD”:

RV32G=RV32IMAFD

Inaddition,thereareadditionalvariantsandextensions:

S–Supervisormodeisimplemented Q–Quad-precision(128bit)2loatingpointissupported C–Compressed(i.e.,16bit)instructionsaresupported E–Embeddedmicroprocessors,withonly16registers

TheRISC-VdocumentationalsomentionsseveraladditionalISAdesignextensions,butonlysaystheseinstructionswillbespeci2iedatsomefuturedate.Presently,thereisnothingspeci2icforthefollowingextensions:

L–Decimalarithmeticinstructions V–Vectorarithmeticinstructions P–PackedSIMDinstructions B–Bitmanipulationinstructions T–Transactionalmemorysupport

TheapproachwetakeinthisdocumentistodescribeaRISC-Varchitecturewiththefollowingfeatures:

•32registers •Registersare32bitswide •Multiplyanddivideinstructionsarepresent •Singleanddoubleprecision2loatingpointinstructionsarepresent



•Atomicinstructionsarepresent •Supervisormodeissupported •Virtualmemoryissupported

Ratherthanspecifyallvariantssimultaneouslyusinggeneralities,wewillfocusonaspeci2icISAandthencommentonpossiblevariations.

TheRISC-VdocumentationprovidesanISA“standard”whichcanbeadoptedandusedfreelybydifferentgroups.Thestandardleavessomedecisionsopen,givingimplementersseveralchoices.Forexample,theimplementerisfreetochoosethesizeoftheregisters;thestandardmentions32-bits,64-bits,and128-bitsbutdoesnotmandateaparticularchoice.

Inotherareas,thestandardisun2inished.Forexample,mentionismadeofdecimal2loatingpointinstructions,butthesectiondiscussingthemsays“tobe2illedinlater”.

Also,theRISC-Vdocumentationacknowledgesthatsomeimplementerswillchoosetoviolatethestandardorextendthestandard.Theyusetheterminology“non-standardextensions”torefertofeaturesthatmightbepresentinagivenchip,butwhichdonotconformtotheRISC-Vstandard.

Commentary:TheRISC-Vdocumentationcontainsanumberofenlighteningparentheticalremarksdescribingthevariousdesignchoicestheyconsideredandofferingjusti2icationsforthedesigndecisionstheymade.ThislevelofthoughtfulnessisabsentinmostextantISAs.Insomecases,theRISC-VdocumentationseemstoaddressthedesignspaceitselfandoffersalanguageandframeworkforfutureISAdevelopment.

TheUsualDisclaimer/RequestForCorrections

Thisisaderivativework,basedon:

• TheRISC-VInstructionSetManual,VolumeI:User-LevelISADocument(Version2.2,May7,2017)

• TheRISC-VInstructionSetManual,VolumeII:PrivilegedArchitecture(Version1.10,May7,2017)

Consulttheof2icialdocumentation,whichprevails.



TheRISC-Viscomplex.Thisdocumenttakeslibertiesandsimpli2iesthings.Ourgoalistointroducethegeneraldesignandexplainthemainideas.Whilethismaylooklikeamanualorreferencework,itisnot.Tomakethismaterialcomprehensible…

•Somedetailsaresimpli2ied. •Somestatementsarenotstrictlytrueorcomplete. •Somematerialmaysimplybeincorrect.

Thisdocumentuses“???”tomarkincompleteorquestionableinformation.Pleasecontacttheauthorifyou2ind…

•Inaccurateinformationthatyoucancorrect •Incompleteinformationthatyoucan2illin •Confusingtextthatneedstobereworded

DocumentRevisionHistory/PermissiontoCopy

Versionnumbersarenotusedtoidentifyrevisionstothisdocument.Insteadthedateandtheauthor’snameisused.Thedocumenthistoryis:

Date AuthorJanuary26,2018 HarryH.PorterIII<Initialversion>

InthespiritofRISC-Vandtheopen-sourcemovement,theauthorgrantspermissiontofreelycopyand/ormodifythisdocument,withthefollowingrequirement:

Youmustnotalterthissection,excepttoaddtotherevisionhistory.Youmustappendyourdate/nametotherevisionhistory.

Youarealsofreetoadaptthisdocumenttodescribeaparticularimplementation.Ifyouspecializethisdocumenttoaspeci2ichardwaredesign,youshouldchangethetitletore2lectthisnewuse.Anymaterialliftedshouldbereferenced.



BasicTerminology

There has been some confusion in computer science documentation regardingabbreviationsforlargenumbers.Forexample:

4K=? 4,000 4,096

Weusethefollowingpre2ixnotationforlargenumbers,whichisbecomingcommoninthecontextofcomputerarchitecture:

Pre=ix Example Value Ki kibi KiByte 210 1,024 ~103 Mi mebi MiByte 220 1,048,576 ~106 Gi gibi GiByte 230 1,073,741,824 ~109 Ti tebi TiByte 240 1,099,511,627,776 ~1012 Pi pebi PiByte 250 1,125,899,906,842,624 ~1015 Ei exbi EiByte 260 1,152,921,504,606,846,976 ~1018

Contrastthistothestandardmetricpre2ixes,whichweavoid:

Pre=ix Example Value K kilo KByte 103 1,000 M mega MByte 106 1,000,000 G giga GByte 109 1,000,000,000 T tera TByte 1012 1,000,000,000,000 P peta PByte 1015 1,000,000,000,000,000 E exa EByte 1018 1,000,000,000,000,000,000

Inthisdocument,weusetheterms“byte”,“halfword”,“word”,“doubleword”,and“quadword”torefertovarioussizesofbinarydata.

number number of bytes of bits example value (in hex) ======== ======= =================================== byte 1 8 A4 halfword 2 16 C4F9 word 4 32 AB12CD34 doubleword 8 64 01234567 89ABCDEF quadword 16 128 4B6D073A 9A145E40 35D0F241 DE849F03



The bits within an 8-bit byte are numbered from 0 (lower, least signi2icant) to 7(upper,mostsigni2icant).

7654 3210 ==== ==== 0000 0000

Thebitswithina16-bithalfwordarenumberedfrom0to15.

15 12 8 4 0 ==== ==== ==== ==== 0000 0000 0000 0000

Thebitswithina32-bitwordarenumberedfrom0to31.

31 28 24 20 16 12 8 4 0 ==== ==== ==== ==== ==== ==== ==== ==== 0000 0000 0000 0000 0000 0000 0000 0000

Likewise, thebitswithina64-bitdoublewordarenumbered from0 to63and thebitswithina128-bitquadwordarenumberedfrom0to127.

Weusethefollowingnotationtorepresentarangeofbits:

Example Meaning ========= ====================================================== [7:0] Bits 0 through 7; e.g., all bits in a byte [31:0] Bits 0 through 31; e.g., all bits in a word [31:28] Bits 28 through 31; e.g., the upper 4 bits in a word [1:0] Bits 0 through 1; e.g., the least significant 2 bits

Asinglehexdigit canbeused to represent4bits (half abyte, sometimes calleda“nibble”),asfollows:



Binary Hex ====== === 0000 0 0001 1 0010 2 0011 3 0100 4 0101 5 0110 6 0111 7 1000 8 1001 9 1010 A 1011 B 1100 C 1101 D 1110 E 1111 F

The 8 bits within a byte are conveniently expressed with two hex digits. Forexample:

8-bit byte In Hex ========== ======== 1010 0100 A4

The32bitsinawordaregivenwith8hexdigits.Forexample:

32-bit word In Hex ============================================= ======== 1010 1011 0001 0010 1100 1101 0011 0100 AB12CD34


Chapter2:BasicOrganization

MainMemory

Mainmemoryisbyteaddressable.Addressesare4bytes(i.e.,32bits)long,allowingforupto4GiBytestobeaddressed.

TheRISC-Vdocumentationdiscussesseveraldifferentaddresssizeoptions,butwe’llstartsimplewith32bitaddresses.

Memorycanbeviewedasasequenceofbytes:

address data (in hex) (in hex) ======== ======== 00000000 89 00000001 AB 00000002 CD 00000003 EF 00000004 01 00000005 23 00000006 45 00000007 67 ... ... FFFFFFFC E0 FFFFFFFD E1 FFFFFFFE E2 FFFFFFFF E3

“Low”memoryreferstosmallermemoryaddresses,whichwillbeshownhigheronthepagethan“high”memoryaddresses,asintheaboveexample.

LittleEndian

TheRISC-Visalittleendianarchitecture.



Asanexample,assumethatmainmemoryholdsthefollowingbytes:

address data (in hex) (in hex) ======== ======== ... ... E5000004 1A E5000005 2B E5000006 3C E5000007 4D E5000008 5E E5000009 6F E500000A 70 E500000B 81 E500000C 92 E500000D A3 E500000E B4 E500000F C5 ... ...

Considerloadingaregistersthatis32bits(4bytes)wide.ThereareseveralLOADinstructions,whichcanmoveeitherabyte,ahalfword,orawordfrommemoryintoaregister.

AssumethataLOADinstructionloadsabytefromaddress0xE5000004.Theregisterwillthencontain:

0x0000001A

AssumethataLOADinstructionloadsahalfword(2bytes)fromthesameaddress.Theregisterwillthencontain:

0x00002B1A

AssumethataLOADinstructionloadsaword(4bytes)fromthesameaddress.Theregisterwillthencontain:

0x4D3C2B1A

Notethatwecanviewthememoryasholdingasequenceofproperlyalignedhalfwords,whichwecannaturallyrepresentasfollows:



address data (in hex) (in hex) ======== ======== ... ... E5000004 2B1A E5000006 4D3C E5000008 6F5E E500000A 8170 E500000C A392 E500000E C5B4 ... ...

Orwecanviewthismemoryasholdingasequenceofproperlyalignedwords,whichlookslikethis:

address data (in hex) (in hex) ======== ======== ... ... E5000004 4D3C2B1A E5000008 81706F5E E500000C C5B4A392 ... ...

Commentary:Inalittleendianarchitecture,theorderofthebytesischangedwheneverdataiscopiedfrommemorytoaregisterorstoredfromaregisterintomemory.Thiscanbeasourceofconfusion,particularlywhenhumanslookataprintoutofmemorycontents.

Bigendianarchitecturesaresimplertounderstandsincethebytesarenotreorderedduringloadsandstores.

Noticethatwithalittleendianarchitecture,the2irstbyteinmemory(0x1Aintheexample)alwaysgoesintothesamebitsintheregister,regardlessofwhethertheinstructionismovingabyte,halfword,orword.Thiscanresultinasimpli2icationofthecircuitry.

Thespecmentionsthattheyselectedlittleendiansinceitiscommerciallydominant.Theyalsomentionthepossibilityof“bi-endian”architectures,whichmixbig-andlittle-endianness.



TheRegisters

Forconcreteness,thisdocumentfocusesprimarilyonRV32,the32-bitvariantofRISC-V.Theboxbelowdiscusses“OtherRegisterSizes”.

Thegeneralpurposeregistersare32bits(i.e.,4bytesoroneword)inwidth.

Thereare32registers.Theregistersarenamedx0,x1,x2,…x31.

OtherRegisterSizes:WemainlyfocusondescribingtheRV32variantofRISC-V,inwhichtheregistersare32bitsinwidth.

IntheRV64variant,theregistersare64bits(i.e.,8bytesoradoubleword)insize.

IntheRV128variant,theregistersare128bits(i.e.,16bytesoraquadword)insize.

Inallcases,thenumberofregistersis32.

If2loatingpointissupported,therewillbeadditional=loatingpointregistersnamedf0,f1,f2,…f31.Theseregistersaredistinctfromx0,x1,x2,…x31.Floatingpointregisterswillbediscussedlater.

Registerx0isaspecial“zeroregister”.Whenread,itsvalueisalways0x00000000.Wheneverthereisanattempttowritetox0,thedataissimplydiscarded.

AllotherregistersaretreatedidenticallybytheISA;thereisnothingspecialaboutanyregister.

Thereisnospecialsupportfora“stackpointer,”a“framepointer,”ora“linkregister”.Althoughtheseareimportantconceptsandareusedintheimplementationofprogramminglanguages,theISAdoesnothaveanyspecializedinstructionsforthem.Anyregistercanbeusedforthesefunctions.



ReducedNumberofGeneralPurposeRegisters:IntheRISC-V“E”extension,thenumberofregistersisreducedto16.

Theregistersarenamedx0,x1,x2,…x15.

Thisextensionisonlyapplicableto32-bitmachines.Inotherwords,thebaseISAwilleitherbeRV32IorRV32E,withtheonlydifferencebeingthenumberofgeneralpurposeregisters.For64-bitand128-bitmachines,therewillalwaysbeafullsizedregistersetwith32registers,so“E”doesnotapplytoRV64orRV128.

Allinstructionformatsarethesame.Inotherwords,thereisnochangetotheinstructionencodingsbetweenthenormalRV32IandthereducedRV32Evariants.Theonlydifferenceisthatthe5-bit2ieldsforencodingaregisternumberarelimitedtocontainingonlythevaluesof0through15(inbinary:00000through01111).

Thisextensionismeantforsimpli2ied,embedded(“E”)computers.Thespecsuggeststhatthecircuitryforafullsizedregistersetcanconsume50%ofthechiprealestate,notcountingthespaceusedforcache.Thus,dividingthenumberofregistersinhalfwillresultina25%savingsinrealestate.

CompressedInstructions:IntheRISC-V“C”extensionforcompressedinstructions,8registers(x8,x9,…x15)areeasilyaccessible.Toaccesstheotherregisters,theprogrammermayhavetouseanormal,uncompressedinstruction.Therefore,compilersareencouraged(butnotrequired)touseonlyregistersx8,x9,…x15.

Inthisextension,thefactthatcertainregisterswillbeusedasstackpointer,linkregister,etc.ismadeuseof,sosomeregistersaretreatedslightlydifferentlyhere.

Wediscusscompressedinstructionslater.

Inadditiontothegeneralpurposeregisters,thereisalsoaprogramcounter(PC),whosewidthmatchesthewidthofthegeneralpurposeregisters.

Inthevariantwefocuson,thePCregisteris32bitswide.IntheRV64andRV128variants,thesizeofthePCincreasesandmatchesthesizeofthegeneralpurposeregisters.



ManyprocessorISAsincludea“statusregister”,sometimescalleda“conditioncoderegister.”Sucharegisterusuallycontainsbitssuchas:

•Sign/NegativeValue •Zero/Equal •CarryBit •Over2low

InotherISAs,thereisusuallyaCOMPAREinstruction(whichwillsetbitsinthestatusregister)andseveralBRANCHinstructions(whichwilltestthestatusregisterbitsandconditionallyjump).

TheRISC-Vdoesnotincludeanysuch“statusregister.”Instead,theBRANCHinstructionsinRISC-Vwillperformboththetestandtheconditionaljump.

Commentary:Thenormalpatternofmostcodeinothernon-RISC-VarchitecturesistoexecuteaCOMPAREinstructionand,immediatelyafterward,executeaBRANCHinstruction.Theygotogetherandeffectivelyperformasingle“test-and-jump”operation.BycombiningthemintoasingleinstructionintheRISC-Varchitecture,greaterperformanceef2iciencycanbeachievedwheneverthis“test-and-jump”operationmustbeperformed.

Furthermore,byeliminatingthe“statusregister”,theprocessingofinterruptsisstreamlined.Here’swhy:Thecodeinanyinterrupthandlerroutinewillcertainlymodifythestatusregister,sothereforethestatusregistermustbeautomaticallysavedwheneveraninterrupthandlerisinvokedandrestoredwheneverthehandlerreturns.Additionalarchitecturalcomplexitywouldberequiredtosupporttheseoperations(whichmustbeatomic),butthiscomplexityisavoidedwiththeRISC-Vapproach.

ControlandStatusRegisters(CSRs)

TheentirestateofarunningRISC-Vcoreconsistsof:

•Theintegerregistersx1,x2,…x31 •The2loatingpointregisters,if2loatingpointissupported. •TheProgramCounter(PC) •Asetof“ControlandStatusRegisters”(CSRs)



The“ControlandStatusRegisters”(CSRs)areusedfortheprotectionandprivilegesystem.TheprivilegesystemisusedbytheOSkerneltoprotectitselfandmanageuser-levelprocesses.

Atanymoment,theRISC-Vprocessorwillbeexecutingeitherinuser-modeorinsupervisor-mode.Kernelcodeisexecutedinsupervisor-modeandapplicationprogramsareexecutedinuser-mode.[Actuallytherearetwonon-usermodes;we’llgointodetailslater.]

Each“ControlandStatusRegister”(CSR)hasaspecialnameandeachhasauniquefunction.Readingand/orwritingaCSRwillhaveaneffectontheprocessoroperation.ReadingandwritingtotheCSRsisusedtoperformalloperationsthatcannotbeperformedwithnormalinstructions.ThebehaviorassociatedwitheachCSRisoftenquitecomplexandeachregistermustbeunderstoodindependently.

Thereisalarge2ileofCSRs:therecanbeupto4,096CSRsde2ined.TheCSRsarereadandwrittenwithjustacoupleofgeneral-purposeinstructions.

AfewdozenCSRsarede2inedinthespec,givingtheirnames,behaviors,andeffects.ThespecleavesopenthepossibilityofmoreCSRsbeingde2inedanditisclearthattherewillbesigni2icantvariationbetweenimplementationsinthedetailsofwhichCSRsareimplementedandexactlyhoweachoneworks.

Fortunately,theCSRscanbeignoredat2irst.TheCSRsareprimarilyusedbyOScode.Forexample,theCSRsareusedforinterruptprocessing,threadswitching,andpagetablemanipulation.

Inorderunderstandtheuser-modeinstructionsetandtocreateuser-levelcode,theCSRscanandshouldbeignored,especiallyonyour2irstintroductiontoRISC-V.

Alignment

A“halfwordaligned”addressisanaddressthatisamultipleof2.Thelastbitofahalfword-alignedaddresswillalwaysbe0.Likewise,a“wordaligned”addressisamultipleof4,andendswiththebits00.Similarlyfor“doublewordalignment”and“quadwordalignment”.



Ahalfword-sizedvalueissaidtobe“properlyaligned”ifitisstoredatahalfword-alignedaddress.Aword-sizedvalueisproperlyalignedifitisstoredataword-alignedaddress,andsimilarlyforothersizes.

RISC-VdoesnotrequiredatatobeproperlyalignedfortheLOADandSTOREinstructions.Inotherwords,anyvaluemaybestoredatanyaddress.However,properalignmentisgoodpracticeandencouraged.Inmostimplementations,LOADandSTOREinstructionswillperformmuchfasterwhenthedataisproperlyaligned.

However,thereisanalignmentrequirementforinstructions.

Instructionsare32bitsinlengthandmustbestoredatword-alignedlocations.Anattempttojumptoanunalignedaddressedwillcausean“instructionmisaligned”exception.

Commentary:IntheRISC-V“C”extensionforcompressedinstructions,halfword-sizedinstructionsareallowed.Inthiscase,thealignmentrequirementisslightlyrelaxed:Instructions(whether16or32bits),mustonlybehalfwordaligned,notwordaligned.

Sinceallinstructionsmustbeatleasthalfwordaligned,allbranchtargetaddressandtheprogramcounter(PC)valuesmustbeevennumbers,i.e.,mustendwithasingle0bit.Someinstructionsmakeuseofthefactthattheleastsigni2icantbitisalways0,allowingamoreef2icientencodingofinstructionsbyleavingthis2inalbitimplicit.


Chapter3:Instructions

Instructions

Instructionsare32bits(4bytes,1word)inlengthandmustbestoredatword-alignedmemorylocations.InstructionsfortheRV64andRV128variantsarestill32bitslong.

Somecomputerarchitecturesaresaidtoemploy“two-address”instructions.Forexample,thefollowinginstructionmentionsonlytworegisters.(ThisisnotRISC-V.)

ADD x4,x5 # x4 = x4 + x5

RISC-Visa“threeaddress”architecture.TheADDinstructionlookslikethis:

ADD x4,x5,x7 # x4 = x5 + x7

NotethatintheRISC-Vassemblylanguage,thedestinationistypicallytheleftmostoperand.

TheCompressedInstructionExtension

Thischapterdescribesthe32-bitinstructions.Howeverinthissection,weintroducethe16-bitcompressedinstructions.Wewillcoverthecompressedinstructionsfullyinalaterchapter.

IntheRISC-V“C”extensionforcompressedinstructions,halfword-sizedinstructionsarealsoallowed.Ifthisextensionisimplemented,both16-bitand32-bitinstructionscanbeused.



The16-bitand32-bitinstructionscanbeintermixed.Thereisno“mode”bittoputtheprocessorinto“compressed-instruction”mode,asthereisinotherprocessors.

Withthisextension,thealignmentrestrictionforinstructionsisrelaxedtohalfword-alignment.

Eachcompressed16-bitinstructionisexactlyequivalentinfunctiontoa32-bitinstruction.However,therearemany32-bitinstructionsforwhichthereisnoequivalent16-bitversion.

Thus,eachcompressed16-bitinstructioncanbethoughtofasashorthandforsomelonger32-bitinstruction.Theideaisthatthemostfrequentlyusedinstructionshave16-bitversions.Byusingtheshorterversions,thesizeofprogramcodecanbereduced.TheRISC-Vdesignershavedoneresearchtodeterminewhichinstructionsarethebestcandidatestobegivencompressedvariants.

Commentary:Reducingthesizeofcoderesultsinincreasedprocessorperformancesinceitallowsmoreinstructionstobecached,reducingthetimetofetchinstructionsfrommainmemory,whichisoftenaperformancebottleneck.

Inatypicalhardwareimplementation,whenacompressedinstructionisfetchedandloadedintotheInstructionRegister(IR)priortobeingexecuted,thehardwarewillnoticethatitisacompressedinstruction.Atthattime,thecompressedinstructionwillimmediatelybeexpandedfrom16bitsintotheequivalent32bitinstruction.Thereafter,thereisnoneedforanyadditionalhardwarelogictosupportthecompressedinstructionset.

Toaddress32registers,5bitsarerequired.Consideraninstructionwhichrefersto3registers,suchas:

ADD x8,x9,x10

Ifthisinstructionisencodedusing5bitsforeachregister,15bitswouldbeconsumed.Withonly16bitstoworkwithincompressedinstructions,thiswillclearlynotwork,sinceonly1bitisleftforencodingtheopcode.

Instead,manyofthecompressedinstructionsrestrictaccesstoonly8oftheregisters.Withthisrestriction,registerscanbeencodedusingonly3bits.



Manyofthecompressedinstructionsonlyallowaccesstoregistersx8,x9,…x15.However,certainotherregistersare(byconventionandhabit)usedforspecialpurposes,suchasfora“stackpointer”(x2)or“linkregister/returnaddress”(x1),andsomecompressedinstructionsimplicitlyrefertotheseregisters.

Somecompressedinstructionsallowanyofthe32registerstobespeci2ied,employinga5-bit2ieldtoencodetheregister.Obviously,theseinstructionsdon’thaveroomforthreesuchregisteroperands.

Ingeneral,RISC-Vusesthree-addressinstructions.However,somecompressedinstructionsusethetwo-addressapproach.Forexample,theinstruction

ADD x8,x8,x10 # Add x10 to x8 canberepresentedincompressedformsincethedestinationregisterisalsoanoperand.

Moreprecisely,acompressedinstructionoftheform:

C.ADDW RegD,RegB # RegD = RegD + RegB

canbeexpandedto

ADDW RegD,RegD,RegB # Equivalent 32-bit inst

Inassemblylanguage,compressedinstructionsareidenti2iedwiththe“C.”pre2ix.

Theinstructionopcodehereendswitha“W”suf2ix,indicatingthatitoperatesona32-bitword,asopposedto64-bitor128-bitvalues.

Inatypicalimplementation,acompressedinstructionwillbeexpandedtotheequivalent32-bitinstructionbytheFETCHandDECODEhardware,afteraninstructionisfetchedfrommemory,directlybeforeitisexecuted.

However,theexpansiontoafull-sized32-bitinstructioncouldbeperformedbytheassembler.Thiswouldbeusefulwhenassemblingforamachinethatdoesnotsupportthecompressedextension.

AssemblerNote:Asophisticatedassemblerwillautomaticallygeneratecompressedinstructionswheneveritcan.Theideaisthattheprogrammer(or



compiler)willcreateonly32-bitinstructions.Uponencounteringa32-bitinstructionthatcanalsobecodedasa16-bitinstruction,theassemblerwillchoosethesmallerinstruction.Suchanassemblerwillrelieveprogrammers(andcompilers)fromtheburdenofselectingcompressedinstructions,althoughasophisticatedcompilermaybeabletogenerateshortercodesequencesifitisawareofwhichinstructionscanbecompressed.

InstructionEncoding

Eachinstructionis32-bitsinlength.

Inotherwords,theISAisbuiltarounda32-bitinstructionsize.Eachcompressed16-bitinstructioncanalsoberepresentedasaequivalent32-bitinstruction,sothesetof32-bitinstructionfullydescribestheavailableinstructions.Theleastsigni2icant2bitsalwaysindicatewhethertheinstructionis16or32bits,sotheprocessorcanimmediatelytellwhetherornotithasjustloadedacompressedinstruction.

Details:Abitpatternendingin11indicatesa32-bitinstruction.Theotherbitpatterns(00,01,10)indicateacompressedinstruction.Thisapproachreducestheavailablenumberofunique16-bitinstructionsby¼,andreducesthenumberofeffectivebitsavailableforthe32-bitinstructionsto30bits:areasonablecompromise.

Thedistinguishingbitsareintheleastsigni2icantpositions,whichisappropriateforalittleendianarchitecture.

Thereisalsoaprovisionforinstructionsthatarelongerthan32bits,butnosuchinstructionsarespeci2ied.Suchinstructionswouldbeanon-standardextension.

Longerinstructionsmustbeamultipleof16bitsinlength.Anencodingschemeisspeci2ied,whichshowshowinstructionsoflength48bits,64bits,etc.aretohavetheirlengthencoded.Beyondthelengthencoding,furtherinstructionsdetailsareleftunspeci2iedanduptotheimplementersofnon-standarddevices.



Forinstructionsoflength32bits,theleastsigni2icant2bitsmustbe11.Inaddition,thenext3bitsmustnotbe111,sincethisindicatesaninstructionoflengthgreaterthan32-bits.

Compressed(16-bit)Instructions:xxxx xxxx xxxx xxAA (whereAA≠11)

Normal(32-bit)Instructions:xxxx xxxx xxxx xxxx xxxx xxxx xxxB BB11 (whereBBB≠111)

LongerInstructions:xxxx xxxx ... xxxx xxxx xxxx xxxx xxx1 1111

Consulttheof2icialdocumentationfordetailsforlongerinstructions.

Inthissection,wewilldiscussonly32-bitinstructions.

Sincethereare32registers,a2ieldwithawidthof5bitsisusedtoencodeeachregisteroperandwithininstructions.

Theprototypicalinstructionmentions3registers,whicharesymbolicallycalled

RegD–Thedestination Reg1–The2irstoperand Reg2–Thesecondoperand

Inaddition,anumberofinstructionscontainimmediatedata.Theimmediatedatavalueisalwayssign-extendedtoyielda32-bitvalue.

Thefollowingsizesofimmediatedatavaluesareused:

Notation Fieldwidth NumberofValues Rangeofvalues Immed-12 12bits 212=4Ki -2,048..+2,047 Immed-20 20bits 220=1Mi -524,288..+524,287

Commentary:Loadinganarbitrary32bitvalueintoaregisterusinginstructionswhicharethemselvesonly32-bitsnecessarilyrequirestwoinstructions.

Notethat12+20equals32.InRISC-V,anarbitrary32-bitvaluecanbesplitintotwopiecesandloadedintoaregisterwithtwoinstructions.Thedesignershavechosenthe12-20splitforanumberofreasons.Manycommonconstantsand



offsetsfallwithintherangeofthesmallerimmed-12values.Also,theRISC-Vvirtualmemorysupportsapagesizeof4KiBytes.

Herearetheinstructionformats:

R-typeinstructions: Operands: RegD,Reg1,Reg2 Example:

ADD x4,x6,x8 # x4 = x6+x8 I-typeinstructions: Operands: RegD,Reg1,Immed-12 Examples:

ADDI x4,x6,123 # x4 = x6+123LW x4,8(x6) # x4 = Mem[8+x6]

S-typeinstructions: Operands: Reg1,Reg2,Immed-12 Example:

SW x4,8(x6) # Mem[8+r6] = x4 (word) B-typeinstructions(avariantofS-type): Operands: Reg1,Reg2,Immed-12 Example:

blt x4,x6,loop # if x4<x6, goto offset(pc) U-typeinstructions: Operands: RegD,Immed-20 Example:

LUI x4,0x12AB7 # x4 = value<<12AUIPC x4,0x12AB7 # x4 = (value<<12) + pc

J-typeinstructions(avariantofU-type): Operands: RegD,Immed-20 Example:

jal x4,foo # call: pc=offset+pc; x4=ret addr

TheonlydifferencebetweenS-typeandB-typeinstructionsishowthe12-bitimmediatevalueishandled.InanB-typeinstruction,theimmediatevalueis



multipliedby2(i.e.,shiftedleft1bit)beforebeingused.IntheS-typeinstruction,thevalueisnotshifted.

Inbothcases,sign-extensionoccurs.Inparticular,thebitstotheleftaresynthesizedby2illingtheminwithacopyofthemostsigni2icantbitactuallypresent.

S-typeimmediatevalues: Actualvalueused(wheres=sign-extension):

ssss ssss ssss ssss ssss VVVV VVVV VVVV Rangeofvalues:

-2,048..+2,0470xFFFFF800..0x000007FF

B-typeimmediatevalues: Actualvalueused(wheres=sign-extension):

ssss ssss ssss ssss sssV VVVV VVVV VVV0 Rangeofvalues:

-4,096..+4,094(inmultiplesof2)0xFFFFF000..0x00000FFE

TheonlydifferencebetweenU-typeandJ-typeinstructionsishowthe20-bitimmediatevalueishandled.InaU-typeinstruction,theimmediatevalueisshiftedleftby12bitstogivea32bitvalue.Inotherwords,theimmediatevalueisplacedintheuppermost20bits,andthelower12bitsarezero-2illed.

InaJ-typeinstruction,theimmediatevalueisshiftedleftby1bit(i.e.,multipliedby2).Itisalsosign-extended.

U-typeimmediatevalues: Actualvalueused:

VVVV VVVV VVVV VVVV VVVV 0000 0000 0000 Thevalueisalwaysalignedtoamultipleof4,096. J-typeimmediatevalues: Actualvalueused(wheres=sign-extension):

ssss ssss sssV VVVV VVVV VVVV VVVV VVV0 Rangeofvalues:

-1,048,576..+1,048,574(inmultiplesof2)0xFFF00000..0x000FFFFE

Next,wegivetheencodingsforthedifferenttypesofinstructions.Inthefollowing,eachletterrepresentsasinglebit,accordingtothefollowinglegend:



DDDDD=RegD 11111=Reg1 22222=Reg2 VVVVV=Immediatevalue XXXXX=Op-code/functioncode

R-typeinstructions: Operands: RegD,Reg1,Reg2 Encoding:

XXXX XXX2 2222 1111 1XXX DDDD DXXX XXXX I-typeinstructions: Operands: RegD,Reg1,Immed-12 Encoding:

VVVV VVVV VVVV 1111 1XXX DDDD DXXX XXXX S-typeandB-typeinstructions: Operands: Reg1,Reg2,Immed-12 Encoding:

VVVV VVV2 2222 1111 1XXX VVVV VXXX XXXX U-typeandJ-typeinstructions: Operands: RegD,Immed-20 Encoding:

VVVV VVVV VVVV VVVV VVVV DDDD DXXX XXXX

Commentary:NotethatReg1,Reg2,andRegDoccurinthesameplaceinallinstructionformats.

Thissimpli2iesthechipcircuitry,whichwouldbemorecomplexif,forexample,RegDwassometimesinonepartoftheinstructionandothertimesinadifferentplaceintheinstruction.

Aconsequenceofkeepingtheregister2ieldsinthatsameplaceisthatimmediatedatavaluesinaninstructionaresometimesnotalwaysincontiguousbits.AndnotethatthebitsencodingtheimmediatevaluesintheS-typeandB-typeinstructionsarenotcontiguous.



I-type: VVVV VVVV VVVV ---- ---- ---- ---- ——

S-typeandB-type: VVVV VVV- ---- ---- ---- VVVV V--- ——

U-typeandJ-type: VVVV VVVV VVVV VVVV VVVV ---- ---- ——

WhilebothS-typeandB-typehavea12-bitimmediatevalue,thepreciseorderofthebitsinthose2ieldsdiffersbetweenthetwoformats.Whileyouwouldreasonablyassumethebitsareinorder,theyareinfactscrambledupalittleintheB-typeinstructionformat.Similarly,thebitsofthe20-bitimmediatevalue2ieldintheJ-typeformatsarescrambled,ascomparedtoU-type.Consultthespecfordetails.

Immediatevaluesarealwayssign-extended.Althoughtheimmediatevaluecanbedifferentsizesandmaybebrokenintomultiple2ields,thesignbitisalwaysinthesameinstructionbit(namelytheleftmostbit,bit31).Thissimpli2iesandspeedssign-extensioncircuitry.

User-LevelInstructions

TheRISC-Vspeci2icationbreakstheinstructionsintotwobroadcategories:“user-level”instructionsand“privileged”instructions.Webeginbydescribingtheindividualuser-levelinstructions.We’llcovertheprivilegedinstructionslater.

Theinstructionsetisquitesmallandtightlyde2ined;therearenotasmanyinstructionsintheRISC-Varchitectureasinotherarchitectures.

Severalfamiliarinstructions(whichyouwould2indinotherarchitectures)aresimplyspecialcasesofmoregeneralRISC-Vinstructions.

Theof2icialRISC-Vdocumentationfocusesontheactualinstructionsandmentionsspecialcasesonlybrie2ly,inconnectionwiththemoregeneralinstruction.Wewillseparateoutsomeofthesespecialcasesandpresentthemasusefulinstructionsintheirownright,mentioningthattheyareactuallyimplementedasspecialcasesof



otherinstructions.Also,someotheruseful“instructions”areactuallyexpandedbytheassemblerintoasequenceoftwosimplerinstructions.

TheRISC-Vstandarddescribethreedifferentmachinesizes:

RV32 32-bitregisters RV64 64-bitregisters RV128 128-bitregisters

AnyparticularRISC-Vhardwarechipwillimplementonlyoneoftheabovestandards.However,eachsizeisstrictlymorepowerfulthanthesmallersizes.AnRV64chipwillincludealltheinstructionsnecessarytomanipulate32-bitdata,soitcaneasilydoanythinganRV32chipcan.Likewise,anRV128chipisastrictsupersetoftheRV32andRV64chips.

Youshouldreadtheinstructiondescriptionsbelowassumingthattheregisterwidthis32bits.However,thesamedescriptionsapplyandmakesensefor64-bitor128-bitregisters,exceptwherespeci2icallynoted.

Regarding64-bitand128-bitextensions:Forthelargermachinesizes,thebasic32-bitinstructionsworkidentically.

Insummary,smallerdatavaluesaresign-extendedto2itintolargerregisters.Thus,32-bitvaluesaresign-extendedto64bitsonanRV64machine.Thismeansyoucanrun32-bitcodedirectlyona64-bitmachinewithnochanges!

Ofcourse,someinstructionsfromaRV64machinewillnotbepresentonaRV32machine,socodethattrulyrequires64-bitswon’twork.

Whenusinga64-bitmachinetoexecutecodewrittenfora32-bitmachine,keepinmindthattheupper32bitsofregisterswillnormallycontainthesignextension,andnotazeroextension.

Likewise,32-bitand64-bitvaluesaresign-extendedto128-bitsonanRV128machine.

The12-bitand20-bitimmediatevaluesininstructionsaresign-extendedtothebasicmachinesize.ThisincludestheimmediatevaluesinthetwoU-typeinstructions(LUIandAUIPC).



Forinstructionsthatarenotpresentonallmachinesizes,wemakespecialnotes.Otherwise,theinstructionispresentinallvariants.

Theof2icialdocumentationdoesnotdwellontheassemblernotation.Theassemblernotationpresentedhereissometimesa“bestguess”ofwhatisacceptedbythetypicalRISC-Vassemblertools.

ArithmeticInstructions(ADD,SUB,…)

AddImmediate

GeneralForm:ADDI RegD,Reg1,Immed-12

Example:ADDI x4,x9,123 # x4 = x9 + 0x0000007B

Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)isaddedtothecontentsofReg1andtheresultisplacedinRegD.

Comments:Thereisno“subtractimmediate”instructionbecausesubtractionisequivalenttoaddinganegativevalue.

Encoding: ThisisanI-typeinstruction.

Commentary:Thereisnodistinctionbetweensignedandunsignedaddition;thesamehardwareyieldscorrectresultregardlessofwhetherbothvaluesareconsideredtorepresentsignedvaluesorbothareconsideredtorepresentunsignedvalues.

Theonlydistinctionbetweensignedandunsignedadditionisinhowover2lowistobedetected.Thesameistrueofsubtraction.ThisisnotuniquetoRISC-V;itistrueofallcomputers.

Over2lowisignoredinRISC-V.Whenover2lowoccurs,noexceptionsareraisedandnostatusbitsareset.Thissatis2iestheneedsofthe“C”language,butmaybeproblematicforlanguagesthatmandateover2lowdetection.



Thus,RISC-Vhasonlyonesortofaddition;RISC-Vdoesnotdistinguishbetweensignedandunsignedaddition.Likewise,thereisnodistinctionbetweensignedandunsignedsubtraction.

Notethatover2lowdetectioncanbeprogrammedfairlysimply.Forexample,thefollowingcodesequencewilladdtwounsignedvaluesandbranchonover2low: ADDI RegD,Reg1,Immed-12

BLTU RegD,Reg1,overflow ifRegD<UReg1thenbranch

Thefollowingcodesequencewilladdtwosignedvaluesandbranchonover2low.However,itwillonlyfunctioncorrectlyaslongasweknowthattheimmediatevalueispositive. ADDI RegD,Reg1,Immed-12

BLT RegD,Reg1,overflow ifRegD<SReg1thenbranch

Inthegeneralcaseofsignedaddition,youcanusethefollowingcodesequence,whichwillrequireacoupleofadditionalregisters: ADD RegD,Reg1,Reg2

SLTI Reg3,Reg2,0 Reg3=(Reg2<S0)?1:0 SLT Reg4,RegD,Reg1 Reg4=(RegD<SReg1)?1:0 BNE Reg3,Reg4,overflow ifReg3≠Reg4thenbranch

AddImmediateWord

GeneralForm:ADDIW RegD,Reg1,Immed-12

Example:ADDIW x4,x9,123 # x4 = x9 + 0x0000007B

Description:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic.

Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)isaddedtothecontentsofReg1.Theresultisthentruncatedto32-bits,signed-extendedto64or128bitsandplacedinRegD.




RV32/RV64/RV128:TheADDIinstructionispresentinallmachinevariations.TheADDIinstructionisusedtoperform32-bitadditionona32-bitmachine,64-bitadditionona64-bitmachine,and128-bitadditionona128-bitmachine.

Toperform32-bitadditionona64-bitor128-bitmachine,theADDIWinstructionisused.

Toperform64-bitadditionona128-bitmachine,theADDIDinstructionisused.

Whenworkingwith32-bitdataona64-bitmachine,theupperhalfoftheregisterwillcontainnothingbutthesignextensionofthelowerhalf.Thisistrueregardlessofwhetherthedatainthelower32bitsistobeinterpretedasasignedinteger,anunsignedinteger,orsomethingotherthananinteger.

TounderstandhowtheADDIinstructiondiffersfromADDIWona64-bitmachine,considerthisexample:

0x 0000 0000 0000 0001 + 0x 0000 0000 7FFF FFFF -------------------------- 0x 0000 0000 8000 0000 Result of ADDI

0x 0000 0000 0000 0001 + 0x 0000 0000 7FFF FFFF -------------------------- 0x FFFF FFFF 8000 0000 Result of ADDIW

AddImmediateDouble

GeneralForm:ADDID RegD,Reg1,Immed-12

Example:ADDID x4,x9,123 # x4 = x9 + 0x0000007B

Description:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.



Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)isaddedtothecontentsofReg1.Theresultisthentruncatedto64-bits,signed-extendedto128bitsandplacedinRegD.


SignExtension:Considertheproblemofsign-extendinga32-bitvaluetoa64-bitvalueonaRV64machine.Forexample,the64-bitvalue:

0x 3B4C 204E 92A7 5321

shouldbe“coerced”to2itwithintherangeofa32-bitsignedinteger,resultingin:

0x FFFF FFFF 92A7 5321

TheADDIWinstructioncandothis.Simplyadding0tothevaluewillproducethedesiredresult.

Likewise,theADDIDinstructioncanbeusedtocoercea128-bitvalueintoa64-bitvalue.

Add

GeneralForm:ADD RegD,Reg1,Reg2

Example:ADD x4,x9,x13 # x4 = x9+x13

Description:ThecontentsofReg1isaddedtothecontentsofReg2andtheresultisplacedinRegD.

Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowisignored.

Encoding: ThisisanR-typeinstruction.

RV32/RV64/RV128:TheADDinstructionispresentinallmachinevariations.TheADDinstructionisusedtoperform32-bitadditionona32-bitmachine,64-bitadditionona64-bitmachine,and128-bitadditionona128-bitmachine.



Toperform32-bitadditionona64-bitor128-bitmachine,theADDWinstructionisused.

Toperform64-bitadditionona128-bitmachine,theADDDinstructionisused.

AddWord

GeneralForm:ADDW RegD,Reg1,Reg2

Example:ADDW x4,x9,x13 # x4 = x9+x13


RV32/RV64/RV128:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic.

Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowbeyond32-bitsisignored.The32-bitresultissign-extendedto2illtheupperbitsofthedestinationregister.


AddDouble

GeneralForm:ADDD RegD,Reg1,Reg2

Example:ADDD x4,x9,x13 # x4 = x9+x13


RV32/RV64/RV128:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.

Comments:



Thereisnodistinctionbetweensignedandunsigned.Over2lowbeyond64-bitsisignored.The64-bitresultissign-extendedto2illtheupperbitsofthedestinationregister.


Commentary:TheRISC-Vauthorsdecidedtode2inemanyinstructionsgenerally,withoutspecifyinghowmanybitstheyoperateon.Instead,thenumberofbitsoperatedonbytheinstructionisdeterminedbytheregistersizeofthemachine.

Forexample,theADDinstructionperforms32-bitadditiononaRV32machine,64-bitadditiononanRV64machine,and128-bitadditiononanRV128machine.

Toperform32-bitadditionona64-bitmachine,anewinstructionisincluded,sincetheupper32bitsoftheoperandsmustbeignoredandtheresultmusthavetheupper32bits2illedwiththepropersign-extension.

Soforthe64-bitmachine(RV64),theyaddedasecondinstruction(ADDW)toadd32-bitquantities.ForRV128,theyaddedathirdinstruction(ADDD)toadd64-bitquantities.

Thefollowingtableshowshowmanybitseachinstructionoperatesonandhighlightsaslightlackoforthogonality:

SizeofMachine 32-bit 64-bit 128-bit BasicInstructionSet: ADD 32 64 128 RV64adds: ADDW 32 32 RV128adds: ADDD 64

Analternateapproachisthefollowing:(ThisisnotRISC-V.)

SpecifythatthebasicADDinstructionwillperform32-bitadditionregardlessofregistersize;for64-bitand128-bitmachines,thisinstructionwillignoretheupperbitsoftheoperandsandsign-extendtheresult.Then,forRV64andRV128machines,anewinstruction(notpresentonRV32machines)isincludedto



perform64bitaddition.Finally,forRV128machines,athirdinstructionisincludedtoperform128bitaddition.

Toillustratethisalternateapproach,wecanmakeupreasonablenamesforthesehypotheticalinstructionsandshowwhichmachineshavewhichinstructionswiththefollowingtable: SizeofMachine 32-bit 64-bit 128-bit BasicInstructionSet: ADDW 32 32 32 RV64adds: ADDD 64 64 RV128adds: ADDQ 128

Thesetwoapproachesareequivalent,havingthesameinstructions,whichonlydifferintheirnames.Thenameschosenforinstructionsisanissueorthogonaltohowtheinstructionsareencoded.TheRISC-Vdesignersselectedthe2irstnamingschemetomirrortheinstructionopcodeencodingpatternstheychose.

Thisissueappliestothefollowinginstructiongroups:

ADDI ADD SUBI SUB NEG SLLI SLL SRLI SRL SRAI SRA LD ST MUL DIV DIVU REM



REMU

Ormorespeci2ically:

RV32 RV64 RV128 ADDI ADDIW ADDID ADD ADDW ADDD SUBI SUBIW SUBID SUB SUBW SUBD NEG NEGW NEGD SLLI SLLIW SLLID SLL SLLW SLLD SRLI SRLIW SRLID SRL SRLW SRLD SRAI SRAIW SRAID SRA SRAW SRAD LD LDW LDD ST STW STD MUL MULW MULD DIV DIVW DIVD DIVU DIVUW DIVUD REM REMW REMD REMU REMUW REMUD

Subtract

GeneralForm:SUB RegD,Reg1,Reg2

Example:SUB x4,x9,x13 # x4 = x9-x13

Description:ThecontentsofReg2issubtractedfromthecontentsofReg1andtheresultisplacedinRegD.

Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowisignored.




SubtractWord

GeneralForm:SUBW RegD,Reg1,Reg2

Example:SUBW x4,x9,x13 # x4 = x9-x13


RV32/RV64/RV128:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic.



SubtractDouble

GeneralForm:SUBD RegD,Reg1,Reg2

Example:SUBD x4,x9,x13 # x4 = x9-x13


RV32/RV64/RV128:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.





SignExtendWordtoDoubleword

GeneralForm:SEXT.W RegD,Reg1

Example:SEXT.W x4,x9 # x4 = Sign-extend(x9)

Description:Thisinstructionisonlyavailablefor64-bitand128-bitmachines.

Thevalueinthelower32bitsofReg1issigned-extendedto64or128bitsandplacedinRegD.

Comments:Thisinstructionisusefulwhena32-bitsignedvaluemustbe“coerced”toalargervalueon64-bitand128-bitmachine.

Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:

ADDIW RegD,Reg1,0

Commentary:TheRISC-Vdocumentationusestwodistinctsuf2ixnotationstodenotethesizeofinstructions.Forexample:

ADDIWSEXT.W

SignExtendDoublewordtoQuadword

GeneralForm:SEXT.D RegD,Reg1

Example:SEXT.D x4,x9 # x4 = Sign-extend(x9)

Description:Thisinstructionisonlyavailablefor128-bitmachines.

Thevalueinthelower64bitsofReg1issigned-extendedto128bitsandplacedinRegD.




ADDID RegD,Reg1,0

Negate

GeneralForm:NEG RegD,Reg2

Example:NEG x4,x9 # x4 = -x9

Description:ThecontentsofReg2isarithmeticallynegatedandtheresultisplacedinRegD.

Comments:Theresultiscomputedbysubtractionfromzero.Over2lowcanonlyoccurwhenthemostnegativevalueisnegated.Over2lowisignored.


SUB RegD,x0,Reg2

NegateWord

GeneralForm:NEGW RegD,Reg2

Example:NEGW x4,x9 # x4 = -x9


RV64/RV128:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic,whereastheNEGinstructionoperateson64-bitor128-bitquantitiesinthelargermachines.


SUBW RegD,x0,Reg2



NegateDoubleword

GeneralForm:NEGD RegD,Reg2

Example:NEGD x4,x9 # x4 = -x9


RV64/RV128:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.


SUBD RegD,x0,Reg2

LogicalInstructions(AND,OR,XOR,…)

AndImmediate

GeneralForm:ANDI RegD,Reg1,Immed-12

Example:ANDI x4,x9,123 # x4 = x9 & 0x0000007B

Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)islogicallyANDedwiththecontentsofReg1andtheresultisplacedinRegD.


And

GeneralForm:AND RegD,Reg1,Reg2



Example:AND x4,x9,x13 # x4 = x9 & x13

Description:ThecontentsofReg1islogicallyANDedwiththecontentsofReg2andtheresultisplacedinRegD.


OrImmediate

GeneralForm:ORI RegD,Reg1,Immed-12

Example:ORI x4,x9,123 # x4 = x9 | 0x0000007B

Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)islogicallyORedwiththecontentsofReg1andtheresultisplacedinRegD.


Or

GeneralForm:OR RegD,Reg1,Reg2

Example:OR x4,x9,x13 # x4 = x9 | x13

Description:ThecontentsofReg1islogicallyORedwiththecontentsofReg2andtheresultisplacedinRegD.


XorImmediate

GeneralForm:XORI RegD,Reg1,Immed-12



Example:XORI x4,x9,123 # x4 = x9 ^ 0x0000007B

Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)islogicalXORedwiththecontentsofReg1andtheresultisplacedinRegD.


Xor

GeneralForm:XOR RegD,Reg1,Reg2

Example:XOR x4,x9,x13 # x4 = x9 ^ x13

Description:ThecontentsofReg1islogicallyXORedwiththecontentsofReg2andtheresultisplacedinRegD.


Not

GeneralForm:NOT RegD,Reg1

Example:NOT x4,x9 # x4 = ~x9

Description:ThecontentsofReg1isfetchedandeachofthebitsis2lipped.TheresultingvalueiscopiedintoRegD.


XORI RegD,Reg1,-1 # Note that -1 = 0xFFFFFFFF



ShiftingInstructions(SLL,SRL,SRA,…)

ShiftLeftLogicalImmediate

GeneralForm:SLLI RegD,Reg1,Immed-12

Example:SLLI x4,x9,5 # x4 = x9<<5

Description:Theimmediatevaluedeterminesthenumberofbitstoshift.ThecontentsofReg1isshiftedleftthatmanybitsandtheresultisplacedinRegD.Theshiftvalueisnotadjusted,i.e.,0meansnoshiftingisdone.(Apparentlyallvalues,including0,areallowed.???)

RV64/RV128:For32-bitmachines,theshiftamountmustbewithin0..31.For64-bitmachines,theshiftamountmustbewithin0..63.For128-bitmachines,theshiftamountmustbewithin0..127.


ShiftLeftLogical

GeneralForm:SLL RegD,Reg1,Reg2

Example:SLL x4,x9,x13 # x4 = x9<<x13

Description:RegisterReg2containstheshiftamount.ThecontentsofReg1isshiftedleftandtheresultisplacedinRegD.





ShiftRightLogicalImmediate

GeneralForm:SRLI RegD,Reg1,Immed-12

Example:SRLI x4,x9,5 # x4 = x9>>5

Description:Theimmediatevaluedeterminesthenumberofbitstoshift.ThecontentsofReg1isshiftedrightthatmanybitsandtheresultisplacedinRegD.Theshiftis“logical”,i.e.,zerobitsarerepeatedlyshiftedinonthemost-signi2icantend.



ShiftRightLogical

GeneralForm:SRL RegD,Reg1,Reg2

Example:SRL x4,x9,x13 # x4 = x9>>x13

Description:RegisterReg2containstheshiftamount.ThecontentsofReg1isshiftedrightandtheresultisplacedinRegD.Theshiftis“logical”,i.e.,zerobitsarerepeatedlyshiftedinonthemost-signi2icantend.





ShiftRightArithmeticImmediate

GeneralForm:SRAI RegD,Reg1,Immed-12

Example:SRAI x4,x9,5 # x4 = x9>>>5

Description:Theimmediatevaluedeterminesthenumberofbitstoshift.ThecontentsofReg1isshiftedrightthatmanybitsandtheresultisplacedinRegD.Theshiftis“arithmetic”,i.e.,thesignbitisrepeatedlyshiftedinonthemost-signi2icantend.



ShiftRightArithmetic

GeneralForm:SRA RegD,Reg1,Reg2

Example:SRA x4,x9,x13 # x4 = x9>>>x13

Description:RegisterReg2containstheshiftamount.ThecontentsofReg1isshiftedrightandtheresultisplacedinRegD.Theshiftis“arithmetic”,i.e.,thesignbitisrepeatedlyshiftedinonthemost-signi2icantend.





RV32/RV64/RV128:Theshiftoperationsde2inedaboveworkontheentireregister,regardlessofitssize.Fora64-bitmachine,theyshiftall64bits;fora128-bitmachine,theyshiftall128bits.

Notethatwhenusing64-bitregisterstocontain32-bitvalues,wecannotsimplyuse64-bitshiftoperations.Weneedsomenewinstructions.

Toillustratetheproblem,considertheresultofright-shiftingthefollowing32-bitvalueby5ona32-bitmachine: before: 0x 8000 0000

shiftrightlogical: 0x 0400 0000shiftrightarithmetic: 0x FC00 0000

Whathappensifwetrytouse64-bitshiftinstructions?

Ifthevalueisrepresentedasa64-bitsign-extendedvalue,wegetthewrongvalueforthelogicalshift: before: 0x FFFF FFFF 8000 0000

shiftrightlogical: 0x 07FF FFFF FC00 0000shiftrightarithmetic: 0x FFFF FFFF FC00 0000

Ontheotherhand,ifthevalueisrepresentedasa64-bitzero-extendedvalue,wegetthewrongvalueforthearithmeticshift: before: 0x 0000 0000 8000 0000

shiftrightlogical: 0x 0000 0000 0400 0000shiftrightarithmetic: 0x 0000 0000 0400 0000

Asaconsequence,severalnewshiftinstructionsareneededfor64-bitmachinestoperform32-bitshiftingcorrectly.FortheRV64architecturalextension,thefollowinginstructionsareadded: SLLIW RegD,Reg1,Immed-12

SLLW RegD,Reg1,Reg2SRLIW RegD,Reg1,Immed-12SRLW RegD,Reg1,Reg2SRAIW RegD,Reg1,Immed-12SRAW RegD,Reg1,Reg2

wheretheshiftamountsarerestrictedto5bits(i.e.,0..31).



Thesenewinstructionsperformashiftofbetween0and31bitsandtheyoperateonlyonthelowerorder32bits,leavingtheupper32bitsunmodi2ied.

Forexample,whenshiftingrightby5wegetthecorrectvalueinthelow-order32-bitsregardlessofwhatisintheupper-bits. before: 0x 89AB CDEF 8000 0000

SRLW(logical): 0x 89AB CDEF 0400 0000SRAW(arithmetic): 0x 89AB CDEF FC00 0000

Similarly,whenwegotoa128-bitmachine,anothergroupofshiftinstructionsisneededtoperform64-bitshiftingcorrectly.FortheRV128architecture,thefollowinginstructionsareadded: SLLID RegD,Reg1,Immed-12

SLLD RegD,Reg1,Reg2SRLID RegD,Reg1,Immed-12SRLD RegD,Reg1,Reg2SRAID RegD,Reg1,Immed-12SRAD RegD,Reg1,Reg2

wheretheshiftamountsarerestrictedto6bits(i.e.,0..63).

ThesenewRV128instructionsperformashiftofbetween0and63bitsandtheyoperateonlyonthelowerorder64bits,leavingtheupper64bitsunmodi2ied.

ShiftInstructionsforRV64andRV128

GeneralForm: SLLIW RegD,Reg1,Immed-12 # RV64 and RV128 only

SLLW RegD,Reg1,Reg2 # RV64 and RV128 onlySRLIW RegD,Reg1,Immed-12 # RV64 and RV128 onlySRLW RegD,Reg1,Reg2 # RV64 and RV128 onlySRAIW RegD,Reg1,Immed-12 # RV64 and RV128 onlySRAW RegD,Reg1,Reg2 # RV64 and RV128 only

SLLID RegD,Reg1,Immed-12 # RV128 onlySLLD RegD,Reg1,Reg2 # RV128 onlySRLID RegD,Reg1,Immed-12 # RV128 onlySRLD RegD,Reg1,Reg2 # RV128 onlySRAID RegD,Reg1,Immed-12 # RV128 onlySRAD RegD,Reg1,Reg2 # RV128 only



Comment: Seeabove.

MiscellaneousInstructions

Nop

GeneralForm:NOP

Example:NOP # Do nothing

Description:Thisinstructionhasnoeffect.

Comment:Thereareseveralwaystoencodethe“nop”operation.UsingADDIisthecanonical,recommendedway.Alsonotethatinstructionswiththebitencodingof0x00000000and0xFFFFFFFFarespeci2icallynotusedfor“nop”.Thesetwovaluesarecommonlyreturnedfrommemoryunitswhentheactualmemoryismissing(i.e.,unpopulated).Thetwovalues0x00000000and0xFFFFFFFFarespeci2iedas“illegalinstructions”andwillcausean“illegalinstructionexception”iffetchedandexecuted.


ADDI x0,x0,0

Move(RegistertoRegister)

GeneralForm:MV RegD,Reg1

Example:MV x4,x9 # x4 = x9

Description:ThecontentsofReg1iscopiedintoRegD.

Encoding:



Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:

ADDI RegD,Reg1,0

LoadUpperImmediate

GeneralForm:LUI RegD,Immed-20

Example:LUI x4,0x12345 # x4 = 0x12345<<12 (0x12345000)

Description:Theinstructioncontainsa20-bitimmediatevalue.Thisvalueisplacedintheleftmost(i.e.,upper,mostsigni2icant)20bitsoftheregisterRegDandtherightmost(i.e.,lower,leastsigni2icant)12-bitsaresettozero.

RV64/RV128:Thedescriptionaboveappliesto32-bitmachines.Regardlessofregistersize,theimmediatevalueismovedintobits31:12.Inthecaseof64-bitregistersor128-bitregisters,thevalueissign-extendedto2illtheupper32or96bits(i.e.,bits63:32orbits127:32).

Comment:Thisinstructionisoftenuseddirectlybeforeaninstructioncontaininga12-bitimmediatevalue,whichwillbeaddedintoRegD.Together,theyareusedtoeffectivelymakea32-bitvalue.Inthecaseof64-bitor128-bitmachines,thiswillbea32-bitsignedvalue,intherange-2,147,483,648..2,147,483,647(i.e.,-231..231-1).

Encoding: ThisisaU-typeinstruction.

LoadImmediate

GeneralForm:LI RegD,Immed-32

Example:LI x4,123 # x4 = 0x0000007B

Description:Theimmediatevalue(whichcanbeany32-bitvalue)iscopiedintoRegD.

Encoding:



Thisisaspecialcaseofmoregeneralinstructionsandisassembleddifferentlydependingontheactualvaluepresent.

Iftheimmediatevalueisintherangeof-2,048..+2,047,thenitcanbeassembledidenticallyto:

ADDI RegD,x0,Immed

Iftheimmediatevalueisnotwithintherangeof-2,048..+2,047butiswithintherangeofa32-bitnumber(i.e.,-2,147,483,648..+2,147,483,647)thenitcanbeassembledusingthistwo-instructionsequence:

LUI RegD,Upper-20ADDI RegD,RegD,Lower-12

where“Upper-20”representstheuppermost20bitsofthevalueand“Lower-12”representstheleastsigni2icant12-bitsofthevalue.

Commentary:Noticethattheassemblermustbecarefulwhencomputingthe“Upper-20”and“Lower-12”piecesfromanarbitrary32-bitvalue.Theassemblercannotsimplybreakthevalueapart.

Why?Becausetheimmediate12bitvaluewillbesign-extendedinthesecond(ADDI)instruction.Ifthemostsigni2icantbitof“Lower-12”is1,thentheADDIinstructionwillbeworkingwithanegativevalue,causingtheincorrectvaluetobecomputed.

Instead,theassemblerneedstocomputethis: Given:Value(a32-bitquantity) Compute: Lower12=Value[11:0] x=Value–SignExtend(Lower12) Upper20=(x>>12)[19:0]TheresultingLUI/ADDIsequencewillloadthecorrectsignedvalueon32,64,and128bitmachines,sincethequantitiesaresigned-extendedinbothinstructions.

AddUpperImmediatetoPC

GeneralForm:AUIPC RegD,Immed-20



Example:AUIPC x4,0x12345 # x4 = PC + (0x12345<<12)

Description:Theinstructioncontainsa20-bitimmediatevalue.Thisvalueismovedintotheleftmost(i.e.,upper,mostsigni2icant)20bitsofandtherightmost(i.e.,lower,leastsigni2icant)12-bitsaresettozero.ThenumbersocreatedisthenaddedtothecontentsoftheProgramCounter.TheresultisplacedinRegD.ThevalueofthePCusedhereistheaddressoftheinstructionthatfollowstheAUIPC.

RV64/RV128:Thedescriptionaboveappliesto32-bitmachines.Inthecaseof64-bitor128-bitsregisters,theimmediatevalueissign-extendedbeforebeingaddedtothePC.ThesizeofthePCequaltothesizeoftheregisters.

Comment:Thisinstructionisoftenuseddirectlybeforeaninstructioncontaininga12-bitimmediatevalue,whichwillbeaddedintoRegD.Together,theyareusedtoeffectivelymakea32-bitPC-relativeoffset.Thisisadequatetoaddressanylocationina32-bit(4GiByte)addressspace.

Inthecaseof64-bitor128-bitmachines,thiswillbea32-bitsignedoffset,intherange-2,147,483,648..2,147,483,647(i.e.,-231..231-1).Iftheaddressspaceislargerthan4GiBytes,thistechniquewillfail;adifferentinstructionsequenceisrequired.

ThecurrentPCcanbeobtainedbyusingthisinstructionwithanimmediatevalueofzero.UseofJALtodeterminethecurrentPCisnotrecommended.

Encoding: ThisisaU-typeinstruction.

LoadAddress

GeneralForm:LA RegD,Address

Example:LA x4,MyVar # x4 = &MyVar

Description:TheaddressofsomememorylocationiscopiedintoRegD.Noaccesstomemoryoccurs.

Encoding:



Thereisnoactual“loadaddress”instruction;insteadtheassemblersubstitutesasequenceoftwoinstructionstoachievethesameeffect.

The“address”canrefertoanylocationwithinthe32-bitmemoryspace.TheaddressisconvertedtoaPC-relativeaddress,withanoffsetof32bits.Thisoffsetisthenbrokenintotwopieces:a20-bitpieceanda12-bitpiece.Theinstructionisassembledusingthesetwoinstructions:

AUIPC RegD,Upper-20ADDI RegD,RegD,Lower-12

SetIfLessThan(Signed)

GeneralForm:SLT RegD,Reg1,Reg2

Example:SLT x4,x9,x13 # x4 = (x9<x13) ? 1 : 0

Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingsignedcomparison.IfthevalueinReg1islessthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.


SetLessThanImmediate(Signed)

GeneralForm:SLTI RegD,Reg1,Immed-12

Example:SLTI x4,x9,123 # x4 = (x9<0x0000007B) ? 1 : 0

Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)iscomparedtothecontentsofReg1usingsignedcomparison.IfthevalueinReg1islessthantheimmediatevalue,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.




SetIfGreaterThan(Signed)

GeneralForm:SGT RegD,Reg1,Reg2

Example:SGT x4,x9,x13 # x4 = (x9>x13) ? 1 : 0

Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingsignedcomparison.IfthevalueinReg1isgreaterthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.

Encoding:Thisisaspecialcaseofadifferentinstruction.Thisinstructionisassembledidenticallyto:

SLT RegD,Reg2,Reg1 # Note: regs are switched

Note:Thereisno“SetIfGreaterThanImmediate(signed)”instruction.

SetIfLessThan(Unsigned)

GeneralForm:SLTU RegD,Reg1,Reg2

Example:SLTU x4,x9,x13 # x4 = (x9<x13) ? 1 : 0

Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingunsignedcomparison.IfthevalueinReg1islessthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.


SetLessThanImmediate(Unsigned)

GeneralForm:SLTIU RegD,Reg1,Immed-12

Example:SLTIU x4,x9,123 # x4 = (x9<0x0000007B) ? 1 : 0



Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)iscomparedtothecontentsofReg1usingunsignedcomparison.IfthevalueinReg1islessthantheimmediatevalue,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.


SetIfGreaterThan(Unsigned)

GeneralForm:SGTU RegD,Reg1,Reg2

Example:SGTU x4,x9,x13 # x4 = (x9>x13) ? 1 : 0

Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingunsignedcomparison.IfthevalueinReg1isgreaterthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.

Encoding:Thisisaspecialcaseofadifferentinstruction.Thisinstructionisassembledidenticallyto:

SLTU RegD,Reg2,Reg1 # Note: regs are switched

Note:Thereisno“SetIfGreaterThanImmediateUnsigned”instruction.

SetIfEqualToZero

GeneralForm:SEQZ RegD,Reg1

Example:SEQZ x4,x9 # x4 = (x9==0) ? 1 : 0

Description:IfthevalueinReg1iszero,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.

Comment:Thisinstructionisimplementedwithanunsignedcomparisonagainst1.Usingunsignednumbers,theonlyvaluelessthan1is0.Thereforeiftheless-thanconditionholds,thevalueinReg1mustbe0.




SLTIU RegD,Reg1,1

SetIfNotEqualToZero

GeneralForm:SNEZ RegD,Reg2

Example:SNEZ x4,x9 # x4 = (x9≠0) ? 1 : 0

Description:IfthevalueinReg2isnotzero,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.

Comment:Thisinstructionisimplementedwithanunsignedcomparisonagainst0.Usingunsignednumbers,theonlyvaluenotlessthan0is0.Thereforeiftheless-thanconditionholds,thevalueinReg2mustbenotbe0.


SLTU RegD,x0,Reg2

SetIfLessThanZero(signed)

GeneralForm:SLTZ RegD,Reg1

Example:SLTZ x4,x9 # x4 = (x9<0) ? 1 : 0

Description:IfthevalueinReg1islessthanzero(usingsignedarithmetic),thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.


SLT RegD,Reg1,x0



SetIfGreaterThanZero(signed)

GeneralForm:SGTZ RegD,Reg2

Example:SGTZ x4,x9 # x4 = (x9>0) ? 1 : 0

Description:IfthevalueinReg2isgreaterthanzero(usingsignedarithmetic),thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.

Comment:“Reg2>0”isequivalentto“0<Reg2”.


SLT RegD,x0,Reg2

BranchandJumpInstructions

BranchifEqual

GeneralForm:BEQ Reg1,Reg2,Immed-12

Example:BEQ x4,x9,MyLabel # If x4==x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.Ifequal,controljumps.ThetargetaddressisgivenasaPC-relativeoffset.Moreprecisely,theoffsetissign-extended,multipliedby2,andaddedtothevalueofthePC.ThevalueofthePCusedistheaddressoftheinstructionfollowingthebranch,notthebranchitself(???).Theoffsetismultipliedby2,sinceallinstructionsmustbehalfwordaligned.Thisgivesaneffectiverangeof-4,096..4,094(inmultiplesof2),relativetothePC.

Comment:Mostconditionalbranchesoccurwithinsmallishsubroutines/functions,sothelimitedrangeshouldbeadequate.



Ifthelimitedsizeoftheoffsetisinadequate,codesuchasthefollowing: BEQ x4,x9,MyLabel

mustbealteredto: BNE x4,x9,Skip J MyLabel # a “long jump”

Skip:Thisappliestoallconditionalbranchinstructions.

Encoding: ThisisaB-typeinstruction.

Commentary:Instructionsizes,locations,andrelativeoffsetsaredeterminedbytheassemblerandlinker,notbythecompiler.However,thecompiler(whichselectsinstructionsandproducestheassemblycodesequence)doesnothaveenoughknowledgetodeterminewhetheranoffsetwillbewithinrange.Therefore,itcannotknowwhetherornota“longjump”isnecessary.

Whatisthesolution?

(1)Thecompileralwayserrsonthesafesideandgeneratesmanyunnecessary“longjumps.”Notacceptable.

(2)Thecompilerignorestheproblem,alwaysusesshortjumps,andoccasionallygeneratescodethatwillnotassemblewithoutfurtherattention.Notacceptable.

(3)Thecompilerismodi2iedtocountinstructionsanddeterminewhen“longjumps”shouldbeinserted.Thisputsaburdenonthecompiler,butthecompilermayalreadybekeepingtrackofthelengthsofthecodesequencesitgenerates,inordertocomparealternatives.Inorderforthistowork,theassemblermustbeforbiddenfromdoingthingslikecompilingtheLIinstructionconditionally,asdescribedearlier.

(4)Theassemblerdetectsoffsetover2lowsandquietlyaltersthecodesequencebyreversingthesenseofthetestandinsertinga“longjump”instruction.Thismaybetheeasiest,butviolatesthetraditionalone-to-onemappingbetweenassemblerandmachinecode.



BranchifNotEqual

GeneralForm:BNE Reg1,Reg2,Immed-12

Example:BNE x4,x9,MyLabel # If x4≠x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.Ifnotequal,controljumpstoaPC-relativetargetaddress.

Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.


BranchifLessThan(Signed)

GeneralForm:BLT Reg1,Reg2,Immed-12

Example:BLT x4,x9,MyLabel # If x4<x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.



BranchifLessThanOrEqual(Signed)

GeneralForm:BLE Reg1,Reg2,Immed-12

Example:BLE x4,x9,MyLabel # If x4<=x9 goto MyLabel



Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanorequaltoReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.


Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:

BGE Reg2,Reg1,Immed-12 # Note: regs are swapped

BranchifGreaterThan(Signed)

GeneralForm:BGT Reg1,Reg2,Immed-12

Example:BGT x4,x9,MyLabel # If x4>x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.



BLT Reg2,Reg1,Immed-12 # Note: regs are swapped

BranchifGreaterThanOrEqual(Signed)

GeneralForm:BGE Reg1,Reg2,Immed-12

Example:BGE x4,x9,MyLabel # If x4>=x9 goto MyLabel



Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanorequaltoReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.



BranchifLessThan(Unsigned)

GeneralForm:BLTU Reg1,Reg2,Immed-12

Example:BLTU x4,x9,MyLabel # If x4<x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.



BranchifLessThanOrEqual(Unsigned)

GeneralForm:BLEU Reg1,Reg2,Immed-12

Example:BLEU x4,x9,MyLabel # If x4<=x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanorequaltoReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.





BGEU Reg2,Reg1,Immed-12 # Note: regs are swapped

BranchifGreaterThan(Unsigned)

GeneralForm:BGTU Reg1,Reg2,Immed-12

Example:BGTU x4,x9,MyLabel # If x4>x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.



BLTU Reg2,Reg1,Immed-12 # Note: regs are swapped

BranchifGreaterThanOrEqual(Unsigned)

GeneralForm:BGEU Reg1,Reg2,Immed-12

Example:BGEU x4,x9,MyLabel # If x4>=x9 goto MyLabel

Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanorequaltoReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.





ComparisonstoZero:Thefollowingabbreviationsaresimplevariationsonthepreviousinstructions.Theycompareavalueinaregistertozerousingsignedcomparison.Thezeroregister(x0)isimplicitintheseforms.

AbbreviatedForm: AssembledAs: BEQZ Reg1,target BEQ Reg1,x0,target BNEZ Reg1,target BNE Reg1,x0,target BLTZ Reg1,target BLT Reg1,x0,target BLEZ Reg1,target BGE x0,Reg1,target BGTZ Reg1,target BLT x0,Reg1,target BGEZ Reg1,target BGE Reg1,x0,target

Commentary:TheRISC-Varchitecturedoesnotincludeconditionallyexecutedinstructions.Theideawithaconditionallyexecutedinstructionisthattheinstructionexecutionispredicatedonsomeconditioncode(s).Theinstructionisexecuted,andwilleitherworknormallyorhavenoeffect,dependingonwhethertheconditionistrueorfalse.Thebene2itisthatconditionalbranchinstructions(withtheirpipelineissues)cansometimesbeavoided,resultinginimprovedef2iciency.TheRISC-Vdesignersconsideredandrejectedconditionalexecution.

JumpAndLink(Short-DistanceCALL)

GeneralForm:JAL RegD,Immed-20

Example:JAL x1,MyFunct # Goto MyFunct, x1=RetAddr

Description:Thisinstructionisusedtocallasubroutine(i.e.,function).Thereturnaddress(i.e.,thePC,whichistheaddressoftheinstructionfollowingtheJAL)issavedinRegD.



ThetargetaddressisgivenasaPC-relativeoffset.Moreprecisely,theoffsetissign-extended,multipliedby2,andaddedtothevalueofthePC.ThevalueofthePCusedistheaddressoftheinstructionfollowingtheJAL,nottheJALitself(???).Theoffsetismultipliedby2,sinceallinstructionsmustbehalfwordaligned.Thisgivesaneffectiverangeof±1MiByte,i.e.,-1,048,576..1,048,574(inmultiplesof2),relativetothePC.

AssemblerShorthand:Byconvention,x1isgenerallyusedasthe“linkregister”.Iftheregisterisnotmentioned,thenx1isimplied.

JAL MyFunct # Call MyFunctComment:

Theprogrammingconventionistouseregisterx1asa“linkregister.”Thisreturnaddressisstoredinx1,ratherthanbeingpushedontoastackinmemory,thusmakingcallsandreturnsfaster.However,ifsomeroutine“foo”willcallanotherroutine“bar”,then“foo”mustsavex1beforecalling“bar”.The“foo”routinemightpushx1ontoastackinmemory,or“foo”mightsavex1ina“callee-saved”registerinthehopeofavoidinganymemoryaccess.

Exceptions:Thismaygeneratean“instructionmisalignedexception.”Thetargetaddresswillnecessarilybeamultipleof2,butitmaynotbemultipleof4.Formachinesthatdonotsupport16-bitinstructions,thiswillcauseanalignmentexception.

Encoding: ThisisaJ-typeinstruction.

Commentary:Mostprogramscodesegmentsarefairlysmall,sothelimitedrangeofJALshouldbeadequate.

Iftheprogramsizeexceeds1MiByte,thelimitedsizeoftheoffsetmaybeinadequate.Codesuchasthefollowing:

JAL x1,MyFunct

mustbealteredto:

LUI Reg2,Offset-20JALR x1,Reg2,Offset-12



TypicallyseveralroutinesareseparatelycompiledsothecompilerwillnotknowhowfarawaythetargetfunctionisfromtheCALLinstruction.Eventheassemblerwillnotknowandonlyatlink-timecanitbedeterminedwhethera20-bitoffsetisadequate.Oneapproachisforthelinkertoprintoutaninscrutableerrormessage.

Jump(Short-Distance)

GeneralForm:J Immed-20

Example:J MyLabel # Goto MyLabel

Description:ThetargetaddressisgivenasaPC-relativeoffset.Theeffectiverangeis±1MiByte,i.e.,-1,048,576..1,048,574(inmultiplesof2),relativetothePC.

Exceptions:Thismaygeneratean“instructionmisalignedexception.”Thetargetaddresswillnecessarilybeamultipleof2,butitmaynotbemultipleof4.Formachinesthatdonotsupport16-bitinstructions,thiswillcauseanalignmentexception.


JAL x0,Immed-20 # Discard return address

JumpAndLinkRegister

GeneralForm:JALR RegD,Reg1,Immed-12

Example:JALR x1,x4,MyFunct # Goto MyFunct, x1=RetAddr

Description:Thisinstructionisusedtocallasubroutine(i.e.,function).Thereturnaddress(i.e.,thePC,whichistheaddressoftheinstructionfollowingtheJAL)issavedinRegD.

ThetargetaddressiscomputedbyaddingtheoffsettothecontentsofReg1.Moreprecisely,theoffsetissign-extendedandaddedtothevalueofReg1.The



offsetisnotmultipliedby2.Thisgivesaneffectiverangeof±2KiByte,i.e.,-2,048..2,047,relativetotheaddressinReg1.

Comment:Thisinstructioncanbeusedinseveralways.SeetheJR,RET,CALL,andTAILinstructions.

AssemblerShorthand:Byconvention,x1isusedasthe“linkregister”.Thefollowingform:

JALR Reg1 # Call *Reg1isusedshorthandfor:

JALR x0,Reg1,0 Exceptions:

Thismaygeneratean“instructionmisalignedexception.”Encoding: ThisisanI-typeinstruction.

JumpRegister

GeneralForm:JR Reg1

Example:JR Reg1 # Goto *Reg1, i.e., PC = Reg1

Description:JumptotheaddressinReg1.

Exceptions:Thismaygeneratean“instructionmisalignedexception.”


JALR x0,Reg1,0 # Discard ret addr; offset=0

Return

GeneralForm:RET

Example:RET # Goto *x1, i.e., PC = x1



Description:Byconvention,x1isusedasthe“linkregister”andwillholdareturnaddress.Thisinstructionreturnsfromasubroutine/function.

Exceptions:Thismaygeneratean“instructionmisalignedexception.”


JALR x0,x1,0 # PC=x1+0; don’t save prev PC

CallFarawaySubroutine

GeneralForm:CALL Immed-32

Example:CALL MyFunct # PC = new addr; x1 = ret addr

Description:Byconvention,x1isusedasthe“linkregister”andwillholdareturnaddress.Thisinstructioncallsasubroutine/functionusingaPC-relativescheme,wherethesubroutineoffsetfromtheCALLinstructionexceedsthe20-bitlimit(i.e.,±1MiByte)oftheJALinstruction.Thisinstructionmodi2iesregisterx6.

Encoding:Inordertodealwiththelargerdistancetothesubroutine,this“synthetic”instructionwillbeassembledusingthefollowingtwo-instructionsequence.Thetargetaddresscanbeexpressedasa32-bitoffsetfromtheProgramCounter.Thisoffsetisbrokenintotwopieces,whichareaddedtothePCintwosteps.

AUIPC x6,Immed-20 JALR x1,x6,Immed-12

TheAUIPCinstructionaddsthePCtotheupper20-bitportionofthe32-bitoffsetandplacestheresultinatemporaryregister.TheJALRinstructionaddsinthelower12bitsofthe32-bitoffsetandtransferscontrolbyloadingthesumintothePC.Italsosavesthereturnaddressinx1.

TheCALLinstructionmakesuseoftheconventionthatx1isthelinkregister.Italsousesx6,whichisa“caller-savedtemporaryregister”byconvention.



Commentary:Theactualvalues2illedintotheinstructions(intheImmed-20andImmed-122ields)mustbecomputedbythelinker,sincetheactualtargetaddresswill,ingeneral,notbeavailabletotheassembler.Thelinkermustconvertthetargetaddressintoa32-bitoffsetfromtheProgramCounter.Atruntime,thePCvalueusedwillbetheaddressoftheAUIPC(nottheJALRinstruction)sothevaluemustberelativetothataddress.

Alsonotethatthe12-bitimmediatevaluewillbesignextendedintheJALRinstruction,sosimplybreakingtheoffsetapartintopieceswillnotwork.Instead,thelinkermust2irstisolatethelow-order12bitsandsign-extendit.Thenthelinkermustsubtractitfromthefull-32-bitoffset,yieldingtheupperImmed-20portion.

ThisappliestotheTAILinstruction,too.

Exceptions: Thissequencemaygeneratean“instructionmisalignedexception”ifthetarget

offsetfromwhichtheimmediatevaluesarecomputedisnotwordalignedandtheprocessordoesnotsupport16-bitinstructions.

TailRecursion:Somesubroutines/functionsendbycallinganotherfunction.Forexample,thelaststatementinafunctionnamed“foo”maybetoinvoke/callafunctionnamed“bar.”Whenthecalledfunction(bar)returns,theoriginalfunction(foo)immediatelycompletesandreturns.Anyvaluereturnedbythecalledfunction(bar)willbeusedasthereturnvaluefromthecaller(foo).

Thispatternoftenoccursinfunctionalprograms,whererecursionisusedtoimplementrepetitiveloopingbehavior.Withoutspecialattention,somefunctionalprogramsthatuserecursioninthiswaycancreateverydeepstacksbypushingthousandsofreturnaddressesontothestack.

Acommonoptimizationiscalledthe“tailrecursionoptimization”.Hereishowitworks.Thecompiledcodeforthecaller(foo)ismodi2iedtonotactuallycallbar,butinsteadtosimplyjumptothe2irstinstructioninbar.Then,whenbarreturns,theprocessorwilleffectivelyreturnfromfoo,sincenootherreturnaddresswassaved.Furthermore,anythingreturnedbybarwillbenaturallybereturnedtofoo’scaller.Thetailrecursionoptimizationeffectivelyturnsfunctionalprogramswhichuserecursiontoperformloopingactivitiesintoinstructionsequencesthatuse



“goto”instructions.Itcovertsrecursiveprogramsintoloopingprogramswith“goto”instructions,makingrecursiveprogrammingtechniquespractical.

Quiteoftenarecursivefunctionwillcallitself.Insuchcasesofthetailrecursionoptimization,the“goto”neednotbealong-distancejumpandtheJinstructionwillsuf2ice.Butinothercases,mutualrecursionmightinvolveseparatelycompiledfunctionsandrequirethelong-distancejumpingabilityoftheTAILinstruction.

TailCall(FarawaySubroutine)/Long-DistanceJump

GeneralForm:TAIL Immed-32

Example:TAIL MyFunct # PC = new addr; Discard ret addr

Description:ThisinstructionisusedtojumptoadistantlocationusingaPC-relativescheme,wherethedisplacementfromtheTAILinstructiontothetargetinstructionexceedsthe20-bitlimit(i.e.,±1MiByte)oftheJinstruction(jumpshortdistance).Thisinstructionmodi2iesregisterx6.

Comment:Thisinstructionisnothingmorethanalong-distance“goto”instruction;thename“TAIL”maybeconfusing,butre2lectshowitwillsometimesbeused.

Encoding:This“synthetic”instructionwillbeassembledusingthefollowingtwo-instructionsequence.

AUIPC x6,Immed-20 JALR x0,x6,Immed-12

SeethecommentsfortheCALLinstruction.Theonlydifferenceisthatherethereturnaddressisdiscarded(x0),insteadofbeingsavedinx1.

Exceptions: LiketheCALLinstruction,thissequencemaygeneratean“instruction

misalignedexception.”



LoadandStoreInstructions

LoadByte(Signed)

GeneralForm:LB RegD,Immed-12(Reg1)

Example:LB x4,1234(x9) # x4 = Mem[x9+1234]

Description:An8-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueissign-extendedtothefulllengthoftheregister.

Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.

Thereisnoalignmentissueandthisinstructionwillexecuteatomically.Encoding: ThisisanI-typeinstruction.

LoadByte(Unsigned)

GeneralForm:LBU RegD,Immed-12(Reg1)

Example:LBU x4,1234(x9) # x4 = Mem[x9+1234]

Description:An8-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueiszero-extendedtothefulllengthoftheregister.


Thereisnoalignmentissueandthisinstructionwillexecuteatomically.Encoding: ThisisanI-typeinstruction.



LoadHalfword(Signed)

GeneralForm:LH RegD,Immed-12(Reg1)

Example:LH x4,1234(x9) # x4 = Mem[x9+1234]

Description:A16-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueissign-extendedtothefulllengthoftheregister.


Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.halfword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.

Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.


LoadHalfword(Unsigned)

GeneralForm:LHU RegD,Immed-12(Reg1)

Example:LHU x4,1234(x9) # x4 = Mem[x9+1234]

Description:A16-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueiszero-extendedtothefulllengthoftheregister.




Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.



LoadWord(Signed)

GeneralForm:LW RegD,Immed-12(Reg1)

Example:LW x4,1234(x9) # x4 = Mem[x9+1234]

Description:A32-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.




RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,thevalueissign-extendedtothefulllengthoftheregister.




LoadWord(Unsigned)

GeneralForm:LWU RegD,Immed-12(Reg1)

Example:LWU x4,1234(x9) # x4 = Mem[x9+1234]


A32-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.Thevalueiszero-extendedtothefulllengthoftheregister.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.

Inamachinewith32-bitregisters,neithersign-extensionnorzero-extensionisnecessaryforvaluethatisalready32bitswide.Thereforethe“signedload”instruction(LW)doesthesamethingasthe“unsignedload”instruction(LWU),makingLWUredundant.





LoadDoubleword(Signed)

GeneralForm:LD RegD,Immed-12(Reg1)

Example:LD x4,1234(x9) # x4 = Mem[x9+1234




A64-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Signextensionoccursformachineswith128-bitregisters.





LoadDoubleword(Unsigned)

GeneralForm:LDU RegD,Immed-12(Reg1)

Example:LDU x4,1234(x9) # x4 = Mem[x9+1234]


A64-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.Thevalueiszero-extendedto128-bits.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.







LoadQuadword

GeneralForm:LQ RegD,Immed-12(Reg1)

Example:LQ x4,1234(x9) # x4 = Mem[x9+1234]


A128-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.





StoreByte

GeneralForm:SB Reg2,Immed-12(Reg1)

Example:SB x4,1234(x9) # Mem[x9+1234] = x4



Description:An8-bitvalueiscopiedfromregisterReg2tomemory.Theupper(moresigni2icant)bitsinReg2areignored.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.


Thereisnoalignmentissueandthisinstructionwillexecuteatomically.Encoding: ThisisanS-typeinstruction.

StoreHalfword

GeneralForm:SH Reg2,Immed-12(Reg1)

Example:SH x4,1234(x9) # Mem[x9+1234] = x4

Description:A16-bitvalueiscopiedfromregisterReg2tomemory.Theupper(moresigni2icant)bitsinReg2areignored.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.


Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.halfword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.


Encoding: ThisisanS-typeinstruction.



StoreWord

GeneralForm:SW Reg2,Immed-12(Reg1)

Example:SW x4,1234(x9) # Mem[x9+1234] = x4

Description:A32-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.




RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,theupperbitsoftheregisterareignored.


StoreDoubleword

GeneralForm:SD Reg2,Immed-12(Reg1)

Example:SD x4,1234(x9) # Mem[x9+1234] = x4

Description:Thisinstructionisonlyavailablein64-bitand128-bitmachines.

A64-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Fora128-bitmachinetheupperbitsoftheregisterareignored.




Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.doubleword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.



StoreQuadword

GeneralForm:SQ Reg2,Immed-12(Reg1)

Example:SQ x4,1234(x9) # Mem[x9+1234] = x4

Description:Thisinstructionisonlyavailablein128-bitmachines.

A128-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.







Commentary:Forthe“store”instructions,theRISC-VdocumentationdoesnotmakeitclearwhethertheaddressisformedusingReg1orReg2,norwhetherthesourceregisterisReg1orReg2.However,basedontheassumptionthatthesameregisterwillbeusedforaddresscalculationinboth“load”and“store”instructionsandoncommentselsewhere,weconcludetheorderiswhatisshownhere.

Commentary:TheRISC-Vdocumentationsuggeststhatassemblerexpectsthememoryaddressoperandtobethesecondoperand,notthe2irst.Thisviolatesthegeneralpatterninassemblycodethatdatamovementistowardtheleft,i.e.,fromthesecondoperandtothe2irst.

Commentary:Byconvention,registerx2isusedasastacktoppointer(SP).However,theRISC-Varchitecturedoesnotcontainanyspecialinstructionsformanipulatingthestack;registerx2istreateduniformlytoallotherregistersbytheISA,eventhoughtheassemblermaysupportsomeshorthandnotationsthatfacilitateusex2asastackpointer.

Compilersfortraditionalprogramminglanguagestypicallyplacelocalvariablesinastackframe(i.e.,anactivationrecord)whichisconvenientlyaccessedusingthestacktoppointer.Toreadandwritesuchlocalvariables,the“load”and“store”instructionswillnaturallyuseoffsetsfromx2:

LB x5,offset(x2) # Load from variableSB x5,offset(x2) # Store into variableADDI x2,x2,4 # Pop 4 bytes off stackLW x5,0(x2) # Fetch word on stack top

Assumingthatstackframestendtohavemodestsizes,theRISC-Vinstructionset,whichutilizes12-bitoffsets(-2,048..+2,047),workswell.

Forglobal(i.e.,sharedorcommon)variables,oneapproachistokeepasinglepointertoablockofglobalvariables.Byconvention,registerx3isoftenusedassucha“globalbasepointer.”Again,assumingtheamountofmemoryrequiredfortheglobalvariablesismodest,the12-bitoffsetworkswell.

Incaseswherethedesiredoffsetsexceedthe12-bitlimit,the“loadupperimmediate”instruction(LUI)comestotherescue.RecallthatLUItakesa20-bitimmediatevalue,shiftsitleft12bits,andmovesitintoaregister.



Asanexample,thefollowinginstructionsequencecanbeusedtoloadavaluefromalocationoffsetfromabasepointer(inthiscasex2),whentheoffsetexceedsthe12-bitlimit.

LUI x5,Immed-20 # Grab the upper 20 bitsADD x5,x5,x2 # Add the base pointerLB x5,Immed-12(x5) # Add the lower 12 bits

Asimilarsequencecanbeusedfor“store”instructions,althoughanotherregisterwouldberequired.

Notethatsomecaremustbetakenwhenbreakinga32-bitoffsetinto12and20bitpiecesbecauseofthesign-extensionthatwilloccurwiththe12-bitportionwhenitisaddedin.

Commentary:Insomecasesitwillbenecessarytousethe“load”or“store”instructionstoaccessarbitrarylocationsinmemory.

Insomecases,aPC-relativeaddressispreferred;inothercasesanabsoluteaddressispreferred.

Anyaddresswithina32-bitaddressspacecanbeaccessedusinga32-bitoffsetfromthePC,sincea32-bitoffset(signedornot)fromthePCwillwraparoundfrom0xFFFFFFFFto0x00000000.Any32-bitoffsetcanbebrokenintotwopieces:a20-bitupperportionanda12-bitlowerportion.

Forcaseswhereanabsoluteaddress(i.e.,notaPC-relativeaddress)isdesired,recallthatthe“LoadUpperImmediate”instruction(LUI)containsa20-bitimmediatevaluewhichisshiftedleft12bitsandmovedintoaregister.

Forexample,toloadahalfwordfromanarbitrarylocationinmemory,aninstructionsequencelikethiscanbeused:

LUI x4,Immed-20LH x4,Immed-12(x4)

(Inthecaseofa“store”instruction,asecondregisterwouldbeneeded.)



Asmentionedelsewhere,caremustbetakenwhenbreakinga32-bitoffsetinto12and20bitpiecesbecauseofthesign-extensionthatwilloccurwiththe12-bitportionwhenitisaddedin.

ForcaseswhereaPC-relativeaddressisdesired,recallthatthe“AddUpperImmediatetoPC”instruction(AUIPC)containsa20-bitimmediatevaluewhichisshiftedleft12bits,addedtothePC,andstoredinaregister.

Forexample,toloadahalfwordusinganarbitraryPC-relativeoffset,aninstructionsequencelikethiscanbeused:

AUIPC x4,Immed-20LH x4,Immed-12(x4)

PC-relativeaddressesarepreferredinsituationswhere(1)thecodemaytherelocatedtoarbitrarylocationsafterlink-time;(2)theloadinstructionandthetargetareassembledinthesame2ileandtheresomereasontocompleteinstructiongenerationbeforelink-time;and(3)theaddressspaceexceeds32-bits(4GiBytes),butthetargetlocationlieswithin±2GiBytesoftheloadinstruction.

IntegerMultiplyandDivide

TheRISC-Vspecmakesthemultiplyanddivideinstructionsoptional.Thissectiondescribestheseinstructions,whichmayormaynotbeimplemented.(TherewasachangebetweenRISC-Vversion1.9and2.0;wedescribethemorerecentversionhere.)

Multiply

GeneralForm:MUL RegD,Reg1,Reg2

Example:MUL x4,x9,x13 # x4 = x9*x13

Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andtheresultisplacedinRegD.



RV32/RV64/RV128:Regardlessofthesizeoftheregisters,theresultoftheirmultiplicationwillbetwiceaslarge,andthereforerequire2registerstocontain.Thisinstructioncapturesthelower-orderhalfoftheresultandmovesitintothedestinationregister.Seethecommentaryandtheothermultiplyinstructionsfortheupperhalf.

Comments:Thereisnodistinctionbetweensignedandunsigned;theresultisidentical.Over2lowisignored.


BinaryMultiplication–UpperBits,Signedvs.Unsigned

Whenevertwovaluesaremultiplied,theresultmayrequireuptotwiceasmanybitstorepresent.Forexample,considermultiplyingthesetwo8bitvalues:

1101 0101 1011 1011

Ifweviewtheseasunsignednumbers,wegetthisresult: 1101 0101 = 213 (unsigned) × 1011 1011 = 187 (unsigned)1001 1011 1001 0111 = 39,831 (unsigned)

Ifweviewtheseassignedvalues,wegetthisresult: 1101 0101 = -43 (signed) × 1011 1011 = -69 (signed)0000 1011 1001 0111 = 2,967 (signed)

Ifweviewonevalueassignedandtheotherasunsigned,wegetthisresult: 1101 0101 = -43 (signed) × 1011 1011 = 187 (unsigned)1110 0000 1001 0111 = -8,041 (signed)

Regardlessofwhetherweareinterpretingtheoperandsassignedorunsignednumbers,notethattheleast-signi2icanthalfoftheresultisalwaysthesamebits.Thisisnotacoincidence;itisalwaystrue.

However,themost-signi2icanthalfcanbedifferent,asthesethreecasesprove.



Forthisreason,theremustbethreeadditionalmultiplyinstructionstoobtainthemost-signi2icanthalfoftheresult:

MULH –signedoperands MULHU –unsignedoperands MULHSU –onesignedoperandandoneunsignedoperand

Wemaywishtodetectover2low,i.e.,tomakesuretheresultissmallenoughto2itintothesamenumberofbitsastheoperands.Foranunsignedresult,wecanchecktomakesurethebitsofthemost-signi2icanthalfareallzero.Forasignedresult,wemustchecktomakesurethatthebitsofthemost-signi2icanthalfareallthesameasthesignbitoftheleast-signi2icanthalf.

Multiply–HighBits(Signed)

GeneralForm:MULH RegD,Reg1,Reg2

Example:MULH x4,x9,x13 # x4 = HighBits(x9*x13)

Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andthemost-signi2icanthalfoftheresultisplacedinRegD.Bothoperandsandtheresultareinterpretedassignedvalues.

Encoding:ThisisaR-typeinstruction.

Multiply–HighBits(Unsigned)

GeneralForm:MULHU RegD,Reg1,Reg2

Example:MULHU x4,x9,x13 # x4 = HighBits(x9*x13)

Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andthemost-signi2icanthalfoftheresultisplacedinRegD.Bothoperandsandtheresultareinterpretedasunsignedvalues.




Multiply–HighBits(SignedandUnsigned)

GeneralForm:MULHSU RegD,Reg1,Reg2

Example:MULHSU x4,x9,x13 # x4 = HighBits(x9*x13)

Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andthemost-signi2icanthalfoftheresultisplacedinRegD.Oneoperandisinterpretedassignedandoneoperandisinterpretedasunsignedandtheresultisinterpretedasasignedvalue.

Thespecsuggeststhat: Reg2=multiplier=signed Reg1=multiplicand=unsignedbutthisinterpretationisalsoapossibility??? Reg2=multiplier=unsigned Reg1=multiplicand=signed


RecommendedUsage:Typically,theprogrammerwillwanttoobtainthefullresultofamultiplication,i.e.,bothupperhalfandlowerhalves.Thisrequirestwomultiplyinstructions.

Forexample,thefollowingsequence

MULH x4,x9,x13 # compute upper halfMUL x5,x9,x13 # compute lower half

willplacetheresultintheregisterpairx4:x5.

Itisrecommendedthattheinstructionsalwaysbeplacedinthisorder,i.e.,withtheupperhalfcomputed2irstandthelowerhalfsecond.Insomeimplementations,theexecutionunitmayrecognizethiscommonprogrammingidiomand,atthemicro-architecturallevel,fusethesetwoinstructionsintoasinglemultiplicationoperation,therebyimprovingperformance.



AdditionalMultiplyInstructionFor64-bitand128-bitMachines

64-bitand128-bitmachinesalsoaddanadditionalmultiplyinstruction(MULW)toproperlyperform32-bitmultiplication,asitwouldbedoneona32-bitmachine.

Tounderstandwhythisisnecessary,considera64-bitmachinebeingusedtoexecute32-bitcodeoperatingon32-bitintegers.ARV64machinewillstoreevery32-bitvalueinthelower-orderbitsofa64-bitregister,withtheupper32bitscontainingthesignextensionofthelower-order32bits.

Butwhenwemultiplytwo32-bitvalues,theresultmightover2lowandbelargerthan32-bits.Theresultwedesireisthelow-order32bitsofthecorrectanswer,withasignextensionintheupper32bits.So,toproperlyemulatethebehaviorofa32-bitmultiply,wedon’tactuallywantthemathematicallycorrectresult.

Let’slookatanexample.Tomakeourexamplemanageable,let’sconsideramachinewith16-bitregisters,whicharebeingusedtocontain8-bitvalues(ratherthan64-bitregisterscontaining32-bitvalues).

Imaginethatwewishtomultiply117×94.Bothnumberscanberepresentedas8-bitsignedvalues,buttheresult(10,998)cannotbe.

0000 0000 0111 0101 = +117× 0000 0000 0101 1110 = +94 0010 1010 1111 0110 = +10,998

Thisisasituationwhereover2lowoccurs,inthesensethattheresultcannotberepresentedinonly8bits.Inordertoemulatean8-bitmachine,wewantthelow-order8bitsofthecorrectresult,withsignextensionintheupper8bits,likethis:

1111 1111 1111 0110

TheMULinstructionwouldgiveusthemathematicallycorrectresult,namely

0010 1010 1111 0110

Butwedon’twantthemathematicallycorrectresult.Insteadweneedanewinstructiontogiveustheappropriatevalueforemulatingthe8-bitmultiplyoperation,namelytherightlower-orderbits,withsignextensionintheupperpartoftheregister:



1111 1111 1111 0110

MultiplyWord

GeneralForm:MULW RegD,Reg1,Reg2

Example:MULW x4,x9,x13 # x4 = x9*x13

Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andtheresultisplacedinRegD.Onlythelowerorder32-bitsoftheresultareused;thelower32bitsaresignedextendedtothefulllengthoftheregister.

Comment:Thisinstructionisusedtoproperlyemulate32-bitmultiplicationona64-bitor128-bitmachine.

Notethatonlytheleast-signi2icant32bitsofReg1andReg2canpossiblyaffecttheresult.

Ifyouwanttheupper32-bitsofthefull64-bitresultusetheMULinstructionona64-bitmachine.

RV32/RV64/RV128:Thisinstructionisonlyavailableon64-bitand128-bitmachines.


Commentary:Considerthefollowingsequencewhenexecutedona32-bitmachine:

MULH x4,x9,x13 # compute upper halfMUL x5,x9,x13 # compute lower half

Itwillplacethe64-bitresultintheregisterpairx4:x5.

Nowconsiderexecutingthissamesequenceona64-bitmachine.Thex5registerwillcontainthefull64-bitvalue,nota32-bit,sign-extendedvalue.Thex4registerwillcontainnothingbutmeaninglesssignbits.



Previouslywesaidthat32-bitcodecouldbeexecutedona64-bitmachinewithnochange:Theideawasthattheupper32bitsoftheregistersaresimplyignored.Thisisaclearexception:inthissequencevalidbitsareplacedintheupper32bits.Therefore,thecodesequencetoperforma32-bitmultiplywith64-bitresultmustbedifferentforRV32andRV64machines.

AdditionalMultiplyInstructionFor128BitMachines

Itwouldseemlogicalfora“double”versionofMUL(withthename“MULD”)toexistforRV128machines,inanalogytotheMULWinstruction.Howeverthespecdoesnotmentionthisinstruction.

IntegerDivision–BasicConcepts

Withdivisionandremainderthereisalwaysquestionabouthownegativeoperandsaretreated.

Considerdividingabyn(thatis,a/n).

q=aDIVn #computequotientr=aREMn #computeremainder

Thereisagreementthatthequotient(q)andremainder(r)mustobeytheseequations:

a=nq+r |r|<|q|

Manylanguages(C,C++,Java)perform“truncateddivision”:

q=trunc(a/n)r=a-ntrunc(a/n)

whichproducestheseresults:

7 / 3 = 2 7 % 3 = 1-7 / 3 = -2 -7 % 3 = -1 7 / -3 = -2 7 % -3 = 1 -7 / -3 = 2 -7 % -3 = -1

Anotherreasonablede2initionis“2looreddivision”:



q=⌊a/n⌋r=a-n⌊a/n⌋

whichproducesthefollowingresults.Thedot(•)indicatesdifferenceswithtruncateddivision.

7 / 3 = 2 7 % 3 = 1-7 / 3 = -3 • -7 % 3 = 2 • 7 / -3 = -3 • 7 % -3 = -2 •-7 / -3 = 2 -7 % -3 = -1

Thereisalsoade2initioncalled“Euclideandivision”,inwhichtheremainderisnevernegative.Thedot(•)indicatesdifferenceswithboththepreviousde2initions.

7 / 3 = 2 7 % 3 = 1-7 / 3 = -3 -7 % 3 = 2 7 / -3 = -2 7 % -3 = 1-7 / -3 = 3 • -7 % -3 = 2 •

TheRISC-VspecmakesnocommitmentregardingtheexactmeaningofDIVandREM.ThefollowingquotefromWikipediaispertinent:

“…Euclideandivisionissuperiortotheotheronesintermsofregularityandusefulmathematicalproperties,although2looreddivision…isalsoagoodde2inition.Despiteitswidespreaduse,truncateddivisionisshowntobeinferiortotheotherde2initions.”

— DaanLeijen,DivisionandModulusforComputerScientists

DivisionErrorConditions

RISC-Vsupportsbothsigneddivision(i.e.,dividingasignednumberbyasignednumber)andunsigneddivision(i.e.,dividinganunsignednumberbyanunsignednumber).Withmultiplication,youcouldmixsignedandunsigned;notsowithdivision.

Concerningthesizesoftheresult,notethatthefollowingmusthold,since|n|≥1:

|q|≤|a||r|<|a|



Thus,iftheoperands(aandn)are32-bits,thentheresults(qandr)willalmostalways2itinto32-bits.

Thereisexactlyoneexceptioninwhichtheresultswillnot2it.Thisiscalled“divisionover2low”anditcanonlyoccurwithsigneddivision.

LetMrepresent-231,whichisthemostnegativenumberrepresentablein32-bits.IfwedivideMby-1,theresultis+231,whichisonegreaterthanthelargest32-bitnumber.

TheRISC-Vspeci2iesthatdivisionover2lowwillbehandledasfollows:

CorrectResult: q=+231 r=0 ActualResult: q=-231 r=0

Theotherwell-knownissueiswithdivisionbyzero.Thespecsaysthattheresultwillbe:

q=0xFFFFFFFF r=a

Thiscanbeinterpretedas:

SignedDivide-By-ZeroResult: q=-1 r=a

UnsignedDivide-By-ZeroResult: q=+232-1 i.e.,themaximumunsignednumber r=a

Ourdiscussionoferrorconditionsisfor32-bitdivision,butnaturallyextendsandappliesto64-bitand128-bitdivision.

Bothdivide-by-zeroanddivision-over2lowerrorsoccurwithoutcausingatraporexceptionandinstructionexecutioncontinueswithnoindication.Infactthereare



noarithmetictrapsorexceptions.Ifsomehigh-levellanguagemandatesthatdivide-by-zeromustbecaught,thenthecompilermustinsertasingleBRANCH-IF-ZEROinstruction.Theactualresultscomputedbythehardwarearechosenbecausethesearethevaluesthatnaturallyresultfromatypicalhardwareimplementation.

Ifthehigh-levellanguagemandatesthatdivision-over2lowshouldbecaught(andtheyalloughtto!)thenatleastoneadditionaltest-and-branchinstructionmustbeexecuted.

Divide(Signed)

GeneralForm:DIV RegD,Reg1,Reg2

Example:DIV x4,x9,x13 # x4 = x9 DIV x13

Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultaresignedvalues.

Comments:Divide-by-zeroanddivision-over2lowresultinmathematicallyincorrectresults.Seediscussionabove.


Divide(Unsigned)

GeneralForm:DIVU RegD,Reg1,Reg2

Example:DIVU x4,x9,x13 # x4 = x9 DIV x13

Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.

Comments:Divide-by-zeroproducesamathematicallyincorrectresult.Division-over2lowcannotoccur.Seediscussionabove.




Remainder(Signed)

GeneralForm:REM RegD,Reg1,Reg2

Example:REM x4,x9,x13 # x4 = x9 REM x13

Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultaresignedvalues.



Remainder(Unsigned)

GeneralForm:REMU RegD,Reg1,Reg2

Example:REMU x4,x9,x13 # x4 = x9 REM x13

Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.



RecommendedUsage:Oftentheprogrammerwillwanttoobtainboththequotientandremainder.ItisrecommendedthattheDIVbedone2irstandtheREMbedonesecond.

Forexample:



DIV x4,x9,x13 # x4 = x9 DIV x13REM x5,x9,x13 # x5 = x9 REM x13

Insomeimplementations,theexecutionunitmayrecognizethiscommonpatternandfusethesetwoinstructionsintoasingledivisionoperation,therebyimprovingperformance.

AdditionalDivideInstructionsFor64BitMachines

For64-bitmachines,thereare4additionalDIVIDE/REMAINDERinstructionstoperform32-bitoperations.Inotherwords,theabove4instructionswillperform32-bitoperationsonaRV32machinebutwillperform64-bitoperationsonanRV64machine.Thefollowing4instructionswillworkjustliketheaboveinstructionswouldworkona32-bitmachine,bysign-extendingeverythingupto64-bits.

DivideWord(Signed)

GeneralForm:DIVW RegD,Reg1,Reg2

Example:DIVW x4,x9,x13 # x4 = x9 DIV x13


Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultaresignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extendedto2illthedestinationregister.



DivideWord(Unsigned)

GeneralForm:DIVUW RegD,Reg1,Reg2



Example:DIVUW x4,x9,x13 # x4 = x9 DIV x13


Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extended(???)to2illthedestinationregister.



RemainderWord(Signed)

GeneralForm:REMW RegD,Reg1,Reg2

Example:REMW x4,x9,x13 # x4 = x9 REM x13


Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultaresignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extendedto2illthedestinationregister.



RemainderWord(Unsigned)

GeneralForm:REMUW RegD,Reg1,Reg2



Example:REMUW x4,x9,x13 # x4 = x9 REM x13


Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extended(???)to2illthedestinationregister.



AdditionalDivideInstructionsFor128BitMachines

Itwouldseemlogicalfor“double”versionsoftheseinstructionstoexistforRV128machines.Howeverthespecdoesnotmentionthese.


Chapter4:FloatingPointInstructions

FloatingPointing–ReviewofBasicConcepts

Thissectioncanbeskippedifyouarefamiliarwith2loatingpointarithmetic.Thediscussionhereisgeneral,andnotspeci2ictoRISC-V.

TheIEEE754-2008standarddescribeshow2loatingpointnumbersaretoberepresentedandhow2loatingpointoperationsaretobeexecutedbycomputers.

Thestandardiscomplicatedanddetailed.Thissectionismeanttobeanintroductionandisnotanexhaustivedescription.MostmodernprocessorISAsimplementtheIEEE754-2008speci2ication,butthespeci2icationhasoptionsandsomepartsarenotfullyimplementedonsomecomputers.

Thespeci2icationde2inestwoimportantdatatypes:

SinglePrecision(32-bit“2loat”values) DoublePrecision(64-bit“double”values)

Somecomputersimplementonlysingleprecision;othercomputersimplementboth.

Ineithercase,theideaistorepresentarealrationalnumberinawaysimilartoscienti2icnotation.Forexample,thefollowingnumberisgiveninscienti2icnotation:

6.022×1023(anapproximationtoAvogadro’sconstant)

Withonly32bitsofprecision(or64bitsfordoubleprecision),therearelimitstotheamountofprecisionandthesizeoftheexponents.Theavailablebitsareusedasfollows:



Numberofbitsusedfor… Single Double Sign 1 1 Exponent 8 11 Value 23 52 Total 32 64

Furthermore,with2loatingpointanumericalvalueisrepresentedinbinary(notdecimal)andthisintroducessomesubtletieswhengoingbackandforthbetweentheinternalbitpatternsanddecimalrepresentationswhichhumanscanread.

Everypositiveintegercanberepresentedwitha2initenumberofdigitsanda2initenumberofbits.Forexample,hereisthesamenumber,representedbothways.Ofcourse,thisnumberrequiresafewmorecharactersinbinary,buttherepresentedvalueisequal.

2,468 (decimal) 100110100100 (binary)

Wecommonlyrepresentrationalnumbersindecimalusinga“decimalpoint”,asin:

123.456

Wecanalsorepresentrationalnumbersinbinaryusinga“binarypoint”,asin:

101.0101

Withdecimalnumbers,thepositionofeachdigitisimportantandwetalkabouttheplacevalueofthedigits.Theplacevaluesareallpowersof10:

… 1000 100 10 1 1/10 1/100 1/100 1/1000 … … 103 102 101 100 10-1 10-2 10-3 10-4 …

Wecanusetheplacevaluestocomputethevalueofanumber,asyoulearnedinprimaryschool:

123.456 =(1×102)+(2×101)+(3×100)+(4×10-1)+(5×10-2)+(6×10-3) =(1×100)+(2×10)+(3×1)+(4×0.1)+(5×0.01)+(6×0.001)



Likewise,withbinarynumbers,thevalueofeachbitisscaledaccordingtotheplacevalueofthebit.Butwithbinary,theplacevaluesareallpowersof2:

… 8 4 2 1 1/2 1/4 1/8 1/16 … … 23 22 21 20 2-1 2-2 2-3 2-4 …

Usingthis,wecanconvertbinarynumbersintodecimalnumbers:

101.0101 =(1×22)+(0×21)+(1×20)+(0×2-1)+(1×2-2)+(0×2-3)+(1×2-4) =(1×4)+(0×2)+(1×1)+(0×1/2)+(1×1/4)+(0×1/8)+(1×1/16) =4+1+1/4+1/16 =5.3125

Somerationalnumbersrequireanin2initenumberofdigitsintheirdecimalrepresentation.Forexample:

1/3=0.33333…=0.3(3)*

Likewise,somerationalnumbersmayrequireanin2initenumberofbitsintheirbinaryrepresentation.Butregardlessofwhetherwerepresentarationalnumberindecimalorbinary,thein2initestringsofdigits/bitswillsettleintoasimplerepeatingpattern.Thisistrueofallrationalnumbers,butirrationalnumbers(e.g.,𝜋,√2)donothavesuchsimpledecimalorbinaryrepresentations.Neithertheirdecimalnortheirbinaryexpansionswilleverexhibitarepeatingpattern.

Somenumbersmanyhavea2initerepresentationindecimalbutrequireanin2initesequenceinbinary.Forexample,thefollowingnumber:

4.3

requiresanin2initebinaryexpansiontorepresentit,namely:

100.01001100110011…=100.01(0011)*

Itturnsoutthateverybinarynumberwithoutarepeatingpartcanberepresentedwitha2initenumberofdecimaldigits.Furthermore,thenumberofdigitstotherightofthedecimalpointwillneverexceedthenumberofplacestotherightofthebinarypoint.Forexample:



101.1101(binary)=5.8125(decimal)

Turningto2loatingpointrepresentation,wehavelimitednumberofbitsavailable,whichmeanswecannotaccommodatearbitraryprecision.Noteverynumberisrepresentable,sowemustroundnumberstoanearbynumberthatisrepresentable.

Forexample,thenumber6.022×1023canonlyberepresentedapproximately,eventhoughitappearsnottohaveagreatamountofprecision.Hereistheclosestnumberthatcanberepresentedusingasingleprecision2loatingpoint:

6.02200013124147498450944×1023

Ontheotherhand,itturnsoutthatthisnumber:

2.383496609792×1012

canberepresentedexactlyusingonly32bitsingleprecision.Thenextlargestvaluethatcanberepresentedexactlyhappenstobe:

2.383496871936×1012

WecanmakethefollowingstatementsaboutIEEE754-20082loatingpointnumberrepresentation:

• Every2loatingpointnumbershasasign.Everynumberiseitherpositiveornegative.

• Therearetworepresentationsforzero:positivezero(i.e.,+0.0)andnegativezero(i.e.,-0.0).

• Therearetworepresentationsofin2inity:positivein2inity(+∞or+inf)andnegativein2inity(-∞or-inf)

• Theexponentmaybepositiveornegative,allowingbothverylargenumbersandverysmallnumbers.

• Thereisaspecialrepresentationcalled“notanumber”(“NaN”).Thisvaluecanrepresentamissingvalueortheresultofaunde2inedoperation,suchasdivide



byzero.Insomeimplementationstherearetwovariations,called“quietNaN”and“signalingNaN”.

• Every32-bitinteger(i.e.,everyintegerintherange-2,147,483,648to+2,147,483,647)canberepresentedexactlywitha64bitdoubleprecision2loatingpointnumber,butnotwithasingleprecision2loat.Infact,theintegerrangeisalittlegreater:every54-bitintegercanberepresentedexactlyindoubleprecision2loatingpoint.Almostalllargerintegerswillgetrounded.

• Every25-bitinteger(i.e.,everyintegerintherange-16,777,216to+16,777,215)canberepresentedexactlywitha32bitsingleprecision2loatingpointnumber.Almostallintegersoutsideofthisrangewillgetrounded.

Hereistherangeofvaluesthatcanberepresented.(Weusedecimalnotationhereandapproximatetheexactvalues.)

SinglePrecision Largestvalue: ~3.40282347×10+38 Smallestvalueabovezero: ~1.17549440×10-38 Digitsofaccuracy: about7

DoublePrecision Largestvalue: ~1.7976931348623157×10+308 Smallestvalueabovezero: ~2.2250738585072014×10-308 Digitsofaccuracy: about16

Rememberthatnoteveryvalueintheaboverangescanberepresented.Why?Recallthereisacountablein2inityofrationalnumbersjustbetweenanytwonumbers,yetwithonly32or64bits,weonlyhaveasmallnumberofuniquerepresentations.

Floatingpointarithmeticismeanttomimicmathematicalarithmetic,butitmustberememberedthattheyareonlyapproximatelythesame:

• Theexactvalueorresultofanoperationisnotalwaysrepresentable,sothecomputedanswerisoftennotmathematicallycorrect.

• Floatingpointadditionisnotalwaysassociative,duetoroundingerrors.Thatis,(x+y)+zisnotalwaysequaltox+(y+z).



• Floatingpointmultiplicationisnotalwaysassociative.Thatis,(x*y)*zisnotalwaysequaltox*(y*z).

• Floatingpointmultiplicationdoesnotalwaysdistributeoveradditionwiththeexactsameresults.Thatis,x*(y+z)isnotalwaysequalto(x*y)+(x*z).

However,wecansaythis:

• Floatingpointadditionandmultiplicationarecommutative,likemath.Forexample,x+y=y+x,soyoudon’thavetoworryabouttheorderofoperandsforasingleoperation.

Herearethedifferenttypesofthingsthatcanberepresentedina2loatingpointbitpattern:

•Positivezero(+0.0) •Negativezero(-0.0) •Positivein2inity(+∞or+inf) •Negativein2inity(-∞or-inf) •Not-a-number(NaN) QuietNan(qNaN) SignalingNan(sNaN)

•Normalnumbers(or“normalizednumbers”)•Denormalizednumbers(or“denormals”)

Zero–PositiveandNegative

Thereareexactlytwowaystorepresentzero,oneispositiveandtheotherisnegative.Thisisunlikemath,wherethereisonlyasinglenumbercalledzeroanditisunsigned.

Herearesomeinterestingbehaviors:

1/+0yields+∞ 1/-0yields–∞ +0willnormallycompareasequalto-0(e.g.,the==inthe“C”language) Somelanguagesprovideawaytodistinguish+0and-0.

Thereareadditionalbehaviors,suchas:



-0/-∞yields+0

Although+0and-0maycompareasequal,theymayalsoresultindifferentoutcomesinsomecomputations.Thischallengesourunderstandingofthemeaningof“equal”,tosaytheleast.

Thebitpatternrepresentationofzerois:

single double +0.0 0x00000000 0x0000000000000000 -0.0 0x80000000 0x8000000000000000

Notethatthe2loatingpointrepresentationfor+0.0isbit-for-bitidenticaltotherepresentationfor0inbinaryintegerrepresentation(bothsignedandunsigned).Also,-0.0isrepresentedidenticallytothemostnegativesignedinteger.

NotaNumber-NaN

Operationslikethefollowingaremathematicallyunde2inedand,whenattempted,willresultinaNaNresult,toindicatethattheresultisunde2ined.

0/0 ∞/∞ 0*∞

Otheroperationsaremathematicallyde2inedbutgiveacomplexresult.Complexnumbersarenothandledby2loatingpoint,sooperationssuchasthefollowingwillreturnNaN.

Squarerootofanegativenumber Logofanegativenumber

AnotheruseofNaNistorepresentanuninitializedormissingvalue.Ifavariableisusedbeforeitisinitialized,spuriousincorrectresultsmightoccur,butthiscanbeavoidedifthevariablecontainsNaN.

TheIEEEspecactuallymentionstwokindsofNaN:“signalingNaN”and“quietNaN”.Butusually,wejusttalkaboutNaNwithoutmakinganydistinctionaboutwhetheritissignalingorquiet.



A“signalingNaN”issupposedtocauseabreakinthe2lowofexecutionwhenitisencounteredinacomputation.Thatis,atraporexceptionofsomesortwilloccur,andthenormalinstructionsequencewillbeinterruptedimmediately.SignalingNaNsmightreasonablyusedforuninitializedvalues:theirusemayrepresentaprogrambugwhichneedsattention.Intheory,signalingNaNsmightalsobeusedasplaceholdersforvalues(suchascomplexnumbers)whichrequirespecialhandling.

Theideawitha“quietNaN”,isthatitcanbeusedasanoperandinarithmeticoperations.Furthermore,aquietNaNwillbepropagated.Thatis,ifoneoftheargumentsisaquietNaN,theresultwillalsobeaquietNaN.Thisallowsalengthysequenceofoperationstobeperformedquicklywithnospecialtestingforproblems.OnceaNaNappears,asaresultofsomeerror,itwillpersistinthechainofcomputations.Eachsubsequentoperationwillcompletenormally,withoutcausinganexceptionortrapeventhoughsomesortoferroroccurredearlierinthesequence.Ifanyproblemsoccuratanystepofthecomputation,the2inalresultwillbeaquietNaN.Therefore,itissuf2icienttoperformonlyasingletestforNaNaftertheentirecomputationtoseeifanyerrorsaroseatanystageofthecomputation.

ThespecdoesnotrequiresignalingNaNs;theyareoptional.OneimplementationapproachisforthehardwaretointerpretallNaNvaluesidentically,basicallyasquietNaNs.

ThereareseveralbitpatternsthatcanbeusedtorepresentNaNs,sothereisnotasinglebitpatternforNaN.

Avalueisde2inedtorepresentNaNif(1)theexponent2ieldisall1’s,and(2)thebitsofthefraction2ieldarenotallzero.(Ifthefractionbitsareallzero,thenthevalueiseither+∞or–∞.)ThesignbitofaNaNvalueisignored.

IfadistinctionbetweenquietandsignalingNaNisimplemented,thenoneofthebitsinthefraction2ieldwillbeusedtodistinguishbetweenquietandsignaling.

TheexactbitpatternsforNaNarenotfullyspeci2iedandcanvarybetweenimplementations.

Wecansaythatavaluewithallbitssettoone(i.e.,therepresentationforthesignedinteger-1whichis0xFFFFFFFFor0xFFFFFFFFFFFFFFFF)willde2initelyrepresentaNaNandwillalmostcertainlyrepresentaquietNaN.Forexample,thisall-onespatternwillbeaquietNaNforIntel,AMD,SPARC,ARM,RISC-V,etc.



MixingSingleandDoublePrecisionUsingNaN

Therearemanybitsinthefraction2ield,andtheonlyrequirementforNaNisthattheycannotallbezero.Thus,thereisroomtostoresomeadditionaldatawithintheNaN.SoaNaNcancarryasortof“payload”valueinthefractionbits.ThiscapabilitymayormaynotbeusedinaparticularimplementationofIEEE754-2008.

Forexample,thefraction2ieldinadoubleis52bits.AssumethatonebitisreservedtobealwayssettoindicatethatthisisaNaN,andassumethatasecondbitisreservedandusedtodistinguishbetweenaquietNaNandasignalingNaN.Thisleaves50bitsthatcanbeusedtostoreaarbitraryvalue.Noticethatthisisenoughroomtostoreanentiresingleprecision2loatingpointnumber.

Imagineamachinethatimplementsdoubleprecisionarithmeticanduses64-bitregisterstostore2loatingpointvalues.Howmightthismachinestore32-bitsingleprecisionvaluesinthesesameregisters?

Any64-bitvalueinwhichthehighorder32bitsareset,willbealwaysrecognizedasaNaN.Oneapproachtostoringasingleprecisionvalueina64bitregisteristostorethesingleprecisionvalueintheleastsigni2icantbits32bitsandall1sinthemostsigni2icant32bits.

Allsingleprecisionoperationswillonlylookattheleastsigni2icant32-bitsoftheoperandsand,fortheresultvalue,willalwayssetthemost-signi2icant32bitsto1s.

Anyaccidentalattempttoperformadoubleprecisionoperationonaregistercontainingasingleprecisionvalue,willinterpretthatoperandasaNaN.

NormalandDenormalizedNumbers

Noteverynumberisrepresentableandtherepresentablenumbersarespacedoutonthenumberline.Soeachpossible2loatingpointvalueisseparatedbyanumericaldistancefromthenextsmallestnumberandfromthenextlargestnumber.Asthenumbersgetsmallerandclosertozero,thespacinggetssmallerandthenumbersareclosertogether.Asthenumbersgetlarger,thespacingisfartherapart.

Forexample,thefollowingnumbersdifferbyaverysmallamount:



4.567×10-25 4.568×10-25

Ontheotherhand,thesetwonumbersdifferbyaverylargeamount:

4.567×10+25 4.568×10+25

Howeverinbothexamplesabove,theaccuracyisthesame:4digitsofprecision.

However,thereareonlyalimitednumberofbitsavailabletorepresenttheexponents.Exponentscannotcontinuetogetmorenegativeandwecannotrepresentsmallerandsmallernumbers,evermoreclosetozero.Therefore,thispatternofthe2loatingpointnumbersbecomingspacedevermorecloselyastheygetcloserandclosertozerocannotcontinue.Somethinghastochangeasthenumbersgetsmallerandapproachzero.

Whathappensisthatbelowsomesize,therepresentablevaluesaresimplyspaceduniformlyallthewaydowntozero.Thisistheroleofdenormalizednumbers.

Most2loatingpointnumbersare“normal”numbers.Normalnumbershaveabout7digitsofaccuracy(forsingleprecision)and16digitsofaccuracy(fordoubleprecision).Inotherwords,wecanapproximateanydesiredvaluewithabout7(or16)digitsofaccuracy.

Anotherwaytolookatdenormalizednumbersisthis:Forverysmallvalues,wecannotapproximatethevaluewithfullaccuracy.Aswegetcloserandclosertozero,wecanapproximatethetruevaluewithfewerandfewerplacesofaccuracy.Forreallytinyvalues,wemayevenbeforcedtouse0.0torepresentthevalue,essentiallylosingallaccuracy.

Wecanmakethefollowingstatementsaboutdenormalizednumbers:

• Alldenormalizednumbersareveryclosetozero.• Denormalizednumbersextendonboththepositiveandnegativesidesofzero.

• +0.0and-0.0arethemselvesrepresentedasdenormalizednumbers.• Alldenormalizednumbersareregularlyandevenlyspaced.(Exception:+0.0and-0.0haveanin2initesimaldifferenceandareconsideredequal.)



• Thelargestdenormalizednumberisjustlessthanthesmallestpositivenormalnumber.

• Likewise,themostnegativedenormalizednumberisjustgreaterthantheleastnegativenormalnumber.

• Itisgenerallysafetoignorethedistinctionbetweennormalizedanddenormalizednumberswhenusing2loatingpointinyourapplications.

Therearerulesfordeterminingtheprecisionoftheresultsofanarithmeticcalculationinvolvingscienti2icnotation.Butifverysmallvalues(i.e.,denormalizednumbers)ariseduringacomputation,thenyourassumptionsaboutprecisionwillbeviolatedandthe2inalresultswillhavereducedprecision.Insomecases,the2inalresultwillbeameaningless,incorrectvalue.

Warning:AlwaysrememberthatnumbersasrepresentedincomputersareNOTtruemathematicalnumbers.ComputerarithmeticisNOTmathematicalarithmetic.Remember:“int”sarenotintegersand“2loats”arenotrealorrationalnumbers.

Computervaluesandcomputationaremereapproximationsofmathematicallypureideals.Agoodprogrammerknowshowimportantitistounderstandandremembertheirdifferencesincreatingreliablesoftware.

RepresentationofSinglePrecisionFloatingPointValues

Asingleprecision2loatingpointnumberisrepresentedwitha32-bitwordasshownhere:

�

LetNbethenumberrepresented.Let“sign”bethemostsigni2icantbit.Let“exponent”bethebitpatterninbits30:23.Let“fraction”bethebitpatterninbits22:0.

Thenumberrepresentedis:

N=(-1)sign×1.fraction×2exponent



The2irsttermsimplygivesthesignofthenumber:0=positiveand1=negative.Notethatthemostsigni2icantbitholdsthesignbitforboth2loatingpointnumbersandsignedintegers.

Whenconvertinganumbersuchas101.0101into2loatingpointformat,youshould2irstshiftthedecimalplacetojustaftertheleftmost1bit.Everynumber(exceptzero)willalwayscontainatleastasingle1bit.Thus,themostsigni2icantbitmustbea1andrepresentingitisredundant.Thisexplainswhywepre2ixthefractionalpartwith“1.”.(Thistrickofmakingonebitimplicitdoesn’tworkwithdecimalnumbers:theleadingdigitcanbeanythingexcept0,sowecannotmakeitimplicit.)

Thereare8bitsintheexponent2ield.Theinterpretationofthe“exponent”bitpatternsis:

BitPattern MeaningofExponentField 0000 0000 DenormalizedNumbers,includingzero 0000 0001 -126 ... ... 0111 1110 -1 0111 1111 0 1000 0000 +1 ... ... 1111 1110 +127 1111 1111 InZinity,Not-a-Number

Fornormalizednumbers,theexponenthasaneffectiverangeof-126..+127.

Thesmallestpositivenormalizednumberis:

Inbinary: 1.00000000000000000000000×2-126 (Thereare23zeros)

Inbits: 0x00800000=00000000100000000000000000000000

Decimalapproximation: 1.17549435×10-38



Thelargestnormalizednumberis:

Inbinary: 1.11111111111111111111111×2+127 (Thereare1+23ones)

Inbits: 0x7F7FFFFF=01111111011111111111111111111111

Decimalapproximation: 3.4028235×10+38

Iftheexponentisallones(i.e.,11111111),thenthevalueofthefractionmatters.Ifthefractionisallzeros,thenthevalueis+∞or–∞dependingonthesignbit.

+∞: 0x7F800000=01111111100000000000000000000000

-∞: 0xFF800000=11111111100000000000000000000000

Iftheexponentisallones(i.e.,11111111)andthevalueofthefractionisnotallzeros,thenNaNisrepresented.TherearemultiplerepresentationsthataretobeinterpretedasNaNvalues.Thecanonical,preferredrepresentationofNaNisoftenthis:

NaN(typical): 0xFFFFFFFF=11111111111111111111111111111111

Iftheexponent2ieldisallzeros(i.e.,00000000),thenthevalueisadenormalizednumber.Thevalueofthenumberis:

N=(-1)sign×0.fraction×2-126

Noticethattheleadingimplicit“1”bitisnolongerassumed;itisnow“0”.Alsotheexponentisalways-126,whichhappenstobethesmallestexponentfornormalizednumbers.

Herearesomesamplenumbersthatmayhelpexplaindenormalizednumbers:



Smallestnormalizednumber: 1.00000000000000000000000×2-126 (24bitsofprecision) Largestdenormalizednumber: 0.11111111111111111111111×2-126 (23bitsofprecision) … Randomdenormalizednumber: 0.00000000001100101110101×2-126 (13bitsofprecision) … Smallestdenormalizednumber: 0.00000000000000000000001×2-126 (1bitofprecision) +0.0: 0.00000000000000000000000×2-126 (0bitsofprecision)

RepresentationofDoublePrecisionFloatingPointValues

Doubleprecision2loatingpointnumbersarerepresentedusingananalogousscheme.Theonlydifferenceisthenumberofbitsinthe“exponent”and“fraction”2ields.

Hereistherepresentationofa64-bitdoubleprecision2loatingpointvalue:

�

Thereare11bitsintheexponent2ield,insteadof8asinsingleprecision.Theinterpretationofthe“exponent”bitpatternsis:



BitPattern MeaningofExponentField 000 0000 0000 DenormalizedNumbers,includingzero 000 0000 0001 -1022 ... ... 011 1111 1110 -1 011 1111 1111 0 100 0000 0000 +1 ... ... 111 1111 1110 +1023 111 1111 1111 InZinity,Not-a-Number

Fornormalizednumbers,theexponenthasaneffectiverangeof-1022..+1023.

Thesmallestpositivenormalizednumberis:

Inbinary: 1.000000000...00000000000×2-1022 (Thereare52zeros)

Inbits: 0x0010000000000000


Thelargestnormalizednumberis:

Inbinary: 1.111111111...11111111111×2+1023 (Thereare1+52ones)

Inbits: 0x7FEFFFFFFFFFFFFF

Decimalapproximation: 1.7976931348623157×10+308

Thesmallestdenormalizednumberis:

Inbinary: 0.000000000...00000000001×2-1022

Inbits: 0x0000000000000001




OtherFloatingPointSizes

Inadditiontothewellknownsingleprecision(32-bit)anddoubleprecision(64-bit)sizes,theIEEE754-2008standardalsodescribesthese2loatingpointsizes.Theyaremuchlesscommon.

16bits(halfprecision) 128bits(quadrupleprecision) 256bits(octupleprecision)

Thereisalsomentionofdecimal-basedrepresentations.

Rounding

Theresultofacomputationmaybeanumberthatisnotpreciselyrepresentable.Forexample,theadditionoftwodoublenumbersmaynotbeexactlyrepresentableasadouble.

Forexample,herearetwonumberswith5digitsofprecision.Theirsumhas8digitsofprecision.

12.345 =1.2345×101 + .067891 =6.7891×10-2 12.412891 =1.2412891×101

TheIEEEspecsaysthattheexactresultshouldbe“rounded”toanumberthatcanberepresented.Forexample,whentwodoublesareadded,theirresultwillbesomedoublevalue.

Intheaboveexample,tobringtheresultbackdownto5digitsofprecision,someaccuracymustbesacri2iced.Likewise,when2loatingpointoperationsareperformed,theremaybealossofaccuracyasaresultofrounding.

TheIEEEspeclistsseveralwaysthatavaluecanberoundedtosomethingthatcanberepresented:



•Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) •Roundtowardzero(i.e.,truncate) •Roundtowardpositivein2inity(i.e.,roundup) •Roundtowardnegativein2inity(i.e.,rounddown)

Inordertoperformroundingcorrectly,acomputermayneedtoperformcalculations(e.g.,multiplication)withgreaterprecisionto2irstcomputethecorrectvalue.Then,asthe2inalstepinthecalculation,thevaluemustbeproperlyroundedto2itintotheavailable2loatingpointbits.

Hereisanotherexample,showingthattheentireeffectofanoperationcanbelostastheresultofrounding. 12.345 =1.2345×101 + .00000067891 =6.7891×10-7 12.34500067891 =1.234500067891×101

Roundingtoavaluewiththesameprecision(eitherroundingdown,roundingtozero,orroundingtothenearest)givestheinitialoperand,unchanged.Hereistheroundedresult:

12.345 =1.2345×101

TheFloatingPointExtensions

TheRISC-Vspecdescribestheseextensionstosupport2loatingpointarithmetic

F–Singleprecision2loatingpoint(32bitvalues) D–Doubleprecision2loatingpoint(64bitvalues) Q–Quadprecision2loatingpoint(128bitvalues)

The“D”extensionisasupersetof“F”;whendoubleprecisionisimplemented,allinstructionsoperatingonsingleprecisionvalueswillalsobeincluded.

Likewise,the“Q”isasupersetof“D”;whenthe“Q”extensionisimplemented,all“F”and“D”instructionswillalsobeimplemented.



Inthisdocument,weintermixthedescriptionsof“F”,“D”,and“Q”extensions,ratherthandescribeoneaftertheother.

FloatingPointRegisters

Thereare322loatingpointregisters,namedf0,f1,…f31.

Inthe“F”extension,eachregistercanholdonesingleprecision2loatingpointvalue.Thus,eachregisteris32bitswide.Allregistersfunctionidentically;thereisnothingspecialaboutf0,asthereiswithx0oftheintegerregisters.

Inthe“D”extension,theseregistersareall64bitswideandcanholdeitherasingleprecisionvalueoradoubleprecisionvalue.Iftheregistercontainsasingleprecision2loatingpoint,thenthemostsigni2icant32bitsoftheregisterwillallbe1.Wheninterpretedasadouble,suchavaluewillberecognizedasaNaN.

Inthe“Q”extension,theregistersare128bitswide.Eachregistercanholdeitherasingle,double,orquadprecision2loatingpointvalue.Inthecaseofsmallervalues,theupperbitswillbesetto1sothatthevaluewillappeartobeNaNifinterpretedasalargerprecisionvalue.

Inadditionthereisa“FloatingPointControlandStatusRegister”(called“FCSR”).

TheFCSRisdividedintothefollowing2ields:

Bits Widthinbits Description 0 1 NX:Inexact 1 1 UF:Under2low 2 1 OF:Over2low 3 1 DZ:DivideByZero 4 1 NV:InvalidOperation 5:7 3 FloatingPointRoundingMode(FRM) 8:31 24 (unused)

TheFloatingPointControlandStatusRegister(FCSR)isoneofthe“ControlandStatusRegisters(CSRs),whicharediscussedelsewhereinthisdocument.Thisparticularregisterisfreelyaccessibleregardlessoftheoperatingmode(user,supervisor,ormachinemode),sothisneednotbeaconcernhere.



TherearesomegeneralpurposeinstructionsusedforreadingandwritingtheCSRsandthesearediscussedelsewhere.

Collectively,the5singlebit2ieldsarereferredtoasthe“FloatingPointFlags”(FFLAGS):

FFLAGS(FloatingPointFlags): NX:Inexact UF:Under2low OF:Over2low DZ:DivideByZero NV:InvalidOperation

These52lagscanberesettozerobywritingtotheFloatingPointControlandStatusRegister(FCSR).Theymaybesettoonefromtime-to-timeas2loatingpointinstructionsareexecuted.Onceset,theywillremainset.Therefore,thesebitscanbequeriedafterasequenceofinstructionstodetermineifanyofseveralunusualconditionshasoccurredduringthepreviousinstructionsequence.

Themeaningofeachbitisstraightforward.

Iftheresultofanoperationhadtoberounded,thenthe“NX:Inexact”bitwillbeset.Iftheresultistoosmallto2itinanormalizedformandisalsoinexact,thenthe“UF:Under2low”bitwillbesetandtheresultwillbereturnedasadenormalizedvalue.Iftheresultistoolargetoberepresented,thenthe“OF:Over2low”bitwillbesetand+∞or-∞willusuallybereturnedastheresult.The“DZ:DivideByZero”bitissetforoperationslike1/0andlog(0)and+∞or-∞willbereturnedastheresult.Certainoperationsareconsideredinvalid,suchas“squarerootofanegativenumber”.The“NV:InvalidOperation”bitwillbesetandNaN(bydefault,quietNaN)willbereturned.

FloatingPointinstructionsthathaveproblemswillsettheFFLAGSbitsbutwillnevercauseatraporexception.Instead,instructionexecutionwillcontinueuninterrupted.

IftheresultofaninstructionisNaN,thenaquietNaNwillbeinserted,unlessotherwisedocumented.RISC-VwillinsertthefollowingrepresentationforaquietNaN:



SinglePrecisionquietNaN: 0x7FC00000=0 11111111 10000000000000000000000

The“unused”zerobitsmaybeusedforotherRISC-Vextensions,butnormallytheywillappeartobezerowhenread.Floatingpointcodeshouldignorethevalueofthis2ieldandnotattempttochangeit.

Thereareseveraldifferentwaysthata2loatingpointresultcanberoundedto2itintotheavailableprecision:

•Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) •Roundtowardzero(i.e.,truncate) •Roundtowardnegativein2inity(i.e.,rounddown) •Roundtowardpositivein2inity(i.e.,roundup)

Theroundingmodecanbeeitherstaticordynamic.

Thereisa“RoundingMode”(RM)2ieldineach2loatingpointinstruction.Instaticmode,theroundingmethodtobeusedisencodedintheRMbitswithintheinstruction,accordingtothesecodes:

000 Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) 001 Roundtowardzero(i.e.,truncate) 010 Roundtowardnegativein2inity(i.e.,rounddown) 011 Roundtowardpositivein2inity(i.e.,roundup) 100 Roundtothenearestnumber (Foratie,roundawayfromzero.) 101 invalid 110 invalid 111 Usedynamicmode:ConsulttheCSRfortheroundingmethod.

Withthedynamicmode,theroundingmodewillbedeterminedbytheFloatingPointRoundingMode(FRM)bitsintheFloatingPointControlandStatusRegister(FCSR).IftheinstructionsRM2ieldcontains111,thentheroundingmodetobeusedisdetermineddynamicallywhentheinstructionisexecutedbyconsultingtheFCSR.TheFRMbitsintheFCSRtellwhichroundingmethodtouse,accordingtothefollowingtable.Thistableisthesameasabove,exceptforthelastline.



000 Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) 001 Roundtowardzero(i.e.,truncate) 010 Roundtowardnegativein2inity(i.e.,rounddown) 011 Roundtowardpositivein2inity(i.e.,roundup) 100 Roundtothenearestnumber (Foratie,roundawayfromzero.) 101 invalid 110 invalid 111 invalid

Ifa2loatingpointinstructionwillnotneedtoperformroundingbutcontainsanRM2ield(forexample,theFNEGinstruction),theRM2ieldshouldbesetto000.

TheRISC-Vspecdoesnotindicateorsuggesthowtheroundingmodeshouldbewritteninassemblycode.???

The2lagsportion(NX,UF,OF,DZ,NV)oftheFloatingPointStatusRegister(FCSR)isaccessibleandmirroredinasecondCSRcalled(FFLAGS).Likewise,theroundingmodebitsoftheFloatingPointStatusRegister(FCSR)areaccessibleandmirroredinathirdCSRcalled(FRM).

ControlandStatusRegistersarecoveredelsewhere,buttheregisterconcernedwiththe2loatingpointextensionare:

CSRAddr Name Description001 f=lags Floatingpointing2lags002 frm Dynamicroundingmode003 fcsr Concatenationoffrm+f2lags

Theseregsareread/writeinanyprivilegemode.

Separateassemblerinstructionsarede2inedtoaccesstheseregisters,butthesearejustshorthand(syntacticsugar)forother,moregeneralinstructions.Theinstructionsare:

FRFLAGS-ReadtheFFLAGSregisterFRFLAGS RegD RegD=CSR[FFLAGS]

FSFLAGS-SwaptheFFLAGSregisterFSFLAGS RegD,Reg1 RegD=CSR[FFLAGS];CSR[FFLAGS]=Reg1



FRRM-ReadtheFRM(RoundingMode)registerFRRM RegD RegD=CSR[FRM]

FSRM-SwaptheFRM(RoundingMode)registerFSRM RegD,Reg1 RegD=CSR[FRM];CSR[FRM]=Reg1

Not-A-Number-NaN

Thespecsaysthatthe“canonicalNaN”is0x7fc00000.Presumably,theymeanaquietNaN.TheRISC-VspecmentionstheexistenceofasignalingNaN,sayingthatifoneisencounteredinanoperation,aninvalidinstructionexceptionwilloccur.However,thespecdoesnotsayhowasignalingNaNisdistinguished.???

FloatingPointLoadandStoreInstructions

Thefollowinginstructionswillmovedatabetweenmemoryanda2loatingpointregister.

WeusethenotationFReg1,FReg2,andFRegDtoindicate2loatingpointregisters,justasweuseReg1,Reg2,andRegDtoindicateintegerregisters.

FloatingLoad(Word)

GeneralForm:FLW FRegD,Immed-12(Reg1)

Example:FLW f4,1234(x9) # f4 = Mem[x9+1234]

Description:A32-bitvalueisfetchedfrommemoryandmovedinto2loatingregisterFRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.






RISC-VExtensions:Thisinstructionrequiresthe“F”extension.


FloatingLoad(Double)

GeneralForm:FLD FRegD,Immed-12(Reg1)

Example:FLD f4,1234(x9) # f4 = Mem[x9+1234]

Description:A64-bitvalueisfetchedfrommemoryandmovedinto2loatingregisterFRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.




RISC-VExtensions:Thisinstructionrequiresthe“D”extension.


FloatingLoad(Quad)

GeneralForm:FLQ FRegD,Immed-12(Reg1)



Comment: Analogous;Onlyavailablein“Q”quadprecision2loatingpointextension.

FloatingStore(Word)

GeneralForm:FSW FReg2,Immed-12(Reg1)

Example:FSW f4,1234(x9),f4 # Mem[x9+1234] = f4

Description:A32-bitvalueiscopiedfromregisterFReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.




RISC-VExtensions:Thisinstructionrequiresthe“F”extension.


FloatingStore(Double)

GeneralForm:FSD FReg2,Immed-12(Reg1)

Example:FSD f4,1234(x9) # Mem[x9+1234] = f4

Description:A64-bitvalueiscopiedfromregisterFReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.






RISC-VExtensions:Thisinstructionrequiresthe“D”extension.


FloatingStore(Quad)

GeneralForm:FSQ FReg2,Immed-12(Reg1)


BasicFloatingPointArithmeticInstructions

FloatingAdd

GeneralForm:FADD.S FRegD,FReg1,FReg2 (single precision)FADD.D FRegD,FReg1,FReg2 (double precision)FADD.Q FRegD,FReg1,FReg2 (quad precision)

Example:FADD.S f4,f9,f13 # f4 = f9+f13 (32 bits)FADD.D f4,f9,f13 # f4 = f9+f13 (64 bits)FADD.Q f4,f9,f13 # f4 = f9+f13 (128 bits)



Description:ThevalueinFReg1isaddedtothevalueinFReg2andtheresultisplacedinFRegD.

RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.

RISC-VExtensions:FADD.Srequiresthe“F”extension.FADD.Drequiresthe“D”extension.FADD.Qrequiresthe“Q”extension.


FloatingSubtract

GeneralForm:FSUB.S FRegD,FReg1,FReg2 (single precision)FSUB.D FRegD,FReg1,FReg2 (double precision)FSUB.Q FRegD,FReg1,FReg2 (quad precision)

Example:FSUB.S f4,f9,f13 # f4 = f9-f13 (32 bits)FSUB.D f4,f9,f13 # f4 = f9-f13 (64 bits)FSUB.Q f4,f9,f13 # f4 = f9-f13 (128 bits)

Description:ThevalueinFReg1issubtractedfromthevalueinFReg2andtheresultisplacedinFRegD.


RISC-VExtensions:FSUB.Srequiresthe“F”extension.FSUB.Drequiresthe“D”extension.FSUB.Qrequiresthe“Q”extension.




FloatingMultiply

GeneralForm:FMUL.S FRegD,FReg1,FReg2 (single precision)FMUL.D FRegD,FReg1,FReg2 (double precision)FMUL.Q FRegD,FReg1,FReg2 (quad precision)

Example:FMUL.S f4,f9,f13 # f4 = f9*f13 (32 bits)FMUL.D f4,f9,f13 # f4 = f9*f13 (64 bits)FMUL.Q f4,f9,f13 # f4 = f9*f13 (128 bits)

Description:ThevalueinFReg1ismultipliedbythevalueinFReg2andtheresultisplacedinFRegD.


RISC-VExtensions:FMUL.Srequiresthe“F”extension.FMUL.Drequiresthe“D”extension.FMUL.Qrequiresthe“Q”extension.


FloatingDivide

GeneralForm:FDIV.S FRegD,FReg1,FReg2 (single precision)FDIV.D FRegD,FReg1,FReg2 (double precision)FDIV.Q FRegD,FReg1,FReg2 (quad precision)

Example:FDIV.S f4,f9,f13 # f4 = f9/f13 (32 bits)FDIV.D f4,f9,f13 # f4 = f9/f13 (64 bits)FDIV.Q f4,f9,f13 # f4 = f9/f13 (128 bits)

Description:ThevalueinFReg1isdividedbythevalueinFReg2andtheresultisplacedinFRegD.




RISC-VExtensions:FDIV.Srequiresthe“F”extension.FDIV.Drequiresthe“D”extension.FDIV.Qrequiresthe“Q”extension.


FloatingMinimum

GeneralForm:FMIN.S FRegD,FReg1,FReg2 (single precision)FMIN.D FRegD,FReg1,FReg2 (double precision)FMIN.Q FRegD,FReg1,FReg2 (quad precision)

Example:FMIN.S f4,f9,f13 # f4 = Min(f9,f13) (32 bits)FMIN.D f4,f9,f13 # f4 = Min(f9,f13) (64 bits)FMIN.Q f4,f9,f13 # f4 = Min(f9,f13) (128 bits)

Description:ThevaluesinFReg1andFReg2arecomparedandthesmalleroneisplacedinFRegD.

RISC-VExtensions:FMIN.Srequiresthe“F”extension.FMIN.Drequiresthe“D”extension.FMIN.Qrequiresthe“Q”extension.


FloatingMaximum

GeneralForm:FMAX.S FRegD,FReg1,FReg2 (single precision)FMAX.D FRegD,FReg1,FReg2 (double precision)FMAX.Q FRegD,FReg1,FReg2 (quad precision)

Example:FMAX.S f4,f9,f13 # f4 = Max(f9,f13) (32 bits)



FMAX.D f4,f9,f13 # f4 = Max(f9,f13) (64 bits)FMAX.Q f4,f9,f13 # f4 = Max(f9,f13) (128 bits)

Description:ThevaluesinFReg1andFReg2arecomparedandthelargeroneisplacedinFRegD.

RISC-VExtensions:FMAX.Srequiresthe“F”extension.FMAX.Drequiresthe“D”extension.FMAX.Qrequiresthe“Q”extension.


FloatingSquareRoot

GeneralForm:FSQRT.S FRegD,FReg1 (single precision)FSQRT.D FRegD,FReg1 (double precision)FSQRT.Q FRegD,FReg1 (quad precision)

Example:FSQRT.S f4,f9 # f4 = sqrt(f9) (32 bits)FSQRT.D f4,f9 # f4 = sqrt(f9) (64 bits)FSQRT.Q f4,f9 # f4 = sqrt(f9) (128 bits)

Description:ThesquarerootofthevalueinFReg1iscomputedandtheresultisplacedinFRegD.


RISC-VExtensions:FSQRT.Srequiresthe“F”extension.FSQRT.Drequiresthe“D”extension.FSQRT.Qrequiresthe“Q”extension.

Encoding: ThisisaR-typeinstruction,inwhichtheReg22ieldisallzeros.



FloatingFusedMultiply-Add

GeneralForm:FMADD.S FRegD,FReg1,FReg2,FReg3 (single precision)FNMADD.S FRegD,FReg1,FReg2,FReg3 (single precision)FMADD.D FRegD,FReg1,FReg2,FReg3 (double precision)FNMADD.D FRegD,FReg1,FReg2,FReg3 (double precision)

Example:FMADD.S f2,f5,f6,f7 # f2 = (f5*f6)+f7FNMADD.S f2,f5,f6,f7 # f2 = (-(f5*f6))+f7FMADD.D f2,f5,f6,f7 # f2 = (f5*f6)+f7FNMADD.D f2,f5,f6,f7 # f2 = (-(f5*f6))+f7

Description:Thisinstructionperformsamultiplicationandanaddition.Avariation(whichisshownherewiththeopcodeFNMADD)willoptionallynegatetheproductbeforetheaddition.Seethespecconcerningcaseswhentheargumentsare0,∞,andNaN.


RISC-VExtensions:FMADD.SandFNMADD.Srequirethe“F”extension.FMADD.DandFNMADD.Drequirethe“D”extension.

Encoding: Thisinstructionisunusualinthatithasfouroperands,allofwhichare

registers.Thus,itdoesnot2itintoanyoftheinstructionformatsthatweredescribedearlier.ThisinstructiontypeiscalledanR4-typeinstruction.TheR4-typeinstructionformatisusedonlyforthisandthefusedmultiply-subtractinstructions.

FloatingFusedMultiply-Subtract

GeneralForm:FMSUB.S FRegD,FReg1,FReg2,FReg3 (single precision)FNMSUB.S FRegD,FReg1,FReg2,FReg3 (single precision)FMSUB.D FRegD,FReg1,FReg2,FReg3 (double precision)FNMSUB.D FRegD,FReg1,FReg2,FReg3 (double precision)



Example:FMSUB.S f2,f5,f6,f7 # f2 = (f5*f6)-f7FNMSUB.S f2,f5,f6,f7 # f2 = (-(f5*f6))-f7FMSUB.D f2,f5,f6,f7 # f2 = (f5*f6)-f7FNMSUB.D f2,f5,f6,f7 # f2 = (-(f5*f6))-f7

Description:Thisinstructionperformsamultiplicationandasubtraction.Avariation(whichisshownherewiththeopcodeFNMSUB)willoptionallynegatetheproductbeforethesubtraction.Seethespecconcerningcaseswhentheargumentsare0,∞,andNaN.


RISC-VExtensions:FMSUB.SandFNMSUB.Srequirethe“F”extension.FMSUB.DandFNMSUB.Drequirethe“D”extension.

Encoding: Thisinstructionisunusualinthatithasfouroperands,allofwhichare

registers.Thus,itdoesnot2itintoanyoftheinstructionformatsthatweredescribedearlier.ThisinstructiontypeiscalledanR4-typeinstruction.TheR4-typeinstructionformatisusedonlyforthisandthefusedmultiply-addinstructions.

FloatingFusedMultiply-Add/Subtract(Quad)

GeneralForm:FMADD.Q FRegD,FReg1,FReg2,FReg3FNMADD.Q FRegD,FReg1,FReg2,FReg3FMSUB.Q FRegD,FReg1,FReg2,FReg3FNMSUB.Q FRegD,FReg1,FReg2,FReg3


FloatingPointConversionInstructions



Thereisasetofinstructionsforconvertingfromonerepresentationtoanother.Thefollowingcodesareusedtonamethedifferentformats:

W 32-bitsignedinteger WU 32-bitunsignedinteger L 64-bitsignedinteger LU 64-bitunsignedinteger S Single-precision2loatingpointnumber D Double-precision2loatingpointnumber

FloatingPointConversion(Integerto/fromSinglePrecision)

GeneralForms:FCVT.W.S RegD,FReg1FCVT.WU.S RegD,FReg1FCVT.L.S RegD,FReg1 (availableinRV64orRV128only)FCVT.LU.S RegD,FReg1 (availableinRV64orRV128only)FCVT.S.W FRegD,Reg1FCVT.S.WU FRegD,Reg1FCVT.S.L FRegD,Reg1 (availableinRV64orRV128only)FCVT.S.LU FRegD,Reg1 (availableinRV64orRV128only)

Examples:FCVT.W.S r2,f5 # r2(int32) = f5(float)FCVT.WU.S r2,f5 # r2(uint32) = f5(float)FCVT.L.S r2,f5 # r2(int64) = f5(float)FCVT.LU.S r2,f5 # r2(uint64) = f5(float)FCVT.S.W f4,r7 # f4(float) = r7(int32)FCVT.S.WU f4,r7 # f4(float) = r7(uint32)FCVT.S.L f4,r7 # f4(float) = r7(int64)FCVT.S.LU f4,r7 # f4(float) = r7(uint64)

Description:Theseinstructionsperformaconversionofasingleprecision2loatingformatto/fromanintegerformat.

Rounding:TheseinstructionscontainRoundingMode(RM)bitswhichdeterminehowroundingistobedone.

Whenconvertingtoaninteger,thevalues+∞,NaN,andvaluesthataretoolargetoberepresentedinthetargetformatarestoredasthelargestinteger



valuethatcanberepresented.Thevalue–∞andvaluesthataretoonegativetoberepresentedinthetargetformatareconvertedintothesmallestintegervaluethatcanberepresented.

RISC-VExtensions:Theseinstructionsrequirethe“F”extension.Furthermore,theinstructionsinvolving64bitintegersarenotavailableinRV32,asnotedinthegeneralformsabove.

Encoding: ThisisanR-typeinstruction,inwhichtheReg22ieldisusedforadditional

opcodebits.

FloatingPointConversion(Integerto/fromDoublePrecision)

GeneralForms:FCVT.W.D RegD,FReg1FCVT.WU.D RegD,FReg1FCVT.L.D RegD,FReg1 (availableinRV64orRV128only)FCVT.LU.D RegD,FReg1 (availableinRV64orRV128only)FCVT.D.W FRegD,Reg1FCVT.D.WU FRegD,Reg1FCVT.D.L FRegD,Reg1 (availableinRV64orRV128only)FCVT.D.LU FRegD,Reg1 (availableinRV64orRV128only)

Examples:FCVT.W.D r2,f5 # r2(int32) = f5(double)FCVT.WU.D r2,f5 # r2(uint32) = f5(double)FCVT.L.D r2,f5 # r2(int64) = f5(double)FCVT.LU.D r2,f5 # r2(uint64) = f5(double)FCVT.D.W f4,r7 # f4(double) = r7(int32)FCVT.D.WU f4,r7 # f4(double) = r7(uint32)FCVT.D.L f4,r7 # f4(double) = r7(int64)FCVT.D.LU f4,r7 # f4(double) = r7(uint64)

Description:Theseinstructionsperformaconversionofadoubleprecision2loatingformatto/fromanintegerformat.

Rounding:TheseinstructionscontainRoundingMode(RM)bitswhichdeterminehowroundingistobedone.



Whenconvertingtoaninteger,thevalues+∞,NaN,andvaluesthataretoolargetoberepresentedinthetargetformatarestoredasthelargestintegervaluethatcanberepresented.Thevalue–∞andvaluesthataretoonegativetoberepresentedinthetargetformatareconvertedintothesmallestintegervaluethatcanberepresented.

RISC-VExtensions:Theseinstructionsrequirethe“D”extension.Furthermore,theinstructionsinvolving64bitintegersarenotavailableinRV32,asnotedinthegeneralformsabove.

Encoding: ThisisanR-typeinstruction,inwhichtheReg22ieldisusedforadditional

opcodebits.

ProgrammingHint:Tostore+0.0ina2loatingregister,usetheseinstructions:

FCVT.S.W FRegD,x0 # Single precisionFCVT.D.W FRegD,x0 # Double precision

FloatingPointConversion(Integerto/fromQuadPrecision)

GeneralForms:FCVT.W.Q RegD,FReg1FCVT.WU.Q RegD,FReg1FCVT.L.Q RegD,FReg1 (availableinRV64orRV128only)FCVT.LU.Q RegD,FReg1 (availableinRV64orRV128only)FCVT.Q.W FRegD,Reg1FCVT.Q.WU FRegD,Reg1FCVT.Q.L FRegD,Reg1 (availableinRV64orRV128only)FCVT.Q.LU FRegD,Reg1 (availableinRV64orRV128only)


FloatingPointConversion(Single/Doubleto/fromQuadPrecision)

GeneralForms:FCVT.S.Q FRegD,FReg1FCVT.Q.S FRegD,FReg1



FCVT.D.Q FRegD,FReg1FCVT.Q.D FRegD,FReg1


FloatingPointMoveInstructions

Thenextthreeinstructionscopyasignedvaluefromoneregistertoanotherregister,whilemodifyingthesignbitbasedonthesignfromanothervalue.

FloatingSignInjection(SinglePrecision)

GeneralForms:FSGNJ.S FRegD,FReg1,FReg2 (inject)FSGNJN.S FRegD,FReg1,FReg2 (negate)FSGNJX.S FRegD,FReg1,FReg2 (exclusive-or)

Examples:FSGNJ.S f2,f5,f6 # f2 = sign(f6) * |f5|FSGNJN.S f2,f5,f6 # f2 = -sign(f6) * |f5|FSGNJX.S f2,f5,f6 # f2 = sign(f6) * f5

Description:EachoftheseinstructionscopiesthevalueinFReg1intoFRegD.Howeverthesignoftheresultischanged,basedonthesignofthevalueinFReg2.Inthe“inject”instruction,thesignfromFReg2isusedfortheresult.Inthe“negate”instruction,thesignfromFReg2is2irst2lippedandthenusedfortheresult.Inthe“exclusive-or”instruction,thesignsfromFReg1andFReg2areXOR-edtogetherandthenusedfortheresult.

Comments:TheseinstructionsareusefulasimplementationsofspecialcasesfortheinstructionsFNEG.S,FABS.S,andFMV.S.Inaddition,theIEEE754-2008speccallsfora“copysign”operation.Theseoperationsmayalsohaveuseinsomemathlibraryfunctions.

RISC-VExtensions:Theseinstructionsrequirethe“F”extension.Presumably,inthe“D”extension,thisinstructionwillonlymodifythelowerorder32bits,butitispossiblethatitwillalsosettheupper32bitstoallones.???




FloatingSignInjection(DoublePrecision)

GeneralForms:FSGNJ.D FRegD,FReg1,FReg2 (inject)FSGNJN.D FRegD,FReg1,FReg2 (negate)FSGNJX.D FRegD,FReg1,FReg2 (exclusive-or)

Examples:FSGNJ.D f2,f5,f6 # f2 = sign(f6) * |f5|FSGNJN.D f2,f5,f6 # f2 = -sign(f6) * |f5|FSGNJX.D f2,f5,f6 # f2 = sign(f6) * f5

Description:EachoftheseinstructionscopiesthevalueinFReg1intoFRegD.Howeverthesignoftheresultischanged,basedonthesignofthevalueinFReg2.Inthe“inject”instruction,thesignfromFReg2isusedfortheresult.Inthe“negate”instruction,thesignfromFReg2is2irst2lippedandthenusedfortheresult.Inthe“exclusive-or”instruction,thesignsfromFReg1andFReg2areXOR-edtogetherandthenusedfortheresult.

Comments:TheseinstructionsareusefulasimplementationsofspecialcasesfortheinstructionsFNEG.D,FABS.D,andFMV.D.Inaddition,theIEEE754-2008speccallsfora“copysign”operation.Theseoperationsmayalsohaveuseinsomemathlibraryfunctions.

RISC-VExtensions:Theseinstructionsrequirethe“D”extension.


FloatingSignInjection(DoublePrecision)

GeneralForms:FSGNJ.Q FRegD,FReg1,FReg2 (inject)FSGNJN.Q FRegD,FReg1,FReg2 (negate)FSGNJX.Q FRegD,FReg1,FReg2 (exclusive-or)




FloatingMove

GeneralForms:FMV.S FRegD,FReg1 (single precision)FMV.D FRegD,FReg1 (double precision)FMV.Q FRegD,FReg1 (quad precision)

Examples:FMV.S f2,f5 # f2 = f5 (32 bits)FMV.D f2,f5 # f2 = f5 (64 bits)FMV.Q f2,f5 # f2 = f5 (128 bits)

Description:ThevalueinFReg1iscopiedintoFRegD.

RISC-VExtensions:FMV.Srequiresthe“F”extension.FMV.Drequiresthe“D”extension.FMV.Qrequiresthe“Q”extension.

Encoding:Theseinstructionsarespecialcasesofmoregeneralinstructions.Theyareassembledidenticallyto:

FSGNJ.S FRegD,FReg1,FReg1 (inject) FSGNJ.D FRegD,FReg1,FReg1 (inject)

FloatingNegate

GeneralForms:FNEG.S FRegD,FReg1 (single precision)FNEG.D FRegD,FReg1 (double precision)FNEG.Q FRegD,FReg1 (quad precision)

Examples:FNEG.S f2,f5 # f2 = -(f5) (32 bits)FNEG.D f2,f5 # f2 = -(f5) (64 bits)FNEG.Q f2,f5 # f2 = -(f5) (128 bits)

Description:ThevalueinFReg1isnegatedandthencopiedintoFRegD.



RISC-VExtensions:FNEG.Srequiresthe“F”extension.FNEG.Drequiresthe“D”extension.FNEG.Qrequiresthe“Q”extension.


FSGNJN.S FRegD,FReg1,FReg1 (negate) FSGNJN.D FRegD,FReg1,FReg1 (negate)

FloatingAbsoluteValue

GeneralForms:FABS.S FRegD,FReg1 (single precision)FABS.D FRegD,FReg1 (double precision)FABS.Q FRegD,FReg1 (quad precision)

Examples:FABS.S f2,f5 # f2 = |f5| (32 bits)FABS.D f2,f5 # f2 = |f5| (64 bits)FABS.Q f2,f5 # f2 = |f5| (128 bits)

Description:TheabsolutevalueofthequantityinFReg1iscopiedintoFRegD.

RISC-VExtensions:FABS.Srequiresthe“F”extension.FABS.Drequiresthe“D”extension.FABS.Qrequiresthe“Q”extension.


FSGNJX.S FRegD,FReg1,FReg1 (exclusive-or) FSGNJX.D FRegD,FReg1,FReg1 (exclusive-or)

FloatingMoveTo/FromIntegerRegister(SinglePrecision)

GeneralForms:FMV.X.W RegD,FReg1FMV.W.X FRegD,Reg1



Examples:FMV.X.W x2,f5 # x2 = f5FMV.W.X f5,x2 # f5 = x2

Description:32-bitsarecopiedfroma2loatingpointregistertoanintegerregister,orfromanintegerregistertoa2loatingpointregister.Thebitsareunaltered;i.e.,noconversionisperformed.

InthecaseofRV64andRV128(wheretheintegerregistersarelargerthan32bits),whenthedestinationisanintegerregister,theupperbitsoftheintegerregisterwillbe2illedwithzeros.Whenthesourceisanintegerregister,theupperbitswillbeignored.

Noerrorconditionscanarise.

Inanolderversionofthespec,theletter“S”wasusedinplaceof“W”intheopcode,i.e.,“FMV.X.S”and“FMV.S.X”.

RISC-VExtensions:Theseinstructionsrequirethe“F”extension.


FloatingMoveTo/FromIntegerRegister(DoublePrecision)

GeneralForms:FMV.X.D RegD,FReg1FMV.D.X FRegD,Reg1

Examples:FMV.X.D x2,f5 # x2 = f5FMV.D.X f5,x2 # f5 = x2

Description:64-bitsarecopiedfroma2loatingpointregistertoanintegerregister,orfromanintegerregistertoa2loatingpointregister.Thebitsareunaltered;i.e.,noconversionisperformed.

InthecaseofRV128(wheretheintegerregistersarelargerthan64bits),whenthedestinationisanintegerregister,theupperbitsoftheintegerregisterwillbe2illedwithzeros.Whenthesourceisanintegerregister,theupperbitswillbeignored.



Noerrorconditionscanarise.RISC-VExtensions:

Theseinstructionsrequirethe“D”extension.Encoding: ThisisanR-typeinstruction.

Note:The“Q”extensiondoesnotprovidethefollowinginstructions.Thisisintentional.

FMV.X.Q RegD,FReg1FMV.Q.X FRegD,Reg1

Inordertomoveaquadprecisionvaluefroma2loatingpointregistertoanintegerregister,you’llneedtomoveittomemory2irst,thenmoveittotheintegerregister.

FloatingPointCompareandClassifyInstructions

FloatingPointComparison

GeneralForms:FLT.S RegD,FReg1,FReg2 (single precision)FLE.S RegD,FReg1,FReg2 (single precision)FEQ.S RegD,FReg1,FReg2 (single precision)

FLT.D RegD,FReg1,FReg2 (double precision)FLE.D RegD,FReg1,FReg2 (double precision)FEQ.D RegD,FReg1,FReg2 (double precision)

FLT.Q RegD,FReg1,FReg2 (quad precision)FLE.Q RegD,FReg1,FReg2 (quad precision)FEQ.Q RegD,FReg1,FReg2 (quad precision)



Examples:FLT.S x2,f5,f6 # x2 = (f5<f6) ? 1 : 0FLE.S x2,f5,f6 # x2 = (f5≤f6) ? 1 : 0FEQ.S x2,f5,f6 # x2 = (f5=f6) ? 1 : 0

FLT.D x2,f5,f6 # x2 = (f5<f6) ? 1 : 0FLE.D x2,f5,f6 # x2 = (f5≤f6) ? 1 : 0FEQ.D x2,f5,f6 # x2 = (f5=f6) ? 1 : 0

FLT.Q x2,f5,f6 # x2 = (f5<f6) ? 1 : 0FLE.Q x2,f5,f6 # x2 = (f5≤f6) ? 1 : 0FEQ.Q x2,f5,f6 # x2 = (f5=f6) ? 1 : 0

Description:Two2loatingpointvaluesinthe2loatingpointregistersarecomparedandavalueindicatingtheresultisstoredinanintegerregister.Theinstructionsstorea“1”iftherelationship(<,≤,or=)holdstrueand“0”iftherelationshipdoesnothold.

Thereisnoneedfor>or≥instructionssincetheprogrammercanachievethesameresultbyusingthe<and≤instructionsbysimplyswappingthetworegisteroperands.

ErrorConditions:IfeitheroperandisaNot-a-Number(NaN)valuethena“0”willbestoredinthedestinationregister.Thisappliestoallthreecomparisoninstructions(<,≤,and=).

TheIEEEspecrequiresthatthe<or≤shouldresultina“signalingNaN”ifeitheroperandisaNaN.ForRISC-V,theinstructionexecutionwillneverbeinterrupted,soasignalingNaNcannotbeimplementedbyinterruptingtheinstruction2low.RISC-Vhandlesthiscaseasfollows:For<and≤comparisons,ifeitheroperandisNaN,thentheNV(invalidoperation)bitintheFloatingPointerControlandStatusRegister(FCSR)willbesetto1.Otherwise,thebitwillbeleftunchanged.

TheIEEEspectreatsthe=comparisondifferently,sayingthatifeitheroperandisNaN,thentheresultshouldbea“quietNaN”.TheRISC-Vimplementsthisafollows:Forthe=comparison,theNV(invalidoperation)bitwillneverbemodi2ied.

RISC-VExtensions:The“.S”instructionsrequirethe“F”extension.



The“.D”instructionsrequirethe“D”extension.The“.Q”instructionsrequirethe“Q”extension.


FloatingPointClassify

GeneralForms:FCLASS.S RegD,FReg1 (single precision)FCLASS.D RegD,FReg1 (double precision)FCLASS.Q RegD,FReg1 (quad precision)

Examples:FCLASS.S x2,f5 # x2 = class(f5)FCLASS.D x2,f5 # x2 = class(f5)FCLASS.Q x2,f5 # x2 = class(f5)

Description:Theinstructionlooksata2loatingpointvalueanddetermineswhatsortofanumberitis.Forexample,doesthevaluerepresent+0.0,-0.0,Nan,+∞,-∞,etc?

Theresultoftheclassi2icationofthe2loatingpointvaluewillbestoredinanintegerregister,asfollows:Allbitsoftheintegerregisterwillbeclearedto“0”,exceptthatexactlyonebitwillbesetto“1”,toindicatewhatsortofthingthe2loatingpointregistercontains.

Herearethepossibleresultsthatcanbestoredintotheintegerregister:

BitSet ValueStored Meaning0 1 FReg1contains−∞.1 2 FReg1containsanegativenormalnumber.2 4 FReg1containsanegativesubnormalnumber.3 8 FReg1contains−0.4 16 FReg1contains+0.5 32 FReg1containsapositivesubnormalnumber.6 64 FReg1containsapositivenormalnumber.7 128 FReg1contains+∞.8 256 FReg1containsasignalingNaN.9 512 FReg1containsaquietNaN.



RISC-VExtensions:The“.S”instructionsrequirethe“F”extension.The“.D”instructionsrequirethe“D”extension.The“.Q”instructionsrequirethe“Q”extension.



Chapter5:RegisterandCallingConventions

StandardUsageofGeneralPurposeRegisters

The32bitversionoftheRISC-Varchitecturetreatsallregistersidenticallysotheassemblylanguageprogrammerisfreetouseanyregisterashe/shewishes.Theonlyexceptionisregisterx0,whichalwayscontainszero,andcanbeusedasadestinationwhenthedatashouldbediscarded.

However,thereisaRISC-V“standardcallingconvention”whichspeci2ieshowregisterswillnormallybeusedbythecompilerandmostassemblylanguageprogrammerswillfollowthisconvention.

Inthe“C”(CompressedInstructions)extension,16bitinstructionsaresupported,andeachcompressedinstructionisashorter,abbreviatedformofa32-bitinstruction.Thecompressedinstructionsetwasdesignedundertheassumptionthattheregisterswillbeusedinthestandardwaysandonlythemostcommoninstructionsaregiven16bitequivalents.Assuch,theregistersarenottreatedidenticallyinthecompressedinstructionset.

Forexample,the32bit“call”instruction(JAL)cansavethereturnaddressinanyregister,althoughthestandardconventionsmandatethatregisterx1willalwaysbeusedtosavethereturnaddress.Makinguseofthisconvention,the16bitversionoftheinstruction(C.JAL)savesthereturnaddressinx1andonlythisregister.Thereisnocompressedinstructiontosavethereturnaddressinanyotherregister,whichisreasonablesincethisisnotsomethingnormallydone.

Thenamesofthegeneralpurposeregistersarex0,x1,…x31.Theregistersarealsogivensecond,alternatenames.Theassemblerwillaccepteithername,sotheprogrammercanusewhichevernameseemsclearest.

Herearetheregisters:



Other SavedAcross Name Name Description Calls x0 zero Zero x1 ra ReturnAddress x2 sp StackPointer yes /calleesaved x3 gp GlobalPointer x4 tp ThreadPointer x5 t0 Temp x6 t1 Temp

x7 t2 Temp x8 s0,fp SavedReg/FramePointer yes/calleesaved x9 s1 SavedReg yes/calleesaved x10 a0 Functionargument x11 a1 Functionargument x12 a2 Functionargument x13 a3 Functionargument x14 a4 Functionargument x15 a5 Functionargument x16 a6 Functionargument x17 a7 Functionargument x18 s2 SavedReg yes /calleesaved x19 s3 SavedReg yes /calleesaved x20 s4 SavedReg yes /calleesaved x21 s5 SavedReg yes /calleesaved x22 s6 SavedReg yes /calleesaved x23 s7 SavedReg yes /calleesaved x24 s8 SavedReg yes /calleesaved x25 s9 SavedReg yes /calleesaved x26 s10 SavedReg yes /calleesaved x27 s11 SavedReg yes /calleesaved x28 t3 Temp x29 t4 Temp x30 t5 Temp x31 t6 Temp

Thecompressedinstructionsaredesignedtoalloweasyaccessto8registers,namelyx8,x9,…x15.Intheabovetable,these8registersarehighlighted.



SavingRegistersAcrossCalls

Byconvention,someregistersaresavedacrossfunctioncalls.Forprogramsthatfollowthestandardcallingconventions(andweassumethatalmostalldo),afunction(say“foo”)willcallanotherfunction(say“bar”).The“caller”isfooandthe“callee”isbar.

Wenormallyusetheterminology“calleesaved”toindicatethataregisterwillbepreservedacrossacall.Forexample,functionbarwillnotmodifytheregister(orifitdoes,itwill2irstsaveandthenrestoretheregister),sothatthecallerfoocanrelyonitsvalueremainingunchangedbybar.

Presumablythecallerfoowillhavelocalvariableswhosevaluesmustbepreservedacrossacalltobar.Presumablythecalleebarwillneedsometemporaryvariablestocompleteitstaskandcomputeitsresult.Iffoocannottrustbartopreserveitslocalvariables,thenfoowillneedtosaveregistersand,afterbarreturns,restoretheirvalues.Thissavingandrestoringisdonetomainmemory(typicallytothelocalstackframe),andsuchmemoryreferencesareveryslow.Iffooreliesonbartodothesavingandrestoringtheregisters,thentheburdenisplacedonbartoperformtheseslowmemorytasks.

Therearetradeoffsandgeneratingoptimalcodeistricky.Itmaybemostef2icientforthecallerfootosavetheregisters,sinceitmaybethecasethatcallerfoocansavearegisterjustonce,makeanumberofcallstobar,andthenrestoretheregister.Iftheregisterhadbeensavedinthecalleebar,therewouldhavebeenmultiplesavesandrestores.Ontheotherhand,itmaybebestforthesavingandrestoringtobeperformedinthecalleebar.Perhapsbarwilloftentakeanexecutionpaththatdoesnotdisrupttheregister,sothesavingandrestoringcanbeavoidedaltogetherinmanycases.

Thetradeoff(whethertoperformthesavinginthecallerorcallee)isadecisionthatmustbemadeforeachoftheregistersinthecallerfoo.Ifbothfunctionsareprogrammedsimultaneouslybyacleverprogrammer,thenhe/shecanmakeadhocdecisionsaboutthesavingandrestoringofresistersinanattempttoproduceoptimalcode.

However,almostallcodeiscompiledandmostcompilerslookateachfunctionincompleteisolation.Effectively,eachfunctionisseparatelycompiled,sothereisno



possibilityofcoordinatingandoptimizingthesavingofregistersbetweenmultiplefunctions.

Instead,theapproachistocreateastandardconventionmandatingwhichregistersaresavedbythecalleebarandwhicharenot.Registersthatarenotrequiredbytheconventiontobepreservedacrossacallaretermed“callersaved”,whichmeansthatitisuptothecallerfootosaveandrestoretheregisters,iftheycontaindatathatmustbepreservedacrossthefunctioncalltobar.

Theconventiondictateswhichregistersarerequiredtobepreservedacrossacall,anditisassumedthatcompilersandprogrammerswillfollowthecallingconventions.Iftheydo,thentheseparatelycompiledorseparatelywrittencodeisinteroperable.

Assumingthattheconventionsaretobefollowedwhencompilingsomefunctionfoo,thecompiler/programmerisfreetousewhicheverregistermakesmostsense.Forexample,avariablethatmustbepreservedacrossfunctioncalls(e.g.,tootherfunctionslikebar)willmostlikelybeplacedina“calleesaved”register.Therefore,thesaveandrestoreinstructionsareavoidedinthecaller’scode,andmightbeavoidedaltogetherifbardoesn’tevenusetheregisterinquestion.Ontheotherhand,thecompiler/programmermightprefertouseatemporary(callersaved)register,especiallyiftherearenocallswithinfoo,orifthevariabledoesnotneedtobesavedacrossanycallsthatdooccur.

Byestablishingacallingconventionsaboutwhichregistersarecalleesavedandwhicharenot,thedesignersaremakingtheimportantdecisionabouttheratiobetweentemporarycalleesavedregisters.Assumingthatthereareenoughregistersineachclass,thennoregistersaving/restoringwillbenecessary.

Butnotethatwithrecursiveroutines,therecanbeacon2lict.Consideralocalvariablethatmustbepreservedacrossarecursivecall.Byplacingthevariableinatemporary(callersaved)registertherecanbeaproblem.Sincethisregisterwillnotbesavedacrosstherecursivecall,thefunctionmustsaveitbeforethecallandrestoreitafterwards.Ontheotherhand,ifitisplacedinacalleesavedregisterwedon’thavetobotherwithsaving/restoringitaroundtherecursivecall.Butsincethefunctionwillbeusingacalleesavedregister,thefunctionitselfmustsavethepreviousvalueuponentryandrestoreitbeforereturn.

Soeitherway,theregistermustbesaveandrestored.Thismakessenseforrecursivefunctionsthathavelocalvariablesthatmustbepreservedacrosstherecursivecall.



Theonlyquestioniswhethertoperformthesavebeforethecallinstructionorafterthecallinstruction,andlikewiseperformtherestoreafterthereturninstructionorbeforethereturninstruction.Thecompilermakesthisdecisionwhenitchoosestoplacethevariableinacallee-orcaller-savedregister.

Notethatcompilers/programmersaresometimesfreetoviolatethecallingconventionsordertoproducesuperiorcode.Thisisonlyfeasiblewhenthecompiler/programmerhasaccesstoboththecalledfunction(bar)andallpotentialcallers(suchasfoo).Insuchcases,thecompiler/programmercanchoosetousetheregistersinanon-standardwaytoimproveef2iciency.Forexample,itmaybethatbarwouldbene2itfromhavingmoretemporaryregistersandnoneofthecallershavedatathatmustbepreservedacrossthecalltobar,sothecompiler/programmercanuseregistersthatwouldotherwisebe“calleesaved”as“callersaved”.Compilerscankeepallthedetailsstraightbutprogrammersshouldbecareful.Manyassemblylanguageprogrammingbugsaretheresultofmistakesinvolvingthepropersavingandrestoringofregisters.

Registerx0-TheZeroRegister(“zero”)

Asmentionedearlier,registerx0isadummyregister.Whenread,itsvalueisalwayszero;whenstoredinto,thedataissimplydiscarded.

Registerx1-TheReturnAddress(“ra”)

Whenafunctioniscalled,thereturnaddressmustbesavedsothattheRETURNinstructioncanreturntothecorrectlocationinthecaller’scode.Inoldercomputers,theCALLinstructionstoredthereturnaddressinmemoryonthestack;inmodernISAs(includingRISC-V)theCALLinstructionsavesthereturnaddressinaregister.TheRETURNinstructionretrievesthevaluefromthisregister.TheRETURNinstructionisthennothingmorethanaJR(jumpthroughregister)instruction.

Byconvention,registerx1isusedforthispurpose.

SavingthereturnaddressinaregisterisfasterthansavingitonthestacksincethememoryWRITEandREADoperationsareavoided.Thisschemeisidealforleaffunctions.(“Leaf”functionsdonotcallotherfunctionsandarethusleafnodesinthe



activation/callingtree.)Manyfunctioncallsaretoleaffunctionssothissavesalotmemoryactivity.

Fornon-leaffunctions,thereturnaddressmustbesavedsomewhereelse.Sincex1willbeusedinsubsequentcalls,thereturnaddressinx1mustbemovedsomewhereelsetoavoidgettingclobberedbeforeitcanbeusedintheRETURNinstruction.Thevaluecaneitherbesavedinacalleesavedregisteroronthestack.(Movingthereturnaddresstoacalleesavedregistermaynecessitateotherregistersavingelsewhere,forcingasavetostackmemoryelsewhere.Forrecursivefunctions,thereisnowaytoavoidusingthein-memorystack;wecanonlyshiftaroundwhenthememoryREADs/WRITEsoccurs.)

Registerx2-TheStackPointer(“sp”)

RISC-Vassumesthatx2containsthestacktoppointerandthatthestackgrowsdownward.Thestackisassumedtoalwaysbequadword(16byte)aligned.Somefunctions/methodsmayoperateentirelyusingregistersandmaynotneedanystackspace.TheRISC-Vdesigntriestofacilitatesuchfunctionssincetheymakenomemoryaccesses.However,manyfunctionswillallocatestackspacetoholdreturnaddresses,localvariables,savedregisters,etc.Thelocalvariables,etc.,arestoredina“stackframe”,whichissometimescalledan“activationrecord.”Recursivefunctionsareknownforneedingstackspaceandallocatinganewstackframeforeachinvocation.

Afunctionthatrequiresstackstoragewillgrowthestack(alwaysbyamultipleof16)bysubtractingfromthestacktoppointersp.Variableswithinthestackframecanbeaddressedusingpositiveoffsetsfromregisterx2.Thisorganizationmotivatesseveralofthecompressedinstructions,inwhichtheoffsetmustbepositiveandisneversignextended.

Registerx2iscallee-saved,whichmeansthatanyfunctionthatgrowsthestacktocreateastackframemustalsoshrinkitbyanequalamountbeforereturning.Inotherwords,afunctionmusthavetheneteffectofpoppingeverythingthatitpushed,leavingthestackasitfoundit,andeffectivelypreservingthevalueofx2.Thus,x2issaidtobe“callee-saved”.



Registerx3-TheGlobalPointer(“gp”)

Globalvariablesarealsocalledstaticvariables,andarecontrastedwithlocalvariables.Globalvariablesaresharedbyallfunctions.Globalvariablesremaininexistencethroughouttheentireprogram,whilelocalvariablescomeintoexistencewhenafunctioniscalledandgooutofexistencewhenthefunctionreturns.

Whilelocalvariablesareoftenkeptinregistersoronthestackatunpredictablelocations,globalvariablesareplacedat2ixedlocationsinmemory.Unfortunatelytheselocationsareoftenfarawayfromthecodethataccessesthem.PC-relativeaddressingcanbeused,butisoftenineffectivesincetherequiredoffsetscanbelarge.Absoluteaddressescanoftenbeproblematictoo,sincetheabsoluteaddressescanalsobelarge,whichwouldnecessitateusingextrainstructions.

ThesolutionsupportedbytheRISC-Vcallingconventionistoplacetheglobalvariablestogetherandinitializearegistertopointtothisarea.Byconvention,registerx3isusedforthis.Thentheindividualvariablescanbeconvenientlyaddressedbyusingasmalloffsetfromtheglobalpointer.

Theglobalpointeristypicallyinitializedearlyintheprogramandneverchanged.So,insomesense,itis“callee-saved”althoughwedidn’tmarkitassuchinthetableabove.

Notallprogramswillusethistechnique,sothisregistermayactuallybeusedforsomethingcompletelydifferent.

Registerx4-TheThreadBasePointer(“tp”)

Inmulti-threadedprograms,eachthreadmayhaveacollectionof“thread-speci2ic”variables,whicharenotlocaltoanyfunction.Instead,theyaresharedbyallcoderunninginthisthread,butareinvisibletootherthreads.

Thread-speci2icvariablesaredistinctfromglobalvariables.Thereisonlyonecopyofeachglobalvariableandeachvariablehasauniqueoffsetfromtheglobalbaseregister,x3/gp.Thethreadbaseregister(tp)hasthesamevalueinallcoderunninginallthreads,whichcausesallcodetoaccessthatsamesetofvariables,regardlessofwhichthreaditisexecutingin.



Inamulti-threadedprogram,eachthreadhasauniqueregisterset.WhentheOSswitchesfromonethreadtoanother,allregistersfromthepreviouslyrunningthreadwillbesavedtosomeareaofmemoryandthenewlyactivethreadwillhaveitsregistersrestoredfromadifferentmemoryarea.Ingeneral,theregistersinonethreadwillbecompletelydifferentfromtheregistersinanotherthread,eventhoughthethreadsarecooperatingandexecutingintheexactsameaddressspace.Forexample,eachthreadwillhaveitsownstack,sothex2(sp)registerineachthreadwillcontainadifferentvalue.

Sinceallthreadssharetheglobalvariables,registerx3(gp)willhavethesamevalueinallthreads.Inaddition,eachthreadcanhaveitsownprivatesetofvariables,whicharecalled“thread-speci2ic”variables.Byconvention,thisblockofvariableswillbepointedtobyregisterx4(tp),soeachthreadwillhaveadifferentvalueinitsx4register.Asinglesectionofcodethataccessesthread-speci2icvariableswillaccessadifferentsetofvariables,dependingonwhichthreaditisrunningin,andthisworksbecauseeachthreadhasauniquevalueinitsx4(tp)register.

Manyapplicationsarenotmulti-threaded,orcontainnothread-speci2icvariables.Insuchcases,itseemstheconventionleavesunspeci2iedhowx4istobeused.

Thethreadpointeristypicallyinitializedearlyintheexecutionofanewthreadandneverchanged.So,insomesense,itis“callee-saved”althoughwedidn’tmarkitassuchinthetableabove.

Registerx5-x7,x28-x31-TempRegisters(“t0-t6”)

Byconvention,thereare7registerswhicharefreetobeusedbyanyfunctionwithoutsavingtheirpreviouscontents.Typicallythecompilerwillplacetheintermediateresultsofcomputationintotempregisters.Insomecases,thecompilermayplacelocalvariablesinthetempregisters.

Thetempregistersare“callersaved”,whichmeansthatthecodewithinafunction“foo”isnotrequiredtosavetheirpreviouscontentsbeforeusingthem.Ontheotherhand,sincethisruleappliestoallfunctions,itmustbeassumedthatcallinganyotherfunction(suchas“bar”)mayresultintheseregistersbeingchanged.Thus,ifthecaller(foo)caresaboutthevalueofatempregister,itmustsavetheregisterbeforecallingbar.



Registerx8,x9,x18-x27-SavedRegisters(“s0-s11”)

Byconvention,thereare12registerswhicharedesignatedascallee-savedregisters.Theconventionisthatafunctionmustnotmodifytheseregisters.Ifsomefunctionwishesorneedstouseoneoftheseregisters,thatfunctionmust2irstsaveitspreviousvalueandthenrestorethatvaluebeforereturning.

Thismakestheseregisterespeciallyusefulinstoringlocalvariablesthatmustbepreservedacrosscallstootherfunctions.Thereisareasonablechancethatthecalledfunctionwillnottouchtheseregistersatall,sosaving/restoringtomainmemoryisoftenavoidedcompletely.

Twooftheseregisters(x8/s0andx9/s1)areespeciallyeasytoaccessusingthe16bitcompressedinstructions.Thereforethecompilershouldprefertouseoneofthesetworegisterswheneverpossible.

Registerx8-FramePointer(“fp”)

Registerx8(whichisalsos0,oneofthecallee-savedregisters)isdesignatedastheframepointerandgivenasecondname,“fp”.

Noteveryfunctionwillneedaframepointerand,forafunctionthatdoesnotneedaframepointer,thisregistercanbeviewedass0,justanothercallee-savedregister.

Tounderstandthepurposeofaframepointer,considertwosortsoffunction.

The2irstsortoffunctionwillhaveasmallnumberoflocalvariables.Furthermore,eachofthelocalvariableswillhavea2ixedsize,knownatcompile-time.Forsuchafunction,thestackframecanbelaidoutbythecompilerandalloffsetswithinthestackframewillbeknown,smallnumbers.Thissortoffunctioncanaccessitsvariablesinthestackframebyusingpositiveoffsetsfromthestacktoppointer(x2).Thissortofafunctionwillnotneedaframepointer.

Thesecondsortoffunctionmayhavealargenumberoflocalvariablesormaycreatealocalvariable(suchasadynamicarray)whosesizeisnotknownuntilruntimewhenthefunctionisactuallycalled.Insuchafunction,thecompilerwillnotknow



thelayoutoftheframe.Thelocalvariablescannotbeaccessedusingsmallpositiveoffsetsfromthestackpointerregister,x2.

Theideawithusingaframepointeristhis:Uponentrytothefunction,theframepointerregisterwillbesettoaknown,2ixedvalue.Typically,thiswouldbetheoldvalueofthestackpointerbeforethenewstackframeisallocated.Then,thestackframeisallocatedinacoupleofsteps.Firstthe2ixedsizedvariablesareallocatedbyadjustingthestacktoppointer.Thenthedynamicallysizedvariablesareallocated,bydecrementingthestacktoppointerbyvaluesdeterminedatruntime.Afterthisinitializationandcreationoflocalvariables,theframepointerisleftpointingtooneend(thetop,higherend)oftheframe(ormorepreciselytothe2irstbyteofthepreviousframe)andthestackpointerisleftpointingtothebottom,lowerendoftheframe.Offsetsrelativetothestacktoppointerareproblematic,sincethecompilerwillnotknowexactlyhowmuchthestackpointerwasadjusted.Instead,thelocalvariableswillbeaccessedusingnegativeoffsetsrelativetotheframepointer.

(Inanotherapproach,thelocalvariablescanbepushedontothestack2irst.Thentheframepointerisinitializedtothecurrentstacktop.Finally,thedynamicallysizedvariablesarepushedontothestack.Thisallowsthelocalvariablestobeaccessedusingapositiveoffsettotheframepointer.IntheRISC-Vdesign,positiveoffsetsarepreferredinthe16bitcompressedinstructionformat,sothismightbeabetterapproach.)

Theframepointeriscalleesaved.Thismeansthatthevalueoftheregistermustberestoredbeforethefunctionreturns.Typically,thefunctionprologuewillstorethepreviousvalueofthefpregisterintheframeandthefunctionepiloguewillrestorefprightbeforereturning.

Registerx10-x17-ArgumentRegisters(“a0-a7”)

Typically,argumentstofunctionsarepassedinregistersandthisgroupofregistersismeanttobeusedforthatpurpose.

Also,avaluethatistobereturnedfromafunctionwillusuallybereturnedinaregister.



Thereturnvaluewillbereturnedinx10(a0),orifthevalueistoolargeto2itintoasingleregister,itwillbereturnedintheregisterpairx10-x11(a0-a1).Thisisthesameregister(s)thatisusedtopassinthe2irstargument.

CallingConventionsforArguments

Argumentsarenormallypassedtoafunctioninregistersandthecallingconventiondetailsexactlyhowargumentvaluesaretobeplacedinregisters.

Iftherearetoomanyarguments,thenthe2irstargumentsarepassedintheregisters,andtheremainderarepassedonthestack.Also,ifanindividualargumentistoolarge,itwillbepassedinmemoryandtheregisterwillcontainapointertothismemoryregion.

Ifthereturnedvalueissmallenough,itwillbereturnedina0-a1,otherwiseitisreturnedinmemory.

Here,wesummarizethecallingconventionsforRV32,the32bitarchitecture.

• Floatsarepassedinthe2loatingpointregisters,iftheyexist.Otherwise,2loatsarepassedinthegeneralpurposeregisters,justlikeints.

• Othervaluesbrides2loats(e.g.,ints,chars,bools,pointers,structs)arepassedinthe8generalpurposeregistersa0-a7,aslongastheyarenolargerthan64bits.So,ifthevaluewill2itintooneortworegisters,theywillbepassedinregisters.

• Valuessmallerthan32bitsaresign-extendedto32bitsandpassedinaregister.[Apparently,theyare“widenedaccordingtothesignoftheirtype,thensignextended”;myinterpretationmaybeincorrect.???]

• Valuesof32bitsarepassedinasingleregister.

• Valuesofupto64bitsinsizearepassedinaregisterpair.Theleastsigni2icant32-bitswillbeplacedintheregisterwiththesmallernumber,inlittleendianstyle.



• Valueslargerthan64bitsarealwayspassed“byreference”.Thismeansthatthevalueisplacedinaregionofmemory.Thisregioniseitherinthecaller’sstackframeorispushedontothestackatthetimeofthefunctioncall.Theaddresstothisregionappearsinthelistofargumentsandispassedinaregister,inplaceofthevalue.[Itseemsthatanaddressisalwaysused,regardlessofwhetherornotthereisroomintheregistersfortheaddress.]

• Iftherearenotenoughregisters,onlythe2irstfewargumentswillbepassedinregistersandtheremainderwillbepassedinmemory.Theextra,remainingargumentswillpushedontothestack,sotheywillbeatknownoffsetsfromthestacktoppointeruponentrytothecalledfunction.Thus,theycanbeeasilyaccessedusingpositiveoffsetsfromtheframepointerregister(fp).

• Iftherearenotquiteenoughregisters,a64bitvaluecanbesplitintotwo32-bitparts,withtheleastsigni2icantwordpassedinaregisterandthemostsigni2icantwordpassedinmemory.

• Followingthe“C”languageconvention,argumentsarealways“passedbyvalue”,whichmeanstheyareeffectivelylocalvariableswhichcanbemodi2ied;afterthefunctionreturns,themodi2iedvalueislost.Forexample,structspassedasargumentswillbecopiedandanychangestothestuccowillbelost.Ifthisisnotwhattheprogrammerwants,thenhe/shecanexplicitlypassanaddress,inwhichcasethepointeris“passedbyvalue”,andthevaluepointedtoiseffectively“passedbyreference”.

• Allvaluespassedinmemory(e.g.,argumentsinthestackframe)arealigned,accordingtothesizeofthevalue.

• Ifthereturnedvalueis64bitsorless,itwillbereturnedinregistera0-a1(orthe2loatingpointregisters,ifapplicable).

• Ifthereturnedvalueislargerthan64bits,thecallerwillallocatememoryspaceforthereturnedvalue(inthecaller’sstackframe)andwillpassapointertothisblockofmemory.Thepointerwillbepassedasthe2irstargument,implicitlyinsertedinfrontoftheotherarguments.

Sixoftheseregisters(x10-x15,i.e.a0-a5)areespeciallyeasytoaccessusingthe16bitcompressedinstructions.Therefore,themajorityoffunctions(whichhave6orfewerarguments)canbecalledusingregistersalone.



Commentary:Theargumentregistersareotherwisetreatedliketemporaries,inthattheyarecallersaved,i.e.,notpreservedacrosscalls.Ifanargumentregisterisnotusedforpassingarguments,thenititisfreetobeusedasatemporaryworkregister.

Itisunclearwhyitisnecessarytodifferentiatebetweenthesetwoclasses.Whycan’ttempregistersbeusedtopassargumentsincaseswheretherearealotofarguments?Ifafunctionwithmanyargumentstrulyneedsatemporaryregister,thenitcansavesomeofitsargumentsinitsstackframe.Thisisnoextrawork,sincethecallerwouldhavehadtosavethevaluesinitsframeotherwise.Anditwillprobablybethecasethatthecalleefunctioncanbeabitsmarteraboutwhathastobesavedinmemorythanthecaller.

FloatingPointRegisters

Herearethe2loatingpointregisters,withtheirintendedusage.The16bitcompressedinstructionsfavorregistersf8-f15andthesearehighlightedbelow.



Other SavedAcross Name Name Description Calls f0 ft0 Temp f1 ft1 Temp f2 ft2 Temp f3 ft3 Temp f4 ft4 Temp f5 ft5 Temp f6 ft6 Temp

f7 ft7 Temp f8 fs0 SavedReg yes/calleesaved f9 fs1 SavedReg yes/calleesaved f10 fa0 Functionargument f11 fa1 Functionargument f12 fa2 Functionargument f13 fa3 Functionargument f14 fa4 Functionargument f15 fa5 Functionargument f16 fa6 Functionargument f17 fa7 Functionargument f18 fs2 SavedReg yes /calleesaved f19 fs3 SavedReg yes /calleesaved f20 fs4 SavedReg yes /calleesaved f21 fs5 SavedReg yes /calleesaved f22 fs6 SavedReg yes /calleesaved f23 fs7 SavedReg yes /calleesaved f24 fs8 SavedReg yes /calleesaved f25 fs9 SavedReg yes /calleesaved f26 fs10 SavedReg yes /calleesaved f27 fs11 SavedReg yes /calleesaved f28 ft8 Temp f29 ft9 Temp f30 ft10 Temp f31 ft11 Temp


Chapter6:CompressedInstructions

CompressedInstructions

Normally,RISC-Vinstructionsare32bitsinlength.However,the“C”(CompressedInstructions)extensionaddsinstructionswhichare16bitslong.

The“C”extensionmerelyaddsinstructionstotheISA;alltheother32bitinstructionsremainvalid,unchangedandfullyfunctional.

Each16bitinstructionisanabbreviationforalonger32bitinstruction.Thus,nonewfunctionalityisadded.Thepurposeofthe16bitinstructionsistoreducecodesize:byreplacinglongerinstructionswiththeirequivalentshorterversions,codesizeisreduced.

Notall32bitinstructionshaveanequivalent16bitcounterpart.ThegoaloftheRISC-Vdesignistoprovide16bitinstructionsforthemostpopularandfrequentlyoccurring32bitinstructions,therebyreducingthecodesizeasmuchaspossible.Ablockofcodeinwhichall32bitinstructionshappentohave16bitcounterpartscanbereducedtohalfitssize.Buttypicalcodesequenceswillcontainsomeinstructionswhichhaveno16bitcounterparts,sothereductioninsizewillnotbeasgreat.Thedesignersestimatean25%reductionincodesize.

InotherISAdesignsthereisa“compressedmode”:Whentheprocessorisplacedincompressedmode,theshorterinstructionsareexecuted.RISC-Vdoesnotworkthisway.TheRISC-Vdesignhascarefullyencodedtheinstructionssothatthelengthoftheinstructioncanbedeterminedandthebytesfetchedfrommemorycanbeunambiguouslyinterpretedaseither16bitor32bitinstructions.Thus,theprogrammercanfreelyintermixlongandshortinstructions.



InstructionFormats

Thereareseveralinstructionformatsusedforthecompressedinstructions.Inlistingtheformatsbelow,wewillusetheseabbreviationsfor2ields:

------ OpcodeBitsDDDDD RegDDDD RegD(x8..x15only)

aaaaa Reg1aaa Reg1(x8..x15only)bbbbb Reg2bbb Reg2(x8..x15only)VVVVV ImmediateValueXXXXX Offset

Herearethemaininstructionforms.(Eachcharacterindicatesasinglebit.)

RegisterFormat----aaaaabbbbb------DDDDDbbbbb--

ImmediateFormat---VaaaaaVVVVV-----VDDDDDVVVVV--

WideImmediateFormat---VVVVVVVVDDD--

Stack-relativeStoreFormat---VVVVVVbbbbb--

LoadFormat---VVVaaaVVDDD--

StoreFormat---VVVaaaVVbbb--

BranchFormat---XXXaaaXXXXX--

JumpFormat---XXXXXXXXXXX--

Theseformatsareschematic.Fortheindividualinstructionsdescribedinthefollowingsections,wegivetheexactencoding,includingtheactualopcodebits,aswellasotherencodingconstraints.



LoadInstructions

C.LW-LoadWord

GeneralForm:C.LW RegD,Immed-6(Reg1)

Description:Movea32bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressinregisterReg1.

SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.

RegistersRegDandReg1arerestrictedtox8..x15.Encoding:

010VVVaaaVVDDD00WhereDDD=RegD.

Whereaaa=Reg1.WhereVVVVVV=Immed-6.

Availability:Requires“C”(CompressedInstruction)extension.

Equivalent32BitInstruction:LW RegD,Offset(Reg1)

C.LW-LoadDoubleword

GeneralForm:C.LD RegD,Immed-6(Reg1)






011VVVaaaVVDDD00WhereDDD=RegD.

Whereaaa=Reg1.WhereVVVVVV=Immed-6.

Availability:Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.

Equivalent32BitInstruction:LD RegD,Offset(Reg1)

C.LQ-LoadQuadword

GeneralForm:C.LQ RegD,Immed-6(Reg1)


SincetheImmed-6valueisscaledby16,thisgivesaneffectiverangeof0..1,008,inmultiplesof16.


001VVVaaaVVDDD00 WhereaaadenotesReg1.

WhereVVVVVV=Immed-6.Availability:

Requires“C”(CompressedInstruction)extension.OnlyavailableforRV128.

Equivalent32BitInstruction:LQ RegD,Offset(Reg1)



C.FLW-LoadSingleFloat

GeneralForm:C.FLW FRegD,Immed-6(Reg1)

Description:Movea32bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.


RegisterFRegDisrestrictedtof8..f15.

RegisterReg1isrestrictedtox8..x15.Encoding:



Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.

Equivalent32BitInstruction:FLW FRegD,Offset(Reg1)

C.FLD-LoadDoubleFloat

GeneralForm:C.FLD FRegD,Immed-6(Reg1)

Description:Movea64bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.


RegisterFRegDisrestrictedtof8..f15.



RegisterReg1isrestrictedtox8..x15.Encoding:



Requires“C”(CompressedInstruction)extension.Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.

Equivalent32BitInstruction:FLD FRegD,Offset(Reg1)

C.LWSP-LoadWordfromStackFrame

GeneralForm:C.LWSP RegD,Immed-6

Description:Movea32bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).


Thedestinationregistercanbeanyofthe32registers,exceptx0.Encoding:

010VDDDDDVVVVV10WhereDDDDD=RegDandcannotbex0=00000.WhereVVVVVV=Immed-6.


Equivalent32BitInstruction:LW RegD,Offset(x2)



C.LDSP-LoadDoublewordfromStackFrame

GeneralForm:C.LDSP RegD,Immed-6



Thedestinationregistercanbeanyofthe32registers,exceptx0.Encoding:

011VDDDDDVVVVV10WhereDDDDD=RegDandcannotbex0=00000.WhereVVVVVV=Immed-6.


Equivalent32BitInstruction:LD RegD,Offset(x2)

C.LQSP-LoadQuadwordfromStackFrame

GeneralForm:C.LQSP RegD,Immed-6


SincetheImmed-6valueisscaledby16,thisgivesaneffectiverangeof0..1008inmultiplesof16.

Thedestinationregistercanbeanyofthe32registers,exceptx0.



Encoding:001VDDDDDVVVVV10

WhereDDDDD=RegDandcannotbex0=00000.WhereVVVVVV=Immed-6.

Availability:Requires“C”(CompressedInstruction)extension.OnlyavailableforRV128.

Equivalent32BitInstruction:LQ RegD,Offset(x2)

C.FLWSP-LoadSingleFloatfromStackFrame

GeneralForm:C.FLWSP FRegD,Immed-6

Description:Movea32bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).


Thedestinationregistercanbeanyofthe322loatingpointregisters.Encoding:

011VDDDDDVVVVV10WhereDDDDD=FRegD.WhereVVVVVV=Immed-6.

Availability:Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.

Equivalent32BitInstruction:FLW FRegD,Offset(x2)



C.FLDSP-LoadDoubleFloatfromStackFrame

GeneralForm:C.FLDSP FRegD,Immed-6

Description:Movea64bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedtothestackpointer(x2).


Thedestinationregistercanbeanyofthe322loatingpointregisters.Encoding:

001VDDDDDVVVVV10WhereDDDDD=FRegD.WhereVVVVVV=Immed-6.

Availability:Requires“C”(CompressedInstruction)extension.Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.

Equivalent32BitInstruction:FLD FRegD,Offset(x2)

StoreInstructions

C.SW-StoreWord

GeneralForm:C.SW Reg2,Immed-6(Reg1)

Description:Movea32bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressinregisterReg1.




RegistersReg1andReg2arerestrictedtox8..x15.Encoding:

110VVVaaaVVbbb00 WhereaaadenotesReg1andbbbdenotesReg2.


Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:

SW Reg2,Offset(Reg1)

C.SD-StoreDoubleword

GeneralForm:C.SD Reg2,Immed-6(Reg1)






Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.

Equivalent32BitInstruction:SD Reg2,Offset(Reg1)



C.SQ-StoreQuadword

GeneralForm:C.SQ Reg2,Immed-6(Reg1)







Equivalent32BitInstruction:SQ Reg2,Offset(Reg1)

C.FSW-StoreSingleFloat

GeneralForm:C.FSW FReg2,Immed-6(Reg1)

Description:Movea32bitvaluefrom2loatingpointregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.


RegisterReg1isrestrictedtox8..x15.RegisterFReg2isrestrictedtof8..f15.



Encoding:111VVVaaaVVbbb00

WhereaaadenotesReg1andbbbdenotesFReg2.WhereVVVVVV=Immed-6.

Availability:Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.

Equivalent32BitInstruction:FSW FReg2,Offset(Reg1)

C.FSD-StoreDoubleFloat

GeneralForm:C.FSD FReg2,Immed-6(Reg1)

Description:Movea64bitvaluefrom2loatingpointregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.


RegisterReg1isrestrictedtox8..x15.RegisterFReg2isrestrictedtof8..f15.Encoding:

101VVVaaaVVbbb00 WhereaaadenotesReg1andbbbdenotesFReg2.


Requires“C”(CompressedInstruction)extension.Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.

Equivalent32BitInstruction:FSD FReg2,Offset(Reg1)



C.SWSP-StoreWordtoStackFrame

GeneralForm:C.SWSP Reg2,Immed-6

Description:Movea32bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).


Thesourceregistercanbeanyofthe32generalpurposeregisters.Encoding:

110VVVVVVbbbbb10 WherebbbbbdenotesReg2.



SW Reg2,Offset(x2)

C.SDSP-StoreDoublewordtoStackFrame

GeneralForm:C.SDSP Reg2,Immed-6




111VVVVVVbbbbb10



WherebbbbbdenotesReg2.WhereVVVVVV=Immed-6.


Equivalent32BitInstruction:SD Reg2,Offset(x2)

C.SQSP-StoreQuadwordtoStackFrame

GeneralForm:C.SQSP Reg2,Immed-6




101VVVVVVbbbbb10 WherebbbbbdenotesReg2.



Equivalent32BitInstruction:SQ Reg2,Offset(x2)

C.FSWSP-StoreSingleFloattoStackFrame

GeneralForm:C.FSWSP FReg2,Immed-6



Description:Movea32bitvaluefromregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).


Thesourceregistercanbeanyofthe322loatingpointregisters.Encoding:

111VVVVVVbbbbb10 WherebbbbbdenotesFReg2.


Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.

Equivalent32BitInstruction:FSW FReg2,Offset(x2)

C.FSDSP-StoreDoubleFloattoStackFrame

GeneralForm:C.FSDSP FReg2,Immed-6

Description:Movea64bitvaluefromregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedtothestackpointer(x2).


Thesourceregistercanbeanyofthe322loatingpointregisters.Encoding:

101VVVVVVbbbbb10 WherebbbbbdenotesFReg2.


Requires“C”(CompressedInstruction)extension.



Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.

Equivalent32BitInstruction:FSD FReg2,Offset(x2)

Jump,Call,andConditionalBranchInstructions

C.J-Jump(PC-relative)

GeneralForm:C.J Immed-11

Description:AjumpistakenusingaPC-relativeoffset.The11bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.

SincetheImmed-11valueisscaledby2,thisgivesaneffectiverangeof-2,048..+2,046,inmultiplesof2.

Encoding:101XXXXXXXXXXX01

WhereXXXXXXXXXXXdenotestheImmed-11offset.Availability:


JAL x0,Offset

C.JAL-JumpandLink/Call(PC-relative)

GeneralForm:C.JAL Immed-11

Description:AfunctioncallistakenusingaPC-relativeoffset.The11bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.



SincetheImmed-11valueisscaledby2,thisgivesaneffectiverangeof-2,048..+2,046,inmultiplesof2.

Thereturnaddress(i.e.,theaddressoftheinstructionfollowingtheC.JAL)issavedinregisterx1(alsoknownasregisterra).

Encoding:001XXXXXXXXXXX01

WhereXXXXXXXXXXXdenotestheImmed-11offset.Availability:


Equivalent32BitInstruction:JAL x1,Offset

C.JR-JumpRegister

GeneralForm:C.JR Reg1

Description:AjumpistakentotheaddressinregisterReg1.

RegisterReg1canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.Encoding:

1000aaaaa0000010 WhereaaaaadenotesReg1andcannotbex0=00000.Availability:


JAL x0,Reg1,0

C.JALR-JumpRegisterandLink/CallGeneralForm:

C.JALR Reg1 Description:

AfunctioncallistakentotheaddressinregisterReg1.

Thereturnaddress(i.e.,theaddressoftheinstructionfollowingtheC.JALR)issavedinregisterx1(alsoknownasregisterra).



RegisterReg1canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.Encoding:

1001aaaaa0000010 WhereaaaaadenotesReg1andcannotbex0=00000.Availability:


JALR x1,Reg1,0

C.BEQZ-BranchifZero(PC-relative)

GeneralForm:C.BEQZ Reg1,Immed-8

Description:AconditionalbranchistakenusingaPC-relativeoffset.ThebranchistakenifthevalueinregisterReg1iszero.

The8bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.

SincetheImmed-8valueisscaledby2,thisgivesaneffectiverangeof-256..+254,inmultiplesof2.

Encoding:110XXXaaaXXXXX01

WhereXXXXXXXXdenotestheImmed-8offset. WhereaaadenotesReg1.Availability:


BEQ Reg1,x0,Offset

C.BNEQZ-BranchifNotZero(PC-relative)

GeneralForm:C.BNEQZ Reg1,Immed-8



Description:AconditionalbranchistakenusingaPC-relativeoffset.ThebranchistakenifthevalueinregisterReg1isnotzero.

The8bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.

SincetheImmed-8valueisscaledby2,thisgivesaneffectiverangeof-256..+254,inmultiplesof2.

Encoding:111XXXaaaXXXXX01

WhereXXXXXXXXdenotestheImmed-8offset. WhereaaadenotesReg1.Availability:


BNE Reg1,x0,Offset

LoadConstantintoaRegister

C.LI-LoadImmediate

GeneralForm:C.LI RegD,Immed-6

Description:Loadanimmediatevalueintherange-32..+31intoregisterRegD.

Thetargetregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.

The6bitimmediatevalueissign-extended.Encoding:

010VDDDDDVVVVV01 WhereVVVVVVdenotestheImmed-6value. WhereDDDDDdenotesRegD,andcannotbex0=00000.




Equivalent32BitInstruction:ADDI RegD,x0,Value

C.LUI-LoadUpperImmediate

GeneralForm:C.LUI RegD,Immed-6

Description:ThisinstructionisashortformfortheLUIinstruction,butwitharestrictedrangeofvalues.Thisinstructionloadsbits[17:12]ofthedestinationregister.Theupperbits[…:18]are2illedwiththesignextension.Thelower12bits[11:0]are2illedwithzeros.

Toputitanotherway,theimmediatevalueissignextended,multipliedby4,096,andloadedintothedestinationregister.SincetheImmed-6valueisscaledby212=4,096,thisgivesaneffectiverangeofvaluesof-131,040..+126,976,inmultiplesof4,096.

Thetargetregistercanbeanygeneralpurposeregisterexceptx0orx2.(Registerx2isnormallyusedforthestackpointer.)


WhereVVVVVVdenotestheImmed-6value. WhereDDDDDdenotesRegD,andcannotbex0=00000orx2=00010.Availability:


LUI RegD,Value



Arithmetic,Shift,andLogicInstructions

C.ADDI-AddImmediate

GeneralForm:C.ADDI RegD,Immed-6

Description:Addsanimmediatevalueintherange-32..+31toregisterRegD.

Theregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.

The6bitimmediatevalueissign-extended.Encoding:

000VDDDDDVVVVV01 WhereVVVVVVdenotestheImmed-6valueandcannotbezero. WhereDDDDDdenotesRegD,andcannotbex0=00000.Comment: IfVVVVVV=000000andDDDDD=00000,thisencodingisidenticaltoC.NOP. IfVVVVVV≠000000andDDDDD=00000,thisencodingcanbeahint. IfVVVVVV=000000andDDDDD≠00000,thisencodingisreserved.Availability:


ADDI RegD,RegD,Value

C.ADDIW-AddImmediate(Word)

GeneralForm:C.ADDIW RegD,Immed-6

Description:Addsanimmediatevalueintherange-32..+31toregisterRegD.

Theregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.

The6bitimmediatevalueissign-extended.

ThepreviouslydescribedC.ADDIinstructionperformstheadditionineither32,64,or128bits,asdeterminedbythearchitecture,RV32,RV64,orRV128.



ThisinstructionC.ADDIWwilltruncatetheresultto32bits,thensignextendedthe32bitstothefullregistersize.

Avalueofzeroisallowed.Thiswillsimplyforcealargervalueintoa32bitvalue,signextendingtothefullregistersize.


WhereVVVVVVdenotestheImmed-6value.(Zeroisallowed). WhereDDDDDdenotesRegD,andcannotbex0=00000.Comment: IfDDDDD=00000,thisencodingcanbeahint.Availability:

Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.

Equivalent32BitInstruction:ADDIW RegD,RegD,Value

C.ADDI16SP-AddImmediatetoStackPointer

GeneralForm:C.ADDI16SP Immed-6

Description:Thisinstructionisusedtogroworshrinkthestackbyadjustingthestackpointerregister.

The6bitimmediatevalueismultipliedby16,sign-extended,andaddedtothestackpointerregister(x2).

Itisassumedthatthestackpointerisalwaysquadwordaligned,i.e.,amultipleof16.SincetheImmed-6valueisscaledby16,thisgivesaneffectiveadjustmentvalueof-512..+496,inmultiplesof16.

Avalueofzeroisnotallowed,sinceadjustingthestackpointerbyzeroispointless.

Encoding:011V00010VVVVV01

WhereVVVVVVdenotestheImmed-6value,andcannotbe000000.Availability:

Requires“C”(CompressedInstruction)extension.



Equivalent32BitInstruction:ADDI x2,x2,Value

C.ADDI4SPN-AddImmediatetoStackPointer

GeneralForm:C.ADDI14SPN RegD,Immed-8

Description:Thisinstructionisusedtoloadtheaddressofalocalstackvariableintoaregister.Weassumethevariableislocatedwiththestackframe,thatthestackgrowsdownward,andthatthestackpointerpointstothelowestaddress.Therefore,alllocalvariables(locatedwithinthestackframe)willbeaccessedwithpositiveoffsetsfromregisterx2(thestackpointer).

The8bitimmediatevalueismultipliedby4,zero-extended,andaddedtothestackpointerregister(x2),andmovedintothedestinationregisterRegD.

SincetheImmed-8valueisscaledby4,thisgivesanoffsetintherange-512..+508,inmultiplesof4.

Thedestinationregisterislimitedtox8,x9,…x15.

Avalueofzeroisnotallowed,sinceaMVinstructioncanbeusedinstead.Encoding:

000VVVVVVVVDDD00 WhereVVVVVVVVdenotesImmed-8,andcannotbe00000000. WhereDDDdenotesRegD.Availability:

Requires“C”(CompressedInstruction)extension.AvailableonlyforRV32andRV64.

Equivalent32BitInstruction:ADDI RegD,x2,Value

C.SLLI-ShiftLeftLogical(Immediate)

GeneralForm:C.SLLI RegD,Immed-6



Description:ThisinstructionshiftsthevalueinaregisterRegDleftbyanamountspeci2iedbytheImmed-62ield.

ForRV32(32-bitarchitectures),theshiftamountmustbe1..31(i.e.,000001-011111).


ForRV128(128-bitarchitectures),theshiftamountmaybe1..64.(Shiftamountsof1..63areencodedas000001-111111;ashiftamount64isencodedas000000.)

Thedestinationregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.


WhereVVVVVVdenotesImmed-6.Seerestrictionsinthedescription. WhereDDDDDdenotesRegD,andcannotbex0=00000.Availability:


SLLI RegD,RegD,Value

C.SRLI-ShiftRightLogical(Immediate)

GeneralForm:C.SRLI RegD,Immed-6

Description:ThisinstructionshiftsthevalueinaregisterRegDrightbyanamountspeci2iedbytheImmed-62ield.Thisisa“logicalshift,i.e.,zerosareshiftedinfromtheleft.





ForRV128(128-bitarchitectures),thepossibleshiftamountsare1..31,64,and96-127.[Myreadingofthespecsuggeststhattheencodingoftheshiftamountisthis:000001-011111means1-31.000000means64.100000-111111isinterpretedasasignedvalueintherange-32..-1,whichwecancallX.Theshiftamountisrightby128-Xbits.Thisgivesarangeof96..127,resp.]

Thedestinationregistercanbex8,x9,…x15.Encoding:

100V00DDDVVVVV01 WhereVVVVVVdenotesImmed-6.Seerestrictionsinthedescription. WhereDDDdenotesRegD.Availability:


SRLI RegD,RegD,Value

C.SRAI-ShiftRightArithmetic(Immediate)

GeneralForm:C.SRAI RegD,Immed-6

Description:ThisinstructionshiftsthevalueinaregisterRegDrightbyanamountspeci2iedbytheImmed-62ield.Thisisa“arithmeticshift,i.e.,thesignbitisrepeatedlyshiftedinfromtheleft.



ForRV128(128-bitarchitectures),thepossibleshiftamountsare1..31,64,and96-127.[Myreadingofthespecsuggeststhattheencodingoftheshiftamountisthis:000001-011111means1-31.000000means64.100000-111111isinterpretedasasignedvalueintherange-32..-1,whichwecancallX.Theshiftamountisrightby128-Xbits.Thisgivesarangeof96..127,resp.]



Thedestinationregistercanbex8,x9,…x15.Encoding:

100V01DDDVVVVV01 WhereVVVVVVdenotesImmed-6.Seerestrictionsinthedescription. WhereDDDdenotesRegD.Availability:


SRAI RegD,RegD,Value

C.ANDI-LogicalAND(Immediate)

GeneralForm:C.ANDI RegD,Immed-6

Description:ThisinstructionperformsthelogicalANDoperationwiththevalueinaregisterandwritestheresulttothatsameregister.

TheImmed-6valueissign-extended.

TheregisterRegDcanbex8,x9,…x15.Encoding:

100V10DDDVVVVV01 WhereVVVVVVdenotesImmed-6. WhereDDDdenotesRegD.Availability:


ANDI RegD,RegD,Value

C.MV-MoveRegistertoRegister

GeneralForm:C.MV RegD,Reg2

Description:ThisinstructionmovesavaluefromReg2toRegD.



TheregistersRegDandReg2canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.

Encoding:1000DDDDDbbbbb10

WhereDDDDDdenotesRegD. WherebbbbbdenotesReg2.Availability:


ADD RegD,x0,Reg2

C.ADD-AddRegistertoRegister

GeneralForm:C.ADD RegD,Reg2

Description:ThisinstructionaddsthevaluesinRegDandReg2andplacestheresultinRegD.

TheregistersRegDandReg2canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.

Encoding:1001DDDDDbbbbb10

WhereDDDDDdenotesRegD. WherebbbbbdenotesReg2.Availability:


ADD RegD,RegD,Reg2

C.AND-AddRegistertoRegister

GeneralForm:C.AND RegD,Reg2

Description:ThisinstructionlogicallyANDsthevaluesinRegDandReg2andplacestheresultinRegD.



TheregistersRegDandReg2canbex8,x9,…x15.Encoding:

100011DDD11bbb01 WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:


AND RegD,RegD,Reg2

C.OR-AddRegistertoRegister

GeneralForm:C.OR RegD,Reg2

Description:ThisinstructionlogicallyORsthevaluesinRegDandReg2andplacestheresultinRegD.




OR RegD,RegD,Reg2

C.XOR-AddRegistertoRegister

GeneralForm:C.XOR RegD,Reg2

Description:ThisinstructionlogicallyXORsthevaluesinRegDandReg2andplacestheresultinRegD.

TheregistersRegDandReg2canbex8,x9,…x15.



Encoding:100011DDD01bbb01

WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:


XOR RegD,RegD,Reg2

C.SUB-AddRegistertoRegister

GeneralForm:C.SUB RegD,Reg2

Description:ThisinstructionsubtractsthevalueinReg2fromthevalueinRegDandplacestheresultinRegD.




SUB RegD,RegD,Reg2

C.ADDW-AddRegistertoRegister

GeneralForm:C.ADDW RegD,Reg2

Description:ThisinstructionaddsthevalueinReg2tothevalueinRegDandplacestheresultinRegD.Theresultvalueistruncatedto32bitsandthensignextendedtothefulllengthoftheregister.

TheregistersRegDandReg2canbex8,x9,…x15.



Encoding:100111DDD01bbb01

WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:

Requires“C”(CompressedInstruction)extension.AvailableforRV64andRV128only.

Equivalent32BitInstruction:ADDW RegD,RegD,Reg2

C.SUBW-AddRegistertoRegister

GeneralForm:C.SUBW RegD,Reg2

Description:ThisinstructionsubtractsthevalueinReg2fromthevalueinRegDandplacestheresultinRegD.Theresultvalueistruncatedto32bitsandthensignextendedtothefulllengthoftheregister.



Requires“C”(CompressedInstruction)extension.AvailableforRV64andRV128only.

Equivalent32BitInstruction:SUBW RegD,RegD,Reg2

MiscellaneousInstructions

C.NOP-NopInstruction

GeneralForm:C.NOP



Description:Thisinstructiondoesnothing.TheencodingcanbeviewedasaspecialcaseoftheC.ADDIinstructionwherethedestinationisx0andtheimmediatevalueis0.

Encoding:0000000000000001


Equivalent32BitInstruction:ADDI x0,x0,0

C.EBREAK-DebuggingBreakInstruction

GeneralForm:C.EBREAK

Description:Thisinstructionisusedbydebuggers.Theideaisthatadebuggerwilltemporarilyreplacesomeinstructionwiththisinstruction.Whenexecuted,anexceptionwilloccur,allowingthedebuggertogetcontrol.

Encoding:1001000000000010


Equivalent32BitInstruction:EBREAK

C.ILLEGAL-IllegalInstruction

GeneralForm:C.ILLEGAL

Itisdoubtfulthatmostassemblerswouldrecognizetheopcode“C.ILLEGAL”,butweincludeitanyway.

Description:Thispatternisde2inedasbeingillegal.Anyattempttoexecuteitwillcausean“illegalinstruction”exception.Allzeroswasintentionallychosensothatanyattempttobranchintounpopulatedordysfunctionalmemory(whichoftenreadsaszeros)willresultinanimmediatetrap.



Encoding:0000000000000000


Equivalent32BitInstruction:ILLEGAL

InstructionEncoding

Thefollowingtablelistsallthecompressedinstructionsandisorderedinanattempttoshowthecompletecoverageofthe16bitinstructionencodingspace.

SomebitpatternsarereservedbytheRISC-Vspecforfutureuse;suchpatternsaremarked<Reserved>andshouldnotbeusedbyimplementors.

Somebitpatternsareunassignedandavailableforspeci2icimplementationstode2ineastheywish;suchpatternsaremarked<NSE>(NonstandardExtension).

Somebitpatternsaremarkedas<Hint>;implementorsarefreetousethesebitpatternstoencodehintstotheprocessortotheimproveexecutionspeed.Inallcases,thepatternsmarked<Hint>areequivalenttoinstructionsthatinvolvemodifyingaregisterandthatspeci2icallydisallowtheuseofregisterx0asthedestinationregister.Thus,implementorsarefreetoignorethe<Hint>patternsandexecutetheminthenaturalway,bycomputingaresultbutsendingthatvaluestothezeroregisterx0,causingtheseinstructionstobehaveasnopinstructions.

Recallthattheleastsigni2icant2bitsareusedtodistinguish16-bitinstructionsfromlongerinstructions.Iftheleastsigni2icant2bitsare11,thentheinstructionisnota16bitcompressedinstruction.

Thistableismore-or-lesssortedonleastsigni2icant2bits(00,01,10),followedbythemostsigni2icantbitsinorder.

C.ILLEGAL 0000000000000000

<Reserved> 00000000000DDD00 RegD≠0

C.ADDI4SPN 000VVVVVVVVDDD00 Value≠0



C.FLD 001VVVaaaVVDDD00 RV32andRV64only

C.LQ 001VVVaaaVVDDD00 RV128only

C.LW 010VVVaaaVVDDD00

C.FLW 011VVVaaaVVDDD00 RV32only

C.LD 011VVVaaaVVDDD00 RV64andRV128only

<Reserved> 100XXXXXXXXXXX00

C.FSD 101VVVaaaVVbbb00 RV32andRV64only

C.SQ 101VVVaaaVVbbb00 RV128only

C.SW 110VVVaaaVVbbb00

C.FSW 111VVVaaaVVbbb00 RV32only

C.SD 111VVVaaaVVbbb00 RV64andRV128only

C.ADDI 000VDDDDDVVVVV01 RegD≠0;Value≠0

<Reserved> 0000DDDDD0000001 RegD≠0;Value=0

<Hint> 000V00000VVVVV01 RegD=0;Value≠0

C.NOP 0000000000000001 RegD=0;Value=0

C.JAL 001VVVVVVVVVVV01 RV32only

C.ADDIW 001VDDDDDVVVVV01 Rv64andRV128only;RegD≠0

<Hint> 001V00000VVVVV01 Rv64andRV128only;RegD=0

C.LI 010VDDDDDVVVVV01 RegD≠0

<Hint> 010VDDDDDVVVVV01 RegD=0

C.LUI 011VDDDDDVVVVV01 RegD≠0,2;Value≠0

<Reserved> 011VDDDDDVVVVV01 RegD≠0,2;Value=0

<Hint> 011V00000VVVVV01 RegD=0

C.ADDI16SP 011V00010VVVVV01 RegD=2;Value≠0

<Reserved> 011V00010VVVVV01 RegD=2,Value=0

C.SRLI 100000DDDVVVVV01 RV32;V≠0

<Hint> 100000DDD0000001 RV32;V=0

<NSE> 100100DDDVVVVV01 RV32;



C.SRLI 100V00DDDVVVVV01 RV64;V≠0

<Hint> 100000DDD0000001 RV64;V=0

C.SRLI 100V00DDDVVVVV01 RV128only;V≠0

C.SRLI64 100000DDD0000001 RV128only;V=0

C.SRAI 100001DDDVVVVV01 RV32;V≠0

<Hint> 100001DDD0000001 RV32;V=0

<NSE> 100101DDDVVVVV01 RV32;

C.SRAI 100V01DDDVVVVV01 RV64;V≠0

<Hint> 100001DDD0000001 RV64;V=0

C.SRAI 100V01DDDVVVVV01 RV128only;V≠0

C.SRAI64 100001DDD0000001 RV128only;V=0

C.SUB 100011DDD00bbb01

C.XOR 100011DDD01bbb01

C.OR 100011DDD10bbb01

C.AND 100011DDD11bbb01

C.SUBW 100111DDD00bbb01 RV64andRV128only

<Reserved> 100111DDD00bbb01 RV32only

C.ADDW 100111DDD01bbb01 RV64andRV128only

<Reserved> 100111DDD01bbb01 RV32only

<Reserved> 100111XXX10XXX01

<Reserved> 100111XXX11XXX01

C.ANDI 100V10DDDVVVVV01

C.J 101VVVVVVVVVVV01

C.BEQZ 110VVVaaaVVVVV01

C.BNEZ 111VVVaaaVVVVV01

C.SLLI 0000DDDDDVVVVV10 RV32only;RegD≠0;V≠0

<Hint> 0000DDDDD0000010 RV32only;RegD≠0;V=0

<NSE> 0001DDDDDVVVVV10 RV32only;RegD≠0



<Hint> 000V00000VVVVV10 RV32only;RegD=0

C.SLLI 000VDDDDDVVVVV10 RV64only;RegD≠0;V≠0

<Hint> 0000DDDDD0000010 RV64only;RegD≠0;V=0


C.SLLI 000VDDDDDVVVVV10 RV128only;RegD≠0;V≠0

C.SLLI64 0000DDDDD0000010 RV128only;RegD≠0;V=0


C.FLDSP 001VDDDDDVVVVV10 RV32andRV64only

C.LQSP 001VDDDDDVVVVV10 RV128only;RegD≠0

<Reserved> 001VDDDDDVVVVV10 RV128only;RegD=0

C.LWSP 010VDDDDDVVVVV10 RegD≠0

<Reserved> 010VDDDDDVVVVV10 RegD=0

C.LDSP 011VDDDDDVVVVV10 RV64andRV128only;RegD≠0

<Reserved> 011VDDDDDVVVVV10 RV64andRV128only;RegD=0

C.FLWSP 011VDDDDDVVVVV10 RV32only

C.JR 1000aaaaa0000010 RegA≠0

<Reserved> 1000aaaaa0000010 RegA=0

C.MV 1000DDDDDbbbbb10 RegB≠0;RegD≠0

<Hint> 1000DDDDDbbbbb10 RegB≠0;RegD=0

C.EBREAK 1001000000000010

C.JALR 1001aaaaa0000010 RegA≠0

C.ADD 1001DDDDDbbbbb10 RegB≠0;RegD≠0

<Reserved> 1001DDDDDbbbbb10 RegB≠0;RegD=0

C.FSDSP 101VVVVVVbbbbb10 RV32andRV64only

C.SQSP 101VVVVVVbbbbb10 RV128only

C.SWSP 110VVVVVVbbbbb10

C.FSWSP 111VVVVVVbbbbb10 RV32only

C.SDSP 111VVVVVVbbbbb10 RV64andRV128only


Chapter7:ConcurrencyandAtomicInstructions

HardwareThreads(HARTs)

Asimpleprocessorhasasingle2low-of-control.Thatis,itoperatesasasingle“thread”,withaFETCH-DECODE-EXECUTEloop.Instructionsareexecutedone-after-the-other.(Inthecaseofpipelining,severalinstructionsmaybeinsomestageofexecutionatanymoment,buttheinstructionsarecompleted(or“retired”)one-after-the-other,andthepipelinedbehaviorisexactlythesameasifeachinstructionhadbeenexecutedalonetocompletionintheordertheyappearinthecode.)

Normally,anoperatingsystemwillmultiplextheprocessor,therebyprovidingmanythreads,viatimeslicing.Forclarity,wecandistinguishbetween“hardwarethread”and“softwarethreads”.TheOScanmultiplexasinglehardwarethreadtoprovidemultiplesoftwarethreads.

Inamulti-coreormultiprocessorsystem,eachcoreisrunningitsownindependentFETCH-DECODE-EXECUTEcycle.Thus,therewillbeanumberofhardwarethreadsequaltothenumberofcores.HowtheOSimplementssoftwarethreadsontopoftheavailablehardwarethreadsisacomplextopicwhichconcernsthekerneldesignandorganization.

Ina“superscalar”core,thereismorethanoneinstructionstream.Thereisa2ixednumber(suchas2or4)ofindependentinstructionstreams.Forexample,someparticularsuperscalarcoremightexecutetwoindependentinstructionstreamssimultaneously.Suchacoreisprovidingtwohardwarethreads.Eachstreamhasitsownindependentsetofregistersanditsownprogramcounter,more-or-lessasiftheywereseparatecoresinamultiprocessorsystem.

Onebene2itofthesuperscalarorganizationisthatexpensiveandrarelyusedcircuits(e.g.,2loatingpointdivide)canbeprovidedonlyinasinglecopy,andsharedbybothhardwarethreads.Also,theremaybemultipleinstantiationsofmorecriticalcircuits



(e.g.,integeraddition)whichareallocateddynamicallytothoseinstructionsthatneedthem.

Withsuperscalarorganization,itispossiblethatmorethanoneinstructioncanberetiredperclockcycle,providinggreaterperformance.

De=initionof“HART”:Inthediscussionofconcurrencycontrolandsynchronization,theRISC-Vdocumentationusestheterm“HART”torefertoahardwarethread.Thehardwarethreads(HARTs)mightbeimplementedondistinctprocessorswithinamultiprocessorsystem,orbymultiplecoresinasingleprocessor,oronasinglesuperscalarcore.TheRISC-Vspeccoversallpossibilities,includinghybridcombinations.

TheRISC-VMemoryModel

It’sacommonlyacceptedprincipleinprocessordesignthat,withinasingleinstructionstream,datadependenciesarerespected.Inotherwords,theprocessorexecutesinstructionsinorderandwillnotreorderinstructions.

Forexample,aninstructionthatloadsaregisterwithavaluefrommemoryandaninstructionthatstoresthecontentsofthatsameregisterinmemoryshouldneverbereordered.Assemblylanguageprogrammersarefreetoassumethateachinstructionisexecutedfrombeginningtocompletionintheorderitwasspeci2ied.Anyoptimizationsorreorderingdonebythehardwaremustpreservethisconstraint.Forexample,twoinstructionsthataccessdifferentregistersanddifferentmemoryaddressescansafelybereorderedorexecutedsimultaneouslytoachievefasterexecution.

However,betweendifferenthardwarethreads,theorderinwhichoperationsexecutemaybeindeterminate.Forexample,considertwoindependentprocessorsinamultiprocessorsystemwithsharedmemory.Imaginetwoinstructionswhichbothstoreintoasinglememorylocation;theymayexecuteinarbitraryorder,resultinginadifferentvaluebeingstoredlast.

Hereisonecommonrequirementforpropersynchronizationinasystemofcooperatingprocesses.(By“cooperatingprocesses”,wemeanasystemwithmultiplethreadsthataredesignedtoworktogetherconcurrentlyexecutingto



completesomegivenprogrammingtaskcorrectly,regardlessofthevagariesofindeterminateexecution.)

SynchronizationofReadersandWriters:Withapotentiallysharedpieceofdata,awritermustwaittowritethedatauntilallpreviousreadersare2inishedreadingtheoldvalue.Likewise,areadermustwaituntilanypreviouswriteris2inishedwritingthenewvalue.

TheRISC-Vspecassumesthatreadsandwritesissuedbyasinglehardwarethread(HART),mustbevisibletothatsameHARTintheorderexecuted.Thatis,thecore’sactualorderofexecutionofaHART’sinstructionsmustbeindistinguishablefromthesequentialorderinwhichtheprogrammerwrotetheinstructions.Perhapsthecorewillreorderinstructionstoimproveperformance,butanyreorderingmustbe“safe”inthesenseofhavingthesameeffectasbeingissuedintheoriginalorder.

Thisassumptionissomethingthateveryprogrammertakesforgranted:thecomputerwillexecutetheircodeinstructionsintheordertheprogrammerwrotethem.Forsinglethreadsexecutinginisolation,thereisnothingmuchmoretotalkabout.Butthingsquicklygetcomplicatedwhentherearemultiplethreads.

AsfarastheapparentorderoftheoperationsasviewedbyotherHARTs,theRISC-Vspecdoesnotassumeanything.Infact,theRISC-Vspecallowsthefollowing:

ItmayappeartoasecondHARTthattheinstructionsexecutedbytheZirstHARTaredoneoutoforder.

SoitispossiblethattwodistinctHARTswillseethesamesetofoperationshappeninginadifferentorder.ThisisaveryrealpossibilityonsomesystemswhentheoperationsinquestionarereadsandwritestomemoryandtheindividualHARTsareoperatingwithprivatecaches.

Forexample,considerthefollowingsequenceofinstructionsissuedbyHARTAtomemorylocationX.

WriteX!4 ReadX WriteX!5 ReadX



The2irstreadwillreturn4andthesecondreadwillreturn5.HoweverHARTBmightexecutetheseinstructions:

ReadX ReadX

The2irstreadmightreturn5andthesecondreadmightreturn4.ItappearstoHARTBthatthewriteswereexecutedoutoforder,probablyduetoinconsistenciesbetweentheircaches.

ConcurrencyControl-ReviewandBackground

Thissectionisnotspeci2ictoRISC-Vandcanbesafelyskippedifyouarefamiliarwithconcurrencycontrol.

Inanysystemwithmultiplethreads,someformofsynchronizationandconcurrencycontrolisrequired.Withoutit,thebehaviorofsomecodesequencesmaybenondeterministicandprogramerrorsmayresult.Forexample,eachofseveralthreadsmayneedtoperiodicallyqueryandupdateasharedpieceofdata.Withoutanycontrol,twoormorethreadsmaybetryingtoreadandupdatetheshareddatasimultaneously;therearisesapossibilitythatsomethreadsmayseeaninconsistentstateofthedataorthatsomeupdatesmaybelostorthatseveralconcurrentupdateswillresultintheshareddatabeingputintoainvalidorinconsistentstate.

Inonecommonapproachtoconcurrencycontrol—particularlyamongsoftwarethreads—a“lock”iscreatedtoprotecttheshareddata.Anyblockofcodeaccessingtheshareddatamust2irst“acquire”(or“obtain”or“set”)thelockbeforeaccessingtheshareddata,andmust“release”(i.e.,“free”,“clear”)thelockaftertheaccessiscomplete.Asequenceofcodewhichaccessestheshareddataiscalleda“criticalsection”andtheprogrammermustremembertoacquirethelockatthebeginningofthecriticalsectionandreleasethelockattheendofthecriticalsection.

Operatingsystemstypicallyprovide“locks”,alongwith“acquire”and“release”operations.Implementingtheseoperationsonasingleprocessorsystemissimple:theOSmerelyhastoavoidathread-switchoccurringduringtheacquireandreleaseoperationsthemselves.Thiscanbedoneeasilyandef2icientlybydisablinginterruptsforthedurationoftheacquire/releaseoperations.



Thingsaresigni2icantlymorecomplexformulti-coresystemswithsharedmemory.Nowthereis“multiprocessing”(trueconcurrency)andnotjust“multithreading”(simulatedconcurrencyusingtime-slicing).Thelockitselfbecomesshareddataandaccesstothelockineffectbecomesacriticalsectionofitsown.Blockinginterruptsononeprocessorisnolongeradequate.Thereexistalgorithmstosolvethisproblempurelyinsoftware(Peterson’salgorithm;Dekker’salgorithm)butarchitecturalsupportintheISAisrequiredforperformancereasons.

Inthepasttherehavebeenacoupleofinstructionsproposedtosupportthe“acquire”and“release”operationsforlocks.Onesuchinstructionwasthe“test-and-set”instruction.Thisinstructionincludestwooperands:anaddressandaregister.Theinstructionreadsfrommemoryattheaddressgivenandstorestheresultintotheregister.Thentheinstructionsstoresa2ixedknownvalue(usually“1”)intothesamememorylocation.Boththeread-memoryandthewrite-memoryoperationarespeci2iedtobe“atomic”,whichmeansthatthereisaguaranteethatnootherinstructionwilloccurbetweentheread-memoryandwrite-memoryfromanyothersource(e.g.,othercoresorI/Odevices).

Historically,hardwarecanenforcetheatomicityguaranteebysomehowblockingallaccessestoallmemorylocationsforallothercoresbetweenthereadandwritephasesofthetest-and-setinstruction.Thiswillslowotherpartsofthesystemdownalittle,butifthenumberofcoresissmall,it’snotmuchofaburden.

Thetest-and-setinstructioncanbeusedtoimplementalockverysimply.Here’show:thelockisrepresentedasasinglememorylocation,with“0”meaning“notlocked”and“1”meaning“locked”.The“acquire”operationconsistsofexecutingthetest-and-setinstruction.Aftertheinstruction,thelockwillbeinthe“set”stateandwillcontain“1”.The“acquire”operationthenexaminesthepreviouslockvalue,whichwasretrievedfrommemoryandstoredinaregister.Ifitwas“0”,thenallisgood:thelockwaspreviouslyfree.Ithasnowbeenacquiredandthecriticalsectioncanbeentered.However,ifthepreviousvaluewas“1”,thenthelockwasapparentlyalreadyset.Thelockhasnotbeenacquired.Theacquirecodemusttryagain,eitherimmediately,orafterawait,orafterthreadrescheduling.The“release”operationisimplementedbysimplystoringa“0”intothememorylocationrepresentingthelock.

AnotherapproachtoISAdesignistoprovidea“swap”instruction,insteadofthetest-and-setinstruction.Likethetest-and-setinstruction,theswapinstructionhasaread-memoryphaseandawrite-memoryphase.Theinstructionistakestwooperands:aregisterandamemoryaddress.Theinstructionswapsthevalues;aftertheinstructiontheregistercontainsthepreviousvaluefromthememorywordand



thememorywordcontainsthepreviousvalueintheregister.Theswapinstructionisageneralizationofthetest-and-setinstruction:byplacinga“1”intheregisterbeforetheswapinstructionisexecuted,theeffectwillbeidenticaltoatest-and-setinstruction.

Anotherapproachisthe“compare-and-swap”(CAS)instruction,whichisusedintheX-86architecture.TheCASinstructiontakesfouroperands:amemoryaddress,aregistercontainingan“expectedvalue”,andaregistercontaininga“newvalue”,andaregisterinwhichtostoreareturncode.Theinstructiondoesseveralthingsatomically,i.e.,withoutthepossibilityofinterruptionorinterferencefromotherthreads.First,theinstructionreadsavaluefrommemoryandcomparesittothe“expectedvalue.”Ifthetwoaredifferent,theinstructionfailsandreturnsacodetoindicatefailure.Butifthetwovaluesareequal,theinstructionstoresthe“newvalue”intothememorylocationandreturnsacodeindicatingsuccess.

Theproblemwithalltheseinstructionsisthateachrequirestwomemoryoperations.ThisviolatesthegeneralRISCprincipleofkeepingallinstructionssimple,minimal,andfast.Twomemoryoperationsinasingleinstructionsistoomuchandcomplicatesthecircuitry.

Furthermore,theseinstructionscanslowothercoresdownsincetheymustbeexecutedatomicallyandpresumablytheyworkbyfreezinguptheentirememorysystemduringtheirexecution.

Theproblemswithatomicallyexecutingbothareadandwritetogethergetworsewithmorecores.Theseproblemsalsogetworsewithmorefrequentlockingoperations.Tosupportincreasedconcurrency,it’sagoodideatohave2iner-grainedlocking,i.e.,tohavemorelocks,havingeachlockprotectlessdata.Finer-grainedlockingimpliesmore“acquire”and“release”operations,evenifthelocksareusuallyfoundtobefree.Theproblemsalsogetworsewithincreasedsharingofdata,ageneralizedtrendweareseeinginmanycontexts.

AReviewofCachingConcepts

Thissectionisnotspeci2ictoRISC-Vandcanbesafelyskippedifyouarefamiliarwithcaching.



Theproblemofsynchronizationbetweenmultipleprocessors(e.g.,coresorHARTs)getsmuchmorecomplexinthepresenceofcachememories.Inshort,theproblemstemsfromthefactthatasinglelocationinmemorycanhavemultiplevaluesatonce,becausethedatacanbepresentinseveralcacheswithinconsistentvalues.Let’sreviewthebasicideasbehindcachememoriesbeforewediscusstheRISC-Vinstructions.

Inmanycomputers,thebandwidthbetweenmainmemoryandtheprocessorcoreisaperformancebottleneck.Toincreaseperformance,a“cache”memoryisplacedbetweenmainmemoryandthecore.

Wheneverthecoretriestofetchdatafrommemory,thecacheischeckedand,ifitcontainsthedata,thedatacanbesenttothecoreimmediately.Otherwise,thedatamustbefetchedfrommainmemory,whichtakesmoretime.Thedatawillbesenttothecoreaswellasbeingstoredincachememorytospeedfuturerequestsforthesamedata.

Whendataiswrittenfromthecoretomemory,thedatawillsometimesbewrittentothecachememory,whichspeedsfutureaccessesifthecorewantstoreadthedataagainsoon(“writeallocate”).Somesystemsavoidcachingtheupdateddata(“no-write-allocate”).Ineithercase,theupdateddatamustbecopiedtomainmemoryatsomepoint.Thewritetomainmemorymayoccurimmediately,atthetimethecore2irstwritesthedatatocache,andthisiscalled“write-through”.Alternatively,writetomainmemorycanbedelayeduntillater.Atsomelatertime,whenspaceinthecacheisneededforsomeotherdata,theupdatedmodi2ieddatavaluewill2inallybewrittentomainmemory.Thispolicyiscalled“write-back”.

Cachesdonotoperateonbytesindividually;insteadtheystoredatainunitsof“cacheblock”.Acachelineisablockofcontiguousdatabytes(typically64bytes)whicharecopiedtogetherasaunitinandoutofthecache.Inarequestforasinglebyteorhalfword(forexample),theentirecacheblockcontainingthedesiredbyteswilleitherbeinthecache(a“cachehit”)ornot(a“cachemiss”).

Thetypicalalignmentrequirementsensurethatamultipledataunit(suchasahalfword)willnevercrossacacheblockboundary;thedatawilleitherbeentirelyinoneblockorentirelyinanotherblock.Misaligneddatawilloccasionallycrosscacheblockboundariesandtheprocessofreadingintoorwritingfromthecoretothememorysystemwillbecomplicatedandgenerallyrequireabouttwiceasmuchtime.



Cachesareoftenorganizedintolevels.Forexample,athree-levelcachemightconsistof:

L1 Closesttothecore.Fastest,smallestincapacity. L2 Intermediate L3 Closesttomainmemory.Slowest,largestincapacity.

Inasystemwithmultiplecoresonasinglechip,eachindividualcoremightpossessitsownprivateL1andL2caches,whileallcoreswillsharealargerL3cache.

Therearedifferencesbetweeninstructionandregulardata.Inparticular,instructionsarealwaysread-only.Sincethedataisnevermodi2iedbythecore,thereisnoneedtoprovidehardwaretoimplementthewrite-allocate/write-no-allocateorwrite-back/write-throughpolicies.

Asatypicalexample,eachcorewillhavetwoL1caches;onefordataandoneforinstructions.Theterm“uni=iedcache”meansthatthecacheholdsbothinstructionsanddata.Typicallytheslower,largercache(e.g.,L3)isuni2ied.

InmanyISAs,includingRISC-V,allaccesstoI/Odevicesis“memory-mappedI/O”whichmeansthattheI/Odevicesareassignedaddressesinthephysicalmemorymap.TosenddataorcommandstoanI/Odevice,thecorewillissue“store”instructionsandtoretrievedataorstatusinformationfromanI/Odevice,thecorewillexecute“load”instructions.

ManyI/Odevicesalsoaccessmainmemorydirectly.Forexample,thecoremightmovedataintomemoryandthenissuecommandstoanI/Odevicewhichsubsequentlycausethedevicetoretrievethesamedatadirectlyfrommemory.

Thus,themainmemorysystemmayhaveseveral“ports”allowingdatatobereadandwrittenseparatelyby:

•Differentcoresonasinglechip •Differentprocessorchips •VariousI/Odevices

Amoderncomputersystemwillconsistofanumberofbussesandanumberofdifferentdevicessittingoneachbus.



Whichallthiscomplexity,thereisaproblemofsynchronizationanddataconsistency.Forexample,acoremaywriteabyteofdatatomemoryandexpectsomeI/Odeviceorothercoretofetchthatbyte.Inthepresenceofcaches,theremaybeseveraldifferentvaluesforthesamememoryaddressatthesameinstant.Withoutfurthercontrol,a“reader”maynotseethevaluewrittenbya“writer”eventhoughthewriteoccursearlierintime.

Notethatthedataconsistencyproblemdoesnotoccurforread-onlydata.Ifaparticularbyteofdataneverchanges,thenthereisnotaproblem.Perhapsthedataiscachedorperhapsitmustbefetchedfrommainmemory.Butsincethereisonlyonevalue,thereisneverapossibilitythatthecachecontainsanoutdatedvalue.Asynchronizationproblemcanonlyoccurwhendataisupdated.

Notethatalldataiswrittenatsomepoint,exceptperhapsdatastoredinRead-OnlyMemory(ROM).Wemustassumethatwhenanycomputerispoweredup,thecontentsofdynamicmemoryareunde2ined;evenread-onlydatamustinvolveaninitialwritetothememorysocaremustbetakentoavoidreadingtheinitialunde2ineddata.

Thesynchronizationproblemoccursbecausemultiplevaluesforagivenmemorylocationcanbestoredinvariouscaches.Ifthecachesareeliminated,thenthedatamustnecessarilybeconsistent;butcachesyieldtremendousperformancebene2its,sosupportforsynchronizationin2luencesISAdesign.

Wecanmakeadistinctionbetweenprivatecachesandshared(i.e.,memory-sideorglobal)caches.Considerasystemwithseveralcoresandasinglesharedmainmemory.Eachcoremayhaveitsownprivatememorycache,inwhichcaseasinglebyteofmemorymaybecachedsimultaneouslyinseveraldifferentprivatecaches.Anupdatebyanysinglecoretothisbyteoughttobevisibletoallcoresbut(withoutsynchronization)othercoresmightreadoutdatedvaluesfromtheirprivatecaches.

If,insteadthereisonlyasinglesharedcachesittingbetweenmainmemoryandallcores,thenthesynchronizationproblemgoesaway.Agivenbytecanonlybecachedinoneplace,namelythesharedcache.Sinceallcoresareusingthissharedcache,anyupdatetothebytewillbere2lectedinthesharedcache,andthereforeseenbyallothercores.

Modernsystemsoftenuseamixedofprivateandsharedcaches.

Acachememoryconsistsofasetof“cachelines”.



Eachlineinacachecontainsacouplebitsidentifyingwhatdataiscontainedinthelineandwhetherthedatais“valid”or“invalid”.Moreprecisely,acachelinegenerallycontainsthefollowingpiecesofinformation:

•Thecachelinedata(e.g.,ablockof64databytes) •Theaddressofthedatablock •A“validbit”,totellwhetherthiscachelineismeaningful •A“dirtybit”,totellwhetherthedatahasbeenmodi2ied

Wediscussvirtualmemoryandaddresstranslationelsewhere.Butherewepointoutthateachcachelinemustcontaintheaddressofthedatastoredinthatcacheline.

Intheeasiest-to-understandapproach,eachcachelinecontainsthephysicaladdressofitsdatablock.Thecacheisan“associativememory”usingthephysicaladdressasthesearchkey(i.e.,theassociativeindex).However,becauseofvirtualmemoryaddresstranslation,thevirtualaddressisknownearlierintimethanthephysicaladdress,soitmakessensetoindexthecachelinesbyvirtualaddress.Realdesignsaremorecomplex,butforourpurposeswedonotneedtogofurtherhere.

Acrudesolutiontothecacheconsistencyproblemistoperiodicallyemptytheentirecache.

“Flushingthecache”meanswritingallupdateddatabacktomainmemoryandmarkingallcachelines“invalid”.Soasimpleapproachisto2lushallcachesafteranywrite.Flushingtheentirecacheafteranywriteisoverlyconservativeandnotthemostef2icientapproach,butitwillsolvethecacheconsistencyproblem.

Thesoftwaremightbealittlesmarterandavoidfullcache2lusheswhennotstrictlyrequired.Forexample,whenonecorewritestomainmemory,thereisaquestionofwhethercachesbelongingtoothercoresmustbe2lushed.Iftheprogrammercanbecertainthatthedatawillneverbereadbyothercores,thencallingforacache2lushcanbeavoided.

TheRISC-Vdesignersrecognizethata2iner-grainedcontrolovercachesandmemorysynchronizationisrequired,andprovidetheFENCEinstruction,whichwillbedescribedlater.



RISC-VConcurrencyControl

RISC-Vsupportsconcurrencycontrol,synchronization,andcacheconsistencywiththeseinstructions:

Alwaysavailable: FENCE Synchronizedatareadsandwrites FENCE.I Synchronizetheinstructioncache

Availableinthe“A”extension(for“AtomicInstructions”) LR LoadReserved SC StoreConditional AMOSWAP Atomicallyswaptwovalues AMOADD Atomicallyaddtwovalues AMOAND AtomicallyANDtwovalues AMOOR AtomicallyORtwovalues AMOXOR AtomicallyXORtwovalues AMOMAX AtomicallyMAXtwovalues AMOMIN AtomicallyMINtwovalues

UnderRevision:TheRISC-Vspecstatesthattheirmemorymodelisunderrevision,implyingthattheseinstructionsmaychange.

TheFENCEInstructions

Therearetwofenceinstructions:FENCEandFENCE.Iandtheseinstructionsarealwaysavailable.Theydonotrequirethe“A”(AtomicOperations)extension.

Thegeneralideawitha“fence”isthatalloperationsarepartitionedintotwosets:thepredecessorsetandthesuccessorset.

Thefencerequiresalloperationsinthepredecessorsettocompletebeforeanyoftheoperationsinthesuccessorsetcanproceed.InthecaseofRISC-V,thepredecessoroperationsaretheinstructionsthatmustcompletebeforethefence,andthesuccessoroperationsarethosethatmaynotbeginuntilafterthefence.



Asananalogy,considertheTourdeFrance,whereallthecyclistsmustcompleteeachstageoftherace,beforeanyridercanbeginthenextstage.Onlyaftertheslowestrider2inishesthepreviousstageoftheraceareanyridersallowedtostartthenextstage.Thestagesareeffectivelyseparateby“fences”.

TheFENCEinstructioninRISC-Vmayonlyaffectinstructionsthatreadorwritetothememorysystem.Itisusedtocontrolconcurrentaccessestosharedmemory.

Recallthatthephysicalmemoryaddressspacecanbepopulatedwith:

•Normal,physicalmemory(containingstoreddatabytes) •Memory-mappedI/Odevices

Ahardwarethread(HART)caneitherreadorwritevaluesto/fromthememoryspace.Thisyieldsthesecombinations:

R Read–readdatafromphysicalmemory W Output–writedatatophysicalmemory I Input–readdatafromamemory-mappedI/Odevice O Output–writedatatoamemory-mappedI/Odevice

Ofcoursemanyoperations(suchasamovementofdatafromoneregistertoanother)donottouchmemoryatall.Suchoperationsdonotfallintoanyofthesefourclassi2icationsandareignoredbytheFENCEinstruction.

ImaginethatsomeHARTexecutestwomemorywrite(“W”)operationsinsuccession.Perhapsthe2irstwriteoperationissettingalock,whichisintendedtoprotectsomepieceofsharedata.Onlyafterthelockhasbeenset,isitpermissibletoupdatetheshareddata.(Thisistheideabehindprotectingshareddatawithlocks;theshareddatashouldonlybeaccessedwhenthelockisheldbythethread.)

OfcoursetheHARTinquestionwillexecutethewritetothelockbeforethewritetothedata,butbecauseoftherelaxedmemorymodelassumedbytheRISC-Vspec,itispossiblethatotherHARTswillseethedatawriteappearingtocomebeforethelockwrite.OtherHARTscouldthereforeseetheupdateddatabeforeseeingthelockgettingset,therebyviolatingthefunctionalityandintegrityofthelockprotocol.TheFENCEinstructionprovidesawaytopreventthisdisasterandallowsprogrammerstoimplementproperconcurrencycontrolandcreatecorrect,reliablecooperatingprocesses.



TheFENCEinstructioncanbeusedtomakesurethatallotherHARTswillseethewriteoccurringbeforetheread.Inthisexample,wewilluseaFENCEinstructiontomakesurethatallwritestomemorythatwereexecutedbeforetheFENCEinstructionwillappeartoexecutebeforeallwritetomemorythatwillbeexecutedaftertheFENCEinstruction.

ForthepurposesoftheFENCEinstruction,weuse“P”foroperationsoccurringpriortotheFENCE(predecessoroperations)and“S”foroperationsoccurringaftertheFENCEinstruction(successoroperations).Thus,wehave:

Predecessoroperations: PR DatareadsoccurringbeforetheFENCE PW DatawritesoccurringbeforetheFENCE PI Memory-mappedreads(inputs)occurringbeforetheFENCE PO Memory-mappedwrites(output)occurringbeforetheFENCE

Successoroperations: SR DatareadsoccurringaftertheFENCE SW DatawritesoccurringaftertheFENCE SI Memory-mappedreads(inputs)occurringaftertheFENCE SO Memory-mappedwrites(output)occurringaftertheFENCE

Inourexample,wewanttoforceallwritestodatamemorythatwereissuedbeforetheFENCEtocompletebeforeallwritetodatamemorythatwillbeissuedaftertheFENCEinstruction.Thus,wecaninsertthefollowingFENCEinstructionbetweenthewritetothelockandthereadfromtheshareddata:

...writetolock…FENCE PW,SW # Complete past writes before future writes

…writetoshareddata…

FENCE

GeneralForm:FENCE PI,PO,PR,PW,SI,SO,SR,SW

Example:Thisinstructioncontains8singlebit2ields.Eachcanbeenabled(setto1)ordisabled(clearedto0).The2ieldsarecalledPI,PO,PR,PW,SI,SO,SR,SW.Thespecdoesnotindicatehowthisinstructionistobecodedinassembly.Perhapsthebitscanbesetbybeingmentionedandclearedotherwise.Forexample:



FENCE PR,PW,SR,SWDescription:

Seetextabove.Availability:

Thisinstructiondoesnotrequirethe“A”extension(AtomicInstructions).Itisalwaysavailable.

Encoding: ThisisavariationofanI-typeinstruction,wheretheRegDandReg22ieldsare

notusedandsettozeroandonly8oftheImmed-12bitsareused.

ImplementationofFENCE:OnesimpleimplementationofFENCEisforthehardwaretosimplyignorethedistinctionbetweenR,W,I,andOandsimplycompletealloperationsoccurringbeforetheFENCEbeforestartinganyoperationsaftertheFENCE.Inthemostconservativebutsimplestimplementation,thehardwaremightjust2lushallcaches,forcingasortofhard,system-widefenceoperation.Hopefullysomeimplementationswillprovidethe2iner-grainedcontrolpermittedbytheinstruction.

TheFLUSHinstructionisthemaintoolfor2lushingthecache,butisintendedtooperateononlydatacaches.However,instructionsmaybecachedinaseparateinstructioncacheandthereisaneedtoupdatethiscachefromtimetotime.ThisisthepurposeoftheFENCE.Iinstruction.

Wheneverbytesarewrittentomemoryandthesebytesareintendedtobesubsequentlyexecutedasinstructions,anupdatetotheinstructioncachemayberequired.Forexample,awrite(e.g.,theSTstoreinstruction)tomemorylocationXmaybeexecuted.SubsequentlythecorewillexecutetheinstructionlocatedataddressX;ofcoursethenewvalueshouldbefetched.However,iftheinstructioncachealreadycontainsthepreviouscontentsforlocationX,theremaybeaproblem.Thecorecannotsimplyusethecachedvalue.

TheeffectoftheFENCE.Iinstructionisequivalenttoinvalidatingtheentireinstructioncache,therebyforcingallfuturefetchestogotomemory.ThisguaranteesthatthenewestvalueforlocationXwillbefetched.

FENCE.I–(InstructionCacheFlushing)

GeneralForm:FENCE.I



Example:FENCE.I # Flush the instruction cache

Description:InvalidatetheentireinstructioncachelocaltothecurrentHART,oratleastdosomethingoperationallyequivalent.Seetext.

Availability:Thisinstructiondoesnotrequirethe“A”extension(AtomicInstructions).Itisalwaysavailable.

Encoding: ThisisavariationofanI-typeinstruction,wheretheRegD,Reg2,and

Immed-122ieldsarenotusedandsettozero.

Note:ThisinstructiononlyaffectsthecurrentHART.Inamulti-coresystem,itisprobablethatinstructionswillbecachedininstructioncachesthatareprivatetoeachcore.IfcoreAwritesdatatomemorylocationXandexpectstoexecutethatsamedata(e.g.,jumptoX),thenitshouldexecuteaFENCE.Iinstructiontomakesurethatitscachedoesn’tcontainanobsoletevalueforlocationX.

However,thiswillnothelpothercores,whichmayhavecachedolderdatafromlocationXintheirprivateinstructioncaches.Inorderto2lushtheinstructioncachesofothercores,softwareoncoreAwillhavetosomehowsignalsoftwarerunningontheothercores,whichwilltheneachneedtoexecuteFENCE.Iinstructions,to2lushtheirownprivatecaches.

ImplementationNotes:OneimplementationofFENCE.Iissimplyto2lushtheentireinstructioncache,i.e.,tosetallcachelinesto“invalid”.

Thismayseemlikeacoarse-grainedapproach,butitisassumedthatsoftwaretypicallyupdatesalargeblockofinstructionsinmemoryand,onlyafterwritingallthedata,willtherebeajumptothenewlycreatedcode.Forexample,consideraJust-In-Time(JIT)compiler,whichonlycompilesafunction/methodatthetimeitis2irstcalled/invoked.Thereisacleartransitionfromcompile-phasetoexecution-phase.Thus,onlyasingleFENCE.Iinstructionwouldbeneeded,andthepreviouscontentsoftheinstructioncachearelikelytobeunneededinthenearfutureanyway,soinvalidatingthemisacceptable.

Anotherapproachassumesthatthedataandinstructioncachesarekeptconsistent.Forexample,theinstructioncachemightbe“snooping”writestothe



datacache.Everywritetothedatacachewouldhavetheside-effectofupdatingtheinstructioncachewhennecessary.Whenevertheinstructioncachehappenedtocontaindatafromthesameaddressbeingwrittento,thatcachelineintheinstructioncachewouldbeupdatedoratleastinvalidated.Ifthecachesarekeptcoherent,thentheFENCE.Iinstructioncanbeimplementedbysimply2lushingthepipeline,whichmaypotentiallycontainoutdatedinstructiondata.

Load-Reserved/Store-ConditionalSemantics

TheRISC-Vdesignsupportsanapproachtoconcurrencycontrolcalledload-reserved/store-conditional(LR/SC).Thisisalsocalled“load-link/store-conditional(LL/SC).

We’llbeginbydescribingtheLRandSCinstructions,whichareonlyavailableinthe“A”(AtomicOperations)extensiontotheRISC-Vspec.

Theideaisthatanatomicoperationsuchastest-and-set,swap,orcompare-and-setisbrokenintotwoseparateinstructions.The2irstinstruction(“load-reserved”)performsthereadphaseandthesecondinstruction(“store-conditional”)performsthewritephase.

TheLRinstructionisverysimilartoatypicalLOADinstruction.Itreadsavaluefrommemoryandstoresitintoaregister.Butinaddition,theinstructionalso“reserves”thatmemorylocation.Youcanimaginethatthisreservationconsistsofmakinganoteofwhichcore(or“HART”inRISC-Vterminology)touchedwhichmemorylocationwithanLRinstruction.Thisreservationnoteissomethingthatismanagedbythememorysynchronization/controlhardwareandnotbythecoresattemptingtoaccessmemory.

Atsomelatertime,anSCinstructionwillbeexecuted.Itwillstoreavaluefromaregisterintomemory,justlikeatypicalSTOREinstruction.However,theinstructionmay“succeed”or“fail”.TheSCinstructionhasthreeoperands:aregistercontainingthevaluetobestoredinmemory,amemoryaddress,andaregisterintowhichthereturncodewillbestored.

Therearetworeturncodes:“0”indicatessuccessand“non-zero”indicatesfailure.Failuremightbecausedbymisalignedaddresses,accessviolations,etc.,butthese



arebesidethepoint.Theprimaryuseofthesuccess/failurehastodowithconcurrencycontrol.

TheproperwaytousetheseinstructionistoexecuteanLRinstructiononsomememorylocationandthen,shortlythereafter,toexecuteanSCinstructiononthatsamememorylocation.Ifnoothercore/HARThasstoredintothatmemorylocationsincetheLRinstruction,thentheSCwillsucceedandthevaluewillbestoredintomemory.Butifsomeothercore/HARThasstoredanythingintothatmemorylocation,thentheSCwillfailandnothingwillbestoredintomemory.Presumably,thecodewillthenloopandretrytheLR/SCsequenceuntilitsucceeds.

AreservationismadebyaLRinstructionandremainsvalidforasmall,2initeamountoftime(about16instructions)speci2iedbytheRISC-Vspec.TheSCinstructionmustbeexecutedwithinthiswindowofopportunity.Ifsomeothercore/HARTsneaksinanexecutesanLRonthesamememorylocation,thenthereservationiscancelledandtheSCwillfail.

LoadReserved(Word)

GeneralForm:LR.W RegD,(Reg1)

Example:LR.W x4,(x9) # x4 = Mem[x9] and reserve

Description:A32-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisinReg1.

Comment:Thisinstructionplacesa“reservation”onablockofmemorycontainingthisword.Seecommentsinthetextregardingsemantics.

Theaddressbeproperlyaligned.Thereservationmayincludemorethanjustthebytesaddressed,butwillatleastincludethesebytes.

RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,thevalueissign-extendedtothefulllengthoftheregister.

Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).

Encoding: ThisisavariationofanR-typeinstruction,whereReg2is00000.



LoadReserved(Doubleword)

GeneralForm:LR.D RegD,(Reg1)

Example:LR.D x4,(x9) # x4 = Mem[x9] and reserve

Description:A64-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisinReg1.

Comment:SeeLR.W.

RV64/RV128:OnlyavailableonRV64andRV128.ForRV128,thevalueissign-extendedtothefulllengthoftheregister.


Encoding: ThisisavariationofanR-typeinstruction,whereReg2is00000.

StoreConditional(Word)

GeneralForm:SC.W RegD,Reg2,(Reg1)

Example:SC.W x5,x4,(x9) # Mem[x9]=x4; x5=success code

Description:A32-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisinReg1.Asuccess/failcodeisplacedinRegDwhere0=successandnon-zero=failure.Onfailure,memoryisnotchanged.

Comment:Thisinstructionassumesa“reservation”haspreviouslybeenmadebyanLRinstructiononablockofmemorycontainingthisword.Seecommentsinthetextfordetails.

Theaddressbeproperlyaligned.Thereservationmayincludemorethanjustthebytesaddressed,butwillatleastincludethesebytes.



Thevalueof“1”willusuallybeusedtoindicatefailure,butthereisapossibilityfordifferentcodestobeusedtoindicatedifferentreasonsforfailure.

RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,theupperbitsoftheregisterareignored.



StoreConditional(Doubleword)

GeneralForm:SC.D RegD,Reg2,(Reg1)

Example:SC.D x5,x4,(x9) # Mem[x9]=x4; x5=success code

Description:A64-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisinReg1.Asuccess/failcodeisplacedinRegDwhere0=successandnon-zero=failure.Onfailure,memoryisnotchanged.

Comment:SeeSC.W

RV64/RV128:OnlyavailableonRV64andRV128.ForRV128,theupperbitsoftheregisterareignored.



AtomicMemoryControlBits:“aq”and“rl”

The“A”extensionprovidesthefollowingatomicmemoryoperation(AMO)instructions:



LR LoadReserved SC StoreConditional AMOSWAP Atomicallyswaptwovalues AMOADD Atomicallyaddtwovalues AMOAND AtomicallyANDtwovalues AMOOR AtomicallyORtwovalues AMOXOR AtomicallyXORtwovalues AMOMAX AtomicallyMAXtwovalues AMOMIN AtomicallyMINtwovalues

Eachinstructioncontainstwoadditionalbitstocontroltheiroperationabitfurtherthansuggestedpreviously.Thesebitsarecalled“aq”and“rl”(foracquireandrelease).

Thebitscanbesetusingthefollowingassemblernotation:

lr.w.aq x5,(x6) # set the “aq” bit to 1lr.w.rl x5,(x6) # set the “rl” bit to 1lr.w.aq.rl x5,(x6) # set both bits

Sometimesinstructionsonasinglethread/HARTwillbereorderedbythecoreexecutionunittoachievegreaterperformance.Forexample,imaginetwoinstructionsappearinginsequence:2irstastoreinstructionfollowedbyaloadinstruction.Iftheaddressesintheseinstructionsaredifferent,thentheexecutionunitcanreasonablyperformbothoperationsinanyorder.Perhaps,thecorewillissuebothinstructionssimultaneously;thememoryunitmaycompletetheoperationoftheinstructionsinanyorder,perhapsduetothechancecontentsofthecache.Sincethetwoinstructionsaretouchingdifferentmemorylocations,theycanbesafelyreordered,right?

Usuallythereorderingissafe,butinthepresenceofotherconcurrentthreads/HARTs,therecanbeissues.Forexample,imaginethatoneofthetwomemorylocationsrepresentsalockthatisintendedtocontrolaccesstotheotherlocation.Now,itbecomescriticalthattheoperationsareexecutedinthecorrectorder.

(Asanexampleoftheproblem,imaginethatthestoreoperationisbeingusedtosetalock.Onlyafterthelockisset,isitallowabletoloadvaluesfromtheshareddata.Theshareddataisonlyguaranteedtobeinaconsistent,usablestatebyotherprocesseswhenthelockisfree;wheneversomeprocessholdsthelock,thestateofthedatamaybeinaninconsistent,un2inishedstate,andmustnotbeaccessedby



otherprocesses.Butnowimaginethattheloadoperationisperformed2irst,beforethelockissetwiththestoreinstruction.ThismeansthattheHARThasaccessedsharedmemorywithout2irstacquiringthelock.Thismightallowittoseeaversionorstateofthedatathatitshouldnotsee;aviolationoftheconventiontoacquirelocksbeforeaccessingtheshareddata.)

The“aq”and“rl”bitscontroltheorderingofinstructionsonasingleHART.TheydonothaveanyeffectontheschedulingofinstructionsexecutedondifferentHARTs.Inotherwords,thebitsaretherejusttomakesurethatreorderingofinstructionsonthecurrentHART(whichtheexecutionunitmightnormallydotoincreaseperformance)aretobesuppressedsothatconcurrencycontrolcodewillfunctionproperly.

The“aq”bithasthefollowingmeaning.Whenaninstructionwiththe“aq”(acquire)bitsetisexecuted,itmeansthatanyinstructionsthatfollowthe“aq”instructionmustbedelayedandcannotbeexecutedbeforethe“aq”instruction.The“aq”instructionwillbeexecuted2irst,beforetheinstructionsthatfollowitinthesequential2lowofinstructionsonthisHART.

The“rl”bithasthefollowingmeaning.Whenaninstructionwiththe“rl”(release)bitsetisexecuted,itmeansthatanyinstructionthatprecedesthe“rl”instructionmustbedoneandexecutedtocompletionbeforethe“rl”instruction.The“rl”instructionwillbeexecutedlast,aftertheinstructionsthatprecedeitinthesequential2lowofinstructionsonthisHART.

WhilewehavejustdescribedthebitsasapplyingtoonlyoneHART,thiswasasimpli2ication.Therearereallymultipleviewsoftheorderinwhichinstructionsareexecuted.First,asmentioned,thereistheorderthattheHARTitselfexecutesthetwoinstructions.Butsecond,thereistheorderinwhichotherHARTsobservetheexecution.Perhapsduetocachingeffects,thesecondordermaybedifferent.

Forexample,imaginethattherearetwostoreinstructionstobeexecutedbyoneHART.The2irstinstructionisusedtosetalockandthesecondinstructionisanupdatetotheshareddata.Presumably,anyupdateshouldonlybedoneprivatelybyaHARTthatisalreadyholdingthelock.TheconcernistheorderinwhichtheotherHARTsobservethesetwomemorystoreoperations:otherHARTsmustobservethestoretothelocktohappen2irst,thussignalingthatthelockisnolongerfree.OtherHARTsmustnotbeallowedtoseeaviewinwhichthesecondstoreappearedtooccurbeforethelockwasset,otherwisethisopensthedoortoallowingsomeHARTtoaccessdatathatshouldnotbeaccessible.



The“aq”and“rl”bitsimposeorderingconstraintsontheviewsobservedbyallHARTsoftheinstructionstreamoftheHARTusingthem.OtherHARTswillseeonlythelegalinstructionsequences.

Whenboth“aq”and“rl”bitsaresetinsomeinstruction,somethingcalled“sequentialconsistency”isimposed.AllotherHARTswillseethesameorderingasoccurredintheHARTcontainingtheinstruction:AllinstructionsthatwereexecutedbeforetheinstructioninquestionwillappeartoallotherHARTsashappeningbeforetheinstruction.AllinstructionsthatwereexecutedaftertheinstructionwillappeartoallotherHARTsashappeningaftertheinstruction.

RISC-Vdividesaccessestothememorysystemintotwocategories:

•Accessestophysicalmemory •Accessestomemory-mappedI/O

EachAMOinstruction(i.e.,LR,SC,AMOSWAP,AMOADD,AMOAND,AMOOR,AMOXOR,AMOMAX,AMOMIN)willaccessexactlyoneofthesetwodomains:eithermemoryorI/O.

The“aq”and“rl”bitsapplyonlytothedomainthatisbeingaccessed.Thebitsimposeorderingconstrainsonallaccessestothedomaininquestion,butimposenoconstraintsontheotherdomain.

Commentary:I’mguessingthemotivationisthateachdomain(memoryorI/O)willhaveaseparateapproachtoimplementingthesynchronizationconstraintsimposedbythe“aq”and“rl”bits.Theuseofthebitswilltriggereitheronemechanismortheother.

Orperhapstheimplementationsineachdomainwillimposewidelydifferentperformancehits.Certainlythe“aq”and“rl”bitswillberequiredtoworkproperlyinthememorydomainandpresumablytheimplementationwillendeavortoprovidehighperformance,sincelocksinmemoryarecommon.ButmaybetheirusewithintheI/Odomainwillberareandasuper-slowimplementationisacceptable,aslongasitcanbeisolatedfromthememorysystem.

Orperhapsitisenvisionedthat,insomeimplementationsofthestandard,thememorysystemwillsupportthe“aq”and“rl”semantics,butwillnotsupportthespeci2iedbehaviorforaccessestotheI/Osystem.



???

UsingLRandSCInstructions

Takealookatthefollowingcodesequence.Weassumethatthereisalock,whoseaddressisinregisterx10.Whenthelockholds“0”thelockisfree;thevalueof“1”isusedtoindicatethelockisset.

Inline2weloadthecurrentvalueofthelockandplaceareservationonthatlocation.Inline3,wechecktoseeifthelockwasalreadyset.Ifso,wejumptosomeothercode.Thiscodemightsimplyjumpbackto“lock”totryagainimmediately;thisiscalleda“spin”lockbecausethecodeloops,waitingforthevaluetobecomezero.Orperhapstheothercodewillwait,reschedulethreadsorwhatever,andcomebacktotryagainsometimelater.

Line4loadsthevalue“1”intoatemporaryregister.Line5istheSTORE-CONDITIONALinstruction,whichstoresthe“1”(indicating“locked”)intothelock.Thisinstructionwillsucceedorfailandthecodeisplacedintox5totellwhichhappened.

IfsomeotherHARTmanagedtostoreintothelock,thentheSCwillfailandwillnotbeupdatedbythisHART.Otherwise,ifallisgoodandthelockreservationisstillintact,thentheSCwillsucceedand“1”(representing“locked”)willbestoredintothelock.

ThelastlinelooksatthereturncodefromtheSCinstruction.IftheSCfailed,thiscodeloopsbacktothebeginningtotryagain.

1 lock:2 lr.w x5,(x10) # Grab the lock’s value3 bnz x5,fail # If already locked, fail4 li x7,1 # Store “1” into lock5 sc.w x5,x7,(x10) # to indicate “locked”6 bnz x5,lock # If failure, try again



Toreleasealock,allweneedtodoisstorea“0”intothelockvalue.Seethefollowingcode.Recallthatx0alwaysholds“0”.

7 unlock:8 sw x0,(x10) # Set to “unlocked”

Theinstructionsequencesaboveareokayinonerespect,but2lawedinanother.First,theywillatomicallyacquiredthelockinsuchawaythatonlyoneHARTatatimewillacquirethelock.However,itisassumedthatlocksareusedtoprotectcriticalsectionsofcode.Thecriticalsectioncodewillaccesssomeshareddata.Allaccessestotheshareddatamustoccurafteralockoperationandbeforeanunlockoperation,inorderforthelocktobemeaningful.

ButwithmultipleHARTs,itispossiblethat(duetoreorderingofmemoryoperationsanddifferentviewsoftheorder)otherHARTsmightexperiencetheseaccessesinthecriticalsectioncodehappeningeitherbeforethelockoperationoraftertheunlockoperation.Thiscouldsabotagethelock’sfunctionality.

Sowecanrewritethiscode,makinguseofthe“aq”and“rl”bits.Thecorrectedcodeisshownbelow.

The“rl”bitissetwithintheLRinstruction,whichmeansthatanymemoryoperationsthatcameearlierinthesequencewillbecompleted(inthesenseofbeingvisibletootherHARTs)beforethelocksequenceisbegun.Wedon’twantanythingwe’vedonethatshouldbevisibleoutsidethecriticalsectiontobeaccidentallyhidden.

The“aq”bitissetwithintheSCinstruction,whichmeansthatanythingthathappensafterthelocksequence(i.e.,anythinginthecriticalsectioncode)willbeseenbyotherHARTsasoccurringafterthelocksequence.Thismakessurethatnocriticalsectionstuffcanleakoutinfrontofthelocksequence.

1 lock:2 lr.w.rl x5,(x10) # Grab the lock’s value3 bnz x5,fail # If already locked, fail4 li x7,1 # Store “1” into lock5 sc.w.aq x5,x7,(x10) # to indicate “locked”6 bnz x5,lock # If failure, try again

Likewise,wedothesamewithintheunlocksequence,shownnext.



7 unlock:8 lr.w.rl x5,(x10) # Reserve the lock9 sc.w.aq x5,x0,(x10) # Set to “unlocked”10 bnz x5,unlock # If failure, try again

WeswitchfromusingaSTOREWORDinstructiontousingaLR/SCcombination,sothatwecanusethe“rl”and“aq”bits.WeuseanSCinstructiontostoretheunlockcode(“0”)intothelockandweuseanLRinstruction,becausetheSCinstructionwillfailifwedonotholdareservationonthememorylocation.The“rl”bitforceseverythingwehavedonebeforetheunlocksequencetoappeartootherHARTsasoccurringbeforethelockissetto“0”(unlocked).The“aq”bitontheSCinstructionforcesanythingwedonext,toappearasoccurringaftertheunlocksequence.

Wedon’tbothertochecktheoldvalueofthelock,sinceweassumethat(intheabsenceofbugs),wewouldnevertrytounlockalockthatwasnotpreviouslylocked.However,wemusttestforthefailureoftheSC;itisalwayspossiblethatsomeotherHARTtriedtoacquirethelock,touchedthememorylocationrepresentingthelock,andinvalidatedourreservation.

DeadlockandStarvation–Background

Deadlockoccurswhenevertwo(ormore)processesareeachholdingaresource(e.g.,theyhaveeachacquiredalock)andeachiswaitingforaresourceheldbyanotherprocess(e.g.,tryingtoacquireanotherlock).Onceadeadlocksituationoccurs,theprocesseswillbefrozen,pendingsomeoutsideintervention.DeadlockisaconcernaddressedbytheOSandthereareanumberofapproachestoavoidingordealingwithit.

Thereisanothersimilarproblemcalled“starvation”.(Theterm“livelockissometimesusedtomeanstarvation.)Withstarvation,theprocessesarenotnecessarilyfrozenbut,duetothevagariesofchance,someprocessesnevermakeforwardprogress.

Deadlockrequiresatleasttworesources(suchaslocks)thatarebeingfoughtover.Ontheotherhand,“starvation”canoccurwithonlyoneresource.



Tounderstandstarvation,imagineascenarioinwhichtwoHARTs/processes(calledAandB)competeforasinglelock.ThecoderunningonHARTA2indsthatthelockiscurrentlyheldbyBandrespondsbywaitingandthenretryingtoacquirethelock.Weassumethatnothread/processholdsalockinde2initely(i.e.,weassumethesearecooperatingprocesseswithoutanyprogrambugs),soBmusteventuallyreleasethelock.Butwhatif,duetothepoorluckofA,HARTBhappenstore-acquirethelockagainbeforeAhasachancetocheckit.WhenAchecksasecondtime,Aonceagain2indsthatthelockisstillunavailable.Imaginethatthisrepeatsinde2initely:Bisrepeatedlyreleasingthelock,butAisunabletomakeanyforwardprocess,becausebythetimeAchecks,Bhasalreadyre-acquiredthelock.Biswell-behaved,neverholdingthelockinde2initely,yetAendsupbeingfrozenandunabletomakeprogress.Thisisstarvation.

ItseemslikestarvationoughttoeventuallyresolveitselfwhenthebadluckofA2inallyendsandAgetsachancetoacquirethelock,butthislineofreasoningisdangerous.Funnythingscanhappenwhenprocesses/threads/HARTsarenotbehavingpurelystochastically.Thepossibilityofstarvationmustbeaddressed.

StarvationandLR/SCSequences

Considerthecodesequencetoacquirealockshownearlier.ThatsequenceissuesaLRinstructionandthen,acoupleofinstructionslater,itissuesanSCinstruction.TheguaranteemadebytheRISC-Vspecisthat,iftheSCissuccessfulthennootherHARTshavewrittentothelocation(evenifthevaluewrittenhappenstobethesamevalue).IfanotherHARTmanagedtoexecuteitsSC2irst,thentheSConthisHARTwillfail.TheSCmayfailforotherreasonsaswell.ButiftheSCsucceeds,wecanbesurethatthelocationinquestionhasnotbeenmodi2iedsincetheLRretrievedtheavalue.

However,theRISC-Vspecmakesanadditional,muchstrongerguarantee,anditisthis:IfyoukeepretryinganLR/SCsequence,itwilleventuallysucceed.Starvation(i.e.,livelock)isguaranteednottohappen.TheSCmayfailafewtimes,butitwilleventuallysucceed.Theideaisthata“denialofserviceattack”byaheavy2lowofrequestsfromotherHARTswillneverpreventanyHARTfrommakingprogress.

TherearesomeconstraintsplacedontheLR/SCsequenceinorderforthisguaranteetobemade,andthe“lock”codesequenceshownabovemeetsthem.TherecanbecodebetweentheLRandSC(includingarepeatlooptokeeptestinguntilit



succeeds).Thebasicrestrictionisthatthesequenceofinstructionscannotbetoolong,namely16instructions.Alsoothermemoryoperationsareforbidden.Alsoexception/trapprocessingisforbidden.Alsoinstructionsthatmighttakealongtimetocomplete(suchasintegermultiplyor2loatingpointoperations)areforbidden.

Commentary:Guaranteeingthatstarvationwillnotoccurisnotnecessarilystraightforward.Oneapproachinvolvesmaintainingaqueuethatsomehowre2lectshowlongaprocesshasbeenwaiting,andgivingprioritytotheprocessthathasbeenwaitinglongest.Anotherapproachinvolvespassinga“baton”aroundinsequencefromprocesstoprocess.Eachprocessgetsthebatoninturnand,withit,achancetomoveforward.Butafteralimitedamountoftime,itmustpassthebatonontothenextprocess.

TheAtomicMemoryOperation(AMO)Instructions

TheAMOSWAP(AtomicSwap)instructioncanbeusedtosimultaneouslyreadavaluefrommemoryandstoreanewvalueintothatsamelocation.Theinstructioncandothisatomically,whichmeansthatnointerveninginstructionthattriestostoreintothislocationcansneakinandexecute.

Typically,anatomicswapinstructionisusedtosetalock.Theinstructionsetsthelocktothe“locked”valuewhileatthesametimereadingtheoldvaluetomakesurethelockwaspreviously“unlocked.”Theinstructionmustbeatomictoensurethattwoconcurrentprocessesdon’tsimultaneouslystorethe“locked”valueintoan“unlocked”lockwithoutrealizingit.

AtomicSwap

GeneralForm:AMOSWAP.W.aq.rl RegD,Reg2,(Reg1) # 32-bitsAMOSWAP.D.aq.rl RegD,Reg2,(Reg1) # 64-bits

Examples:AMOSWAP.W x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4AMOSWAP.W.aq x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4AMOSWAP.D.rl x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4

Description:



AvalueisreadfromthememorylocationwhoseaddressisinReg1andthevalueisplacedintoRegD.ThenthevaluefromReg2iswrittentothememorylocation.InthecasethatReg2andRegDarethesameregister,thevaluesareswapped,i.e.,thevaluestoredinmemoryisthepreviousvalueofReg2.

Thisinstructioncontainsan“aq”bitandan“rl”bit,whichcontroltheorderingofinstructionsappearingbeforeandafterthisinstruction.

Thememoryaddressmustbeproperlyalignedorelsean“AMOAddressmisaligned”exceptionwillberaised.

RV64/RV128:AMOSWAP.DisonlyavailableonRV64andRV128.



CodeExample:TheAMOSWAPinstructioncanbeusedtoimplementthe“lock”and“unlock”operationsonalock,asweshowhere.Thelockwillberepresentedas

0=unlocked,i.e.,free 1=locked

Weassumeregisterx10containstheaddressofthelockandx5isusedasatemporary.Hereiscodetosetthelock:

li x5,1 # x5 = 1retry:

amoswap.w.aq x5,x5,(x10) # x5 = oldlock & lock = 1bnez x5,retry # Retry if previously set

Afterthecriticalsection,whichpresumablyaccessesandmodi2iessomeshareddata,wehavethefollowingcodetorelease(i.e.,free)thelock:

amoswap.w.rl x0,x0,(x10) # Release lock by storing 0

TheinitialAMOSWAPhasthe“aq”bitset,whichmeansthatanyinstructionsthatfollowtheAMOSWAPcannotbeobservedbyotherHARTstoexecutebeforetheAMOSWAP.Thispreventscodefromthecriticalsectionfrom“leaking”outbeforethelockisset.



The2inalAMOSWAPhasthe“rl”bitset,whichmeansthatanyinstructionsthatprecedetheAMOSWAPcannotbeobservedbyotherHARTstoexecuteaftertheAMOSWAP.Thispreventscodefromthecriticalsectionfrom“leaking”outafterthelockisreleased.

AtomicAdd/AND/OR/XOR/Max/Min(Word)

GeneralForm:AMOADD.W.aq.rl RegD,Reg2,(Reg1) # additionAMOAND.W.aq.rl RegD,Reg2,(Reg1) # logical ANDAMOOR.W.aq.rl RegD,Reg2,(Reg1) # logical ORAMOXOR.W.aq.rl RegD,Reg2,(Reg1) # logical XORAMOMAX.W.aq.rl RegD,Reg2,(Reg1) # signed maximumAMOMAXU.W.aq.rl RegD,Reg2,(Reg1) # unsigned maximumAMOMIN.W.aq.rl RegD,Reg2,(Reg1) # signed minimumAMOMINU.W.aq.rl RegD,Reg2,(Reg1) # unsigned minimum

Examples:AMOADD.W x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5AMOADD.W.aq x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5

Description:AvalueisreadfromthememorylocationwhoseaddressisinReg1andthevalueisplacedintoRegD.AbinaryoperationisthenperformedonthevaluefetchedfrommemoryandthevalueinReg2andtheresultiswrittenbacktothememorylocation.InthecasethatReg2andRegDarethesameregister,theoperationisperformedusingtheinitialvalueoftheregister;thevaluestoredintheregisterwillbethevaluefetchedfrommemory.



RV64/RV128:TheresultstoredintoRegDwillbesign-extendedtothefullregistersize,i.e.,to64or128bits.


Encoding:



TheseareR-typeinstructions.

AtomicAdd/AND/OR/XOR/Max/Min(Doubleword)

GeneralForm:AMOADD.D.aq.rl RegD,Reg2,(Reg1) # additionAMOAND.D.aq.rl RegD,Reg2,(Reg1) # logical ANDAMOOR.D.aq.rl RegD,Reg2,(Reg1) # logical ORAMOXOR.D.aq.rl RegD,Reg2,(Reg1) # logical XORAMOMAX.D.aq.rl RegD,Reg2,(Reg1) # signed maximumAMOMAXU.D.aq.rl RegD,Reg2,(Reg1) # unsigned maximumAMOMIN.D.aq.rl RegD,Reg2,(Reg1) # signed minimumAMOMINU.D.aq.rl RegD,Reg2,(Reg1) # unsigned minimum

Examples:AMOADD.D x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5AMOADD.D.aq x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5

Description:AvalueisreadfromthememorylocationwhoseaddressisinReg1andthevalueisplacedintoRegD.AbinaryoperationisthenperformedonthevaluefetchedfrommemoryandthevalueinReg2andtheresultiswrittenbacktothememorylocation.InthecasethatReg2andRegDarethesameregister,theoperationisperformedusingtheinitialvalueoftheregister;thevaluestoredintheregisterwillbethevaluefetchedfrommemory.



RV64/RV128:TheseinstructionsareonlyavailableonRV64andRV128.


Encoding: TheseareR-typeinstructions.

Implementation:EachAMOoperationrequiresbothareadofmemoryandawritetomemory,aswellasasimplebinaryoperation.Itmaybethattheread-



operate-writefunctionalitywillbeimplementedcompletelywithinthememorysubsystem,insteadofwithinthecore.UponencounteringanAMOinstruction,thecorewouldsendtheinitialvalueofReg2tothememorysystem.Thememorysubsystemwouldthencompletetheoperation,performingtheatomicread-operate-writecycleandthensendingtheresultvaluebacktothecore,tobestoredinRegD.Byof2loadingtheAMOfunctionalitytothememorysubsystem,theatomicityisguaranteedbythememorysystemalone.

Implementation:AnotherapproachtoimplementingtheAtomicMemoryOperations(AMO)wouldmakeuseofthemoreprimitiveLRandSCoperations.Themicroarchitecturemight,forexample,implementtheAMOinstructionsintermsofthesimplerfunctionality,whichthemicroarchitecturealsousestoimplementtheLRandSCinstructions.

SequentialConsistency:Aseriesofreadsandwritestomemoryissaidtobe“sequentiallyconsistent”ifthefollowingstatementshold.(1)AllofthereadandwriteoperationsissuedbyalltheHARTscanbelinearizedintoasingle,undisputedorder,andthesameorderisobservedbyallHARTs.(2)TheoperationsofanyoneHARTmustofcourseappearinthisorderingintheordertheHARTactuallycalledforthem,althoughtheoperationsfromanytwoHARTscanbeinterleavedarbitrarily.(3)Oratleasttheresultsofthecomputationareindistinguishablefromsuchaglobal,linearorderingofalloperations.

Toforceallreadsandwritestoconformtosequentialconsistency,theprogrammercanusetheatomicoperationsasfollows.Thiswilleffectivelylinearizeallreadsandwritestomemory,sothateveryHARTwillagreeontheorderthateverythingisperformed.

Forreadssuchas: LD.Wx4,(x7)

Substitute: LR.W.aq.rl x4,(x7)

Forwritessuchas: ST.W(x7),x4

Substitute: AMOSWAP.W.aq.rl x0,x4,(x7)



Settingthe“aq”and“rl”bitsforcesallotherHARTstoseeinstructionsthatoccurbeforetheinstructiontoappeartoexecutetocompletionbeforetheinstruction.Also,allinstructionsthatoccuraftertheinstructionwillappeartootherHARTstoexecuteaftertheinstruction.

Implementation:NotethatanAMOSWAPoperationthatdiscardsthepreviousvaluefrommemory(i.e.,RegD=x0),canavoidthereadphaseoftheinstruction.Thisisstillausefulatomicinstruction,distinctfromastore(ST)instruction,duetothepresenceofthe“aq”and“rl”bits.Perhapssuchaninstructioncanbeoptimizedbythemicroarchitecturetoavoidthereadphase.


Chapter8:PrivilegeModes

PrivilegeModes

Atanytime,theRISC-Vprocessorisoperatinginexactlyoneofthefollowing“modes”:

U-Mode(UserMode) !Lowestprivilege S-Mode(SupervisorMode) M-Mode(MachineMode) !Highestprivilege

Commentary:Previousversionsincludedafourthprivilegelevel,inwhichahypervisorwastoexecute.HypervisorModewasprettymuchidenticaltoSupervisorModeandwas2laggedas“tobespeci2iedinthefuture.”

U-Mode(UserMode) !Lowestprivilege S-Mode(SupervisorMode) H-Mode(HypervisorMode) M-Mode(MachineMode) !Highestprivilege

Typically,user-levelapplicationswillexecuteinUserMode.WhenevertheapplicationwantsanOSservice,itwillmakeasystemcall.TheOScodethathandlesthiscallwillexecuteinSupervisorMode,i.e.,atahigherprivilegelevel.Uponreturntotheapplication,themodewillbeloweredbacktoUserMode.

Mostinstructionscanbeexecutedinanymode,butsomeinstructionsareprivilegedandcanonlybeexecutedinahighermode.

Theprivilegelevelsarestrictlyordered.Anythingthatcanbedoneatalowerprivilegelevelcanbedoneinanyoneofthehigherlevels.Forexample,anythingthatcanbedoneinUserModecanbedoneinSupervisorMode.



MachineModeisthehighestprivilegelevel;allinstructionsarelegalatthislevelandnothingisprotected.Whentheprocessorbeginsoperationafterpoweringuporafterareset,itisplacedinMachineMode.

Theremainingmodes(UserandSupervisormodes)areprettymuchthesameaseachother.Inotherwords,thereislittletodistinguishthem,excepttheirrelationshiptoeachother.IfyouunderstandUserMode,thenyouwillalsounderstandmostofSupervisorMode.

Thereisoneexceptiontotheuniformityandthisregardsvirtualmemoryandpagetables.ThisfunctionalityislocatedinSupervisorMode.

TheRISC-Vspecdescribestheprivilegemechanisminsomewhatgeneraltermsanditiseasytoimaginetheinsertionofadditionallevelsofprivilege,althoughitshouldbenotedthatafourthmode(HypervisorMode),whichexistedinpreviousversions,hasbeeneliminated.

Itappearsthatthecurrentmodeisnotavailableinanyuser-visibleregister.Inotherwords,itisdif2icultforcodetodeterminethecurrentmode.Itcannotsimplybereadfromaregister;insteadthecurrentmodeisimplicitinthefunctionalitythatisavailable.

[Thislackseemstobeintentional.Afterall,codeistypicallywrittentoruninonemodeoranotherandrarelyneedstoaskwhichmodeitisrunningin.Forexample,ahypervisormaywishtorunahostedOSisarestrictedlower-privilegemodethantheOSismeanttorunat,therebymaintainingfullcontrolofthemachineandpreventingthehostedOSfromcorruptingthehypervisoritselforotherhostedOSes.ThehostedOSexpectstoruninahigherprivilegemodethanitactuallyisrunningat;thehypervisormustcreateandpreservetheillusionthattheOSisrunninginahigherprivilegemodethanitactuallyis.]

TheRISC-Vspeci2icationdoesnotrequireallmodestobeimplemented.Someprocessorsmaynothaveallmodes.Herearethelegalpossibilities:

Option1 Option2 Option3 U U S M M M



ThemostbasicprocessorwillhaveonlyMachineMode.Inasense,thereisnoprivilegesysteminsuchasimpleprocessorsincealloperationscanbeperformedatanytime.Animplementationlikethiswouldbesuitableforanembeddedmicrocontroller.

�

Withoption2,therearetwoprivilegelevels:UserandMachine.ThismightbeappropriateforasimpleOSwithsomeprotectedsecuritymonitorrunninginMachineModeandoneormoreapplicationsrunninginUserMode.

�

Option3isintendedformoreelaboratesystemsthatwillberunningatraditionalOSlikeUnix/Linux.

�



Insomeimplementationsseveralinstructionsand/orISAfeatureswillbemissingandtherewillbeaneedtoemulatethemissingfunctionality.Thenaturalorganizationwillplacethecodetoemulatethemissinginstructions/featuresinMachineModesothattheOS(whichrunsinSupervisorMode)doesnotneedtobeawareofthesedetails.

PreviousRISC-VversionsdocumentedaHypervisorModeintendedtosupporthypervisors.Thehypervisorwastointendedtorunatitsownprivilegelevel,whileeachofthehostedOSesraninSupervisorMode.AlthoughHypervisorModehasbeeneliminated,weincludethefollowingdiagramanyway.

�

Intheabovediagramswesuggestwhichmodeswillbeusedforwhichsortsofcode,butthisisnotmandatory.Inparticular,anoperatingsystemkernelmightruninMachineModeinsomesystemsandinSupervisorModeinothersystems.

InterfaceTerminology

TheRISC-Vdocumentationdiscussestheinterfacingbetweenapplications,operatingsystems,andhypervisors.

Everyapplicationprogramrunsinaparticularenvironment,whichiscalledthe“ApplicationExecutionEnvironment”(AEE).Howtheapplicationinterfaceswiththeunderlyingexecutionenvironmentiscalledthe“ApplicationBinaryInterface”(ABI).



�

TypicallyanOSsitsunderneaththeapplicationandimplementstheApplicationBinaryInterface,whichconsistsoftheUserModeinstructionsetalongwiththecollectionofsystemcallsthatareavailabletoapplications.TheApplicationBinaryInterfaceisthesumtotalofwhattheapplicationprogrammerneedstounderstandinordertowriteprograms;theprogrammerdoesnothavetounderstandorknowwhatisgoingonwithintheApplicationExecutionEnvironment.

AnexampleApplicationBinaryInterfacewouldcombinetheprocessorISAalongwiththeOSsystem-callinterface.

IfanapplicationisportedtoanewApplicationExecutionEnvironment,thentheapplicationwillfunctionidenticallywithnoproblems,aslongasthenewenvironmentimplementsexactlythesameApplicationBinaryInterface.

Typicallythereareseveralapplicationsrunningatanyonetime.TheOSgenerallyprovidesoneApplicationBinaryInterfaceandallapplicationsarewrittentomeetthisspeci2ication.However,anOSmightimplementmorethanoneApplicationBinaryInterface,asshowninthenextdiagram.

�

TheOSmightberunningonbaremetal.OrtheOSmightberunningontopofahypervisororsomeothermonitor/security/bootstrappingsoftware.Inanycase,theOSiscreatedandwrittentomeetthespeci2icationsofa“SupervisorExecution



Environment”(SEE).Thisspeci2icationiscalledthe“SupervisorBinaryInterface”(SBI).

IftheOSisportedtoanewenvironment(forexample,tonewcomputerhardware),theOSshouldfunctionidentically,aslongasthenewexecutionenvironmentimplementstheexactsameSupervisorBinaryInterface.

TypicallyanOSisdesignedtorundirectlyonabaremachinewithnounderlyingsoftware.Inthiscase,theSBIconsistsofaspeci2icationoftheprocessor’sISA.

However,theOSmayinfactberunningontopofahypervisor.AslongasthehypervisorimplementsthecorrectSupervisorBinaryInterface,theOSshouldbeunawarethatitisrunningontopofahypervisor.

ItisconceivablethatthehypervisorprovidesmultipleSupervisorBinaryInterfaces.Forexample,onehypervisormightprovideanSBImimickingaPCandasecondSBImimickinganApplecomputer.SuchahypervisorwouldthanfacilitatethesimultaneousexecutionofbothaMSWindowsoperatingsystemandtheMacOS.Suchasetupisshownbelow.

�

Thehypervisoritselfrunsinsomeenvironment,whichiscalledthe“Hyperv2laisorExecutionEnvironment”(HEE).Theunderlyingenvironmentimplementsaparticularinterface,calledthe“HypervisorBinaryInterface”(HBI).

A“type-1hypervisor”(ornativehypervisor)isaprogramthatisintendedtorunonabaremachine.Inotherwords,thereisnosoftwarebelowthehypervisor.Thehypervisorachievesallitsworkusingonlytheinstructionsavailableontheprocessoranddoesnotmakeanysystemcalls.Foratype-1hypervisor,theHypervisorExecutionEnvironmentisjusttheprocessorcore’sinstructionsetarchitecture(ISA).



A“type-2hypervisor”(orhostedhypervisor)isaprogramthatisintendedtorunontopofothersoftware.Thehypervisorsitsontopofsomeotheroperatingsystemand,asfarastheunderlyingOSisconcerned,thehypervisorisjustanotherapplicationprogram.AnexampleofthiswouldbeVMWare(whichrunsontopoftheAppleMacOSandprovidesaSupervisorBinaryInterfacetomimicthePCsoWindowscanrun.OtherexamplesareQEMUandVirtualBox.

Inordertoruncomputersystemsasef2icientlyaspossible,hardwareimplementationformostinstructionsisdesirable.Themajorityofcommonoperationsmustbeimplementeddirectlyinhardware,byinstructionsexecutedinthecurrentmode.

However,somecodewilloccasionallytrytoexecuteinstructionsthatrequiresoftwareintervention.Perhapsaparticularimplementationfailstoprovidehardwaretosupportcertainoperations;exampleswouldbethelackofhardwaresupportfor“integermultiplication”or“2loatingpointarithmetic”.Orperhapsthecodeisrunninginahypervisorenvironmentinwhichaparticularinstructioncannotbeblindlyexecuted,butmustbeintercepted,monitored,and/oralteredbeforebeingexecuted.Regardlessofthereasontheinstructioncannotbeexecuteddirectly,an“exception”willoccurwhentheinstructionisencountered.Themodewillbeincreasedtoahigherprivilegelevel,softwarewillintervene,andeventuallyareturntotheoriginalprivilegemodewilloccurandinstructionexecutionwillresume.

AllISAdesignerstrytominimizethenumberofoperationsthatrequiresoftwareinterventionand,wheninterventionisnecessary,trytomaketheupcalltotheexceptionhandlerquickandef2icient.

OvertheyearswehaveaccumulatedgoodexperiencewiththeinterfacebetweenUserModecodeandoperatingsystemcode.Muchhasbeendonetoreducethefrequencyofsystemcallsandmaketheupcallinterfacefastandef2icient.

Morerecentlyweareusingmorehypervisorsoftwaretosupportlegacyoperatingsystems.InordertoimplementtheSupervisorBinaryInterface(SBI)expectedbytheOS,thehypervisormaybeobligatedtostepinfrequently,whenevertheOSexecutesaprivilegedinstruction.ProvidinggoodISAsupportforef2icienthypervisorsisanactiveresearcharea.



Exceptions,Interrupts,TrapsandTrapHandlers

Duringtheexecutionofaninstruction,an“exception”mayoccur.

HerearethedifferenttypesofexceptionsmentionedintheRISC-Vspec:

SynchronousExceptions:Illegalinstruction

Instructionaddressmisaligned Instructionaccessfault Loadaddressmisaligned Loadaccessfault Store/AtomicMemoryOperation(AMO)addressmisaligned Store/AtomicMemoryOperation(AMO)accessfault Environmentcall(i.e.,the“syscall”trapinstruction) BreakpointAsynchronousExceptions(Interrupts): Timerinterrupt Softwareinterrupt Externalinterrupt

Asynchronousexception(sometimesjustcalledan“exception”)occursasaresultoftryingtoexecuteaninstructionandissaidtobesynchronoussinceitisintimatelyconnectedwithaparticularinstruction.Thesynchronousexceptionwillbedetectedduringinstructionexecutionatthetimetheexceptionalconditionisdetected.Forexample,duringinstructiondecode,thehardwaremaydetectabadopcode2ield,whichwillinitiatethe“illegalinstruction”exceptionprocessing.

Theothertypeofexceptionisan“interrupt”.Duringtheexecutionofaninstruction,aninterruptmayoccur.Interruptsarecausedbyeventsoutsidethecurrentinstructionexecutionand,assuch,areasynchronous.Wheneveraninterruptoccurs,itwillbecomeassociatedwithasingleinstruction(basicallythecurrentlyexecutinginstruction)andthatinstructionischosentoreceivetheinterruptexception.

Anexceptioncanbehandledorignored.Iftheexceptionishandled,thena“trap”willoccur.Thetrapprocessinginvolvesatransferofcontroltoa“traphandler”routine.Trapprocessingconsistsofafewhardwareoperations,suchasmodifyingacoupleofhardware2lags,savingthePC,andeffectingatransferofcontroltothe2irstinstructionofthetraphandlerroutine.



A“traphandler”isasoftwareroutineandwillusuallybeexecutedatahigherprivilegemodethantheinstructionreceivingtheexception.Forexample,an“illegalinstruction”exceptionmightoccurinapplicationcoderunninginUserMode.ThetraphandlerwillruninSupervisorModeand,whencomplete,thehandlermayreturntoexecutinginstructionsinUserMode.

However,thechangeofmodemaynotalwaysoccur.Insomecases,thetraphandlerwillexecuteatthesameprivilegemodeastheinstructionreceivingtheexception.WithRISC-V,sometrapsmaybeentirelycontainedinUserMode;theapplicationcodewillberesponsibleforhandlingitsowntrapsandsupervisorcodewillnotbeinvolved.

A“timerinterrupt”iscausedwhenaseparatetimercircuitindicatesthatapredetermineintervalhasended.Thetimersubsystemwillinterruptthecurrentlyexecutingcode.TimerinterruptsaretypicallyhandledbytheOSwhichusesthemtoimplementtime-slicedmultithreading.

An“externalinterrupt”comesfromoutsidetheprocessorandtheprecisenatureofthecausewilldependontheapplication.Forexample,aRISC-Vprocessorusedinanembeddedprocesscontrolsystemmightreceiveexternalinterruptsfromvarioussensorsdemandingattention.

A“softwareinterrupt”iscausedbysettingabitinthemachinestatusword.Thiscanbeusefulinamulti-corechipwhereathreadrunningononecoreneedstosendaninterruptsignaltoanothercore.

ControlandStatusRegisters(CSRs)

Inadditiontothebasicuser-levelinstructionsdiscussedpreviously,whichcanbeexecutedbycoderunninginanymode,thereareanumberofadditionalfeaturesthatanyonewritinganOSkernelcodewillneedtounderstand.

AtthecenteroftheRISC-VprivilegesystemareanumberofControlandStatusRegisters(CSRs),whicharedifferentfromthegeneralpurposeand2loatingpointregisters.EachCSRhasauniquenameandspecializedfunction.



ControloftheprivilegesystemisperformedentirelybyreadingandwritingtheCSRs.OS/Kernelcodecanperformprivilegedoperationssolelybyreading/writingtoCSRs.

TheRISC-Vspeci2icationallowsforupto4,096CSRstobepresent,butinmostimplementations,notalladdresseswillbeused.Infact,thereareonlyacoupleofdozenCSRsde2inedbytheRISC-Vspeci2ication.Theotheraddressesareleftopenanddifferentimplementationsmaychoosetoaddadditionalnon-standardCSRs,uptothelimitof4,096registers.

EachCSRhasanaddress.Or,ifyouprefer,youcanthinkofeachCSRashavinganumberintherange0to4,095.

Sincetherecanbeupto4,096CSRs,a12-bitaddressisusedtoaddresseachCSR.(Recallthat212=4,096.)TheinstructionsthatreadandwritetheCSRsusetheI-typeinstructionformat.Thisinstructionformatincludesa12-bitimmediate2ield,andthis2ieldisusedtocontainthe12bitaddressoftheCSRregisterbeingoperatedon.

ThesizeoftheCSRsisspeci2iedtobethesamesizeasthegeneralpurposeregisters.SoalltheCSRswillbe32bitsinaRV32machine.Ina64-bitmachine,theCSRsare64bitswide.Likewise,inamachinewith128bitregisters,theCSRsare128bitswide.

Inordertoworkwithallsizesofmachines,thespeci2icationnevermakesuseofmorethan32bitsinanyCSRregister.For64-bitand128-bitmachines,theupper32or96bitsareneverusedandare2illedwithzeros.[Actually,thereareminorexceptions.]

EachCSRhasadifferentmeaninganduse.SomeCSRsarecomprisedofacollectionofspecialized2ields(some2ieldsareonly1or2bitsinlength),whileotherCSRscontainasinglefulllengthvalue,suchasa32bitaddress.

EachCSRbelongstooneoftheprivilegemodes.Inotherwords,someCSRsareMachineModeCSRs,someCSRsareSupervisorModeCSRs,andtheremainingCSRsareUserModeCSRs.

AMachineModeCSRcanonlyberead/writtenwhentheprocessorisinMachineMode;itisillegaltoaccessitwhenrunninginanyothermode.ASupervisorModeCSRcanberead/writtenwhentheprocessorisineitherMachineModeor



SupervisorMode,butnotfromcoderunninginUserMode.AUserModeCSRcanberead/writtenregardlessofthecurrentoperatingmode.

Thereareseveralinstructionswhichareusedtomanageandcontroltheprivilegesystem,andtoperformprivilegedoperations.Theinstructionsare:

ECALL–Tomakeasystemcallfromalowerprivilegeleveltoahighermode EBREAK–Usedbydebuggerstogetcontrol;similartoECALL URET–ToreturnfromtraphandlerthatwasrunninginUserMode SRET–ToreturnfromtraphandlerthatwasrunninginSupervisorMode MRET–ToreturnfromtraphandlerthatwasrunninginMachineMode WFI–Gointosleep/lowpowerstateandWaitForInterrupt CSR…–Instructionstoread/writetheControlandStatusRegisters(CSRs)

WritingtoaCSRwillchangethestateoftheprocessorcore.Forexample,toenable/disableinterrupts,theprogramwouldwritetooneoftheCSRs.

Reading/writingsomeCSRsisnotallowedatlowerprivilegelevels.SomeCSRsareread-onlyatallprivilegelevelsandarethereforesetinstonebythechipdesigners,e.g.,todescribethecore’scapabilities.

KeyIdea:TounderstandtheRISC-Vprivilegesystemandexceptionprocessing,itisbothnecessaryandsuf2icienttounderstandtheCSRsandtheirfunctionality.

CSRListing

TheControlandStatusRegisters(CSRs)arelistedbelowforreference.Theirfunctionswillbedescribedlater.

ForeachCSR,wegiveits12-bitaddress(using3hexdigits).Wealsoindicatewhetheritcanbeaccessedineachoftheprivilegemodes(User,Supervisor,Machine)and,ifso,whatsortofaccess(read-onlyorread/write)isallowed.

Recallthattheprocessorisexecutinginoneofthethreemodesatanymoment.ItisaprivilegeviolationtotrytoaccessaCSRthatisnotvisibleinthecurrentmode.Likewise,itisaprivilegeviolationtotrytowritetoaCSRthatisread-onlyinthecurrentmode.Ifthishappens,anillegalinstructionexceptionwillbesignaled.



Visibilityin…AddrName Description U S M

Counters:

C00 cycle Clockcyclecounter r r rB00 mcycle r/wC80 cycleh Upperhalfofcycle(RV32only) r r rB80 mcycleh r/w

C01 time Currenttimeinticks r r rC81 timeh Upperhalfoftime(RV32only) r r r

C02 instret Numberofinstructionsretired r r rB02 minstret r/wC82 instreth Upperhalfofinstret(RV32only) r r rB82 minstreth r/w

ExceptionProcessing:

000 ustatus Statusregister r/w r/w r/w100 sstatus r/w r/w300 mstatus r/w 004 uie Interrupt-enableregister r/w r/w r/w104 sie r/w r/w304 mie r/w

005 utvec Traphandlerbaseaddress r/w r/w r/w105 stvec r/w r/w305 mtvec r/w

102 sedeleg Exceptiondelegationregister r/w r/w302 medeleg r/w

103 sideleg Interruptdelegationregister r/w r/w303 mideleg r/w 106 scounteren Counterenable r/w r/w



306 mcounteren r/w

TrapHandling:

040 uscratch Tempregisterforuseinhandler r/w r/w r/w140 sscratch r/w r/w340 mscratch r/w

041 uepc PreviousvalueofPC r/w r/w r/w141 sepc r/w r/w341 mepc r/w

042 ucause Trapcausecode r/w r/w r/w142 scause r/w r/w342 mcause r/w

043 utval Badaddressorbadinstruction r/w r/w r/w143 stval r/w r/w343 mtval r/w

044 uip Interruptpending r/w r/w r/w144 sip r/w r/w344 mip r/w

VirtualMemory:

180 satp Addresstranslationandprotection r/w r/w

Informational:

301 misa ISAandextensions r/wF11 mvendorid VendorID rF12 marchid ArchitectureID rF13 mimpid ImplementationID rF14 mhartid HardwarethreadID r

FloatingPoint:

001 f=lags Floatingpointing2lags r/w r/w r/w002 frm Dynamicroundingmode r/w r/w r/w



003 fcsr Concatenationoffrm+f2lags r/w r/w r/w

PerformanceMonitoring:

323 mhpmevent3 Eventselector#3 r/wC03 hpmcounter3 Eventcounter#3 r r rB03 mhpmcounter3 r/wC83 hpmcounter3h Upperhalfofcounter(RV32only) r r rB83 mhpmcounter3h r/w

324 mhpmevent3 Eventselector#4 r/wC04 hpmcounter4 EventCounter#4 r r rB04 mhpmcounter4 r/wC84 hpmcounter4h Upperhalfofcounter(RV32only) r r rB84 mhpmcounter4h r/w

… … … … … …

33F mhpmevent31 Eventselector#31 r/wC1F hpmcounter31 EventCounter#31 r r rB1F mhpmcounter31 r/wC9F hpmcounter31h Upperhalfofcounter(RV32only) r r rB9F mhpmcounter31h r/w

PhysicalMemoryProtection:

3A0 pmpcfg0 PMPCon2igurationword#0 r/w3A1 pmpcfg1 PMPCon2igurationword#1 r/w3A2 pmpcfg2 PMPCon2igurationword#2 r/w3A3 pmpcfg3 PMPCon2igurationword#3 r/w

3B0 pmpaddr0 PMPAddress#0 r/w3B1 pmpaddr1 PMPAddress#1 r/w… … … …3BF pmpaddr15 PMPAddress#15 r/w

Debug/Trace:

7A0 tselect Triggerregisterselect r/w7A1 tdata1 Triggerdata#1 r/w



7A2 tdata2 Triggerdata#2 r/w7A3 tdata3 Triggerdata#3 r/w

7B0 dcsr Debugcontrolandstatus r/w7B1 dpc DebugPC r/w7B2 dscratch Scratchregister r/w

InstructionstoRead/WritetheCSRs

RegardlessofwhichControlandStatusRegister(CSR)isinvolved,thereareonly6instructionsthatareusedtoreadandwritetheregisters:

CSRRW–ReadandwriteaCSR CSRRS–Readandsetselectedbitsto1 CSRRC–Readandclearselectedbitsto0 CSRRWI–ReadandwriteaCSR(fromimmediatevalue) CSRRSI–Readandsetselectedbitsto1(usingimmediatemask) CSRRCI–Readandclearselectedbitsto0(usingimmediatemask)

ThereareafewotherinstructionsthatreadormodifyaCSR,buttheseadditionalinstructionsaremerelyspecialcases(i.e.,shorthandorsyntacticsugar)foroneoftheaboveinstructions.

Eachofthese6instructionsbothreadsandwritesasingleCSRinasingle,atomicaction.

EachoftheseinstructionsreadsaCSRbycopyingitspreviousvalueintooneofthegeneralpurposeregisters(x1,x2,…).Often,youonlywanttomodifytheregisteranddon’tcareaboutitspreviousvalue;ifso,youcanspecifythedestinationregistertobex0.

EachoftheseinstructionsisalsocapableofmodifyingtheCSR.Often,youonlywanttoreadtheregister;ifso,youcanspecifythesourcevalueasregisterx0.(Thisisaspecialcase.TheCSRisremainsunchanged;itisnotsettozero.)

Thenewvaluecanbeeitherafullvalue,orabitmaskwhichwillbeusedtosetorclearselectedbits.Thenewvalue(orbitmask)caneithercomefromaregisterorcanbecontainedwithinanimmediatedata2ieldwithintheinstructionitself.



WhethercodeisallowedtoaccessaparticularCSRisdependentonthecurrentprivilegemodeatwhichthecodeisbeingexecuted,aswellaswhichCSRisinvolved.Forexample,ifcoderunninginUserModeattemptstoreadorupdateoneoftheMachineModeCSRs(suchasmstatus),thiswillnotbeallowed.

Ifanattemptismadetoreadand/orwritetoaCSRthatisforbiddenatthecurrentprivilegelevel,thenan“IllegalInstruction”exceptionwilloccur.Theinstructionwillnotcompleteandexceptionprocessingwillensue.

WithinafewoftheCSRs,theprotectionisona2ield-by-2ieldbasis.Inotherwords,someofthebitsmaybeprotectedfromchange(i.e.,read-only),whiletheremainingbitscanbefreelychanged(i.e.,read-write).Forexample,withinthemstatusCSR,itispossibletomodifysomebitsbutnotallowabletomodifyotherbits.(ButforotherCSRs,theentireregisteriseitherwritableornot;thatis,allbitshavethesameprotection.)Anyattempttomodifybitsthatareread-onlywillsimplybeignored.

CSRRead/Write

GeneralForm:CSRRW RegD,Reg1,Immed-12

Example:CSRRW x9,x4,cycle # x9=CSR[0xC00]; CSR[0xC00]=x4

Description:Thisinstructioncanbeusedtoreadfromand/orwritetoaCSR.

TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthevalueofthesourceregisterReg1iscopiedtotheCSR.Thisisanatomicoperation.

Intheaboveexample,aCSRnamed“cycle”isbeingaccessed.ThisCSRhappenstohavethe12-bitaddress0xC00.Thisregistercontainsacounterofthenumberofclockcyclessinceitwaslastwrittento.Thecounterisautomaticallyincrementedasinstructionsareexecuted.

ToreadaCSRwithoutwritingtoit,thesourceregisterReg1canbespeci2iedasx0.TowriteaCSRwithoutreadingit,thedestinationregisterRegDcanbespeci2iedasx0.



Comment:InlowerprivilegemodessomeoftheCSRsareinaccessible.AnattempttoreadfromorwritetoaCSRmaycauseanillegalinstructionexception.


CSRReadandSetBits

GeneralForm:CSRRS RegD,Reg1,Immed-12

Example:CSRRS x9,x4,mstatus # x9=CSR[0x300]; CSR[0x300]|=x4

Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRaresetto1.ThevalueinReg1isusedasabitmasktoselectwhichbitsaretobesetintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.

Intheaboveexample,aCSRnamed“mstatus”isbeingaccessedwhichhasthe12-bitaddress0x300.Thisregistercontainsanumberofbitswhichcontrolwhichcontrolinterruptprocessing.

ThisinstructioncanbeusedtosimplyreadaCSRwithoutupdatingit.IfReg1isx0,thennoupdatetotheCSRwilloccur.

Exception:Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.


CSRReadandClearBits

GeneralForm:CSRRC RegD,Reg1,Immed-12

Example:CSRRC x9,x4,mstatus # x9=CSR[0x300]; CSR[0x300]&=~x4



Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRareclearedto0.ThevalueinReg1isusedasabitmasktoselectwhichbitsaretobeclearedintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.

ThisinstructioncanbeusedtosimplyreadaCSRwithoutupdatingit.IfReg1isx0,thennoupdatetotheCSRwilloccur.

Exception:Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.


CSRRead/WriteImmediate

GeneralForm:CSRRWI RegD,Immed-5,Immed-12

Example:CSRRWI x9,3,mstatus # x9=CSR[0x300]; CSR[0x300] = 3

Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthentheentireCSRiswrittento.The5-bit2ieldthatisnormallyusedforReg1iszero-extendedandusedasthesourcevaluethatismovedintotheCSR.Thisisanatomicoperation.

Thisinstructionmakesbits[4:0]inanyCSRparticularlyeasytomodify.Exception:

Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.




CSRReadandSetBitsImmediate

GeneralForm:CSRRSI RegD,Immed-5,Immed-12

Example:CSRRSI x9,3,mstatus # x9=CSR[0x300]; CSR[0x300]|=3

Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRaresetto1.The5-bit2ieldthatisnormallyusedforReg1iszero-extendedandusedasabitmasktoselectwhichbitsaretobesetintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.

Thisinstructionmakesbits[4:0]inanyCSRparticularlyeasytosetto“1”.Exception:



CSRReadandClearBitsImmediate

GeneralForm:CSRRCI RegD,Immed-5,Immed-12

Example:CSRRCI x9,3,status # x9=CSR[0x300]; CSR[0x300]&=~3

Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRareclearedto0.The5-bit2ieldthatisnormallyusedforReg1iszero-extendedandusedasabitmasktoselectwhichbitsaretobeclearedintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.

Thisinstructionmakesbits[4:0]inanyCSRparticularlyeasytoclearto“0”.Exception:





BasicCSRs:CYCLE,TIME,INSTRET

TherearethreeControlandStatusRegisters(CSRs)thatareparticularlyeasytounderstand,solet’sstartwiththem:

cycle–counterofclockcyclestime–currentrealtimeinstret–counterofinstructionsretired(i.e.,executed)

EachoftheseCSRsisautomaticallyincrementedasinstructionsareexecuted.Allarereadableatanyprivilegemode.Togethertheycanbeusedtomeasurecodespeedandperformance.

TheCSR“cycle”countsthenumberofclockcycles.TheCSR“time”measuresrealtimeinunitsof“ticks.”Thenumberoftickspersecondisimplementationdependentandcanbedeterminedelsewhere.TheCSR“instret”countsthenumberofinstructionsretired(i.e.,thenumberofinstructionscompleted).

TheseCSRsareallUserModeregisters,whichmeanstheycanbereadbycodeexecutinginanymode.Theseregistersareread-only,sotheycannotbewrittento.

Wecanresetthesecounters,buttheresetoperationrequireshigherprivilege;UserModecodeshouldnotbeabletowritetothesecounters.Toaccommodateupdatingbutonlyathigherprivileges,theseregistersare“mirrored”byMachineModeregisters.Themirroredversions(calledmcycle,minstret,…)havedifferentnamesanddifferentaddresses.Assuch,theycanonlybewrittentobycoderunninginMachineMode.

Toavoidpossibleover2low,thesecountersneedtobe64-bits.For64-bitand128-bitmachines,eachofthethreeCSRswillbelargeenoughtoavoidover2low.

For32-bitmachines,theRV32speci2icationbreakseachcounterintotwo32-bitCSRs.Inotherwords,theRV32speci2icationintroduces3additionalregisterstocontainthehigh-order32-bits:



cycle–clockcycles(lower32bits)cycleh–clockcycles(higher32bits)time–realtime(lower32bits)timeh–realtime(higher32bits)instret–instructionsretired(lower32bits)instreth–instructionsretired(higher32bits)

Thefollowinginstructionsareshortformsofotherinstructions.Theyaredesignedtomakereadingtheabove-mentionedcountersespeciallyeasy.

Read“CYCLE”

GeneralForm:RDCYCLE RegD

Example:RDCYCLE x9 # x9=CSR[cycle]

Encoding: Thisisasimpli2iedformofamoregeneralinstruction.

Read“CYCLEH”

GeneralForm:RDCYCLEH RegD

Example:RDCYCLEH x9 # x9=CSR[cycleh]

Comment: Onlyon32-bitmachines.Encoding: Thisisasimpli2iedformofamoregeneralinstruction.

Read“TIME”

GeneralForm:RDTIME RegD

Example:RDTIME x9 # x9=CSR[time]




Read“TIMEH”

GeneralForm:RDTIMEH RegD

Example:RDTIMEH x9 # x9=CSR[timeh]


Read“INSTRET”

GeneralForm:RDINSTRET RegD

Example:RDINSTRET x9 # x9=CSR[instret]


Read“INSTRETH”

GeneralForm:RDINSTRETH RegD

Example:RDINSTRETH x9 # x9=CSR[instreth]


ExampleCode:Thefollowingsequencecanbeusedtocorrectlyreada64-bitcounterona32-bitmachinewherethecounterisbrokenintotwo32-bitpieces.



Thereisapossibilitythat,betweenreadingthetwohalves,over2lowfromthelower-order32bitsintotheupperbitsmightoccur.

again: rdcycleh x3 # Read high bits rdcycle x2 # Read low bits rdcycleh x4 # Check for overflow bne x3,x4,again # from low to high

CSRRegisterMirroring

Atanymomenttheprocessorisexecutinginsomemode,i.e.,ineitherUser(U),Supervisor(S),orMachine(M)Mode.

TheRISC-Vspecde2inesafewdozenControlandStatusRegisters(CSRs)andtheyarebrokeninto3groups.Thereisonegroupforeachmode,soeachCSRbelongstoeitherUserMode,SupervisorMode,orMachineMode.

WhenrunninginMachineModeallCSRsareaccessible,regardlessofwhichgrouptheybelongto,sinceMachineModeisthehighestprivilegelevel.InSupervisorMode,onlytheSupervisorandUserCSRscanbeaccessed.Finally,whenrunninginUserMode,onlyUserCSRsareaccessible.

Tostatethesamethinganotherway,aMachineCSRcanonlybereadorwrittenwhenrunninginMachineMode.ASupervisorCSRcanbewrittenwhenrunningineitherMachineModeorSupervisorMode.Finally,aUserCSRcanbereadorwrittenregardlessofthecurrentprivilegemode.

EachCSRhasaname.

Inmanycases,the2irstletteroftheregisternameindicateswhichgroupitisin(either“u”,“s”,or“m”).

Examplesthatfollowthe2irst-letterconventionaremstatusandmisa(accessibleonlyinMachineMode),sstatusandsatp(accessibleinSupervisorandMachineModes),andustatusanducause(accessibleinUser,Supervisor,andMachine



Modes).Counterexamplesarecycle,time,andinstretwhichoughttobeginwith“u”sincetheyareaccessibleinUser,Supervisor,andMachineModes.

AparticularimplementationmayaddnewCSRsinadditiontothosedescribedintheRISC-Vstandard.EachCSRregisterhasanaddressandtheaddressesare12bitswide,allowingforupto4,096CSRs.

Details:TheaddressspaceforCSRsusesa12-bitaddress,allowingfor4,096differentCSRregisters.Thisaddressspaceispartitionedinto4regions,withupto1,024registerseach.Thereisoneblockofregistersforeachmode(U,S,andM)alongwithanextra“reserved”blockwhichwasformerlyusedforHypervisorMode.HypervisorModehasbeeneliminated.

Furthermore,eachregisteralsofallsintooneofthefollowingaccessrestrictioncategories:

registersde2inedbytheRISC-Vstandard •read-only •read-write extensions,notde2inedbythestandard •read-only •read-write registersusedfordebugmode •standard •non-standardextensions

ThisallowsanimplementationoftheRISC-Vspectoaddnew,non-standardCSRswithoutfearthattheiraddresseswillbeoverloadedinfuturerevisionstothestandard.

Themode(U,S,orM)andtheaccessrestrictioncategorycanbedeterminedsolelyfrombitsintheCSRaddress.Thus,thehardwarecaneasilycheckwhetheragiventypeofaccess(e.g.,aread-writeaccessfromcoderunninginUserMode)isallowed,simplybylookingattheaddress.TheencodingofbitsintheCSRaddressisratherconvoluted;consulttheof2icialdocumentationfordetails.

Someoftheregistersaremirrored,whichmeansthesameconceptualregisterisavailableattwoormoreaddresses.Forexample,thecycleregister,whichcontainsa



counterofthecurrentnumberofclockcycles,ismirroredinbothUserModeandMachineMode.AmirroredCSRwillhavetwonamesandtwoaddresses:

Address RegisterName 0xC00 cycle(read-only) 0xB00 mcycle(read-write)

Thisgroupingandmirroringtechniquehasacoupleofadvantages.

Oneadvantageofmirroringtheregistersisthatasingleregistercanberead-onlyatoneprivilegelevelandyetwritableatahigherprivilegelevel.

Forexample,the“cycle”register(whichcountshowmanyclockcycleshaveoccurredsincelastbeingreset)isconceptuallyasingleregisterandwouldcertainlybeimplementedassuch.Theregistershouldbevisibleandreadabletoallcode,buttheresetoperation(whichwritestotheregister)shouldonlybeallowedwhenrunningatahigherprivilegelevel.Thus,wewantthe“cycle”registertoberead-onlyforUserModecode,butmodi2iablebyMachineModecode.

Toachievethis,theregisteris“mirrored”.Thereisaread-onlyCSRnamedcyclewhichcanbeaccessedatanyprivilegelevel,andthereisadifferentread-writeCSRnamedmcyclewhichcanonlybeaccessedwhenrunningattheMachineModeprivilegelevel.Actually,thereisonlyasingle,underlying“mirrored”register,soanyvaluewrittentomcyclewillbeimmediatelyvisiblewhenreadingfromcycle.

Accesstoregistersatahigherprivilegelevelthanthecurrentmodeisalwaysforbidden.Thisallowsanoperatingsystemtohideinformationfromuser-levelcode.Forexample,itmightbeusedinahypervisortohideinformationsothatanoperatingsystemcanbetrulyfooledaboutwhatsortofprocessoritisrunningon.

ThisgroupingmechanismisauniformapproachtopreventinglowerprivilegecodefrommodifyingCSRsthatmustbeprotected.Forexample,user-levelcodemustbepreventedfrommodifyingthevirtualmemorypagingschemeandthishappensnaturallybecausetheCSRassociatedwithpagetablesisnotaCSRthatisaccessibleinUserMode.

Someregisterscontainanumberofbit2ieldsandthese2ieldsneedtohavedifferentaccessibility.Forexample,inthemachinestatusregister,some2ieldsshouldberead-onlywhileother2ieldsareupdatable.Thisissupportedinthefollowingway.An



attempttoupdatea2ieldthatisread-onlywillbeignored,eventhoughother2ieldsinthesameCSRcanbeupdated.

The“status”registerisusedindeterminingwhetherinterruptsareenabledandwhatvirtualmemorymechanismiscurrentlyineffect,amongotherthings.Thisregisterismirroredatallprivilegelevels,usingadifferentCSRateachlevel.

Address RegisterName 0x000 ustatus 0x100 sstatus 0x300 mstatus

Withinthestatusregister,some2ieldsareinvisiblewhenrunningatlowerprivilegelevels.Thisissupportedbygivingtheregisteradifferentnameforeachmode.Somebitsthatarede2inedinthemstatusregisterwillbereadsimplyaszerosinthesstatusregister,tore2lectthefactthatthesebitsmustbeaccessibleinMachineModebutnotinSupervisorMode.

AnOverviewofImportantCSRs

InthissectionwedescribethemostimportantControlandStatusRegisters(CSRs).Thisdescriptiongoeshand-in-handwithdescribingfeaturesofaRISC-VpricessornotcoveredelsewhereandofconcernonlytoprogrammersofOSandkernelcode.

Recallthatthe2irstletteroftheCSRregisternamewilltypicallybem,s,orutoindicatewhichprivilegelevelisrequiredinordertoaccessthisregister.Forexample,the“misa”registercanonlybereadwhenrunninginMachineMode,whilethe“ustatus”registermaybeaccessedfromanymode.

misa–MachineInstructionSetArchitecture(ISA)

ThisCSRgivesinformationaboutthebasicarchitectureofthemachine.Ittellstheregisterwidth(32,64,or128)andindividualbitsinthisCSRalsoindicatewhichofthevariousoptionsandextensionsdetailedbytheRISC-Vspeci2icationhavebeenimplemented.ThisregisterencodeswhetherthismachineisanRV32IM,anRV64IMAFDQ,orsomeothervariantoftheRISC-Varchitecture.



Theregisterwidthofthemachine(either32,64,or128)isencodedinthemostsigni2icanttwobitsofthisCSR.Thisisclever,sinceitmakesitpossibleforthesamecodetobeexecutedondifferentmachineswithdifferentregisterwidths.Usingonlyinstructionstotestthesignoftheregisterandshifttheregisterleft,thecodecandeterminewhattheregistersizeis,andthenbranchaccordinglytodifferentcodeblocksforeachofthethreepossibilities.

Thisregisterisread-write.

Somemachinesmaysupportmultipleregisterwidths.Forexample,anRV64machinemaybecapableofrunningas(i.e.,emulating)anRV32machine.Uponpower-onorreset,themisaregisterwillbesettoindicatethewidestregisterwidththecoreiscapableofimplementing.Softwarecansetthisregistertoeffectivelyturn(forexample)anRV64machineintoanRV32machine.

Thelower-order26bitscorrespondtothelettersA,B,…Z(“A”=bit0,“B”=bit1,etc.)Eachbitwillbesettoindicatewhetherthisimplementationsupportsthecorrespondingextension.Forexample,bit5willbesetifthecoresupportsthe“F”singleprecision2loatingpointextension.

mvendorid–MachineVendorID

Forcommercialimplementations,thisCSRidenti2iesbynumberthevendor/manufacturer/organizationthathasproducedthischip.ThenumberusedhereistheIDissuedbyasemiconductorengineeringtradeorganizationcalledJEDEC.Forresearchandnon-commercialimplementations,thisregisterwillcontainzero.

Thisregisterisread-only.

marchid–MachinearchitectureID

ThisCSRidenti2iestheparticulararchitectureofthepartandisessentiallythe“partnumber”or“modelnumber”.Forcommercialimplementations,thisnumberisassignedbythevendor.Forsomenon-commercialoropen-sourceprojects,anumbermaybeassignedbytheRISC-VFoundation.Otherwise,thisregisterwillcontainzero.




mimpid–MachineImplementationID

Givenaparticularvendor(asidenti2iedinmvendorid)andapart/modelnumber(asidenti2iedinmarchid),theremaybeseveralversions.Thisnumberidenti2iestheparticularimplementationorversionoftheprocessor.Itmaybezero.


cycle–CycleCounter(read-only)mcycle–CycleCounter(writable)

Thecycleregisterisaccessibleread-onlyinallmodes.Itcountshardwareclockcycles.ThecountercanberesetbywritingtothemcycleCSR,whichisonlyaccessibleinMachineMode.

instret–InstructionCounter(read-only)minstret–InstructionCounter(writable)

Theinstretregisterisaccessibleread-onlyinallmodes.Itcountsthenumberofinstructionsexecuted(ormoreprecisely,thenumberofinstructionscompleted“instructionsretired”).ThecountercanberesetbywritingtotheminstretCSR,whichisonlyallowableinMachineMode.

time–CurrentTime(read-only)

Thecurrentrealtimeinticks.Seethecommentsformtime.Thisregisterisashadowofmtime.ThetimeCSRisaccessibleatallprivilegelevels.

Details:Theseregistersallrequire64bits,whichisproblematiconRV32machines.Todealwith32-bitmachines,thereareadditionalregisters(whosenamesendin“h”)tocontaintheupper32bits.

loworderbits highorder32bits(RV32only) cycle cycleh mcycle mcycleh instret instreth minstret minstreth time timeh



mtime–CurrentTimemtimecmp–TimeofNextInterrupt

Themtimeregisterprovidesthecurrentrealtimeasacountof“ticks”.Themeaningof“tick”(i.e.,thetick-rateofhowmanytickspersecondandthevalueoftime-zero,atwhichthecounterbegancounting)isdeterminedelsewhere.

Themtimecmpregisterisusedtotriggerthenexttimerinterrupt.Whenthemtimeregisterequals(orexceeds)thevalueinmtimecmp,atimerinterruptwillbetriggered.

TechnicallymtimeandmtimecmparenotControlandStatusRegisters(CSRs),buttheyarelistedheresincetheyaresimilar.Each“register”isa64bitcounterwhichisactuallyimplementedviamemory-mapping,unliketheCSRs.

Commentary:Puttingareal-timeclockinsidethecoreisnotpracticalsinceacorecanberunatdifferentfrequenciesatdifferenttimes(e.g.,under-clockingtoreducepowerconsumptionorover-clockingto…uh…causeittomalfunction).TypicallythereareseveralcoresonachipandtherealtimeclockisessentiallyanI/Odevicesharedbyallcores,providingconsistenttimevaluestoallcores.Thus,therealtimeclockisaccessedbytheuseofmemory-mappedaccesses,likeotherI/Odevices.

Typicallyprocessorchipsareclockedbyimpreciseclockcircuitry.Insystemsthataimtoprovideaccuraterealtimeclocks,theclockisnotimplementedonthesamechipastheprocessorcores.

Presumably,therealtimeclockhasonemtimecmpregisterpercore,allowingeachcoretodeterminewhenitsnexttimerinterruptwillhappen.

Commentary:Themtimecmpregisteris64bitsbutiftheRISC-Vcoreisonly32bits,thentherecouldbeaproblemsettingit.Itmustbeupdatedusingtwostore-word(SW)instructions,eachstoring32bits.Iftheprogrammeriscareless,the2irstloadmighttriggeraspurioustimerinterrupt.TheRISC-Vdocumentationprovidesthiscodetoavoidtheproblem:



# The new value for mtimecmp is in registers x11:x10li x5,-1 # 0xffffffff for lower 32 bitssw x5,mtimecmp # The 64-bit value is ≥ old valuesw x11,mtimecmp+4 # The 64-bit value is ≥ new valuesw x10,mtimecmp

mhartid–MachineHardwareThread(HART)ID

RISC-Vdocumentationde2inesthetermHARTtomean“HardwareThread”.Thisregisterdoesnotre2lectahigherlevel(e.g.,operatingsystem)conceptofthread.

Inasingle-coresystemwithasingle,simpleFETCH-DECODE-EXECUTEpipeline,thereonlyoneHART.

Inamulti-coresystem,whereeachcorewillexecuteasingle2low-of-control,eachcorewillhaveitsownHART.Eachcore’sHARTwillexecuteconcurrentlywiththeothercores’HARTs.ThisCSRidenti2ieswhichcoreisexecuting.

Itmaybeimportanttoidentifyonethreadasa“masterthread”.OneHARTmustbegivenanIDofzero.

Thenumberofhardwarethreadsis2ixedbuttheapplicationsoftwarewillneedanunpredictableandchangingnumberofthreads.TheOSwillmaptraditionalOSthreadsontotheavailablehardwarethreads.

Someadvancedsuperscalarcoresmayimplement“hardwaremultithreading,”inwhichasinglecoreiscapableofexecutingmorethanaoneindependent2low-of-controlatatime.Inotherwords,multi-threadingisperformeddirectlyinhardware.Forexample,asinglecoremightbeabletoexecutetwohardwarethreadsatatime.Theadvantageofhardwaremultithreadingisthatthecomputationalresourcesofthecore(e.g.,multipliers,adders,etc.)canbeusedmoreef2iciently.Insuchasystem,eachhardwarethreadwouldbeidenti2iedwithauniqueHARTandeachcorewouldbeexecutingmorethanoneHARTsimultaneously.




mstatus–MachineStatusRegistersstatus–SupervisorStatusRegisterustatus–UserStatusRegister

Conceptually,theprocessor“statusregister”isasingleregister,butitismirroredatallprivilegelevels.

Bymodifyingthisregister,thesoftwarecandothingslikeenable/disableinterruptsandchangetheISA.(Bysettingbitsinthestatusregister,onecancausetheprocessortoexecuteasifitwerea32bitRV32machine,eventhoughitisactuallya64bitRV64machine.)

Understandingsomeofthe2ieldinthisCSRrequiresunderstandinghowexceptionsareprocessedandtraphandlersareinvoked.

ThisCSRiscomplexandcontainsalargenumberof2ields.Assuch,wedelayfurtherdiscussionuntillater.

mtvec–MachineTrapVectorBaseAddressstvec–SupervisorTrapVectorBaseAddressutvec–UserTrapVectorBaseAddress

Whenatrapoccurs(asaresultofasynchronousexceptionorasynchronousinterrupt),ajumpwillbetakendirectlytothetraphandlerroutine.

Thereisasingletraphandlerforeachprivilegelevel.Theseregisterscontaintheaddressofthesethreetraphandlers.Inotherwords,whenanexceptionoccurs(andistobehandled,notignored),theprogramcounter(PC)issettothevalueinthisCSR,causingajumptothe2irstinstructionofthetraphandlercode.

Thenextsectiongivesadditionaldetail.

ExceptionProcessingandInvokingaTrapHandler

ExceptionsalwaystraptotheMachineModetraphandler2irst,regardlessoftheprivilegemodeatthetimetheexceptionoccurs.TheMachineModetraphandlerwillexecuteinMachineModeand(presumably)endwithanMRETinstruction,whichwillreturntothecodethatwasinterrupted.Theinterruptedcodemayhave



beenrunninginUser,Supervisor,orMachineMode;theMRETinstructionwillrestoretheprivilegeleveltowhateveritwaswhentheexceptionoccurred.

However,wemaynotwanttohandletheexceptioninMachineMode;wemightwanttohandleitinSupervisorModeorevenUserMode.Assuch,thereisafacilityto“delegate”someorallexceptionstothelowerprivilegelevels.

Forexample,ifaparticulartypeofexceptionshouldbehandledbytheSupervisorModetraphandler,thenthisexceptionwillbedelegatedfromMachineModedowntoSupervisorMode.Thetraphandleraddresswillbedeterminedfromthestvecregister(notmtvec);thehandlerwillruninSupervisorMode;andthehandlerwillendwithanSRETinstruction.

Likewise,ifaparticulartypeofexceptionoughttobedealtwithinUserMode,thenanyexceptionofthistypewillbefurtherdelegatedfromSupervisortoUserMode.Thetraphandleraddresswillbedeterminedfromtheutvecregister;thehandlerwillruninUserMode;andthehandlerwillendwithanURETinstruction.

WhetheranexceptionistobedelegatedfromMachineModetoSupervisorMode,ordownfurthertoUserMode,isdeterminedbythesettingsofvariousbitsinotherCSRs.Wedescribethedelegation(“deleg”)CSRselsewhere.

Thereisonlyasingletraphandlerateachofthethreeprivilegelevels(atleastinthebasespeci2ication).Themtvec,stvec,andutvecseCSRsarenormallywritableandOSsoftwarewillstoretheaddressofthetraphandlerintoaCSRbeforetheexceptionoccurs.

Oncetheexceptionoccursandthehandlerisinvoked,itscodemustexamineotherCSRstodeterminethenatureoftheexception,i.e.,whichexceptioncausedthetraphandlertobeinvoked.

(Thetraphandleraddressesmaybehardwired,inwhichcasetheseCSRsareread-only.Itisalsoallowablefortheretobeimplementation-dependentrestrictionsonwhichvaluesthemtvec/stvec/utvecCSRscancontain.)

ImplementationsarefreetoaddadditionaltraphandlersasextensionstotheRISC-Vspec.Inparticular,itmaymakesensefortheprocessortohaveadifferenttraphandlerforeachkindofexception.Thiscanmakehandlingtrapsfaster,sinceiteliminatestheneedforsoftwaretodeterminewhichtypeofexceptioncausedthetrapandjumpconditionally.



Thetraphandler(orhandlers,iftherearemorethanone)mustbeginonword-alignedaddresses.Thesemeansthatanyaddressstoredinthemtvec/stvec/utvecCSRsmusthave“00”astheleastsigni2icanttwobits.TheRISC-Vspecmakesuseofthesetwobitsasfollows.

Ifthelasttwobitsare“00”,thenitmeanstheCSRcontainstheaddressofasingletraphandleranditisuptothecodeinthehandlertodeterminewhichtypeofexceptionhasoccurred.

Ifthelasttwobitsare“01”,thenitmeansthereisacollectionoftraphandlers,oneforeachtypeofasynchronousinterrupt.Thisisgoodsinceweoftenwanttohandleasynchronousinterruptsreallyquickly.Moreparticularly,thereisajumptable(i.e.,anarrayofwords,whereeachwordcontainsaJUMPinstruction)andthemtvec/stvec/utvecCSRcontainstheaddressofthisjumptable.

[Details:Whenanasynchronousinterruptoccurs,theCSRisconsultedtolocatethejumptable.Theinterrupttypeisconsultedtodeterminewhichtableentryistobeused(i.e.,anindexintothearray).Thenajumpismadetotheassociatedentry,byloadingthePCwiththecomputedaddress.Presumably,eacharrayelementcontainsajumpinstructiontothe2irstinstructionofthehandler.Althoughtherecanbeadifferenttraphandlerforeachkindofasynchronousinterrupt(“timerinterrupt”,“externalinterrupt”,…),therewillonlybeonehandlerforallsynchronousexceptions(“illegalinstruction”,“addressmisaligned”,…).The2irstelementinthejumptable(index=0)willbeatraphandlerforallsynchronousexceptions,aswellasfora“user-modesoftwareinterrupt”.]

Theremainingbitpatterns“10”and“11”arenotused.

TheNon-MaskableInterrupt:HardwareFailure

Sometrapsare“maskable”andothersare“non-maskable”.Amaskableinterruptcaneitherbehandled,orcanbeignored,orcanbepassedfromahigherprivilegeleveltoalowerprivilegelevel.

Anon-maskableinterrupt(NMI)willbehandledimmediatelyandcannotbeignored.Onlyoneeventcancauseanon-maskableinterrupt:

•Ahardwarefailureisdetected(e.g.,byerrorcheckingcircuitry)



IntheeventofahardwareerrorNMI,thefollowinghappens:

•TheprivilegemodeissettoMachineMode.•ThepreviousPCwillbesavedinthemepcregister.•ThePCissettosome2ixed,predetermined,implementationdependentvalue.•Themcauseregisterissettoindicatethenatureofthefailure.

Nootherprocessorstateisaltered,whichmaybehelpfulindiagnosingtheproblem.

ResetProcessing

Thefollowingeventsarenotconsideredtobeexceptionsandarenothandledastraps:

•Power-on-reset,triggeredwhenthesystemisturnedon•Pressingaphysicalrestart/resetbutton•Awatchdogtimertimesout•Powerlevelsdropbelowspecs,triggeringa“brownout”event•Alow-powersleepstateisendedandtheprocessor“wakesup”

Intheaboveevents,thefollowinghappens:

•TheprivilegemodeissettoMachineMode•TheMIE(Interruptenable)bitinthestatuswordissetto0.•TheMPRVbitissetto0,whichturnsoffanyaddresstranslation.•Themcauseregisterissettoindicatewhicheventhasoccurred.•ThePCissettosome2ixed,predetermined,implementationdependentvalue.

Theremainingprocessorstateisunde2ined.

Power-On-Reset:Whenpoweris2irstappliedtoaprocessor,theelectronicswilltypicallycausea“power-on-resetinterrupt”tooccur.Typically,thisinitialprocessingsequencewillforceajumptoaknown,predeterminedaddressinsomeread-onlyportionofmemorybyloadingtheProgramCounter(PC)withsomeknownvalue.Theeffectistoforceajumptotheinitialstartupbootstrapprogram.

WatchdogTimer:A“watchdogtimer”consistsofanindependentelectroniccircuitthatisusedtorestartacomputerthathasexperiencedacatastrophicsoftwarefailure,e.g.,gonebrain-deadinanin2initeloop.



Thistechniquecanallowacomputertorecoverfromotherwisefatalmalfunctionssuchasprogrambugsandtransienthardwareerrors.Thewatchdogtimertechniquehassavedspaceprobesthathavehadfailuresin2light:theprobemaygosilentforseveraldaysbutthenwakesup,performsafullreset,andcontinuestocompletethemission.

Awatchdogtimerwillcauseanon-maskableinterruptafteracertainamountoftimehaselapsedwithoutthetimerbeingreset.Resettingthetimeriscalled“feedingthedog”.Whensoftwareisworkingnormally,theassumptionisthatthetimerwillberesetatregularintervals,thusfeedingthedog.Butifsomethinggoeswrongandthetimerisnotreset,theinterruptwilloccur.

ExceptionDelegation

Wenextlookatthemechanismthatallowsexceptionstobemaskedand/ordelegatedtolowerprivilegelevels.

Bydefault,everyexception(whethersynchronousorasynchronous)ishandledbyatraphandlerrunninginMachineMode.Ifthekernelprogrammerwantsthetraptobehandledatalowerprivilegelevel,thenonepossibilityisfortheprogrammertowritecodetoswitchthemodeandthenpasscontrolbacktothelowerprivilegelevel.

However,thistechniqueofsoftwaredelegationisnotveryef2icientanditisfastertohavesomeexceptionsautomaticallytraptoahandlerrunningatthelowerprivilegelevel.RISC-Vsupportsdelegationoftrapsbythehardware,asonepossibledesignoption.Ifthisoptionispresentintheimplementation,thentheexceptionwillbypasstheMachineModetraphandlerandgodirectlytoatraphandlerrunningatalowerlevel.

Thisiscontrolledbythefollowing4delegationCSRs:



medeleg–MachineExceptionDelegationRegistersedeleg–SupervisorExceptionDelegationRegister andmideleg–MachineInterruptDelegationRegistersideleg–SupervisorInterruptDelegationRegister

Weareusingthisterminology:

“exception”=synchronousexception(e.g.,illegalinstruction) “interrupt”=asynchronousexception(e.g.,timerinterrupt)

IfthebitcorrespondingtotheexceptiontypeintheMachineModedelegationregisters(medelegormideleg)isset,thentheMachineModetraphandlerwillnotbeinvoked.Instead,thetrapwillbeimmediatelydelegatedtothenextlowestprivilegelevel,i.e.,toSupervisorMode.

Next,theappropriatebitintheSupervisorModedelegationregisters(sedelegandsideleg)willbechecked.Ifset,thetrapwillbefurtherdelegatedtoUserMode.

Thetrapcannotbefurtherdelegated,sotherearenoUserModedelegationCSRs.

TrapsarealwayshandledinMachineMode,unlessdelegatedtoalowerlevel.The“deleg”CSRstellwhichexceptionsandinterruptsaredelegatedtowhichprivilegelevel.

Forexample,thehandlingofsomeparticulartrapmightbedelegatedfromMachineModetoSupervisorModecode.ThetrapmightbefurtherdelegatedtoatraphandlerrunninginUserMode.

Theseregisterscanbewrittento.Theycontrolwhetherindividualexceptionsandinterruptswillbedelegated,i.e.,processedbyatraphandlerrunningatalowerprivilegelevel,orwillbeprocessedatthehigherlevel.

ThemedelegregistertellswhetheransynchronousexceptionwillbehandledbytheMachineModetraphandler(whoseaddressisinthemtvecCSR),orwillbedelegatedtothenextlowerlevel.IfSupervisorModeisimplemented(insomechipsitmaynotbe),thenthesedelegregistertellswhethertheexceptionwillbehandledinSupervisorModeorfurtherdelegatedtoUserMode.



Thereisabitineach“exception”delegationregister(i.e.,medelegandsedeleg)foreachtypeofsynchronousexception,asfollows:

Bit Description0 instructionaddressmisaligned1 instructionaccessfault2 illegalinstruction3 breakpoint4 loadaddressmisaligned5 loadaccessfault6 store/atomicmemoryoperationmisaligned7 store/atomicmemoryoperationaccessfault8 environmentcallfromUMode9 environmentcallfromSMode10 (previouslyusedforHypervisorMode)11 environmentcallfromMMode

Ifthebitissetto1,thentheexceptionisdelegated,i.e.,handledbythenextlowerlevel.Ifthebitis0,thentheexceptionishandledatthislevel.

ThemidelegregistertellswhetheranasynchronousinterruptwillbehandledbytheMachineModetraphandler(whoseaddressisinthemtvecCSR),orwillbedelegatedtothenextlowerlevel.IfSupervisorModeisimplemented(insomechipsitmaynotbe),thenthesidelegregistertellswhethertheinterruptwillbehandledinSupervisorModeorfurtherdelegatedtoUserMode.

Thereisabitineach“interrupt”delegationregister(i.e.,midelegandsideleg)foreachtypeofasynchronousinterrupt,asfollows:



Bit Description0 USIP–usersoftwareinterrupt1 SSIP–supervisorsoftwareinterrupt2 (previouslyusedforHypervisorMode)3 MSIP–machinesoftwareinterrupt4 UTIP–usertimerinterrupt5 STIP–supervisortimerinterrupt6 (previouslyusedforHypervisorMode)7 MTIP–machinetimerinterrupt8 UEIP–userexternalinterrupt9 SEIP–supervisorexternalinterrupt10 (previouslyusedforHypervisorMode)11 MEIP–machineexternalinterrupt

Ifthebitissetto1,thentheinterruptisdelegated,i.e.,handledbythenextlowerlevel.Ifthebitis0,thentheexceptionishandledatthislevel.

mie–MachineModeInterruptEnablemip–MachineModeInterruptPending

Therearethreesourcesofinterrupts(i.e.,asynchronousexceptions):

•Softwareinterrupt •Timerinterrupt •ExternalInterrupt

SoftwareinterruptsareusedforoneHARTtosignalanotherHART.

Timerinterruptsoccurwhenthereal-timeclock(mtime)reachesapresetlimitvalue(mtimecmp).

Externalinterruptsareforallotherdevicestointerruptaprocessor.

Themieregistercontainsabitforeachtypeofasynchronousinterrupt,asfollows:



Bit Description0 USIE–usersoftwareinterruptenable1 SSIE–supervisor…3 MSIE–machine…

4 UTIE–usertimerinterruptenable5 STIE–supervisor…7 MTIE–machine…

8 UEIE–userexternalinterruptenable9 SEIE–supervisor…11 MEIE–machine…

Likewisemipregistercontainsabitforeachtypeofasynchronousinterrupt,withanalogousnames:

Bit Description0 USIP–usersoftwareinterruptpending1 SSIP–supervisor…3 MSIP–machine…

4 UTIP–usertimerinterruptpending5 STIP–supervisor…7 MTIP–machine…

8 UEIP–userexternalinterruptpending9 SEIP–supervisor…11 MEIP–machine…

(Bits2,6,and10wereusedforHypervisorModeandarenow“reserved”.Allremainingbitsareunused.)

Settingabitto1inthemip(interruptpending)registerindicatesthatthecorrespondinginterrupthasoccurredandshouldcauseatrapatsomepointinthefuture.Ifthesamebitisalsosetto1inthemie(interruptenable)register,thenthetrapprocessingwilloccurimmediately.

Inadditiontothependingbitsdiscussedhere,interruptsmayalsobegloballyenabledordisabled.SeetheMIE(interruptenable)bitinthemstatusregister.



NotethatthereisabitinthestatusregistercalledMIEandaCSRwiththesamename:mie.Likewise,thereisanSIEbitinthestatusregister,aswellasaCSRcalledsie.

Moreprecisely,aninterruptwillbetaken(i.e.,thetraphandlerwillbeinvoked)ifandonlyifthecorrespondingbitinbothmieandmipregistersissetto1,andifinterruptsaregloballyenabled.

Thetimerinterruptpendingbit(MTIP)issetbyhardwarewhenthetimerexpires,i.e.,whenthecurrenttime(mtime)reachesorexceedsthetimerlimitregister(mtimecmp).

Thetimerinterruptpendingbit(MTIP)isresetbywritingtothetimercomparisonregister.Afterwritingtothetimercomparisonregister,thetimerinterruptwillnolongerbepending.Softwarecannotmodifythefollowinginterruptpendingbits;theyareread-onlyandset/clearedbyothercircuitry:

3 MSIP–machinesoftwareinterruptpending7 MTIP–machinetimerinterruptpending11 MEIP–machineexternalinterruptpending

However,softwarerunninginMachineModecanmodifythesebits:

0 USIP–usersoftwareinterruptpending1 SSIP–supervisorsoftwareinterruptpending4 UTIP–usertimerinterruptpending5 STIP–supervisortimerinterruptpending8 UEIP–userexternalinterruptpending9 SEIP–supervisorexternalinterruptpending

Ifoneofthesebitsissetandthatinterrupttypehasbeendelegatedtoalowerlevel(seethemedelegandmidelegregisters),thentheinterruptwillbesignaledatthelowerprivilegelevel.



sip–SupervisorModeInterruptPendinguip–UserModeInterruptPending

sie–SupervisorModeInterruptEnableuie–UserModeInterruptEnable

Eachofthelowerprivilegelevelshasitsowninterruptpendingregisteranditsowninterruptenableregister.

Normallyinterruptsareprocessedatthehighestprivilegelevel.Forexample,ifatimerinterruptoccurswhilerunninginUserMode,thecorewillenterMachineModeandinvoketheMachineModetraphandler.Whencomplete,theMachineModetraphandlerwillreturntoexecutinginUserMode.

However,interruptscanbedelegatedfromahigherleveltoalowerlevel.Forexample,thetimerinterruptmightbedelegatedfromMachineModetoSupervisorMode.SoiftheinterruptoccursinUserMode,theSupervisorModetraphandlerwillbeinvokedandMachineModewillneverbeentered.

Ifaparticularinterrupttype(suchasthetimerinterrupt)hasbeendelegated(forexamplefromMachineModetoSupervisorMode),thenthecorrespondinginterruptpendingbitisshadowedfromtheMachineModeinterruptpending(mip)register.

Exactlywhatthismeansisunclear.PresumablytheMTIPbitinmipshadowedastheSTIPbit(andnottheMTIPbit)insip???

WhethertheinterruptcausesatrapinSupervisorModeornotisdeterminedbywhetheritisenabledintheSupervisorModesieregister,aswellaswhetherglobalinterruptsareenabledinSupervisorMode.



TheStatusRegister

mstatus–MachineStatusRegistersstatus–SupervisorStatusRegisterustatus–UserStatusRegister

Conceptually,theprocessorstatusregisterisasingleregister,buttheregisterismirroredatthelowerprivilegelevels,sincesome2ieldswithinthestatusregistermustbeprotecteddifferentlyatdifferentprivilegelevels.

ThisCSRcontainsanumberof2ieldsthatcanbereadandupdated.Bymodifyingthese2ields,thesoftwarecandothingslikeenable/disableinterruptsandchangethevirtualmemorymodel.Forexample,bywritingtothisCSR,thesoftwarecanturnonvirtualmemoryandpage-tabletranslation.

Nextweshowthelayoutofthestatusregister,followedbyalistingoftheindividualbit2ields.

Asmentioned,thestatusregisterismirroredinthedifferentmodes.Some2ieldsarepresentonlyathigherprivilegelevels.(Suchbitsare2illedwithzerosatlowerprivilegelevels.)

Twoofthe2ieldsareonlyusedfor64and/or128bitmachines.Thesetwo2ieldsresideinbitspositions[35:32],sotheyarenotevenpresentin32-bitmachines.



�



Name #ofbits Description

UIE 1 UserModeInterruptEnableSIE 1 SupervisorModeInterruptEnableMIE 1 MachineModeInterruptEnable

UPIE 1 UserModePreviousInterruptEnableSPIE 1 SupervisorModePreviousInterruptEnableMPIE 1 MachineModePreviousInterruptEnable

SPP 1 SupervisorMode–PreviousPrivilegeModeMPP 2 MachineMode–PreviousPrivilegeMode

FS 2 FloatingPointStatus(dirty/clean/initial/off)XS 2 UserModeExtension(dirty/clean/initial/off)

MPRV 1 ModifyPrivilege SUM 1 PermitSupervisorUserMemoryAccessMXR 1 MakeExecutableReadable

TVM 1 TrapVirtualMemoryTW 1 Time-outWaitTSR 1 TrapSRETinstruction

SD 1 (FS==11)or(XS==11)

ForRV64andRV128only:

UXL 2 Emulation:RegistersizewheninUserModeSXL 2 Emulation:RegistersizewheninSupervisorMode

MIE,SIE,UIE—InterruptsEnabled

Recalltheterm“interrupt”meansasynchronousexceptions,i.e.,TimerInterrupts,SoftwareInterrupts,andExternalInterrupts.

IftheprocessorisinMachineMode,theninterruptsareenabledwheneverMIE=1;interruptprocessingforSupervisorandUsermodeisdisabled.



IftheprocessorisinSupervisorMode,thensupervisorinterruptsareenabledwheneverSIE=1;interruptprocessingforUsermodeisdisabledandinterruptprocessingforMachineModeisalwaysenabledwhenrunninginSupervisorMode.

IftheprocessorisinUserMode,thenuserinterruptsareenabledwheneverUIE=1;interruptprocessingforMachineandSupervisorModeisalwaysenabledwhenrunninginUserMode.

MPIE,SPIE,UPIE,MPP,SPP—InterruptProcessing&PreviousState

MPIE–PreviousvalueofMIE(MachineModeInterruptEnable) SPIE–PreviousvalueofSIE(SupervisorModeInterruptEnable) UPIE–PreviousvalueofUIE(UserModeInterruptEnable)

MPP–MachineModetrap,previousPrivilegeMode(eitherM,S,orU)SPP–SupervisorModetrap,previousPrivilegeMode(eitherSorU)

Whenaninterruptoccursandatraphandleristobeinvoked,theprivilegemodewillchangeandthe“interruptenable”bitmustbesetto0topreventadditionaltrapprocessing(atthislevel)whilethetraphandlerisexecuting.

Theprocessormaybeexecutingatoneprivilegemode(sayUser)andthetrapmaybehandledatahigherlevel(saySupervisor).IfaMachineModetrapthenoccursbeforetheSupervisortraphandleriscomplete,thentheSupervisorTraphandlerwillbeinterruptedandtheMachineModetraphandlerwillexecute.Uponreturn,theSupervisorModetraphandlerwillresumeexecution;Finally,theinterruptedUserModecodewillberesumed.

Itisnecessarytosavethemachinestateinasortof“stack”,sothatuponcompletionofatraphandler,theprocessorcanreturntothepreviousstate.Sinceatraphandlerrunningatonelevel(saySupervisor)cannotbeinterruptedatthatsamelevel,thestackneednotbetoodeep.

Whenantraphandlerruns,allweneedtosaveisthepreviousmodeandtheinterruptenablebit.Notethepreviousinterruptenablebitmaybe0or1.Regardlessofitspreviousvalue,theprocessorwillchangetheinterruptenablebitto0whenthetraphandlerisinvokedandrestoreitwhenthetraphandlerreturns.

(Ofcourse,anon-maskableinterrupt,suchasahardwarefailure,orareset-typeevent,suchaspower-on-reset,mightoccuratanytime.Suchaneventcanneverbemaskedandwillalwaysbehandledregardlessofwhetherornotinterruptsare



enabled.Butthereisnoproblemsincenon-maskableinterruptsareprocessedinMachineModeandthepreviouslyexecutingcodeissimplyabandoned.)

ConsiderwhattheprocessorwilldoupontheoccurrenceofaninterruptwhichwillbehandledbytheMachineModetraphandler.First,theMPIEbitwillbesettoholdthepreviousvalueoftheinterruptenablebitforMachineModepriortotheinterrupt(i.e.,tothepreviousvalueofMIE).InterruptswillthenbedisabledbysettingMIEto0.Also,MPPwillbesettotellwhatprivilegeleveltheprocessorwasrunningatwhentheinterruptoccurred.TheprivilegemodeineffectwhentheinterruptoccurredmayhavebeeneitherMachine,Supervisor,orUserMode.Thustwobitsarerequiredtosavethepreviousprivilegelevel.

Itispossiblethatanexceptionwilloccurduringtheexecutionofatraphandler.Thismeansthat,whenexceptionsareprocessed,thepreviouslyvalueofthe“interruptsenabled”bitcanbeeither0or1.Forexample,itmaybethatsomeinstructionisunimplementedandmustbeemulated.Sowhenthatinstructionisencountered—eitherduringnormalcodewithinterruptsenabled,orhandlercodewithinterruptdisabled—anewtraphandlerwillbeinvoked.Itwillrunwithinterruptsdisabledand,uponcompletion,the“interruptsenabled”bitwillberestoredtoitspreviousvalue.

TheMIE(“MachineInterruptEnable”)bitdetermineswhetheramaskableinterruptwillcausetrapprocessingorwillremainpending.Ifinterruptsaredisabled,thentrapprocessingwillbesignaledbutwillremainpendinguntilinterruptsareonceagainenabled.[???]

Likewise,whenaninterruptoccursthatistobehandledbytheSupervisorModetraphandler,SPIE(“SupervisorPreviousInterruptEnable”)willbesettothepreviousvalueoftheSIEinterruptenablebitpriortotheinterrupt.ThenSIE(“SupervisorInterruptEnable”)willbesetto0todisableinterruptsatthislevel.ThenSPP(“SupervisorPreviousPrivilege”)willbesettowhicheverprivilegeleveltheprocessorwasrunningatwhentheinterruptoccurred.SincetheonlychoicesareSupervisororUser,theSPP2ieldisonlyonebit.

Similarly,whenaninterruptoccursandishandledbytheUserModetraphandler,UPIE(“UserPreviousInterruptEnable”willbesettothepreviousvalueoftheUIE(“UserInterruptEnable”)interruptenablebitpriortotheinterrupt.TheinterruptedcodemusthavebeenrunninginUserMode,sinceweneverinterruptahigherlevelmodetorunatraphandleratalowerlevel.Therefore,thereisnoneedfora



“UPP”(“UserPreviousPrivilege”)2ieldtoholdthepreviousmode;itmusthavebeen“UserMode”.

ConsideratrapwhichoccurswhilerunninginUserModetoatraphandlerinMachineMode.MPIEissettothepreviousvalueofMIE;thenMIEissetto0;alsoMPPissetto“UserMode”.

Afterthetraphasbeenhandled,anMRETinstructionwillbeexecuted.MPPwillbeconsultedtodeterminethatthepreviousmodewasUserMode.MIEissettoMPIE,restoringitsinitialvalue;themodeisrestoredtoUserMode;MPIEandMPParesettoarbitraryvalues,sincetheirinformationiseffectively“popped”offthestack.

Notethatthehandlermaychoosenottoreturntotheinterruptedcode;thisiscommoninoperatingsystems,particularlywhenservicingatimerinterrupt;thehandlerwillcauseathreadswitchandwillnotreturntotheinterruptedcodeuntilmuchlater.Whenthishappens,thesimplehardwarestacksystemisinadequate.Thehandlermustsavetheinformationinthestatusregisterandrestoreitlater.

TSR—TrapSRETInstruction

Ifthisbitis1,thenanyattempttoexecuteanSRETinstructionwillcausean“illegalinstruction”exception.If0,thenSRETmaybeexecutedwithoutinvokingatraphandler.

ThisbitisavailableonintheMachineModestatuswordmstatus,notinSupervisororUserModes.

TheabilitytointerceptandtrapanSRETinstructionmightbeusefultohypervisorcode.

TWandTVM—Time-outWaitandTrapVirtualMemory

ThesebitsareavailableonlyintheMachineModestatuswordmstatus,notinSupervisororUserModes.

Theirfunctionalityisnotdocumentedinthecurrentspec.Perhapstheyhavebeeneliminated.???



SXLandUXL—SupervisorandUserRegisterSize

AnRV32processoralwaysusesthe32ISA(e.g.,32-bitregisters)regardlessofwhatmodeitisin.For64-bitand128-bitmachines,thereisafacilitytoexecutecodemeantforaRISC-Vprocessorwithasmallerregistersize,i.e.,inRV32orRV64mode.

TheabilityoflargermachinestoexecutetheRV32orRV64bitISAsisnotrequired,butifpresent,theSXLandUXLbitscontrolit.Ifitisnotimplemented,thenthesetwo2ieldsareread-only.

AnRV64bitprocessoralwaysusesthe64-bitISAwhenrunninginMachineMode.However,itcanexecutein32-bitmodewhenrunningineitherSupervisororUserMode.Thisiscontrolledbywriting01totheSXLorUXLbits.

AnRV128bitprocessoralwaysusesthe128-bitISAwhenrunninginMachineMode.However,itcanexecutein32-bitor64-bitmodewhenrunningineitherSupervisororUserMode.ThisiscontrolledbywritingtotheSXLorUXLbits.

TheencodingusedfortheSXLandUXL2ieldsis:

01 32-bitISA 10 64-bitISA 11 128-bitISA

FS,XS,andSD—FloatingStatusandExtensionStatusBits

Thesebitsofthestatusregisterareconcernedwithimprovingcontextswitchingtimes.First,wedescribe“FS”2irst,whichisconcernedwiththe2loatingpointregisters.

Whenthe2loatingpointregistershavebeenused,theywillneedtobesavedwhenthekernelswitchesfromonesoftwarethreadtoanother.Butwhentheregistersareunused,we’dliketoavoidtheoverheadofsavingandrestoringthem,sincethisistimeconsuming.

TheFSbit2ieldconsistsoftwobits,encodedasfollows:



Value Meaning 00 Off 01 InitialValues 10 Clean,i.e.,updatedbutsavedinmemory 11 Dirty,i.e.,updatedbutnotsaved

Thesebitsaresetbythehardwareandcanalsobemodi2iedbysoftware.Theyalsocontrolwhethera2loatingpointinstructionwillcauseanexceptionorwillbeexecutednormally.

(Thesebitsareonlypresentifthe2loatingpointextensionisimplemented.Ifthereareno2loatingpointregisters,theFSbitsarehardwiredto00.)

TheFSbitsshouldbesetto00=“off”whenthe2loatingpointregistersarenotinuse.Anyattempttoreadormodifytheregisterswillcauseanillegalinstructionexception.Forathreadthatwillnotbeusingthe2loatingpointregisters,thisisagoodsetting,sincetheOScanavoidsavingandrestoringtheregisters.

Thecompletestateofthe2loatingpointsystemconsistsofthecontentsofofthe322loatingpointregistersandthe2loatingpointstatusregister(fcsr).Manyprogramsdon’tuse2loatingpoint,sothereismuchtobegainedbyavoidingthesave/restoreofallthisinformationoneachcontextswitch.

Whenathreadthatuses2loatingpointiscreatedandbeginslife,the2loatingpointregistersshouldbeinitializedbytheOStotheirinitialvalues(presumably+0.0)andthe2loatingpointstatusregistershouldbeinitialized(presumablytozero).

The“initial”state(01)indicatesthatthe2loatingpointregistersareusablebuttheyhavenotyetbeenalteredfromtheirinitialvalues.Thus,theyregistervaluesdonotneedtobesavedonthenextcontextswitch.

Whenevera2loatingpointregisterismodi2ied,hardwarewillchangethestateasre2lectedinFSto“11=dirty”.

The“clean”state(10)isusedtoindicatethattheregistersareinuseandhavebeenmodi2iedfromtheirinitialzerovalues,butthattheyhavenotbeenmodi2iedsincethelastcontextswitch.Inotherwords,theregisterscontainnon-zerovaluesbutthevalueslastsavedtomemoryarecurrentandaccuratelyre2lectthevaluesintheregisters.Thus,atthetimeofthenextcontextswitch,theregistersdonotneedtobe



savedagain.(Thiswillbecommonsinceaprogramthatuses2loatingpointatsomepointwilllikelyspendmanytimesliceswithoutfurthermodifyingtheregs.)

InsomeOSes,itmaybethecasethatprocessesalwaysbeginlifewiththe2loatingpointsystemenabledbydefault,sothe“off”stateisnotparticularlyuseful.

ButinotherOSes,processeswillbeginlifewiththe2loatingpointsystemdisabled,undertheassumptionthatmostprocesseswillnotuse2loatingpoint.Thus,theOSwillsetFSto00=off.Ifanattemptismadetoreadorwritethe2loatingstate,thehardwarewillsignalanexception.

SomeOSesmayrespondbyenablingtheregisters,settingthemtotheirzerovalues,andretryingtheinstruction.OtherOSesmayrequirethe2loatingpointsystemtobeexplicitlyenabledbysomesystemcallandtreatanyaccessestodisabledstateasanerror.

Itisnecessaryforread(aswellaswrite)attemptstocauseanexceptionwhenthestateis00=off.TheOSmayoccasionallyleavetheoldstatefromanotherunrelatedthreadintheregistersandwemustpreventinformationfrombleedingoverorescapingfromonethreadtothenext.

Whenathread2inishesitstimeslice,theOScanlookatFStodeterminehowmuchstatemustbesavedatthecontextswitch:

ValueofFS Action 00=off Savingnotnecessary 01=initial Savingnotnecessary 10=clean Savingnotnecessary;previouslysavedvaluesarestillgood 11=dirty Mustsavetheregisters

Ifthestatewaspreviously11=dirty,thentheOSshouldchangeitto“clean”beforestoringthethread’sinformation,sinceitwillnowbetruethatthein-memorysavedstateisup-to-date.

WhentheOSisreadytoinitiateandscheduleanewthread,itcanlookatthestateoftheFSfromthepreviousthreadandthestateofthenewthreadtodeterminewhattodo.(Recall,thatwhentheOSpreviouslysavedtheregswhenthethreadwaslastde-scheduled,itchangedthestatefrom“dirty”to“clean”,sothestateofthenewthreadwillneverbe“dirty”atthispoint.)



NextThread PreviousThread Action 00=off …any… Donothing 01=initial 01=initial Donothing(regsalreadyzeroed) 01=initial …other… Zerotheregs 10=clean …any… Restoreregsfrommemory 11=dirty <notpossible>

Whatwehavejustseenisthatsomesectionoftheprocessor(the2loatingpointsubsystem)requiredattentionateverycontextswitch,eithertosavestateortorestorestate.Theseactionsareexpensiveandwewishtoavoidthemwhenpossible.TheFSbitshelpdeterminewhenwecanavoidsaving/restoring.

SomeRISC-Vimplementationsmayaddnon-standardextensionswhicharenotdescribedinthespecandtheXSbitsareusedanalogouslyforthisadditionalstateinformation.

TheXSbitsworkthesamewayastheFSbitsbutcoverallotherstatethatshouldbesaved/restoreduponcontextswitch.Exactlywhatprocessorstateiscoveredby“XS”isleftas“implementationde2ined”.

Itispossiblethattherearemorethanonenon-standardextensions.TheXSbitsaremeanttoapplytoallofthem.Sothemeaningsarechangedslightlytore2lectthis.

Value Name Meaning 00 Off Allsub-systemsareoff 01 Initial Oneormoreareon;butnothingclean,nothingdirty 10 Clean Nothingisdirty,somepartsareclean 11 Dirty Atleastsomestatehasbeenmodi2ied

Itmaybethatthereareadditionalbitstodetermineexactlywhatpartsoftheprocessorstatearedirty/clean/initial,butthespecleavesthistoimplementationspeci2icdecisions.

Finally,uponcontextswitch,we’dliketobeabletoquicklydeterminewhetherwehavetodoanythingaboutthe2loatingpointregsandanynon-standardextensions.Mightweneedtosaveanything,orcanwejustignorethisissue?Thequestionwewanttoansweristhis:IsFS=11=dirty?OrisXS=11=dirty?



TheSDbitinthestatusregisterallowstheOSsoftwaretoquicklyanswerthesequestions.Thebitissetbythehardwareandde2inedas:

SD=((FS==11)OR(XS==11))

Thisbitisplacedinthesignbitofthestatuswordsoaquickcheckcandeterminewhetheranythingisdirty.Ifso,theOSsoftwarecanlookdeepertodeterminewhatstatemustbesaved.

These2ields(FS,XS,SD)areavailableintheMachineModemstatusandSupervisorModesstatusregisters.[Thediagramofthestatusregistersincorrectlyshowstheminustatus.]

MPRV,SUM,andMXR—VirtualMemoryEnabling

TheRISC-Vchipsupportsvirtualmemory,viapagetables.Thepagetablesystemisdescribedinaseparatechapter.

Typically,user-levelcodewillrunwithvirtualmemorymappingturnedonandallmemoryaccesseswillbemappedfromvirtualaddressestophysicaladdresses.

However,user-levelcodewilloccasionallymakesystemcalls,switchingfromexecutioninatalowerprivilegelevel(e.g.,UserMode)toahigherprivilege(e.g.,MachineMode).Oftendataispassedbetweenuser-levelcodeandtheOSinmemory;forexample,theuser-levelcodemightpassanaddresspointertotheOS.ThispointerisavirtualaddressandtheOScodewhichservicesthesystemcallwillthengothememorytofetchthedata.

SincetheaddressisavirtualaddresswhiletheOSisrunningin(say)MachineModewithvirtualmemorymappingturnedoff,thereisaproblem.TheOScouldperformtheaddresstranslation(fromVirtualtoPhysical)insoftware,butthisistimeconsuminganderror-prone.

Tomakethistasksimpler,theMPRVbitcanbeused.NormallytheMPRVisclearedto0andallLOADandSTOREinstructionsdonebycoderunninginMachineModewillhavenoaddresstranslationperformed.However,theOScodecansetthisbitto1;anysubsequentLOADorSTOREinstructionswillbeperformedwithaddresstranslationturnedon.Moreprecisely,addresseswillbetranslatedasifthecurrent



modewerethatspeci2iedbyMPP(theMachinePreviousPrivilegeModebitsinthemstatusregister),whichwasthemodefromwhichthemostrecenttraphandlerwasentered.

(SoasystemcallfromUserMode,wherepagingwason,wouldpassvirtualaddresses.Thehandler,runninginMachineMode,wouldneedaddresstranslationturnedonifitwantstoaccessthememoryusingtheseaddresses.HoweverasystemcallfromMachineModeitself(orSupervisorModewithtranslationturnedoff)wouldpassphysicaladdresses,sincetherewouldbenopaging.)

TheSUMbitisusedbycoderunninginSupervisorModeinsteadofMachineMode.IthasnoeffectoncoderunninginMachineModeorinUserMode.

Addresstranslationcanbeturnedonbywritingtothesatpregister.Ifturnedon,thenallUserModeandSupervisorModecodewillrunwithaddresstranslation.

Eachvirtualaddressspaceisdividedintobothuserandnon-userpages.Inotherwords,somepageswillbemarkedas“User”pagesandtheremainderwillbeinaccessibletoUserModecode.(TypicallytheportionoftheaddressspaceaccessibletoUserModecodeiscalledthe“bottomhalf”andtheportionaccessibleonlytoSupervisorModecodeiscalledthe“tophalf”,althoughthesetwoareasneednotbearrangedascontiguousblocksorlocatedinanyrelativeorder.)

NormallySupervisorcodewillonlyaccessnon-user(tophalf)pagesintheaddressspace.However,sometimesthesupervisorcodemustaccesstheuser(bottomhalf)portionoftheaddressspace.

TheSUMbitcanbeusedtoallowsupervisorcodetoaccesstheuserpages.IfSUM=0,thenSupervisorModecodecannotaccesspagesmarkedas“user”pages;itcanonlyaccessthenon-userportionoftheaddressspace.IfSUM=1,thenSupervisorModecodeisfreetoaddressbothuserandnon-userpages.Thisbitaffectsallloads,stores,andinstructionfetches.Normally,supervisorscodedoesnotneedtoaccessuserpages,sosupervisorwillrunwiththisbitcleartopreventinadvertentaccessestouserspace.Whenthesupervisorcodeneedstoaccessuserspace,e.g.,toretrievesyscallarguments,thisbitcanbetemporarilysetto1.

Sometimesatraphandlerwillneedtolookattheinstructionthatwasexecutingatthemomentofanexception.Forexample,iftheinstructionisunimplementedinthehardwareandmustbeemulated,thetraphandlerwillneedtoreadtheinstructiontodeterminehowtoemulateit.



Memorypagescanbemarked“readable”and/or“executable.”Apagecontaininginstructionswillnormallybemarked“executable”butwillnotbemarked“readable”.Therefore,anyattempttofetchawordfromthispagewillcauseanexception.

Inordertoallowthetraphandlertofetchtheinstruction(asdata),thisexceptionmustbesuppressed.ThisisthefunctionoftheMXRbitinthestatusregister.Whensetto1,datareadsareallowedfrompagesmarked“executable”;whensetto0,suchreadswillcauseanexception.

AdditionalCSRs

mepc–ProgramCounterforMachineModeTrapHandlermscratch–ScratchRegisterforMachineModeTrapHandlermcause–CauseCodeforMachineModeTrapHandler

WhentheMachineModetraphandlerisinvoked,thepreviousvalueoftheprogramcounterisstoredintomepc;thisvaluewillbeusedattheendofthetraphandler,whenanMRETinstructionisexecuted.Themepcregisterisread/writeandcanbewrittenbysoftware(e.g.,byanOSduringathreadswitch)tocausetheMRETtojumpintoanarbitrarythread.

WhenaMachineModetraphandlerisinvoked,thecodemustbecarefulnottooverwriteandlosethepreviousvaluesofanyregisters,sincetheywerebeingusedbytheinterruptedcode.Insteadtheregistersmustbeimmediatelysaved.However,itisvirtuallyimpossibletodoanythingwithouttheuseofatleastonegeneralpurposeregister.

Thisisthepurposeofthemscratchregister,whichisaread/writeregister.ThetraphandlercanswapmscratchwithanygeneralpurposeregisterandthenusethenormalISAinstructionstosavetheremainingregisters.

RecallthattheCSRRWinstructionwillexchangethevalueinanarbitraryCSRwithageneralpurposeregister.Thisinstructionaccomplishesthetaskofswappingmscratchwithageneralpurposeregister,therebysavingthegeneralpurposeregister.Thisgivesthesoftwarearegisterloadedwithabasevalue,whichcansubsequentlybeusedtosaveallremainingprocessorstate.



Themscratchregistercannotbechangedbytheexecutionofthecodeatlowerprivilegelevels.Thereforeitcanbereliedontocontainaknownvalue,uncorruptedbyuser-levelcode.Typicallyitmightcontainaframeorstackpointerorapointertoa“registersavearea.”Inanycase,uponentrytothetraphandler,mscratchcanbeswappedintoageneralpurposeregisterandthenusedtosavewhicheverregistersthetraphandlerwillbeneeding.

Commentary:SomeearlierISAdesignsusedageneralpurposeregisterforthisfunction.Theideawasthatoneregisterwould(byconvention)bereservedfortheOSkernel.Wheneveraninterruptoccurred,theOSkernelwouldbefreetousethisregisterinsavingthestateoftheinterruptedprocess.Therefore,thisregisterwasuselesstouser-levelcode.Thisregistercouldnotbeusedtoholdavaluesinceanytimeaninterruptoccurred,itwouldbemodi2iedbytheOStraphandler.

Thisdesignapproachreducedthenumberofavailablegeneralpurposeregistersavailabletouser-levelcode.

Furthermore,theOScouldnotdependontheregisterremainingunchangedduringuser-levelcodeexecution;thisnecessitatedloadingtheregisterwithaknownvaluebeforeitcouldbeused.

Also,ifnotconsistentlyzeroedbeforereturningfromOScodetouser-levelcode,datacouldpotentiallyleakfromtheOStouser-levelcode,presentingsecurityconcerns.

Themcauseregistercontainsanumericcodetoindicatewhatcausedthetrap.ItiswrittenbythehardwarewhenanexceptioncausestheMachineModetraphandlertobeinvoked.Herearethepossibleexceptioncausevalues:



NumericCode Description0 Instructionaddressmisaligned1 Instructionaccessfault2 IllegalInstruction(orprivilegedinstr.violation)3 Breakpoint4 Loadaddressmisaligned5 Loadaccessfault6 Store/AMOaddressmisaligned7 Store/AMOaccessfault8 EnvironmentcallfromU-Mode9 EnvironmentcallfromS-Mode11 EnvironmentcallfromM-Mode12 Instructionpagefault13 Loadpagefault15 Store/AMOpagefault

MAX+0 0x80000000 UserSoftwareInterruptMAX+1 0x80000001 SupervisorSoftwareInterruptMAX+3 0x80000003 MachineSoftwareInterrupt

MAX+4 0x80000004 UserTimerInterruptMAX+5 0x80000005 SupervisorTimerInterruptMAX+7 0x80000007 MachineTimerInterrupt

MAX+8 0x80000008 UserExternalInterruptMAX+9 0x80000009 SupervisorExternalInterruptMAX+11 0x8000000B MachineExternalInterrupt

Details:Eachcausehasanumericcode.Thecodeschemeusesthesignbittoshowwhetherthecauseisasynchronousexception(MSB=0)oraninterrupt(MSB=1).Forasynchronousinterrupts,asmallcodenumber(0,1,2,…)isstoredinthelower-orderbits,andthesign-bitisalsoset.Sointheabovetable,MAXstandsfor231fora32-bitmachine,263fora64-bitmachine,etc.Tomakethisclear,wegivetheactualvalueinhexforRV32.

Commentary:A“privilegedinstructionviolation”isaninstructionthatexistsbutcannotbeexecutedinthecurrentprivilegemode.Anillegalinstructionisaninvalidinstructionencodingwhichdoesnotrepresentade2inedinstruction.Concerningexceptionsraisedbyeitheroftheseviolations,RISC-Vmakesno



distinctionbetween“privilegedinstruction”and“illegalinstruction.”Bothcausethe“illegalinstruction”exception.

sepc–ProgramCounterforSupervisorModeTrapHandlersscratch–ScratchRegisterforSupervisorModeTrapHandlerscause–CauseCodeforSupervisorModeTrapHandler

uepc–ProgramCounterforUserModeTrapHandleruscratch–ScratchRegisterforUserModeTrapHandlerucause–CauseCodeforUserModeTrapHandler

Theseregistersarede2inedanalogously,tobeusedinthetraphandlersrunninginSupervisororUserMode.

Somecausevaluescannotshowupinthecauseregisters.Forexample,aSupervisorModetraphandlercannoteverbeconfrontedwithaMachineModetimerinterrupt.

Questions:Butcan’tanyinterruptbedelegated;woulditstillremainaMachineModetimerinterrupt,orwoulditbealteredtoaSupervisorModetimerinterrupt???

Thereisno“Loadaddressmisaligned”codeforthescauseregister,implyingthatsuchaexceptioncannotbehandledinSupervisorMode.Whynot???

Also,theof2icialdocumentationdoesnotincludeacausecodefor“EnvironmentcallfromS-Mode”asapossiblevalueinscause;isthisamistakeinTable4.2???

scounteren–CounterEnableRegisterforUserModemcounteren–CounterEnableRegisterforSupervisororUserMode

Coderunninginalowerprivilegelevel(suchasUserMode)mayperiodicallytrytoreadthevarioustimerandcounters.Coderunningatahigherlevel(suchasanOSorhypervisor)maywishtointerceptallaccessestocountersandtimersinordertofoolthelowerprivilegecode.Perhapsthehypervisorwantstopresenttheillusiontothelowerlevelcodethatitisrunningonabaremachinewhen,infact,itisnot.Thiscouldalsobeusefulforviruseswhichwanttohidetheirexistence.

TheseCSRregisterareusedtocontrolaccesstothefollowingcounter/timesCSRs:cycle,time,instret,hpmcounter3,hpmcounter4,…hpmcounter31.



ThepurposeofthescounterenregisteristoallowSupervisorModecodetodeterminewhetheraccessestooneofthecounter/timerregistersbycoderunninginUserModeshouldcauseanexceptionornot.

ThepurposeofthemcounterenregisteristoallowMachineModecodetodeterminewhetheraccessestooneofthecounter/timerregistersbycoderunninginUserModeorSupervisorModeshouldcauseanexceptionornot.

Thereisonebitforeachofthe32counters/timers.0=disabled,causeanexception;1=enabled,noexception.

Emulationoftime/mtimeAccesses:AnotheruseofthisCSRisasfollows:RecallthattherealtimeclockmtimeisnotactuallyaCSR,butisexpectedtobeimplementedasamemory-mappedI/Odevice.HoweverthereisatrueCSRnamedtime,whichismeanttobeamirrorofmtime.BytrappingallreadstothetimeCSR,MachineModecodecanperformedtheI/OtoretrievethetimefrommtimeandproperlyemulatethereadtothetimeCSR.

SystemCallandReturnInstructions

EnvironmentCall(SystemCall)

GeneralForm:ECALL

Example:ECALL # no operands

Comment:Thisinstructionisusedtoinvokeatraphandler.Thisinstructioncausesoneofthefollowingexceptions,dependingonthecurrentprivilegelevel:

EnvironmentCallfromUserModeEnvironmentCallfromSupervisorModeEnvironmentCallfromMachineMode

Anyandallargumentsandreturnedvaluesmustbepassedinregisters.Encoding: ThisisavariantofanI-typeinstruction.



Break(InvokeDebugger)

GeneralForm:EBREAK

Example:EBREAK # no operands

Comment:Thisinstructionisusedtoinvokeadebugger,bycausingan“Breakpoint”exception.Typicallythedebuggingsoftwarewillinsertthisinstructionatvariousplacesintheapplicationcodesequence,inordertogaincontrolfromanexecutingprogram.

Encoding: ThisisavariantofanI-typeinstruction.

MRET:MachineModeTrapHandlerReturn

GeneralForm:MRET

Example:MRET # no operands

Comment:ThisinstructionisusedtoreturnfromatraphandlerthatisexecutinginMachineMode.TheMPP2ieldofthestatusregisterwillbeconsultedtodeterminewhichmodetoreturnto(eitherm,s,oru).ThevalueintheMPIE2ieldwillbecopiedtoMIE,whichwillrestorethe“interruptsenabled”statetowhatitwaswhenthehandlerwasinvoked.ThereturnwillbeeffectedbycopyingthesavedprogramcounterfrommepctotheProgramCounter(pc).

ThisinstructionmayonlybeexecutedwhenrunninginMachineMode.Encoding: ThisisavariantofanI-typeinstruction.

SRET:SupervisorModeTrapHandlerReturn

GeneralForm:SRET



Example:SRET # no operands

Comment:ThisinstructionisnormallyusedtoreturnfromatraphandlerthatisexecutinginSupervisorMode.TheSPP2ieldofthestatusregisterwillbeconsultedtodeterminewhichmodetoreturnto(eithersoru).ThevalueintheSPIE2ieldwillbecopiedtoSIE,whichwillrestorethe“interruptsenabled”statetowhatitwaswhenthehandlerwasinvoked.ThereturnwillbeeffectedbycopyingthesavedprogramcounterfromsepctotheProgramCounter(pc).

ThisinstructionmaybeexecutedwhenrunningineitherSupervisorModeorMachineMode.


URET:UserModeTrapHandlerReturn

GeneralForm:URET

Example:URET # no operands

Comment:ThisinstructionisnormallyusedtoreturnfromatraphandlerthatisexecutinginUserMode.UserModetraphandlersalwaysreturntoUserModecode.ThevalueintheUPIE2ieldwillbecopiedtoUIE,whichwillrestorethe“interruptsenabled”statetowhatitwaswhenthehandlerwasinvoked.ThereturnwillbeeffectedbycopyingthesavedprogramcounterfromuepctotheProgramCounter(pc).

Thisinstructionmaybeexecutedinanymode.Encoding: ThisisavariantofanI-typeinstruction.

WFI:WaitForInterrupt

GeneralForm:WFI



Example:WFI # no operands

Comment:Thisinstructioncausestheprocessortosuspendinstructionexecution.Whenanasynchronousinterruptoccurs,theprocessorwillwakeupandresumeexecution.Thetraphandlerwillbeinvokedand,uponreturntothecodesequencecontainingtheWFIinstruction,thenextinstructionfollowingtheWFIwillbeexecuted.

Typically,softwarewillexecutethisinstructionwithinan“idle”loop,whichisonlyexecutedwhenthereisnothingtobedone.Suspendinginstructionexecutionmaysavepower.Inasystemwithmultiplecores,thisinstructionmayalsoprovideahinttotheinterruptcircuitrytoroutefutureinterruptstothiscore,sinceitisidling.

Thisinstructionmaybeimplementedasanop.Forsimplerprocessors,itmaybebettertojustspininanidleloop.Complexsystemsmayhaveaninexhaustiblesupplyofbackgroundprocesses,makingalow-powerwait-statepointless.

Ifinterruptsare(globally)disabledwhenaWFIinstructionisexecuted,thenthisdisablingisignored.Instructionexecutionwillresumewhenaninterruptoccurs.Seethespecfordetails.



Chapter9:VirtualMemory

PhysicalMemoryAttributes(PMAs)

Mainmemoryisthelargerangeofbyte-addressablelocations.Eachlocationhasa“physicaladdress”.

Insystemswithoutvirtualmemorymapping,eachandeveryaddressgeneratedbyaninstructionisaphysicalmemoryaddress.However,manysystemswillsupportvirtualmemorymapping,wherethereisadistinctionbetweenvirtualaddressesandphysicaladdresses.Aninstruction(e.g.,inaREADorWRITEinstruction)willgenerateavirtualaddressandthememorymanagementunit(MMU)willtranslatethisvirtualaddressintoaphysicaladdress,whichisthenusedbythehardwaretosendorretrievedatato/fromthemainmemorystore.

Thephysicaladdressspacecanbepopulatedby:

•Normalmemory •Memory-mappedI/Odevices •Empty(i.e.,unpopulatedholes)

I/Odevicesarecommonly“memorymapped”,whichmeanstheysitonthesamebusasnormalmemory.Thedeviceisassignedaspeci2icphysicaladdress(orsetofaddresses).SoftwarecansenddatatoanI/Odeviceby“writing”toamemorylocationpopulatedbythedeviceandcanretrievedatafromthedeviceby“reading”fromsuchanaddress.Fromthepointofviewoftheexecutingsoftware,thedevicelooksabitlikeanyothermemorylocation,butreadingandwritingtomemory-mappedI/Olocationsisdone,nottostoredata,butforthepurposeofsendingdatato,receivingdatafrom,andotherwisecontrollinganI/Odevice.

Tohandledvariousscenarios,thephysicaladdressspacewillbepartitionedintoregions.Theregionswillbecontiguous,one-after-the-other,non-overlappingandwillcovertheentirephysicaladdressspace.Everybyteinthephysicaladdressspacewillbeinexactlyonephysicalmemoryregion.



Eachregioncanbecharacterizedbydifferentattributes.Forexample:

Istheregionpopulatedwithmainmemory?Istheregionpopulatedwithmemory-mappedI/Oregisters?

Istheregion(i.e.,unpopulated,empty)? Doestheregionsupportreads(i.e.,datafetches)? Doestheregionsupportwrites? Doestheregionsupportexecution(i.e.,instructionfetches)? Doestheregionsupportatomicoperations? Istheregioncached? Istheregionsharedwithothercores,orprivate?

Collectively,theseare“physicalmemoryattributes”(PMAs).Someregionswillhavetheirattributes2ixedandunchangeable,otherregionswillbecon2igurableuponpower-up(i.e.,cold-pluggable),andotherregionsmighthavetheirattributeschangeduringsystemoperations(i.e.,hot-pluggable).

Memoryattributesmustcheckedconstantlyduringexecution.Forexample,aREADfromanunpopulatedmemoryregionoughttocauseanexception.

WhereshallthePMAinformationbestored?Oneapproachistostoretheinformationinthevirtualmemorytables,andrequirethememorymanagementunit(MMU)tocheckandenforcetheconstraints.Theremaybeproblemswhenthepagesizeofthememorymanagementunitdoesn’tmatchthesizeoftheregions.

IntheRISC-Vdesign,thereisaseparatesubsysteminthecorewhichisconcernedwithcheckingandenforcingthephysicalmemoryattributes.Thissubsystemiscalledthe“PMAchecker”.

Whenaninstructionattemptstoaccessmemory,theaddressis2irsttranslatedfromvirtualtophysicaladdress(ifvirtualmemoryisturnedon).Then,thePMAcheckerwillverifythatthetypeofaccessislegal.Ifnot,anexceptionwillbegenerated.Thereareseveralexceptiontypesforphysicalmemoryattribute(PMA)violationsandthesearedifferentfromthevirtualmemoryfaultexceptions.

TheinformationaboutthevariousmemoryregionsandtheirassociatedPMAswillbekeptinprocessorregisters.Theseregisters(orpartsofthem)maybehardwiredforsomeregions(whichisappropriatewhentheregionitselfisprecon2iguredandnotmodi2iable,suchasanon-chipROM),ormaybereadable(forqueryingthebus



todeterminewhatregionsarepresentandtheirPMAs),ormaybewritable(allowingthesoftwaretoestablishandmandatePMAs).Detailsareimplementationdependent.

ConcerningtheatomicoperationsofRISC-V,itisimportanttodeterminewhetheraparticularregioncansupportaparticularinstruction.

RegionspopulatedbymainmemorymustsupportalltheAMOinstructions(AtomicMemoryOperations)andLR/SCinstructions(assumingtheseinstructionsareevenimplemented,ofcourse).

RegionspopulatedbymemorymappedI/OdevicesarenotexpectedtosupporttheLR/SCinstructions.

Memory-mappedregionsmayormaynotsupporttheAMOoperations.IfsucharegionsupportsanyAMOinstructions,thenitmustsupportAMOSWAPataminimum.TheregionmayalsooptionallysupportotherAMOinstructions.HerearethelevelsofAMOsupportthatmaybeimplemented:

•NoAMOoperationssupported •OnlyAMOSWAPsupported •AMOSWAP,AMOAND,AMOOR,AMOXOR •AllAMOoperationsaresupported

ConcerningtheFENCEinstructions,recallthataccessesareclassi2iedasbeing“memory”or“I/O”.Tosupportthis,everyregionofthephysicalmemoryaddressspacemustbeclassi2iedaseither:

•mainmemory •I/O

Itispossiblethattwodifferentcores(orotherdevicessittingonthephysicalmemorybus)mayaccessthesamememorylocationsmore-or-lesssimultaneously.Accessestomainmemoryregionsare“relaxed”,whichmeansthattheexactorderingoftheoperationsisindeterminate(unlessatomicinstructionsareused,ofcourse).Theprogrammercanuseatomicinstructionstoforceparticularorderings,butfornormalinstructions,therearenoguaranteesaboutordering.Asinglecorewillseeitsownoperationsexecutedintheordertheywereissued,butreadsandwritesfromothercoresordevicescanappearatanytime,inanyorder.



Accesstomemory-mappedregionsmayeitherbe“relaxed”or“strong”.Iftheregionisde2inedtohaverelaxedordering,thentheorderingofoperationsisunde2ined,asjustdescribed.Iftheregionhasstrongordering,thenitwillappeartoonecore(A)thattheoperationsexecutedbysomeothercore(B)wereallexecutedintheordertheywereactuallyissuedbycore(B).

Thissummaryisasimpli2icationoftheRISC-Vspec.Thespecisdesignedtoapplytoaspectrumofdifferentkindsofsystems,fromsimpletocomplex.Moderncomplexsystemsmayhavemultipleparallelchannelsbetweencores,devices,andmainmemorystores.ImaginethatallthedevicesareconnectedbyanetworkwiththepropertiesoftheInternet:messagesaresentbetweendevices,cantakedifferentpaths,andcanbearbitrarilyreorderedintransit.Ontheotherhand,olderandsimplersystemshaveasingle,simplebusthatdoesonethingatatimeandreorderingisinconceivable;thesesystemsmayusedevicedriversthatweredesignedwithoutconcernforreorderingandonegoalistosupportsucharchitecturesandlegacyprogramming.

CachePMAs

qqqqq

PMAEnforcement

TheenforcementofPhysicalMemoryAttributes(PMAs)isoptional.

ARISC-VcoremayoptionallyprovideaPhysicalMemoryProtection(PMP)unit.PhysicalmemoryprotectionisimplementedusingasmallnumberofMachineModeCSRsthatdescribethememoryregionsandtheirPMAs.

TheCSRregistersencodethefollowinginformationforeachregion:

•Thestartingaddressoftheregion •Whethertheregionis4,8,16,32,…bytesinlength •DoestheregionsupportREADoperations? •DoestheregionsupportWRITEoperations? •DoesheregionsupportEXECUTIONfetches?



•Isthisregion“locked”?

Upto16regionscanbedescribedintheregisters.Thisinformationiscleverlyencodedin16registersplusacoupleofadditionalregisters.

ThePMPUnitisactivewhenrunninginSupervisororUserMode;allaccessesgeneratedarechecked.TheprotectionmayormaynotbeenforcedwhenrunninginMachineMode.

SincetheseregistersareMachineModeCSRs,theycanonlybeaccessedinMachineMode.

Thereisasingle“Lock”bitforeachoneofthe16regionsanditworksasfollows.Ifthelockbitis0,thenMachineModeaccessesarenotchecked;onlySupervisorandUserModeaccessesarechecked.

IftheLockbitisset,thenallMachineModebitsarecheckedaswell.

OnceaLockbitisset,itcannotbecleared.Alllockbitsareinitiallyclear;toclearalockbitrequiresthesystemtobefullyreset(e.g.,POWER-ON-RESET).Inotherwords,oncearegionbecomelocked,itstayslockedforever.Furthermore,allaccesses,regardlessofprivilegemode,willbechecked.

ThislockingschememightbeusefulforsecurebootstrappingcodewhichloadsOScodeintomemoryandthenlocksdowntheregioncontainingtheOS,topreventmalwarefromsubsequentlycorruptingtheOScode.

TheRISC-VspecsuggeststhattheCSRsthatdescribethememoryregionsmightgetswappedinandoutduringcontextswitches.

Forexample,oneprocess(adevicedriver)mightbeallowedtoaccessagivenmemoryregionwhileanotherprocess(auserthread)oughttobeforbiddenfromaccessingthesameregion.Asanotherexample,imagineahypervisorrunninginMachineModethatishostingtwodistinctoperatingsystems,whicharerunninginSupervisorMode.Eachoperatingsystemwillbegivenarangeofavailablemainmemorytouse.ToensurethateachOSstaysoutofthememoryregionusedbytheotherOS,theMachineModesoftwarewillusethePMPunittoenforceallaccesses.WhentimeslicingbetweentwoOSes,thehypervisorcoderunninginMachineModewillneedtoswaponesetofPMPregisterswiththeotherset.



Todescribeeachmemoryregion,wemustsomehowspecifyastartingaddressandalength.Eachmemoryregionhascertainprotectionattributesassociatedwithitandthesemustalsobegiven.

Recallthattherecanbeupto16differentmemoryregions.Thereisessentiallyatablewith16entries,whichcandescribeupto16differentmemoryregions.

Theinformationabouteachregion(i.e.,eachtableentry)isencodedintoasinglefull-length“addressregister”(e.g.,32bits)andanadditional“con2igurationbyte”(8bits).Thelengthoftheregioniscleverlyencodedwithinthesebits,asdescribednext.

Thenamesoftheaddressregistersareasfollows.EachisafullCSR,e.g.,32bitsforRV32machines,andlongerforRV64.

pmpaddr0 pmpaddr1 … pmpaddr15

Thenamesofthecon2igurationbytesaregivennext.Eachisan8bitquantity.

pmp0cfg pmp1cfg … pmp15cfg

Thecon2igurationbytesarepackedinto4registers,withfourcon2igurationbytesperregister.(…atleastfortheRV32machines.For64-bitmachines,only2CSRsareneeded,since8con2igurationbytescanbepackedintoeachregister.)

Eachregionhasacon2igurationbyte,whereinthe8bitsareusedasfollows:

Numberofbits Meaning1 Regionisreadable1 Regioniswritable1 Regionisexecutable1 Regionislocked2 “A”2ield2 unused



The“A”2ieldhasthesepossiblevalues:

Value Meaning00 Thisentryisnotused01 Top-of-rangeaddressencoding10 Regionis4byteslong11 Regionis>4byteslong

Notallofthe16regionsneedbepopulated;the00valueofthe“A”2ieldindicatesanunusedentryintheregiontable.

Eachregionmusthavealengththatisapowerof2.Furthermore,thelengthmustbe4orlarger;thatis:4,8,16,32,64,…Theregionmustbenaturallyaligned,givenitslength.Forexample,aregionofsize1,024muststartonanaddressthatisevenlydivisibleby1,024.

Iftheregionhassize4,the“A”2ieldwillbe01andtheaddress2ieldwillcontainthestartingaddressoftheregion.

Iftheregionhasalargersize(suchas8,16,32,64,…),the“A”2ieldwillbe11andtheaddressregisterwillcontainboththestartingaddressandlength.Forexample,consideraregionwithsize1,024.Howisthisencodedintheaddressregister?Theregionmustbenaturallyalignedsothestartingaddresswilllooklikethis,inbinary:

XXXXXXXXXXXXXXXXXXXXXX0000000000

Thelast10bitsofthestartingaddresswillalwaysbe0.Thelengthoftheregion(1,024inthisexample)willbeencodedinthesezerobits,asfollows:

XXXXXXXXXXXXXXXXXXXXXX0111111111

Thehardware,bylookingatthevalueintheaddressregisterandbylocatingtherightmost0bit,candirectlyinferthesizeoftheregion.

Okay,it’smorecomplexthatthis.For32bitmachines,theregiontablecanactuallybeusedtodescriberegionsanywhereina34bitaddressspace.Theextra2bitsofaddressareachievedasfollows.



Foraregionofsize4bytes,thelast2bitsmustbe0,sotheyarenotrepresented.All32bitsintheaddressregisterareusedforbits[33:2]oftheaddress,withbits[1:0]implicitlybeing00.

Foraregionofsize8bytes,theaddressregisterwillcontain:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX0

Herethereare31signi2icantbits.Addingintheimplicitbits000(sincetheregionis8-bytesinsize),gives34bitsofstartingaddress.

Foraregionofsize16bytes,theaddressregisterwillcontain:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX01

Herethereare30signi2icantbits.Addingintheimplicitbits0000(sincetheregionis16-bytesinsize),gives34bitsofstartingaddress.

Andsoonforlargerregions.ForRV64,thissameshiftingby2bitsalsooccurs.Allphysicaladdressesarelimitedto56bits,soonlythelowerorder54bitsintheaddressregistersareused.

Thereisa2inalwayofdescribingaregion’sstartingpointandsize.Ifthe“A”2ieldis01(“top-of-range”)thentheaddressregisterisinterpreteddifferently.Thisencodingschemecanbeusedwhentheregionsarecontiguousandeachregionfollowsdirectlyafterthepreviousregion.Inthisencoding,theaddressregisterdoesnotspecifythestartoftheregion,buttheendingaddressoftheregion.(Actually,itisonepastthelastaddress.)Withthiscodeforthe“A”2ield,thestartingaddressisgivenbythepreviousaddressregister(oraddress0forthe2irsttableentry).

Whenthe“top-of-range”encodingisused,theshiftingby2bitsstilloccurs.Thus,allphysicaladdressesare34bitsandeachregionmustbeginona4bytealignedaddress.

Thereasonthat34-bitand56-bitphysicaladdressesarespeci2iedisthatthepagingsystemsusethesesizes.Inparticular,theSv32pagingschemeuses34-bitphysicaladdressesandtheSv39andSv48pagingschemesuse56-bitphysicaladdresses.

Wheneveranaccesstophysicalmemoryisattempted,theprocessorwill2irstconsulttheprotectiontable.Thelowestnumberedentrythatmatchestheaddress



willbeusedtodetermineiftheaccessisallowed.Iftheaccessisnotallowed,thenasynchronousexceptionwillberaised(i.e.,“loadaccessexception”,“storeaccessexception”,or“instructionaccessexception”).Seethespecforcompletedetailsonwhichentryisselected.

VirtualMemory

VirtualMemory(i.e.,pagetablesandvirtual-to-physicaladdresstranslation)ishandledinSupervisorMode.

ThereareseveralvirtualmemoryschemesdescribedintheRISC-Vspec,called“bare”,“Sv32”,“Sv39”,and“Sv48”.The“bare”optionmeansnovirtualaddresstranslationoccurs.

A32bitmachine(RV32)maysupporta2levelpagetable(butnot3levelsor4levels).A64bitmachine(RV64)maysupporta3leveltableora4leveltable(butnota2leveltable).

PageTable Virtual Physical Depth AddressSpace AddressSpace

RV32: Bare Sv32 2levels 32bits,4GiBytes 34bits,16GiBytes

RV64: Bare Sv39 3levels 39bits,512GiBytes 56bits,64PiBytesSv48 4levels 48bits,256TiBytes 56bits,64PiBytesSv57 …tobedeZinedlater… Sv64 …tobedeZinedlater…

Thesizeofthephysicaladdressesisimplementationde2ined.Thenumbersgivenabovearethemaximums,butinaparticularimplementationthephysicaladdressspacewilloftenbesmaller.

Regardlessofwhichvirtualmemoryschemeisused,thepagesizeis4,096(4KiBytes).Thus,byteoffsetsintoapagerequire12bits,since212=4,096.Allpagesmustbealignedonapageboundary,sothelower12bitsofanypageaddressarealways000000000000.



WhatisaPageTable?Apagetableisatreewhereeverynodeisapage.Thesepagesareeachkeptinmainmemoryandaddressedwithphysicaladdresses.

Thepagesattheleafsarecalled“datapages”andcontaintheactualbytesinthevirtualaddressspace.

Theinteriornodesconstitutethepagetableitselfandallowthedatapagestobelocated.Therootpageplustheinteriornodesconstitutethepagetableandarenotvisibletotheuserprocess.Thedatapagesconstitutethevirtualaddressspace(oratleastthepartofitthatispopulated)andcontainthedatabytesthattheuserprocesswillaccess.

Eachinteriorpagecontainsanarrayof“PageTableEntries”(PTEs).EachPTEpointstoachildnode.SomePTEsareconsider“leafPTEs”andpointdirectlytoadatapage.OtherPTEspointtolowerlevelsinthepagetabletree.

IntheSv32addressingscheme,thepagetableisonly2levelsdeep.PageTableEntriesare4byteslong.Therefore,wehave:

1rootpage,containing1,024PTEs 1,024interiorpages,with1,024PTEseach 1,024×1,024datapages,of4,096byteseach

Thus,avirtualaddressspacecontains4GiBytes.

Theterm“pagetable”isoftenusedimpreciselytomeaneitherasingle4,096byteinteriornodeinthetree,ortheentirecollectionofinteriorpagesmakingupthewholetree,notincludingthedatapages.

Thesatp(SupervisorAddressandTranslationRegister)istheCSRthatcontrolsaddresstranslation.

Thesatpregistercontainsthree2ields:

MODE Isaddresstranslationturnedonornot? ASID AddressSpaceIdenti2ier PPN PhysicalPageNumber,i.e.,addressofpagetable’srootpage

ForRV32systems,thesatpregisteris32bits,arrangedasfollows:



�

ForRV64systems,thesatpregisteris64bits,arrangedasfollows:

�

TheMODE2ieldindicateswhethervirtualmemoryaddresstranslationisturnedonornotand,ifso,whichpagingtableschemeisbeingused.

ForRV32systems,theMODE2ieldcanhavethesevalues:

0 addresstranslationturnedoff(i.e.,“bare”mode)1 Sv32addresstranslationisturnedon

ForRV64systems,themode2ieldcanhavethesevalues:

0000 addresstranslationturnedoff(i.e.,“bare”mode)1000 Sv39addresstranslationisturnedon1001 Sv48addresstranslationisturnedonother unused/reservedforSv57,Sv64

The“bare”modeindicatesthatthereisnovirtual-to-physicaltranslation.Alladdressesarephysical.Regardlessofmode,theenforcementofphysicalmemoryattributes(i.e.,usingthe“pmp…”registers,describedelsewhere)isstilloperative.



ThePhysicalPageNumber(PPN)2ieldcontainsthephysicaladdressoftherootpageofthepagetabletree.

Sinceallpagesmustbealignedona4,096byteboundary,thelast12bitsoftheaddressofanypagemustbe0.ForRV32,thePhysicalPageNumber(PPN)2ieldis22bitswide,whichissuf2icienttoaddressanypageina34bitphysicaladdressspace.ForRV64,thePPN2ieldis44bitswide,whichissuf2icienttoaddressanypageina56bitphysicaladdressspace.

Eachuserprocesswillpresumablyruninadistinctaddressspace.EachaddressspaceisgivenanAddressSpaceIdenti2ier(ASID).ForRV32machines,theASIDis9bits(allowingfor29=512differentaddressspaces).ForRV64machines,theASIDis16bits(allowingfor216=65,536differentaddressspaces).

ThesatpregistercontainstheASIDofthecurrentlyexecutingprocess.

OverviewofTLBs:Tomakeaddresstranslationfastenoughforvirtualmemorytobefeasible,pagetableentriesmustbecachedinan“addresstranslationcache”.Suchacacheistraditionallycalleda“TranslationLookasideBuffer”,andsoisabbreviatedas“TLB”.

TheTLBwillcontainasmallnumberofpagetableentries.WhenaFETCH,LOADorSTOREtomemoryoccurs,theaddresstranslationhardwarewill2irstchecktheaddresstranslationcache(TLB).IftheTLBcontainsamatchingentry,theaddresstranslationhardwarewilluseittogenerateaphysicaladdressimmediately,whichismuchfasterbecauseweavoidgoingtomainmemorytoreadthepagetabletreetolocatethedatapage.

TheTLBwhichisorganizedasanassociativememory,keyedonvirtualpagenumber.IfaTLBentryispresent,thentheentrywillsupplythephysicalpagenumber.Iftheentryisabsent,thentheaddresstranslationprocessmustgotomemorytofetchtheappropriateentryfromthepagetabletree.SomeTLBentrywillbeevictedandthenewentrywillbeplacedintheTLB.Theaddresstranslationwillthenproceed.

UpdatingtheTLB,whichincludeswalkingthein-memorypagetabletreeto2indtheappropriateentryandwritingtheevictedentrybacktomemory,isacomplexandtime-consumingtask.Forfasterperformance,TLBupdatingandpage-tablewalkingareperformedinhardwareonmanysystems.However,insimpler



implementations,walkingthepagetableandupdatingtheTLBareperformedbysoftware.

Thewaysoftwareupdatingworksisasfollows:WhenthereisnovalidentryintheTLBduringanaddresstranslation,anexceptionwillbesignaled.Thehardwareneverattemptstowalkthein-memorypagetables.Theexceptionishandledbyatraphandler(whichmayberunninginMachineMode).Thetraphandlerwillessentiallyemulatethemissinghardwareinsoftwareandwillwalkthein-memorypagetableandupdateoftheTLBdirectly.Howthisoccursisasoftwareissue,notcoveredintheRISC-Vspec.

Fromtimetotimecontextswitcheswilloccurandthecurrentlyexecutingprocesswillchange.Sinceeachprocesscanoccupyadifferentaddressspace,theentriesintheTLBmaynotbeappropriateforthenewlyexecutingprocess.ThepurposeoftheAddressSpaceIDistomakesurethataprocessonlyusespagetableentriesthatapplytoit.EachentryintheTLBwillbetaggedwithanASID;onlyentriesthatmatchthecurrentlyexecutingprocess’sASID(asstoredinthesatpregister)willbeused.

ThespecdoesnotrequireASIDstobeimplementedand,iftheyareimplemented,thefullrangeofvaluesneednotbesupported.

Thespecmentionsthatwritingtothesatpregistermayrequirecarefulconcurrencycontrol.ItcouldbethataddresstranslationsusingolderpagetablesoperateconcurrentlyandanupdatetosatpmayrequiretheSFENCE.VMAinstructioninordertoensurecorrectnessofcode.

Theideaisthat,aslongastherearenottoomanydistinctaddressspacesandtheASID2ieldislargeenough,thentheASIDmechanismwillpreventoneprocessfromusingpagetableentriesfromthewrongaddressspace.Butwithalargernumberofaddressspaces,itmaybenecessaryto2lushtheaddress-translationcache,i.e.,to2lushtheTLB.Wemayalsoneedto2lushtheTLBwhenchangesaremadetoarunningprocess’saddressspace,i.e.,toapagetablethatiscurrentlyinuse.Forexample,ifweremoveadatapagefromtheaddressspace,itisnotsuf2icienttosimplyinvalidatetherelevantPageTableEntryinthepagetablenodestoredinmemory.WealsoneedtoensurethattherearenoentriesstillsittingintheTLB.

Comment:NotethatbystoringtheMode,ASID,andPhysicalPageNumberoftherootofthepagetableinasingleregister,itallowsallthreetobeupdatedatomicallywithasingleCSRinstruction.



SFENCE.VMA:SupervisorFenceforVirtualMemory

GeneralForm:SFENCE.VMA vaddr,asid

Example:SFENCE.VMA x5,x8

Comment:Thisinstructionisusedtoimposeorderonaccessestomemoryandupdatestopagetables.Inparticular,itisdesignedto2lushtheaddresstranslationcache(TLB).

WhenevervirtualmemoryisturnedonandaFETCH,LOADorSTOREtomemoryismade,addresstranslationmustoccursotherewillalsobeimplicitREADsandWRITEstothepagetablesstoredinmemory.Normally,theaccesstothepagetableinmemorycanbeavoidedandtheaccesswillbemadetotheaddresstranslation(TLB)cacheinstead.Butwheneverchangestothepagetablesinmemoryaremade,weneedto2lushobsoleteentriesintheTLB.Thisisthepurposeofthisinstruction.

WhentheOSupdatesapagetablenode,itwillissueaWRITEtomemory.InordertomakesurethataWRITEtothepagetableisseenbeforeallsubsequentmemoryaccesses,thisinstructionmustbeexecuted.ItensuresthatallWRITEstomemorythatoccurearlierintheinstructionstreamwillbecompletedandvisiblebeforeanyimplicitREADsorWRITEStothepagetabletriggeredbysubsequentoperationsintheinstructionstream.SincetheTLBiscachingpagetableentries,thisinstructionmustalsoinvalidateaffectedentries,forcingthememorymanagementunittogobacktomemorytoretrievecurrentpagetableentries.

[ItwouldseemnecessarytoalsoforceallearlieraddresstranslationstocompletebeforetheWRITEtothepagetablesoccurs,butthespecdoesnotmentionthisorderingconstraint.???]

Toimposea2inerlevelofgranularity,thisinstructioncanspecifyaspeci2icvirtualaddressand/oraspeci2icaddressspace.Reg1containsavirtualaddress;Reg2containsanAddressSpaceID.

IftheASIDisspeci2iedasx0,thenentriesforalladdressspacesareaffected;otherwise,onlyentriesforthegivenaddressspaceareaffected.



Ifthevirtualaddressisspeci2iedasx0,thenalllevelsofthepagetableareaffected;otherwise,onlytherelevantleafPTEinthepagetableisaffected.

IfboththeASIDandthevirtualaddressaregivenasx0,thenitcausesacomplete2lushoftheTLB.SimpleimplementationsarefreetoalwaysignoretheASIDandvirtualaddressspaceandperformaglobalTLB2lushwheneverthisinstructionisexecuted.

Itispossiblethatseveralcoresaresharingmemoryandthattwothreadsrunningondifferentcoresaresharingasingleaddressspaceandasinglepagetabledatastructureinmemory.Ideally,anyupdatetothispagetableshouldbeseenbyallthreadsonallcores.However,thisinstructiononlyaffectsasinglecore.Tosynchronizewithotherthreadsonothercores,theOSmustuseotherinstructions.

Encoding: ThisisavariantofanR-typeinstruction,wheretheRegD2ieldisunused.

Sv32(Two-LevelPageTables)

IntheSv32addresstranslationmode,avirtualaddressis32bits.Thisispartitionedintoa20-bitvirtualpagenumber(VPN)anda12bitoffset.The20-bitVirtualPageNumberistranslatedbythememorymanagementunitintoa22-bitPhysicalPageNumber.ThePhysicalPageNumberisthenconcatenatedwiththeoffsettogivethe34-bitphysicaladdress.

�



Thesizeofeachpageis4,096bytes,andeachbytewithinapagecanbeaddressedby12bits,since212=4,096.Allpagesmustbeproperlyalignedona4,096byteboundary.

EachPageTableEntry(PTE)is4bytes.Thus,eachpagecanhold1,024entries.Withinapage,eachentrycanbeaddressedbya10-bitaddress,since210=1,024.

Thepagetableisorganizedasa2-leveltree,whereallnodesareinterior(i.e.,non-data)pages.Therootpageisatthetopofthetreeandcontains1,024pagetableentries.Eachentrycontainsapointertoasecondlevelpage.Thesecondlevelpagescontainpagetableentrieswhichpointtothedatapages.Thesecondlevelpagesarereferredtoas“leaf”pages;thedatapagescanbeconsideredtoliebelowtheleafpages.

APageTableEntry(PTE)is4bytesandcontainsthefollowing2ields:

FieldSize Meaning 22 PhysicalPageNumber 2 RSW(Unde2ined;ReservedforSoftware) 1 D(Dirty) 1 A(Accessed) 1 G(Globalmapping) 1 U(Useraccessible) 1 V(Valid) 3 XWR(Executable,Writable,Readable)

�

TheVirtualPageNumber,whichis20bits,isdividedintotwo2ieldsof10bitseach,calledVPN[1]andVPN[0].



Theuppermost10bits(VPN[1])isusedtoselectanentryintherootpage.ThisentrycontainsaPhysicalPageNumber(PPN)2ield,whichaddressesasecondlevelpage.

Thesecond10bitsoftheVirtualPageNumber(VPN[0])isusedtoselectanentryinthesecondlevelpage.Thisyieldsthe2inalpagetableentry(calleda“leaf”pagetableentry),whichcontainsthePhysicalPageNumberofthepagecontainingtheactualdata.

Thisdiagramshowsthis.

�

Thereisasinglerootpageandupto1,024secondlevelpages.Eachpageinthepagetable(thatis,therootpageandallsecondlevelpages)contains1,024PageTableEntries.Theentriesatthebottomofthetreearecall“leaf”entriesandpointtothedatapages.Leafentriesaredistinguishedfromnon-leafentriesbytheXWR(Execute-Write-Read)permissionbits.IfXWR=000,thentheentryisnotaleafentry.



�

Thereisafacilityfor“megapages”.Amegapageisadatapagecontaining4MiBytes.Anoffsetintoamegapagewillbe22bits,since222=4×1,024×1,024=4,194,304.Amegapagemustbealignedtoa4MiByteboundary.

TheXWRbitsinapagetableentryindicatewhetherthatentryisaleafentry(meaningthattheentrypointstoadatapage)oranon-leafentry(meaningthattheentrypointstoanotherpageinthepagetabletree).IftheXWRbitsare000,thentheentryisanon-leafentry,pointingtothenextlevelinthepagetabletree.IftheXWRbitsarenot000,thentheyindicatethattheentrypointstoadatapageandtheygivethe“execute”,“read”,and“write”permissionsforthedatapage.

Inthecaseofmegapages,thereisnosecondlevelpageinthetree.Instead,thePageTableEntryintherootpagepointsdirectlytoadatapage,i.e.,amegapage.



�

Thereareacoupleofbene2itsofplacingmegapagesinavirtualaddressspacethatmayalsocontainnormal4,096bytedatapages.

First,wheneverdatafromapageis2irstreferenced,theMemoryManagementUnitmustwalkthepagetableto2indtheappropriateleafpagetableentry.Withmegapages,onlytherootpagetablemustbeconsulted,sincethereisnosecondlevelpageinvolved.Thisreducesthenumberofmemoryaccessesbyone.(ThiscouldmeansubstantialsavingsifthePTEneedstobere-loadedfrequently,duetoTLBcontentionand/orneedstobeupdatedinmemoryuponTLB2lushing).

Thesecondadvantageisthatasingleentryintheaddresstranslationcache(theTLB)willcover4MiBytes,insteadof4KiBytes.ThismeansthatfewerentriesintheTLBwillbeneededforeachaddressspace.TheTLBisasigni2icantbottleneckandreducingcontentioniscritical.

[Forexample,asimpleaddressspacemayhaveoneblockofmemorymarked“read/execute”(fortheprogramandconstants)andasecondblockmarked“read/write”(forstaticvariablesandthestack).Sinceneitherofthesewillnormallyexceed4MiBytes,only2entriesintheTLBwouldbeneededfortheentireaddressspace.TLBentriesarepreciousandfewinnumber,sothisreducespressureontheTLB.]

Howevermegapagescomewithdrawbacks.First,theywilloftenbelargerthannecessary.(Thisiscalledthe“internalfragmentationproblem”.)Theextraspaceinthemegapagesrepresentsrealphysicalmemoryand,ifnotneededbytheprocess,thismemoryiseffectivelywasted.Forexample,ifanaddressspaceonlyrequires64



KiBytes,butisallocatedtwomegapages,thenapproximately99%ofthememoryiswasted,since222–216≈222.Ifconcurrentprocessesaredoinglotsofcomplexdatasharingbysharingvirtualmemorypagesthisfragmentationproblemcanbecomeworse.

Second,theOSkernelmustbeabletoallocatelargeblocksofcontiguousmemory.Thisisnotaproblemifthememoryisdividedintoacollectionofequalsizedpages.Butiftherearemultiplepagesizes(e.g.,1,024and4,194,304)thenthingsgettrickier.Afterall,thisisessentiallytheproblemthatvirtualmemorywasdesignedtoaddress.(Thisiscalledthe“externalfragmentationproblem”.)

Next,welookatthemeaningofthebitsinthePageTableEntry(PTE).

TheValidbit(V)indicateswhetherthisPTEisvalidornot(1=valid,0=invalid).IfaninvalidPTEisencounteredbytheMemoryManagementUnit,thenanaccessfaultissignaled.(Theexceptionwillbeeithera“instructionaccessexception”,“readaccessexception”,or“storeaccessexception”asappropriate).IfthePTEisinvalid,theremainingbitsareignoredandcanbeusedbysoftwareasdesired.

TheUserAccessiblebit(U)controlswhetherthedatacanbeaccessedinSupervisorModeorUserMode.Typically,aprocess’saddressspacewillbedividedintotwoparts,sometimescalledthe“bottomhalf”andthe“tophalf”.ThebottomhalfcanonlybeaccessbycoderunninginUserMode;datainthetophalfcanonlybeaccessedwhenrunninginSupervisorMode.U=1meansbottomhalf;U=0meantophalf.IfUserModecodeattemptstoaccessdatainthetophalf,anaccessfaultwilloccur.IfSupervisorModecodeattemptstoaccessdatainthebottomhalf,anaccessfaultwilloccur.

However,theSUMbitintheSupervisorStatusword(sstatus)canbeusedtoallowSupervisorModecodetoaccesspagesinthebottomhalf.Normally,SupervisorcodewillrunwithSUM=0,inwhichcaseanaccessfaultwilloccur.Butoccasionally(e.g.,whendataispassedtotheOSinakernelcall)SupervisorCodewillsetSUM=1forashorttime.Duringthistime,LOADandSTOREinstructionswilloperateasifrunninginUserMode.Thisallowsthekernelroutinestoretrieveorstoreparametersinthebottomhalf(i.e.,intheuserportionofthevirtualaddressspace).

TheUbitisonlymeaningfulforleafPTEs;fornon-leafPTEs,thisbitwillbe0.

TheGlobalMappingbit(G)concernspagesthataresharedamongalladdressspaces.G=1meansgloballysharedbyalladdressspaces;G=0meanslocaltoasingle



addressspaceandnotshared.Ifapageisgloballyshared,itmeansthatitismappedintoalladdressspaces.Inotherwords,theAddressSpaceID(ASID)checkingissuppressed.Forexample,asetofcommonlyused,read-onlyfunctionsanddatacanbemappedintoalladdressspaces,makingtheseroutinesavailableatessentiallynocosttoalladdressspaces.Thesecouldbeuser-levelfunctionspresentinall“bottomhalves”orcouldbeSupervisorModecodethatisplacedintoall“tophalves”.

Thespecindicatesthatthisbitismeaningfulforbothleafandnon-leafPTEs.Whensetinanon-leafPTE,itmeanstheentirepagetabletreebelowisglobalandsharedbyalladdressspaces.

[???ItisnotclearwhytheGbitisspeci2iedasbeingmeaningfulinnon-leafPTEs.IfaleafPTEthatismarked“global”isstoredintheTLB,thenthatTLBentry(andthedatapageitrefersto)isshared.ButifthereisnoleafPTEintheTLB,thentheMemoryManagementUnitwillneedtowalkthepagetabletreetolocatetheleafPTE.Duringthiswalk,itwouldnotseemtomakeanydifferencewhethersomepagetablepageissharedornot;eachpageofthepagetabletreestillhastobeaccessed.PerhapstheRISC-Vspecenvisionsputtingnon-leafPTEsintheTLB.???]

TheAccessedbit(A)issetbytheMemoryManagementUnitwheneverabyteonthedatapageisread,written,orfetchedforexecution.Insometextbooksthisbitiscalledthe“referenced”bit.ThisbitisonlymeaningfulforleafPTEs;fornon-leafPTEs,thisbitwillbe0.

TheDirtybit(D)issetbytheMemoryManagementUnitwheneverabyteonthedatapageiswrittento.ThisbitisonlymeaningfulforleafPTEs;fornon-leafPTEs,thisbitwillbe0.

DetailsonUpdatingaPageTableEntry:WejustsaidthattheAccessedBitandtheDirtyBitaresetbythehardwarewheneveranybyteinthedatapageisaccessedorupdated.(ThesearetheonlybitsinthePageTableEntry(PTE)thatwouldeverbemodi2iedbyhardware,butanymodi2icationwillrequirethatthisPTE,whenevictedfromtheTLBinthefuture,mustbewrittenbacktothein-memorypagetable.)

HowevertheRISC-Vspecalsoallowsthehardwaretoworkasecondway.Inthisalternativeapproach,apagetableentryisnevermodi2iedbythehardware.Thissimpli2iesthehardwaresincePTEsneedneverbewrittenbacktomemory.Instead,the“A”and“D”bitsaremerelychecked.Iftheyarenotalreadyset



correctly,anexceptionissignaled.Thensoftwareinthetraphandlercantakeactionstowritetheupdatedpagetableentrybacktomemory.

Insomecases,the“A”and“D”bitsmaynotbeneeded.Forexample,ifsomedatapageisalwaysresidentinmemoryandneverswappedouttobackingstore,thenthe“A”and“D”bitscanbeignored.Oralso,ifamemory-mappedI/Odeviceismappedintosomevirtualaddressspace,thenthe“A”and“D”bitscanbeignored.Insuchcases,thebitscanbepresetto“1”toobviatetheneedforupdating.

TheExecutable(X),Writable(W),andReadable(R)bitsindicatewhetherthisPTEisaleafornon-leafentryand,forleafentries,theyindicatetheaccesspermissionsforthedataonthedatapage.IfXWR=000,thenthisPTEisnotaleafentry;thePhysicalPageNumberintheentrypointstothenextlower-levelpageinthepagetabletree.

TheRSWbitsarenotusedbythehardwareandarefreelyavailabletosoftwaretobeusedasdesired.

TheSv32PageTableAlgorithm

Wheneveranaccess(read,write,orfetch-for-execution)occurs,thefollowingstepsaretakenbytheMemoryManagementUnit,tomapavirtualaddressintoaphysicaladdress.

(TheRISC-VspecgivesthisalgorithminageneralformapplicabletoSv32,Sv39,andSv48,i.e.,two-,three-,andfour-levelpagetables.Inordertomakeiteasiertounderstand,theversiongivenheresimpli2iesandspecializesthealgorithmtotwo-leveltables.)

Inputs: va–thevirtualaddresstobetranslated(32bits),withthefollowingparts: va.VPN[1]–theuppermost10bits;offsetintothetoplevelpagetable va.VPN[0]–thefollowing10bits;offsetintothesecondlevelpagetable va.OFFSET–thelower12bits;offsetintothedatapage Thedesiredaccesstype(READ,WRITE,orFETCH) Thecurrentprivilegemode(UorS;noaddresstranslationisdoneinMmode) satp–theSupervisorAddressTranslationandProtectionCSR satp.PPN–thepagenumberoftherootpage.



status.SUM–the“PermitSupervisorUserMemoryAccess”bitinthestatusreg status.MXR–the“MakeExecutableReadable”bitinthestatusreg

Outputs: pa–physicaladdress(34bits) or accessexception–signalapagefaultexceptionandaborttranslation

Tempvariables: a–pointerto(i.e.,physicaladdressof)apageinmemory p–pointerto(i.e.,physicaladdressof)aPTEinmemory pte–aPageTableEntry,asfetchedfrommemory

Algorithm: Compute“a”,theaddressofroottop-levelpageinthepagetable a!satp.PPN||000000000000 Compute“p”,theaddressofthePTEinthetoplevelpage p!a+(va.VPN[1]||00) Readfrommemory“pte”,aPageTableEntryfromtheroot-levelpage pte!memory[p] IfthePhysicalMemoryProtection(PMP)systemindicatesproblems thensignalanexception IfthisPTEisnotvalid(i.e.,pte.V=0orpte.XWRisaninvalidvalue) thensignalanexception Ifpte.XWR≠000then… /*ThisPTEisaleafPTE,pointingtoamegapage.*/ Ifpte.PPN[0]≠0000000000,thensignalanexception Computethephysicaladdress pa!pte.PPN[1]||va.VPN[0]||va.OFFSET Else(i.e.,pte.XWR=000)… /*WedonothavealeafPTE,solookatthesecondlevel.*/ Compute“a”,theaddressofthesecondlevelpagetablepage a!pte.PPN[1]||pte.PPN[0]||000000000000 Compute“p”,theaddressofthePTEinthesecondlevelpage p!a+(va.VPN[0]||00) Readfrommemory“pte”,aPageTableEntryfromthesecondlevelpage pte!memory[p] IfthePhysicalMemoryProtection(PMP)systemindicatesproblems thensignalanexception IfthisPTEisnotvalid(i.e.,pte.V=0orpte.XWRisaninvalidvalue)



thensignalanexception Ifpte.XWR=000,thenwedonothavealeafPTE… thensignalanexception Computethephysicaladdress pa!pte.PPN[1]||pte.PPN[0]||va.OFFSET /*“pte”containstheleafPageTableEntry.*/ MakesurethisaccessisallowedbycheckingtheX,W,R,andUbits…. Ifpte.R≠1,thensignalanexception Ifpte.W≠1andthisisaWRITE,thensignalanexception Ifpte.X≠1andthisisaFETCH,thensignalanexception Ifthecurrentmodeis“User”… Ifpte.U=0,thensignalanexception Ifthecurrentmodeis“Supervisor”… IfthisisaREAD&pte.R≠1&status.MXR≠1,thensignalanexception Ifpte.U=1andstatus.SUM=0,thensignalanexception Checkthe“accessed”bit… Ifpte.A≠1,thensetit Checkthe“dirty”bit… IfthisisaWRITEandpte.D≠1,thensetit Ifthepte.Aorpte.Dbitswerechanged… Either •Signalanexception(andletsoftwareupdatethepagetable) •WritethePageTableEntrybacktomemory. Thismustbeatomicwithrespecttotheearlierreadingofthepte. IfthePMPsystemindicatesproblems,thensignalanexception

Sv39(Three-LevelPageTables)

BoththeSv39andSv48virtualaddressingschemesarenaturalextensionsoftheSv32scheme.Theprimarydifferenceisthattheysupportlargervirtualaddressspacesandpagetablesthatare3-levels(forSv39)and4-levels(forSv48),insteadof2-levels.

IfyouunderstandSv32then–forallintentsandpurposes–youalreadyunderstandSv39andSv48.

WithSv39,virtualaddressesare39bitsandphysicaladdressesare56bits.



�

WecansummarizetheSv39virtualaddressingschemeasfollows:Sv39isidenticaltoSv32,withtheseexceptions:

•ItmayonlybeimplementedonRV64systems,notonRV32systems. •Virtualaddressesare39bits. •Themaximumvirtualaddressspaceis239=512GiBytes. •Thephysicaladdressesare56bits. •Thepagesizeisunchanged(i.e.,4,096bytes). •Thepagetableshave3levels,insteadof2levels. •ThePageTableEntriesare64bits,insteadof32bits. •Eachpageofthepagetablecontains512PTEs,insteadof1,204PTEs. •Offsetsintothepagetablesare9bits(insteadof10bits)since29=512. •TheRSW,D,A,G,U,X,W,R,andVbitsworkidentically. •Megapagesare2MiBytes,insteadof4MiBytes. •“Gigapages”of1GiBytearealsosupported.

WithSv39,thepagetablewillhavethreelevels:



�

Ona64-bitmachine,addressesareinitially64bitslong,sincethisistheregisterlength.ForSv39,onlytheleastsigni2icant39bitsareusedtoaddressintothevirtualaddressspace.Theremaining25bitsmustbethesign-extension(notzero-2illed)oftheleastsigni2icantbits.Otherwise,anexceptionwillbesignaled.

Sv48(Four-LevelPageTables)

WithSv48,virtualaddressesare48bitsandphysicaladdressesare56bits.



�

WecansummarizetheSv48virtualaddressingschemeasfollows:Sv48isidenticaltoSv32andSv39,withtheseexceptions:

•ItmayonlybeimplementedonRV64systems,notonRV32systems. •IfSv48issupported,thenSv39mustalsobesupported. •Virtualaddressesare48bits. •Themaximumvirtualaddressspaceis248=256TiBytes. •Thephysicaladdressesare56bits. •Thepagesizeisunchanged(i.e.,4,096bytes). •Thepagetableshave4levels. •ThePageTableEntriesare64bits,thesameasSv39. •Eachpageofthepagetablecontains512PTEs,thesameasSv39. •Offsetsintothepagetablesare9bits,thesameasSv39. •TheRSW,D,A,G,U,X,W,R,andVbitsworkidentically. •Megapagesare2MiBytes,thesameasSv39. •Gigapagesare1GiByte,thesameasSv39. •“Terapages”of512GiBytesarealsosupported.

Ona64-bitmachine,addressesareinitially64bitslong,sincethisistheregisterlength.ForSv48,onlytheleastsigni2icant48bitsareusedtoaddressintothevirtual



addressspace.Theremaining16bitsmustbethesign-extension(notzero-2illed)oftheleastsigni2icantbits.Otherwise,anexceptionwillbesignaled.


Chapter10:WhatisNotCovered

Besidestheextensionswehavealreadydescribed,theRISC-Vspecmentionsseveraladditionalextensionswhichmayormaynotbeimplemented.Thespeci2icationsfortheseextensionsareinastateof2lux,sowewillsimplymentiontheirexistencehere.

TheDecimalFloatingPointExtension(“L”)

Mostprogrammersarefamiliarwithbinary2loatingpoint,butthereisalsosuchathingasarepresentationof2loatingpointnumbersusingadecimalbase.TheRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.

TheBitManipulationExtension(“B”)

TheRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.

TheDynamicallyTranslatedLanguagesExtension(“J”)

Highlevellanguagesoftenrequiredynamictypingchecking,garbagecollectionanddynamic(just-in-time)compiling,soincludingspecializedinstructionstosupportthesecanimproveperformance.However,theRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.

TheTransactionalMemoryExtension(“T”)



TheRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.

ThePackedSIMDExtension(“P”)

RecallthatSIMDstandsfor“singleinstruction,multipledata”.Thegoalistoperformalargenumberof2loatingpointoperationssimultaneously.Thereisdataparallelism,butnotfullconcurrencysincethesameoperationisperformedonacollectionofvalues.Asingleoperation(suchas“multiply”)isperformedinparallelonNvalues,followedbythenextoperationafterallNmultiplicationshavecompleted.

Theideawiththisextensionisthateach2loatingpointregisterwillholdmorethanonevalue.Forthisextension,the2loatingpointregisterswilllikelybelargerthan32or64bits.Forexample,inoneimplementationeach2loatingpointregistermightbe1,024bitswide.Thuseachregistercanholdmorethanonevalue.Since16×64=1,024,eachregistercanhold16doubleprecisionvaluesatonce.Whenanarithmeticinstruction(suchas“add”or“multiply”)isexecuted,theoperationisperformedonall16valuessimultaneously.

Thestatusofthisextensionisunsettled.Itmaybedroppedaltogetherinfavorofthe“V”VectorSIMDExtension.

TheVectorSIMDExtension(“V”)

The“V”VectorSIMDextensionisdesignedtoaccommodatelargevectorsandSIMDoperation.Itincludesasetof32vectorregisters(v0,v1,…v31),anumberofnewcon2igurationCSRregisters,andadditionalinstructions.TheRISC-Vspeci2icationisdif2iculttounderstand.Furthermore,thisextensionissubjecttorevisionandisnotyet“frozen”.

PerformanceMonitoring



RISC-Vde2inesacollectionofCSRregistersdevotedtomeasuringhardwareperformance:

mhpmcounter3 mhpmcounter4 … mhpmcounter31

mhpmevent3 mhpmevent4 … mhpmevent31

Thismechanismisnotdescribedinthespec,beyondsayingthatthe“events”tobecountedcanbeselectedbysettingmhpmevent3,mhpmevent4,….

Debug/TraceMode

TheRISC-Vspecmentionsanothermodecalled“DebugMode”andseveralrelatedControlandStatusRegisters(CSRs)namedtselect,tdata1,tdata2,andtdata3,butthespecdoesnotgiveanydetail.


AcronymListABI ApplicationBinaryInterfaceAEE ApplicationExecutionEnvironmentAMO AtomicMemoryOperationASID AddressSpaceIDCSR ControlandStatusRegistercycle CSR[C00]:Clockcyclecounter,UserModecycleh CSR[C80]:Upperhalfofcycle,UserMode(RV32only)dcsr CSR[7B0]:Debugcontrolandstatusdpc CSR[7B1]:DebugPCdscratch CSR[7B2]:ScratchregisterDZ DivideByZero(abitwithintheFloatingPointFlags,FFLAGS)fcsr CSR[003]:FloatingPointControlandStatusReg(frm||f2lags)f2lags CSR[001]:Floatingpointing2lagsFFLAGS FloatingPointFlags(i.e.,NX,UF,OF,DZ,NV)frm CSR[002]:DynamicroundingmodeFRM FloatingPointRoundingModeFS Fieldinstatusregister;statusof2loating(clean/dirty/…)HART HardwareThreadHBI HypervisorBinaryInterfaceHEE HypervisorExecutionEnvironmentHEIP hypervisorexternalinterrupthpmcounter3 CSR[C03]:Eventcounter#3,UserModehpmcounter31 CSR[C1F]:EventCounter#31,UserModehpmcounter31h CSR[C9F]:Upperhalfofcounter,UserMode(RV32only)hpmcounter3h CSR[C83]:Upperhalfofcounter,UserMode(RV32only)hpmcounter4 CSR[C04]:EventCounter#4,UserModehpmcounter4h CSR[C84]:Upperhalfofcounter,UserMode(RV32only)HSIP hypervisorsoftwareinterruptHTIP hypervisortimerinterruptinstret CSR[C02]:Numberofinstructionsretired,UserModeinstreth CSR[C82]:Upperhalfofinstret,UserMode(RV32only)IR InstructionRegisterISA InstructionSetArchitecture LR LinkRegister marchid CSR[F12]:ArchitectureID mcause CSR[342]:Trapcausecode,MachineMode mcounteren CSR[306]:Counterenable,MachineMode


AcronymList

mcycle CSR[B00]:Clockcyclecounter,MachineModemcycleh CSR[B80]:Upperhalfofcycle,MachineMode(RV32only) medeleg CSR[302]:Exceptiondelegationregister,MachineMode MEIP machineexternalinterrupt mepc CSR[341]:PreviousvalueofPC,MachineModemhartid CSR[F14]:HardwarethreadID mhpmcounter3 CSR[B03]:Eventcounter#3,MachineMode mhpmcounter31 CSR[B1F]:EventCounter#31,MachineMode mhpmcounter31h CSR[B9F]:Upperhalfofcounter,MachineMode(RV32only) mhpmcounter3h CSR[B83]:Upperhalfofcounter,MachineMode(RV32only) mhpmcounter4 CSR[B04]:EventCounter#4,MachineMode mhpmcounter4h CSR[B84]:Upperhalfofcounter,MachineMode(RV32only)mhpmevent3 CSR[323]:Eventselector#3,MachineModemhpmevent3 CSR[324]:Eventselector#4,MachineModemhpmevent31 CSR[33F]:Eventselector#31,MachineModemideleg CSR[303]:Interruptdelegationregister,MachineModemie CSR[304]:Interrupt-enableregister,MachineModeMIE Bitinstatusregister;MachineModeInterruptEnablemimpid CSR[F13]:ImplementationIDminstret CSR[B02]:Numberofinstructionsretired,MachineModeminstreth CSR[B82]:Upperhalfofinstret,MachineMode(RV32only)mip CSR[344]:Interruptpending,MachineModemisa CSR[301]:ISAandextensionsMMU MemoryManagementUnitMPIE Bitinstatusregister;MachineModePreviousInterruptEnableMPP Fieldinstatusregister;MachineMode–PreviousPrivilegeModeMPRV Bitinstatusregister;ModifyPrivilege mscratch CSR[340]:Tempregisterforuseinhandler,MachineModeMSIP machinesoftwareinterruptmstatus CSR[300]:Statusregister,MachineModeMTIP machinetimerinterruptmtval CSR[343]:Badaddressorbadinstruction,MachineModemtvec CSR[305]:Traphandlerbaseaddress,MachineModemvendorid CSR[F11]:VendorIDMXR Bitinstatusregister;MakeExecutableReadableNMI Non-maskableinterruptNSE NonstandardextensionNV InvalidOperation(abitwithintheFloatingPointFlags,FFLAGS)NX Inexact(abitwithintheFloatingPointFlags,FFLAGS)OF Over2low(abitwithintheFloatingPointFlags,FFLAGS)


AcronymList

PC ProgramCounterPMA PhysicalMemoryAttributePMP PhysicalMemoryProtectionUnitpmpaddr0 CSR[3B0]:PMPAddress#0pmpaddr1 CSR[3B1]:PMPAddress#1pmpaddr15 CSR[3BF]:PMPAddress#15pmpcfg0 CSR[3A0]:PMPCon2igurationword#0pmpcfg1 CSR[3A1]:PMPCon2igurationword#1pmpcfg2 CSR[3A2]:PMPCon2igurationword#2pmpcfg3 CSR[3A3]:PMPCon2igurationword#3PPN PhysicalPageNumberPTE PageTableEntryRISC ReducedInstructionSetComputerRM RoundingModeBitsROM ReadOnlyMemoryRV128/RV128I Instructionspresentonlyon128bitmachinesRV32/RV32I Thebasicinstructionset,presentonallmachinesRV64/RV64I Instructionspresentonlyon64and128bitmachinessatp CSR[180]:Addresstranslationandprotectionsatp SupervisorAddressTranslationandProtectionCSRSBI SupervisorBinaryInterfacescause CSR[142]:Trapcausecode,SupervisorModescounteren CSR[106]:Counterenable,SupervisorModeSD Fieldinstatusregister,summarizesFSandXS2ieldssedeleg CSR[102]:Exceptiondelegationregister,SupervisorModeSEE SupervisorExecutionEnvironmentSEIP supervisorexternalinterruptsepc CSR[141]:PreviousvalueofPC,SupervisorModesideleg CSR[103]:Interruptdelegationregister,SupervisorModesie CSR[104]:Interrupt-enableregister,SupervisorModeSIE Bitinstatusregister;SupervisorModeInterruptEnableSIMD Singleinstruction,multipledatasip CSR[144]:Interruptpending,SupervisorModeSP StackPointerSPP Fieldinstatusregister;SupervisorMode–PrevPrivilegeModeSPIE Bitinstatusregister;SupervisorModePrevInterruptEnablesscratch CSR[140]:Tempregisterforuseinhandler,SupervisorModeSSIP supervisorsoftwareinterruptsstatus CSR[100]:Statusregister,SupervisorModeSTIP supervisortimerinterrupt


AcronymList

stval CSR[143]:Badaddressorbadinstruction,SupervisorModestvec CSR[105]:Traphandlerbaseaddress,SupervisorModeSUM Bitinstatusregister;PermitSupervisorUserMemoryAccessSXL Fieldinstatusregister;Emulation:RegsizewheninSModetdata1 CSR[7A1]:Triggerdata#1tdata2 CSR[7A2]:Triggerdata#2tdata3 CSR[7A3]:Triggerdata#3time CSR[C01]:Currenttimeintickstimeh CSR[C81]:Upperhalfoftime(RV32only)TLB TranslationLookasideBuffertselect CSR[7A0]:TriggerregisterselectTSR Bitinstatusregister;TrapSRETinstruction TW Bitinstatusregister;Time-outWaitTVM Bitinstatusregister;TrapVirtualMemoryucause CSR[042]:Trapcausecode,UserModeUEIP userexternalinterruptuepc CSR[041]:PreviousvalueofPC,UserModeUF Under2low(abitwithintheFloatingPointFlags,FFLAGS)uie CSR[004]:Interrupt-enableregister,UserModeUIE Bitinstatusregister;UserModeInterruptEnableuip CSR[044]:Interruptpending,UserModeUPIE Bitinstatusregister;UserModePreviousInterruptEnableuscratch CSR[040]:Tempregisterforuseinhandler,UserModeUSIP usersoftwareinterruptustatus CSR[000]:Statusregister,UserModeUTIP usertimerinterruptutval CSR[043]:Badaddressorbadinstruction,UserModeutvec CSR[005]:Traphandlerbaseaddress,UserModeUXL Fieldinstatusregister;Emulation:RegsizewheninUserModeVPN VirtualPageNumberWFI WaitForInterruptXOR Exclusive-OrXS Fieldinstatusregister;statusofextension(clean/dirty/…)


AbouttheAuthorProfessorHarryH.PorterIIIteachesintheDepartmentofComputerScienceatPortlandStateUniversity.Hehasproducedseveralvideocourses,notablyontheTheoryofComputation.Recentlyhebuiltacompletecomputerusingtherelaytechnologyofthe1940s.Thecomputerhaseightgeneralpurpose8bitregisters,a16bitprogramcounter,andacompleteinstructionset,allhousedinmahoganycabinetsasshown.PorteralsodesignedandconstructedtheBLITZSystem,acollectionofsoftwaredesignedtosupportauniversity-levelcourseonOperatingSystems.Usingthesoftware,studentsimplementasmall,butcomplete,time-sliced,VM-basedoperatingsystemkernel.Porterhashabitofdesigningandimplementingprogramminglanguages,themostrecentbeingalanguagespeci2icallytargetedatkernelimplementation.

PorterholdsanSc.B.fromBrownUniversityandaPh.D.fromtheOregonGraduateCenter.

PorterlivesinPortland,Oregon.Whennottryingto2igureouthowhiscomputerworks,heskis,hikes,travels,andspendstimewithhischildrenbuildingthings.

ProfessorPorter’swebsite:www.cs.pdx.edu/~harry

�


http://www.cs.pdx.edu/~harry

RISC-Vweb.cecs.pdx.edu/~harry/riscv/RISCV-Summary.pdf · 2018-01-26 · RISC-V: An Overview of the...

Documents

Transcript of RISC-Vweb.cecs.pdx.edu/~harry/riscv/RISCV-Summary.pdf · 2018-01-26 · RISC-V: An Overview of the...