RISC-Vweb.cecs.pdx.edu/~harry/riscv/RISCV-Summary.pdf · 2018-01-26 · RISC-V: An Overview of the...
Transcript of RISC-Vweb.cecs.pdx.edu/~harry/riscv/RISCV-Summary.pdf · 2018-01-26 · RISC-V: An Overview of the...
RISC-V:AnOverviewofthe
InstructionSetArchitecture
HarryH.PorterIIIPortlandStateUniversity
January26,2018
TheRISC-Vprojectde2inesanddescribesastandardizedInstructionSetArchitecture(ISA).RISC-Visanopen-sourcespeci2icationforcomputerprocessorarchitectures,notaparticularchiporimplementation.Todate,severaldifferentgroupshavedesignedandfabricatedsiliconimplementationsoftheRISC-Vspeci2ications.Basedontheperformanceoftheseimplementationsandthegrowingneedforinteroperabilityamongvendors,itappearsthattheRISC-Vstandardwillincreaseinimportance.
ThisdocumentintroducesandexplainstheRISC-Vstandard,givinganinformaloverviewofthearchitectureofaRISC-Vprocessor.Thisdocumentdoesnotdiscussaparticularimplementation;insteadwediscusssomeofthevariousdesignoptionsthatareincludedintheformalRISC-Vspeci2ication.
ThisdocumentgivesthereaderaninitialintroductiontotheRISC-Vdesign.
DateLastModi=ied:January26,2018 AvailableOnline: www.cs.pdx.edu/~harry/riscv
TableofContentsListofInstructions 6
Chapter1:Introduction 11Introduction 11TheRISC-VNamingConventions 13TheUsualDisclaimer/RequestForCorrections 15DocumentRevisionHistory/PermissiontoCopy 16BasicTerminology 17
Chapter2:BasicOrganization 20MainMemory 20LittleEndian 20TheRegisters 23ControlandStatusRegisters(CSRs) 25Alignment 26
Chapter3:Instructions 28Instructions 28TheCompressedInstructionExtension 28InstructionEncoding 31User-LevelInstructions 36ArithmeticInstructions(ADD,SUB,…) 38LogicalInstructions(AND,OR,XOR,…) 49ShiftingInstructions(SLL,SRL,SRA,…) 52MiscellaneousInstructions 57BranchandJumpInstructions 65LoadandStoreInstructions 78IntegerMultiplyandDivide 89
Chapter4:FloatingPointInstructions 103FloatingPointing–ReviewofBasicConcepts 103TheFloatingPointExtensions 119FloatingPointLoadandStoreInstructions 124BasicFloatingPointArithmeticInstructions 127FloatingPointConversionInstructions 133
RISC-VArchitectureSummary/Porter Page� of�2 323
TableofContents
FloatingPointMoveInstructions 137FloatingPointCompareandClassifyInstructions 142
Chapter5:RegisterandCallingConventions 146StandardUsageofGeneralPurposeRegisters 146SavingRegistersAcrossCalls 148Registerx0-TheZeroRegister(“zero”) 150Registerx1-TheReturnAddress(“ra”) 150Registerx2-TheStackPointer(“sp”) 151Registerx3-TheGlobalPointer(“gp”) 152Registerx4-TheThreadBasePointer(“tp”) 152Registerx5-x7,x28-x31-TempRegisters(“t0-t6”) 153Registerx8,x9,x18-x27-SavedRegisters(“s0-s11”) 154Registerx8-FramePointer(“fp”) 154Registerx10-x17-ArgumentRegisters(“a0-a7”) 155CallingConventionsforArguments 156FloatingPointRegisters 158
Chapter6:CompressedInstructions 160CompressedInstructions 160InstructionFormats 161LoadInstructions 162StoreInstructions 168Jump,Call,andConditionalBranchInstructions 175LoadConstantintoaRegister 178Arithmetic,Shift,andLogicInstructions 180MiscellaneousInstructions 189InstructionEncoding 191
Chapter7:ConcurrencyandAtomicInstructions 195HardwareThreads(HARTs) 195TheRISC-VMemoryModel 196ConcurrencyControl-ReviewandBackground 198AReviewofCachingConcepts 200RISC-VConcurrencyControl 205TheFENCEInstructions 205Load-Reserved/Store-ConditionalSemantics 210
RISC-VArchitectureSummary/Porter Page� of�3 323
TableofContents
AtomicMemoryControlBits:“aq”and“rl” 213UsingLRandSCInstructions 217DeadlockandStarvation–Background 219StarvationandLR/SCSequences 220TheAtomicMemoryOperation(AMO)Instructions 221
Chapter8:PrivilegeModes 227PrivilegeModes 227InterfaceTerminology 230Exceptions,Interrupts,TrapsandTrapHandlers 234ControlandStatusRegisters(CSRs) 235CSRListing 237InstructionstoRead/WritetheCSRs 241BasicCSRs:CYCLE,TIME,INSTRET 246CSRRegisterMirroring 249AnOverviewofImportantCSRs 252ExceptionProcessingandInvokingaTrapHandler 257TheStatusRegister 268AdditionalCSRs 280SystemCallandReturnInstructions 284
Chapter9:VirtualMemory 288PhysicalMemoryAttributes(PMAs) 288CachePMAs 291PMAEnforcement 291VirtualMemory 296Sv32(Two-LevelPageTables) 302TheSv32PageTableAlgorithm 309Sv39(Three-LevelPageTables) 311Sv48(Four-LevelPageTables) 313
Chapter10:WhatisNotCovered 316TheDecimalFloatingPointExtension(“L”) 316TheBitManipulationExtension(“B”) 316TheDynamicallyTranslatedLanguagesExtension(“J”) 316TheTransactionalMemoryExtension(“T”) 316ThePackedSIMDExtension(“P”) 317
RISC-VArchitectureSummary/Porter Page� of�4 323
TableofContents
TheVectorSIMDExtension(“V”) 317PerformanceMonitoring 317Debug/TraceMode 318
AcronymList 319
AbouttheAuthor 323
RISC-VArchitectureSummary/Porter Page� of�5 323
ListofInstructions
AddImmediate 38AddImmediateWord 39AddImmediateDouble 40Add 41AddWord 42AddDouble 42Subtract 45SubtractWord 46SubtractDouble 46SignExtendWordtoDoubleword 47SignExtendDoublewordtoQuadword 47Negate 48NegateWord 48NegateDoubleword 49AndImmediate 49And 49OrImmediate 50Or 50XorImmediate 50Xor 51Not 51ShiftLeftLogicalImmediate 52ShiftLeftLogical 52ShiftRightLogicalImmediate 53ShiftRightLogical 53ShiftRightArithmeticImmediate 54ShiftRightArithmetic 54ShiftInstructionsforRV64andRV128 56Nop 57Move(RegistertoRegister) 57LoadUpperImmediate 58LoadImmediate 58AddUpperImmediatetoPC 59LoadAddress 60SetIfLessThan(Signed) 61SetLessThanImmediate(Signed) 61SetIfGreaterThan(Signed) 62
RISC-VArchitectureSummary/Porter Page� of�6 323
ListofInstructions
SetIfLessThan(Unsigned) 62SetLessThanImmediate(Unsigned) 62SetIfGreaterThan(Unsigned) 63SetIfEqualToZero 63SetIfNotEqualToZero 64SetIfLessThanZero(signed) 64SetIfGreaterThanZero(signed) 65BranchifEqual 65BranchifNotEqual 67BranchifLessThan(Signed) 67BranchifLessThanOrEqual(Signed) 67BranchifGreaterThan(Signed) 68BranchifGreaterThanOrEqual(Signed) 68BranchifLessThan(Unsigned) 69BranchifLessThanOrEqual(Unsigned) 69BranchifGreaterThan(Unsigned) 70BranchifGreaterThanOrEqual(Unsigned) 70JumpAndLink(Short-DistanceCALL) 71Jump(Short-Distance) 73JumpAndLinkRegister 73JumpRegister 74Return 74CallFarawaySubroutine 75TailCall(FarawaySubroutine)/Long-DistanceJump 77LoadByte(Signed) 78LoadByte(Unsigned) 78LoadHalfword(Signed) 79LoadHalfword(Unsigned) 79LoadWord(Signed) 80LoadWord(Unsigned) 81LoadDoubleword(Signed) 81LoadDoubleword(Unsigned) 82LoadQuadword 83StoreByte 83StoreHalfword 84StoreWord 85StoreDoubleword 85StoreQuadword 86Multiply 89Multiply–HighBits(Signed) 91
RISC-VArchitectureSummary/Porter Page� of� 7 323
ListofInstructions
Multiply–HighBits(Unsigned) 91Multiply–HighBits(SignedandUnsigned) 92MultiplyWord 94Divide(Signed) 98Divide(Unsigned) 98Remainder(Signed) 99Remainder(Unsigned) 99DivideWord(Signed) 100DivideWord(Unsigned) 100RemainderWord(Signed) 101RemainderWord(Unsigned) 101FloatingLoad(Word) 124FloatingLoad(Double) 125FloatingLoad(Quad) 125FloatingStore(Word) 126FloatingStore(Double) 126FloatingStore(Quad) 127FloatingAdd 127FloatingSubtract 128FloatingMultiply 129FloatingDivide 129FloatingMinimum 130FloatingMaximum 130FloatingSquareRoot 131FloatingFusedMultiply-Add 132FloatingFusedMultiply-Subtract 132FloatingFusedMultiply-Add/Subtract(Quad) 133FloatingPointConversion(Integerto/fromSinglePrecision) 134FloatingPointConversion(Integerto/fromDoublePrecision) 135FloatingPointConversion(Integerto/fromQuadPrecision) 136FloatingPointConversion(Single/Doubleto/fromQuadPrecision) 136FloatingSignInjection(SinglePrecision) 137FloatingSignInjection(DoublePrecision) 138FloatingSignInjection(DoublePrecision) 138FloatingMove 139FloatingNegate 139FloatingAbsoluteValue 140FloatingMoveTo/FromIntegerRegister(SinglePrecision) 140FloatingMoveTo/FromIntegerRegister(DoublePrecision) 141FloatingPointComparison 142
RISC-VArchitectureSummary/Porter Page� of� 8 323
ListofInstructions
FloatingPointClassify 144C.LW-LoadWord 162C.LW-LoadDoubleword 162C.LQ-LoadQuadword 163C.FLW-LoadSingleFloat 164C.FLD-LoadDoubleFloat 164C.LWSP-LoadWordfromStackFrame 165C.LDSP-LoadDoublewordfromStackFrame 166C.LQSP-LoadQuadwordfromStackFrame 166C.FLWSP-LoadSingleFloatfromStackFrame 167C.FLDSP-LoadDoubleFloatfromStackFrame 168C.SW-StoreWord 168C.SD-StoreDoubleword 169C.SQ-StoreQuadword 170C.FSW-StoreSingleFloat 170C.FSD-StoreDoubleFloat 171C.SWSP-StoreWordtoStackFrame 172C.SDSP-StoreDoublewordtoStackFrame 172C.SQSP-StoreQuadwordtoStackFrame 173C.FSWSP-StoreSingleFloattoStackFrame 173C.FSDSP-StoreDoubleFloattoStackFrame 174C.J-Jump(PC-relative) 175C.JAL-JumpandLink/Call(PC-relative) 175C.JR-JumpRegister 176C.JALR-JumpRegisterandLink/Call 176C.BEQZ-BranchifZero(PC-relative) 177C.BNEQZ-BranchifNotZero(PC-relative) 177C.LI-LoadImmediate 178C.LUI-LoadUpperImmediate 179C.ADDI-AddImmediate 180C.ADDIW-AddImmediate(Word) 180C.ADDI16SP-AddImmediatetoStackPointer 181C.ADDI4SPN-AddImmediatetoStackPointer 182C.SLLI-ShiftLeftLogical(Immediate) 182C.SRLI-ShiftRightLogical(Immediate) 183C.SRAI-ShiftRightArithmetic(Immediate) 184C.ANDI-LogicalAND(Immediate) 185C.MV-MoveRegistertoRegister 185C.ADD-AddRegistertoRegister 186C.AND-AddRegistertoRegister 186
RISC-VArchitectureSummary/Porter Page� of� 9 323
ListofInstructions
C.OR-AddRegistertoRegister 187C.XOR-AddRegistertoRegister 187C.SUB-AddRegistertoRegister 188C.ADDW-AddRegistertoRegister 188C.SUBW-AddRegistertoRegister 189C.NOP-NopInstruction 189C.EBREAK-DebuggingBreakInstruction 190C.ILLEGAL-IllegalInstruction 190FENCE 207FENCE.I–(InstructionCacheFlushing) 208LoadReserved(Word) 211LoadReserved(Doubleword) 212StoreConditional(Word) 212StoreConditional(Doubleword) 213AtomicSwap 221AtomicAdd/AND/OR/XOR/Max/Min(Word) 223AtomicAdd/AND/OR/XOR/Max/Min(Doubleword) 224CSRRead/Write 242CSRReadandSetBits 243CSRReadandClearBits 243CSRRead/WriteImmediate 244CSRReadandSetBitsImmediate 245CSRReadandClearBitsImmediate 245Read“CYCLE” 247Read“CYCLEH” 247Read“TIME” 247Read“TIMEH” 248Read“INSTRET” 248Read“INSTRETH” 248EnvironmentCall(SystemCall) 284Break(InvokeDebugger) 285MRET:MachineModeTrapHandlerReturn 285SRET:SupervisorModeTrapHandlerReturn 285URET:UserModeTrapHandlerReturn 286WFI:WaitForInterrupt 286SFENCE.VMA:SupervisorFenceforVirtualMemory 301
RISC-VArchitectureSummary/Porter Page� of� 10 323
Chapter1:Introduction
Introduction
AnInstructionSetArchitecture(ISA)de2ines,describes,andspeci2ieshowaparticularcomputerprocessorcoreworks.TheISAdescribestheregistersanddescribeseachmachine-levelinstruction.TheISAtellsexactlywhateachinstructiondoesandhowitisencodedintobits.
TheISAformstheinterfacebetweenhardwareandsoftware.HardwareengineersdesigndigitalcircuitstoimplementagivenISAspeci2ication.Softwareengineerswritecode(operatingsystems,compilers,etc.)basedonagivenISAspeci2ication.
ThereareanumberofInstructionSetArchitecturesinwidespreaduse,forexample:
x86-64(AMD,Intel) ARM(ARMHoldings) SPARC(Sun/Oracle)
EachoftheseISAsisproprietaryandverycomplex.ThedetailsareoftenobscuredinlengthymanualsandsomedetailsoftheISAarenotmadepublicatall.Furthermore,thewidelyusedISAshavebeenaroundforyearsandtheirdesignscarrybaggageasaresult,e.g.,forbackwardcompatibility.Sincetheselegacydesignswere2irstcreated,we’velearnedmoreabouthowtodesigncomputers.Changesinsiliconhardwaretechnologyhavealsohadanimpactonwhichdesignchoicesarenowoptimal.
TheRISC-VprojectcameoutofUCBerkeleytoaddresssomeoftheseissues.RISC-Vispronounced“RISC-2ive”.
OnegoalwastocreateamodernISAincorporatingthebestcurrentideasinprocessordesign.TheystrovetocreateanISAthatwasmuchsimplerthanthelegacyISAs,butatthesametime,wasalsopracticalandintendedtoaccommodatereallyfasthardwareimplementations.
RISC-VArchitectureSummary/Porter Page� of�11 323
Chapter1:Introduction
AnothergoalwastocreateapureReducedInstructionSet(RISC)architecture.Thegoalistobeabletoexecuteoneinstructionperclockcycleandtoachievethis,eachinstructionneedstobesimpleandlimited.
Anothergoalwastocreateanopen-sourceISA.ExistingISAsareproprietary.Theyareowned,managed,andcontrolledbycorporateentitieslikeIntel,SunMicrosystems,andARMHoldings.Theopen-sourceapproachtakenbyRISC-VmeansthatmanydifferentcompaniescanprovidehardwareimplementationsoftheRISC-Varchitecture.CreatinganecosysteminwhichmultiplevendorscancompeteinimplementingasingleISAshouldresultinmanyofthebene2itsseeninotheropen-sourceprojects.
TheRISC-Vdesignisnotasingle,fullyspeci2ied,concreteISA.Instead,RISC-Visasomewhatgeneralizedspeci2icationwhichcanbeinstantiatedor2leshed-outtodescribeanactualsiliconpart.
Thedesignersunderstoodthattherearemanydifferentmarkets,manydifferentapplicationareasforcomputers,manydifferentdesignconstraints,andsoon.Forexample,anembeddedcomputerforadishwasherneedstobecheap,reliable,andsimple,butdoesn’trequirespeed,supportforanoperatingsystem,multiplecores,orsupportfor64-bitoperations.Ontheotherhand,othercomputerswillhavemultiplecores,64-bitoperations,etc.
TheRISC-VprojectapproachesthisplethoraofdesignchoicesbyintroducinganumberofoptionsintotheISA.Inthisrespect,RISC-VisreallyasingleInstructionSetArchitecture;itisacollectionofrelatedISAs.Perhapsitcanbeviewedasamenu,fromwhichaparticularimplementationwillchoosesomeitems,butnotothers.ThedocumentationprovidedbyRISC-Visincomplete;anactualprocessorcorewillpresumablycomewithdocumentationsayingexactlyhowandwhichpartsoftheRISC-Vspeci2icationitimplements.
Forexample,theRISC-Vspeci2icationdoesnotanswerthesequestions:
Howwidearetheregisters?Optionsare:32bits,64bits,and128bits
Howmanyregistersarethere? Optionsare:16or32 Howlongareinstructions? 32bitinstructionsaremandatory;16bitformsareoptional
RISC-VArchitectureSummary/Porter Page� of� 12 323
Chapter1:Introduction
Ishardwaremultiply/divideincluded? Is9loatingpointsupported?
Optionsare:notsupported,single,ordoubleprecision Issupportforanoperatingsystem(i.e.,SupervisorMode)included? Arepagetablessupportedand,ifso,whichoptionisselected?
Asaresultofallthisparameterization,theRISC-Vspeci2icationisdif2iculttoread.Forexample,thelengthofregistersisgivenasavariableXLEN.Soinsteadofsayingsomethinglike
“…loadsa32bitvalueintobits[31:0]”thedocumentationsays
“…loadsanXLENbitvalueintobits[XLEN-1:0]”.
ThisdocumentisanattempttodescribeRISC-Vusingamoretraditionalapproach.WeproceedbychoosingaparticularISAthatmeetstheRISC-Vspeci2icationsanddescribethat.Tomakethingsmoreconcrete,wewilldescribethe32bitRISC-Vvariant,butmakeadditionalcommentsaboutthe64-bitand128-bitvariants.
TheRISC-VNamingConventions
Aspreviouslydescribed,theRISC-Vspeci2icationisnotasingleISA.Instead,itisacollectionofISAoptions.AnamingconventionisusedinwhichaparticularISAvariationisgivenacodednametellingwhichISAoptionsarepresentandsupported.Aparticularhardware(chip)canbedescribedorsummarizedwithsuchacodedname,indicatingwhichRISC-Vfeaturesareimplementedbythechip.
Forexample,considerthefollowingRISC-Vname:
RV32IMAFD
The“RV”standsfor“RISC-V”andallcodednamesbeginwith“RV”.
The“32”indicatesthatregistersare32bitswide.Otheroptionsare64bitsand128bits:
RV32 32-bitmachines RV64 64-bitmachines RV128 128-bitmachines
RISC-VArchitectureSummary/Porter Page� of� 13 323
Chapter1:Introduction
Theremaininglettershavethesemeanings:
I–Basicintegerarithmeticissupported M–Multiplyanddividearesupportedinhardware A–Theinstructionsimplementingatomicsynchronizationaresupported F–Singleprecision(32bit)2loatingpointissupported D–Doubleprecision(64bit)2loatingpointissupported
Eachoftheseisconsideredtobean“extension”ofthebaseISA,exceptforthe“I”(basicintegerinstructions),whichisalwaysrequired.
Theletter“G”isusedasanabbreviationfor“IMAFD”:
RV32G=RV32IMAFD
Inaddition,thereareadditionalvariantsandextensions:
S–Supervisormodeisimplemented Q–Quad-precision(128bit)2loatingpointissupported C–Compressed(i.e.,16bit)instructionsaresupported E–Embeddedmicroprocessors,withonly16registers
TheRISC-VdocumentationalsomentionsseveraladditionalISAdesignextensions,butonlysaystheseinstructionswillbespeci2iedatsomefuturedate.Presently,thereisnothingspeci2icforthefollowingextensions:
L–Decimalarithmeticinstructions V–Vectorarithmeticinstructions P–PackedSIMDinstructions B–Bitmanipulationinstructions T–Transactionalmemorysupport
TheapproachwetakeinthisdocumentistodescribeaRISC-Varchitecturewiththefollowingfeatures:
•32registers •Registersare32bitswide •Multiplyanddivideinstructionsarepresent •Singleanddoubleprecision2loatingpointinstructionsarepresent
RISC-VArchitectureSummary/Porter Page� of� 14 323
Chapter1:Introduction
•Atomicinstructionsarepresent •Supervisormodeissupported •Virtualmemoryissupported
Ratherthanspecifyallvariantssimultaneouslyusinggeneralities,wewillfocusonaspeci2icISAandthencommentonpossiblevariations.
TheRISC-VdocumentationprovidesanISA“standard”whichcanbeadoptedandusedfreelybydifferentgroups.Thestandardleavessomedecisionsopen,givingimplementersseveralchoices.Forexample,theimplementerisfreetochoosethesizeoftheregisters;thestandardmentions32-bits,64-bits,and128-bitsbutdoesnotmandateaparticularchoice.
Inotherareas,thestandardisun2inished.Forexample,mentionismadeofdecimal2loatingpointinstructions,butthesectiondiscussingthemsays“tobe2illedinlater”.
Also,theRISC-Vdocumentationacknowledgesthatsomeimplementerswillchoosetoviolatethestandardorextendthestandard.Theyusetheterminology“non-standardextensions”torefertofeaturesthatmightbepresentinagivenchip,butwhichdonotconformtotheRISC-Vstandard.
Commentary:TheRISC-Vdocumentationcontainsanumberofenlighteningparentheticalremarksdescribingthevariousdesignchoicestheyconsideredandofferingjusti2icationsforthedesigndecisionstheymade.ThislevelofthoughtfulnessisabsentinmostextantISAs.Insomecases,theRISC-VdocumentationseemstoaddressthedesignspaceitselfandoffersalanguageandframeworkforfutureISAdevelopment.
TheUsualDisclaimer/RequestForCorrections
Thisisaderivativework,basedon:
• TheRISC-VInstructionSetManual,VolumeI:User-LevelISADocument(Version2.2,May7,2017)
• TheRISC-VInstructionSetManual,VolumeII:PrivilegedArchitecture(Version1.10,May7,2017)
Consulttheof2icialdocumentation,whichprevails.
RISC-VArchitectureSummary/Porter Page� of� 15 323
Chapter1:Introduction
TheRISC-Viscomplex.Thisdocumenttakeslibertiesandsimpli2iesthings.Ourgoalistointroducethegeneraldesignandexplainthemainideas.Whilethismaylooklikeamanualorreferencework,itisnot.Tomakethismaterialcomprehensible…
•Somedetailsaresimpli2ied. •Somestatementsarenotstrictlytrueorcomplete. •Somematerialmaysimplybeincorrect.
Thisdocumentuses“???”tomarkincompleteorquestionableinformation.Pleasecontacttheauthorifyou2ind…
•Inaccurateinformationthatyoucancorrect •Incompleteinformationthatyoucan2illin •Confusingtextthatneedstobereworded
DocumentRevisionHistory/PermissiontoCopy
Versionnumbersarenotusedtoidentifyrevisionstothisdocument.Insteadthedateandtheauthor’snameisused.Thedocumenthistoryis:
Date AuthorJanuary26,2018 HarryH.PorterIII<Initialversion>
InthespiritofRISC-Vandtheopen-sourcemovement,theauthorgrantspermissiontofreelycopyand/ormodifythisdocument,withthefollowingrequirement:
Youmustnotalterthissection,excepttoaddtotherevisionhistory.Youmustappendyourdate/nametotherevisionhistory.
Youarealsofreetoadaptthisdocumenttodescribeaparticularimplementation.Ifyouspecializethisdocumenttoaspeci2ichardwaredesign,youshouldchangethetitletore2lectthisnewuse.Anymaterialliftedshouldbereferenced.
RISC-VArchitectureSummary/Porter Page� of� 16 323
Chapter1:Introduction
BasicTerminology
There has been some confusion in computer science documentation regardingabbreviationsforlargenumbers.Forexample:
4K=? 4,000 4,096
Weusethefollowingpre2ixnotationforlargenumbers,whichisbecomingcommoninthecontextofcomputerarchitecture:
Pre=ix Example Value Ki kibi KiByte 210 1,024 ~103 Mi mebi MiByte 220 1,048,576 ~106 Gi gibi GiByte 230 1,073,741,824 ~109 Ti tebi TiByte 240 1,099,511,627,776 ~1012 Pi pebi PiByte 250 1,125,899,906,842,624 ~1015 Ei exbi EiByte 260 1,152,921,504,606,846,976 ~1018
Contrastthistothestandardmetricpre2ixes,whichweavoid:
Pre=ix Example Value K kilo KByte 103 1,000 M mega MByte 106 1,000,000 G giga GByte 109 1,000,000,000 T tera TByte 1012 1,000,000,000,000 P peta PByte 1015 1,000,000,000,000,000 E exa EByte 1018 1,000,000,000,000,000,000
Inthisdocument,weusetheterms“byte”,“halfword”,“word”,“doubleword”,and“quadword”torefertovarioussizesofbinarydata.
number number of bytes of bits example value (in hex) ======== ======= =================================== byte 1 8 A4 halfword 2 16 C4F9 word 4 32 AB12CD34 doubleword 8 64 01234567 89ABCDEF quadword 16 128 4B6D073A 9A145E40 35D0F241 DE849F03
RISC-VArchitectureSummary/Porter Page� of� 17 323
Chapter1:Introduction
The bits within an 8-bit byte are numbered from 0 (lower, least signi2icant) to 7(upper,mostsigni2icant).
7654 3210 ==== ==== 0000 0000
Thebitswithina16-bithalfwordarenumberedfrom0to15.
15 12 8 4 0 ==== ==== ==== ==== 0000 0000 0000 0000
Thebitswithina32-bitwordarenumberedfrom0to31.
31 28 24 20 16 12 8 4 0 ==== ==== ==== ==== ==== ==== ==== ==== 0000 0000 0000 0000 0000 0000 0000 0000
Likewise, thebitswithina64-bitdoublewordarenumbered from0 to63and thebitswithina128-bitquadwordarenumberedfrom0to127.
Weusethefollowingnotationtorepresentarangeofbits:
Example Meaning ========= ====================================================== [7:0] Bits 0 through 7; e.g., all bits in a byte [31:0] Bits 0 through 31; e.g., all bits in a word [31:28] Bits 28 through 31; e.g., the upper 4 bits in a word [1:0] Bits 0 through 1; e.g., the least significant 2 bits
Asinglehexdigit canbeused to represent4bits (half abyte, sometimes calleda“nibble”),asfollows:
RISC-VArchitectureSummary/Porter Page� of� 18 323
Chapter1:Introduction
Binary Hex ====== === 0000 0 0001 1 0010 2 0011 3 0100 4 0101 5 0110 6 0111 7 1000 8 1001 9 1010 A 1011 B 1100 C 1101 D 1110 E 1111 F
The 8 bits within a byte are conveniently expressed with two hex digits. Forexample:
8-bit byte In Hex ========== ======== 1010 0100 A4
The32bitsinawordaregivenwith8hexdigits.Forexample:
32-bit word In Hex ============================================= ======== 1010 1011 0001 0010 1100 1101 0011 0100 AB12CD34
RISC-VArchitectureSummary/Porter Page� of� 19 323
Chapter2:BasicOrganization
MainMemory
Mainmemoryisbyteaddressable.Addressesare4bytes(i.e.,32bits)long,allowingforupto4GiBytestobeaddressed.
TheRISC-Vdocumentationdiscussesseveraldifferentaddresssizeoptions,butwe’llstartsimplewith32bitaddresses.
Memorycanbeviewedasasequenceofbytes:
address data (in hex) (in hex) ======== ======== 00000000 89 00000001 AB 00000002 CD 00000003 EF 00000004 01 00000005 23 00000006 45 00000007 67 ... ... FFFFFFFC E0 FFFFFFFD E1 FFFFFFFE E2 FFFFFFFF E3
“Low”memoryreferstosmallermemoryaddresses,whichwillbeshownhigheronthepagethan“high”memoryaddresses,asintheaboveexample.
LittleEndian
TheRISC-Visalittleendianarchitecture.
RISC-VArchitectureSummary/Porter Page� of�20 323
Chapter2:BasicOrganization
Asanexample,assumethatmainmemoryholdsthefollowingbytes:
address data (in hex) (in hex) ======== ======== ... ... E5000004 1A E5000005 2B E5000006 3C E5000007 4D E5000008 5E E5000009 6F E500000A 70 E500000B 81 E500000C 92 E500000D A3 E500000E B4 E500000F C5 ... ...
Considerloadingaregistersthatis32bits(4bytes)wide.ThereareseveralLOADinstructions,whichcanmoveeitherabyte,ahalfword,orawordfrommemoryintoaregister.
AssumethataLOADinstructionloadsabytefromaddress0xE5000004.Theregisterwillthencontain:
0x0000001A
AssumethataLOADinstructionloadsahalfword(2bytes)fromthesameaddress.Theregisterwillthencontain:
0x00002B1A
AssumethataLOADinstructionloadsaword(4bytes)fromthesameaddress.Theregisterwillthencontain:
0x4D3C2B1A
Notethatwecanviewthememoryasholdingasequenceofproperlyalignedhalfwords,whichwecannaturallyrepresentasfollows:
RISC-VArchitectureSummary/Porter Page� of� 21 323
Chapter2:BasicOrganization
address data (in hex) (in hex) ======== ======== ... ... E5000004 2B1A E5000006 4D3C E5000008 6F5E E500000A 8170 E500000C A392 E500000E C5B4 ... ...
Orwecanviewthismemoryasholdingasequenceofproperlyalignedwords,whichlookslikethis:
address data (in hex) (in hex) ======== ======== ... ... E5000004 4D3C2B1A E5000008 81706F5E E500000C C5B4A392 ... ...
Commentary:Inalittleendianarchitecture,theorderofthebytesischangedwheneverdataiscopiedfrommemorytoaregisterorstoredfromaregisterintomemory.Thiscanbeasourceofconfusion,particularlywhenhumanslookataprintoutofmemorycontents.
Bigendianarchitecturesaresimplertounderstandsincethebytesarenotreorderedduringloadsandstores.
Noticethatwithalittleendianarchitecture,the2irstbyteinmemory(0x1Aintheexample)alwaysgoesintothesamebitsintheregister,regardlessofwhethertheinstructionismovingabyte,halfword,orword.Thiscanresultinasimpli2icationofthecircuitry.
Thespecmentionsthattheyselectedlittleendiansinceitiscommerciallydominant.Theyalsomentionthepossibilityof“bi-endian”architectures,whichmixbig-andlittle-endianness.
RISC-VArchitectureSummary/Porter Page� of� 22 323
Chapter2:BasicOrganization
TheRegisters
Forconcreteness,thisdocumentfocusesprimarilyonRV32,the32-bitvariantofRISC-V.Theboxbelowdiscusses“OtherRegisterSizes”.
Thegeneralpurposeregistersare32bits(i.e.,4bytesoroneword)inwidth.
Thereare32registers.Theregistersarenamedx0,x1,x2,…x31.
OtherRegisterSizes:WemainlyfocusondescribingtheRV32variantofRISC-V,inwhichtheregistersare32bitsinwidth.
IntheRV64variant,theregistersare64bits(i.e.,8bytesoradoubleword)insize.
IntheRV128variant,theregistersare128bits(i.e.,16bytesoraquadword)insize.
Inallcases,thenumberofregistersis32.
If2loatingpointissupported,therewillbeadditional=loatingpointregistersnamedf0,f1,f2,…f31.Theseregistersaredistinctfromx0,x1,x2,…x31.Floatingpointregisterswillbediscussedlater.
Registerx0isaspecial“zeroregister”.Whenread,itsvalueisalways0x00000000.Wheneverthereisanattempttowritetox0,thedataissimplydiscarded.
AllotherregistersaretreatedidenticallybytheISA;thereisnothingspecialaboutanyregister.
Thereisnospecialsupportfora“stackpointer,”a“framepointer,”ora“linkregister”.Althoughtheseareimportantconceptsandareusedintheimplementationofprogramminglanguages,theISAdoesnothaveanyspecializedinstructionsforthem.Anyregistercanbeusedforthesefunctions.
RISC-VArchitectureSummary/Porter Page� of� 23 323
Chapter2:BasicOrganization
ReducedNumberofGeneralPurposeRegisters:IntheRISC-V“E”extension,thenumberofregistersisreducedto16.
Theregistersarenamedx0,x1,x2,…x15.
Thisextensionisonlyapplicableto32-bitmachines.Inotherwords,thebaseISAwilleitherbeRV32IorRV32E,withtheonlydifferencebeingthenumberofgeneralpurposeregisters.For64-bitand128-bitmachines,therewillalwaysbeafullsizedregistersetwith32registers,so“E”doesnotapplytoRV64orRV128.
Allinstructionformatsarethesame.Inotherwords,thereisnochangetotheinstructionencodingsbetweenthenormalRV32IandthereducedRV32Evariants.Theonlydifferenceisthatthe5-bit2ieldsforencodingaregisternumberarelimitedtocontainingonlythevaluesof0through15(inbinary:00000through01111).
Thisextensionismeantforsimpli2ied,embedded(“E”)computers.Thespecsuggeststhatthecircuitryforafullsizedregistersetcanconsume50%ofthechiprealestate,notcountingthespaceusedforcache.Thus,dividingthenumberofregistersinhalfwillresultina25%savingsinrealestate.
CompressedInstructions:IntheRISC-V“C”extensionforcompressedinstructions,8registers(x8,x9,…x15)areeasilyaccessible.Toaccesstheotherregisters,theprogrammermayhavetouseanormal,uncompressedinstruction.Therefore,compilersareencouraged(butnotrequired)touseonlyregistersx8,x9,…x15.
Inthisextension,thefactthatcertainregisterswillbeusedasstackpointer,linkregister,etc.ismadeuseof,sosomeregistersaretreatedslightlydifferentlyhere.
Wediscusscompressedinstructionslater.
Inadditiontothegeneralpurposeregisters,thereisalsoaprogramcounter(PC),whosewidthmatchesthewidthofthegeneralpurposeregisters.
Inthevariantwefocuson,thePCregisteris32bitswide.IntheRV64andRV128variants,thesizeofthePCincreasesandmatchesthesizeofthegeneralpurposeregisters.
RISC-VArchitectureSummary/Porter Page� of� 24 323
Chapter2:BasicOrganization
ManyprocessorISAsincludea“statusregister”,sometimescalleda“conditioncoderegister.”Sucharegisterusuallycontainsbitssuchas:
•Sign/NegativeValue •Zero/Equal •CarryBit •Over2low
InotherISAs,thereisusuallyaCOMPAREinstruction(whichwillsetbitsinthestatusregister)andseveralBRANCHinstructions(whichwilltestthestatusregisterbitsandconditionallyjump).
TheRISC-Vdoesnotincludeanysuch“statusregister.”Instead,theBRANCHinstructionsinRISC-Vwillperformboththetestandtheconditionaljump.
Commentary:Thenormalpatternofmostcodeinothernon-RISC-VarchitecturesistoexecuteaCOMPAREinstructionand,immediatelyafterward,executeaBRANCHinstruction.Theygotogetherandeffectivelyperformasingle“test-and-jump”operation.BycombiningthemintoasingleinstructionintheRISC-Varchitecture,greaterperformanceef2iciencycanbeachievedwheneverthis“test-and-jump”operationmustbeperformed.
Furthermore,byeliminatingthe“statusregister”,theprocessingofinterruptsisstreamlined.Here’swhy:Thecodeinanyinterrupthandlerroutinewillcertainlymodifythestatusregister,sothereforethestatusregistermustbeautomaticallysavedwheneveraninterrupthandlerisinvokedandrestoredwheneverthehandlerreturns.Additionalarchitecturalcomplexitywouldberequiredtosupporttheseoperations(whichmustbeatomic),butthiscomplexityisavoidedwiththeRISC-Vapproach.
ControlandStatusRegisters(CSRs)
TheentirestateofarunningRISC-Vcoreconsistsof:
•Theintegerregistersx1,x2,…x31 •The2loatingpointregisters,if2loatingpointissupported. •TheProgramCounter(PC) •Asetof“ControlandStatusRegisters”(CSRs)
RISC-VArchitectureSummary/Porter Page� of� 25 323
Chapter2:BasicOrganization
The“ControlandStatusRegisters”(CSRs)areusedfortheprotectionandprivilegesystem.TheprivilegesystemisusedbytheOSkerneltoprotectitselfandmanageuser-levelprocesses.
Atanymoment,theRISC-Vprocessorwillbeexecutingeitherinuser-modeorinsupervisor-mode.Kernelcodeisexecutedinsupervisor-modeandapplicationprogramsareexecutedinuser-mode.[Actuallytherearetwonon-usermodes;we’llgointodetailslater.]
Each“ControlandStatusRegister”(CSR)hasaspecialnameandeachhasauniquefunction.Readingand/orwritingaCSRwillhaveaneffectontheprocessoroperation.ReadingandwritingtotheCSRsisusedtoperformalloperationsthatcannotbeperformedwithnormalinstructions.ThebehaviorassociatedwitheachCSRisoftenquitecomplexandeachregistermustbeunderstoodindependently.
Thereisalarge2ileofCSRs:therecanbeupto4,096CSRsde2ined.TheCSRsarereadandwrittenwithjustacoupleofgeneral-purposeinstructions.
AfewdozenCSRsarede2inedinthespec,givingtheirnames,behaviors,andeffects.ThespecleavesopenthepossibilityofmoreCSRsbeingde2inedanditisclearthattherewillbesigni2icantvariationbetweenimplementationsinthedetailsofwhichCSRsareimplementedandexactlyhoweachoneworks.
Fortunately,theCSRscanbeignoredat2irst.TheCSRsareprimarilyusedbyOScode.Forexample,theCSRsareusedforinterruptprocessing,threadswitching,andpagetablemanipulation.
Inorderunderstandtheuser-modeinstructionsetandtocreateuser-levelcode,theCSRscanandshouldbeignored,especiallyonyour2irstintroductiontoRISC-V.
Alignment
A“halfwordaligned”addressisanaddressthatisamultipleof2.Thelastbitofahalfword-alignedaddresswillalwaysbe0.Likewise,a“wordaligned”addressisamultipleof4,andendswiththebits00.Similarlyfor“doublewordalignment”and“quadwordalignment”.
RISC-VArchitectureSummary/Porter Page� of� 26 323
Chapter2:BasicOrganization
Ahalfword-sizedvalueissaidtobe“properlyaligned”ifitisstoredatahalfword-alignedaddress.Aword-sizedvalueisproperlyalignedifitisstoredataword-alignedaddress,andsimilarlyforothersizes.
RISC-VdoesnotrequiredatatobeproperlyalignedfortheLOADandSTOREinstructions.Inotherwords,anyvaluemaybestoredatanyaddress.However,properalignmentisgoodpracticeandencouraged.Inmostimplementations,LOADandSTOREinstructionswillperformmuchfasterwhenthedataisproperlyaligned.
However,thereisanalignmentrequirementforinstructions.
Instructionsare32bitsinlengthandmustbestoredatword-alignedlocations.Anattempttojumptoanunalignedaddressedwillcausean“instructionmisaligned”exception.
Commentary:IntheRISC-V“C”extensionforcompressedinstructions,halfword-sizedinstructionsareallowed.Inthiscase,thealignmentrequirementisslightlyrelaxed:Instructions(whether16or32bits),mustonlybehalfwordaligned,notwordaligned.
Sinceallinstructionsmustbeatleasthalfwordaligned,allbranchtargetaddressandtheprogramcounter(PC)valuesmustbeevennumbers,i.e.,mustendwithasingle0bit.Someinstructionsmakeuseofthefactthattheleastsigni2icantbitisalways0,allowingamoreef2icientencodingofinstructionsbyleavingthis2inalbitimplicit.
RISC-VArchitectureSummary/Porter Page� of� 27 323
Chapter3:Instructions
Instructions
Instructionsare32bits(4bytes,1word)inlengthandmustbestoredatword-alignedmemorylocations.InstructionsfortheRV64andRV128variantsarestill32bitslong.
Somecomputerarchitecturesaresaidtoemploy“two-address”instructions.Forexample,thefollowinginstructionmentionsonlytworegisters.(ThisisnotRISC-V.)
ADD x4,x5 # x4 = x4 + x5
RISC-Visa“threeaddress”architecture.TheADDinstructionlookslikethis:
ADD x4,x5,x7 # x4 = x5 + x7
NotethatintheRISC-Vassemblylanguage,thedestinationistypicallytheleftmostoperand.
TheCompressedInstructionExtension
Thischapterdescribesthe32-bitinstructions.Howeverinthissection,weintroducethe16-bitcompressedinstructions.Wewillcoverthecompressedinstructionsfullyinalaterchapter.
IntheRISC-V“C”extensionforcompressedinstructions,halfword-sizedinstructionsarealsoallowed.Ifthisextensionisimplemented,both16-bitand32-bitinstructionscanbeused.
RISC-VArchitectureSummary/Porter Page� of�28 323
Chapter3:Instructions
The16-bitand32-bitinstructionscanbeintermixed.Thereisno“mode”bittoputtheprocessorinto“compressed-instruction”mode,asthereisinotherprocessors.
Withthisextension,thealignmentrestrictionforinstructionsisrelaxedtohalfword-alignment.
Eachcompressed16-bitinstructionisexactlyequivalentinfunctiontoa32-bitinstruction.However,therearemany32-bitinstructionsforwhichthereisnoequivalent16-bitversion.
Thus,eachcompressed16-bitinstructioncanbethoughtofasashorthandforsomelonger32-bitinstruction.Theideaisthatthemostfrequentlyusedinstructionshave16-bitversions.Byusingtheshorterversions,thesizeofprogramcodecanbereduced.TheRISC-Vdesignershavedoneresearchtodeterminewhichinstructionsarethebestcandidatestobegivencompressedvariants.
Commentary:Reducingthesizeofcoderesultsinincreasedprocessorperformancesinceitallowsmoreinstructionstobecached,reducingthetimetofetchinstructionsfrommainmemory,whichisoftenaperformancebottleneck.
Inatypicalhardwareimplementation,whenacompressedinstructionisfetchedandloadedintotheInstructionRegister(IR)priortobeingexecuted,thehardwarewillnoticethatitisacompressedinstruction.Atthattime,thecompressedinstructionwillimmediatelybeexpandedfrom16bitsintotheequivalent32bitinstruction.Thereafter,thereisnoneedforanyadditionalhardwarelogictosupportthecompressedinstructionset.
Toaddress32registers,5bitsarerequired.Consideraninstructionwhichrefersto3registers,suchas:
ADD x8,x9,x10
Ifthisinstructionisencodedusing5bitsforeachregister,15bitswouldbeconsumed.Withonly16bitstoworkwithincompressedinstructions,thiswillclearlynotwork,sinceonly1bitisleftforencodingtheopcode.
Instead,manyofthecompressedinstructionsrestrictaccesstoonly8oftheregisters.Withthisrestriction,registerscanbeencodedusingonly3bits.
RISC-VArchitectureSummary/Porter Page� of� 29 323
Chapter3:Instructions
Manyofthecompressedinstructionsonlyallowaccesstoregistersx8,x9,…x15.However,certainotherregistersare(byconventionandhabit)usedforspecialpurposes,suchasfora“stackpointer”(x2)or“linkregister/returnaddress”(x1),andsomecompressedinstructionsimplicitlyrefertotheseregisters.
Somecompressedinstructionsallowanyofthe32registerstobespeci2ied,employinga5-bit2ieldtoencodetheregister.Obviously,theseinstructionsdon’thaveroomforthreesuchregisteroperands.
Ingeneral,RISC-Vusesthree-addressinstructions.However,somecompressedinstructionsusethetwo-addressapproach.Forexample,theinstruction
ADD x8,x8,x10 # Add x10 to x8 canberepresentedincompressedformsincethedestinationregisterisalsoanoperand.
Moreprecisely,acompressedinstructionoftheform:
C.ADDW RegD,RegB # RegD = RegD + RegB
canbeexpandedto
ADDW RegD,RegD,RegB # Equivalent 32-bit inst
Inassemblylanguage,compressedinstructionsareidenti2iedwiththe“C.”pre2ix.
Theinstructionopcodehereendswitha“W”suf2ix,indicatingthatitoperatesona32-bitword,asopposedto64-bitor128-bitvalues.
Inatypicalimplementation,acompressedinstructionwillbeexpandedtotheequivalent32-bitinstructionbytheFETCHandDECODEhardware,afteraninstructionisfetchedfrommemory,directlybeforeitisexecuted.
However,theexpansiontoafull-sized32-bitinstructioncouldbeperformedbytheassembler.Thiswouldbeusefulwhenassemblingforamachinethatdoesnotsupportthecompressedextension.
AssemblerNote:Asophisticatedassemblerwillautomaticallygeneratecompressedinstructionswheneveritcan.Theideaisthattheprogrammer(or
RISC-VArchitectureSummary/Porter Page� of� 30 323
Chapter3:Instructions
compiler)willcreateonly32-bitinstructions.Uponencounteringa32-bitinstructionthatcanalsobecodedasa16-bitinstruction,theassemblerwillchoosethesmallerinstruction.Suchanassemblerwillrelieveprogrammers(andcompilers)fromtheburdenofselectingcompressedinstructions,althoughasophisticatedcompilermaybeabletogenerateshortercodesequencesifitisawareofwhichinstructionscanbecompressed.
InstructionEncoding
Eachinstructionis32-bitsinlength.
Inotherwords,theISAisbuiltarounda32-bitinstructionsize.Eachcompressed16-bitinstructioncanalsoberepresentedasaequivalent32-bitinstruction,sothesetof32-bitinstructionfullydescribestheavailableinstructions.Theleastsigni2icant2bitsalwaysindicatewhethertheinstructionis16or32bits,sotheprocessorcanimmediatelytellwhetherornotithasjustloadedacompressedinstruction.
Details:Abitpatternendingin11indicatesa32-bitinstruction.Theotherbitpatterns(00,01,10)indicateacompressedinstruction.Thisapproachreducestheavailablenumberofunique16-bitinstructionsby¼,andreducesthenumberofeffectivebitsavailableforthe32-bitinstructionsto30bits:areasonablecompromise.
Thedistinguishingbitsareintheleastsigni2icantpositions,whichisappropriateforalittleendianarchitecture.
Thereisalsoaprovisionforinstructionsthatarelongerthan32bits,butnosuchinstructionsarespeci2ied.Suchinstructionswouldbeanon-standardextension.
Longerinstructionsmustbeamultipleof16bitsinlength.Anencodingschemeisspeci2ied,whichshowshowinstructionsoflength48bits,64bits,etc.aretohavetheirlengthencoded.Beyondthelengthencoding,furtherinstructionsdetailsareleftunspeci2iedanduptotheimplementersofnon-standarddevices.
RISC-VArchitectureSummary/Porter Page� of� 31 323
Chapter3:Instructions
Forinstructionsoflength32bits,theleastsigni2icant2bitsmustbe11.Inaddition,thenext3bitsmustnotbe111,sincethisindicatesaninstructionoflengthgreaterthan32-bits.
Compressed(16-bit)Instructions:xxxx xxxx xxxx xxAA (whereAA≠11)
Normal(32-bit)Instructions:xxxx xxxx xxxx xxxx xxxx xxxx xxxB BB11 (whereBBB≠111)
LongerInstructions:xxxx xxxx ... xxxx xxxx xxxx xxxx xxx1 1111
Consulttheof2icialdocumentationfordetailsforlongerinstructions.
Inthissection,wewilldiscussonly32-bitinstructions.
Sincethereare32registers,a2ieldwithawidthof5bitsisusedtoencodeeachregisteroperandwithininstructions.
Theprototypicalinstructionmentions3registers,whicharesymbolicallycalled
RegD–Thedestination Reg1–The2irstoperand Reg2–Thesecondoperand
Inaddition,anumberofinstructionscontainimmediatedata.Theimmediatedatavalueisalwayssign-extendedtoyielda32-bitvalue.
Thefollowingsizesofimmediatedatavaluesareused:
Notation Fieldwidth NumberofValues Rangeofvalues Immed-12 12bits 212=4Ki -2,048..+2,047 Immed-20 20bits 220=1Mi -524,288..+524,287
Commentary:Loadinganarbitrary32bitvalueintoaregisterusinginstructionswhicharethemselvesonly32-bitsnecessarilyrequirestwoinstructions.
Notethat12+20equals32.InRISC-V,anarbitrary32-bitvaluecanbesplitintotwopiecesandloadedintoaregisterwithtwoinstructions.Thedesignershavechosenthe12-20splitforanumberofreasons.Manycommonconstantsand
RISC-VArchitectureSummary/Porter Page� of� 32 323
Chapter3:Instructions
offsetsfallwithintherangeofthesmallerimmed-12values.Also,theRISC-Vvirtualmemorysupportsapagesizeof4KiBytes.
Herearetheinstructionformats:
R-typeinstructions: Operands: RegD,Reg1,Reg2 Example:
ADD x4,x6,x8 # x4 = x6+x8 I-typeinstructions: Operands: RegD,Reg1,Immed-12 Examples:
ADDI x4,x6,123 # x4 = x6+123LW x4,8(x6) # x4 = Mem[8+x6]
S-typeinstructions: Operands: Reg1,Reg2,Immed-12 Example:
SW x4,8(x6) # Mem[8+r6] = x4 (word) B-typeinstructions(avariantofS-type): Operands: Reg1,Reg2,Immed-12 Example:
blt x4,x6,loop # if x4<x6, goto offset(pc) U-typeinstructions: Operands: RegD,Immed-20 Example:
LUI x4,0x12AB7 # x4 = value<<12AUIPC x4,0x12AB7 # x4 = (value<<12) + pc
J-typeinstructions(avariantofU-type): Operands: RegD,Immed-20 Example:
jal x4,foo # call: pc=offset+pc; x4=ret addr
TheonlydifferencebetweenS-typeandB-typeinstructionsishowthe12-bitimmediatevalueishandled.InanB-typeinstruction,theimmediatevalueis
RISC-VArchitectureSummary/Porter Page� of� 33 323
Chapter3:Instructions
multipliedby2(i.e.,shiftedleft1bit)beforebeingused.IntheS-typeinstruction,thevalueisnotshifted.
Inbothcases,sign-extensionoccurs.Inparticular,thebitstotheleftaresynthesizedby2illingtheminwithacopyofthemostsigni2icantbitactuallypresent.
S-typeimmediatevalues: Actualvalueused(wheres=sign-extension):
ssss ssss ssss ssss ssss VVVV VVVV VVVV Rangeofvalues:
-2,048..+2,0470xFFFFF800..0x000007FF
B-typeimmediatevalues: Actualvalueused(wheres=sign-extension):
ssss ssss ssss ssss sssV VVVV VVVV VVV0 Rangeofvalues:
-4,096..+4,094(inmultiplesof2)0xFFFFF000..0x00000FFE
TheonlydifferencebetweenU-typeandJ-typeinstructionsishowthe20-bitimmediatevalueishandled.InaU-typeinstruction,theimmediatevalueisshiftedleftby12bitstogivea32bitvalue.Inotherwords,theimmediatevalueisplacedintheuppermost20bits,andthelower12bitsarezero-2illed.
InaJ-typeinstruction,theimmediatevalueisshiftedleftby1bit(i.e.,multipliedby2).Itisalsosign-extended.
U-typeimmediatevalues: Actualvalueused:
VVVV VVVV VVVV VVVV VVVV 0000 0000 0000 Thevalueisalwaysalignedtoamultipleof4,096. J-typeimmediatevalues: Actualvalueused(wheres=sign-extension):
ssss ssss sssV VVVV VVVV VVVV VVVV VVV0 Rangeofvalues:
-1,048,576..+1,048,574(inmultiplesof2)0xFFF00000..0x000FFFFE
Next,wegivetheencodingsforthedifferenttypesofinstructions.Inthefollowing,eachletterrepresentsasinglebit,accordingtothefollowinglegend:
RISC-VArchitectureSummary/Porter Page� of� 34 323
Chapter3:Instructions
DDDDD=RegD 11111=Reg1 22222=Reg2 VVVVV=Immediatevalue XXXXX=Op-code/functioncode
R-typeinstructions: Operands: RegD,Reg1,Reg2 Encoding:
XXXX XXX2 2222 1111 1XXX DDDD DXXX XXXX I-typeinstructions: Operands: RegD,Reg1,Immed-12 Encoding:
VVVV VVVV VVVV 1111 1XXX DDDD DXXX XXXX S-typeandB-typeinstructions: Operands: Reg1,Reg2,Immed-12 Encoding:
VVVV VVV2 2222 1111 1XXX VVVV VXXX XXXX U-typeandJ-typeinstructions: Operands: RegD,Immed-20 Encoding:
VVVV VVVV VVVV VVVV VVVV DDDD DXXX XXXX
Commentary:NotethatReg1,Reg2,andRegDoccurinthesameplaceinallinstructionformats.
Thissimpli2iesthechipcircuitry,whichwouldbemorecomplexif,forexample,RegDwassometimesinonepartoftheinstructionandothertimesinadifferentplaceintheinstruction.
Aconsequenceofkeepingtheregister2ieldsinthatsameplaceisthatimmediatedatavaluesinaninstructionaresometimesnotalwaysincontiguousbits.AndnotethatthebitsencodingtheimmediatevaluesintheS-typeandB-typeinstructionsarenotcontiguous.
RISC-VArchitectureSummary/Porter Page� of� 35 323
Chapter3:Instructions
I-type: VVVV VVVV VVVV ---- ---- ---- ---- ——
S-typeandB-type: VVVV VVV- ---- ---- ---- VVVV V--- ——
U-typeandJ-type: VVVV VVVV VVVV VVVV VVVV ---- ---- ——
WhilebothS-typeandB-typehavea12-bitimmediatevalue,thepreciseorderofthebitsinthose2ieldsdiffersbetweenthetwoformats.Whileyouwouldreasonablyassumethebitsareinorder,theyareinfactscrambledupalittleintheB-typeinstructionformat.Similarly,thebitsofthe20-bitimmediatevalue2ieldintheJ-typeformatsarescrambled,ascomparedtoU-type.Consultthespecfordetails.
Immediatevaluesarealwayssign-extended.Althoughtheimmediatevaluecanbedifferentsizesandmaybebrokenintomultiple2ields,thesignbitisalwaysinthesameinstructionbit(namelytheleftmostbit,bit31).Thissimpli2iesandspeedssign-extensioncircuitry.
User-LevelInstructions
TheRISC-Vspeci2icationbreakstheinstructionsintotwobroadcategories:“user-level”instructionsand“privileged”instructions.Webeginbydescribingtheindividualuser-levelinstructions.We’llcovertheprivilegedinstructionslater.
Theinstructionsetisquitesmallandtightlyde2ined;therearenotasmanyinstructionsintheRISC-Varchitectureasinotherarchitectures.
Severalfamiliarinstructions(whichyouwould2indinotherarchitectures)aresimplyspecialcasesofmoregeneralRISC-Vinstructions.
Theof2icialRISC-Vdocumentationfocusesontheactualinstructionsandmentionsspecialcasesonlybrie2ly,inconnectionwiththemoregeneralinstruction.Wewillseparateoutsomeofthesespecialcasesandpresentthemasusefulinstructionsintheirownright,mentioningthattheyareactuallyimplementedasspecialcasesof
RISC-VArchitectureSummary/Porter Page� of� 36 323
Chapter3:Instructions
otherinstructions.Also,someotheruseful“instructions”areactuallyexpandedbytheassemblerintoasequenceoftwosimplerinstructions.
TheRISC-Vstandarddescribethreedifferentmachinesizes:
RV32 32-bitregisters RV64 64-bitregisters RV128 128-bitregisters
AnyparticularRISC-Vhardwarechipwillimplementonlyoneoftheabovestandards.However,eachsizeisstrictlymorepowerfulthanthesmallersizes.AnRV64chipwillincludealltheinstructionsnecessarytomanipulate32-bitdata,soitcaneasilydoanythinganRV32chipcan.Likewise,anRV128chipisastrictsupersetoftheRV32andRV64chips.
Youshouldreadtheinstructiondescriptionsbelowassumingthattheregisterwidthis32bits.However,thesamedescriptionsapplyandmakesensefor64-bitor128-bitregisters,exceptwherespeci2icallynoted.
Regarding64-bitand128-bitextensions:Forthelargermachinesizes,thebasic32-bitinstructionsworkidentically.
Insummary,smallerdatavaluesaresign-extendedto2itintolargerregisters.Thus,32-bitvaluesaresign-extendedto64bitsonanRV64machine.Thismeansyoucanrun32-bitcodedirectlyona64-bitmachinewithnochanges!
Ofcourse,someinstructionsfromaRV64machinewillnotbepresentonaRV32machine,socodethattrulyrequires64-bitswon’twork.
Whenusinga64-bitmachinetoexecutecodewrittenfora32-bitmachine,keepinmindthattheupper32bitsofregisterswillnormallycontainthesignextension,andnotazeroextension.
Likewise,32-bitand64-bitvaluesaresign-extendedto128-bitsonanRV128machine.
The12-bitand20-bitimmediatevaluesininstructionsaresign-extendedtothebasicmachinesize.ThisincludestheimmediatevaluesinthetwoU-typeinstructions(LUIandAUIPC).
RISC-VArchitectureSummary/Porter Page� of� 37 323
Chapter3:Instructions
Forinstructionsthatarenotpresentonallmachinesizes,wemakespecialnotes.Otherwise,theinstructionispresentinallvariants.
Theof2icialdocumentationdoesnotdwellontheassemblernotation.Theassemblernotationpresentedhereissometimesa“bestguess”ofwhatisacceptedbythetypicalRISC-Vassemblertools.
ArithmeticInstructions(ADD,SUB,…)
AddImmediate
GeneralForm:ADDI RegD,Reg1,Immed-12
Example:ADDI x4,x9,123 # x4 = x9 + 0x0000007B
Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)isaddedtothecontentsofReg1andtheresultisplacedinRegD.
Comments:Thereisno“subtractimmediate”instructionbecausesubtractionisequivalenttoaddinganegativevalue.
Encoding: ThisisanI-typeinstruction.
Commentary:Thereisnodistinctionbetweensignedandunsignedaddition;thesamehardwareyieldscorrectresultregardlessofwhetherbothvaluesareconsideredtorepresentsignedvaluesorbothareconsideredtorepresentunsignedvalues.
Theonlydistinctionbetweensignedandunsignedadditionisinhowover2lowistobedetected.Thesameistrueofsubtraction.ThisisnotuniquetoRISC-V;itistrueofallcomputers.
Over2lowisignoredinRISC-V.Whenover2lowoccurs,noexceptionsareraisedandnostatusbitsareset.Thissatis2iestheneedsofthe“C”language,butmaybeproblematicforlanguagesthatmandateover2lowdetection.
RISC-VArchitectureSummary/Porter Page� of� 38 323
Chapter3:Instructions
Thus,RISC-Vhasonlyonesortofaddition;RISC-Vdoesnotdistinguishbetweensignedandunsignedaddition.Likewise,thereisnodistinctionbetweensignedandunsignedsubtraction.
Notethatover2lowdetectioncanbeprogrammedfairlysimply.Forexample,thefollowingcodesequencewilladdtwounsignedvaluesandbranchonover2low: ADDI RegD,Reg1,Immed-12
BLTU RegD,Reg1,overflow ifRegD<UReg1thenbranch
Thefollowingcodesequencewilladdtwosignedvaluesandbranchonover2low.However,itwillonlyfunctioncorrectlyaslongasweknowthattheimmediatevalueispositive. ADDI RegD,Reg1,Immed-12
BLT RegD,Reg1,overflow ifRegD<SReg1thenbranch
Inthegeneralcaseofsignedaddition,youcanusethefollowingcodesequence,whichwillrequireacoupleofadditionalregisters: ADD RegD,Reg1,Reg2
SLTI Reg3,Reg2,0 Reg3=(Reg2<S0)?1:0 SLT Reg4,RegD,Reg1 Reg4=(RegD<SReg1)?1:0 BNE Reg3,Reg4,overflow ifReg3≠Reg4thenbranch
AddImmediateWord
GeneralForm:ADDIW RegD,Reg1,Immed-12
Example:ADDIW x4,x9,123 # x4 = x9 + 0x0000007B
Description:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic.
Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)isaddedtothecontentsofReg1.Theresultisthentruncatedto32-bits,signed-extendedto64or128bitsandplacedinRegD.
Encoding: ThisisanI-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 39 323
Chapter3:Instructions
RV32/RV64/RV128:TheADDIinstructionispresentinallmachinevariations.TheADDIinstructionisusedtoperform32-bitadditionona32-bitmachine,64-bitadditionona64-bitmachine,and128-bitadditionona128-bitmachine.
Toperform32-bitadditionona64-bitor128-bitmachine,theADDIWinstructionisused.
Toperform64-bitadditionona128-bitmachine,theADDIDinstructionisused.
Whenworkingwith32-bitdataona64-bitmachine,theupperhalfoftheregisterwillcontainnothingbutthesignextensionofthelowerhalf.Thisistrueregardlessofwhetherthedatainthelower32bitsistobeinterpretedasasignedinteger,anunsignedinteger,orsomethingotherthananinteger.
TounderstandhowtheADDIinstructiondiffersfromADDIWona64-bitmachine,considerthisexample:
0x 0000 0000 0000 0001 + 0x 0000 0000 7FFF FFFF -------------------------- 0x 0000 0000 8000 0000 Result of ADDI
0x 0000 0000 0000 0001 + 0x 0000 0000 7FFF FFFF -------------------------- 0x FFFF FFFF 8000 0000 Result of ADDIW
AddImmediateDouble
GeneralForm:ADDID RegD,Reg1,Immed-12
Example:ADDID x4,x9,123 # x4 = x9 + 0x0000007B
Description:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.
RISC-VArchitectureSummary/Porter Page� of� 40 323
Chapter3:Instructions
Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)isaddedtothecontentsofReg1.Theresultisthentruncatedto64-bits,signed-extendedto128bitsandplacedinRegD.
Encoding: ThisisanI-typeinstruction.
SignExtension:Considertheproblemofsign-extendinga32-bitvaluetoa64-bitvalueonaRV64machine.Forexample,the64-bitvalue:
0x 3B4C 204E 92A7 5321
shouldbe“coerced”to2itwithintherangeofa32-bitsignedinteger,resultingin:
0x FFFF FFFF 92A7 5321
TheADDIWinstructioncandothis.Simplyadding0tothevaluewillproducethedesiredresult.
Likewise,theADDIDinstructioncanbeusedtocoercea128-bitvalueintoa64-bitvalue.
Add
GeneralForm:ADD RegD,Reg1,Reg2
Example:ADD x4,x9,x13 # x4 = x9+x13
Description:ThecontentsofReg1isaddedtothecontentsofReg2andtheresultisplacedinRegD.
Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowisignored.
Encoding: ThisisanR-typeinstruction.
RV32/RV64/RV128:TheADDinstructionispresentinallmachinevariations.TheADDinstructionisusedtoperform32-bitadditionona32-bitmachine,64-bitadditionona64-bitmachine,and128-bitadditionona128-bitmachine.
RISC-VArchitectureSummary/Porter Page� of� 41 323
Chapter3:Instructions
Toperform32-bitadditionona64-bitor128-bitmachine,theADDWinstructionisused.
Toperform64-bitadditionona128-bitmachine,theADDDinstructionisused.
AddWord
GeneralForm:ADDW RegD,Reg1,Reg2
Example:ADDW x4,x9,x13 # x4 = x9+x13
Description:ThecontentsofReg1isaddedtothecontentsofReg2andtheresultisplacedinRegD.
RV32/RV64/RV128:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic.
Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowbeyond32-bitsisignored.The32-bitresultissign-extendedto2illtheupperbitsofthedestinationregister.
Encoding: ThisisanR-typeinstruction.
AddDouble
GeneralForm:ADDD RegD,Reg1,Reg2
Example:ADDD x4,x9,x13 # x4 = x9+x13
Description:ThecontentsofReg1isaddedtothecontentsofReg2andtheresultisplacedinRegD.
RV32/RV64/RV128:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.
Comments:
RISC-VArchitectureSummary/Porter Page� of� 42 323
Chapter3:Instructions
Thereisnodistinctionbetweensignedandunsigned.Over2lowbeyond64-bitsisignored.The64-bitresultissign-extendedto2illtheupperbitsofthedestinationregister.
Encoding: ThisisanR-typeinstruction.
Commentary:TheRISC-Vauthorsdecidedtode2inemanyinstructionsgenerally,withoutspecifyinghowmanybitstheyoperateon.Instead,thenumberofbitsoperatedonbytheinstructionisdeterminedbytheregistersizeofthemachine.
Forexample,theADDinstructionperforms32-bitadditiononaRV32machine,64-bitadditiononanRV64machine,and128-bitadditiononanRV128machine.
Toperform32-bitadditionona64-bitmachine,anewinstructionisincluded,sincetheupper32bitsoftheoperandsmustbeignoredandtheresultmusthavetheupper32bits2illedwiththepropersign-extension.
Soforthe64-bitmachine(RV64),theyaddedasecondinstruction(ADDW)toadd32-bitquantities.ForRV128,theyaddedathirdinstruction(ADDD)toadd64-bitquantities.
Thefollowingtableshowshowmanybitseachinstructionoperatesonandhighlightsaslightlackoforthogonality:
SizeofMachine 32-bit 64-bit 128-bit BasicInstructionSet: ADD 32 64 128 RV64adds: ADDW 32 32 RV128adds: ADDD 64
Analternateapproachisthefollowing:(ThisisnotRISC-V.)
SpecifythatthebasicADDinstructionwillperform32-bitadditionregardlessofregistersize;for64-bitand128-bitmachines,thisinstructionwillignoretheupperbitsoftheoperandsandsign-extendtheresult.Then,forRV64andRV128machines,anewinstruction(notpresentonRV32machines)isincludedto
RISC-VArchitectureSummary/Porter Page� of� 43 323
Chapter3:Instructions
perform64bitaddition.Finally,forRV128machines,athirdinstructionisincludedtoperform128bitaddition.
Toillustratethisalternateapproach,wecanmakeupreasonablenamesforthesehypotheticalinstructionsandshowwhichmachineshavewhichinstructionswiththefollowingtable: SizeofMachine 32-bit 64-bit 128-bit BasicInstructionSet: ADDW 32 32 32 RV64adds: ADDD 64 64 RV128adds: ADDQ 128
Thesetwoapproachesareequivalent,havingthesameinstructions,whichonlydifferintheirnames.Thenameschosenforinstructionsisanissueorthogonaltohowtheinstructionsareencoded.TheRISC-Vdesignersselectedthe2irstnamingschemetomirrortheinstructionopcodeencodingpatternstheychose.
Thisissueappliestothefollowinginstructiongroups:
ADDI ADD SUBI SUB NEG SLLI SLL SRLI SRL SRAI SRA LD ST MUL DIV DIVU REM
RISC-VArchitectureSummary/Porter Page� of� 44 323
Chapter3:Instructions
REMU
Ormorespeci2ically:
RV32 RV64 RV128 ADDI ADDIW ADDID ADD ADDW ADDD SUBI SUBIW SUBID SUB SUBW SUBD NEG NEGW NEGD SLLI SLLIW SLLID SLL SLLW SLLD SRLI SRLIW SRLID SRL SRLW SRLD SRAI SRAIW SRAID SRA SRAW SRAD LD LDW LDD ST STW STD MUL MULW MULD DIV DIVW DIVD DIVU DIVUW DIVUD REM REMW REMD REMU REMUW REMUD
Subtract
GeneralForm:SUB RegD,Reg1,Reg2
Example:SUB x4,x9,x13 # x4 = x9-x13
Description:ThecontentsofReg2issubtractedfromthecontentsofReg1andtheresultisplacedinRegD.
Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowisignored.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 45 323
Chapter3:Instructions
SubtractWord
GeneralForm:SUBW RegD,Reg1,Reg2
Example:SUBW x4,x9,x13 # x4 = x9-x13
Description:ThecontentsofReg2issubtractedfromthecontentsofReg1andtheresultisplacedinRegD.
RV32/RV64/RV128:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic.
Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowbeyond32-bitsisignored.The32-bitresultissign-extendedto2illtheupperbitsofthedestinationregister.
Encoding: ThisisanR-typeinstruction.
SubtractDouble
GeneralForm:SUBD RegD,Reg1,Reg2
Example:SUBD x4,x9,x13 # x4 = x9-x13
Description:ThecontentsofReg2issubtractedfromthecontentsofReg1andtheresultisplacedinRegD.
RV32/RV64/RV128:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.
Comments:Thereisnodistinctionbetweensignedandunsigned.Over2lowbeyond64-bitsisignored.The64-bitresultissign-extendedto2illtheupperbitsofthedestinationregister.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 46 323
Chapter3:Instructions
SignExtendWordtoDoubleword
GeneralForm:SEXT.W RegD,Reg1
Example:SEXT.W x4,x9 # x4 = Sign-extend(x9)
Description:Thisinstructionisonlyavailablefor64-bitand128-bitmachines.
Thevalueinthelower32bitsofReg1issigned-extendedto64or128bitsandplacedinRegD.
Comments:Thisinstructionisusefulwhena32-bitsignedvaluemustbe“coerced”toalargervalueon64-bitand128-bitmachine.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
ADDIW RegD,Reg1,0
Commentary:TheRISC-Vdocumentationusestwodistinctsuf2ixnotationstodenotethesizeofinstructions.Forexample:
ADDIWSEXT.W
SignExtendDoublewordtoQuadword
GeneralForm:SEXT.D RegD,Reg1
Example:SEXT.D x4,x9 # x4 = Sign-extend(x9)
Description:Thisinstructionisonlyavailablefor128-bitmachines.
Thevalueinthelower64bitsofReg1issigned-extendedto128bitsandplacedinRegD.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
RISC-VArchitectureSummary/Porter Page� of� 47 323
Chapter3:Instructions
ADDID RegD,Reg1,0
Negate
GeneralForm:NEG RegD,Reg2
Example:NEG x4,x9 # x4 = -x9
Description:ThecontentsofReg2isarithmeticallynegatedandtheresultisplacedinRegD.
Comments:Theresultiscomputedbysubtractionfromzero.Over2lowcanonlyoccurwhenthemostnegativevalueisnegated.Over2lowisignored.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
SUB RegD,x0,Reg2
NegateWord
GeneralForm:NEGW RegD,Reg2
Example:NEGW x4,x9 # x4 = -x9
Description:ThecontentsofReg2isarithmeticallynegatedandtheresultisplacedinRegD.
RV64/RV128:Thisinstructionisonlypresentin64-bitand128-bitmachines.Theoperationisperformedusing32-bitarithmetic,whereastheNEGinstructionoperateson64-bitor128-bitquantitiesinthelargermachines.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
SUBW RegD,x0,Reg2
RISC-VArchitectureSummary/Porter Page� of� 48 323
Chapter3:Instructions
NegateDoubleword
GeneralForm:NEGD RegD,Reg2
Example:NEGD x4,x9 # x4 = -x9
Description:ThecontentsofReg2isarithmeticallynegatedandtheresultisplacedinRegD.
RV64/RV128:Thisinstructionisonlypresentin128-bitmachines.Theoperationisperformedusing64-bitarithmetic.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
SUBD RegD,x0,Reg2
LogicalInstructions(AND,OR,XOR,…)
AndImmediate
GeneralForm:ANDI RegD,Reg1,Immed-12
Example:ANDI x4,x9,123 # x4 = x9 & 0x0000007B
Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)islogicallyANDedwiththecontentsofReg1andtheresultisplacedinRegD.
Encoding: ThisisanI-typeinstruction.
And
GeneralForm:AND RegD,Reg1,Reg2
RISC-VArchitectureSummary/Porter Page� of� 49 323
Chapter3:Instructions
Example:AND x4,x9,x13 # x4 = x9 & x13
Description:ThecontentsofReg1islogicallyANDedwiththecontentsofReg2andtheresultisplacedinRegD.
Encoding: ThisisanR-typeinstruction.
OrImmediate
GeneralForm:ORI RegD,Reg1,Immed-12
Example:ORI x4,x9,123 # x4 = x9 | 0x0000007B
Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)islogicallyORedwiththecontentsofReg1andtheresultisplacedinRegD.
Encoding: ThisisanI-typeinstruction.
Or
GeneralForm:OR RegD,Reg1,Reg2
Example:OR x4,x9,x13 # x4 = x9 | x13
Description:ThecontentsofReg1islogicallyORedwiththecontentsofReg2andtheresultisplacedinRegD.
Encoding: ThisisanR-typeinstruction.
XorImmediate
GeneralForm:XORI RegD,Reg1,Immed-12
RISC-VArchitectureSummary/Porter Page� of� 50 323
Chapter3:Instructions
Example:XORI x4,x9,123 # x4 = x9 ^ 0x0000007B
Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)islogicalXORedwiththecontentsofReg1andtheresultisplacedinRegD.
Encoding: ThisisanI-typeinstruction.
Xor
GeneralForm:XOR RegD,Reg1,Reg2
Example:XOR x4,x9,x13 # x4 = x9 ^ x13
Description:ThecontentsofReg1islogicallyXORedwiththecontentsofReg2andtheresultisplacedinRegD.
Encoding: ThisisanR-typeinstruction.
Not
GeneralForm:NOT RegD,Reg1
Example:NOT x4,x9 # x4 = ~x9
Description:ThecontentsofReg1isfetchedandeachofthebitsis2lipped.TheresultingvalueiscopiedintoRegD.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
XORI RegD,Reg1,-1 # Note that -1 = 0xFFFFFFFF
RISC-VArchitectureSummary/Porter Page� of� 51 323
Chapter3:Instructions
ShiftingInstructions(SLL,SRL,SRA,…)
ShiftLeftLogicalImmediate
GeneralForm:SLLI RegD,Reg1,Immed-12
Example:SLLI x4,x9,5 # x4 = x9<<5
Description:Theimmediatevaluedeterminesthenumberofbitstoshift.ThecontentsofReg1isshiftedleftthatmanybitsandtheresultisplacedinRegD.Theshiftvalueisnotadjusted,i.e.,0meansnoshiftingisdone.(Apparentlyallvalues,including0,areallowed.???)
RV64/RV128:For32-bitmachines,theshiftamountmustbewithin0..31.For64-bitmachines,theshiftamountmustbewithin0..63.For128-bitmachines,theshiftamountmustbewithin0..127.
Encoding: ThisisanI-typeinstruction.
ShiftLeftLogical
GeneralForm:SLL RegD,Reg1,Reg2
Example:SLL x4,x9,x13 # x4 = x9<<x13
Description:RegisterReg2containstheshiftamount.ThecontentsofReg1isshiftedleftandtheresultisplacedinRegD.
RV64/RV128:For32-bitmachines,theshiftamountmustbewithin0..31.For64-bitmachines,theshiftamountmustbewithin0..63.For128-bitmachines,theshiftamountmustbewithin0..127.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 52 323
Chapter3:Instructions
ShiftRightLogicalImmediate
GeneralForm:SRLI RegD,Reg1,Immed-12
Example:SRLI x4,x9,5 # x4 = x9>>5
Description:Theimmediatevaluedeterminesthenumberofbitstoshift.ThecontentsofReg1isshiftedrightthatmanybitsandtheresultisplacedinRegD.Theshiftis“logical”,i.e.,zerobitsarerepeatedlyshiftedinonthemost-signi2icantend.
RV64/RV128:For32-bitmachines,theshiftamountmustbewithin0..31.For64-bitmachines,theshiftamountmustbewithin0..63.For128-bitmachines,theshiftamountmustbewithin0..127.
Encoding: ThisisanI-typeinstruction.
ShiftRightLogical
GeneralForm:SRL RegD,Reg1,Reg2
Example:SRL x4,x9,x13 # x4 = x9>>x13
Description:RegisterReg2containstheshiftamount.ThecontentsofReg1isshiftedrightandtheresultisplacedinRegD.Theshiftis“logical”,i.e.,zerobitsarerepeatedlyshiftedinonthemost-signi2icantend.
RV64/RV128:For32-bitmachines,theshiftamountmustbewithin0..31.For64-bitmachines,theshiftamountmustbewithin0..63.For128-bitmachines,theshiftamountmustbewithin0..127.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 53 323
Chapter3:Instructions
ShiftRightArithmeticImmediate
GeneralForm:SRAI RegD,Reg1,Immed-12
Example:SRAI x4,x9,5 # x4 = x9>>>5
Description:Theimmediatevaluedeterminesthenumberofbitstoshift.ThecontentsofReg1isshiftedrightthatmanybitsandtheresultisplacedinRegD.Theshiftis“arithmetic”,i.e.,thesignbitisrepeatedlyshiftedinonthemost-signi2icantend.
RV64/RV128:For32-bitmachines,theshiftamountmustbewithin0..31.For64-bitmachines,theshiftamountmustbewithin0..63.For128-bitmachines,theshiftamountmustbewithin0..127.
Encoding: ThisisanI-typeinstruction.
ShiftRightArithmetic
GeneralForm:SRA RegD,Reg1,Reg2
Example:SRA x4,x9,x13 # x4 = x9>>>x13
Description:RegisterReg2containstheshiftamount.ThecontentsofReg1isshiftedrightandtheresultisplacedinRegD.Theshiftis“arithmetic”,i.e.,thesignbitisrepeatedlyshiftedinonthemost-signi2icantend.
RV64/RV128:For32-bitmachines,theshiftamountmustbewithin0..31.For64-bitmachines,theshiftamountmustbewithin0..63.For128-bitmachines,theshiftamountmustbewithin0..127.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 54 323
Chapter3:Instructions
RV32/RV64/RV128:Theshiftoperationsde2inedaboveworkontheentireregister,regardlessofitssize.Fora64-bitmachine,theyshiftall64bits;fora128-bitmachine,theyshiftall128bits.
Notethatwhenusing64-bitregisterstocontain32-bitvalues,wecannotsimplyuse64-bitshiftoperations.Weneedsomenewinstructions.
Toillustratetheproblem,considertheresultofright-shiftingthefollowing32-bitvalueby5ona32-bitmachine: before: 0x 8000 0000
shiftrightlogical: 0x 0400 0000shiftrightarithmetic: 0x FC00 0000
Whathappensifwetrytouse64-bitshiftinstructions?
Ifthevalueisrepresentedasa64-bitsign-extendedvalue,wegetthewrongvalueforthelogicalshift: before: 0x FFFF FFFF 8000 0000
shiftrightlogical: 0x 07FF FFFF FC00 0000shiftrightarithmetic: 0x FFFF FFFF FC00 0000
Ontheotherhand,ifthevalueisrepresentedasa64-bitzero-extendedvalue,wegetthewrongvalueforthearithmeticshift: before: 0x 0000 0000 8000 0000
shiftrightlogical: 0x 0000 0000 0400 0000shiftrightarithmetic: 0x 0000 0000 0400 0000
Asaconsequence,severalnewshiftinstructionsareneededfor64-bitmachinestoperform32-bitshiftingcorrectly.FortheRV64architecturalextension,thefollowinginstructionsareadded: SLLIW RegD,Reg1,Immed-12
SLLW RegD,Reg1,Reg2SRLIW RegD,Reg1,Immed-12SRLW RegD,Reg1,Reg2SRAIW RegD,Reg1,Immed-12SRAW RegD,Reg1,Reg2
wheretheshiftamountsarerestrictedto5bits(i.e.,0..31).
RISC-VArchitectureSummary/Porter Page� of� 55 323
Chapter3:Instructions
Thesenewinstructionsperformashiftofbetween0and31bitsandtheyoperateonlyonthelowerorder32bits,leavingtheupper32bitsunmodi2ied.
Forexample,whenshiftingrightby5wegetthecorrectvalueinthelow-order32-bitsregardlessofwhatisintheupper-bits. before: 0x 89AB CDEF 8000 0000
SRLW(logical): 0x 89AB CDEF 0400 0000SRAW(arithmetic): 0x 89AB CDEF FC00 0000
Similarly,whenwegotoa128-bitmachine,anothergroupofshiftinstructionsisneededtoperform64-bitshiftingcorrectly.FortheRV128architecture,thefollowinginstructionsareadded: SLLID RegD,Reg1,Immed-12
SLLD RegD,Reg1,Reg2SRLID RegD,Reg1,Immed-12SRLD RegD,Reg1,Reg2SRAID RegD,Reg1,Immed-12SRAD RegD,Reg1,Reg2
wheretheshiftamountsarerestrictedto6bits(i.e.,0..63).
ThesenewRV128instructionsperformashiftofbetween0and63bitsandtheyoperateonlyonthelowerorder64bits,leavingtheupper64bitsunmodi2ied.
ShiftInstructionsforRV64andRV128
GeneralForm: SLLIW RegD,Reg1,Immed-12 # RV64 and RV128 only
SLLW RegD,Reg1,Reg2 # RV64 and RV128 onlySRLIW RegD,Reg1,Immed-12 # RV64 and RV128 onlySRLW RegD,Reg1,Reg2 # RV64 and RV128 onlySRAIW RegD,Reg1,Immed-12 # RV64 and RV128 onlySRAW RegD,Reg1,Reg2 # RV64 and RV128 only
SLLID RegD,Reg1,Immed-12 # RV128 onlySLLD RegD,Reg1,Reg2 # RV128 onlySRLID RegD,Reg1,Immed-12 # RV128 onlySRLD RegD,Reg1,Reg2 # RV128 onlySRAID RegD,Reg1,Immed-12 # RV128 onlySRAD RegD,Reg1,Reg2 # RV128 only
RISC-VArchitectureSummary/Porter Page� of� 56 323
Chapter3:Instructions
Comment: Seeabove.
MiscellaneousInstructions
Nop
GeneralForm:NOP
Example:NOP # Do nothing
Description:Thisinstructionhasnoeffect.
Comment:Thereareseveralwaystoencodethe“nop”operation.UsingADDIisthecanonical,recommendedway.Alsonotethatinstructionswiththebitencodingof0x00000000and0xFFFFFFFFarespeci2icallynotusedfor“nop”.Thesetwovaluesarecommonlyreturnedfrommemoryunitswhentheactualmemoryismissing(i.e.,unpopulated).Thetwovalues0x00000000and0xFFFFFFFFarespeci2iedas“illegalinstructions”andwillcausean“illegalinstructionexception”iffetchedandexecuted.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
ADDI x0,x0,0
Move(RegistertoRegister)
GeneralForm:MV RegD,Reg1
Example:MV x4,x9 # x4 = x9
Description:ThecontentsofReg1iscopiedintoRegD.
Encoding:
RISC-VArchitectureSummary/Porter Page� of� 57 323
Chapter3:Instructions
Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
ADDI RegD,Reg1,0
LoadUpperImmediate
GeneralForm:LUI RegD,Immed-20
Example:LUI x4,0x12345 # x4 = 0x12345<<12 (0x12345000)
Description:Theinstructioncontainsa20-bitimmediatevalue.Thisvalueisplacedintheleftmost(i.e.,upper,mostsigni2icant)20bitsoftheregisterRegDandtherightmost(i.e.,lower,leastsigni2icant)12-bitsaresettozero.
RV64/RV128:Thedescriptionaboveappliesto32-bitmachines.Regardlessofregistersize,theimmediatevalueismovedintobits31:12.Inthecaseof64-bitregistersor128-bitregisters,thevalueissign-extendedto2illtheupper32or96bits(i.e.,bits63:32orbits127:32).
Comment:Thisinstructionisoftenuseddirectlybeforeaninstructioncontaininga12-bitimmediatevalue,whichwillbeaddedintoRegD.Together,theyareusedtoeffectivelymakea32-bitvalue.Inthecaseof64-bitor128-bitmachines,thiswillbea32-bitsignedvalue,intherange-2,147,483,648..2,147,483,647(i.e.,-231..231-1).
Encoding: ThisisaU-typeinstruction.
LoadImmediate
GeneralForm:LI RegD,Immed-32
Example:LI x4,123 # x4 = 0x0000007B
Description:Theimmediatevalue(whichcanbeany32-bitvalue)iscopiedintoRegD.
Encoding:
RISC-VArchitectureSummary/Porter Page� of� 58 323
Chapter3:Instructions
Thisisaspecialcaseofmoregeneralinstructionsandisassembleddifferentlydependingontheactualvaluepresent.
Iftheimmediatevalueisintherangeof-2,048..+2,047,thenitcanbeassembledidenticallyto:
ADDI RegD,x0,Immed
Iftheimmediatevalueisnotwithintherangeof-2,048..+2,047butiswithintherangeofa32-bitnumber(i.e.,-2,147,483,648..+2,147,483,647)thenitcanbeassembledusingthistwo-instructionsequence:
LUI RegD,Upper-20ADDI RegD,RegD,Lower-12
where“Upper-20”representstheuppermost20bitsofthevalueand“Lower-12”representstheleastsigni2icant12-bitsofthevalue.
Commentary:Noticethattheassemblermustbecarefulwhencomputingthe“Upper-20”and“Lower-12”piecesfromanarbitrary32-bitvalue.Theassemblercannotsimplybreakthevalueapart.
Why?Becausetheimmediate12bitvaluewillbesign-extendedinthesecond(ADDI)instruction.Ifthemostsigni2icantbitof“Lower-12”is1,thentheADDIinstructionwillbeworkingwithanegativevalue,causingtheincorrectvaluetobecomputed.
Instead,theassemblerneedstocomputethis: Given:Value(a32-bitquantity) Compute: Lower12=Value[11:0] x=Value–SignExtend(Lower12) Upper20=(x>>12)[19:0]TheresultingLUI/ADDIsequencewillloadthecorrectsignedvalueon32,64,and128bitmachines,sincethequantitiesaresigned-extendedinbothinstructions.
AddUpperImmediatetoPC
GeneralForm:AUIPC RegD,Immed-20
RISC-VArchitectureSummary/Porter Page� of� 59 323
Chapter3:Instructions
Example:AUIPC x4,0x12345 # x4 = PC + (0x12345<<12)
Description:Theinstructioncontainsa20-bitimmediatevalue.Thisvalueismovedintotheleftmost(i.e.,upper,mostsigni2icant)20bitsofandtherightmost(i.e.,lower,leastsigni2icant)12-bitsaresettozero.ThenumbersocreatedisthenaddedtothecontentsoftheProgramCounter.TheresultisplacedinRegD.ThevalueofthePCusedhereistheaddressoftheinstructionthatfollowstheAUIPC.
RV64/RV128:Thedescriptionaboveappliesto32-bitmachines.Inthecaseof64-bitor128-bitsregisters,theimmediatevalueissign-extendedbeforebeingaddedtothePC.ThesizeofthePCequaltothesizeoftheregisters.
Comment:Thisinstructionisoftenuseddirectlybeforeaninstructioncontaininga12-bitimmediatevalue,whichwillbeaddedintoRegD.Together,theyareusedtoeffectivelymakea32-bitPC-relativeoffset.Thisisadequatetoaddressanylocationina32-bit(4GiByte)addressspace.
Inthecaseof64-bitor128-bitmachines,thiswillbea32-bitsignedoffset,intherange-2,147,483,648..2,147,483,647(i.e.,-231..231-1).Iftheaddressspaceislargerthan4GiBytes,thistechniquewillfail;adifferentinstructionsequenceisrequired.
ThecurrentPCcanbeobtainedbyusingthisinstructionwithanimmediatevalueofzero.UseofJALtodeterminethecurrentPCisnotrecommended.
Encoding: ThisisaU-typeinstruction.
LoadAddress
GeneralForm:LA RegD,Address
Example:LA x4,MyVar # x4 = &MyVar
Description:TheaddressofsomememorylocationiscopiedintoRegD.Noaccesstomemoryoccurs.
Encoding:
RISC-VArchitectureSummary/Porter Page� of� 60 323
Chapter3:Instructions
Thereisnoactual“loadaddress”instruction;insteadtheassemblersubstitutesasequenceoftwoinstructionstoachievethesameeffect.
The“address”canrefertoanylocationwithinthe32-bitmemoryspace.TheaddressisconvertedtoaPC-relativeaddress,withanoffsetof32bits.Thisoffsetisthenbrokenintotwopieces:a20-bitpieceanda12-bitpiece.Theinstructionisassembledusingthesetwoinstructions:
AUIPC RegD,Upper-20ADDI RegD,RegD,Lower-12
SetIfLessThan(Signed)
GeneralForm:SLT RegD,Reg1,Reg2
Example:SLT x4,x9,x13 # x4 = (x9<x13) ? 1 : 0
Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingsignedcomparison.IfthevalueinReg1islessthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Encoding: ThisisanR-typeinstruction.
SetLessThanImmediate(Signed)
GeneralForm:SLTI RegD,Reg1,Immed-12
Example:SLTI x4,x9,123 # x4 = (x9<0x0000007B) ? 1 : 0
Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)iscomparedtothecontentsofReg1usingsignedcomparison.IfthevalueinReg1islessthantheimmediatevalue,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Encoding: ThisisanI-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 61 323
Chapter3:Instructions
SetIfGreaterThan(Signed)
GeneralForm:SGT RegD,Reg1,Reg2
Example:SGT x4,x9,x13 # x4 = (x9>x13) ? 1 : 0
Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingsignedcomparison.IfthevalueinReg1isgreaterthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Encoding:Thisisaspecialcaseofadifferentinstruction.Thisinstructionisassembledidenticallyto:
SLT RegD,Reg2,Reg1 # Note: regs are switched
Note:Thereisno“SetIfGreaterThanImmediate(signed)”instruction.
SetIfLessThan(Unsigned)
GeneralForm:SLTU RegD,Reg1,Reg2
Example:SLTU x4,x9,x13 # x4 = (x9<x13) ? 1 : 0
Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingunsignedcomparison.IfthevalueinReg1islessthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Encoding: ThisisanR-typeinstruction.
SetLessThanImmediate(Unsigned)
GeneralForm:SLTIU RegD,Reg1,Immed-12
Example:SLTIU x4,x9,123 # x4 = (x9<0x0000007B) ? 1 : 0
RISC-VArchitectureSummary/Porter Page� of� 62 323
Chapter3:Instructions
Description:Theimmediatevalue(asign-extended12-bitvalue,i.e.,-2,048..+2,047)iscomparedtothecontentsofReg1usingunsignedcomparison.IfthevalueinReg1islessthantheimmediatevalue,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Encoding: ThisisanI-typeinstruction.
SetIfGreaterThan(Unsigned)
GeneralForm:SGTU RegD,Reg1,Reg2
Example:SGTU x4,x9,x13 # x4 = (x9>x13) ? 1 : 0
Description:ThecontentsofReg1iscomparedtothecontentsofReg2usingunsignedcomparison.IfthevalueinReg1isgreaterthanthevalueinReg2,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Encoding:Thisisaspecialcaseofadifferentinstruction.Thisinstructionisassembledidenticallyto:
SLTU RegD,Reg2,Reg1 # Note: regs are switched
Note:Thereisno“SetIfGreaterThanImmediateUnsigned”instruction.
SetIfEqualToZero
GeneralForm:SEQZ RegD,Reg1
Example:SEQZ x4,x9 # x4 = (x9==0) ? 1 : 0
Description:IfthevalueinReg1iszero,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Comment:Thisinstructionisimplementedwithanunsignedcomparisonagainst1.Usingunsignednumbers,theonlyvaluelessthan1is0.Thereforeiftheless-thanconditionholds,thevalueinReg1mustbe0.
RISC-VArchitectureSummary/Porter Page� of� 63 323
Chapter3:Instructions
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
SLTIU RegD,Reg1,1
SetIfNotEqualToZero
GeneralForm:SNEZ RegD,Reg2
Example:SNEZ x4,x9 # x4 = (x9≠0) ? 1 : 0
Description:IfthevalueinReg2isnotzero,thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Comment:Thisinstructionisimplementedwithanunsignedcomparisonagainst0.Usingunsignednumbers,theonlyvaluenotlessthan0is0.Thereforeiftheless-thanconditionholds,thevalueinReg2mustbenotbe0.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
SLTU RegD,x0,Reg2
SetIfLessThanZero(signed)
GeneralForm:SLTZ RegD,Reg1
Example:SLTZ x4,x9 # x4 = (x9<0) ? 1 : 0
Description:IfthevalueinReg1islessthanzero(usingsignedarithmetic),thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
SLT RegD,Reg1,x0
RISC-VArchitectureSummary/Porter Page� of� 64 323
Chapter3:Instructions
SetIfGreaterThanZero(signed)
GeneralForm:SGTZ RegD,Reg2
Example:SGTZ x4,x9 # x4 = (x9>0) ? 1 : 0
Description:IfthevalueinReg2isgreaterthanzero(usingsignedarithmetic),thevalue1isstoredinRegD.Otherwise,thevalue0isstoredinRegD.
Comment:“Reg2>0”isequivalentto“0<Reg2”.
Encoding:Thisisaspecialcaseofamoregeneralinstruction.Thisinstructionisassembledidenticallyto:
SLT RegD,x0,Reg2
BranchandJumpInstructions
BranchifEqual
GeneralForm:BEQ Reg1,Reg2,Immed-12
Example:BEQ x4,x9,MyLabel # If x4==x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.Ifequal,controljumps.ThetargetaddressisgivenasaPC-relativeoffset.Moreprecisely,theoffsetissign-extended,multipliedby2,andaddedtothevalueofthePC.ThevalueofthePCusedistheaddressoftheinstructionfollowingthebranch,notthebranchitself(???).Theoffsetismultipliedby2,sinceallinstructionsmustbehalfwordaligned.Thisgivesaneffectiverangeof-4,096..4,094(inmultiplesof2),relativetothePC.
Comment:Mostconditionalbranchesoccurwithinsmallishsubroutines/functions,sothelimitedrangeshouldbeadequate.
RISC-VArchitectureSummary/Porter Page� of� 65 323
Chapter3:Instructions
Ifthelimitedsizeoftheoffsetisinadequate,codesuchasthefollowing: BEQ x4,x9,MyLabel
mustbealteredto: BNE x4,x9,Skip J MyLabel # a “long jump”
Skip:Thisappliestoallconditionalbranchinstructions.
Encoding: ThisisaB-typeinstruction.
Commentary:Instructionsizes,locations,andrelativeoffsetsaredeterminedbytheassemblerandlinker,notbythecompiler.However,thecompiler(whichselectsinstructionsandproducestheassemblycodesequence)doesnothaveenoughknowledgetodeterminewhetheranoffsetwillbewithinrange.Therefore,itcannotknowwhetherornota“longjump”isnecessary.
Whatisthesolution?
(1)Thecompileralwayserrsonthesafesideandgeneratesmanyunnecessary“longjumps.”Notacceptable.
(2)Thecompilerignorestheproblem,alwaysusesshortjumps,andoccasionallygeneratescodethatwillnotassemblewithoutfurtherattention.Notacceptable.
(3)Thecompilerismodi2iedtocountinstructionsanddeterminewhen“longjumps”shouldbeinserted.Thisputsaburdenonthecompiler,butthecompilermayalreadybekeepingtrackofthelengthsofthecodesequencesitgenerates,inordertocomparealternatives.Inorderforthistowork,theassemblermustbeforbiddenfromdoingthingslikecompilingtheLIinstructionconditionally,asdescribedearlier.
(4)Theassemblerdetectsoffsetover2lowsandquietlyaltersthecodesequencebyreversingthesenseofthetestandinsertinga“longjump”instruction.Thismaybetheeasiest,butviolatesthetraditionalone-to-onemappingbetweenassemblerandmachinecode.
RISC-VArchitectureSummary/Porter Page� of� 66 323
Chapter3:Instructions
BranchifNotEqual
GeneralForm:BNE Reg1,Reg2,Immed-12
Example:BNE x4,x9,MyLabel # If x4≠x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.Ifnotequal,controljumpstoaPC-relativetargetaddress.
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding: ThisisaB-typeinstruction.
BranchifLessThan(Signed)
GeneralForm:BLT Reg1,Reg2,Immed-12
Example:BLT x4,x9,MyLabel # If x4<x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding: ThisisaB-typeinstruction.
BranchifLessThanOrEqual(Signed)
GeneralForm:BLE Reg1,Reg2,Immed-12
Example:BLE x4,x9,MyLabel # If x4<=x9 goto MyLabel
RISC-VArchitectureSummary/Porter Page� of� 67 323
Chapter3:Instructions
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanorequaltoReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:
BGE Reg2,Reg1,Immed-12 # Note: regs are swapped
BranchifGreaterThan(Signed)
GeneralForm:BGT Reg1,Reg2,Immed-12
Example:BGT x4,x9,MyLabel # If x4>x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:
BLT Reg2,Reg1,Immed-12 # Note: regs are swapped
BranchifGreaterThanOrEqual(Signed)
GeneralForm:BGE Reg1,Reg2,Immed-12
Example:BGE x4,x9,MyLabel # If x4>=x9 goto MyLabel
RISC-VArchitectureSummary/Porter Page� of� 68 323
Chapter3:Instructions
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanorequaltoReg2(usingsignedcomparison),controljumpstoaPC-relativetargetaddress.
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding: ThisisaB-typeinstruction.
BranchifLessThan(Unsigned)
GeneralForm:BLTU Reg1,Reg2,Immed-12
Example:BLTU x4,x9,MyLabel # If x4<x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding: ThisisaB-typeinstruction.
BranchifLessThanOrEqual(Unsigned)
GeneralForm:BLEU Reg1,Reg2,Immed-12
Example:BLEU x4,x9,MyLabel # If x4<=x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1islessthanorequaltoReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.
RISC-VArchitectureSummary/Porter Page� of� 69 323
Chapter3:Instructions
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:
BGEU Reg2,Reg1,Immed-12 # Note: regs are swapped
BranchifGreaterThan(Unsigned)
GeneralForm:BGTU Reg1,Reg2,Immed-12
Example:BGTU x4,x9,MyLabel # If x4>x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:
BLTU Reg2,Reg1,Immed-12 # Note: regs are swapped
BranchifGreaterThanOrEqual(Unsigned)
GeneralForm:BGEU Reg1,Reg2,Immed-12
Example:BGEU x4,x9,MyLabel # If x4>=x9 goto MyLabel
Description:ThecontentsofReg1iscomparedtothecontentsofReg2.IfReg1isgreaterthanorequaltoReg2(usingunsignedcomparison),controljumpstoaPC-relativetargetaddress.
RISC-VArchitectureSummary/Porter Page� of� 70 323
Chapter3:Instructions
Comment:Thetargetlocationgivenbytheoffsetmustbewithintherangeof-4,096..4,094(inmultiplesof2),relativetothePC.Seethe“BEQ”instruction.
Encoding: ThisisaB-typeinstruction.
ComparisonstoZero:Thefollowingabbreviationsaresimplevariationsonthepreviousinstructions.Theycompareavalueinaregistertozerousingsignedcomparison.Thezeroregister(x0)isimplicitintheseforms.
AbbreviatedForm: AssembledAs: BEQZ Reg1,target BEQ Reg1,x0,target BNEZ Reg1,target BNE Reg1,x0,target BLTZ Reg1,target BLT Reg1,x0,target BLEZ Reg1,target BGE x0,Reg1,target BGTZ Reg1,target BLT x0,Reg1,target BGEZ Reg1,target BGE Reg1,x0,target
Commentary:TheRISC-Varchitecturedoesnotincludeconditionallyexecutedinstructions.Theideawithaconditionallyexecutedinstructionisthattheinstructionexecutionispredicatedonsomeconditioncode(s).Theinstructionisexecuted,andwilleitherworknormallyorhavenoeffect,dependingonwhethertheconditionistrueorfalse.Thebene2itisthatconditionalbranchinstructions(withtheirpipelineissues)cansometimesbeavoided,resultinginimprovedef2iciency.TheRISC-Vdesignersconsideredandrejectedconditionalexecution.
JumpAndLink(Short-DistanceCALL)
GeneralForm:JAL RegD,Immed-20
Example:JAL x1,MyFunct # Goto MyFunct, x1=RetAddr
Description:Thisinstructionisusedtocallasubroutine(i.e.,function).Thereturnaddress(i.e.,thePC,whichistheaddressoftheinstructionfollowingtheJAL)issavedinRegD.
RISC-VArchitectureSummary/Porter Page� of� 71 323
Chapter3:Instructions
ThetargetaddressisgivenasaPC-relativeoffset.Moreprecisely,theoffsetissign-extended,multipliedby2,andaddedtothevalueofthePC.ThevalueofthePCusedistheaddressoftheinstructionfollowingtheJAL,nottheJALitself(???).Theoffsetismultipliedby2,sinceallinstructionsmustbehalfwordaligned.Thisgivesaneffectiverangeof±1MiByte,i.e.,-1,048,576..1,048,574(inmultiplesof2),relativetothePC.
AssemblerShorthand:Byconvention,x1isgenerallyusedasthe“linkregister”.Iftheregisterisnotmentioned,thenx1isimplied.
JAL MyFunct # Call MyFunctComment:
Theprogrammingconventionistouseregisterx1asa“linkregister.”Thisreturnaddressisstoredinx1,ratherthanbeingpushedontoastackinmemory,thusmakingcallsandreturnsfaster.However,ifsomeroutine“foo”willcallanotherroutine“bar”,then“foo”mustsavex1beforecalling“bar”.The“foo”routinemightpushx1ontoastackinmemory,or“foo”mightsavex1ina“callee-saved”registerinthehopeofavoidinganymemoryaccess.
Exceptions:Thismaygeneratean“instructionmisalignedexception.”Thetargetaddresswillnecessarilybeamultipleof2,butitmaynotbemultipleof4.Formachinesthatdonotsupport16-bitinstructions,thiswillcauseanalignmentexception.
Encoding: ThisisaJ-typeinstruction.
Commentary:Mostprogramscodesegmentsarefairlysmall,sothelimitedrangeofJALshouldbeadequate.
Iftheprogramsizeexceeds1MiByte,thelimitedsizeoftheoffsetmaybeinadequate.Codesuchasthefollowing:
JAL x1,MyFunct
mustbealteredto:
LUI Reg2,Offset-20JALR x1,Reg2,Offset-12
RISC-VArchitectureSummary/Porter Page� of� 72 323
Chapter3:Instructions
TypicallyseveralroutinesareseparatelycompiledsothecompilerwillnotknowhowfarawaythetargetfunctionisfromtheCALLinstruction.Eventheassemblerwillnotknowandonlyatlink-timecanitbedeterminedwhethera20-bitoffsetisadequate.Oneapproachisforthelinkertoprintoutaninscrutableerrormessage.
Jump(Short-Distance)
GeneralForm:J Immed-20
Example:J MyLabel # Goto MyLabel
Description:ThetargetaddressisgivenasaPC-relativeoffset.Theeffectiverangeis±1MiByte,i.e.,-1,048,576..1,048,574(inmultiplesof2),relativetothePC.
Exceptions:Thismaygeneratean“instructionmisalignedexception.”Thetargetaddresswillnecessarilybeamultipleof2,butitmaynotbemultipleof4.Formachinesthatdonotsupport16-bitinstructions,thiswillcauseanalignmentexception.
Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:
JAL x0,Immed-20 # Discard return address
JumpAndLinkRegister
GeneralForm:JALR RegD,Reg1,Immed-12
Example:JALR x1,x4,MyFunct # Goto MyFunct, x1=RetAddr
Description:Thisinstructionisusedtocallasubroutine(i.e.,function).Thereturnaddress(i.e.,thePC,whichistheaddressoftheinstructionfollowingtheJAL)issavedinRegD.
ThetargetaddressiscomputedbyaddingtheoffsettothecontentsofReg1.Moreprecisely,theoffsetissign-extendedandaddedtothevalueofReg1.The
RISC-VArchitectureSummary/Porter Page� of� 73 323
Chapter3:Instructions
offsetisnotmultipliedby2.Thisgivesaneffectiverangeof±2KiByte,i.e.,-2,048..2,047,relativetotheaddressinReg1.
Comment:Thisinstructioncanbeusedinseveralways.SeetheJR,RET,CALL,andTAILinstructions.
AssemblerShorthand:Byconvention,x1isusedasthe“linkregister”.Thefollowingform:
JALR Reg1 # Call *Reg1isusedshorthandfor:
JALR x0,Reg1,0 Exceptions:
Thismaygeneratean“instructionmisalignedexception.”Encoding: ThisisanI-typeinstruction.
JumpRegister
GeneralForm:JR Reg1
Example:JR Reg1 # Goto *Reg1, i.e., PC = Reg1
Description:JumptotheaddressinReg1.
Exceptions:Thismaygeneratean“instructionmisalignedexception.”
Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:
JALR x0,Reg1,0 # Discard ret addr; offset=0
Return
GeneralForm:RET
Example:RET # Goto *x1, i.e., PC = x1
RISC-VArchitectureSummary/Porter Page� of� 74 323
Chapter3:Instructions
Description:Byconvention,x1isusedasthe“linkregister”andwillholdareturnaddress.Thisinstructionreturnsfromasubroutine/function.
Exceptions:Thismaygeneratean“instructionmisalignedexception.”
Encoding:Thisisaspecialcaseofanotherinstruction.Thisinstructionisassembledidenticallyto:
JALR x0,x1,0 # PC=x1+0; don’t save prev PC
CallFarawaySubroutine
GeneralForm:CALL Immed-32
Example:CALL MyFunct # PC = new addr; x1 = ret addr
Description:Byconvention,x1isusedasthe“linkregister”andwillholdareturnaddress.Thisinstructioncallsasubroutine/functionusingaPC-relativescheme,wherethesubroutineoffsetfromtheCALLinstructionexceedsthe20-bitlimit(i.e.,±1MiByte)oftheJALinstruction.Thisinstructionmodi2iesregisterx6.
Encoding:Inordertodealwiththelargerdistancetothesubroutine,this“synthetic”instructionwillbeassembledusingthefollowingtwo-instructionsequence.Thetargetaddresscanbeexpressedasa32-bitoffsetfromtheProgramCounter.Thisoffsetisbrokenintotwopieces,whichareaddedtothePCintwosteps.
AUIPC x6,Immed-20 JALR x1,x6,Immed-12
TheAUIPCinstructionaddsthePCtotheupper20-bitportionofthe32-bitoffsetandplacestheresultinatemporaryregister.TheJALRinstructionaddsinthelower12bitsofthe32-bitoffsetandtransferscontrolbyloadingthesumintothePC.Italsosavesthereturnaddressinx1.
TheCALLinstructionmakesuseoftheconventionthatx1isthelinkregister.Italsousesx6,whichisa“caller-savedtemporaryregister”byconvention.
RISC-VArchitectureSummary/Porter Page� of� 75 323
Chapter3:Instructions
Commentary:Theactualvalues2illedintotheinstructions(intheImmed-20andImmed-122ields)mustbecomputedbythelinker,sincetheactualtargetaddresswill,ingeneral,notbeavailabletotheassembler.Thelinkermustconvertthetargetaddressintoa32-bitoffsetfromtheProgramCounter.Atruntime,thePCvalueusedwillbetheaddressoftheAUIPC(nottheJALRinstruction)sothevaluemustberelativetothataddress.
Alsonotethatthe12-bitimmediatevaluewillbesignextendedintheJALRinstruction,sosimplybreakingtheoffsetapartintopieceswillnotwork.Instead,thelinkermust2irstisolatethelow-order12bitsandsign-extendit.Thenthelinkermustsubtractitfromthefull-32-bitoffset,yieldingtheupperImmed-20portion.
ThisappliestotheTAILinstruction,too.
Exceptions: Thissequencemaygeneratean“instructionmisalignedexception”ifthetarget
offsetfromwhichtheimmediatevaluesarecomputedisnotwordalignedandtheprocessordoesnotsupport16-bitinstructions.
TailRecursion:Somesubroutines/functionsendbycallinganotherfunction.Forexample,thelaststatementinafunctionnamed“foo”maybetoinvoke/callafunctionnamed“bar.”Whenthecalledfunction(bar)returns,theoriginalfunction(foo)immediatelycompletesandreturns.Anyvaluereturnedbythecalledfunction(bar)willbeusedasthereturnvaluefromthecaller(foo).
Thispatternoftenoccursinfunctionalprograms,whererecursionisusedtoimplementrepetitiveloopingbehavior.Withoutspecialattention,somefunctionalprogramsthatuserecursioninthiswaycancreateverydeepstacksbypushingthousandsofreturnaddressesontothestack.
Acommonoptimizationiscalledthe“tailrecursionoptimization”.Hereishowitworks.Thecompiledcodeforthecaller(foo)ismodi2iedtonotactuallycallbar,butinsteadtosimplyjumptothe2irstinstructioninbar.Then,whenbarreturns,theprocessorwilleffectivelyreturnfromfoo,sincenootherreturnaddresswassaved.Furthermore,anythingreturnedbybarwillbenaturallybereturnedtofoo’scaller.Thetailrecursionoptimizationeffectivelyturnsfunctionalprogramswhichuserecursiontoperformloopingactivitiesintoinstructionsequencesthatuse
RISC-VArchitectureSummary/Porter Page� of� 76 323
Chapter3:Instructions
“goto”instructions.Itcovertsrecursiveprogramsintoloopingprogramswith“goto”instructions,makingrecursiveprogrammingtechniquespractical.
Quiteoftenarecursivefunctionwillcallitself.Insuchcasesofthetailrecursionoptimization,the“goto”neednotbealong-distancejumpandtheJinstructionwillsuf2ice.Butinothercases,mutualrecursionmightinvolveseparatelycompiledfunctionsandrequirethelong-distancejumpingabilityoftheTAILinstruction.
TailCall(FarawaySubroutine)/Long-DistanceJump
GeneralForm:TAIL Immed-32
Example:TAIL MyFunct # PC = new addr; Discard ret addr
Description:ThisinstructionisusedtojumptoadistantlocationusingaPC-relativescheme,wherethedisplacementfromtheTAILinstructiontothetargetinstructionexceedsthe20-bitlimit(i.e.,±1MiByte)oftheJinstruction(jumpshortdistance).Thisinstructionmodi2iesregisterx6.
Comment:Thisinstructionisnothingmorethanalong-distance“goto”instruction;thename“TAIL”maybeconfusing,butre2lectshowitwillsometimesbeused.
Encoding:This“synthetic”instructionwillbeassembledusingthefollowingtwo-instructionsequence.
AUIPC x6,Immed-20 JALR x0,x6,Immed-12
SeethecommentsfortheCALLinstruction.Theonlydifferenceisthatherethereturnaddressisdiscarded(x0),insteadofbeingsavedinx1.
Exceptions: LiketheCALLinstruction,thissequencemaygeneratean“instruction
misalignedexception.”
RISC-VArchitectureSummary/Porter Page� of� 77 323
Chapter3:Instructions
LoadandStoreInstructions
LoadByte(Signed)
GeneralForm:LB RegD,Immed-12(Reg1)
Example:LB x4,1234(x9) # x4 = Mem[x9+1234]
Description:An8-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueissign-extendedtothefulllengthoftheregister.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Thereisnoalignmentissueandthisinstructionwillexecuteatomically.Encoding: ThisisanI-typeinstruction.
LoadByte(Unsigned)
GeneralForm:LBU RegD,Immed-12(Reg1)
Example:LBU x4,1234(x9) # x4 = Mem[x9+1234]
Description:An8-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueiszero-extendedtothefulllengthoftheregister.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Thereisnoalignmentissueandthisinstructionwillexecuteatomically.Encoding: ThisisanI-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 78 323
Chapter3:Instructions
LoadHalfword(Signed)
GeneralForm:LH RegD,Immed-12(Reg1)
Example:LH x4,1234(x9) # x4 = Mem[x9+1234]
Description:A16-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueissign-extendedtothefulllengthoftheregister.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.halfword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanI-typeinstruction.
LoadHalfword(Unsigned)
GeneralForm:LHU RegD,Immed-12(Reg1)
Example:LHU x4,1234(x9) # x4 = Mem[x9+1234]
Description:A16-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Thevalueiszero-extendedtothefulllengthoftheregister.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
RISC-VArchitectureSummary/Porter Page� of� 79 323
Chapter3:Instructions
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanI-typeinstruction.
LoadWord(Signed)
GeneralForm:LW RegD,Immed-12(Reg1)
Example:LW x4,1234(x9) # x4 = Mem[x9+1234]
Description:A32-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,thevalueissign-extendedtothefulllengthoftheregister.
Encoding: ThisisanI-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 80 323
Chapter3:Instructions
LoadWord(Unsigned)
GeneralForm:LWU RegD,Immed-12(Reg1)
Example:LWU x4,1234(x9) # x4 = Mem[x9+1234]
Description:Thisinstructionisonlyavailablefor64-bitand128-bitmachines.
A32-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.Thevalueiszero-extendedtothefulllengthoftheregister.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Inamachinewith32-bitregisters,neithersign-extensionnorzero-extensionisnecessaryforvaluethatisalready32bitswide.Thereforethe“signedload”instruction(LW)doesthesamethingasthe“unsignedload”instruction(LWU),makingLWUredundant.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanI-typeinstruction.
LoadDoubleword(Signed)
GeneralForm:LD RegD,Immed-12(Reg1)
Example:LD x4,1234(x9) # x4 = Mem[x9+1234
RISC-VArchitectureSummary/Porter Page� of� 81 323
Chapter3:Instructions
Description:Thisinstructionisonlyavailablefor64-bitand128-bitmachines.
A64-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Signextensionoccursformachineswith128-bitregisters.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanI-typeinstruction.
LoadDoubleword(Unsigned)
GeneralForm:LDU RegD,Immed-12(Reg1)
Example:LDU x4,1234(x9) # x4 = Mem[x9+1234]
Description:Thisinstructionisonlyavailablefor128-bitmachines.
A64-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.Thevalueiszero-extendedto128-bits.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
RISC-VArchitectureSummary/Porter Page� of� 82 323
Chapter3:Instructions
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanI-typeinstruction.
LoadQuadword
GeneralForm:LQ RegD,Immed-12(Reg1)
Example:LQ x4,1234(x9) # x4 = Mem[x9+1234]
Description:Thisinstructionisonlyavailablefor128-bitmachines.
A128-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanI-typeinstruction.
StoreByte
GeneralForm:SB Reg2,Immed-12(Reg1)
Example:SB x4,1234(x9) # Mem[x9+1234] = x4
RISC-VArchitectureSummary/Porter Page� of� 83 323
Chapter3:Instructions
Description:An8-bitvalueiscopiedfromregisterReg2tomemory.Theupper(moresigni2icant)bitsinReg2areignored.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Thereisnoalignmentissueandthisinstructionwillexecuteatomically.Encoding: ThisisanS-typeinstruction.
StoreHalfword
GeneralForm:SH Reg2,Immed-12(Reg1)
Example:SH x4,1234(x9) # Mem[x9+1234] = x4
Description:A16-bitvalueiscopiedfromregisterReg2tomemory.Theupper(moresigni2icant)bitsinReg2areignored.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.halfword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanS-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 84 323
Chapter3:Instructions
StoreWord
GeneralForm:SW Reg2,Immed-12(Reg1)
Example:SW x4,1234(x9) # Mem[x9+1234] = x4
Description:A32-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,theupperbitsoftheregisterareignored.
Encoding: ThisisanS-typeinstruction.
StoreDoubleword
GeneralForm:SD Reg2,Immed-12(Reg1)
Example:SD x4,1234(x9) # Mem[x9+1234] = x4
Description:Thisinstructionisonlyavailablein64-bitand128-bitmachines.
A64-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.Fora128-bitmachinetheupperbitsoftheregisterareignored.
RISC-VArchitectureSummary/Porter Page� of� 85 323
Chapter3:Instructions
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.doubleword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanS-typeinstruction.
StoreQuadword
GeneralForm:SQ Reg2,Immed-12(Reg1)
Example:SQ x4,1234(x9) # Mem[x9+1234] = x4
Description:Thisinstructionisonlyavailablein128-bitmachines.
A128-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
Encoding: ThisisanS-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 86 323
Chapter3:Instructions
Commentary:Forthe“store”instructions,theRISC-VdocumentationdoesnotmakeitclearwhethertheaddressisformedusingReg1orReg2,norwhetherthesourceregisterisReg1orReg2.However,basedontheassumptionthatthesameregisterwillbeusedforaddresscalculationinboth“load”and“store”instructionsandoncommentselsewhere,weconcludetheorderiswhatisshownhere.
Commentary:TheRISC-Vdocumentationsuggeststhatassemblerexpectsthememoryaddressoperandtobethesecondoperand,notthe2irst.Thisviolatesthegeneralpatterninassemblycodethatdatamovementistowardtheleft,i.e.,fromthesecondoperandtothe2irst.
Commentary:Byconvention,registerx2isusedasastacktoppointer(SP).However,theRISC-Varchitecturedoesnotcontainanyspecialinstructionsformanipulatingthestack;registerx2istreateduniformlytoallotherregistersbytheISA,eventhoughtheassemblermaysupportsomeshorthandnotationsthatfacilitateusex2asastackpointer.
Compilersfortraditionalprogramminglanguagestypicallyplacelocalvariablesinastackframe(i.e.,anactivationrecord)whichisconvenientlyaccessedusingthestacktoppointer.Toreadandwritesuchlocalvariables,the“load”and“store”instructionswillnaturallyuseoffsetsfromx2:
LB x5,offset(x2) # Load from variableSB x5,offset(x2) # Store into variableADDI x2,x2,4 # Pop 4 bytes off stackLW x5,0(x2) # Fetch word on stack top
Assumingthatstackframestendtohavemodestsizes,theRISC-Vinstructionset,whichutilizes12-bitoffsets(-2,048..+2,047),workswell.
Forglobal(i.e.,sharedorcommon)variables,oneapproachistokeepasinglepointertoablockofglobalvariables.Byconvention,registerx3isoftenusedassucha“globalbasepointer.”Again,assumingtheamountofmemoryrequiredfortheglobalvariablesismodest,the12-bitoffsetworkswell.
Incaseswherethedesiredoffsetsexceedthe12-bitlimit,the“loadupperimmediate”instruction(LUI)comestotherescue.RecallthatLUItakesa20-bitimmediatevalue,shiftsitleft12bits,andmovesitintoaregister.
RISC-VArchitectureSummary/Porter Page� of� 87 323
Chapter3:Instructions
Asanexample,thefollowinginstructionsequencecanbeusedtoloadavaluefromalocationoffsetfromabasepointer(inthiscasex2),whentheoffsetexceedsthe12-bitlimit.
LUI x5,Immed-20 # Grab the upper 20 bitsADD x5,x5,x2 # Add the base pointerLB x5,Immed-12(x5) # Add the lower 12 bits
Asimilarsequencecanbeusedfor“store”instructions,althoughanotherregisterwouldberequired.
Notethatsomecaremustbetakenwhenbreakinga32-bitoffsetinto12and20bitpiecesbecauseofthesign-extensionthatwilloccurwiththe12-bitportionwhenitisaddedin.
Commentary:Insomecasesitwillbenecessarytousethe“load”or“store”instructionstoaccessarbitrarylocationsinmemory.
Insomecases,aPC-relativeaddressispreferred;inothercasesanabsoluteaddressispreferred.
Anyaddresswithina32-bitaddressspacecanbeaccessedusinga32-bitoffsetfromthePC,sincea32-bitoffset(signedornot)fromthePCwillwraparoundfrom0xFFFFFFFFto0x00000000.Any32-bitoffsetcanbebrokenintotwopieces:a20-bitupperportionanda12-bitlowerportion.
Forcaseswhereanabsoluteaddress(i.e.,notaPC-relativeaddress)isdesired,recallthatthe“LoadUpperImmediate”instruction(LUI)containsa20-bitimmediatevaluewhichisshiftedleft12bitsandmovedintoaregister.
Forexample,toloadahalfwordfromanarbitrarylocationinmemory,aninstructionsequencelikethiscanbeused:
LUI x4,Immed-20LH x4,Immed-12(x4)
(Inthecaseofa“store”instruction,asecondregisterwouldbeneeded.)
RISC-VArchitectureSummary/Porter Page� of� 88 323
Chapter3:Instructions
Asmentionedelsewhere,caremustbetakenwhenbreakinga32-bitoffsetinto12and20bitpiecesbecauseofthesign-extensionthatwilloccurwiththe12-bitportionwhenitisaddedin.
ForcaseswhereaPC-relativeaddressisdesired,recallthatthe“AddUpperImmediatetoPC”instruction(AUIPC)containsa20-bitimmediatevaluewhichisshiftedleft12bits,addedtothePC,andstoredinaregister.
Forexample,toloadahalfwordusinganarbitraryPC-relativeoffset,aninstructionsequencelikethiscanbeused:
AUIPC x4,Immed-20LH x4,Immed-12(x4)
PC-relativeaddressesarepreferredinsituationswhere(1)thecodemaytherelocatedtoarbitrarylocationsafterlink-time;(2)theloadinstructionandthetargetareassembledinthesame2ileandtheresomereasontocompleteinstructiongenerationbeforelink-time;and(3)theaddressspaceexceeds32-bits(4GiBytes),butthetargetlocationlieswithin±2GiBytesoftheloadinstruction.
IntegerMultiplyandDivide
TheRISC-Vspecmakesthemultiplyanddivideinstructionsoptional.Thissectiondescribestheseinstructions,whichmayormaynotbeimplemented.(TherewasachangebetweenRISC-Vversion1.9and2.0;wedescribethemorerecentversionhere.)
Multiply
GeneralForm:MUL RegD,Reg1,Reg2
Example:MUL x4,x9,x13 # x4 = x9*x13
Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andtheresultisplacedinRegD.
RISC-VArchitectureSummary/Porter Page� of� 89 323
Chapter3:Instructions
RV32/RV64/RV128:Regardlessofthesizeoftheregisters,theresultoftheirmultiplicationwillbetwiceaslarge,andthereforerequire2registerstocontain.Thisinstructioncapturesthelower-orderhalfoftheresultandmovesitintothedestinationregister.Seethecommentaryandtheothermultiplyinstructionsfortheupperhalf.
Comments:Thereisnodistinctionbetweensignedandunsigned;theresultisidentical.Over2lowisignored.
Encoding: ThisisanR-typeinstruction.
BinaryMultiplication–UpperBits,Signedvs.Unsigned
Whenevertwovaluesaremultiplied,theresultmayrequireuptotwiceasmanybitstorepresent.Forexample,considermultiplyingthesetwo8bitvalues:
1101 0101 1011 1011
Ifweviewtheseasunsignednumbers,wegetthisresult: 1101 0101 = 213 (unsigned) × 1011 1011 = 187 (unsigned)1001 1011 1001 0111 = 39,831 (unsigned)
Ifweviewtheseassignedvalues,wegetthisresult: 1101 0101 = -43 (signed) × 1011 1011 = -69 (signed)0000 1011 1001 0111 = 2,967 (signed)
Ifweviewonevalueassignedandtheotherasunsigned,wegetthisresult: 1101 0101 = -43 (signed) × 1011 1011 = 187 (unsigned)1110 0000 1001 0111 = -8,041 (signed)
Regardlessofwhetherweareinterpretingtheoperandsassignedorunsignednumbers,notethattheleast-signi2icanthalfoftheresultisalwaysthesamebits.Thisisnotacoincidence;itisalwaystrue.
However,themost-signi2icanthalfcanbedifferent,asthesethreecasesprove.
RISC-VArchitectureSummary/Porter Page� of� 90 323
Chapter3:Instructions
Forthisreason,theremustbethreeadditionalmultiplyinstructionstoobtainthemost-signi2icanthalfoftheresult:
MULH –signedoperands MULHU –unsignedoperands MULHSU –onesignedoperandandoneunsignedoperand
Wemaywishtodetectover2low,i.e.,tomakesuretheresultissmallenoughto2itintothesamenumberofbitsastheoperands.Foranunsignedresult,wecanchecktomakesurethebitsofthemost-signi2icanthalfareallzero.Forasignedresult,wemustchecktomakesurethatthebitsofthemost-signi2icanthalfareallthesameasthesignbitoftheleast-signi2icanthalf.
Multiply–HighBits(Signed)
GeneralForm:MULH RegD,Reg1,Reg2
Example:MULH x4,x9,x13 # x4 = HighBits(x9*x13)
Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andthemost-signi2icanthalfoftheresultisplacedinRegD.Bothoperandsandtheresultareinterpretedassignedvalues.
Encoding:ThisisaR-typeinstruction.
Multiply–HighBits(Unsigned)
GeneralForm:MULHU RegD,Reg1,Reg2
Example:MULHU x4,x9,x13 # x4 = HighBits(x9*x13)
Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andthemost-signi2icanthalfoftheresultisplacedinRegD.Bothoperandsandtheresultareinterpretedasunsignedvalues.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 91 323
Chapter3:Instructions
Multiply–HighBits(SignedandUnsigned)
GeneralForm:MULHSU RegD,Reg1,Reg2
Example:MULHSU x4,x9,x13 # x4 = HighBits(x9*x13)
Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andthemost-signi2icanthalfoftheresultisplacedinRegD.Oneoperandisinterpretedassignedandoneoperandisinterpretedasunsignedandtheresultisinterpretedasasignedvalue.
Thespecsuggeststhat: Reg2=multiplier=signed Reg1=multiplicand=unsignedbutthisinterpretationisalsoapossibility??? Reg2=multiplier=unsigned Reg1=multiplicand=signed
Encoding: ThisisanR-typeinstruction.
RecommendedUsage:Typically,theprogrammerwillwanttoobtainthefullresultofamultiplication,i.e.,bothupperhalfandlowerhalves.Thisrequirestwomultiplyinstructions.
Forexample,thefollowingsequence
MULH x4,x9,x13 # compute upper halfMUL x5,x9,x13 # compute lower half
willplacetheresultintheregisterpairx4:x5.
Itisrecommendedthattheinstructionsalwaysbeplacedinthisorder,i.e.,withtheupperhalfcomputed2irstandthelowerhalfsecond.Insomeimplementations,theexecutionunitmayrecognizethiscommonprogrammingidiomand,atthemicro-architecturallevel,fusethesetwoinstructionsintoasinglemultiplicationoperation,therebyimprovingperformance.
RISC-VArchitectureSummary/Porter Page� of� 92 323
Chapter3:Instructions
AdditionalMultiplyInstructionFor64-bitand128-bitMachines
64-bitand128-bitmachinesalsoaddanadditionalmultiplyinstruction(MULW)toproperlyperform32-bitmultiplication,asitwouldbedoneona32-bitmachine.
Tounderstandwhythisisnecessary,considera64-bitmachinebeingusedtoexecute32-bitcodeoperatingon32-bitintegers.ARV64machinewillstoreevery32-bitvalueinthelower-orderbitsofa64-bitregister,withtheupper32bitscontainingthesignextensionofthelower-order32bits.
Butwhenwemultiplytwo32-bitvalues,theresultmightover2lowandbelargerthan32-bits.Theresultwedesireisthelow-order32bitsofthecorrectanswer,withasignextensionintheupper32bits.So,toproperlyemulatethebehaviorofa32-bitmultiply,wedon’tactuallywantthemathematicallycorrectresult.
Let’slookatanexample.Tomakeourexamplemanageable,let’sconsideramachinewith16-bitregisters,whicharebeingusedtocontain8-bitvalues(ratherthan64-bitregisterscontaining32-bitvalues).
Imaginethatwewishtomultiply117×94.Bothnumberscanberepresentedas8-bitsignedvalues,buttheresult(10,998)cannotbe.
0000 0000 0111 0101 = +117× 0000 0000 0101 1110 = +94 0010 1010 1111 0110 = +10,998
Thisisasituationwhereover2lowoccurs,inthesensethattheresultcannotberepresentedinonly8bits.Inordertoemulatean8-bitmachine,wewantthelow-order8bitsofthecorrectresult,withsignextensionintheupper8bits,likethis:
1111 1111 1111 0110
TheMULinstructionwouldgiveusthemathematicallycorrectresult,namely
0010 1010 1111 0110
Butwedon’twantthemathematicallycorrectresult.Insteadweneedanewinstructiontogiveustheappropriatevalueforemulatingthe8-bitmultiplyoperation,namelytherightlower-orderbits,withsignextensionintheupperpartoftheregister:
RISC-VArchitectureSummary/Porter Page� of� 93 323
Chapter3:Instructions
1111 1111 1111 0110
MultiplyWord
GeneralForm:MULW RegD,Reg1,Reg2
Example:MULW x4,x9,x13 # x4 = x9*x13
Description:ThecontentsofReg1ismultipliedbythecontentsofReg2andtheresultisplacedinRegD.Onlythelowerorder32-bitsoftheresultareused;thelower32bitsaresignedextendedtothefulllengthoftheregister.
Comment:Thisinstructionisusedtoproperlyemulate32-bitmultiplicationona64-bitor128-bitmachine.
Notethatonlytheleast-signi2icant32bitsofReg1andReg2canpossiblyaffecttheresult.
Ifyouwanttheupper32-bitsofthefull64-bitresultusetheMULinstructionona64-bitmachine.
RV32/RV64/RV128:Thisinstructionisonlyavailableon64-bitand128-bitmachines.
Encoding: ThisisanR-typeinstruction.
Commentary:Considerthefollowingsequencewhenexecutedona32-bitmachine:
MULH x4,x9,x13 # compute upper halfMUL x5,x9,x13 # compute lower half
Itwillplacethe64-bitresultintheregisterpairx4:x5.
Nowconsiderexecutingthissamesequenceona64-bitmachine.Thex5registerwillcontainthefull64-bitvalue,nota32-bit,sign-extendedvalue.Thex4registerwillcontainnothingbutmeaninglesssignbits.
RISC-VArchitectureSummary/Porter Page� of� 94 323
Chapter3:Instructions
Previouslywesaidthat32-bitcodecouldbeexecutedona64-bitmachinewithnochange:Theideawasthattheupper32bitsoftheregistersaresimplyignored.Thisisaclearexception:inthissequencevalidbitsareplacedintheupper32bits.Therefore,thecodesequencetoperforma32-bitmultiplywith64-bitresultmustbedifferentforRV32andRV64machines.
AdditionalMultiplyInstructionFor128BitMachines
Itwouldseemlogicalfora“double”versionofMUL(withthename“MULD”)toexistforRV128machines,inanalogytotheMULWinstruction.Howeverthespecdoesnotmentionthisinstruction.
IntegerDivision–BasicConcepts
Withdivisionandremainderthereisalwaysquestionabouthownegativeoperandsaretreated.
Considerdividingabyn(thatis,a/n).
q=aDIVn #computequotientr=aREMn #computeremainder
Thereisagreementthatthequotient(q)andremainder(r)mustobeytheseequations:
a=nq+r |r|<|q|
Manylanguages(C,C++,Java)perform“truncateddivision”:
q=trunc(a/n)r=a-ntrunc(a/n)
whichproducestheseresults:
7 / 3 = 2 7 % 3 = 1-7 / 3 = -2 -7 % 3 = -1 7 / -3 = -2 7 % -3 = 1 -7 / -3 = 2 -7 % -3 = -1
Anotherreasonablede2initionis“2looreddivision”:
RISC-VArchitectureSummary/Porter Page� of� 95 323
Chapter3:Instructions
q=⌊a/n⌋r=a-n⌊a/n⌋
whichproducesthefollowingresults.Thedot(•)indicatesdifferenceswithtruncateddivision.
7 / 3 = 2 7 % 3 = 1-7 / 3 = -3 • -7 % 3 = 2 • 7 / -3 = -3 • 7 % -3 = -2 •-7 / -3 = 2 -7 % -3 = -1
Thereisalsoade2initioncalled“Euclideandivision”,inwhichtheremainderisnevernegative.Thedot(•)indicatesdifferenceswithboththepreviousde2initions.
7 / 3 = 2 7 % 3 = 1-7 / 3 = -3 -7 % 3 = 2 7 / -3 = -2 7 % -3 = 1-7 / -3 = 3 • -7 % -3 = 2 •
TheRISC-VspecmakesnocommitmentregardingtheexactmeaningofDIVandREM.ThefollowingquotefromWikipediaispertinent:
“…Euclideandivisionissuperiortotheotheronesintermsofregularityandusefulmathematicalproperties,although2looreddivision…isalsoagoodde2inition.Despiteitswidespreaduse,truncateddivisionisshowntobeinferiortotheotherde2initions.”
— DaanLeijen,DivisionandModulusforComputerScientists
DivisionErrorConditions
RISC-Vsupportsbothsigneddivision(i.e.,dividingasignednumberbyasignednumber)andunsigneddivision(i.e.,dividinganunsignednumberbyanunsignednumber).Withmultiplication,youcouldmixsignedandunsigned;notsowithdivision.
Concerningthesizesoftheresult,notethatthefollowingmusthold,since|n|≥1:
|q|≤|a||r|<|a|
RISC-VArchitectureSummary/Porter Page� of� 96 323
Chapter3:Instructions
Thus,iftheoperands(aandn)are32-bits,thentheresults(qandr)willalmostalways2itinto32-bits.
Thereisexactlyoneexceptioninwhichtheresultswillnot2it.Thisiscalled“divisionover2low”anditcanonlyoccurwithsigneddivision.
LetMrepresent-231,whichisthemostnegativenumberrepresentablein32-bits.IfwedivideMby-1,theresultis+231,whichisonegreaterthanthelargest32-bitnumber.
TheRISC-Vspeci2iesthatdivisionover2lowwillbehandledasfollows:
CorrectResult: q=+231 r=0 ActualResult: q=-231 r=0
Theotherwell-knownissueiswithdivisionbyzero.Thespecsaysthattheresultwillbe:
q=0xFFFFFFFF r=a
Thiscanbeinterpretedas:
SignedDivide-By-ZeroResult: q=-1 r=a
UnsignedDivide-By-ZeroResult: q=+232-1 i.e.,themaximumunsignednumber r=a
Ourdiscussionoferrorconditionsisfor32-bitdivision,butnaturallyextendsandappliesto64-bitand128-bitdivision.
Bothdivide-by-zeroanddivision-over2lowerrorsoccurwithoutcausingatraporexceptionandinstructionexecutioncontinueswithnoindication.Infactthereare
RISC-VArchitectureSummary/Porter Page� of� 97 323
Chapter3:Instructions
noarithmetictrapsorexceptions.Ifsomehigh-levellanguagemandatesthatdivide-by-zeromustbecaught,thenthecompilermustinsertasingleBRANCH-IF-ZEROinstruction.Theactualresultscomputedbythehardwarearechosenbecausethesearethevaluesthatnaturallyresultfromatypicalhardwareimplementation.
Ifthehigh-levellanguagemandatesthatdivision-over2lowshouldbecaught(andtheyalloughtto!)thenatleastoneadditionaltest-and-branchinstructionmustbeexecuted.
Divide(Signed)
GeneralForm:DIV RegD,Reg1,Reg2
Example:DIV x4,x9,x13 # x4 = x9 DIV x13
Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultaresignedvalues.
Comments:Divide-by-zeroanddivision-over2lowresultinmathematicallyincorrectresults.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
Divide(Unsigned)
GeneralForm:DIVU RegD,Reg1,Reg2
Example:DIVU x4,x9,x13 # x4 = x9 DIV x13
Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.
Comments:Divide-by-zeroproducesamathematicallyincorrectresult.Division-over2lowcannotoccur.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 98 323
Chapter3:Instructions
Remainder(Signed)
GeneralForm:REM RegD,Reg1,Reg2
Example:REM x4,x9,x13 # x4 = x9 REM x13
Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultaresignedvalues.
Comments:Divide-by-zeroanddivision-over2lowresultinmathematicallyincorrectresults.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
Remainder(Unsigned)
GeneralForm:REMU RegD,Reg1,Reg2
Example:REMU x4,x9,x13 # x4 = x9 REM x13
Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.
Comments:Divide-by-zeroproducesamathematicallyincorrectresult.Division-over2lowcannotoccur.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
RecommendedUsage:Oftentheprogrammerwillwanttoobtainboththequotientandremainder.ItisrecommendedthattheDIVbedone2irstandtheREMbedonesecond.
Forexample:
RISC-VArchitectureSummary/Porter Page� of� 99 323
Chapter3:Instructions
DIV x4,x9,x13 # x4 = x9 DIV x13REM x5,x9,x13 # x5 = x9 REM x13
Insomeimplementations,theexecutionunitmayrecognizethiscommonpatternandfusethesetwoinstructionsintoasingledivisionoperation,therebyimprovingperformance.
AdditionalDivideInstructionsFor64BitMachines
For64-bitmachines,thereare4additionalDIVIDE/REMAINDERinstructionstoperform32-bitoperations.Inotherwords,theabove4instructionswillperform32-bitoperationsonaRV32machinebutwillperform64-bitoperationsonanRV64machine.Thefollowing4instructionswillworkjustliketheaboveinstructionswouldworkona32-bitmachine,bysign-extendingeverythingupto64-bits.
DivideWord(Signed)
GeneralForm:DIVW RegD,Reg1,Reg2
Example:DIVW x4,x9,x13 # x4 = x9 DIV x13
RV32/RV64/RV128:Thisinstructionisonlyavailableon64-bitand128-bitmachines.
Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultaresignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extendedto2illthedestinationregister.
Comments:Divide-by-zeroanddivision-over2lowresultinmathematicallyincorrectresults.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
DivideWord(Unsigned)
GeneralForm:DIVUW RegD,Reg1,Reg2
RISC-VArchitectureSummary/Porter Page� of� 100 323
Chapter3:Instructions
Example:DIVUW x4,x9,x13 # x4 = x9 DIV x13
RV32/RV64/RV128:Thisinstructionisonlyavailableon64-bitand128-bitmachines.
Description:ThecontentsofReg1isdividedbythecontentsofReg2andthequotientisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extended(???)to2illthedestinationregister.
Comments:Divide-by-zeroproducesamathematicallyincorrectresult.Division-over2lowcannotoccur.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
RemainderWord(Signed)
GeneralForm:REMW RegD,Reg1,Reg2
Example:REMW x4,x9,x13 # x4 = x9 REM x13
RV32/RV64/RV128:Thisinstructionisonlyavailableon64-bitand128-bitmachines.
Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultaresignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extendedto2illthedestinationregister.
Comments:Divide-by-zeroanddivision-over2lowresultinmathematicallyincorrectresults.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
RemainderWord(Unsigned)
GeneralForm:REMUW RegD,Reg1,Reg2
RISC-VArchitectureSummary/Porter Page� of� 101 323
Chapter3:Instructions
Example:REMUW x4,x9,x13 # x4 = x9 REM x13
RV32/RV64/RV128:Thisinstructionisonlyavailableon64-bitand128-bitmachines.
Description:ThecontentsofReg1isdividedbythecontentsofReg2andtheremainderisplacedinRegD.Bothoperandsandtheresultareunsignedvalues.Onlythelow-order32bitsoftheoperandsareusedandthe32-bitresultissigned-extended(???)to2illthedestinationregister.
Comments:Divide-by-zeroproducesamathematicallyincorrectresult.Division-over2lowcannotoccur.Seediscussionabove.
Encoding: ThisisanR-typeinstruction.
AdditionalDivideInstructionsFor128BitMachines
Itwouldseemlogicalfor“double”versionsoftheseinstructionstoexistforRV128machines.Howeverthespecdoesnotmentionthese.
RISC-VArchitectureSummary/Porter Page� of� 102 323
Chapter4:FloatingPointInstructions
FloatingPointing–ReviewofBasicConcepts
Thissectioncanbeskippedifyouarefamiliarwith2loatingpointarithmetic.Thediscussionhereisgeneral,andnotspeci2ictoRISC-V.
TheIEEE754-2008standarddescribeshow2loatingpointnumbersaretoberepresentedandhow2loatingpointoperationsaretobeexecutedbycomputers.
Thestandardiscomplicatedanddetailed.Thissectionismeanttobeanintroductionandisnotanexhaustivedescription.MostmodernprocessorISAsimplementtheIEEE754-2008speci2ication,butthespeci2icationhasoptionsandsomepartsarenotfullyimplementedonsomecomputers.
Thespeci2icationde2inestwoimportantdatatypes:
SinglePrecision(32-bit“2loat”values) DoublePrecision(64-bit“double”values)
Somecomputersimplementonlysingleprecision;othercomputersimplementboth.
Ineithercase,theideaistorepresentarealrationalnumberinawaysimilartoscienti2icnotation.Forexample,thefollowingnumberisgiveninscienti2icnotation:
6.022×1023(anapproximationtoAvogadro’sconstant)
Withonly32bitsofprecision(or64bitsfordoubleprecision),therearelimitstotheamountofprecisionandthesizeoftheexponents.Theavailablebitsareusedasfollows:
RISC-VArchitectureSummary/Porter Page� of�103 323
Chapter4:FloatingPointInstructions
Numberofbitsusedfor… Single Double Sign 1 1 Exponent 8 11 Value 23 52 Total 32 64
Furthermore,with2loatingpointanumericalvalueisrepresentedinbinary(notdecimal)andthisintroducessomesubtletieswhengoingbackandforthbetweentheinternalbitpatternsanddecimalrepresentationswhichhumanscanread.
Everypositiveintegercanberepresentedwitha2initenumberofdigitsanda2initenumberofbits.Forexample,hereisthesamenumber,representedbothways.Ofcourse,thisnumberrequiresafewmorecharactersinbinary,buttherepresentedvalueisequal.
2,468 (decimal) 100110100100 (binary)
Wecommonlyrepresentrationalnumbersindecimalusinga“decimalpoint”,asin:
123.456
Wecanalsorepresentrationalnumbersinbinaryusinga“binarypoint”,asin:
101.0101
Withdecimalnumbers,thepositionofeachdigitisimportantandwetalkabouttheplacevalueofthedigits.Theplacevaluesareallpowersof10:
… 1000 100 10 1 1/10 1/100 1/100 1/1000 … … 103 102 101 100 10-1 10-2 10-3 10-4 …
Wecanusetheplacevaluestocomputethevalueofanumber,asyoulearnedinprimaryschool:
123.456 =(1×102)+(2×101)+(3×100)+(4×10-1)+(5×10-2)+(6×10-3) =(1×100)+(2×10)+(3×1)+(4×0.1)+(5×0.01)+(6×0.001)
RISC-VArchitectureSummary/Porter Page� of� 104 323
Chapter4:FloatingPointInstructions
Likewise,withbinarynumbers,thevalueofeachbitisscaledaccordingtotheplacevalueofthebit.Butwithbinary,theplacevaluesareallpowersof2:
… 8 4 2 1 1/2 1/4 1/8 1/16 … … 23 22 21 20 2-1 2-2 2-3 2-4 …
Usingthis,wecanconvertbinarynumbersintodecimalnumbers:
101.0101 =(1×22)+(0×21)+(1×20)+(0×2-1)+(1×2-2)+(0×2-3)+(1×2-4) =(1×4)+(0×2)+(1×1)+(0×1/2)+(1×1/4)+(0×1/8)+(1×1/16) =4+1+1/4+1/16 =5.3125
Somerationalnumbersrequireanin2initenumberofdigitsintheirdecimalrepresentation.Forexample:
1/3=0.33333…=0.3(3)*
Likewise,somerationalnumbersmayrequireanin2initenumberofbitsintheirbinaryrepresentation.Butregardlessofwhetherwerepresentarationalnumberindecimalorbinary,thein2initestringsofdigits/bitswillsettleintoasimplerepeatingpattern.Thisistrueofallrationalnumbers,butirrationalnumbers(e.g.,𝜋,√2)donothavesuchsimpledecimalorbinaryrepresentations.Neithertheirdecimalnortheirbinaryexpansionswilleverexhibitarepeatingpattern.
Somenumbersmanyhavea2initerepresentationindecimalbutrequireanin2initesequenceinbinary.Forexample,thefollowingnumber:
4.3
requiresanin2initebinaryexpansiontorepresentit,namely:
100.01001100110011…=100.01(0011)*
Itturnsoutthateverybinarynumberwithoutarepeatingpartcanberepresentedwitha2initenumberofdecimaldigits.Furthermore,thenumberofdigitstotherightofthedecimalpointwillneverexceedthenumberofplacestotherightofthebinarypoint.Forexample:
RISC-VArchitectureSummary/Porter Page� of� 105 323
Chapter4:FloatingPointInstructions
101.1101(binary)=5.8125(decimal)
Turningto2loatingpointrepresentation,wehavelimitednumberofbitsavailable,whichmeanswecannotaccommodatearbitraryprecision.Noteverynumberisrepresentable,sowemustroundnumberstoanearbynumberthatisrepresentable.
Forexample,thenumber6.022×1023canonlyberepresentedapproximately,eventhoughitappearsnottohaveagreatamountofprecision.Hereistheclosestnumberthatcanberepresentedusingasingleprecision2loatingpoint:
6.02200013124147498450944×1023
Ontheotherhand,itturnsoutthatthisnumber:
2.383496609792×1012
canberepresentedexactlyusingonly32bitsingleprecision.Thenextlargestvaluethatcanberepresentedexactlyhappenstobe:
2.383496871936×1012
WecanmakethefollowingstatementsaboutIEEE754-20082loatingpointnumberrepresentation:
• Every2loatingpointnumbershasasign.Everynumberiseitherpositiveornegative.
• Therearetworepresentationsforzero:positivezero(i.e.,+0.0)andnegativezero(i.e.,-0.0).
• Therearetworepresentationsofin2inity:positivein2inity(+∞or+inf)andnegativein2inity(-∞or-inf)
• Theexponentmaybepositiveornegative,allowingbothverylargenumbersandverysmallnumbers.
• Thereisaspecialrepresentationcalled“notanumber”(“NaN”).Thisvaluecanrepresentamissingvalueortheresultofaunde2inedoperation,suchasdivide
RISC-VArchitectureSummary/Porter Page� of� 106 323
Chapter4:FloatingPointInstructions
byzero.Insomeimplementationstherearetwovariations,called“quietNaN”and“signalingNaN”.
• Every32-bitinteger(i.e.,everyintegerintherange-2,147,483,648to+2,147,483,647)canberepresentedexactlywitha64bitdoubleprecision2loatingpointnumber,butnotwithasingleprecision2loat.Infact,theintegerrangeisalittlegreater:every54-bitintegercanberepresentedexactlyindoubleprecision2loatingpoint.Almostalllargerintegerswillgetrounded.
• Every25-bitinteger(i.e.,everyintegerintherange-16,777,216to+16,777,215)canberepresentedexactlywitha32bitsingleprecision2loatingpointnumber.Almostallintegersoutsideofthisrangewillgetrounded.
Hereistherangeofvaluesthatcanberepresented.(Weusedecimalnotationhereandapproximatetheexactvalues.)
SinglePrecision Largestvalue: ~3.40282347×10+38 Smallestvalueabovezero: ~1.17549440×10-38 Digitsofaccuracy: about7
DoublePrecision Largestvalue: ~1.7976931348623157×10+308 Smallestvalueabovezero: ~2.2250738585072014×10-308 Digitsofaccuracy: about16
Rememberthatnoteveryvalueintheaboverangescanberepresented.Why?Recallthereisacountablein2inityofrationalnumbersjustbetweenanytwonumbers,yetwithonly32or64bits,weonlyhaveasmallnumberofuniquerepresentations.
Floatingpointarithmeticismeanttomimicmathematicalarithmetic,butitmustberememberedthattheyareonlyapproximatelythesame:
• Theexactvalueorresultofanoperationisnotalwaysrepresentable,sothecomputedanswerisoftennotmathematicallycorrect.
• Floatingpointadditionisnotalwaysassociative,duetoroundingerrors.Thatis,(x+y)+zisnotalwaysequaltox+(y+z).
RISC-VArchitectureSummary/Porter Page� of� 107 323
Chapter4:FloatingPointInstructions
• Floatingpointmultiplicationisnotalwaysassociative.Thatis,(x*y)*zisnotalwaysequaltox*(y*z).
• Floatingpointmultiplicationdoesnotalwaysdistributeoveradditionwiththeexactsameresults.Thatis,x*(y+z)isnotalwaysequalto(x*y)+(x*z).
However,wecansaythis:
• Floatingpointadditionandmultiplicationarecommutative,likemath.Forexample,x+y=y+x,soyoudon’thavetoworryabouttheorderofoperandsforasingleoperation.
Herearethedifferenttypesofthingsthatcanberepresentedina2loatingpointbitpattern:
•Positivezero(+0.0) •Negativezero(-0.0) •Positivein2inity(+∞or+inf) •Negativein2inity(-∞or-inf) •Not-a-number(NaN) QuietNan(qNaN) SignalingNan(sNaN)
•Normalnumbers(or“normalizednumbers”)•Denormalizednumbers(or“denormals”)
Zero–PositiveandNegative
Thereareexactlytwowaystorepresentzero,oneispositiveandtheotherisnegative.Thisisunlikemath,wherethereisonlyasinglenumbercalledzeroanditisunsigned.
Herearesomeinterestingbehaviors:
1/+0yields+∞ 1/-0yields–∞ +0willnormallycompareasequalto-0(e.g.,the==inthe“C”language) Somelanguagesprovideawaytodistinguish+0and-0.
Thereareadditionalbehaviors,suchas:
RISC-VArchitectureSummary/Porter Page� of� 108 323
Chapter4:FloatingPointInstructions
-0/-∞yields+0
Although+0and-0maycompareasequal,theymayalsoresultindifferentoutcomesinsomecomputations.Thischallengesourunderstandingofthemeaningof“equal”,tosaytheleast.
Thebitpatternrepresentationofzerois:
single double +0.0 0x00000000 0x0000000000000000 -0.0 0x80000000 0x8000000000000000
Notethatthe2loatingpointrepresentationfor+0.0isbit-for-bitidenticaltotherepresentationfor0inbinaryintegerrepresentation(bothsignedandunsigned).Also,-0.0isrepresentedidenticallytothemostnegativesignedinteger.
NotaNumber-NaN
Operationslikethefollowingaremathematicallyunde2inedand,whenattempted,willresultinaNaNresult,toindicatethattheresultisunde2ined.
0/0 ∞/∞ 0*∞
Otheroperationsaremathematicallyde2inedbutgiveacomplexresult.Complexnumbersarenothandledby2loatingpoint,sooperationssuchasthefollowingwillreturnNaN.
Squarerootofanegativenumber Logofanegativenumber
AnotheruseofNaNistorepresentanuninitializedormissingvalue.Ifavariableisusedbeforeitisinitialized,spuriousincorrectresultsmightoccur,butthiscanbeavoidedifthevariablecontainsNaN.
TheIEEEspecactuallymentionstwokindsofNaN:“signalingNaN”and“quietNaN”.Butusually,wejusttalkaboutNaNwithoutmakinganydistinctionaboutwhetheritissignalingorquiet.
RISC-VArchitectureSummary/Porter Page� of� 109 323
Chapter4:FloatingPointInstructions
A“signalingNaN”issupposedtocauseabreakinthe2lowofexecutionwhenitisencounteredinacomputation.Thatis,atraporexceptionofsomesortwilloccur,andthenormalinstructionsequencewillbeinterruptedimmediately.SignalingNaNsmightreasonablyusedforuninitializedvalues:theirusemayrepresentaprogrambugwhichneedsattention.Intheory,signalingNaNsmightalsobeusedasplaceholdersforvalues(suchascomplexnumbers)whichrequirespecialhandling.
Theideawitha“quietNaN”,isthatitcanbeusedasanoperandinarithmeticoperations.Furthermore,aquietNaNwillbepropagated.Thatis,ifoneoftheargumentsisaquietNaN,theresultwillalsobeaquietNaN.Thisallowsalengthysequenceofoperationstobeperformedquicklywithnospecialtestingforproblems.OnceaNaNappears,asaresultofsomeerror,itwillpersistinthechainofcomputations.Eachsubsequentoperationwillcompletenormally,withoutcausinganexceptionortrapeventhoughsomesortoferroroccurredearlierinthesequence.Ifanyproblemsoccuratanystepofthecomputation,the2inalresultwillbeaquietNaN.Therefore,itissuf2icienttoperformonlyasingletestforNaNaftertheentirecomputationtoseeifanyerrorsaroseatanystageofthecomputation.
ThespecdoesnotrequiresignalingNaNs;theyareoptional.OneimplementationapproachisforthehardwaretointerpretallNaNvaluesidentically,basicallyasquietNaNs.
ThereareseveralbitpatternsthatcanbeusedtorepresentNaNs,sothereisnotasinglebitpatternforNaN.
Avalueisde2inedtorepresentNaNif(1)theexponent2ieldisall1’s,and(2)thebitsofthefraction2ieldarenotallzero.(Ifthefractionbitsareallzero,thenthevalueiseither+∞or–∞.)ThesignbitofaNaNvalueisignored.
IfadistinctionbetweenquietandsignalingNaNisimplemented,thenoneofthebitsinthefraction2ieldwillbeusedtodistinguishbetweenquietandsignaling.
TheexactbitpatternsforNaNarenotfullyspeci2iedandcanvarybetweenimplementations.
Wecansaythatavaluewithallbitssettoone(i.e.,therepresentationforthesignedinteger-1whichis0xFFFFFFFFor0xFFFFFFFFFFFFFFFF)willde2initelyrepresentaNaNandwillalmostcertainlyrepresentaquietNaN.Forexample,thisall-onespatternwillbeaquietNaNforIntel,AMD,SPARC,ARM,RISC-V,etc.
RISC-VArchitectureSummary/Porter Page� of� 110 323
Chapter4:FloatingPointInstructions
MixingSingleandDoublePrecisionUsingNaN
Therearemanybitsinthefraction2ield,andtheonlyrequirementforNaNisthattheycannotallbezero.Thus,thereisroomtostoresomeadditionaldatawithintheNaN.SoaNaNcancarryasortof“payload”valueinthefractionbits.ThiscapabilitymayormaynotbeusedinaparticularimplementationofIEEE754-2008.
Forexample,thefraction2ieldinadoubleis52bits.AssumethatonebitisreservedtobealwayssettoindicatethatthisisaNaN,andassumethatasecondbitisreservedandusedtodistinguishbetweenaquietNaNandasignalingNaN.Thisleaves50bitsthatcanbeusedtostoreaarbitraryvalue.Noticethatthisisenoughroomtostoreanentiresingleprecision2loatingpointnumber.
Imagineamachinethatimplementsdoubleprecisionarithmeticanduses64-bitregisterstostore2loatingpointvalues.Howmightthismachinestore32-bitsingleprecisionvaluesinthesesameregisters?
Any64-bitvalueinwhichthehighorder32bitsareset,willbealwaysrecognizedasaNaN.Oneapproachtostoringasingleprecisionvalueina64bitregisteristostorethesingleprecisionvalueintheleastsigni2icantbits32bitsandall1sinthemostsigni2icant32bits.
Allsingleprecisionoperationswillonlylookattheleastsigni2icant32-bitsoftheoperandsand,fortheresultvalue,willalwayssetthemost-signi2icant32bitsto1s.
Anyaccidentalattempttoperformadoubleprecisionoperationonaregistercontainingasingleprecisionvalue,willinterpretthatoperandasaNaN.
NormalandDenormalizedNumbers
Noteverynumberisrepresentableandtherepresentablenumbersarespacedoutonthenumberline.Soeachpossible2loatingpointvalueisseparatedbyanumericaldistancefromthenextsmallestnumberandfromthenextlargestnumber.Asthenumbersgetsmallerandclosertozero,thespacinggetssmallerandthenumbersareclosertogether.Asthenumbersgetlarger,thespacingisfartherapart.
Forexample,thefollowingnumbersdifferbyaverysmallamount:
RISC-VArchitectureSummary/Porter Page� of� 111 323
Chapter4:FloatingPointInstructions
4.567×10-25 4.568×10-25
Ontheotherhand,thesetwonumbersdifferbyaverylargeamount:
4.567×10+25 4.568×10+25
Howeverinbothexamplesabove,theaccuracyisthesame:4digitsofprecision.
However,thereareonlyalimitednumberofbitsavailabletorepresenttheexponents.Exponentscannotcontinuetogetmorenegativeandwecannotrepresentsmallerandsmallernumbers,evermoreclosetozero.Therefore,thispatternofthe2loatingpointnumbersbecomingspacedevermorecloselyastheygetcloserandclosertozerocannotcontinue.Somethinghastochangeasthenumbersgetsmallerandapproachzero.
Whathappensisthatbelowsomesize,therepresentablevaluesaresimplyspaceduniformlyallthewaydowntozero.Thisistheroleofdenormalizednumbers.
Most2loatingpointnumbersare“normal”numbers.Normalnumbershaveabout7digitsofaccuracy(forsingleprecision)and16digitsofaccuracy(fordoubleprecision).Inotherwords,wecanapproximateanydesiredvaluewithabout7(or16)digitsofaccuracy.
Anotherwaytolookatdenormalizednumbersisthis:Forverysmallvalues,wecannotapproximatethevaluewithfullaccuracy.Aswegetcloserandclosertozero,wecanapproximatethetruevaluewithfewerandfewerplacesofaccuracy.Forreallytinyvalues,wemayevenbeforcedtouse0.0torepresentthevalue,essentiallylosingallaccuracy.
Wecanmakethefollowingstatementsaboutdenormalizednumbers:
• Alldenormalizednumbersareveryclosetozero.• Denormalizednumbersextendonboththepositiveandnegativesidesofzero.
• +0.0and-0.0arethemselvesrepresentedasdenormalizednumbers.• Alldenormalizednumbersareregularlyandevenlyspaced.(Exception:+0.0and-0.0haveanin2initesimaldifferenceandareconsideredequal.)
RISC-VArchitectureSummary/Porter Page� of� 112 323
Chapter4:FloatingPointInstructions
• Thelargestdenormalizednumberisjustlessthanthesmallestpositivenormalnumber.
• Likewise,themostnegativedenormalizednumberisjustgreaterthantheleastnegativenormalnumber.
• Itisgenerallysafetoignorethedistinctionbetweennormalizedanddenormalizednumberswhenusing2loatingpointinyourapplications.
Therearerulesfordeterminingtheprecisionoftheresultsofanarithmeticcalculationinvolvingscienti2icnotation.Butifverysmallvalues(i.e.,denormalizednumbers)ariseduringacomputation,thenyourassumptionsaboutprecisionwillbeviolatedandthe2inalresultswillhavereducedprecision.Insomecases,the2inalresultwillbeameaningless,incorrectvalue.
Warning:AlwaysrememberthatnumbersasrepresentedincomputersareNOTtruemathematicalnumbers.ComputerarithmeticisNOTmathematicalarithmetic.Remember:“int”sarenotintegersand“2loats”arenotrealorrationalnumbers.
Computervaluesandcomputationaremereapproximationsofmathematicallypureideals.Agoodprogrammerknowshowimportantitistounderstandandremembertheirdifferencesincreatingreliablesoftware.
RepresentationofSinglePrecisionFloatingPointValues
Asingleprecision2loatingpointnumberisrepresentedwitha32-bitwordasshownhere:
�
LetNbethenumberrepresented.Let“sign”bethemostsigni2icantbit.Let“exponent”bethebitpatterninbits30:23.Let“fraction”bethebitpatterninbits22:0.
Thenumberrepresentedis:
N=(-1)sign×1.fraction×2exponent
RISC-VArchitectureSummary/Porter Page� of� 113 323
Chapter4:FloatingPointInstructions
The2irsttermsimplygivesthesignofthenumber:0=positiveand1=negative.Notethatthemostsigni2icantbitholdsthesignbitforboth2loatingpointnumbersandsignedintegers.
Whenconvertinganumbersuchas101.0101into2loatingpointformat,youshould2irstshiftthedecimalplacetojustaftertheleftmost1bit.Everynumber(exceptzero)willalwayscontainatleastasingle1bit.Thus,themostsigni2icantbitmustbea1andrepresentingitisredundant.Thisexplainswhywepre2ixthefractionalpartwith“1.”.(Thistrickofmakingonebitimplicitdoesn’tworkwithdecimalnumbers:theleadingdigitcanbeanythingexcept0,sowecannotmakeitimplicit.)
Thereare8bitsintheexponent2ield.Theinterpretationofthe“exponent”bitpatternsis:
BitPattern MeaningofExponentField 0000 0000 DenormalizedNumbers,includingzero 0000 0001 -126 ... ... 0111 1110 -1 0111 1111 0 1000 0000 +1 ... ... 1111 1110 +127 1111 1111 InZinity,Not-a-Number
Fornormalizednumbers,theexponenthasaneffectiverangeof-126..+127.
Thesmallestpositivenormalizednumberis:
Inbinary: 1.00000000000000000000000×2-126 (Thereare23zeros)
Inbits: 0x00800000=00000000100000000000000000000000
Decimalapproximation: 1.17549435×10-38
RISC-VArchitectureSummary/Porter Page� of� 114 323
Chapter4:FloatingPointInstructions
Thelargestnormalizednumberis:
Inbinary: 1.11111111111111111111111×2+127 (Thereare1+23ones)
Inbits: 0x7F7FFFFF=01111111011111111111111111111111
Decimalapproximation: 3.4028235×10+38
Iftheexponentisallones(i.e.,11111111),thenthevalueofthefractionmatters.Ifthefractionisallzeros,thenthevalueis+∞or–∞dependingonthesignbit.
+∞: 0x7F800000=01111111100000000000000000000000
-∞: 0xFF800000=11111111100000000000000000000000
Iftheexponentisallones(i.e.,11111111)andthevalueofthefractionisnotallzeros,thenNaNisrepresented.TherearemultiplerepresentationsthataretobeinterpretedasNaNvalues.Thecanonical,preferredrepresentationofNaNisoftenthis:
NaN(typical): 0xFFFFFFFF=11111111111111111111111111111111
Iftheexponent2ieldisallzeros(i.e.,00000000),thenthevalueisadenormalizednumber.Thevalueofthenumberis:
N=(-1)sign×0.fraction×2-126
Noticethattheleadingimplicit“1”bitisnolongerassumed;itisnow“0”.Alsotheexponentisalways-126,whichhappenstobethesmallestexponentfornormalizednumbers.
Herearesomesamplenumbersthatmayhelpexplaindenormalizednumbers:
RISC-VArchitectureSummary/Porter Page� of� 115 323
Chapter4:FloatingPointInstructions
Smallestnormalizednumber: 1.00000000000000000000000×2-126 (24bitsofprecision) Largestdenormalizednumber: 0.11111111111111111111111×2-126 (23bitsofprecision) … Randomdenormalizednumber: 0.00000000001100101110101×2-126 (13bitsofprecision) … Smallestdenormalizednumber: 0.00000000000000000000001×2-126 (1bitofprecision) +0.0: 0.00000000000000000000000×2-126 (0bitsofprecision)
RepresentationofDoublePrecisionFloatingPointValues
Doubleprecision2loatingpointnumbersarerepresentedusingananalogousscheme.Theonlydifferenceisthenumberofbitsinthe“exponent”and“fraction”2ields.
Hereistherepresentationofa64-bitdoubleprecision2loatingpointvalue:
�
Thereare11bitsintheexponent2ield,insteadof8asinsingleprecision.Theinterpretationofthe“exponent”bitpatternsis:
RISC-VArchitectureSummary/Porter Page� of� 116 323
Chapter4:FloatingPointInstructions
BitPattern MeaningofExponentField 000 0000 0000 DenormalizedNumbers,includingzero 000 0000 0001 -1022 ... ... 011 1111 1110 -1 011 1111 1111 0 100 0000 0000 +1 ... ... 111 1111 1110 +1023 111 1111 1111 InZinity,Not-a-Number
Fornormalizednumbers,theexponenthasaneffectiverangeof-1022..+1023.
Thesmallestpositivenormalizednumberis:
Inbinary: 1.000000000...00000000000×2-1022 (Thereare52zeros)
Inbits: 0x0010000000000000
Decimalapproximation: 2.2250738585072014×10-308
Thelargestnormalizednumberis:
Inbinary: 1.111111111...11111111111×2+1023 (Thereare1+52ones)
Inbits: 0x7FEFFFFFFFFFFFFF
Decimalapproximation: 1.7976931348623157×10+308
Thesmallestdenormalizednumberis:
Inbinary: 0.000000000...00000000001×2-1022
Inbits: 0x0000000000000001
Decimalapproximation: 5.0×10-324
RISC-VArchitectureSummary/Porter Page� of� 117 323
Chapter4:FloatingPointInstructions
OtherFloatingPointSizes
Inadditiontothewellknownsingleprecision(32-bit)anddoubleprecision(64-bit)sizes,theIEEE754-2008standardalsodescribesthese2loatingpointsizes.Theyaremuchlesscommon.
16bits(halfprecision) 128bits(quadrupleprecision) 256bits(octupleprecision)
Thereisalsomentionofdecimal-basedrepresentations.
Rounding
Theresultofacomputationmaybeanumberthatisnotpreciselyrepresentable.Forexample,theadditionoftwodoublenumbersmaynotbeexactlyrepresentableasadouble.
Forexample,herearetwonumberswith5digitsofprecision.Theirsumhas8digitsofprecision.
12.345 =1.2345×101 + .067891 =6.7891×10-2 12.412891 =1.2412891×101
TheIEEEspecsaysthattheexactresultshouldbe“rounded”toanumberthatcanberepresented.Forexample,whentwodoublesareadded,theirresultwillbesomedoublevalue.
Intheaboveexample,tobringtheresultbackdownto5digitsofprecision,someaccuracymustbesacri2iced.Likewise,when2loatingpointoperationsareperformed,theremaybealossofaccuracyasaresultofrounding.
TheIEEEspeclistsseveralwaysthatavaluecanberoundedtosomethingthatcanberepresented:
RISC-VArchitectureSummary/Porter Page� of� 118 323
Chapter4:FloatingPointInstructions
•Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) •Roundtowardzero(i.e.,truncate) •Roundtowardpositivein2inity(i.e.,roundup) •Roundtowardnegativein2inity(i.e.,rounddown)
Inordertoperformroundingcorrectly,acomputermayneedtoperformcalculations(e.g.,multiplication)withgreaterprecisionto2irstcomputethecorrectvalue.Then,asthe2inalstepinthecalculation,thevaluemustbeproperlyroundedto2itintotheavailable2loatingpointbits.
Hereisanotherexample,showingthattheentireeffectofanoperationcanbelostastheresultofrounding. 12.345 =1.2345×101 + .00000067891 =6.7891×10-7 12.34500067891 =1.234500067891×101
Roundingtoavaluewiththesameprecision(eitherroundingdown,roundingtozero,orroundingtothenearest)givestheinitialoperand,unchanged.Hereistheroundedresult:
12.345 =1.2345×101
TheFloatingPointExtensions
TheRISC-Vspecdescribestheseextensionstosupport2loatingpointarithmetic
F–Singleprecision2loatingpoint(32bitvalues) D–Doubleprecision2loatingpoint(64bitvalues) Q–Quadprecision2loatingpoint(128bitvalues)
The“D”extensionisasupersetof“F”;whendoubleprecisionisimplemented,allinstructionsoperatingonsingleprecisionvalueswillalsobeincluded.
Likewise,the“Q”isasupersetof“D”;whenthe“Q”extensionisimplemented,all“F”and“D”instructionswillalsobeimplemented.
RISC-VArchitectureSummary/Porter Page� of� 119 323
Chapter4:FloatingPointInstructions
Inthisdocument,weintermixthedescriptionsof“F”,“D”,and“Q”extensions,ratherthandescribeoneaftertheother.
FloatingPointRegisters
Thereare322loatingpointregisters,namedf0,f1,…f31.
Inthe“F”extension,eachregistercanholdonesingleprecision2loatingpointvalue.Thus,eachregisteris32bitswide.Allregistersfunctionidentically;thereisnothingspecialaboutf0,asthereiswithx0oftheintegerregisters.
Inthe“D”extension,theseregistersareall64bitswideandcanholdeitherasingleprecisionvalueoradoubleprecisionvalue.Iftheregistercontainsasingleprecision2loatingpoint,thenthemostsigni2icant32bitsoftheregisterwillallbe1.Wheninterpretedasadouble,suchavaluewillberecognizedasaNaN.
Inthe“Q”extension,theregistersare128bitswide.Eachregistercanholdeitherasingle,double,orquadprecision2loatingpointvalue.Inthecaseofsmallervalues,theupperbitswillbesetto1sothatthevaluewillappeartobeNaNifinterpretedasalargerprecisionvalue.
Inadditionthereisa“FloatingPointControlandStatusRegister”(called“FCSR”).
TheFCSRisdividedintothefollowing2ields:
Bits Widthinbits Description 0 1 NX:Inexact 1 1 UF:Under2low 2 1 OF:Over2low 3 1 DZ:DivideByZero 4 1 NV:InvalidOperation 5:7 3 FloatingPointRoundingMode(FRM) 8:31 24 (unused)
TheFloatingPointControlandStatusRegister(FCSR)isoneofthe“ControlandStatusRegisters(CSRs),whicharediscussedelsewhereinthisdocument.Thisparticularregisterisfreelyaccessibleregardlessoftheoperatingmode(user,supervisor,ormachinemode),sothisneednotbeaconcernhere.
RISC-VArchitectureSummary/Porter Page� of� 120 323
Chapter4:FloatingPointInstructions
TherearesomegeneralpurposeinstructionsusedforreadingandwritingtheCSRsandthesearediscussedelsewhere.
Collectively,the5singlebit2ieldsarereferredtoasthe“FloatingPointFlags”(FFLAGS):
FFLAGS(FloatingPointFlags): NX:Inexact UF:Under2low OF:Over2low DZ:DivideByZero NV:InvalidOperation
These52lagscanberesettozerobywritingtotheFloatingPointControlandStatusRegister(FCSR).Theymaybesettoonefromtime-to-timeas2loatingpointinstructionsareexecuted.Onceset,theywillremainset.Therefore,thesebitscanbequeriedafterasequenceofinstructionstodetermineifanyofseveralunusualconditionshasoccurredduringthepreviousinstructionsequence.
Themeaningofeachbitisstraightforward.
Iftheresultofanoperationhadtoberounded,thenthe“NX:Inexact”bitwillbeset.Iftheresultistoosmallto2itinanormalizedformandisalsoinexact,thenthe“UF:Under2low”bitwillbesetandtheresultwillbereturnedasadenormalizedvalue.Iftheresultistoolargetoberepresented,thenthe“OF:Over2low”bitwillbesetand+∞or-∞willusuallybereturnedastheresult.The“DZ:DivideByZero”bitissetforoperationslike1/0andlog(0)and+∞or-∞willbereturnedastheresult.Certainoperationsareconsideredinvalid,suchas“squarerootofanegativenumber”.The“NV:InvalidOperation”bitwillbesetandNaN(bydefault,quietNaN)willbereturned.
FloatingPointinstructionsthathaveproblemswillsettheFFLAGSbitsbutwillnevercauseatraporexception.Instead,instructionexecutionwillcontinueuninterrupted.
IftheresultofaninstructionisNaN,thenaquietNaNwillbeinserted,unlessotherwisedocumented.RISC-VwillinsertthefollowingrepresentationforaquietNaN:
RISC-VArchitectureSummary/Porter Page� of� 121 323
Chapter4:FloatingPointInstructions
SinglePrecisionquietNaN: 0x7FC00000=0 11111111 10000000000000000000000
The“unused”zerobitsmaybeusedforotherRISC-Vextensions,butnormallytheywillappeartobezerowhenread.Floatingpointcodeshouldignorethevalueofthis2ieldandnotattempttochangeit.
Thereareseveraldifferentwaysthata2loatingpointresultcanberoundedto2itintotheavailableprecision:
•Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) •Roundtowardzero(i.e.,truncate) •Roundtowardnegativein2inity(i.e.,rounddown) •Roundtowardpositivein2inity(i.e.,roundup)
Theroundingmodecanbeeitherstaticordynamic.
Thereisa“RoundingMode”(RM)2ieldineach2loatingpointinstruction.Instaticmode,theroundingmethodtobeusedisencodedintheRMbitswithintheinstruction,accordingtothesecodes:
000 Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) 001 Roundtowardzero(i.e.,truncate) 010 Roundtowardnegativein2inity(i.e.,rounddown) 011 Roundtowardpositivein2inity(i.e.,roundup) 100 Roundtothenearestnumber (Foratie,roundawayfromzero.) 101 invalid 110 invalid 111 Usedynamicmode:ConsulttheCSRfortheroundingmethod.
Withthedynamicmode,theroundingmodewillbedeterminedbytheFloatingPointRoundingMode(FRM)bitsintheFloatingPointControlandStatusRegister(FCSR).IftheinstructionsRM2ieldcontains111,thentheroundingmodetobeusedisdetermineddynamicallywhentheinstructionisexecutedbyconsultingtheFCSR.TheFRMbitsintheFCSRtellwhichroundingmethodtouse,accordingtothefollowingtable.Thistableisthesameasabove,exceptforthelastline.
RISC-VArchitectureSummary/Porter Page� of� 122 323
Chapter4:FloatingPointInstructions
000 Roundtothenearestnumber (Foratie,thevaluewithazerointheleastsigni2icantbitischosen.) 001 Roundtowardzero(i.e.,truncate) 010 Roundtowardnegativein2inity(i.e.,rounddown) 011 Roundtowardpositivein2inity(i.e.,roundup) 100 Roundtothenearestnumber (Foratie,roundawayfromzero.) 101 invalid 110 invalid 111 invalid
Ifa2loatingpointinstructionwillnotneedtoperformroundingbutcontainsanRM2ield(forexample,theFNEGinstruction),theRM2ieldshouldbesetto000.
TheRISC-Vspecdoesnotindicateorsuggesthowtheroundingmodeshouldbewritteninassemblycode.???
The2lagsportion(NX,UF,OF,DZ,NV)oftheFloatingPointStatusRegister(FCSR)isaccessibleandmirroredinasecondCSRcalled(FFLAGS).Likewise,theroundingmodebitsoftheFloatingPointStatusRegister(FCSR)areaccessibleandmirroredinathirdCSRcalled(FRM).
ControlandStatusRegistersarecoveredelsewhere,buttheregisterconcernedwiththe2loatingpointextensionare:
CSRAddr Name Description001 f=lags Floatingpointing2lags002 frm Dynamicroundingmode003 fcsr Concatenationoffrm+f2lags
Theseregsareread/writeinanyprivilegemode.
Separateassemblerinstructionsarede2inedtoaccesstheseregisters,butthesearejustshorthand(syntacticsugar)forother,moregeneralinstructions.Theinstructionsare:
FRFLAGS-ReadtheFFLAGSregisterFRFLAGS RegD RegD=CSR[FFLAGS]
FSFLAGS-SwaptheFFLAGSregisterFSFLAGS RegD,Reg1 RegD=CSR[FFLAGS];CSR[FFLAGS]=Reg1
RISC-VArchitectureSummary/Porter Page� of� 123 323
Chapter4:FloatingPointInstructions
FRRM-ReadtheFRM(RoundingMode)registerFRRM RegD RegD=CSR[FRM]
FSRM-SwaptheFRM(RoundingMode)registerFSRM RegD,Reg1 RegD=CSR[FRM];CSR[FRM]=Reg1
Not-A-Number-NaN
Thespecsaysthatthe“canonicalNaN”is0x7fc00000.Presumably,theymeanaquietNaN.TheRISC-VspecmentionstheexistenceofasignalingNaN,sayingthatifoneisencounteredinanoperation,aninvalidinstructionexceptionwilloccur.However,thespecdoesnotsayhowasignalingNaNisdistinguished.???
FloatingPointLoadandStoreInstructions
Thefollowinginstructionswillmovedatabetweenmemoryanda2loatingpointregister.
WeusethenotationFReg1,FReg2,andFRegDtoindicate2loatingpointregisters,justasweuseReg1,Reg2,andRegDtoindicateintegerregisters.
FloatingLoad(Word)
GeneralForm:FLW FRegD,Immed-12(Reg1)
Example:FLW f4,1234(x9) # f4 = Mem[x9+1234]
Description:A32-bitvalueisfetchedfrommemoryandmovedinto2loatingregisterFRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
RISC-VArchitectureSummary/Porter Page� of� 124 323
Chapter4:FloatingPointInstructions
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
RISC-VExtensions:Thisinstructionrequiresthe“F”extension.
Encoding: ThisisanI-typeinstruction.
FloatingLoad(Double)
GeneralForm:FLD FRegD,Immed-12(Reg1)
Example:FLD f4,1234(x9) # f4 = Mem[x9+1234]
Description:A64-bitvalueisfetchedfrommemoryandmovedinto2loatingregisterFRegD.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.doubleword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
RISC-VExtensions:Thisinstructionrequiresthe“D”extension.
Encoding: ThisisanI-typeinstruction.
FloatingLoad(Quad)
GeneralForm:FLQ FRegD,Immed-12(Reg1)
RISC-VArchitectureSummary/Porter Page� of� 125 323
Chapter4:FloatingPointInstructions
Comment: Analogous;Onlyavailablein“Q”quadprecision2loatingpointextension.
FloatingStore(Word)
GeneralForm:FSW FReg2,Immed-12(Reg1)
Example:FSW f4,1234(x9),f4 # Mem[x9+1234] = f4
Description:A32-bitvalueiscopiedfromregisterFReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.word-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
RISC-VExtensions:Thisinstructionrequiresthe“F”extension.
Encoding: ThisisanI-typeinstruction.
FloatingStore(Double)
GeneralForm:FSD FReg2,Immed-12(Reg1)
Example:FSD f4,1234(x9) # Mem[x9+1234] = f4
Description:A64-bitvalueiscopiedfromregisterFReg2tomemory.ThememoryaddressisformedbyaddingtheoffsettothecontentsofReg1.
RISC-VArchitectureSummary/Porter Page� of� 126 323
Chapter4:FloatingPointInstructions
Comment:Thetargetlocationgivenbythe12-bitoffsetmustbewithintherangeof-2,048..2,047relativetothevalueinReg1.
Theaddressofthememorylocationisnotrequiredtobeproperlyaligned(i.e.doubleword-aligned),butitisassumedthatthisinstructionwillexecutefasterwithproperlyalignedaddresses.
Thisinstructionisguaranteedtoexecuteatomicallyiftheaddressisproperlyaligned.Iftheaddressisnotaligned,thereisnoguaranteeofatomicoperation.
RISC-VExtensions:Thisinstructionrequiresthe“D”extension.
Encoding: ThisisanI-typeinstruction.
FloatingStore(Quad)
GeneralForm:FSQ FReg2,Immed-12(Reg1)
Comment: Analogous;Onlyavailablein“Q”quadprecision2loatingpointextension.
BasicFloatingPointArithmeticInstructions
FloatingAdd
GeneralForm:FADD.S FRegD,FReg1,FReg2 (single precision)FADD.D FRegD,FReg1,FReg2 (double precision)FADD.Q FRegD,FReg1,FReg2 (quad precision)
Example:FADD.S f4,f9,f13 # f4 = f9+f13 (32 bits)FADD.D f4,f9,f13 # f4 = f9+f13 (64 bits)FADD.Q f4,f9,f13 # f4 = f9+f13 (128 bits)
RISC-VArchitectureSummary/Porter Page� of� 127 323
Chapter4:FloatingPointInstructions
Description:ThevalueinFReg1isaddedtothevalueinFReg2andtheresultisplacedinFRegD.
RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.
RISC-VExtensions:FADD.Srequiresthe“F”extension.FADD.Drequiresthe“D”extension.FADD.Qrequiresthe“Q”extension.
Encoding: ThisisanR-typeinstruction.
FloatingSubtract
GeneralForm:FSUB.S FRegD,FReg1,FReg2 (single precision)FSUB.D FRegD,FReg1,FReg2 (double precision)FSUB.Q FRegD,FReg1,FReg2 (quad precision)
Example:FSUB.S f4,f9,f13 # f4 = f9-f13 (32 bits)FSUB.D f4,f9,f13 # f4 = f9-f13 (64 bits)FSUB.Q f4,f9,f13 # f4 = f9-f13 (128 bits)
Description:ThevalueinFReg1issubtractedfromthevalueinFReg2andtheresultisplacedinFRegD.
RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.
RISC-VExtensions:FSUB.Srequiresthe“F”extension.FSUB.Drequiresthe“D”extension.FSUB.Qrequiresthe“Q”extension.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 128 323
Chapter4:FloatingPointInstructions
FloatingMultiply
GeneralForm:FMUL.S FRegD,FReg1,FReg2 (single precision)FMUL.D FRegD,FReg1,FReg2 (double precision)FMUL.Q FRegD,FReg1,FReg2 (quad precision)
Example:FMUL.S f4,f9,f13 # f4 = f9*f13 (32 bits)FMUL.D f4,f9,f13 # f4 = f9*f13 (64 bits)FMUL.Q f4,f9,f13 # f4 = f9*f13 (128 bits)
Description:ThevalueinFReg1ismultipliedbythevalueinFReg2andtheresultisplacedinFRegD.
RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.
RISC-VExtensions:FMUL.Srequiresthe“F”extension.FMUL.Drequiresthe“D”extension.FMUL.Qrequiresthe“Q”extension.
Encoding: ThisisanR-typeinstruction.
FloatingDivide
GeneralForm:FDIV.S FRegD,FReg1,FReg2 (single precision)FDIV.D FRegD,FReg1,FReg2 (double precision)FDIV.Q FRegD,FReg1,FReg2 (quad precision)
Example:FDIV.S f4,f9,f13 # f4 = f9/f13 (32 bits)FDIV.D f4,f9,f13 # f4 = f9/f13 (64 bits)FDIV.Q f4,f9,f13 # f4 = f9/f13 (128 bits)
Description:ThevalueinFReg1isdividedbythevalueinFReg2andtheresultisplacedinFRegD.
RISC-VArchitectureSummary/Porter Page� of� 129 323
Chapter4:FloatingPointInstructions
RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.
RISC-VExtensions:FDIV.Srequiresthe“F”extension.FDIV.Drequiresthe“D”extension.FDIV.Qrequiresthe“Q”extension.
Encoding: ThisisanR-typeinstruction.
FloatingMinimum
GeneralForm:FMIN.S FRegD,FReg1,FReg2 (single precision)FMIN.D FRegD,FReg1,FReg2 (double precision)FMIN.Q FRegD,FReg1,FReg2 (quad precision)
Example:FMIN.S f4,f9,f13 # f4 = Min(f9,f13) (32 bits)FMIN.D f4,f9,f13 # f4 = Min(f9,f13) (64 bits)FMIN.Q f4,f9,f13 # f4 = Min(f9,f13) (128 bits)
Description:ThevaluesinFReg1andFReg2arecomparedandthesmalleroneisplacedinFRegD.
RISC-VExtensions:FMIN.Srequiresthe“F”extension.FMIN.Drequiresthe“D”extension.FMIN.Qrequiresthe“Q”extension.
Encoding: ThisisanR-typeinstruction.
FloatingMaximum
GeneralForm:FMAX.S FRegD,FReg1,FReg2 (single precision)FMAX.D FRegD,FReg1,FReg2 (double precision)FMAX.Q FRegD,FReg1,FReg2 (quad precision)
Example:FMAX.S f4,f9,f13 # f4 = Max(f9,f13) (32 bits)
RISC-VArchitectureSummary/Porter Page� of� 130 323
Chapter4:FloatingPointInstructions
FMAX.D f4,f9,f13 # f4 = Max(f9,f13) (64 bits)FMAX.Q f4,f9,f13 # f4 = Max(f9,f13) (128 bits)
Description:ThevaluesinFReg1andFReg2arecomparedandthelargeroneisplacedinFRegD.
RISC-VExtensions:FMAX.Srequiresthe“F”extension.FMAX.Drequiresthe“D”extension.FMAX.Qrequiresthe“Q”extension.
Encoding: ThisisanR-typeinstruction.
FloatingSquareRoot
GeneralForm:FSQRT.S FRegD,FReg1 (single precision)FSQRT.D FRegD,FReg1 (double precision)FSQRT.Q FRegD,FReg1 (quad precision)
Example:FSQRT.S f4,f9 # f4 = sqrt(f9) (32 bits)FSQRT.D f4,f9 # f4 = sqrt(f9) (64 bits)FSQRT.Q f4,f9 # f4 = sqrt(f9) (128 bits)
Description:ThesquarerootofthevalueinFReg1iscomputedandtheresultisplacedinFRegD.
RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.
RISC-VExtensions:FSQRT.Srequiresthe“F”extension.FSQRT.Drequiresthe“D”extension.FSQRT.Qrequiresthe“Q”extension.
Encoding: ThisisaR-typeinstruction,inwhichtheReg22ieldisallzeros.
RISC-VArchitectureSummary/Porter Page� of� 131 323
Chapter4:FloatingPointInstructions
FloatingFusedMultiply-Add
GeneralForm:FMADD.S FRegD,FReg1,FReg2,FReg3 (single precision)FNMADD.S FRegD,FReg1,FReg2,FReg3 (single precision)FMADD.D FRegD,FReg1,FReg2,FReg3 (double precision)FNMADD.D FRegD,FReg1,FReg2,FReg3 (double precision)
Example:FMADD.S f2,f5,f6,f7 # f2 = (f5*f6)+f7FNMADD.S f2,f5,f6,f7 # f2 = (-(f5*f6))+f7FMADD.D f2,f5,f6,f7 # f2 = (f5*f6)+f7FNMADD.D f2,f5,f6,f7 # f2 = (-(f5*f6))+f7
Description:Thisinstructionperformsamultiplicationandanaddition.Avariation(whichisshownherewiththeopcodeFNMADD)willoptionallynegatetheproductbeforetheaddition.Seethespecconcerningcaseswhentheargumentsare0,∞,andNaN.
RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.
RISC-VExtensions:FMADD.SandFNMADD.Srequirethe“F”extension.FMADD.DandFNMADD.Drequirethe“D”extension.
Encoding: Thisinstructionisunusualinthatithasfouroperands,allofwhichare
registers.Thus,itdoesnot2itintoanyoftheinstructionformatsthatweredescribedearlier.ThisinstructiontypeiscalledanR4-typeinstruction.TheR4-typeinstructionformatisusedonlyforthisandthefusedmultiply-subtractinstructions.
FloatingFusedMultiply-Subtract
GeneralForm:FMSUB.S FRegD,FReg1,FReg2,FReg3 (single precision)FNMSUB.S FRegD,FReg1,FReg2,FReg3 (single precision)FMSUB.D FRegD,FReg1,FReg2,FReg3 (double precision)FNMSUB.D FRegD,FReg1,FReg2,FReg3 (double precision)
RISC-VArchitectureSummary/Porter Page� of� 132 323
Chapter4:FloatingPointInstructions
Example:FMSUB.S f2,f5,f6,f7 # f2 = (f5*f6)-f7FNMSUB.S f2,f5,f6,f7 # f2 = (-(f5*f6))-f7FMSUB.D f2,f5,f6,f7 # f2 = (f5*f6)-f7FNMSUB.D f2,f5,f6,f7 # f2 = (-(f5*f6))-f7
Description:Thisinstructionperformsamultiplicationandasubtraction.Avariation(whichisshownherewiththeopcodeFNMSUB)willoptionallynegatetheproductbeforethesubtraction.Seethespecconcerningcaseswhentheargumentsare0,∞,andNaN.
RoundingMode:Thisinstructioncontainsa3bitRoundingMode(RM)2ield,whichwilldeterminetheroundingmethodtobeused.
RISC-VExtensions:FMSUB.SandFNMSUB.Srequirethe“F”extension.FMSUB.DandFNMSUB.Drequirethe“D”extension.
Encoding: Thisinstructionisunusualinthatithasfouroperands,allofwhichare
registers.Thus,itdoesnot2itintoanyoftheinstructionformatsthatweredescribedearlier.ThisinstructiontypeiscalledanR4-typeinstruction.TheR4-typeinstructionformatisusedonlyforthisandthefusedmultiply-addinstructions.
FloatingFusedMultiply-Add/Subtract(Quad)
GeneralForm:FMADD.Q FRegD,FReg1,FReg2,FReg3FNMADD.Q FRegD,FReg1,FReg2,FReg3FMSUB.Q FRegD,FReg1,FReg2,FReg3FNMSUB.Q FRegD,FReg1,FReg2,FReg3
Comment: Analogous;Onlyavailablein“Q”quadprecision2loatingpointextension.
FloatingPointConversionInstructions
RISC-VArchitectureSummary/Porter Page� of� 133 323
Chapter4:FloatingPointInstructions
Thereisasetofinstructionsforconvertingfromonerepresentationtoanother.Thefollowingcodesareusedtonamethedifferentformats:
W 32-bitsignedinteger WU 32-bitunsignedinteger L 64-bitsignedinteger LU 64-bitunsignedinteger S Single-precision2loatingpointnumber D Double-precision2loatingpointnumber
FloatingPointConversion(Integerto/fromSinglePrecision)
GeneralForms:FCVT.W.S RegD,FReg1FCVT.WU.S RegD,FReg1FCVT.L.S RegD,FReg1 (availableinRV64orRV128only)FCVT.LU.S RegD,FReg1 (availableinRV64orRV128only)FCVT.S.W FRegD,Reg1FCVT.S.WU FRegD,Reg1FCVT.S.L FRegD,Reg1 (availableinRV64orRV128only)FCVT.S.LU FRegD,Reg1 (availableinRV64orRV128only)
Examples:FCVT.W.S r2,f5 # r2(int32) = f5(float)FCVT.WU.S r2,f5 # r2(uint32) = f5(float)FCVT.L.S r2,f5 # r2(int64) = f5(float)FCVT.LU.S r2,f5 # r2(uint64) = f5(float)FCVT.S.W f4,r7 # f4(float) = r7(int32)FCVT.S.WU f4,r7 # f4(float) = r7(uint32)FCVT.S.L f4,r7 # f4(float) = r7(int64)FCVT.S.LU f4,r7 # f4(float) = r7(uint64)
Description:Theseinstructionsperformaconversionofasingleprecision2loatingformatto/fromanintegerformat.
Rounding:TheseinstructionscontainRoundingMode(RM)bitswhichdeterminehowroundingistobedone.
Whenconvertingtoaninteger,thevalues+∞,NaN,andvaluesthataretoolargetoberepresentedinthetargetformatarestoredasthelargestinteger
RISC-VArchitectureSummary/Porter Page� of� 134 323
Chapter4:FloatingPointInstructions
valuethatcanberepresented.Thevalue–∞andvaluesthataretoonegativetoberepresentedinthetargetformatareconvertedintothesmallestintegervaluethatcanberepresented.
RISC-VExtensions:Theseinstructionsrequirethe“F”extension.Furthermore,theinstructionsinvolving64bitintegersarenotavailableinRV32,asnotedinthegeneralformsabove.
Encoding: ThisisanR-typeinstruction,inwhichtheReg22ieldisusedforadditional
opcodebits.
FloatingPointConversion(Integerto/fromDoublePrecision)
GeneralForms:FCVT.W.D RegD,FReg1FCVT.WU.D RegD,FReg1FCVT.L.D RegD,FReg1 (availableinRV64orRV128only)FCVT.LU.D RegD,FReg1 (availableinRV64orRV128only)FCVT.D.W FRegD,Reg1FCVT.D.WU FRegD,Reg1FCVT.D.L FRegD,Reg1 (availableinRV64orRV128only)FCVT.D.LU FRegD,Reg1 (availableinRV64orRV128only)
Examples:FCVT.W.D r2,f5 # r2(int32) = f5(double)FCVT.WU.D r2,f5 # r2(uint32) = f5(double)FCVT.L.D r2,f5 # r2(int64) = f5(double)FCVT.LU.D r2,f5 # r2(uint64) = f5(double)FCVT.D.W f4,r7 # f4(double) = r7(int32)FCVT.D.WU f4,r7 # f4(double) = r7(uint32)FCVT.D.L f4,r7 # f4(double) = r7(int64)FCVT.D.LU f4,r7 # f4(double) = r7(uint64)
Description:Theseinstructionsperformaconversionofadoubleprecision2loatingformatto/fromanintegerformat.
Rounding:TheseinstructionscontainRoundingMode(RM)bitswhichdeterminehowroundingistobedone.
RISC-VArchitectureSummary/Porter Page� of� 135 323
Chapter4:FloatingPointInstructions
Whenconvertingtoaninteger,thevalues+∞,NaN,andvaluesthataretoolargetoberepresentedinthetargetformatarestoredasthelargestintegervaluethatcanberepresented.Thevalue–∞andvaluesthataretoonegativetoberepresentedinthetargetformatareconvertedintothesmallestintegervaluethatcanberepresented.
RISC-VExtensions:Theseinstructionsrequirethe“D”extension.Furthermore,theinstructionsinvolving64bitintegersarenotavailableinRV32,asnotedinthegeneralformsabove.
Encoding: ThisisanR-typeinstruction,inwhichtheReg22ieldisusedforadditional
opcodebits.
ProgrammingHint:Tostore+0.0ina2loatingregister,usetheseinstructions:
FCVT.S.W FRegD,x0 # Single precisionFCVT.D.W FRegD,x0 # Double precision
FloatingPointConversion(Integerto/fromQuadPrecision)
GeneralForms:FCVT.W.Q RegD,FReg1FCVT.WU.Q RegD,FReg1FCVT.L.Q RegD,FReg1 (availableinRV64orRV128only)FCVT.LU.Q RegD,FReg1 (availableinRV64orRV128only)FCVT.Q.W FRegD,Reg1FCVT.Q.WU FRegD,Reg1FCVT.Q.L FRegD,Reg1 (availableinRV64orRV128only)FCVT.Q.LU FRegD,Reg1 (availableinRV64orRV128only)
Comment: Analogous;Onlyavailablein“Q”quadprecision2loatingpointextension.
FloatingPointConversion(Single/Doubleto/fromQuadPrecision)
GeneralForms:FCVT.S.Q FRegD,FReg1FCVT.Q.S FRegD,FReg1
RISC-VArchitectureSummary/Porter Page� of� 136 323
Chapter4:FloatingPointInstructions
FCVT.D.Q FRegD,FReg1FCVT.Q.D FRegD,FReg1
Comment: Analogous;Onlyavailablein“Q”quadprecision2loatingpointextension.
FloatingPointMoveInstructions
Thenextthreeinstructionscopyasignedvaluefromoneregistertoanotherregister,whilemodifyingthesignbitbasedonthesignfromanothervalue.
FloatingSignInjection(SinglePrecision)
GeneralForms:FSGNJ.S FRegD,FReg1,FReg2 (inject)FSGNJN.S FRegD,FReg1,FReg2 (negate)FSGNJX.S FRegD,FReg1,FReg2 (exclusive-or)
Examples:FSGNJ.S f2,f5,f6 # f2 = sign(f6) * |f5|FSGNJN.S f2,f5,f6 # f2 = -sign(f6) * |f5|FSGNJX.S f2,f5,f6 # f2 = sign(f6) * f5
Description:EachoftheseinstructionscopiesthevalueinFReg1intoFRegD.Howeverthesignoftheresultischanged,basedonthesignofthevalueinFReg2.Inthe“inject”instruction,thesignfromFReg2isusedfortheresult.Inthe“negate”instruction,thesignfromFReg2is2irst2lippedandthenusedfortheresult.Inthe“exclusive-or”instruction,thesignsfromFReg1andFReg2areXOR-edtogetherandthenusedfortheresult.
Comments:TheseinstructionsareusefulasimplementationsofspecialcasesfortheinstructionsFNEG.S,FABS.S,andFMV.S.Inaddition,theIEEE754-2008speccallsfora“copysign”operation.Theseoperationsmayalsohaveuseinsomemathlibraryfunctions.
RISC-VExtensions:Theseinstructionsrequirethe“F”extension.Presumably,inthe“D”extension,thisinstructionwillonlymodifythelowerorder32bits,butitispossiblethatitwillalsosettheupper32bitstoallones.???
RISC-VArchitectureSummary/Porter Page� of� 137 323
Chapter4:FloatingPointInstructions
Encoding: ThisisanR-typeinstruction.
FloatingSignInjection(DoublePrecision)
GeneralForms:FSGNJ.D FRegD,FReg1,FReg2 (inject)FSGNJN.D FRegD,FReg1,FReg2 (negate)FSGNJX.D FRegD,FReg1,FReg2 (exclusive-or)
Examples:FSGNJ.D f2,f5,f6 # f2 = sign(f6) * |f5|FSGNJN.D f2,f5,f6 # f2 = -sign(f6) * |f5|FSGNJX.D f2,f5,f6 # f2 = sign(f6) * f5
Description:EachoftheseinstructionscopiesthevalueinFReg1intoFRegD.Howeverthesignoftheresultischanged,basedonthesignofthevalueinFReg2.Inthe“inject”instruction,thesignfromFReg2isusedfortheresult.Inthe“negate”instruction,thesignfromFReg2is2irst2lippedandthenusedfortheresult.Inthe“exclusive-or”instruction,thesignsfromFReg1andFReg2areXOR-edtogetherandthenusedfortheresult.
Comments:TheseinstructionsareusefulasimplementationsofspecialcasesfortheinstructionsFNEG.D,FABS.D,andFMV.D.Inaddition,theIEEE754-2008speccallsfora“copysign”operation.Theseoperationsmayalsohaveuseinsomemathlibraryfunctions.
RISC-VExtensions:Theseinstructionsrequirethe“D”extension.
Encoding: ThisisanR-typeinstruction.
FloatingSignInjection(DoublePrecision)
GeneralForms:FSGNJ.Q FRegD,FReg1,FReg2 (inject)FSGNJN.Q FRegD,FReg1,FReg2 (negate)FSGNJX.Q FRegD,FReg1,FReg2 (exclusive-or)
Comment: Analogous;Onlyavailablein“Q”quadprecision2loatingpointextension.
RISC-VArchitectureSummary/Porter Page� of� 138 323
Chapter4:FloatingPointInstructions
FloatingMove
GeneralForms:FMV.S FRegD,FReg1 (single precision)FMV.D FRegD,FReg1 (double precision)FMV.Q FRegD,FReg1 (quad precision)
Examples:FMV.S f2,f5 # f2 = f5 (32 bits)FMV.D f2,f5 # f2 = f5 (64 bits)FMV.Q f2,f5 # f2 = f5 (128 bits)
Description:ThevalueinFReg1iscopiedintoFRegD.
RISC-VExtensions:FMV.Srequiresthe“F”extension.FMV.Drequiresthe“D”extension.FMV.Qrequiresthe“Q”extension.
Encoding:Theseinstructionsarespecialcasesofmoregeneralinstructions.Theyareassembledidenticallyto:
FSGNJ.S FRegD,FReg1,FReg1 (inject) FSGNJ.D FRegD,FReg1,FReg1 (inject)
FloatingNegate
GeneralForms:FNEG.S FRegD,FReg1 (single precision)FNEG.D FRegD,FReg1 (double precision)FNEG.Q FRegD,FReg1 (quad precision)
Examples:FNEG.S f2,f5 # f2 = -(f5) (32 bits)FNEG.D f2,f5 # f2 = -(f5) (64 bits)FNEG.Q f2,f5 # f2 = -(f5) (128 bits)
Description:ThevalueinFReg1isnegatedandthencopiedintoFRegD.
RISC-VArchitectureSummary/Porter Page� of� 139 323
Chapter4:FloatingPointInstructions
RISC-VExtensions:FNEG.Srequiresthe“F”extension.FNEG.Drequiresthe“D”extension.FNEG.Qrequiresthe“Q”extension.
Encoding:Theseinstructionsarespecialcasesofmoregeneralinstructions.Theyareassembledidenticallyto:
FSGNJN.S FRegD,FReg1,FReg1 (negate) FSGNJN.D FRegD,FReg1,FReg1 (negate)
FloatingAbsoluteValue
GeneralForms:FABS.S FRegD,FReg1 (single precision)FABS.D FRegD,FReg1 (double precision)FABS.Q FRegD,FReg1 (quad precision)
Examples:FABS.S f2,f5 # f2 = |f5| (32 bits)FABS.D f2,f5 # f2 = |f5| (64 bits)FABS.Q f2,f5 # f2 = |f5| (128 bits)
Description:TheabsolutevalueofthequantityinFReg1iscopiedintoFRegD.
RISC-VExtensions:FABS.Srequiresthe“F”extension.FABS.Drequiresthe“D”extension.FABS.Qrequiresthe“Q”extension.
Encoding:Theseinstructionsarespecialcasesofmoregeneralinstructions.Theyareassembledidenticallyto:
FSGNJX.S FRegD,FReg1,FReg1 (exclusive-or) FSGNJX.D FRegD,FReg1,FReg1 (exclusive-or)
FloatingMoveTo/FromIntegerRegister(SinglePrecision)
GeneralForms:FMV.X.W RegD,FReg1FMV.W.X FRegD,Reg1
RISC-VArchitectureSummary/Porter Page� of� 140 323
Chapter4:FloatingPointInstructions
Examples:FMV.X.W x2,f5 # x2 = f5FMV.W.X f5,x2 # f5 = x2
Description:32-bitsarecopiedfroma2loatingpointregistertoanintegerregister,orfromanintegerregistertoa2loatingpointregister.Thebitsareunaltered;i.e.,noconversionisperformed.
InthecaseofRV64andRV128(wheretheintegerregistersarelargerthan32bits),whenthedestinationisanintegerregister,theupperbitsoftheintegerregisterwillbe2illedwithzeros.Whenthesourceisanintegerregister,theupperbitswillbeignored.
Noerrorconditionscanarise.
Inanolderversionofthespec,theletter“S”wasusedinplaceof“W”intheopcode,i.e.,“FMV.X.S”and“FMV.S.X”.
RISC-VExtensions:Theseinstructionsrequirethe“F”extension.
Encoding: ThisisanR-typeinstruction.
FloatingMoveTo/FromIntegerRegister(DoublePrecision)
GeneralForms:FMV.X.D RegD,FReg1FMV.D.X FRegD,Reg1
Examples:FMV.X.D x2,f5 # x2 = f5FMV.D.X f5,x2 # f5 = x2
Description:64-bitsarecopiedfroma2loatingpointregistertoanintegerregister,orfromanintegerregistertoa2loatingpointregister.Thebitsareunaltered;i.e.,noconversionisperformed.
InthecaseofRV128(wheretheintegerregistersarelargerthan64bits),whenthedestinationisanintegerregister,theupperbitsoftheintegerregisterwillbe2illedwithzeros.Whenthesourceisanintegerregister,theupperbitswillbeignored.
RISC-VArchitectureSummary/Porter Page� of� 141 323
Chapter4:FloatingPointInstructions
Noerrorconditionscanarise.RISC-VExtensions:
Theseinstructionsrequirethe“D”extension.Encoding: ThisisanR-typeinstruction.
Note:The“Q”extensiondoesnotprovidethefollowinginstructions.Thisisintentional.
FMV.X.Q RegD,FReg1FMV.Q.X FRegD,Reg1
Inordertomoveaquadprecisionvaluefroma2loatingpointregistertoanintegerregister,you’llneedtomoveittomemory2irst,thenmoveittotheintegerregister.
FloatingPointCompareandClassifyInstructions
FloatingPointComparison
GeneralForms:FLT.S RegD,FReg1,FReg2 (single precision)FLE.S RegD,FReg1,FReg2 (single precision)FEQ.S RegD,FReg1,FReg2 (single precision)
FLT.D RegD,FReg1,FReg2 (double precision)FLE.D RegD,FReg1,FReg2 (double precision)FEQ.D RegD,FReg1,FReg2 (double precision)
FLT.Q RegD,FReg1,FReg2 (quad precision)FLE.Q RegD,FReg1,FReg2 (quad precision)FEQ.Q RegD,FReg1,FReg2 (quad precision)
RISC-VArchitectureSummary/Porter Page� of� 142 323
Chapter4:FloatingPointInstructions
Examples:FLT.S x2,f5,f6 # x2 = (f5<f6) ? 1 : 0FLE.S x2,f5,f6 # x2 = (f5≤f6) ? 1 : 0FEQ.S x2,f5,f6 # x2 = (f5=f6) ? 1 : 0
FLT.D x2,f5,f6 # x2 = (f5<f6) ? 1 : 0FLE.D x2,f5,f6 # x2 = (f5≤f6) ? 1 : 0FEQ.D x2,f5,f6 # x2 = (f5=f6) ? 1 : 0
FLT.Q x2,f5,f6 # x2 = (f5<f6) ? 1 : 0FLE.Q x2,f5,f6 # x2 = (f5≤f6) ? 1 : 0FEQ.Q x2,f5,f6 # x2 = (f5=f6) ? 1 : 0
Description:Two2loatingpointvaluesinthe2loatingpointregistersarecomparedandavalueindicatingtheresultisstoredinanintegerregister.Theinstructionsstorea“1”iftherelationship(<,≤,or=)holdstrueand“0”iftherelationshipdoesnothold.
Thereisnoneedfor>or≥instructionssincetheprogrammercanachievethesameresultbyusingthe<and≤instructionsbysimplyswappingthetworegisteroperands.
ErrorConditions:IfeitheroperandisaNot-a-Number(NaN)valuethena“0”willbestoredinthedestinationregister.Thisappliestoallthreecomparisoninstructions(<,≤,and=).
TheIEEEspecrequiresthatthe<or≤shouldresultina“signalingNaN”ifeitheroperandisaNaN.ForRISC-V,theinstructionexecutionwillneverbeinterrupted,soasignalingNaNcannotbeimplementedbyinterruptingtheinstruction2low.RISC-Vhandlesthiscaseasfollows:For<and≤comparisons,ifeitheroperandisNaN,thentheNV(invalidoperation)bitintheFloatingPointerControlandStatusRegister(FCSR)willbesetto1.Otherwise,thebitwillbeleftunchanged.
TheIEEEspectreatsthe=comparisondifferently,sayingthatifeitheroperandisNaN,thentheresultshouldbea“quietNaN”.TheRISC-Vimplementsthisafollows:Forthe=comparison,theNV(invalidoperation)bitwillneverbemodi2ied.
RISC-VExtensions:The“.S”instructionsrequirethe“F”extension.
RISC-VArchitectureSummary/Porter Page� of� 143 323
Chapter4:FloatingPointInstructions
The“.D”instructionsrequirethe“D”extension.The“.Q”instructionsrequirethe“Q”extension.
Encoding: ThisisanR-typeinstruction.
FloatingPointClassify
GeneralForms:FCLASS.S RegD,FReg1 (single precision)FCLASS.D RegD,FReg1 (double precision)FCLASS.Q RegD,FReg1 (quad precision)
Examples:FCLASS.S x2,f5 # x2 = class(f5)FCLASS.D x2,f5 # x2 = class(f5)FCLASS.Q x2,f5 # x2 = class(f5)
Description:Theinstructionlooksata2loatingpointvalueanddetermineswhatsortofanumberitis.Forexample,doesthevaluerepresent+0.0,-0.0,Nan,+∞,-∞,etc?
Theresultoftheclassi2icationofthe2loatingpointvaluewillbestoredinanintegerregister,asfollows:Allbitsoftheintegerregisterwillbeclearedto“0”,exceptthatexactlyonebitwillbesetto“1”,toindicatewhatsortofthingthe2loatingpointregistercontains.
Herearethepossibleresultsthatcanbestoredintotheintegerregister:
BitSet ValueStored Meaning0 1 FReg1contains−∞.1 2 FReg1containsanegativenormalnumber.2 4 FReg1containsanegativesubnormalnumber.3 8 FReg1contains−0.4 16 FReg1contains+0.5 32 FReg1containsapositivesubnormalnumber.6 64 FReg1containsapositivenormalnumber.7 128 FReg1contains+∞.8 256 FReg1containsasignalingNaN.9 512 FReg1containsaquietNaN.
RISC-VArchitectureSummary/Porter Page� of� 144 323
Chapter4:FloatingPointInstructions
RISC-VExtensions:The“.S”instructionsrequirethe“F”extension.The“.D”instructionsrequirethe“D”extension.The“.Q”instructionsrequirethe“Q”extension.
Encoding: ThisisanR-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of� 145 323
Chapter5:RegisterandCallingConventions
StandardUsageofGeneralPurposeRegisters
The32bitversionoftheRISC-Varchitecturetreatsallregistersidenticallysotheassemblylanguageprogrammerisfreetouseanyregisterashe/shewishes.Theonlyexceptionisregisterx0,whichalwayscontainszero,andcanbeusedasadestinationwhenthedatashouldbediscarded.
However,thereisaRISC-V“standardcallingconvention”whichspeci2ieshowregisterswillnormallybeusedbythecompilerandmostassemblylanguageprogrammerswillfollowthisconvention.
Inthe“C”(CompressedInstructions)extension,16bitinstructionsaresupported,andeachcompressedinstructionisashorter,abbreviatedformofa32-bitinstruction.Thecompressedinstructionsetwasdesignedundertheassumptionthattheregisterswillbeusedinthestandardwaysandonlythemostcommoninstructionsaregiven16bitequivalents.Assuch,theregistersarenottreatedidenticallyinthecompressedinstructionset.
Forexample,the32bit“call”instruction(JAL)cansavethereturnaddressinanyregister,althoughthestandardconventionsmandatethatregisterx1willalwaysbeusedtosavethereturnaddress.Makinguseofthisconvention,the16bitversionoftheinstruction(C.JAL)savesthereturnaddressinx1andonlythisregister.Thereisnocompressedinstructiontosavethereturnaddressinanyotherregister,whichisreasonablesincethisisnotsomethingnormallydone.
Thenamesofthegeneralpurposeregistersarex0,x1,…x31.Theregistersarealsogivensecond,alternatenames.Theassemblerwillaccepteithername,sotheprogrammercanusewhichevernameseemsclearest.
Herearetheregisters:
RISC-VArchitectureSummary/Porter Page� of�146 323
Chapter5:RegisterandCallingConventions
Other SavedAcross Name Name Description Calls x0 zero Zero x1 ra ReturnAddress x2 sp StackPointer yes /calleesaved x3 gp GlobalPointer x4 tp ThreadPointer x5 t0 Temp x6 t1 Temp
x7 t2 Temp x8 s0,fp SavedReg/FramePointer yes/calleesaved x9 s1 SavedReg yes/calleesaved x10 a0 Functionargument x11 a1 Functionargument x12 a2 Functionargument x13 a3 Functionargument x14 a4 Functionargument x15 a5 Functionargument x16 a6 Functionargument x17 a7 Functionargument x18 s2 SavedReg yes /calleesaved x19 s3 SavedReg yes /calleesaved x20 s4 SavedReg yes /calleesaved x21 s5 SavedReg yes /calleesaved x22 s6 SavedReg yes /calleesaved x23 s7 SavedReg yes /calleesaved x24 s8 SavedReg yes /calleesaved x25 s9 SavedReg yes /calleesaved x26 s10 SavedReg yes /calleesaved x27 s11 SavedReg yes /calleesaved x28 t3 Temp x29 t4 Temp x30 t5 Temp x31 t6 Temp
Thecompressedinstructionsaredesignedtoalloweasyaccessto8registers,namelyx8,x9,…x15.Intheabovetable,these8registersarehighlighted.
RISC-VArchitectureSummary/Porter Page� of� 147 323
Chapter5:RegisterandCallingConventions
SavingRegistersAcrossCalls
Byconvention,someregistersaresavedacrossfunctioncalls.Forprogramsthatfollowthestandardcallingconventions(andweassumethatalmostalldo),afunction(say“foo”)willcallanotherfunction(say“bar”).The“caller”isfooandthe“callee”isbar.
Wenormallyusetheterminology“calleesaved”toindicatethataregisterwillbepreservedacrossacall.Forexample,functionbarwillnotmodifytheregister(orifitdoes,itwill2irstsaveandthenrestoretheregister),sothatthecallerfoocanrelyonitsvalueremainingunchangedbybar.
Presumablythecallerfoowillhavelocalvariableswhosevaluesmustbepreservedacrossacalltobar.Presumablythecalleebarwillneedsometemporaryvariablestocompleteitstaskandcomputeitsresult.Iffoocannottrustbartopreserveitslocalvariables,thenfoowillneedtosaveregistersand,afterbarreturns,restoretheirvalues.Thissavingandrestoringisdonetomainmemory(typicallytothelocalstackframe),andsuchmemoryreferencesareveryslow.Iffooreliesonbartodothesavingandrestoringtheregisters,thentheburdenisplacedonbartoperformtheseslowmemorytasks.
Therearetradeoffsandgeneratingoptimalcodeistricky.Itmaybemostef2icientforthecallerfootosavetheregisters,sinceitmaybethecasethatcallerfoocansavearegisterjustonce,makeanumberofcallstobar,andthenrestoretheregister.Iftheregisterhadbeensavedinthecalleebar,therewouldhavebeenmultiplesavesandrestores.Ontheotherhand,itmaybebestforthesavingandrestoringtobeperformedinthecalleebar.Perhapsbarwilloftentakeanexecutionpaththatdoesnotdisrupttheregister,sothesavingandrestoringcanbeavoidedaltogetherinmanycases.
Thetradeoff(whethertoperformthesavinginthecallerorcallee)isadecisionthatmustbemadeforeachoftheregistersinthecallerfoo.Ifbothfunctionsareprogrammedsimultaneouslybyacleverprogrammer,thenhe/shecanmakeadhocdecisionsaboutthesavingandrestoringofresistersinanattempttoproduceoptimalcode.
However,almostallcodeiscompiledandmostcompilerslookateachfunctionincompleteisolation.Effectively,eachfunctionisseparatelycompiled,sothereisno
RISC-VArchitectureSummary/Porter Page� of� 148 323
Chapter5:RegisterandCallingConventions
possibilityofcoordinatingandoptimizingthesavingofregistersbetweenmultiplefunctions.
Instead,theapproachistocreateastandardconventionmandatingwhichregistersaresavedbythecalleebarandwhicharenot.Registersthatarenotrequiredbytheconventiontobepreservedacrossacallaretermed“callersaved”,whichmeansthatitisuptothecallerfootosaveandrestoretheregisters,iftheycontaindatathatmustbepreservedacrossthefunctioncalltobar.
Theconventiondictateswhichregistersarerequiredtobepreservedacrossacall,anditisassumedthatcompilersandprogrammerswillfollowthecallingconventions.Iftheydo,thentheseparatelycompiledorseparatelywrittencodeisinteroperable.
Assumingthattheconventionsaretobefollowedwhencompilingsomefunctionfoo,thecompiler/programmerisfreetousewhicheverregistermakesmostsense.Forexample,avariablethatmustbepreservedacrossfunctioncalls(e.g.,tootherfunctionslikebar)willmostlikelybeplacedina“calleesaved”register.Therefore,thesaveandrestoreinstructionsareavoidedinthecaller’scode,andmightbeavoidedaltogetherifbardoesn’tevenusetheregisterinquestion.Ontheotherhand,thecompiler/programmermightprefertouseatemporary(callersaved)register,especiallyiftherearenocallswithinfoo,orifthevariabledoesnotneedtobesavedacrossanycallsthatdooccur.
Byestablishingacallingconventionsaboutwhichregistersarecalleesavedandwhicharenot,thedesignersaremakingtheimportantdecisionabouttheratiobetweentemporarycalleesavedregisters.Assumingthatthereareenoughregistersineachclass,thennoregistersaving/restoringwillbenecessary.
Butnotethatwithrecursiveroutines,therecanbeacon2lict.Consideralocalvariablethatmustbepreservedacrossarecursivecall.Byplacingthevariableinatemporary(callersaved)registertherecanbeaproblem.Sincethisregisterwillnotbesavedacrosstherecursivecall,thefunctionmustsaveitbeforethecallandrestoreitafterwards.Ontheotherhand,ifitisplacedinacalleesavedregisterwedon’thavetobotherwithsaving/restoringitaroundtherecursivecall.Butsincethefunctionwillbeusingacalleesavedregister,thefunctionitselfmustsavethepreviousvalueuponentryandrestoreitbeforereturn.
Soeitherway,theregistermustbesaveandrestored.Thismakessenseforrecursivefunctionsthathavelocalvariablesthatmustbepreservedacrosstherecursivecall.
RISC-VArchitectureSummary/Porter Page� of� 149 323
Chapter5:RegisterandCallingConventions
Theonlyquestioniswhethertoperformthesavebeforethecallinstructionorafterthecallinstruction,andlikewiseperformtherestoreafterthereturninstructionorbeforethereturninstruction.Thecompilermakesthisdecisionwhenitchoosestoplacethevariableinacallee-orcaller-savedregister.
Notethatcompilers/programmersaresometimesfreetoviolatethecallingconventionsordertoproducesuperiorcode.Thisisonlyfeasiblewhenthecompiler/programmerhasaccesstoboththecalledfunction(bar)andallpotentialcallers(suchasfoo).Insuchcases,thecompiler/programmercanchoosetousetheregistersinanon-standardwaytoimproveef2iciency.Forexample,itmaybethatbarwouldbene2itfromhavingmoretemporaryregistersandnoneofthecallershavedatathatmustbepreservedacrossthecalltobar,sothecompiler/programmercanuseregistersthatwouldotherwisebe“calleesaved”as“callersaved”.Compilerscankeepallthedetailsstraightbutprogrammersshouldbecareful.Manyassemblylanguageprogrammingbugsaretheresultofmistakesinvolvingthepropersavingandrestoringofregisters.
Registerx0-TheZeroRegister(“zero”)
Asmentionedearlier,registerx0isadummyregister.Whenread,itsvalueisalwayszero;whenstoredinto,thedataissimplydiscarded.
Registerx1-TheReturnAddress(“ra”)
Whenafunctioniscalled,thereturnaddressmustbesavedsothattheRETURNinstructioncanreturntothecorrectlocationinthecaller’scode.Inoldercomputers,theCALLinstructionstoredthereturnaddressinmemoryonthestack;inmodernISAs(includingRISC-V)theCALLinstructionsavesthereturnaddressinaregister.TheRETURNinstructionretrievesthevaluefromthisregister.TheRETURNinstructionisthennothingmorethanaJR(jumpthroughregister)instruction.
Byconvention,registerx1isusedforthispurpose.
SavingthereturnaddressinaregisterisfasterthansavingitonthestacksincethememoryWRITEandREADoperationsareavoided.Thisschemeisidealforleaffunctions.(“Leaf”functionsdonotcallotherfunctionsandarethusleafnodesinthe
RISC-VArchitectureSummary/Porter Page� of� 150 323
Chapter5:RegisterandCallingConventions
activation/callingtree.)Manyfunctioncallsaretoleaffunctionssothissavesalotmemoryactivity.
Fornon-leaffunctions,thereturnaddressmustbesavedsomewhereelse.Sincex1willbeusedinsubsequentcalls,thereturnaddressinx1mustbemovedsomewhereelsetoavoidgettingclobberedbeforeitcanbeusedintheRETURNinstruction.Thevaluecaneitherbesavedinacalleesavedregisteroronthestack.(Movingthereturnaddresstoacalleesavedregistermaynecessitateotherregistersavingelsewhere,forcingasavetostackmemoryelsewhere.Forrecursivefunctions,thereisnowaytoavoidusingthein-memorystack;wecanonlyshiftaroundwhenthememoryREADs/WRITEsoccurs.)
Registerx2-TheStackPointer(“sp”)
RISC-Vassumesthatx2containsthestacktoppointerandthatthestackgrowsdownward.Thestackisassumedtoalwaysbequadword(16byte)aligned.Somefunctions/methodsmayoperateentirelyusingregistersandmaynotneedanystackspace.TheRISC-Vdesigntriestofacilitatesuchfunctionssincetheymakenomemoryaccesses.However,manyfunctionswillallocatestackspacetoholdreturnaddresses,localvariables,savedregisters,etc.Thelocalvariables,etc.,arestoredina“stackframe”,whichissometimescalledan“activationrecord.”Recursivefunctionsareknownforneedingstackspaceandallocatinganewstackframeforeachinvocation.
Afunctionthatrequiresstackstoragewillgrowthestack(alwaysbyamultipleof16)bysubtractingfromthestacktoppointersp.Variableswithinthestackframecanbeaddressedusingpositiveoffsetsfromregisterx2.Thisorganizationmotivatesseveralofthecompressedinstructions,inwhichtheoffsetmustbepositiveandisneversignextended.
Registerx2iscallee-saved,whichmeansthatanyfunctionthatgrowsthestacktocreateastackframemustalsoshrinkitbyanequalamountbeforereturning.Inotherwords,afunctionmusthavetheneteffectofpoppingeverythingthatitpushed,leavingthestackasitfoundit,andeffectivelypreservingthevalueofx2.Thus,x2issaidtobe“callee-saved”.
RISC-VArchitectureSummary/Porter Page� of� 151 323
Chapter5:RegisterandCallingConventions
Registerx3-TheGlobalPointer(“gp”)
Globalvariablesarealsocalledstaticvariables,andarecontrastedwithlocalvariables.Globalvariablesaresharedbyallfunctions.Globalvariablesremaininexistencethroughouttheentireprogram,whilelocalvariablescomeintoexistencewhenafunctioniscalledandgooutofexistencewhenthefunctionreturns.
Whilelocalvariablesareoftenkeptinregistersoronthestackatunpredictablelocations,globalvariablesareplacedat2ixedlocationsinmemory.Unfortunatelytheselocationsareoftenfarawayfromthecodethataccessesthem.PC-relativeaddressingcanbeused,butisoftenineffectivesincetherequiredoffsetscanbelarge.Absoluteaddressescanoftenbeproblematictoo,sincetheabsoluteaddressescanalsobelarge,whichwouldnecessitateusingextrainstructions.
ThesolutionsupportedbytheRISC-Vcallingconventionistoplacetheglobalvariablestogetherandinitializearegistertopointtothisarea.Byconvention,registerx3isusedforthis.Thentheindividualvariablescanbeconvenientlyaddressedbyusingasmalloffsetfromtheglobalpointer.
Theglobalpointeristypicallyinitializedearlyintheprogramandneverchanged.So,insomesense,itis“callee-saved”althoughwedidn’tmarkitassuchinthetableabove.
Notallprogramswillusethistechnique,sothisregistermayactuallybeusedforsomethingcompletelydifferent.
Registerx4-TheThreadBasePointer(“tp”)
Inmulti-threadedprograms,eachthreadmayhaveacollectionof“thread-speci2ic”variables,whicharenotlocaltoanyfunction.Instead,theyaresharedbyallcoderunninginthisthread,butareinvisibletootherthreads.
Thread-speci2icvariablesaredistinctfromglobalvariables.Thereisonlyonecopyofeachglobalvariableandeachvariablehasauniqueoffsetfromtheglobalbaseregister,x3/gp.Thethreadbaseregister(tp)hasthesamevalueinallcoderunninginallthreads,whichcausesallcodetoaccessthatsamesetofvariables,regardlessofwhichthreaditisexecutingin.
RISC-VArchitectureSummary/Porter Page� of� 152 323
Chapter5:RegisterandCallingConventions
Inamulti-threadedprogram,eachthreadhasauniqueregisterset.WhentheOSswitchesfromonethreadtoanother,allregistersfromthepreviouslyrunningthreadwillbesavedtosomeareaofmemoryandthenewlyactivethreadwillhaveitsregistersrestoredfromadifferentmemoryarea.Ingeneral,theregistersinonethreadwillbecompletelydifferentfromtheregistersinanotherthread,eventhoughthethreadsarecooperatingandexecutingintheexactsameaddressspace.Forexample,eachthreadwillhaveitsownstack,sothex2(sp)registerineachthreadwillcontainadifferentvalue.
Sinceallthreadssharetheglobalvariables,registerx3(gp)willhavethesamevalueinallthreads.Inaddition,eachthreadcanhaveitsownprivatesetofvariables,whicharecalled“thread-speci2ic”variables.Byconvention,thisblockofvariableswillbepointedtobyregisterx4(tp),soeachthreadwillhaveadifferentvalueinitsx4register.Asinglesectionofcodethataccessesthread-speci2icvariableswillaccessadifferentsetofvariables,dependingonwhichthreaditisrunningin,andthisworksbecauseeachthreadhasauniquevalueinitsx4(tp)register.
Manyapplicationsarenotmulti-threaded,orcontainnothread-speci2icvariables.Insuchcases,itseemstheconventionleavesunspeci2iedhowx4istobeused.
Thethreadpointeristypicallyinitializedearlyintheexecutionofanewthreadandneverchanged.So,insomesense,itis“callee-saved”althoughwedidn’tmarkitassuchinthetableabove.
Registerx5-x7,x28-x31-TempRegisters(“t0-t6”)
Byconvention,thereare7registerswhicharefreetobeusedbyanyfunctionwithoutsavingtheirpreviouscontents.Typicallythecompilerwillplacetheintermediateresultsofcomputationintotempregisters.Insomecases,thecompilermayplacelocalvariablesinthetempregisters.
Thetempregistersare“callersaved”,whichmeansthatthecodewithinafunction“foo”isnotrequiredtosavetheirpreviouscontentsbeforeusingthem.Ontheotherhand,sincethisruleappliestoallfunctions,itmustbeassumedthatcallinganyotherfunction(suchas“bar”)mayresultintheseregistersbeingchanged.Thus,ifthecaller(foo)caresaboutthevalueofatempregister,itmustsavetheregisterbeforecallingbar.
RISC-VArchitectureSummary/Porter Page� of� 153 323
Chapter5:RegisterandCallingConventions
Registerx8,x9,x18-x27-SavedRegisters(“s0-s11”)
Byconvention,thereare12registerswhicharedesignatedascallee-savedregisters.Theconventionisthatafunctionmustnotmodifytheseregisters.Ifsomefunctionwishesorneedstouseoneoftheseregisters,thatfunctionmust2irstsaveitspreviousvalueandthenrestorethatvaluebeforereturning.
Thismakestheseregisterespeciallyusefulinstoringlocalvariablesthatmustbepreservedacrosscallstootherfunctions.Thereisareasonablechancethatthecalledfunctionwillnottouchtheseregistersatall,sosaving/restoringtomainmemoryisoftenavoidedcompletely.
Twooftheseregisters(x8/s0andx9/s1)areespeciallyeasytoaccessusingthe16bitcompressedinstructions.Thereforethecompilershouldprefertouseoneofthesetworegisterswheneverpossible.
Registerx8-FramePointer(“fp”)
Registerx8(whichisalsos0,oneofthecallee-savedregisters)isdesignatedastheframepointerandgivenasecondname,“fp”.
Noteveryfunctionwillneedaframepointerand,forafunctionthatdoesnotneedaframepointer,thisregistercanbeviewedass0,justanothercallee-savedregister.
Tounderstandthepurposeofaframepointer,considertwosortsoffunction.
The2irstsortoffunctionwillhaveasmallnumberoflocalvariables.Furthermore,eachofthelocalvariableswillhavea2ixedsize,knownatcompile-time.Forsuchafunction,thestackframecanbelaidoutbythecompilerandalloffsetswithinthestackframewillbeknown,smallnumbers.Thissortoffunctioncanaccessitsvariablesinthestackframebyusingpositiveoffsetsfromthestacktoppointer(x2).Thissortofafunctionwillnotneedaframepointer.
Thesecondsortoffunctionmayhavealargenumberoflocalvariablesormaycreatealocalvariable(suchasadynamicarray)whosesizeisnotknownuntilruntimewhenthefunctionisactuallycalled.Insuchafunction,thecompilerwillnotknow
RISC-VArchitectureSummary/Porter Page� of� 154 323
Chapter5:RegisterandCallingConventions
thelayoutoftheframe.Thelocalvariablescannotbeaccessedusingsmallpositiveoffsetsfromthestackpointerregister,x2.
Theideawithusingaframepointeristhis:Uponentrytothefunction,theframepointerregisterwillbesettoaknown,2ixedvalue.Typically,thiswouldbetheoldvalueofthestackpointerbeforethenewstackframeisallocated.Then,thestackframeisallocatedinacoupleofsteps.Firstthe2ixedsizedvariablesareallocatedbyadjustingthestacktoppointer.Thenthedynamicallysizedvariablesareallocated,bydecrementingthestacktoppointerbyvaluesdeterminedatruntime.Afterthisinitializationandcreationoflocalvariables,theframepointerisleftpointingtooneend(thetop,higherend)oftheframe(ormorepreciselytothe2irstbyteofthepreviousframe)andthestackpointerisleftpointingtothebottom,lowerendoftheframe.Offsetsrelativetothestacktoppointerareproblematic,sincethecompilerwillnotknowexactlyhowmuchthestackpointerwasadjusted.Instead,thelocalvariableswillbeaccessedusingnegativeoffsetsrelativetotheframepointer.
(Inanotherapproach,thelocalvariablescanbepushedontothestack2irst.Thentheframepointerisinitializedtothecurrentstacktop.Finally,thedynamicallysizedvariablesarepushedontothestack.Thisallowsthelocalvariablestobeaccessedusingapositiveoffsettotheframepointer.IntheRISC-Vdesign,positiveoffsetsarepreferredinthe16bitcompressedinstructionformat,sothismightbeabetterapproach.)
Theframepointeriscalleesaved.Thismeansthatthevalueoftheregistermustberestoredbeforethefunctionreturns.Typically,thefunctionprologuewillstorethepreviousvalueofthefpregisterintheframeandthefunctionepiloguewillrestorefprightbeforereturning.
Registerx10-x17-ArgumentRegisters(“a0-a7”)
Typically,argumentstofunctionsarepassedinregistersandthisgroupofregistersismeanttobeusedforthatpurpose.
Also,avaluethatistobereturnedfromafunctionwillusuallybereturnedinaregister.
RISC-VArchitectureSummary/Porter Page� of� 155 323
Chapter5:RegisterandCallingConventions
Thereturnvaluewillbereturnedinx10(a0),orifthevalueistoolargeto2itintoasingleregister,itwillbereturnedintheregisterpairx10-x11(a0-a1).Thisisthesameregister(s)thatisusedtopassinthe2irstargument.
CallingConventionsforArguments
Argumentsarenormallypassedtoafunctioninregistersandthecallingconventiondetailsexactlyhowargumentvaluesaretobeplacedinregisters.
Iftherearetoomanyarguments,thenthe2irstargumentsarepassedintheregisters,andtheremainderarepassedonthestack.Also,ifanindividualargumentistoolarge,itwillbepassedinmemoryandtheregisterwillcontainapointertothismemoryregion.
Ifthereturnedvalueissmallenough,itwillbereturnedina0-a1,otherwiseitisreturnedinmemory.
Here,wesummarizethecallingconventionsforRV32,the32bitarchitecture.
• Floatsarepassedinthe2loatingpointregisters,iftheyexist.Otherwise,2loatsarepassedinthegeneralpurposeregisters,justlikeints.
• Othervaluesbrides2loats(e.g.,ints,chars,bools,pointers,structs)arepassedinthe8generalpurposeregistersa0-a7,aslongastheyarenolargerthan64bits.So,ifthevaluewill2itintooneortworegisters,theywillbepassedinregisters.
• Valuessmallerthan32bitsaresign-extendedto32bitsandpassedinaregister.[Apparently,theyare“widenedaccordingtothesignoftheirtype,thensignextended”;myinterpretationmaybeincorrect.???]
• Valuesof32bitsarepassedinasingleregister.
• Valuesofupto64bitsinsizearepassedinaregisterpair.Theleastsigni2icant32-bitswillbeplacedintheregisterwiththesmallernumber,inlittleendianstyle.
RISC-VArchitectureSummary/Porter Page� of� 156 323
Chapter5:RegisterandCallingConventions
• Valueslargerthan64bitsarealwayspassed“byreference”.Thismeansthatthevalueisplacedinaregionofmemory.Thisregioniseitherinthecaller’sstackframeorispushedontothestackatthetimeofthefunctioncall.Theaddresstothisregionappearsinthelistofargumentsandispassedinaregister,inplaceofthevalue.[Itseemsthatanaddressisalwaysused,regardlessofwhetherornotthereisroomintheregistersfortheaddress.]
• Iftherearenotenoughregisters,onlythe2irstfewargumentswillbepassedinregistersandtheremainderwillbepassedinmemory.Theextra,remainingargumentswillpushedontothestack,sotheywillbeatknownoffsetsfromthestacktoppointeruponentrytothecalledfunction.Thus,theycanbeeasilyaccessedusingpositiveoffsetsfromtheframepointerregister(fp).
• Iftherearenotquiteenoughregisters,a64bitvaluecanbesplitintotwo32-bitparts,withtheleastsigni2icantwordpassedinaregisterandthemostsigni2icantwordpassedinmemory.
• Followingthe“C”languageconvention,argumentsarealways“passedbyvalue”,whichmeanstheyareeffectivelylocalvariableswhichcanbemodi2ied;afterthefunctionreturns,themodi2iedvalueislost.Forexample,structspassedasargumentswillbecopiedandanychangestothestuccowillbelost.Ifthisisnotwhattheprogrammerwants,thenhe/shecanexplicitlypassanaddress,inwhichcasethepointeris“passedbyvalue”,andthevaluepointedtoiseffectively“passedbyreference”.
• Allvaluespassedinmemory(e.g.,argumentsinthestackframe)arealigned,accordingtothesizeofthevalue.
• Ifthereturnedvalueis64bitsorless,itwillbereturnedinregistera0-a1(orthe2loatingpointregisters,ifapplicable).
• Ifthereturnedvalueislargerthan64bits,thecallerwillallocatememoryspaceforthereturnedvalue(inthecaller’sstackframe)andwillpassapointertothisblockofmemory.Thepointerwillbepassedasthe2irstargument,implicitlyinsertedinfrontoftheotherarguments.
Sixoftheseregisters(x10-x15,i.e.a0-a5)areespeciallyeasytoaccessusingthe16bitcompressedinstructions.Therefore,themajorityoffunctions(whichhave6orfewerarguments)canbecalledusingregistersalone.
RISC-VArchitectureSummary/Porter Page� of� 157 323
Chapter5:RegisterandCallingConventions
Commentary:Theargumentregistersareotherwisetreatedliketemporaries,inthattheyarecallersaved,i.e.,notpreservedacrosscalls.Ifanargumentregisterisnotusedforpassingarguments,thenititisfreetobeusedasatemporaryworkregister.
Itisunclearwhyitisnecessarytodifferentiatebetweenthesetwoclasses.Whycan’ttempregistersbeusedtopassargumentsincaseswheretherearealotofarguments?Ifafunctionwithmanyargumentstrulyneedsatemporaryregister,thenitcansavesomeofitsargumentsinitsstackframe.Thisisnoextrawork,sincethecallerwouldhavehadtosavethevaluesinitsframeotherwise.Anditwillprobablybethecasethatthecalleefunctioncanbeabitsmarteraboutwhathastobesavedinmemorythanthecaller.
FloatingPointRegisters
Herearethe2loatingpointregisters,withtheirintendedusage.The16bitcompressedinstructionsfavorregistersf8-f15andthesearehighlightedbelow.
RISC-VArchitectureSummary/Porter Page� of� 158 323
Chapter5:RegisterandCallingConventions
Other SavedAcross Name Name Description Calls f0 ft0 Temp f1 ft1 Temp f2 ft2 Temp f3 ft3 Temp f4 ft4 Temp f5 ft5 Temp f6 ft6 Temp
f7 ft7 Temp f8 fs0 SavedReg yes/calleesaved f9 fs1 SavedReg yes/calleesaved f10 fa0 Functionargument f11 fa1 Functionargument f12 fa2 Functionargument f13 fa3 Functionargument f14 fa4 Functionargument f15 fa5 Functionargument f16 fa6 Functionargument f17 fa7 Functionargument f18 fs2 SavedReg yes /calleesaved f19 fs3 SavedReg yes /calleesaved f20 fs4 SavedReg yes /calleesaved f21 fs5 SavedReg yes /calleesaved f22 fs6 SavedReg yes /calleesaved f23 fs7 SavedReg yes /calleesaved f24 fs8 SavedReg yes /calleesaved f25 fs9 SavedReg yes /calleesaved f26 fs10 SavedReg yes /calleesaved f27 fs11 SavedReg yes /calleesaved f28 ft8 Temp f29 ft9 Temp f30 ft10 Temp f31 ft11 Temp
RISC-VArchitectureSummary/Porter Page� of� 159 323
Chapter6:CompressedInstructions
CompressedInstructions
Normally,RISC-Vinstructionsare32bitsinlength.However,the“C”(CompressedInstructions)extensionaddsinstructionswhichare16bitslong.
The“C”extensionmerelyaddsinstructionstotheISA;alltheother32bitinstructionsremainvalid,unchangedandfullyfunctional.
Each16bitinstructionisanabbreviationforalonger32bitinstruction.Thus,nonewfunctionalityisadded.Thepurposeofthe16bitinstructionsistoreducecodesize:byreplacinglongerinstructionswiththeirequivalentshorterversions,codesizeisreduced.
Notall32bitinstructionshaveanequivalent16bitcounterpart.ThegoaloftheRISC-Vdesignistoprovide16bitinstructionsforthemostpopularandfrequentlyoccurring32bitinstructions,therebyreducingthecodesizeasmuchaspossible.Ablockofcodeinwhichall32bitinstructionshappentohave16bitcounterpartscanbereducedtohalfitssize.Buttypicalcodesequenceswillcontainsomeinstructionswhichhaveno16bitcounterparts,sothereductioninsizewillnotbeasgreat.Thedesignersestimatean25%reductionincodesize.
InotherISAdesignsthereisa“compressedmode”:Whentheprocessorisplacedincompressedmode,theshorterinstructionsareexecuted.RISC-Vdoesnotworkthisway.TheRISC-Vdesignhascarefullyencodedtheinstructionssothatthelengthoftheinstructioncanbedeterminedandthebytesfetchedfrommemorycanbeunambiguouslyinterpretedaseither16bitor32bitinstructions.Thus,theprogrammercanfreelyintermixlongandshortinstructions.
RISC-VArchitectureSummary/Porter Page� of�160 323
Chapter6:CompressedInstructions
InstructionFormats
Thereareseveralinstructionformatsusedforthecompressedinstructions.Inlistingtheformatsbelow,wewillusetheseabbreviationsfor2ields:
------ OpcodeBitsDDDDD RegDDDD RegD(x8..x15only)
aaaaa Reg1aaa Reg1(x8..x15only)bbbbb Reg2bbb Reg2(x8..x15only)VVVVV ImmediateValueXXXXX Offset
Herearethemaininstructionforms.(Eachcharacterindicatesasinglebit.)
RegisterFormat----aaaaabbbbb------DDDDDbbbbb--
ImmediateFormat---VaaaaaVVVVV-----VDDDDDVVVVV--
WideImmediateFormat---VVVVVVVVDDD--
Stack-relativeStoreFormat---VVVVVVbbbbb--
LoadFormat---VVVaaaVVDDD--
StoreFormat---VVVaaaVVbbb--
BranchFormat---XXXaaaXXXXX--
JumpFormat---XXXXXXXXXXX--
Theseformatsareschematic.Fortheindividualinstructionsdescribedinthefollowingsections,wegivetheexactencoding,includingtheactualopcodebits,aswellasotherencodingconstraints.
RISC-VArchitectureSummary/Porter Page� of� 161 323
Chapter6:CompressedInstructions
LoadInstructions
C.LW-LoadWord
GeneralForm:C.LW RegD,Immed-6(Reg1)
Description:Movea32bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressinregisterReg1.
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
RegistersRegDandReg1arerestrictedtox8..x15.Encoding:
010VVVaaaVVDDD00WhereDDD=RegD.
Whereaaa=Reg1.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.
Equivalent32BitInstruction:LW RegD,Offset(Reg1)
C.LW-LoadDoubleword
GeneralForm:C.LD RegD,Immed-6(Reg1)
Description:Movea64bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedasapositiveoffsettothebaseaddressinregisterReg1.
RISC-VArchitectureSummary/Porter Page� of� 162 323
Chapter6:CompressedInstructions
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
RegistersRegDandReg1arerestrictedtox8..x15.Encoding:
011VVVaaaVVDDD00WhereDDD=RegD.
Whereaaa=Reg1.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.
Equivalent32BitInstruction:LD RegD,Offset(Reg1)
C.LQ-LoadQuadword
GeneralForm:C.LQ RegD,Immed-6(Reg1)
Description:Movea128bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby16,andaddedasapositiveoffsettothebaseaddressinregisterReg1.
SincetheImmed-6valueisscaledby16,thisgivesaneffectiverangeof0..1,008,inmultiplesof16.
RegistersRegDandReg1arerestrictedtox8..x15.Encoding:
001VVVaaaVVDDD00 WhereaaadenotesReg1.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.OnlyavailableforRV128.
Equivalent32BitInstruction:LQ RegD,Offset(Reg1)
RISC-VArchitectureSummary/Porter Page� of� 163 323
Chapter6:CompressedInstructions
C.FLW-LoadSingleFloat
GeneralForm:C.FLW FRegD,Immed-6(Reg1)
Description:Movea32bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
RegisterFRegDisrestrictedtof8..f15.
RegisterReg1isrestrictedtox8..x15.Encoding:
011VVVaaaVVDDD00 WhereaaadenotesReg1.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.
Equivalent32BitInstruction:FLW FRegD,Offset(Reg1)
C.FLD-LoadDoubleFloat
GeneralForm:C.FLD FRegD,Immed-6(Reg1)
Description:Movea64bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
RegisterFRegDisrestrictedtof8..f15.
RISC-VArchitectureSummary/Porter Page� of� 164 323
Chapter6:CompressedInstructions
RegisterReg1isrestrictedtox8..x15.Encoding:
001VVVaaaVVDDD00 WhereaaadenotesReg1.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.
Equivalent32BitInstruction:FLD FRegD,Offset(Reg1)
C.LWSP-LoadWordfromStackFrame
GeneralForm:C.LWSP RegD,Immed-6
Description:Movea32bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
Thedestinationregistercanbeanyofthe32registers,exceptx0.Encoding:
010VDDDDDVVVVV10WhereDDDDD=RegDandcannotbex0=00000.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.
Equivalent32BitInstruction:LW RegD,Offset(x2)
RISC-VArchitectureSummary/Porter Page� of� 165 323
Chapter6:CompressedInstructions
C.LDSP-LoadDoublewordfromStackFrame
GeneralForm:C.LDSP RegD,Immed-6
Description:Movea64bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
Thedestinationregistercanbeanyofthe32registers,exceptx0.Encoding:
011VDDDDDVVVVV10WhereDDDDD=RegDandcannotbex0=00000.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.
Equivalent32BitInstruction:LD RegD,Offset(x2)
C.LQSP-LoadQuadwordfromStackFrame
GeneralForm:C.LQSP RegD,Immed-6
Description:Movea128bitvaluefrommemoryintoregisterRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby16,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby16,thisgivesaneffectiverangeof0..1008inmultiplesof16.
Thedestinationregistercanbeanyofthe32registers,exceptx0.
RISC-VArchitectureSummary/Porter Page� of� 166 323
Chapter6:CompressedInstructions
Encoding:001VDDDDDVVVVV10
WhereDDDDD=RegDandcannotbex0=00000.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.OnlyavailableforRV128.
Equivalent32BitInstruction:LQ RegD,Offset(x2)
C.FLWSP-LoadSingleFloatfromStackFrame
GeneralForm:C.FLWSP FRegD,Immed-6
Description:Movea32bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
Thedestinationregistercanbeanyofthe322loatingpointregisters.Encoding:
011VDDDDDVVVVV10WhereDDDDD=FRegD.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.
Equivalent32BitInstruction:FLW FRegD,Offset(x2)
RISC-VArchitectureSummary/Porter Page� of� 167 323
Chapter6:CompressedInstructions
C.FLDSP-LoadDoubleFloatfromStackFrame
GeneralForm:C.FLDSP FRegD,Immed-6
Description:Movea64bitvaluefrommemoryinto2loatingpointregisterFRegD.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
Thedestinationregistercanbeanyofthe322loatingpointregisters.Encoding:
001VDDDDDVVVVV10WhereDDDDD=FRegD.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.
Equivalent32BitInstruction:FLD FRegD,Offset(x2)
StoreInstructions
C.SW-StoreWord
GeneralForm:C.SW Reg2,Immed-6(Reg1)
Description:Movea32bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressinregisterReg1.
RISC-VArchitectureSummary/Porter Page� of� 168 323
Chapter6:CompressedInstructions
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
RegistersReg1andReg2arerestrictedtox8..x15.Encoding:
110VVVaaaVVbbb00 WhereaaadenotesReg1andbbbdenotesReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
SW Reg2,Offset(Reg1)
C.SD-StoreDoubleword
GeneralForm:C.SD Reg2,Immed-6(Reg1)
Description:Movea64bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedasapositiveoffsettothebaseaddressinregisterReg1.
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
RegistersReg1andReg2arerestrictedtox8..x15.Encoding:
111VVVaaaVVbbb00 WhereaaadenotesReg1andbbbdenotesReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.
Equivalent32BitInstruction:SD Reg2,Offset(Reg1)
RISC-VArchitectureSummary/Porter Page� of� 169 323
Chapter6:CompressedInstructions
C.SQ-StoreQuadword
GeneralForm:C.SQ Reg2,Immed-6(Reg1)
Description:Movea128bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby16,andaddedasapositiveoffsettothebaseaddressinregisterReg1.
SincetheImmed-6valueisscaledby16,thisgivesaneffectiverangeof0..1,008,inmultiplesof16.
RegistersReg1andReg2arerestrictedtox8..x15.Encoding:
101VVVaaaVVbbb00 WhereaaadenotesReg1andbbbdenotesReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.OnlyavailableforRV128.
Equivalent32BitInstruction:SQ Reg2,Offset(Reg1)
C.FSW-StoreSingleFloat
GeneralForm:C.FSW FReg2,Immed-6(Reg1)
Description:Movea32bitvaluefrom2loatingpointregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
RegisterReg1isrestrictedtox8..x15.RegisterFReg2isrestrictedtof8..f15.
RISC-VArchitectureSummary/Porter Page� of� 170 323
Chapter6:CompressedInstructions
Encoding:111VVVaaaVVbbb00
WhereaaadenotesReg1andbbbdenotesFReg2.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.
Equivalent32BitInstruction:FSW FReg2,Offset(Reg1)
C.FSD-StoreDoubleFloat
GeneralForm:C.FSD FReg2,Immed-6(Reg1)
Description:Movea64bitvaluefrom2loatingpointregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedasapositiveoffsettothebaseaddressingeneralpurposeregisterReg1.
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
RegisterReg1isrestrictedtox8..x15.RegisterFReg2isrestrictedtof8..f15.Encoding:
101VVVaaaVVbbb00 WhereaaadenotesReg1andbbbdenotesFReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.
Equivalent32BitInstruction:FSD FReg2,Offset(Reg1)
RISC-VArchitectureSummary/Porter Page� of� 171 323
Chapter6:CompressedInstructions
C.SWSP-StoreWordtoStackFrame
GeneralForm:C.SWSP Reg2,Immed-6
Description:Movea32bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
Thesourceregistercanbeanyofthe32generalpurposeregisters.Encoding:
110VVVVVVbbbbb10 WherebbbbbdenotesReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
SW Reg2,Offset(x2)
C.SDSP-StoreDoublewordtoStackFrame
GeneralForm:C.SDSP Reg2,Immed-6
Description:Movea64bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
Thesourceregistercanbeanyofthe32generalpurposeregisters.Encoding:
111VVVVVVbbbbb10
RISC-VArchitectureSummary/Porter Page� of� 172 323
Chapter6:CompressedInstructions
WherebbbbbdenotesReg2.WhereVVVVVV=Immed-6.
Availability:Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.
Equivalent32BitInstruction:SD Reg2,Offset(x2)
C.SQSP-StoreQuadwordtoStackFrame
GeneralForm:C.SQSP Reg2,Immed-6
Description:Movea128bitvaluefromregisterReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby16,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby16,thisgivesaneffectiverangeof0..1,016,inmultiplesof16.
Thesourceregistercanbeanyofthe32generalpurposeregisters.Encoding:
101VVVVVVbbbbb10 WherebbbbbdenotesReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.OnlyavailableforRV128.
Equivalent32BitInstruction:SQ Reg2,Offset(x2)
C.FSWSP-StoreSingleFloattoStackFrame
GeneralForm:C.FSWSP FReg2,Immed-6
RISC-VArchitectureSummary/Porter Page� of� 173 323
Chapter6:CompressedInstructions
Description:Movea32bitvaluefromregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby4,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby4,thisgivesaneffectiverangeof0..252,inmultiplesof4.
Thesourceregistercanbeanyofthe322loatingpointregisters.Encoding:
111VVVVVVbbbbb10 WherebbbbbdenotesFReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.Requires“F”(SinglePrecisionFloatingPoint)extension.OnlyavailableforRV32.
Equivalent32BitInstruction:FSW FReg2,Offset(x2)
C.FSDSP-StoreDoubleFloattoStackFrame
GeneralForm:C.FSDSP FReg2,Immed-6
Description:Movea64bitvaluefromregisterFReg2tomemory.Toformtheaddress,theImmed-6valueiszero-extended,multipliedby8,andaddedtothestackpointer(x2).
SincetheImmed-6valueisscaledby8,thisgivesaneffectiverangeof0..504,inmultiplesof8.
Thesourceregistercanbeanyofthe322loatingpointregisters.Encoding:
101VVVVVVbbbbb10 WherebbbbbdenotesFReg2.
WhereVVVVVV=Immed-6.Availability:
Requires“C”(CompressedInstruction)extension.
RISC-VArchitectureSummary/Porter Page� of� 174 323
Chapter6:CompressedInstructions
Requires“D”(DoublePrecisionFloatingPoint)extension.OnlyavailableforRV32andRV64.
Equivalent32BitInstruction:FSD FReg2,Offset(x2)
Jump,Call,andConditionalBranchInstructions
C.J-Jump(PC-relative)
GeneralForm:C.J Immed-11
Description:AjumpistakenusingaPC-relativeoffset.The11bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.
SincetheImmed-11valueisscaledby2,thisgivesaneffectiverangeof-2,048..+2,046,inmultiplesof2.
Encoding:101XXXXXXXXXXX01
WhereXXXXXXXXXXXdenotestheImmed-11offset.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
JAL x0,Offset
C.JAL-JumpandLink/Call(PC-relative)
GeneralForm:C.JAL Immed-11
Description:AfunctioncallistakenusingaPC-relativeoffset.The11bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.
RISC-VArchitectureSummary/Porter Page� of� 175 323
Chapter6:CompressedInstructions
SincetheImmed-11valueisscaledby2,thisgivesaneffectiverangeof-2,048..+2,046,inmultiplesof2.
Thereturnaddress(i.e.,theaddressoftheinstructionfollowingtheC.JAL)issavedinregisterx1(alsoknownasregisterra).
Encoding:001XXXXXXXXXXX01
WhereXXXXXXXXXXXdenotestheImmed-11offset.Availability:
Requires“C”(CompressedInstruction)extension.OnlyavailableforRV32.
Equivalent32BitInstruction:JAL x1,Offset
C.JR-JumpRegister
GeneralForm:C.JR Reg1
Description:AjumpistakentotheaddressinregisterReg1.
RegisterReg1canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.Encoding:
1000aaaaa0000010 WhereaaaaadenotesReg1andcannotbex0=00000.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
JAL x0,Reg1,0
C.JALR-JumpRegisterandLink/CallGeneralForm:
C.JALR Reg1 Description:
AfunctioncallistakentotheaddressinregisterReg1.
Thereturnaddress(i.e.,theaddressoftheinstructionfollowingtheC.JALR)issavedinregisterx1(alsoknownasregisterra).
RISC-VArchitectureSummary/Porter Page� of� 176 323
Chapter6:CompressedInstructions
RegisterReg1canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.Encoding:
1001aaaaa0000010 WhereaaaaadenotesReg1andcannotbex0=00000.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
JALR x1,Reg1,0
C.BEQZ-BranchifZero(PC-relative)
GeneralForm:C.BEQZ Reg1,Immed-8
Description:AconditionalbranchistakenusingaPC-relativeoffset.ThebranchistakenifthevalueinregisterReg1iszero.
The8bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.
SincetheImmed-8valueisscaledby2,thisgivesaneffectiverangeof-256..+254,inmultiplesof2.
Encoding:110XXXaaaXXXXX01
WhereXXXXXXXXdenotestheImmed-8offset. WhereaaadenotesReg1.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
BEQ Reg1,x0,Offset
C.BNEQZ-BranchifNotZero(PC-relative)
GeneralForm:C.BNEQZ Reg1,Immed-8
RISC-VArchitectureSummary/Porter Page� of� 177 323
Chapter6:CompressedInstructions
Description:AconditionalbranchistakenusingaPC-relativeoffset.ThebranchistakenifthevalueinregisterReg1isnotzero.
The8bitimmediatevalueismultipliedbytwo(sinceinstructionsmustbehalfwordaligned),sign-extended,andaddedtotheprogramcounter(PC),togivethetargetaddress.
SincetheImmed-8valueisscaledby2,thisgivesaneffectiverangeof-256..+254,inmultiplesof2.
Encoding:111XXXaaaXXXXX01
WhereXXXXXXXXdenotestheImmed-8offset. WhereaaadenotesReg1.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
BNE Reg1,x0,Offset
LoadConstantintoaRegister
C.LI-LoadImmediate
GeneralForm:C.LI RegD,Immed-6
Description:Loadanimmediatevalueintherange-32..+31intoregisterRegD.
Thetargetregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.
The6bitimmediatevalueissign-extended.Encoding:
010VDDDDDVVVVV01 WhereVVVVVVdenotestheImmed-6value. WhereDDDDDdenotesRegD,andcannotbex0=00000.
RISC-VArchitectureSummary/Porter Page� of� 178 323
Chapter6:CompressedInstructions
Availability:Requires“C”(CompressedInstruction)extension.
Equivalent32BitInstruction:ADDI RegD,x0,Value
C.LUI-LoadUpperImmediate
GeneralForm:C.LUI RegD,Immed-6
Description:ThisinstructionisashortformfortheLUIinstruction,butwitharestrictedrangeofvalues.Thisinstructionloadsbits[17:12]ofthedestinationregister.Theupperbits[…:18]are2illedwiththesignextension.Thelower12bits[11:0]are2illedwithzeros.
Toputitanotherway,theimmediatevalueissignextended,multipliedby4,096,andloadedintothedestinationregister.SincetheImmed-6valueisscaledby212=4,096,thisgivesaneffectiverangeofvaluesof-131,040..+126,976,inmultiplesof4,096.
Thetargetregistercanbeanygeneralpurposeregisterexceptx0orx2.(Registerx2isnormallyusedforthestackpointer.)
Encoding:011VDDDDDVVVVV01
WhereVVVVVVdenotestheImmed-6value. WhereDDDDDdenotesRegD,andcannotbex0=00000orx2=00010.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
LUI RegD,Value
RISC-VArchitectureSummary/Porter Page� of� 179 323
Chapter6:CompressedInstructions
Arithmetic,Shift,andLogicInstructions
C.ADDI-AddImmediate
GeneralForm:C.ADDI RegD,Immed-6
Description:Addsanimmediatevalueintherange-32..+31toregisterRegD.
Theregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.
The6bitimmediatevalueissign-extended.Encoding:
000VDDDDDVVVVV01 WhereVVVVVVdenotestheImmed-6valueandcannotbezero. WhereDDDDDdenotesRegD,andcannotbex0=00000.Comment: IfVVVVVV=000000andDDDDD=00000,thisencodingisidenticaltoC.NOP. IfVVVVVV≠000000andDDDDD=00000,thisencodingcanbeahint. IfVVVVVV=000000andDDDDD≠00000,thisencodingisreserved.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
ADDI RegD,RegD,Value
C.ADDIW-AddImmediate(Word)
GeneralForm:C.ADDIW RegD,Immed-6
Description:Addsanimmediatevalueintherange-32..+31toregisterRegD.
Theregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.
The6bitimmediatevalueissign-extended.
ThepreviouslydescribedC.ADDIinstructionperformstheadditionineither32,64,or128bits,asdeterminedbythearchitecture,RV32,RV64,orRV128.
RISC-VArchitectureSummary/Porter Page� of� 180 323
Chapter6:CompressedInstructions
ThisinstructionC.ADDIWwilltruncatetheresultto32bits,thensignextendedthe32bitstothefullregistersize.
Avalueofzeroisallowed.Thiswillsimplyforcealargervalueintoa32bitvalue,signextendingtothefullregistersize.
Encoding:001VDDDDDVVVVV01
WhereVVVVVVdenotestheImmed-6value.(Zeroisallowed). WhereDDDDDdenotesRegD,andcannotbex0=00000.Comment: IfDDDDD=00000,thisencodingcanbeahint.Availability:
Requires“C”(CompressedInstruction)extension.OnlyavailableforRV64andRV128.
Equivalent32BitInstruction:ADDIW RegD,RegD,Value
C.ADDI16SP-AddImmediatetoStackPointer
GeneralForm:C.ADDI16SP Immed-6
Description:Thisinstructionisusedtogroworshrinkthestackbyadjustingthestackpointerregister.
The6bitimmediatevalueismultipliedby16,sign-extended,andaddedtothestackpointerregister(x2).
Itisassumedthatthestackpointerisalwaysquadwordaligned,i.e.,amultipleof16.SincetheImmed-6valueisscaledby16,thisgivesaneffectiveadjustmentvalueof-512..+496,inmultiplesof16.
Avalueofzeroisnotallowed,sinceadjustingthestackpointerbyzeroispointless.
Encoding:011V00010VVVVV01
WhereVVVVVVdenotestheImmed-6value,andcannotbe000000.Availability:
Requires“C”(CompressedInstruction)extension.
RISC-VArchitectureSummary/Porter Page� of� 181 323
Chapter6:CompressedInstructions
Equivalent32BitInstruction:ADDI x2,x2,Value
C.ADDI4SPN-AddImmediatetoStackPointer
GeneralForm:C.ADDI14SPN RegD,Immed-8
Description:Thisinstructionisusedtoloadtheaddressofalocalstackvariableintoaregister.Weassumethevariableislocatedwiththestackframe,thatthestackgrowsdownward,andthatthestackpointerpointstothelowestaddress.Therefore,alllocalvariables(locatedwithinthestackframe)willbeaccessedwithpositiveoffsetsfromregisterx2(thestackpointer).
The8bitimmediatevalueismultipliedby4,zero-extended,andaddedtothestackpointerregister(x2),andmovedintothedestinationregisterRegD.
SincetheImmed-8valueisscaledby4,thisgivesanoffsetintherange-512..+508,inmultiplesof4.
Thedestinationregisterislimitedtox8,x9,…x15.
Avalueofzeroisnotallowed,sinceaMVinstructioncanbeusedinstead.Encoding:
000VVVVVVVVDDD00 WhereVVVVVVVVdenotesImmed-8,andcannotbe00000000. WhereDDDdenotesRegD.Availability:
Requires“C”(CompressedInstruction)extension.AvailableonlyforRV32andRV64.
Equivalent32BitInstruction:ADDI RegD,x2,Value
C.SLLI-ShiftLeftLogical(Immediate)
GeneralForm:C.SLLI RegD,Immed-6
RISC-VArchitectureSummary/Porter Page� of� 182 323
Chapter6:CompressedInstructions
Description:ThisinstructionshiftsthevalueinaregisterRegDleftbyanamountspeci2iedbytheImmed-62ield.
ForRV32(32-bitarchitectures),theshiftamountmustbe1..31(i.e.,000001-011111).
ForRV64(64-bitarchitectures),theshiftamountmustbe1..63(i.e.,000001-111111).
ForRV128(128-bitarchitectures),theshiftamountmaybe1..64.(Shiftamountsof1..63areencodedas000001-111111;ashiftamount64isencodedas000000.)
Thedestinationregistercanbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.
Encoding:000VDDDDDVVVVV10
WhereVVVVVVdenotesImmed-6.Seerestrictionsinthedescription. WhereDDDDDdenotesRegD,andcannotbex0=00000.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
SLLI RegD,RegD,Value
C.SRLI-ShiftRightLogical(Immediate)
GeneralForm:C.SRLI RegD,Immed-6
Description:ThisinstructionshiftsthevalueinaregisterRegDrightbyanamountspeci2iedbytheImmed-62ield.Thisisa“logicalshift,i.e.,zerosareshiftedinfromtheleft.
ForRV32(32-bitarchitectures),theshiftamountmustbe1..31(i.e.,000001-011111).
ForRV64(64-bitarchitectures),theshiftamountmustbe1..63(i.e.,000001-111111).
RISC-VArchitectureSummary/Porter Page� of� 183 323
Chapter6:CompressedInstructions
ForRV128(128-bitarchitectures),thepossibleshiftamountsare1..31,64,and96-127.[Myreadingofthespecsuggeststhattheencodingoftheshiftamountisthis:000001-011111means1-31.000000means64.100000-111111isinterpretedasasignedvalueintherange-32..-1,whichwecancallX.Theshiftamountisrightby128-Xbits.Thisgivesarangeof96..127,resp.]
Thedestinationregistercanbex8,x9,…x15.Encoding:
100V00DDDVVVVV01 WhereVVVVVVdenotesImmed-6.Seerestrictionsinthedescription. WhereDDDdenotesRegD.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
SRLI RegD,RegD,Value
C.SRAI-ShiftRightArithmetic(Immediate)
GeneralForm:C.SRAI RegD,Immed-6
Description:ThisinstructionshiftsthevalueinaregisterRegDrightbyanamountspeci2iedbytheImmed-62ield.Thisisa“arithmeticshift,i.e.,thesignbitisrepeatedlyshiftedinfromtheleft.
ForRV32(32-bitarchitectures),theshiftamountmustbe1..31(i.e.,000001-011111).
ForRV64(64-bitarchitectures),theshiftamountmustbe1..63(i.e.,000001-111111).
ForRV128(128-bitarchitectures),thepossibleshiftamountsare1..31,64,and96-127.[Myreadingofthespecsuggeststhattheencodingoftheshiftamountisthis:000001-011111means1-31.000000means64.100000-111111isinterpretedasasignedvalueintherange-32..-1,whichwecancallX.Theshiftamountisrightby128-Xbits.Thisgivesarangeof96..127,resp.]
RISC-VArchitectureSummary/Porter Page� of� 184 323
Chapter6:CompressedInstructions
Thedestinationregistercanbex8,x9,…x15.Encoding:
100V01DDDVVVVV01 WhereVVVVVVdenotesImmed-6.Seerestrictionsinthedescription. WhereDDDdenotesRegD.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
SRAI RegD,RegD,Value
C.ANDI-LogicalAND(Immediate)
GeneralForm:C.ANDI RegD,Immed-6
Description:ThisinstructionperformsthelogicalANDoperationwiththevalueinaregisterandwritestheresulttothatsameregister.
TheImmed-6valueissign-extended.
TheregisterRegDcanbex8,x9,…x15.Encoding:
100V10DDDVVVVV01 WhereVVVVVVdenotesImmed-6. WhereDDDdenotesRegD.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
ANDI RegD,RegD,Value
C.MV-MoveRegistertoRegister
GeneralForm:C.MV RegD,Reg2
Description:ThisinstructionmovesavaluefromReg2toRegD.
RISC-VArchitectureSummary/Porter Page� of� 185 323
Chapter6:CompressedInstructions
TheregistersRegDandReg2canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.
Encoding:1000DDDDDbbbbb10
WhereDDDDDdenotesRegD. WherebbbbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
ADD RegD,x0,Reg2
C.ADD-AddRegistertoRegister
GeneralForm:C.ADD RegD,Reg2
Description:ThisinstructionaddsthevaluesinRegDandReg2andplacestheresultinRegD.
TheregistersRegDandReg2canbex1,x2,…x31,i.e.,anygeneralpurposeregisterexceptx0.
Encoding:1001DDDDDbbbbb10
WhereDDDDDdenotesRegD. WherebbbbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
ADD RegD,RegD,Reg2
C.AND-AddRegistertoRegister
GeneralForm:C.AND RegD,Reg2
Description:ThisinstructionlogicallyANDsthevaluesinRegDandReg2andplacestheresultinRegD.
RISC-VArchitectureSummary/Porter Page� of� 186 323
Chapter6:CompressedInstructions
TheregistersRegDandReg2canbex8,x9,…x15.Encoding:
100011DDD11bbb01 WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
AND RegD,RegD,Reg2
C.OR-AddRegistertoRegister
GeneralForm:C.OR RegD,Reg2
Description:ThisinstructionlogicallyORsthevaluesinRegDandReg2andplacestheresultinRegD.
TheregistersRegDandReg2canbex8,x9,…x15.Encoding:
100011DDD10bbb01 WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
OR RegD,RegD,Reg2
C.XOR-AddRegistertoRegister
GeneralForm:C.XOR RegD,Reg2
Description:ThisinstructionlogicallyXORsthevaluesinRegDandReg2andplacestheresultinRegD.
TheregistersRegDandReg2canbex8,x9,…x15.
RISC-VArchitectureSummary/Porter Page� of� 187 323
Chapter6:CompressedInstructions
Encoding:100011DDD01bbb01
WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
XOR RegD,RegD,Reg2
C.SUB-AddRegistertoRegister
GeneralForm:C.SUB RegD,Reg2
Description:ThisinstructionsubtractsthevalueinReg2fromthevalueinRegDandplacestheresultinRegD.
TheregistersRegDandReg2canbex8,x9,…x15.Encoding:
100011DDD00bbb01 WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.Equivalent32BitInstruction:
SUB RegD,RegD,Reg2
C.ADDW-AddRegistertoRegister
GeneralForm:C.ADDW RegD,Reg2
Description:ThisinstructionaddsthevalueinReg2tothevalueinRegDandplacestheresultinRegD.Theresultvalueistruncatedto32bitsandthensignextendedtothefulllengthoftheregister.
TheregistersRegDandReg2canbex8,x9,…x15.
RISC-VArchitectureSummary/Porter Page� of� 188 323
Chapter6:CompressedInstructions
Encoding:100111DDD01bbb01
WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.AvailableforRV64andRV128only.
Equivalent32BitInstruction:ADDW RegD,RegD,Reg2
C.SUBW-AddRegistertoRegister
GeneralForm:C.SUBW RegD,Reg2
Description:ThisinstructionsubtractsthevalueinReg2fromthevalueinRegDandplacestheresultinRegD.Theresultvalueistruncatedto32bitsandthensignextendedtothefulllengthoftheregister.
TheregistersRegDandReg2canbex8,x9,…x15.Encoding:
100111DDD00bbb01 WhereDDDdenotesRegD. WherebbbdenotesReg2.Availability:
Requires“C”(CompressedInstruction)extension.AvailableforRV64andRV128only.
Equivalent32BitInstruction:SUBW RegD,RegD,Reg2
MiscellaneousInstructions
C.NOP-NopInstruction
GeneralForm:C.NOP
RISC-VArchitectureSummary/Porter Page� of� 189 323
Chapter6:CompressedInstructions
Description:Thisinstructiondoesnothing.TheencodingcanbeviewedasaspecialcaseoftheC.ADDIinstructionwherethedestinationisx0andtheimmediatevalueis0.
Encoding:0000000000000001
Availability:Requires“C”(CompressedInstruction)extension.
Equivalent32BitInstruction:ADDI x0,x0,0
C.EBREAK-DebuggingBreakInstruction
GeneralForm:C.EBREAK
Description:Thisinstructionisusedbydebuggers.Theideaisthatadebuggerwilltemporarilyreplacesomeinstructionwiththisinstruction.Whenexecuted,anexceptionwilloccur,allowingthedebuggertogetcontrol.
Encoding:1001000000000010
Availability:Requires“C”(CompressedInstruction)extension.
Equivalent32BitInstruction:EBREAK
C.ILLEGAL-IllegalInstruction
GeneralForm:C.ILLEGAL
Itisdoubtfulthatmostassemblerswouldrecognizetheopcode“C.ILLEGAL”,butweincludeitanyway.
Description:Thispatternisde2inedasbeingillegal.Anyattempttoexecuteitwillcausean“illegalinstruction”exception.Allzeroswasintentionallychosensothatanyattempttobranchintounpopulatedordysfunctionalmemory(whichoftenreadsaszeros)willresultinanimmediatetrap.
RISC-VArchitectureSummary/Porter Page� of� 190 323
Chapter6:CompressedInstructions
Encoding:0000000000000000
Availability:Requires“C”(CompressedInstruction)extension.
Equivalent32BitInstruction:ILLEGAL
InstructionEncoding
Thefollowingtablelistsallthecompressedinstructionsandisorderedinanattempttoshowthecompletecoverageofthe16bitinstructionencodingspace.
SomebitpatternsarereservedbytheRISC-Vspecforfutureuse;suchpatternsaremarked<Reserved>andshouldnotbeusedbyimplementors.
Somebitpatternsareunassignedandavailableforspeci2icimplementationstode2ineastheywish;suchpatternsaremarked<NSE>(NonstandardExtension).
Somebitpatternsaremarkedas<Hint>;implementorsarefreetousethesebitpatternstoencodehintstotheprocessortotheimproveexecutionspeed.Inallcases,thepatternsmarked<Hint>areequivalenttoinstructionsthatinvolvemodifyingaregisterandthatspeci2icallydisallowtheuseofregisterx0asthedestinationregister.Thus,implementorsarefreetoignorethe<Hint>patternsandexecutetheminthenaturalway,bycomputingaresultbutsendingthatvaluestothezeroregisterx0,causingtheseinstructionstobehaveasnopinstructions.
Recallthattheleastsigni2icant2bitsareusedtodistinguish16-bitinstructionsfromlongerinstructions.Iftheleastsigni2icant2bitsare11,thentheinstructionisnota16bitcompressedinstruction.
Thistableismore-or-lesssortedonleastsigni2icant2bits(00,01,10),followedbythemostsigni2icantbitsinorder.
C.ILLEGAL 0000000000000000
<Reserved> 00000000000DDD00 RegD≠0
C.ADDI4SPN 000VVVVVVVVDDD00 Value≠0
RISC-VArchitectureSummary/Porter Page� of� 191 323
Chapter6:CompressedInstructions
C.FLD 001VVVaaaVVDDD00 RV32andRV64only
C.LQ 001VVVaaaVVDDD00 RV128only
C.LW 010VVVaaaVVDDD00
C.FLW 011VVVaaaVVDDD00 RV32only
C.LD 011VVVaaaVVDDD00 RV64andRV128only
<Reserved> 100XXXXXXXXXXX00
C.FSD 101VVVaaaVVbbb00 RV32andRV64only
C.SQ 101VVVaaaVVbbb00 RV128only
C.SW 110VVVaaaVVbbb00
C.FSW 111VVVaaaVVbbb00 RV32only
C.SD 111VVVaaaVVbbb00 RV64andRV128only
C.ADDI 000VDDDDDVVVVV01 RegD≠0;Value≠0
<Reserved> 0000DDDDD0000001 RegD≠0;Value=0
<Hint> 000V00000VVVVV01 RegD=0;Value≠0
C.NOP 0000000000000001 RegD=0;Value=0
C.JAL 001VVVVVVVVVVV01 RV32only
C.ADDIW 001VDDDDDVVVVV01 Rv64andRV128only;RegD≠0
<Hint> 001V00000VVVVV01 Rv64andRV128only;RegD=0
C.LI 010VDDDDDVVVVV01 RegD≠0
<Hint> 010VDDDDDVVVVV01 RegD=0
C.LUI 011VDDDDDVVVVV01 RegD≠0,2;Value≠0
<Reserved> 011VDDDDDVVVVV01 RegD≠0,2;Value=0
<Hint> 011V00000VVVVV01 RegD=0
C.ADDI16SP 011V00010VVVVV01 RegD=2;Value≠0
<Reserved> 011V00010VVVVV01 RegD=2,Value=0
C.SRLI 100000DDDVVVVV01 RV32;V≠0
<Hint> 100000DDD0000001 RV32;V=0
<NSE> 100100DDDVVVVV01 RV32;
RISC-VArchitectureSummary/Porter Page� of� 192 323
Chapter6:CompressedInstructions
C.SRLI 100V00DDDVVVVV01 RV64;V≠0
<Hint> 100000DDD0000001 RV64;V=0
C.SRLI 100V00DDDVVVVV01 RV128only;V≠0
C.SRLI64 100000DDD0000001 RV128only;V=0
C.SRAI 100001DDDVVVVV01 RV32;V≠0
<Hint> 100001DDD0000001 RV32;V=0
<NSE> 100101DDDVVVVV01 RV32;
C.SRAI 100V01DDDVVVVV01 RV64;V≠0
<Hint> 100001DDD0000001 RV64;V=0
C.SRAI 100V01DDDVVVVV01 RV128only;V≠0
C.SRAI64 100001DDD0000001 RV128only;V=0
C.SUB 100011DDD00bbb01
C.XOR 100011DDD01bbb01
C.OR 100011DDD10bbb01
C.AND 100011DDD11bbb01
C.SUBW 100111DDD00bbb01 RV64andRV128only
<Reserved> 100111DDD00bbb01 RV32only
C.ADDW 100111DDD01bbb01 RV64andRV128only
<Reserved> 100111DDD01bbb01 RV32only
<Reserved> 100111XXX10XXX01
<Reserved> 100111XXX11XXX01
C.ANDI 100V10DDDVVVVV01
C.J 101VVVVVVVVVVV01
C.BEQZ 110VVVaaaVVVVV01
C.BNEZ 111VVVaaaVVVVV01
C.SLLI 0000DDDDDVVVVV10 RV32only;RegD≠0;V≠0
<Hint> 0000DDDDD0000010 RV32only;RegD≠0;V=0
<NSE> 0001DDDDDVVVVV10 RV32only;RegD≠0
RISC-VArchitectureSummary/Porter Page� of� 193 323
Chapter6:CompressedInstructions
<Hint> 000V00000VVVVV10 RV32only;RegD=0
C.SLLI 000VDDDDDVVVVV10 RV64only;RegD≠0;V≠0
<Hint> 0000DDDDD0000010 RV64only;RegD≠0;V=0
<Hint> 000V00000VVVVV10 RV64only;RegD=0
C.SLLI 000VDDDDDVVVVV10 RV128only;RegD≠0;V≠0
C.SLLI64 0000DDDDD0000010 RV128only;RegD≠0;V=0
<Hint> 000V00000VVVVV10 RV128only;RegD=0
C.FLDSP 001VDDDDDVVVVV10 RV32andRV64only
C.LQSP 001VDDDDDVVVVV10 RV128only;RegD≠0
<Reserved> 001VDDDDDVVVVV10 RV128only;RegD=0
C.LWSP 010VDDDDDVVVVV10 RegD≠0
<Reserved> 010VDDDDDVVVVV10 RegD=0
C.LDSP 011VDDDDDVVVVV10 RV64andRV128only;RegD≠0
<Reserved> 011VDDDDDVVVVV10 RV64andRV128only;RegD=0
C.FLWSP 011VDDDDDVVVVV10 RV32only
C.JR 1000aaaaa0000010 RegA≠0
<Reserved> 1000aaaaa0000010 RegA=0
C.MV 1000DDDDDbbbbb10 RegB≠0;RegD≠0
<Hint> 1000DDDDDbbbbb10 RegB≠0;RegD=0
C.EBREAK 1001000000000010
C.JALR 1001aaaaa0000010 RegA≠0
C.ADD 1001DDDDDbbbbb10 RegB≠0;RegD≠0
<Reserved> 1001DDDDDbbbbb10 RegB≠0;RegD=0
C.FSDSP 101VVVVVVbbbbb10 RV32andRV64only
C.SQSP 101VVVVVVbbbbb10 RV128only
C.SWSP 110VVVVVVbbbbb10
C.FSWSP 111VVVVVVbbbbb10 RV32only
C.SDSP 111VVVVVVbbbbb10 RV64andRV128only
RISC-VArchitectureSummary/Porter Page� of� 194 323
Chapter7:ConcurrencyandAtomicInstructions
HardwareThreads(HARTs)
Asimpleprocessorhasasingle2low-of-control.Thatis,itoperatesasasingle“thread”,withaFETCH-DECODE-EXECUTEloop.Instructionsareexecutedone-after-the-other.(Inthecaseofpipelining,severalinstructionsmaybeinsomestageofexecutionatanymoment,buttheinstructionsarecompleted(or“retired”)one-after-the-other,andthepipelinedbehaviorisexactlythesameasifeachinstructionhadbeenexecutedalonetocompletionintheordertheyappearinthecode.)
Normally,anoperatingsystemwillmultiplextheprocessor,therebyprovidingmanythreads,viatimeslicing.Forclarity,wecandistinguishbetween“hardwarethread”and“softwarethreads”.TheOScanmultiplexasinglehardwarethreadtoprovidemultiplesoftwarethreads.
Inamulti-coreormultiprocessorsystem,eachcoreisrunningitsownindependentFETCH-DECODE-EXECUTEcycle.Thus,therewillbeanumberofhardwarethreadsequaltothenumberofcores.HowtheOSimplementssoftwarethreadsontopoftheavailablehardwarethreadsisacomplextopicwhichconcernsthekerneldesignandorganization.
Ina“superscalar”core,thereismorethanoneinstructionstream.Thereisa2ixednumber(suchas2or4)ofindependentinstructionstreams.Forexample,someparticularsuperscalarcoremightexecutetwoindependentinstructionstreamssimultaneously.Suchacoreisprovidingtwohardwarethreads.Eachstreamhasitsownindependentsetofregistersanditsownprogramcounter,more-or-lessasiftheywereseparatecoresinamultiprocessorsystem.
Onebene2itofthesuperscalarorganizationisthatexpensiveandrarelyusedcircuits(e.g.,2loatingpointdivide)canbeprovidedonlyinasinglecopy,andsharedbybothhardwarethreads.Also,theremaybemultipleinstantiationsofmorecriticalcircuits
RISC-VArchitectureSummary/Porter Page� of�195 323
Chapter7:ConcurrencyandAtomicInstructions
(e.g.,integeraddition)whichareallocateddynamicallytothoseinstructionsthatneedthem.
Withsuperscalarorganization,itispossiblethatmorethanoneinstructioncanberetiredperclockcycle,providinggreaterperformance.
De=initionof“HART”:Inthediscussionofconcurrencycontrolandsynchronization,theRISC-Vdocumentationusestheterm“HART”torefertoahardwarethread.Thehardwarethreads(HARTs)mightbeimplementedondistinctprocessorswithinamultiprocessorsystem,orbymultiplecoresinasingleprocessor,oronasinglesuperscalarcore.TheRISC-Vspeccoversallpossibilities,includinghybridcombinations.
TheRISC-VMemoryModel
It’sacommonlyacceptedprincipleinprocessordesignthat,withinasingleinstructionstream,datadependenciesarerespected.Inotherwords,theprocessorexecutesinstructionsinorderandwillnotreorderinstructions.
Forexample,aninstructionthatloadsaregisterwithavaluefrommemoryandaninstructionthatstoresthecontentsofthatsameregisterinmemoryshouldneverbereordered.Assemblylanguageprogrammersarefreetoassumethateachinstructionisexecutedfrombeginningtocompletionintheorderitwasspeci2ied.Anyoptimizationsorreorderingdonebythehardwaremustpreservethisconstraint.Forexample,twoinstructionsthataccessdifferentregistersanddifferentmemoryaddressescansafelybereorderedorexecutedsimultaneouslytoachievefasterexecution.
However,betweendifferenthardwarethreads,theorderinwhichoperationsexecutemaybeindeterminate.Forexample,considertwoindependentprocessorsinamultiprocessorsystemwithsharedmemory.Imaginetwoinstructionswhichbothstoreintoasinglememorylocation;theymayexecuteinarbitraryorder,resultinginadifferentvaluebeingstoredlast.
Hereisonecommonrequirementforpropersynchronizationinasystemofcooperatingprocesses.(By“cooperatingprocesses”,wemeanasystemwithmultiplethreadsthataredesignedtoworktogetherconcurrentlyexecutingto
RISC-VArchitectureSummary/Porter Page� of� 196 323
Chapter7:ConcurrencyandAtomicInstructions
completesomegivenprogrammingtaskcorrectly,regardlessofthevagariesofindeterminateexecution.)
SynchronizationofReadersandWriters:Withapotentiallysharedpieceofdata,awritermustwaittowritethedatauntilallpreviousreadersare2inishedreadingtheoldvalue.Likewise,areadermustwaituntilanypreviouswriteris2inishedwritingthenewvalue.
TheRISC-Vspecassumesthatreadsandwritesissuedbyasinglehardwarethread(HART),mustbevisibletothatsameHARTintheorderexecuted.Thatis,thecore’sactualorderofexecutionofaHART’sinstructionsmustbeindistinguishablefromthesequentialorderinwhichtheprogrammerwrotetheinstructions.Perhapsthecorewillreorderinstructionstoimproveperformance,butanyreorderingmustbe“safe”inthesenseofhavingthesameeffectasbeingissuedintheoriginalorder.
Thisassumptionissomethingthateveryprogrammertakesforgranted:thecomputerwillexecutetheircodeinstructionsintheordertheprogrammerwrotethem.Forsinglethreadsexecutinginisolation,thereisnothingmuchmoretotalkabout.Butthingsquicklygetcomplicatedwhentherearemultiplethreads.
AsfarastheapparentorderoftheoperationsasviewedbyotherHARTs,theRISC-Vspecdoesnotassumeanything.Infact,theRISC-Vspecallowsthefollowing:
ItmayappeartoasecondHARTthattheinstructionsexecutedbytheZirstHARTaredoneoutoforder.
SoitispossiblethattwodistinctHARTswillseethesamesetofoperationshappeninginadifferentorder.ThisisaveryrealpossibilityonsomesystemswhentheoperationsinquestionarereadsandwritestomemoryandtheindividualHARTsareoperatingwithprivatecaches.
Forexample,considerthefollowingsequenceofinstructionsissuedbyHARTAtomemorylocationX.
WriteX!4 ReadX WriteX!5 ReadX
RISC-VArchitectureSummary/Porter Page� of� 197 323
Chapter7:ConcurrencyandAtomicInstructions
The2irstreadwillreturn4andthesecondreadwillreturn5.HoweverHARTBmightexecutetheseinstructions:
ReadX ReadX
The2irstreadmightreturn5andthesecondreadmightreturn4.ItappearstoHARTBthatthewriteswereexecutedoutoforder,probablyduetoinconsistenciesbetweentheircaches.
ConcurrencyControl-ReviewandBackground
Thissectionisnotspeci2ictoRISC-Vandcanbesafelyskippedifyouarefamiliarwithconcurrencycontrol.
Inanysystemwithmultiplethreads,someformofsynchronizationandconcurrencycontrolisrequired.Withoutit,thebehaviorofsomecodesequencesmaybenondeterministicandprogramerrorsmayresult.Forexample,eachofseveralthreadsmayneedtoperiodicallyqueryandupdateasharedpieceofdata.Withoutanycontrol,twoormorethreadsmaybetryingtoreadandupdatetheshareddatasimultaneously;therearisesapossibilitythatsomethreadsmayseeaninconsistentstateofthedataorthatsomeupdatesmaybelostorthatseveralconcurrentupdateswillresultintheshareddatabeingputintoainvalidorinconsistentstate.
Inonecommonapproachtoconcurrencycontrol—particularlyamongsoftwarethreads—a“lock”iscreatedtoprotecttheshareddata.Anyblockofcodeaccessingtheshareddatamust2irst“acquire”(or“obtain”or“set”)thelockbeforeaccessingtheshareddata,andmust“release”(i.e.,“free”,“clear”)thelockaftertheaccessiscomplete.Asequenceofcodewhichaccessestheshareddataiscalleda“criticalsection”andtheprogrammermustremembertoacquirethelockatthebeginningofthecriticalsectionandreleasethelockattheendofthecriticalsection.
Operatingsystemstypicallyprovide“locks”,alongwith“acquire”and“release”operations.Implementingtheseoperationsonasingleprocessorsystemissimple:theOSmerelyhastoavoidathread-switchoccurringduringtheacquireandreleaseoperationsthemselves.Thiscanbedoneeasilyandef2icientlybydisablinginterruptsforthedurationoftheacquire/releaseoperations.
RISC-VArchitectureSummary/Porter Page� of� 198 323
Chapter7:ConcurrencyandAtomicInstructions
Thingsaresigni2icantlymorecomplexformulti-coresystemswithsharedmemory.Nowthereis“multiprocessing”(trueconcurrency)andnotjust“multithreading”(simulatedconcurrencyusingtime-slicing).Thelockitselfbecomesshareddataandaccesstothelockineffectbecomesacriticalsectionofitsown.Blockinginterruptsononeprocessorisnolongeradequate.Thereexistalgorithmstosolvethisproblempurelyinsoftware(Peterson’salgorithm;Dekker’salgorithm)butarchitecturalsupportintheISAisrequiredforperformancereasons.
Inthepasttherehavebeenacoupleofinstructionsproposedtosupportthe“acquire”and“release”operationsforlocks.Onesuchinstructionwasthe“test-and-set”instruction.Thisinstructionincludestwooperands:anaddressandaregister.Theinstructionreadsfrommemoryattheaddressgivenandstorestheresultintotheregister.Thentheinstructionsstoresa2ixedknownvalue(usually“1”)intothesamememorylocation.Boththeread-memoryandthewrite-memoryoperationarespeci2iedtobe“atomic”,whichmeansthatthereisaguaranteethatnootherinstructionwilloccurbetweentheread-memoryandwrite-memoryfromanyothersource(e.g.,othercoresorI/Odevices).
Historically,hardwarecanenforcetheatomicityguaranteebysomehowblockingallaccessestoallmemorylocationsforallothercoresbetweenthereadandwritephasesofthetest-and-setinstruction.Thiswillslowotherpartsofthesystemdownalittle,butifthenumberofcoresissmall,it’snotmuchofaburden.
Thetest-and-setinstructioncanbeusedtoimplementalockverysimply.Here’show:thelockisrepresentedasasinglememorylocation,with“0”meaning“notlocked”and“1”meaning“locked”.The“acquire”operationconsistsofexecutingthetest-and-setinstruction.Aftertheinstruction,thelockwillbeinthe“set”stateandwillcontain“1”.The“acquire”operationthenexaminesthepreviouslockvalue,whichwasretrievedfrommemoryandstoredinaregister.Ifitwas“0”,thenallisgood:thelockwaspreviouslyfree.Ithasnowbeenacquiredandthecriticalsectioncanbeentered.However,ifthepreviousvaluewas“1”,thenthelockwasapparentlyalreadyset.Thelockhasnotbeenacquired.Theacquirecodemusttryagain,eitherimmediately,orafterawait,orafterthreadrescheduling.The“release”operationisimplementedbysimplystoringa“0”intothememorylocationrepresentingthelock.
AnotherapproachtoISAdesignistoprovidea“swap”instruction,insteadofthetest-and-setinstruction.Likethetest-and-setinstruction,theswapinstructionhasaread-memoryphaseandawrite-memoryphase.Theinstructionistakestwooperands:aregisterandamemoryaddress.Theinstructionswapsthevalues;aftertheinstructiontheregistercontainsthepreviousvaluefromthememorywordand
RISC-VArchitectureSummary/Porter Page� of� 199 323
Chapter7:ConcurrencyandAtomicInstructions
thememorywordcontainsthepreviousvalueintheregister.Theswapinstructionisageneralizationofthetest-and-setinstruction:byplacinga“1”intheregisterbeforetheswapinstructionisexecuted,theeffectwillbeidenticaltoatest-and-setinstruction.
Anotherapproachisthe“compare-and-swap”(CAS)instruction,whichisusedintheX-86architecture.TheCASinstructiontakesfouroperands:amemoryaddress,aregistercontainingan“expectedvalue”,andaregistercontaininga“newvalue”,andaregisterinwhichtostoreareturncode.Theinstructiondoesseveralthingsatomically,i.e.,withoutthepossibilityofinterruptionorinterferencefromotherthreads.First,theinstructionreadsavaluefrommemoryandcomparesittothe“expectedvalue.”Ifthetwoaredifferent,theinstructionfailsandreturnsacodetoindicatefailure.Butifthetwovaluesareequal,theinstructionstoresthe“newvalue”intothememorylocationandreturnsacodeindicatingsuccess.
Theproblemwithalltheseinstructionsisthateachrequirestwomemoryoperations.ThisviolatesthegeneralRISCprincipleofkeepingallinstructionssimple,minimal,andfast.Twomemoryoperationsinasingleinstructionsistoomuchandcomplicatesthecircuitry.
Furthermore,theseinstructionscanslowothercoresdownsincetheymustbeexecutedatomicallyandpresumablytheyworkbyfreezinguptheentirememorysystemduringtheirexecution.
Theproblemswithatomicallyexecutingbothareadandwritetogethergetworsewithmorecores.Theseproblemsalsogetworsewithmorefrequentlockingoperations.Tosupportincreasedconcurrency,it’sagoodideatohave2iner-grainedlocking,i.e.,tohavemorelocks,havingeachlockprotectlessdata.Finer-grainedlockingimpliesmore“acquire”and“release”operations,evenifthelocksareusuallyfoundtobefree.Theproblemsalsogetworsewithincreasedsharingofdata,ageneralizedtrendweareseeinginmanycontexts.
AReviewofCachingConcepts
Thissectionisnotspeci2ictoRISC-Vandcanbesafelyskippedifyouarefamiliarwithcaching.
RISC-VArchitectureSummary/Porter Page� of� 200 323
Chapter7:ConcurrencyandAtomicInstructions
Theproblemofsynchronizationbetweenmultipleprocessors(e.g.,coresorHARTs)getsmuchmorecomplexinthepresenceofcachememories.Inshort,theproblemstemsfromthefactthatasinglelocationinmemorycanhavemultiplevaluesatonce,becausethedatacanbepresentinseveralcacheswithinconsistentvalues.Let’sreviewthebasicideasbehindcachememoriesbeforewediscusstheRISC-Vinstructions.
Inmanycomputers,thebandwidthbetweenmainmemoryandtheprocessorcoreisaperformancebottleneck.Toincreaseperformance,a“cache”memoryisplacedbetweenmainmemoryandthecore.
Wheneverthecoretriestofetchdatafrommemory,thecacheischeckedand,ifitcontainsthedata,thedatacanbesenttothecoreimmediately.Otherwise,thedatamustbefetchedfrommainmemory,whichtakesmoretime.Thedatawillbesenttothecoreaswellasbeingstoredincachememorytospeedfuturerequestsforthesamedata.
Whendataiswrittenfromthecoretomemory,thedatawillsometimesbewrittentothecachememory,whichspeedsfutureaccessesifthecorewantstoreadthedataagainsoon(“writeallocate”).Somesystemsavoidcachingtheupdateddata(“no-write-allocate”).Ineithercase,theupdateddatamustbecopiedtomainmemoryatsomepoint.Thewritetomainmemorymayoccurimmediately,atthetimethecore2irstwritesthedatatocache,andthisiscalled“write-through”.Alternatively,writetomainmemorycanbedelayeduntillater.Atsomelatertime,whenspaceinthecacheisneededforsomeotherdata,theupdatedmodi2ieddatavaluewill2inallybewrittentomainmemory.Thispolicyiscalled“write-back”.
Cachesdonotoperateonbytesindividually;insteadtheystoredatainunitsof“cacheblock”.Acachelineisablockofcontiguousdatabytes(typically64bytes)whicharecopiedtogetherasaunitinandoutofthecache.Inarequestforasinglebyteorhalfword(forexample),theentirecacheblockcontainingthedesiredbyteswilleitherbeinthecache(a“cachehit”)ornot(a“cachemiss”).
Thetypicalalignmentrequirementsensurethatamultipledataunit(suchasahalfword)willnevercrossacacheblockboundary;thedatawilleitherbeentirelyinoneblockorentirelyinanotherblock.Misaligneddatawilloccasionallycrosscacheblockboundariesandtheprocessofreadingintoorwritingfromthecoretothememorysystemwillbecomplicatedandgenerallyrequireabouttwiceasmuchtime.
RISC-VArchitectureSummary/Porter Page� of� 201 323
Chapter7:ConcurrencyandAtomicInstructions
Cachesareoftenorganizedintolevels.Forexample,athree-levelcachemightconsistof:
L1 Closesttothecore.Fastest,smallestincapacity. L2 Intermediate L3 Closesttomainmemory.Slowest,largestincapacity.
Inasystemwithmultiplecoresonasinglechip,eachindividualcoremightpossessitsownprivateL1andL2caches,whileallcoreswillsharealargerL3cache.
Therearedifferencesbetweeninstructionandregulardata.Inparticular,instructionsarealwaysread-only.Sincethedataisnevermodi2iedbythecore,thereisnoneedtoprovidehardwaretoimplementthewrite-allocate/write-no-allocateorwrite-back/write-throughpolicies.
Asatypicalexample,eachcorewillhavetwoL1caches;onefordataandoneforinstructions.Theterm“uni=iedcache”meansthatthecacheholdsbothinstructionsanddata.Typicallytheslower,largercache(e.g.,L3)isuni2ied.
InmanyISAs,includingRISC-V,allaccesstoI/Odevicesis“memory-mappedI/O”whichmeansthattheI/Odevicesareassignedaddressesinthephysicalmemorymap.TosenddataorcommandstoanI/Odevice,thecorewillissue“store”instructionsandtoretrievedataorstatusinformationfromanI/Odevice,thecorewillexecute“load”instructions.
ManyI/Odevicesalsoaccessmainmemorydirectly.Forexample,thecoremightmovedataintomemoryandthenissuecommandstoanI/Odevicewhichsubsequentlycausethedevicetoretrievethesamedatadirectlyfrommemory.
Thus,themainmemorysystemmayhaveseveral“ports”allowingdatatobereadandwrittenseparatelyby:
•Differentcoresonasinglechip •Differentprocessorchips •VariousI/Odevices
Amoderncomputersystemwillconsistofanumberofbussesandanumberofdifferentdevicessittingoneachbus.
RISC-VArchitectureSummary/Porter Page� of� 202 323
Chapter7:ConcurrencyandAtomicInstructions
Whichallthiscomplexity,thereisaproblemofsynchronizationanddataconsistency.Forexample,acoremaywriteabyteofdatatomemoryandexpectsomeI/Odeviceorothercoretofetchthatbyte.Inthepresenceofcaches,theremaybeseveraldifferentvaluesforthesamememoryaddressatthesameinstant.Withoutfurthercontrol,a“reader”maynotseethevaluewrittenbya“writer”eventhoughthewriteoccursearlierintime.
Notethatthedataconsistencyproblemdoesnotoccurforread-onlydata.Ifaparticularbyteofdataneverchanges,thenthereisnotaproblem.Perhapsthedataiscachedorperhapsitmustbefetchedfrommainmemory.Butsincethereisonlyonevalue,thereisneverapossibilitythatthecachecontainsanoutdatedvalue.Asynchronizationproblemcanonlyoccurwhendataisupdated.
Notethatalldataiswrittenatsomepoint,exceptperhapsdatastoredinRead-OnlyMemory(ROM).Wemustassumethatwhenanycomputerispoweredup,thecontentsofdynamicmemoryareunde2ined;evenread-onlydatamustinvolveaninitialwritetothememorysocaremustbetakentoavoidreadingtheinitialunde2ineddata.
Thesynchronizationproblemoccursbecausemultiplevaluesforagivenmemorylocationcanbestoredinvariouscaches.Ifthecachesareeliminated,thenthedatamustnecessarilybeconsistent;butcachesyieldtremendousperformancebene2its,sosupportforsynchronizationin2luencesISAdesign.
Wecanmakeadistinctionbetweenprivatecachesandshared(i.e.,memory-sideorglobal)caches.Considerasystemwithseveralcoresandasinglesharedmainmemory.Eachcoremayhaveitsownprivatememorycache,inwhichcaseasinglebyteofmemorymaybecachedsimultaneouslyinseveraldifferentprivatecaches.Anupdatebyanysinglecoretothisbyteoughttobevisibletoallcoresbut(withoutsynchronization)othercoresmightreadoutdatedvaluesfromtheirprivatecaches.
If,insteadthereisonlyasinglesharedcachesittingbetweenmainmemoryandallcores,thenthesynchronizationproblemgoesaway.Agivenbytecanonlybecachedinoneplace,namelythesharedcache.Sinceallcoresareusingthissharedcache,anyupdatetothebytewillbere2lectedinthesharedcache,andthereforeseenbyallothercores.
Modernsystemsoftenuseamixedofprivateandsharedcaches.
Acachememoryconsistsofasetof“cachelines”.
RISC-VArchitectureSummary/Porter Page� of� 203 323
Chapter7:ConcurrencyandAtomicInstructions
Eachlineinacachecontainsacouplebitsidentifyingwhatdataiscontainedinthelineandwhetherthedatais“valid”or“invalid”.Moreprecisely,acachelinegenerallycontainsthefollowingpiecesofinformation:
•Thecachelinedata(e.g.,ablockof64databytes) •Theaddressofthedatablock •A“validbit”,totellwhetherthiscachelineismeaningful •A“dirtybit”,totellwhetherthedatahasbeenmodi2ied
Wediscussvirtualmemoryandaddresstranslationelsewhere.Butherewepointoutthateachcachelinemustcontaintheaddressofthedatastoredinthatcacheline.
Intheeasiest-to-understandapproach,eachcachelinecontainsthephysicaladdressofitsdatablock.Thecacheisan“associativememory”usingthephysicaladdressasthesearchkey(i.e.,theassociativeindex).However,becauseofvirtualmemoryaddresstranslation,thevirtualaddressisknownearlierintimethanthephysicaladdress,soitmakessensetoindexthecachelinesbyvirtualaddress.Realdesignsaremorecomplex,butforourpurposeswedonotneedtogofurtherhere.
Acrudesolutiontothecacheconsistencyproblemistoperiodicallyemptytheentirecache.
“Flushingthecache”meanswritingallupdateddatabacktomainmemoryandmarkingallcachelines“invalid”.Soasimpleapproachisto2lushallcachesafteranywrite.Flushingtheentirecacheafteranywriteisoverlyconservativeandnotthemostef2icientapproach,butitwillsolvethecacheconsistencyproblem.
Thesoftwaremightbealittlesmarterandavoidfullcache2lusheswhennotstrictlyrequired.Forexample,whenonecorewritestomainmemory,thereisaquestionofwhethercachesbelongingtoothercoresmustbe2lushed.Iftheprogrammercanbecertainthatthedatawillneverbereadbyothercores,thencallingforacache2lushcanbeavoided.
TheRISC-Vdesignersrecognizethata2iner-grainedcontrolovercachesandmemorysynchronizationisrequired,andprovidetheFENCEinstruction,whichwillbedescribedlater.
RISC-VArchitectureSummary/Porter Page� of� 204 323
Chapter7:ConcurrencyandAtomicInstructions
RISC-VConcurrencyControl
RISC-Vsupportsconcurrencycontrol,synchronization,andcacheconsistencywiththeseinstructions:
Alwaysavailable: FENCE Synchronizedatareadsandwrites FENCE.I Synchronizetheinstructioncache
Availableinthe“A”extension(for“AtomicInstructions”) LR LoadReserved SC StoreConditional AMOSWAP Atomicallyswaptwovalues AMOADD Atomicallyaddtwovalues AMOAND AtomicallyANDtwovalues AMOOR AtomicallyORtwovalues AMOXOR AtomicallyXORtwovalues AMOMAX AtomicallyMAXtwovalues AMOMIN AtomicallyMINtwovalues
UnderRevision:TheRISC-Vspecstatesthattheirmemorymodelisunderrevision,implyingthattheseinstructionsmaychange.
TheFENCEInstructions
Therearetwofenceinstructions:FENCEandFENCE.Iandtheseinstructionsarealwaysavailable.Theydonotrequirethe“A”(AtomicOperations)extension.
Thegeneralideawitha“fence”isthatalloperationsarepartitionedintotwosets:thepredecessorsetandthesuccessorset.
Thefencerequiresalloperationsinthepredecessorsettocompletebeforeanyoftheoperationsinthesuccessorsetcanproceed.InthecaseofRISC-V,thepredecessoroperationsaretheinstructionsthatmustcompletebeforethefence,andthesuccessoroperationsarethosethatmaynotbeginuntilafterthefence.
RISC-VArchitectureSummary/Porter Page� of� 205 323
Chapter7:ConcurrencyandAtomicInstructions
Asananalogy,considertheTourdeFrance,whereallthecyclistsmustcompleteeachstageoftherace,beforeanyridercanbeginthenextstage.Onlyaftertheslowestrider2inishesthepreviousstageoftheraceareanyridersallowedtostartthenextstage.Thestagesareeffectivelyseparateby“fences”.
TheFENCEinstructioninRISC-Vmayonlyaffectinstructionsthatreadorwritetothememorysystem.Itisusedtocontrolconcurrentaccessestosharedmemory.
Recallthatthephysicalmemoryaddressspacecanbepopulatedwith:
•Normal,physicalmemory(containingstoreddatabytes) •Memory-mappedI/Odevices
Ahardwarethread(HART)caneitherreadorwritevaluesto/fromthememoryspace.Thisyieldsthesecombinations:
R Read–readdatafromphysicalmemory W Output–writedatatophysicalmemory I Input–readdatafromamemory-mappedI/Odevice O Output–writedatatoamemory-mappedI/Odevice
Ofcoursemanyoperations(suchasamovementofdatafromoneregistertoanother)donottouchmemoryatall.Suchoperationsdonotfallintoanyofthesefourclassi2icationsandareignoredbytheFENCEinstruction.
ImaginethatsomeHARTexecutestwomemorywrite(“W”)operationsinsuccession.Perhapsthe2irstwriteoperationissettingalock,whichisintendedtoprotectsomepieceofsharedata.Onlyafterthelockhasbeenset,isitpermissibletoupdatetheshareddata.(Thisistheideabehindprotectingshareddatawithlocks;theshareddatashouldonlybeaccessedwhenthelockisheldbythethread.)
OfcoursetheHARTinquestionwillexecutethewritetothelockbeforethewritetothedata,butbecauseoftherelaxedmemorymodelassumedbytheRISC-Vspec,itispossiblethatotherHARTswillseethedatawriteappearingtocomebeforethelockwrite.OtherHARTscouldthereforeseetheupdateddatabeforeseeingthelockgettingset,therebyviolatingthefunctionalityandintegrityofthelockprotocol.TheFENCEinstructionprovidesawaytopreventthisdisasterandallowsprogrammerstoimplementproperconcurrencycontrolandcreatecorrect,reliablecooperatingprocesses.
RISC-VArchitectureSummary/Porter Page� of� 206 323
Chapter7:ConcurrencyandAtomicInstructions
TheFENCEinstructioncanbeusedtomakesurethatallotherHARTswillseethewriteoccurringbeforetheread.Inthisexample,wewilluseaFENCEinstructiontomakesurethatallwritestomemorythatwereexecutedbeforetheFENCEinstructionwillappeartoexecutebeforeallwritetomemorythatwillbeexecutedaftertheFENCEinstruction.
ForthepurposesoftheFENCEinstruction,weuse“P”foroperationsoccurringpriortotheFENCE(predecessoroperations)and“S”foroperationsoccurringaftertheFENCEinstruction(successoroperations).Thus,wehave:
Predecessoroperations: PR DatareadsoccurringbeforetheFENCE PW DatawritesoccurringbeforetheFENCE PI Memory-mappedreads(inputs)occurringbeforetheFENCE PO Memory-mappedwrites(output)occurringbeforetheFENCE
Successoroperations: SR DatareadsoccurringaftertheFENCE SW DatawritesoccurringaftertheFENCE SI Memory-mappedreads(inputs)occurringaftertheFENCE SO Memory-mappedwrites(output)occurringaftertheFENCE
Inourexample,wewanttoforceallwritestodatamemorythatwereissuedbeforetheFENCEtocompletebeforeallwritetodatamemorythatwillbeissuedaftertheFENCEinstruction.Thus,wecaninsertthefollowingFENCEinstructionbetweenthewritetothelockandthereadfromtheshareddata:
...writetolock…FENCE PW,SW # Complete past writes before future writes
…writetoshareddata…
FENCE
GeneralForm:FENCE PI,PO,PR,PW,SI,SO,SR,SW
Example:Thisinstructioncontains8singlebit2ields.Eachcanbeenabled(setto1)ordisabled(clearedto0).The2ieldsarecalledPI,PO,PR,PW,SI,SO,SR,SW.Thespecdoesnotindicatehowthisinstructionistobecodedinassembly.Perhapsthebitscanbesetbybeingmentionedandclearedotherwise.Forexample:
RISC-VArchitectureSummary/Porter Page� of� 207 323
Chapter7:ConcurrencyandAtomicInstructions
FENCE PR,PW,SR,SWDescription:
Seetextabove.Availability:
Thisinstructiondoesnotrequirethe“A”extension(AtomicInstructions).Itisalwaysavailable.
Encoding: ThisisavariationofanI-typeinstruction,wheretheRegDandReg22ieldsare
notusedandsettozeroandonly8oftheImmed-12bitsareused.
ImplementationofFENCE:OnesimpleimplementationofFENCEisforthehardwaretosimplyignorethedistinctionbetweenR,W,I,andOandsimplycompletealloperationsoccurringbeforetheFENCEbeforestartinganyoperationsaftertheFENCE.Inthemostconservativebutsimplestimplementation,thehardwaremightjust2lushallcaches,forcingasortofhard,system-widefenceoperation.Hopefullysomeimplementationswillprovidethe2iner-grainedcontrolpermittedbytheinstruction.
TheFLUSHinstructionisthemaintoolfor2lushingthecache,butisintendedtooperateononlydatacaches.However,instructionsmaybecachedinaseparateinstructioncacheandthereisaneedtoupdatethiscachefromtimetotime.ThisisthepurposeoftheFENCE.Iinstruction.
Wheneverbytesarewrittentomemoryandthesebytesareintendedtobesubsequentlyexecutedasinstructions,anupdatetotheinstructioncachemayberequired.Forexample,awrite(e.g.,theSTstoreinstruction)tomemorylocationXmaybeexecuted.SubsequentlythecorewillexecutetheinstructionlocatedataddressX;ofcoursethenewvalueshouldbefetched.However,iftheinstructioncachealreadycontainsthepreviouscontentsforlocationX,theremaybeaproblem.Thecorecannotsimplyusethecachedvalue.
TheeffectoftheFENCE.Iinstructionisequivalenttoinvalidatingtheentireinstructioncache,therebyforcingallfuturefetchestogotomemory.ThisguaranteesthatthenewestvalueforlocationXwillbefetched.
FENCE.I–(InstructionCacheFlushing)
GeneralForm:FENCE.I
RISC-VArchitectureSummary/Porter Page� of� 208 323
Chapter7:ConcurrencyandAtomicInstructions
Example:FENCE.I # Flush the instruction cache
Description:InvalidatetheentireinstructioncachelocaltothecurrentHART,oratleastdosomethingoperationallyequivalent.Seetext.
Availability:Thisinstructiondoesnotrequirethe“A”extension(AtomicInstructions).Itisalwaysavailable.
Encoding: ThisisavariationofanI-typeinstruction,wheretheRegD,Reg2,and
Immed-122ieldsarenotusedandsettozero.
Note:ThisinstructiononlyaffectsthecurrentHART.Inamulti-coresystem,itisprobablethatinstructionswillbecachedininstructioncachesthatareprivatetoeachcore.IfcoreAwritesdatatomemorylocationXandexpectstoexecutethatsamedata(e.g.,jumptoX),thenitshouldexecuteaFENCE.Iinstructiontomakesurethatitscachedoesn’tcontainanobsoletevalueforlocationX.
However,thiswillnothelpothercores,whichmayhavecachedolderdatafromlocationXintheirprivateinstructioncaches.Inorderto2lushtheinstructioncachesofothercores,softwareoncoreAwillhavetosomehowsignalsoftwarerunningontheothercores,whichwilltheneachneedtoexecuteFENCE.Iinstructions,to2lushtheirownprivatecaches.
ImplementationNotes:OneimplementationofFENCE.Iissimplyto2lushtheentireinstructioncache,i.e.,tosetallcachelinesto“invalid”.
Thismayseemlikeacoarse-grainedapproach,butitisassumedthatsoftwaretypicallyupdatesalargeblockofinstructionsinmemoryand,onlyafterwritingallthedata,willtherebeajumptothenewlycreatedcode.Forexample,consideraJust-In-Time(JIT)compiler,whichonlycompilesafunction/methodatthetimeitis2irstcalled/invoked.Thereisacleartransitionfromcompile-phasetoexecution-phase.Thus,onlyasingleFENCE.Iinstructionwouldbeneeded,andthepreviouscontentsoftheinstructioncachearelikelytobeunneededinthenearfutureanyway,soinvalidatingthemisacceptable.
Anotherapproachassumesthatthedataandinstructioncachesarekeptconsistent.Forexample,theinstructioncachemightbe“snooping”writestothe
RISC-VArchitectureSummary/Porter Page� of� 209 323
Chapter7:ConcurrencyandAtomicInstructions
datacache.Everywritetothedatacachewouldhavetheside-effectofupdatingtheinstructioncachewhennecessary.Whenevertheinstructioncachehappenedtocontaindatafromthesameaddressbeingwrittento,thatcachelineintheinstructioncachewouldbeupdatedoratleastinvalidated.Ifthecachesarekeptcoherent,thentheFENCE.Iinstructioncanbeimplementedbysimply2lushingthepipeline,whichmaypotentiallycontainoutdatedinstructiondata.
Load-Reserved/Store-ConditionalSemantics
TheRISC-Vdesignsupportsanapproachtoconcurrencycontrolcalledload-reserved/store-conditional(LR/SC).Thisisalsocalled“load-link/store-conditional(LL/SC).
We’llbeginbydescribingtheLRandSCinstructions,whichareonlyavailableinthe“A”(AtomicOperations)extensiontotheRISC-Vspec.
Theideaisthatanatomicoperationsuchastest-and-set,swap,orcompare-and-setisbrokenintotwoseparateinstructions.The2irstinstruction(“load-reserved”)performsthereadphaseandthesecondinstruction(“store-conditional”)performsthewritephase.
TheLRinstructionisverysimilartoatypicalLOADinstruction.Itreadsavaluefrommemoryandstoresitintoaregister.Butinaddition,theinstructionalso“reserves”thatmemorylocation.Youcanimaginethatthisreservationconsistsofmakinganoteofwhichcore(or“HART”inRISC-Vterminology)touchedwhichmemorylocationwithanLRinstruction.Thisreservationnoteissomethingthatismanagedbythememorysynchronization/controlhardwareandnotbythecoresattemptingtoaccessmemory.
Atsomelatertime,anSCinstructionwillbeexecuted.Itwillstoreavaluefromaregisterintomemory,justlikeatypicalSTOREinstruction.However,theinstructionmay“succeed”or“fail”.TheSCinstructionhasthreeoperands:aregistercontainingthevaluetobestoredinmemory,amemoryaddress,andaregisterintowhichthereturncodewillbestored.
Therearetworeturncodes:“0”indicatessuccessand“non-zero”indicatesfailure.Failuremightbecausedbymisalignedaddresses,accessviolations,etc.,butthese
RISC-VArchitectureSummary/Porter Page� of� 210 323
Chapter7:ConcurrencyandAtomicInstructions
arebesidethepoint.Theprimaryuseofthesuccess/failurehastodowithconcurrencycontrol.
TheproperwaytousetheseinstructionistoexecuteanLRinstructiononsomememorylocationandthen,shortlythereafter,toexecuteanSCinstructiononthatsamememorylocation.Ifnoothercore/HARThasstoredintothatmemorylocationsincetheLRinstruction,thentheSCwillsucceedandthevaluewillbestoredintomemory.Butifsomeothercore/HARThasstoredanythingintothatmemorylocation,thentheSCwillfailandnothingwillbestoredintomemory.Presumably,thecodewillthenloopandretrytheLR/SCsequenceuntilitsucceeds.
AreservationismadebyaLRinstructionandremainsvalidforasmall,2initeamountoftime(about16instructions)speci2iedbytheRISC-Vspec.TheSCinstructionmustbeexecutedwithinthiswindowofopportunity.Ifsomeothercore/HARTsneaksinanexecutesanLRonthesamememorylocation,thenthereservationiscancelledandtheSCwillfail.
LoadReserved(Word)
GeneralForm:LR.W RegD,(Reg1)
Example:LR.W x4,(x9) # x4 = Mem[x9] and reserve
Description:A32-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisinReg1.
Comment:Thisinstructionplacesa“reservation”onablockofmemorycontainingthisword.Seecommentsinthetextregardingsemantics.
Theaddressbeproperlyaligned.Thereservationmayincludemorethanjustthebytesaddressed,butwillatleastincludethesebytes.
RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,thevalueissign-extendedtothefulllengthoftheregister.
Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).
Encoding: ThisisavariationofanR-typeinstruction,whereReg2is00000.
RISC-VArchitectureSummary/Porter Page� of� 211 323
Chapter7:ConcurrencyandAtomicInstructions
LoadReserved(Doubleword)
GeneralForm:LR.D RegD,(Reg1)
Example:LR.D x4,(x9) # x4 = Mem[x9] and reserve
Description:A64-bitvalueisfetchedfrommemoryandmovedintoregisterRegD.ThememoryaddressisinReg1.
Comment:SeeLR.W.
RV64/RV128:OnlyavailableonRV64andRV128.ForRV128,thevalueissign-extendedtothefulllengthoftheregister.
Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).
Encoding: ThisisavariationofanR-typeinstruction,whereReg2is00000.
StoreConditional(Word)
GeneralForm:SC.W RegD,Reg2,(Reg1)
Example:SC.W x5,x4,(x9) # Mem[x9]=x4; x5=success code
Description:A32-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisinReg1.Asuccess/failcodeisplacedinRegDwhere0=successandnon-zero=failure.Onfailure,memoryisnotchanged.
Comment:Thisinstructionassumesa“reservation”haspreviouslybeenmadebyanLRinstructiononablockofmemorycontainingthisword.Seecommentsinthetextfordetails.
Theaddressbeproperlyaligned.Thereservationmayincludemorethanjustthebytesaddressed,butwillatleastincludethesebytes.
RISC-VArchitectureSummary/Porter Page� of� 212 323
Chapter7:ConcurrencyandAtomicInstructions
Thevalueof“1”willusuallybeusedtoindicatefailure,butthereisapossibilityfordifferentcodestobeusedtoindicatedifferentreasonsforfailure.
RV64/RV128:Foramachinewitharegisterwidthlargerthan32-bits,theupperbitsoftheregisterareignored.
Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).
Encoding: ThisisanR-typeinstruction.
StoreConditional(Doubleword)
GeneralForm:SC.D RegD,Reg2,(Reg1)
Example:SC.D x5,x4,(x9) # Mem[x9]=x4; x5=success code
Description:A64-bitvalueiscopiedfromregisterReg2tomemory.ThememoryaddressisinReg1.Asuccess/failcodeisplacedinRegDwhere0=successandnon-zero=failure.Onfailure,memoryisnotchanged.
Comment:SeeSC.W
RV64/RV128:OnlyavailableonRV64andRV128.ForRV128,theupperbitsoftheregisterareignored.
Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).
Encoding: ThisisanR-typeinstruction.
AtomicMemoryControlBits:“aq”and“rl”
The“A”extensionprovidesthefollowingatomicmemoryoperation(AMO)instructions:
RISC-VArchitectureSummary/Porter Page� of� 213 323
Chapter7:ConcurrencyandAtomicInstructions
LR LoadReserved SC StoreConditional AMOSWAP Atomicallyswaptwovalues AMOADD Atomicallyaddtwovalues AMOAND AtomicallyANDtwovalues AMOOR AtomicallyORtwovalues AMOXOR AtomicallyXORtwovalues AMOMAX AtomicallyMAXtwovalues AMOMIN AtomicallyMINtwovalues
Eachinstructioncontainstwoadditionalbitstocontroltheiroperationabitfurtherthansuggestedpreviously.Thesebitsarecalled“aq”and“rl”(foracquireandrelease).
Thebitscanbesetusingthefollowingassemblernotation:
lr.w.aq x5,(x6) # set the “aq” bit to 1lr.w.rl x5,(x6) # set the “rl” bit to 1lr.w.aq.rl x5,(x6) # set both bits
Sometimesinstructionsonasinglethread/HARTwillbereorderedbythecoreexecutionunittoachievegreaterperformance.Forexample,imaginetwoinstructionsappearinginsequence:2irstastoreinstructionfollowedbyaloadinstruction.Iftheaddressesintheseinstructionsaredifferent,thentheexecutionunitcanreasonablyperformbothoperationsinanyorder.Perhaps,thecorewillissuebothinstructionssimultaneously;thememoryunitmaycompletetheoperationoftheinstructionsinanyorder,perhapsduetothechancecontentsofthecache.Sincethetwoinstructionsaretouchingdifferentmemorylocations,theycanbesafelyreordered,right?
Usuallythereorderingissafe,butinthepresenceofotherconcurrentthreads/HARTs,therecanbeissues.Forexample,imaginethatoneofthetwomemorylocationsrepresentsalockthatisintendedtocontrolaccesstotheotherlocation.Now,itbecomescriticalthattheoperationsareexecutedinthecorrectorder.
(Asanexampleoftheproblem,imaginethatthestoreoperationisbeingusedtosetalock.Onlyafterthelockisset,isitallowabletoloadvaluesfromtheshareddata.Theshareddataisonlyguaranteedtobeinaconsistent,usablestatebyotherprocesseswhenthelockisfree;wheneversomeprocessholdsthelock,thestateofthedatamaybeinaninconsistent,un2inishedstate,andmustnotbeaccessedby
RISC-VArchitectureSummary/Porter Page� of� 214 323
Chapter7:ConcurrencyandAtomicInstructions
otherprocesses.Butnowimaginethattheloadoperationisperformed2irst,beforethelockissetwiththestoreinstruction.ThismeansthattheHARThasaccessedsharedmemorywithout2irstacquiringthelock.Thismightallowittoseeaversionorstateofthedatathatitshouldnotsee;aviolationoftheconventiontoacquirelocksbeforeaccessingtheshareddata.)
The“aq”and“rl”bitscontroltheorderingofinstructionsonasingleHART.TheydonothaveanyeffectontheschedulingofinstructionsexecutedondifferentHARTs.Inotherwords,thebitsaretherejusttomakesurethatreorderingofinstructionsonthecurrentHART(whichtheexecutionunitmightnormallydotoincreaseperformance)aretobesuppressedsothatconcurrencycontrolcodewillfunctionproperly.
The“aq”bithasthefollowingmeaning.Whenaninstructionwiththe“aq”(acquire)bitsetisexecuted,itmeansthatanyinstructionsthatfollowthe“aq”instructionmustbedelayedandcannotbeexecutedbeforethe“aq”instruction.The“aq”instructionwillbeexecuted2irst,beforetheinstructionsthatfollowitinthesequential2lowofinstructionsonthisHART.
The“rl”bithasthefollowingmeaning.Whenaninstructionwiththe“rl”(release)bitsetisexecuted,itmeansthatanyinstructionthatprecedesthe“rl”instructionmustbedoneandexecutedtocompletionbeforethe“rl”instruction.The“rl”instructionwillbeexecutedlast,aftertheinstructionsthatprecedeitinthesequential2lowofinstructionsonthisHART.
WhilewehavejustdescribedthebitsasapplyingtoonlyoneHART,thiswasasimpli2ication.Therearereallymultipleviewsoftheorderinwhichinstructionsareexecuted.First,asmentioned,thereistheorderthattheHARTitselfexecutesthetwoinstructions.Butsecond,thereistheorderinwhichotherHARTsobservetheexecution.Perhapsduetocachingeffects,thesecondordermaybedifferent.
Forexample,imaginethattherearetwostoreinstructionstobeexecutedbyoneHART.The2irstinstructionisusedtosetalockandthesecondinstructionisanupdatetotheshareddata.Presumably,anyupdateshouldonlybedoneprivatelybyaHARTthatisalreadyholdingthelock.TheconcernistheorderinwhichtheotherHARTsobservethesetwomemorystoreoperations:otherHARTsmustobservethestoretothelocktohappen2irst,thussignalingthatthelockisnolongerfree.OtherHARTsmustnotbeallowedtoseeaviewinwhichthesecondstoreappearedtooccurbeforethelockwasset,otherwisethisopensthedoortoallowingsomeHARTtoaccessdatathatshouldnotbeaccessible.
RISC-VArchitectureSummary/Porter Page� of� 215 323
Chapter7:ConcurrencyandAtomicInstructions
The“aq”and“rl”bitsimposeorderingconstraintsontheviewsobservedbyallHARTsoftheinstructionstreamoftheHARTusingthem.OtherHARTswillseeonlythelegalinstructionsequences.
Whenboth“aq”and“rl”bitsaresetinsomeinstruction,somethingcalled“sequentialconsistency”isimposed.AllotherHARTswillseethesameorderingasoccurredintheHARTcontainingtheinstruction:AllinstructionsthatwereexecutedbeforetheinstructioninquestionwillappeartoallotherHARTsashappeningbeforetheinstruction.AllinstructionsthatwereexecutedaftertheinstructionwillappeartoallotherHARTsashappeningaftertheinstruction.
RISC-Vdividesaccessestothememorysystemintotwocategories:
•Accessestophysicalmemory •Accessestomemory-mappedI/O
EachAMOinstruction(i.e.,LR,SC,AMOSWAP,AMOADD,AMOAND,AMOOR,AMOXOR,AMOMAX,AMOMIN)willaccessexactlyoneofthesetwodomains:eithermemoryorI/O.
The“aq”and“rl”bitsapplyonlytothedomainthatisbeingaccessed.Thebitsimposeorderingconstrainsonallaccessestothedomaininquestion,butimposenoconstraintsontheotherdomain.
Commentary:I’mguessingthemotivationisthateachdomain(memoryorI/O)willhaveaseparateapproachtoimplementingthesynchronizationconstraintsimposedbythe“aq”and“rl”bits.Theuseofthebitswilltriggereitheronemechanismortheother.
Orperhapstheimplementationsineachdomainwillimposewidelydifferentperformancehits.Certainlythe“aq”and“rl”bitswillberequiredtoworkproperlyinthememorydomainandpresumablytheimplementationwillendeavortoprovidehighperformance,sincelocksinmemoryarecommon.ButmaybetheirusewithintheI/Odomainwillberareandasuper-slowimplementationisacceptable,aslongasitcanbeisolatedfromthememorysystem.
Orperhapsitisenvisionedthat,insomeimplementationsofthestandard,thememorysystemwillsupportthe“aq”and“rl”semantics,butwillnotsupportthespeci2iedbehaviorforaccessestotheI/Osystem.
RISC-VArchitectureSummary/Porter Page� of� 216 323
Chapter7:ConcurrencyandAtomicInstructions
???
UsingLRandSCInstructions
Takealookatthefollowingcodesequence.Weassumethatthereisalock,whoseaddressisinregisterx10.Whenthelockholds“0”thelockisfree;thevalueof“1”isusedtoindicatethelockisset.
Inline2weloadthecurrentvalueofthelockandplaceareservationonthatlocation.Inline3,wechecktoseeifthelockwasalreadyset.Ifso,wejumptosomeothercode.Thiscodemightsimplyjumpbackto“lock”totryagainimmediately;thisiscalleda“spin”lockbecausethecodeloops,waitingforthevaluetobecomezero.Orperhapstheothercodewillwait,reschedulethreadsorwhatever,andcomebacktotryagainsometimelater.
Line4loadsthevalue“1”intoatemporaryregister.Line5istheSTORE-CONDITIONALinstruction,whichstoresthe“1”(indicating“locked”)intothelock.Thisinstructionwillsucceedorfailandthecodeisplacedintox5totellwhichhappened.
IfsomeotherHARTmanagedtostoreintothelock,thentheSCwillfailandwillnotbeupdatedbythisHART.Otherwise,ifallisgoodandthelockreservationisstillintact,thentheSCwillsucceedand“1”(representing“locked”)willbestoredintothelock.
ThelastlinelooksatthereturncodefromtheSCinstruction.IftheSCfailed,thiscodeloopsbacktothebeginningtotryagain.
1 lock:2 lr.w x5,(x10) # Grab the lock’s value3 bnz x5,fail # If already locked, fail4 li x7,1 # Store “1” into lock5 sc.w x5,x7,(x10) # to indicate “locked”6 bnz x5,lock # If failure, try again
RISC-VArchitectureSummary/Porter Page� of� 217 323
Chapter7:ConcurrencyandAtomicInstructions
Toreleasealock,allweneedtodoisstorea“0”intothelockvalue.Seethefollowingcode.Recallthatx0alwaysholds“0”.
7 unlock:8 sw x0,(x10) # Set to “unlocked”
Theinstructionsequencesaboveareokayinonerespect,but2lawedinanother.First,theywillatomicallyacquiredthelockinsuchawaythatonlyoneHARTatatimewillacquirethelock.However,itisassumedthatlocksareusedtoprotectcriticalsectionsofcode.Thecriticalsectioncodewillaccesssomeshareddata.Allaccessestotheshareddatamustoccurafteralockoperationandbeforeanunlockoperation,inorderforthelocktobemeaningful.
ButwithmultipleHARTs,itispossiblethat(duetoreorderingofmemoryoperationsanddifferentviewsoftheorder)otherHARTsmightexperiencetheseaccessesinthecriticalsectioncodehappeningeitherbeforethelockoperationoraftertheunlockoperation.Thiscouldsabotagethelock’sfunctionality.
Sowecanrewritethiscode,makinguseofthe“aq”and“rl”bits.Thecorrectedcodeisshownbelow.
The“rl”bitissetwithintheLRinstruction,whichmeansthatanymemoryoperationsthatcameearlierinthesequencewillbecompleted(inthesenseofbeingvisibletootherHARTs)beforethelocksequenceisbegun.Wedon’twantanythingwe’vedonethatshouldbevisibleoutsidethecriticalsectiontobeaccidentallyhidden.
The“aq”bitissetwithintheSCinstruction,whichmeansthatanythingthathappensafterthelocksequence(i.e.,anythinginthecriticalsectioncode)willbeseenbyotherHARTsasoccurringafterthelocksequence.Thismakessurethatnocriticalsectionstuffcanleakoutinfrontofthelocksequence.
1 lock:2 lr.w.rl x5,(x10) # Grab the lock’s value3 bnz x5,fail # If already locked, fail4 li x7,1 # Store “1” into lock5 sc.w.aq x5,x7,(x10) # to indicate “locked”6 bnz x5,lock # If failure, try again
Likewise,wedothesamewithintheunlocksequence,shownnext.
RISC-VArchitectureSummary/Porter Page� of� 218 323
Chapter7:ConcurrencyandAtomicInstructions
7 unlock:8 lr.w.rl x5,(x10) # Reserve the lock9 sc.w.aq x5,x0,(x10) # Set to “unlocked”10 bnz x5,unlock # If failure, try again
WeswitchfromusingaSTOREWORDinstructiontousingaLR/SCcombination,sothatwecanusethe“rl”and“aq”bits.WeuseanSCinstructiontostoretheunlockcode(“0”)intothelockandweuseanLRinstruction,becausetheSCinstructionwillfailifwedonotholdareservationonthememorylocation.The“rl”bitforceseverythingwehavedonebeforetheunlocksequencetoappeartootherHARTsasoccurringbeforethelockissetto“0”(unlocked).The“aq”bitontheSCinstructionforcesanythingwedonext,toappearasoccurringaftertheunlocksequence.
Wedon’tbothertochecktheoldvalueofthelock,sinceweassumethat(intheabsenceofbugs),wewouldnevertrytounlockalockthatwasnotpreviouslylocked.However,wemusttestforthefailureoftheSC;itisalwayspossiblethatsomeotherHARTtriedtoacquirethelock,touchedthememorylocationrepresentingthelock,andinvalidatedourreservation.
DeadlockandStarvation–Background
Deadlockoccurswhenevertwo(ormore)processesareeachholdingaresource(e.g.,theyhaveeachacquiredalock)andeachiswaitingforaresourceheldbyanotherprocess(e.g.,tryingtoacquireanotherlock).Onceadeadlocksituationoccurs,theprocesseswillbefrozen,pendingsomeoutsideintervention.DeadlockisaconcernaddressedbytheOSandthereareanumberofapproachestoavoidingordealingwithit.
Thereisanothersimilarproblemcalled“starvation”.(Theterm“livelockissometimesusedtomeanstarvation.)Withstarvation,theprocessesarenotnecessarilyfrozenbut,duetothevagariesofchance,someprocessesnevermakeforwardprogress.
Deadlockrequiresatleasttworesources(suchaslocks)thatarebeingfoughtover.Ontheotherhand,“starvation”canoccurwithonlyoneresource.
RISC-VArchitectureSummary/Porter Page� of� 219 323
Chapter7:ConcurrencyandAtomicInstructions
Tounderstandstarvation,imagineascenarioinwhichtwoHARTs/processes(calledAandB)competeforasinglelock.ThecoderunningonHARTA2indsthatthelockiscurrentlyheldbyBandrespondsbywaitingandthenretryingtoacquirethelock.Weassumethatnothread/processholdsalockinde2initely(i.e.,weassumethesearecooperatingprocesseswithoutanyprogrambugs),soBmusteventuallyreleasethelock.Butwhatif,duetothepoorluckofA,HARTBhappenstore-acquirethelockagainbeforeAhasachancetocheckit.WhenAchecksasecondtime,Aonceagain2indsthatthelockisstillunavailable.Imaginethatthisrepeatsinde2initely:Bisrepeatedlyreleasingthelock,butAisunabletomakeanyforwardprocess,becausebythetimeAchecks,Bhasalreadyre-acquiredthelock.Biswell-behaved,neverholdingthelockinde2initely,yetAendsupbeingfrozenandunabletomakeprogress.Thisisstarvation.
ItseemslikestarvationoughttoeventuallyresolveitselfwhenthebadluckofA2inallyendsandAgetsachancetoacquirethelock,butthislineofreasoningisdangerous.Funnythingscanhappenwhenprocesses/threads/HARTsarenotbehavingpurelystochastically.Thepossibilityofstarvationmustbeaddressed.
StarvationandLR/SCSequences
Considerthecodesequencetoacquirealockshownearlier.ThatsequenceissuesaLRinstructionandthen,acoupleofinstructionslater,itissuesanSCinstruction.TheguaranteemadebytheRISC-Vspecisthat,iftheSCissuccessfulthennootherHARTshavewrittentothelocation(evenifthevaluewrittenhappenstobethesamevalue).IfanotherHARTmanagedtoexecuteitsSC2irst,thentheSConthisHARTwillfail.TheSCmayfailforotherreasonsaswell.ButiftheSCsucceeds,wecanbesurethatthelocationinquestionhasnotbeenmodi2iedsincetheLRretrievedtheavalue.
However,theRISC-Vspecmakesanadditional,muchstrongerguarantee,anditisthis:IfyoukeepretryinganLR/SCsequence,itwilleventuallysucceed.Starvation(i.e.,livelock)isguaranteednottohappen.TheSCmayfailafewtimes,butitwilleventuallysucceed.Theideaisthata“denialofserviceattack”byaheavy2lowofrequestsfromotherHARTswillneverpreventanyHARTfrommakingprogress.
TherearesomeconstraintsplacedontheLR/SCsequenceinorderforthisguaranteetobemade,andthe“lock”codesequenceshownabovemeetsthem.TherecanbecodebetweentheLRandSC(includingarepeatlooptokeeptestinguntilit
RISC-VArchitectureSummary/Porter Page� of� 220 323
Chapter7:ConcurrencyandAtomicInstructions
succeeds).Thebasicrestrictionisthatthesequenceofinstructionscannotbetoolong,namely16instructions.Alsoothermemoryoperationsareforbidden.Alsoexception/trapprocessingisforbidden.Alsoinstructionsthatmighttakealongtimetocomplete(suchasintegermultiplyor2loatingpointoperations)areforbidden.
Commentary:Guaranteeingthatstarvationwillnotoccurisnotnecessarilystraightforward.Oneapproachinvolvesmaintainingaqueuethatsomehowre2lectshowlongaprocesshasbeenwaiting,andgivingprioritytotheprocessthathasbeenwaitinglongest.Anotherapproachinvolvespassinga“baton”aroundinsequencefromprocesstoprocess.Eachprocessgetsthebatoninturnand,withit,achancetomoveforward.Butafteralimitedamountoftime,itmustpassthebatonontothenextprocess.
TheAtomicMemoryOperation(AMO)Instructions
TheAMOSWAP(AtomicSwap)instructioncanbeusedtosimultaneouslyreadavaluefrommemoryandstoreanewvalueintothatsamelocation.Theinstructioncandothisatomically,whichmeansthatnointerveninginstructionthattriestostoreintothislocationcansneakinandexecute.
Typically,anatomicswapinstructionisusedtosetalock.Theinstructionsetsthelocktothe“locked”valuewhileatthesametimereadingtheoldvaluetomakesurethelockwaspreviously“unlocked.”Theinstructionmustbeatomictoensurethattwoconcurrentprocessesdon’tsimultaneouslystorethe“locked”valueintoan“unlocked”lockwithoutrealizingit.
AtomicSwap
GeneralForm:AMOSWAP.W.aq.rl RegD,Reg2,(Reg1) # 32-bitsAMOSWAP.D.aq.rl RegD,Reg2,(Reg1) # 64-bits
Examples:AMOSWAP.W x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4AMOSWAP.W.aq x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4AMOSWAP.D.rl x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4
Description:
RISC-VArchitectureSummary/Porter Page� of� 221 323
Chapter7:ConcurrencyandAtomicInstructions
AvalueisreadfromthememorylocationwhoseaddressisinReg1andthevalueisplacedintoRegD.ThenthevaluefromReg2iswrittentothememorylocation.InthecasethatReg2andRegDarethesameregister,thevaluesareswapped,i.e.,thevaluestoredinmemoryisthepreviousvalueofReg2.
Thisinstructioncontainsan“aq”bitandan“rl”bit,whichcontroltheorderingofinstructionsappearingbeforeandafterthisinstruction.
Thememoryaddressmustbeproperlyalignedorelsean“AMOAddressmisaligned”exceptionwillberaised.
RV64/RV128:AMOSWAP.DisonlyavailableonRV64andRV128.
Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).
Encoding: ThisisanR-typeinstruction.
CodeExample:TheAMOSWAPinstructioncanbeusedtoimplementthe“lock”and“unlock”operationsonalock,asweshowhere.Thelockwillberepresentedas
0=unlocked,i.e.,free 1=locked
Weassumeregisterx10containstheaddressofthelockandx5isusedasatemporary.Hereiscodetosetthelock:
li x5,1 # x5 = 1retry:
amoswap.w.aq x5,x5,(x10) # x5 = oldlock & lock = 1bnez x5,retry # Retry if previously set
Afterthecriticalsection,whichpresumablyaccessesandmodi2iessomeshareddata,wehavethefollowingcodetorelease(i.e.,free)thelock:
amoswap.w.rl x0,x0,(x10) # Release lock by storing 0
TheinitialAMOSWAPhasthe“aq”bitset,whichmeansthatanyinstructionsthatfollowtheAMOSWAPcannotbeobservedbyotherHARTstoexecutebeforetheAMOSWAP.Thispreventscodefromthecriticalsectionfrom“leaking”outbeforethelockisset.
RISC-VArchitectureSummary/Porter Page� of� 222 323
Chapter7:ConcurrencyandAtomicInstructions
The2inalAMOSWAPhasthe“rl”bitset,whichmeansthatanyinstructionsthatprecedetheAMOSWAPcannotbeobservedbyotherHARTstoexecuteaftertheAMOSWAP.Thispreventscodefromthecriticalsectionfrom“leaking”outafterthelockisreleased.
AtomicAdd/AND/OR/XOR/Max/Min(Word)
GeneralForm:AMOADD.W.aq.rl RegD,Reg2,(Reg1) # additionAMOAND.W.aq.rl RegD,Reg2,(Reg1) # logical ANDAMOOR.W.aq.rl RegD,Reg2,(Reg1) # logical ORAMOXOR.W.aq.rl RegD,Reg2,(Reg1) # logical XORAMOMAX.W.aq.rl RegD,Reg2,(Reg1) # signed maximumAMOMAXU.W.aq.rl RegD,Reg2,(Reg1) # unsigned maximumAMOMIN.W.aq.rl RegD,Reg2,(Reg1) # signed minimumAMOMINU.W.aq.rl RegD,Reg2,(Reg1) # unsigned minimum
Examples:AMOADD.W x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5AMOADD.W.aq x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5
Description:AvalueisreadfromthememorylocationwhoseaddressisinReg1andthevalueisplacedintoRegD.AbinaryoperationisthenperformedonthevaluefetchedfrommemoryandthevalueinReg2andtheresultiswrittenbacktothememorylocation.InthecasethatReg2andRegDarethesameregister,theoperationisperformedusingtheinitialvalueoftheregister;thevaluestoredintheregisterwillbethevaluefetchedfrommemory.
Thisinstructioncontainsan“aq”bitandan“rl”bit,whichcontroltheorderingofinstructionsappearingbeforeandafterthisinstruction.
Thememoryaddressmustbeproperlyalignedorelsean“AMOAddressmisaligned”exceptionwillberaised.
RV64/RV128:TheresultstoredintoRegDwillbesign-extendedtothefullregistersize,i.e.,to64or128bits.
Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).
Encoding:
RISC-VArchitectureSummary/Porter Page� of� 223 323
Chapter7:ConcurrencyandAtomicInstructions
TheseareR-typeinstructions.
AtomicAdd/AND/OR/XOR/Max/Min(Doubleword)
GeneralForm:AMOADD.D.aq.rl RegD,Reg2,(Reg1) # additionAMOAND.D.aq.rl RegD,Reg2,(Reg1) # logical ANDAMOOR.D.aq.rl RegD,Reg2,(Reg1) # logical ORAMOXOR.D.aq.rl RegD,Reg2,(Reg1) # logical XORAMOMAX.D.aq.rl RegD,Reg2,(Reg1) # signed maximumAMOMAXU.D.aq.rl RegD,Reg2,(Reg1) # unsigned maximumAMOMIN.D.aq.rl RegD,Reg2,(Reg1) # signed minimumAMOMINU.D.aq.rl RegD,Reg2,(Reg1) # unsigned minimum
Examples:AMOADD.D x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5AMOADD.D.aq x5,x4,(x9) # x5=Mem[x9]; Mem[x9]=x4+x5
Description:AvalueisreadfromthememorylocationwhoseaddressisinReg1andthevalueisplacedintoRegD.AbinaryoperationisthenperformedonthevaluefetchedfrommemoryandthevalueinReg2andtheresultiswrittenbacktothememorylocation.InthecasethatReg2andRegDarethesameregister,theoperationisperformedusingtheinitialvalueoftheregister;thevaluestoredintheregisterwillbethevaluefetchedfrommemory.
Thisinstructioncontainsan“aq”bitandan“rl”bit,whichcontroltheorderingofinstructionsappearingbeforeandafterthisinstruction.
Thememoryaddressmustbeproperlyalignedorelsean“AMOAddressmisaligned”exceptionwillberaised.
RV64/RV128:TheseinstructionsareonlyavailableonRV64andRV128.
Availability:Thisinstructionisonlyavailableinthe“A”extension(AtomicInstructions).
Encoding: TheseareR-typeinstructions.
Implementation:EachAMOoperationrequiresbothareadofmemoryandawritetomemory,aswellasasimplebinaryoperation.Itmaybethattheread-
RISC-VArchitectureSummary/Porter Page� of� 224 323
Chapter7:ConcurrencyandAtomicInstructions
operate-writefunctionalitywillbeimplementedcompletelywithinthememorysubsystem,insteadofwithinthecore.UponencounteringanAMOinstruction,thecorewouldsendtheinitialvalueofReg2tothememorysystem.Thememorysubsystemwouldthencompletetheoperation,performingtheatomicread-operate-writecycleandthensendingtheresultvaluebacktothecore,tobestoredinRegD.Byof2loadingtheAMOfunctionalitytothememorysubsystem,theatomicityisguaranteedbythememorysystemalone.
Implementation:AnotherapproachtoimplementingtheAtomicMemoryOperations(AMO)wouldmakeuseofthemoreprimitiveLRandSCoperations.Themicroarchitecturemight,forexample,implementtheAMOinstructionsintermsofthesimplerfunctionality,whichthemicroarchitecturealsousestoimplementtheLRandSCinstructions.
SequentialConsistency:Aseriesofreadsandwritestomemoryissaidtobe“sequentiallyconsistent”ifthefollowingstatementshold.(1)AllofthereadandwriteoperationsissuedbyalltheHARTscanbelinearizedintoasingle,undisputedorder,andthesameorderisobservedbyallHARTs.(2)TheoperationsofanyoneHARTmustofcourseappearinthisorderingintheordertheHARTactuallycalledforthem,althoughtheoperationsfromanytwoHARTscanbeinterleavedarbitrarily.(3)Oratleasttheresultsofthecomputationareindistinguishablefromsuchaglobal,linearorderingofalloperations.
Toforceallreadsandwritestoconformtosequentialconsistency,theprogrammercanusetheatomicoperationsasfollows.Thiswilleffectivelylinearizeallreadsandwritestomemory,sothateveryHARTwillagreeontheorderthateverythingisperformed.
Forreadssuchas: LD.Wx4,(x7)
Substitute: LR.W.aq.rl x4,(x7)
Forwritessuchas: ST.W(x7),x4
Substitute: AMOSWAP.W.aq.rl x0,x4,(x7)
RISC-VArchitectureSummary/Porter Page� of� 225 323
Chapter7:ConcurrencyandAtomicInstructions
Settingthe“aq”and“rl”bitsforcesallotherHARTstoseeinstructionsthatoccurbeforetheinstructiontoappeartoexecutetocompletionbeforetheinstruction.Also,allinstructionsthatoccuraftertheinstructionwillappeartootherHARTstoexecuteaftertheinstruction.
Implementation:NotethatanAMOSWAPoperationthatdiscardsthepreviousvaluefrommemory(i.e.,RegD=x0),canavoidthereadphaseoftheinstruction.Thisisstillausefulatomicinstruction,distinctfromastore(ST)instruction,duetothepresenceofthe“aq”and“rl”bits.Perhapssuchaninstructioncanbeoptimizedbythemicroarchitecturetoavoidthereadphase.
RISC-VArchitectureSummary/Porter Page� of� 226 323
Chapter8:PrivilegeModes
PrivilegeModes
Atanytime,theRISC-Vprocessorisoperatinginexactlyoneofthefollowing“modes”:
U-Mode(UserMode) !Lowestprivilege S-Mode(SupervisorMode) M-Mode(MachineMode) !Highestprivilege
Commentary:Previousversionsincludedafourthprivilegelevel,inwhichahypervisorwastoexecute.HypervisorModewasprettymuchidenticaltoSupervisorModeandwas2laggedas“tobespeci2iedinthefuture.”
U-Mode(UserMode) !Lowestprivilege S-Mode(SupervisorMode) H-Mode(HypervisorMode) M-Mode(MachineMode) !Highestprivilege
Typically,user-levelapplicationswillexecuteinUserMode.WhenevertheapplicationwantsanOSservice,itwillmakeasystemcall.TheOScodethathandlesthiscallwillexecuteinSupervisorMode,i.e.,atahigherprivilegelevel.Uponreturntotheapplication,themodewillbeloweredbacktoUserMode.
Mostinstructionscanbeexecutedinanymode,butsomeinstructionsareprivilegedandcanonlybeexecutedinahighermode.
Theprivilegelevelsarestrictlyordered.Anythingthatcanbedoneatalowerprivilegelevelcanbedoneinanyoneofthehigherlevels.Forexample,anythingthatcanbedoneinUserModecanbedoneinSupervisorMode.
RISC-VArchitectureSummary/Porter Page� of�227 323
Chapter8:PrivilegeModes
MachineModeisthehighestprivilegelevel;allinstructionsarelegalatthislevelandnothingisprotected.Whentheprocessorbeginsoperationafterpoweringuporafterareset,itisplacedinMachineMode.
Theremainingmodes(UserandSupervisormodes)areprettymuchthesameaseachother.Inotherwords,thereislittletodistinguishthem,excepttheirrelationshiptoeachother.IfyouunderstandUserMode,thenyouwillalsounderstandmostofSupervisorMode.
Thereisoneexceptiontotheuniformityandthisregardsvirtualmemoryandpagetables.ThisfunctionalityislocatedinSupervisorMode.
TheRISC-Vspecdescribestheprivilegemechanisminsomewhatgeneraltermsanditiseasytoimaginetheinsertionofadditionallevelsofprivilege,althoughitshouldbenotedthatafourthmode(HypervisorMode),whichexistedinpreviousversions,hasbeeneliminated.
Itappearsthatthecurrentmodeisnotavailableinanyuser-visibleregister.Inotherwords,itisdif2icultforcodetodeterminethecurrentmode.Itcannotsimplybereadfromaregister;insteadthecurrentmodeisimplicitinthefunctionalitythatisavailable.
[Thislackseemstobeintentional.Afterall,codeistypicallywrittentoruninonemodeoranotherandrarelyneedstoaskwhichmodeitisrunningin.Forexample,ahypervisormaywishtorunahostedOSisarestrictedlower-privilegemodethantheOSismeanttorunat,therebymaintainingfullcontrolofthemachineandpreventingthehostedOSfromcorruptingthehypervisoritselforotherhostedOSes.ThehostedOSexpectstoruninahigherprivilegemodethanitactuallyisrunningat;thehypervisormustcreateandpreservetheillusionthattheOSisrunninginahigherprivilegemodethanitactuallyis.]
TheRISC-Vspeci2icationdoesnotrequireallmodestobeimplemented.Someprocessorsmaynothaveallmodes.Herearethelegalpossibilities:
Option1 Option2 Option3 U U S M M M
RISC-VArchitectureSummary/Porter Page� of�228 323
Chapter8:PrivilegeModes
ThemostbasicprocessorwillhaveonlyMachineMode.Inasense,thereisnoprivilegesysteminsuchasimpleprocessorsincealloperationscanbeperformedatanytime.Animplementationlikethiswouldbesuitableforanembeddedmicrocontroller.
�
Withoption2,therearetwoprivilegelevels:UserandMachine.ThismightbeappropriateforasimpleOSwithsomeprotectedsecuritymonitorrunninginMachineModeandoneormoreapplicationsrunninginUserMode.
�
Option3isintendedformoreelaboratesystemsthatwillberunningatraditionalOSlikeUnix/Linux.
�
RISC-VArchitectureSummary/Porter Page� of�229 323
Chapter8:PrivilegeModes
Insomeimplementationsseveralinstructionsand/orISAfeatureswillbemissingandtherewillbeaneedtoemulatethemissingfunctionality.Thenaturalorganizationwillplacethecodetoemulatethemissinginstructions/featuresinMachineModesothattheOS(whichrunsinSupervisorMode)doesnotneedtobeawareofthesedetails.
PreviousRISC-VversionsdocumentedaHypervisorModeintendedtosupporthypervisors.Thehypervisorwastointendedtorunatitsownprivilegelevel,whileeachofthehostedOSesraninSupervisorMode.AlthoughHypervisorModehasbeeneliminated,weincludethefollowingdiagramanyway.
�
Intheabovediagramswesuggestwhichmodeswillbeusedforwhichsortsofcode,butthisisnotmandatory.Inparticular,anoperatingsystemkernelmightruninMachineModeinsomesystemsandinSupervisorModeinothersystems.
InterfaceTerminology
TheRISC-Vdocumentationdiscussestheinterfacingbetweenapplications,operatingsystems,andhypervisors.
Everyapplicationprogramrunsinaparticularenvironment,whichiscalledthe“ApplicationExecutionEnvironment”(AEE).Howtheapplicationinterfaceswiththeunderlyingexecutionenvironmentiscalledthe“ApplicationBinaryInterface”(ABI).
RISC-VArchitectureSummary/Porter Page� of�230 323
Chapter8:PrivilegeModes
�
TypicallyanOSsitsunderneaththeapplicationandimplementstheApplicationBinaryInterface,whichconsistsoftheUserModeinstructionsetalongwiththecollectionofsystemcallsthatareavailabletoapplications.TheApplicationBinaryInterfaceisthesumtotalofwhattheapplicationprogrammerneedstounderstandinordertowriteprograms;theprogrammerdoesnothavetounderstandorknowwhatisgoingonwithintheApplicationExecutionEnvironment.
AnexampleApplicationBinaryInterfacewouldcombinetheprocessorISAalongwiththeOSsystem-callinterface.
IfanapplicationisportedtoanewApplicationExecutionEnvironment,thentheapplicationwillfunctionidenticallywithnoproblems,aslongasthenewenvironmentimplementsexactlythesameApplicationBinaryInterface.
Typicallythereareseveralapplicationsrunningatanyonetime.TheOSgenerallyprovidesoneApplicationBinaryInterfaceandallapplicationsarewrittentomeetthisspeci2ication.However,anOSmightimplementmorethanoneApplicationBinaryInterface,asshowninthenextdiagram.
�
TheOSmightberunningonbaremetal.OrtheOSmightberunningontopofahypervisororsomeothermonitor/security/bootstrappingsoftware.Inanycase,theOSiscreatedandwrittentomeetthespeci2icationsofa“SupervisorExecution
RISC-VArchitectureSummary/Porter Page� of�231 323
Chapter8:PrivilegeModes
Environment”(SEE).Thisspeci2icationiscalledthe“SupervisorBinaryInterface”(SBI).
IftheOSisportedtoanewenvironment(forexample,tonewcomputerhardware),theOSshouldfunctionidentically,aslongasthenewexecutionenvironmentimplementstheexactsameSupervisorBinaryInterface.
TypicallyanOSisdesignedtorundirectlyonabaremachinewithnounderlyingsoftware.Inthiscase,theSBIconsistsofaspeci2icationoftheprocessor’sISA.
However,theOSmayinfactberunningontopofahypervisor.AslongasthehypervisorimplementsthecorrectSupervisorBinaryInterface,theOSshouldbeunawarethatitisrunningontopofahypervisor.
ItisconceivablethatthehypervisorprovidesmultipleSupervisorBinaryInterfaces.Forexample,onehypervisormightprovideanSBImimickingaPCandasecondSBImimickinganApplecomputer.SuchahypervisorwouldthanfacilitatethesimultaneousexecutionofbothaMSWindowsoperatingsystemandtheMacOS.Suchasetupisshownbelow.
�
Thehypervisoritselfrunsinsomeenvironment,whichiscalledthe“Hyperv2laisorExecutionEnvironment”(HEE).Theunderlyingenvironmentimplementsaparticularinterface,calledthe“HypervisorBinaryInterface”(HBI).
A“type-1hypervisor”(ornativehypervisor)isaprogramthatisintendedtorunonabaremachine.Inotherwords,thereisnosoftwarebelowthehypervisor.Thehypervisorachievesallitsworkusingonlytheinstructionsavailableontheprocessoranddoesnotmakeanysystemcalls.Foratype-1hypervisor,theHypervisorExecutionEnvironmentisjusttheprocessorcore’sinstructionsetarchitecture(ISA).
RISC-VArchitectureSummary/Porter Page� of�232 323
Chapter8:PrivilegeModes
A“type-2hypervisor”(orhostedhypervisor)isaprogramthatisintendedtorunontopofothersoftware.Thehypervisorsitsontopofsomeotheroperatingsystemand,asfarastheunderlyingOSisconcerned,thehypervisorisjustanotherapplicationprogram.AnexampleofthiswouldbeVMWare(whichrunsontopoftheAppleMacOSandprovidesaSupervisorBinaryInterfacetomimicthePCsoWindowscanrun.OtherexamplesareQEMUandVirtualBox.
Inordertoruncomputersystemsasef2icientlyaspossible,hardwareimplementationformostinstructionsisdesirable.Themajorityofcommonoperationsmustbeimplementeddirectlyinhardware,byinstructionsexecutedinthecurrentmode.
However,somecodewilloccasionallytrytoexecuteinstructionsthatrequiresoftwareintervention.Perhapsaparticularimplementationfailstoprovidehardwaretosupportcertainoperations;exampleswouldbethelackofhardwaresupportfor“integermultiplication”or“2loatingpointarithmetic”.Orperhapsthecodeisrunninginahypervisorenvironmentinwhichaparticularinstructioncannotbeblindlyexecuted,butmustbeintercepted,monitored,and/oralteredbeforebeingexecuted.Regardlessofthereasontheinstructioncannotbeexecuteddirectly,an“exception”willoccurwhentheinstructionisencountered.Themodewillbeincreasedtoahigherprivilegelevel,softwarewillintervene,andeventuallyareturntotheoriginalprivilegemodewilloccurandinstructionexecutionwillresume.
AllISAdesignerstrytominimizethenumberofoperationsthatrequiresoftwareinterventionand,wheninterventionisnecessary,trytomaketheupcalltotheexceptionhandlerquickandef2icient.
OvertheyearswehaveaccumulatedgoodexperiencewiththeinterfacebetweenUserModecodeandoperatingsystemcode.Muchhasbeendonetoreducethefrequencyofsystemcallsandmaketheupcallinterfacefastandef2icient.
Morerecentlyweareusingmorehypervisorsoftwaretosupportlegacyoperatingsystems.InordertoimplementtheSupervisorBinaryInterface(SBI)expectedbytheOS,thehypervisormaybeobligatedtostepinfrequently,whenevertheOSexecutesaprivilegedinstruction.ProvidinggoodISAsupportforef2icienthypervisorsisanactiveresearcharea.
RISC-VArchitectureSummary/Porter Page� of�233 323
Chapter8:PrivilegeModes
Exceptions,Interrupts,TrapsandTrapHandlers
Duringtheexecutionofaninstruction,an“exception”mayoccur.
HerearethedifferenttypesofexceptionsmentionedintheRISC-Vspec:
SynchronousExceptions:Illegalinstruction
Instructionaddressmisaligned Instructionaccessfault Loadaddressmisaligned Loadaccessfault Store/AtomicMemoryOperation(AMO)addressmisaligned Store/AtomicMemoryOperation(AMO)accessfault Environmentcall(i.e.,the“syscall”trapinstruction) BreakpointAsynchronousExceptions(Interrupts): Timerinterrupt Softwareinterrupt Externalinterrupt
Asynchronousexception(sometimesjustcalledan“exception”)occursasaresultoftryingtoexecuteaninstructionandissaidtobesynchronoussinceitisintimatelyconnectedwithaparticularinstruction.Thesynchronousexceptionwillbedetectedduringinstructionexecutionatthetimetheexceptionalconditionisdetected.Forexample,duringinstructiondecode,thehardwaremaydetectabadopcode2ield,whichwillinitiatethe“illegalinstruction”exceptionprocessing.
Theothertypeofexceptionisan“interrupt”.Duringtheexecutionofaninstruction,aninterruptmayoccur.Interruptsarecausedbyeventsoutsidethecurrentinstructionexecutionand,assuch,areasynchronous.Wheneveraninterruptoccurs,itwillbecomeassociatedwithasingleinstruction(basicallythecurrentlyexecutinginstruction)andthatinstructionischosentoreceivetheinterruptexception.
Anexceptioncanbehandledorignored.Iftheexceptionishandled,thena“trap”willoccur.Thetrapprocessinginvolvesatransferofcontroltoa“traphandler”routine.Trapprocessingconsistsofafewhardwareoperations,suchasmodifyingacoupleofhardware2lags,savingthePC,andeffectingatransferofcontroltothe2irstinstructionofthetraphandlerroutine.
RISC-VArchitectureSummary/Porter Page� of�234 323
Chapter8:PrivilegeModes
A“traphandler”isasoftwareroutineandwillusuallybeexecutedatahigherprivilegemodethantheinstructionreceivingtheexception.Forexample,an“illegalinstruction”exceptionmightoccurinapplicationcoderunninginUserMode.ThetraphandlerwillruninSupervisorModeand,whencomplete,thehandlermayreturntoexecutinginstructionsinUserMode.
However,thechangeofmodemaynotalwaysoccur.Insomecases,thetraphandlerwillexecuteatthesameprivilegemodeastheinstructionreceivingtheexception.WithRISC-V,sometrapsmaybeentirelycontainedinUserMode;theapplicationcodewillberesponsibleforhandlingitsowntrapsandsupervisorcodewillnotbeinvolved.
A“timerinterrupt”iscausedwhenaseparatetimercircuitindicatesthatapredetermineintervalhasended.Thetimersubsystemwillinterruptthecurrentlyexecutingcode.TimerinterruptsaretypicallyhandledbytheOSwhichusesthemtoimplementtime-slicedmultithreading.
An“externalinterrupt”comesfromoutsidetheprocessorandtheprecisenatureofthecausewilldependontheapplication.Forexample,aRISC-Vprocessorusedinanembeddedprocesscontrolsystemmightreceiveexternalinterruptsfromvarioussensorsdemandingattention.
A“softwareinterrupt”iscausedbysettingabitinthemachinestatusword.Thiscanbeusefulinamulti-corechipwhereathreadrunningononecoreneedstosendaninterruptsignaltoanothercore.
ControlandStatusRegisters(CSRs)
Inadditiontothebasicuser-levelinstructionsdiscussedpreviously,whichcanbeexecutedbycoderunninginanymode,thereareanumberofadditionalfeaturesthatanyonewritinganOSkernelcodewillneedtounderstand.
AtthecenteroftheRISC-VprivilegesystemareanumberofControlandStatusRegisters(CSRs),whicharedifferentfromthegeneralpurposeand2loatingpointregisters.EachCSRhasauniquenameandspecializedfunction.
RISC-VArchitectureSummary/Porter Page� of�235 323
Chapter8:PrivilegeModes
ControloftheprivilegesystemisperformedentirelybyreadingandwritingtheCSRs.OS/Kernelcodecanperformprivilegedoperationssolelybyreading/writingtoCSRs.
TheRISC-Vspeci2icationallowsforupto4,096CSRstobepresent,butinmostimplementations,notalladdresseswillbeused.Infact,thereareonlyacoupleofdozenCSRsde2inedbytheRISC-Vspeci2ication.Theotheraddressesareleftopenanddifferentimplementationsmaychoosetoaddadditionalnon-standardCSRs,uptothelimitof4,096registers.
EachCSRhasanaddress.Or,ifyouprefer,youcanthinkofeachCSRashavinganumberintherange0to4,095.
Sincetherecanbeupto4,096CSRs,a12-bitaddressisusedtoaddresseachCSR.(Recallthat212=4,096.)TheinstructionsthatreadandwritetheCSRsusetheI-typeinstructionformat.Thisinstructionformatincludesa12-bitimmediate2ield,andthis2ieldisusedtocontainthe12bitaddressoftheCSRregisterbeingoperatedon.
ThesizeoftheCSRsisspeci2iedtobethesamesizeasthegeneralpurposeregisters.SoalltheCSRswillbe32bitsinaRV32machine.Ina64-bitmachine,theCSRsare64bitswide.Likewise,inamachinewith128bitregisters,theCSRsare128bitswide.
Inordertoworkwithallsizesofmachines,thespeci2icationnevermakesuseofmorethan32bitsinanyCSRregister.For64-bitand128-bitmachines,theupper32or96bitsareneverusedandare2illedwithzeros.[Actually,thereareminorexceptions.]
EachCSRhasadifferentmeaninganduse.SomeCSRsarecomprisedofacollectionofspecialized2ields(some2ieldsareonly1or2bitsinlength),whileotherCSRscontainasinglefulllengthvalue,suchasa32bitaddress.
EachCSRbelongstooneoftheprivilegemodes.Inotherwords,someCSRsareMachineModeCSRs,someCSRsareSupervisorModeCSRs,andtheremainingCSRsareUserModeCSRs.
AMachineModeCSRcanonlyberead/writtenwhentheprocessorisinMachineMode;itisillegaltoaccessitwhenrunninginanyothermode.ASupervisorModeCSRcanberead/writtenwhentheprocessorisineitherMachineModeor
RISC-VArchitectureSummary/Porter Page� of�236 323
Chapter8:PrivilegeModes
SupervisorMode,butnotfromcoderunninginUserMode.AUserModeCSRcanberead/writtenregardlessofthecurrentoperatingmode.
Thereareseveralinstructionswhichareusedtomanageandcontroltheprivilegesystem,andtoperformprivilegedoperations.Theinstructionsare:
ECALL–Tomakeasystemcallfromalowerprivilegeleveltoahighermode EBREAK–Usedbydebuggerstogetcontrol;similartoECALL URET–ToreturnfromtraphandlerthatwasrunninginUserMode SRET–ToreturnfromtraphandlerthatwasrunninginSupervisorMode MRET–ToreturnfromtraphandlerthatwasrunninginMachineMode WFI–Gointosleep/lowpowerstateandWaitForInterrupt CSR…–Instructionstoread/writetheControlandStatusRegisters(CSRs)
WritingtoaCSRwillchangethestateoftheprocessorcore.Forexample,toenable/disableinterrupts,theprogramwouldwritetooneoftheCSRs.
Reading/writingsomeCSRsisnotallowedatlowerprivilegelevels.SomeCSRsareread-onlyatallprivilegelevelsandarethereforesetinstonebythechipdesigners,e.g.,todescribethecore’scapabilities.
KeyIdea:TounderstandtheRISC-Vprivilegesystemandexceptionprocessing,itisbothnecessaryandsuf2icienttounderstandtheCSRsandtheirfunctionality.
CSRListing
TheControlandStatusRegisters(CSRs)arelistedbelowforreference.Theirfunctionswillbedescribedlater.
ForeachCSR,wegiveits12-bitaddress(using3hexdigits).Wealsoindicatewhetheritcanbeaccessedineachoftheprivilegemodes(User,Supervisor,Machine)and,ifso,whatsortofaccess(read-onlyorread/write)isallowed.
Recallthattheprocessorisexecutinginoneofthethreemodesatanymoment.ItisaprivilegeviolationtotrytoaccessaCSRthatisnotvisibleinthecurrentmode.Likewise,itisaprivilegeviolationtotrytowritetoaCSRthatisread-onlyinthecurrentmode.Ifthishappens,anillegalinstructionexceptionwillbesignaled.
RISC-VArchitectureSummary/Porter Page� of�237 323
Chapter8:PrivilegeModes
Visibilityin…AddrName Description U S M
Counters:
C00 cycle Clockcyclecounter r r rB00 mcycle r/wC80 cycleh Upperhalfofcycle(RV32only) r r rB80 mcycleh r/w
C01 time Currenttimeinticks r r rC81 timeh Upperhalfoftime(RV32only) r r r
C02 instret Numberofinstructionsretired r r rB02 minstret r/wC82 instreth Upperhalfofinstret(RV32only) r r rB82 minstreth r/w
ExceptionProcessing:
000 ustatus Statusregister r/w r/w r/w100 sstatus r/w r/w300 mstatus r/w 004 uie Interrupt-enableregister r/w r/w r/w104 sie r/w r/w304 mie r/w
005 utvec Traphandlerbaseaddress r/w r/w r/w105 stvec r/w r/w305 mtvec r/w
102 sedeleg Exceptiondelegationregister r/w r/w302 medeleg r/w
103 sideleg Interruptdelegationregister r/w r/w303 mideleg r/w 106 scounteren Counterenable r/w r/w
RISC-VArchitectureSummary/Porter Page� of�238 323
Chapter8:PrivilegeModes
306 mcounteren r/w
TrapHandling:
040 uscratch Tempregisterforuseinhandler r/w r/w r/w140 sscratch r/w r/w340 mscratch r/w
041 uepc PreviousvalueofPC r/w r/w r/w141 sepc r/w r/w341 mepc r/w
042 ucause Trapcausecode r/w r/w r/w142 scause r/w r/w342 mcause r/w
043 utval Badaddressorbadinstruction r/w r/w r/w143 stval r/w r/w343 mtval r/w
044 uip Interruptpending r/w r/w r/w144 sip r/w r/w344 mip r/w
VirtualMemory:
180 satp Addresstranslationandprotection r/w r/w
Informational:
301 misa ISAandextensions r/wF11 mvendorid VendorID rF12 marchid ArchitectureID rF13 mimpid ImplementationID rF14 mhartid HardwarethreadID r
FloatingPoint:
001 f=lags Floatingpointing2lags r/w r/w r/w002 frm Dynamicroundingmode r/w r/w r/w
RISC-VArchitectureSummary/Porter Page� of�239 323
Chapter8:PrivilegeModes
003 fcsr Concatenationoffrm+f2lags r/w r/w r/w
PerformanceMonitoring:
323 mhpmevent3 Eventselector#3 r/wC03 hpmcounter3 Eventcounter#3 r r rB03 mhpmcounter3 r/wC83 hpmcounter3h Upperhalfofcounter(RV32only) r r rB83 mhpmcounter3h r/w
324 mhpmevent3 Eventselector#4 r/wC04 hpmcounter4 EventCounter#4 r r rB04 mhpmcounter4 r/wC84 hpmcounter4h Upperhalfofcounter(RV32only) r r rB84 mhpmcounter4h r/w
… … … … … …
33F mhpmevent31 Eventselector#31 r/wC1F hpmcounter31 EventCounter#31 r r rB1F mhpmcounter31 r/wC9F hpmcounter31h Upperhalfofcounter(RV32only) r r rB9F mhpmcounter31h r/w
PhysicalMemoryProtection:
3A0 pmpcfg0 PMPCon2igurationword#0 r/w3A1 pmpcfg1 PMPCon2igurationword#1 r/w3A2 pmpcfg2 PMPCon2igurationword#2 r/w3A3 pmpcfg3 PMPCon2igurationword#3 r/w
3B0 pmpaddr0 PMPAddress#0 r/w3B1 pmpaddr1 PMPAddress#1 r/w… … … …3BF pmpaddr15 PMPAddress#15 r/w
Debug/Trace:
7A0 tselect Triggerregisterselect r/w7A1 tdata1 Triggerdata#1 r/w
RISC-VArchitectureSummary/Porter Page� of�240 323
Chapter8:PrivilegeModes
7A2 tdata2 Triggerdata#2 r/w7A3 tdata3 Triggerdata#3 r/w
7B0 dcsr Debugcontrolandstatus r/w7B1 dpc DebugPC r/w7B2 dscratch Scratchregister r/w
InstructionstoRead/WritetheCSRs
RegardlessofwhichControlandStatusRegister(CSR)isinvolved,thereareonly6instructionsthatareusedtoreadandwritetheregisters:
CSRRW–ReadandwriteaCSR CSRRS–Readandsetselectedbitsto1 CSRRC–Readandclearselectedbitsto0 CSRRWI–ReadandwriteaCSR(fromimmediatevalue) CSRRSI–Readandsetselectedbitsto1(usingimmediatemask) CSRRCI–Readandclearselectedbitsto0(usingimmediatemask)
ThereareafewotherinstructionsthatreadormodifyaCSR,buttheseadditionalinstructionsaremerelyspecialcases(i.e.,shorthandorsyntacticsugar)foroneoftheaboveinstructions.
Eachofthese6instructionsbothreadsandwritesasingleCSRinasingle,atomicaction.
EachoftheseinstructionsreadsaCSRbycopyingitspreviousvalueintooneofthegeneralpurposeregisters(x1,x2,…).Often,youonlywanttomodifytheregisteranddon’tcareaboutitspreviousvalue;ifso,youcanspecifythedestinationregistertobex0.
EachoftheseinstructionsisalsocapableofmodifyingtheCSR.Often,youonlywanttoreadtheregister;ifso,youcanspecifythesourcevalueasregisterx0.(Thisisaspecialcase.TheCSRisremainsunchanged;itisnotsettozero.)
Thenewvaluecanbeeitherafullvalue,orabitmaskwhichwillbeusedtosetorclearselectedbits.Thenewvalue(orbitmask)caneithercomefromaregisterorcanbecontainedwithinanimmediatedata2ieldwithintheinstructionitself.
RISC-VArchitectureSummary/Porter Page� of�241 323
Chapter8:PrivilegeModes
WhethercodeisallowedtoaccessaparticularCSRisdependentonthecurrentprivilegemodeatwhichthecodeisbeingexecuted,aswellaswhichCSRisinvolved.Forexample,ifcoderunninginUserModeattemptstoreadorupdateoneoftheMachineModeCSRs(suchasmstatus),thiswillnotbeallowed.
Ifanattemptismadetoreadand/orwritetoaCSRthatisforbiddenatthecurrentprivilegelevel,thenan“IllegalInstruction”exceptionwilloccur.Theinstructionwillnotcompleteandexceptionprocessingwillensue.
WithinafewoftheCSRs,theprotectionisona2ield-by-2ieldbasis.Inotherwords,someofthebitsmaybeprotectedfromchange(i.e.,read-only),whiletheremainingbitscanbefreelychanged(i.e.,read-write).Forexample,withinthemstatusCSR,itispossibletomodifysomebitsbutnotallowabletomodifyotherbits.(ButforotherCSRs,theentireregisteriseitherwritableornot;thatis,allbitshavethesameprotection.)Anyattempttomodifybitsthatareread-onlywillsimplybeignored.
CSRRead/Write
GeneralForm:CSRRW RegD,Reg1,Immed-12
Example:CSRRW x9,x4,cycle # x9=CSR[0xC00]; CSR[0xC00]=x4
Description:Thisinstructioncanbeusedtoreadfromand/orwritetoaCSR.
TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthevalueofthesourceregisterReg1iscopiedtotheCSR.Thisisanatomicoperation.
Intheaboveexample,aCSRnamed“cycle”isbeingaccessed.ThisCSRhappenstohavethe12-bitaddress0xC00.Thisregistercontainsacounterofthenumberofclockcyclessinceitwaslastwrittento.Thecounterisautomaticallyincrementedasinstructionsareexecuted.
ToreadaCSRwithoutwritingtoit,thesourceregisterReg1canbespeci2iedasx0.TowriteaCSRwithoutreadingit,thedestinationregisterRegDcanbespeci2iedasx0.
RISC-VArchitectureSummary/Porter Page� of�242 323
Chapter8:PrivilegeModes
Comment:InlowerprivilegemodessomeoftheCSRsareinaccessible.AnattempttoreadfromorwritetoaCSRmaycauseanillegalinstructionexception.
Encoding: ThisisanI-typeinstruction.
CSRReadandSetBits
GeneralForm:CSRRS RegD,Reg1,Immed-12
Example:CSRRS x9,x4,mstatus # x9=CSR[0x300]; CSR[0x300]|=x4
Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRaresetto1.ThevalueinReg1isusedasabitmasktoselectwhichbitsaretobesetintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.
Intheaboveexample,aCSRnamed“mstatus”isbeingaccessedwhichhasthe12-bitaddress0x300.Thisregistercontainsanumberofbitswhichcontrolwhichcontrolinterruptprocessing.
ThisinstructioncanbeusedtosimplyreadaCSRwithoutupdatingit.IfReg1isx0,thennoupdatetotheCSRwilloccur.
Exception:Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.
Encoding: ThisisanI-typeinstruction.
CSRReadandClearBits
GeneralForm:CSRRC RegD,Reg1,Immed-12
Example:CSRRC x9,x4,mstatus # x9=CSR[0x300]; CSR[0x300]&=~x4
RISC-VArchitectureSummary/Porter Page� of�243 323
Chapter8:PrivilegeModes
Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRareclearedto0.ThevalueinReg1isusedasabitmasktoselectwhichbitsaretobeclearedintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.
ThisinstructioncanbeusedtosimplyreadaCSRwithoutupdatingit.IfReg1isx0,thennoupdatetotheCSRwilloccur.
Exception:Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.
Encoding: ThisisanI-typeinstruction.
CSRRead/WriteImmediate
GeneralForm:CSRRWI RegD,Immed-5,Immed-12
Example:CSRRWI x9,3,mstatus # x9=CSR[0x300]; CSR[0x300] = 3
Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthentheentireCSRiswrittento.The5-bit2ieldthatisnormallyusedforReg1iszero-extendedandusedasthesourcevaluethatismovedintotheCSR.Thisisanatomicoperation.
Thisinstructionmakesbits[4:0]inanyCSRparticularlyeasytomodify.Exception:
Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.
Encoding: ThisisanI-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of�244 323
Chapter8:PrivilegeModes
CSRReadandSetBitsImmediate
GeneralForm:CSRRSI RegD,Immed-5,Immed-12
Example:CSRRSI x9,3,mstatus # x9=CSR[0x300]; CSR[0x300]|=3
Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRaresetto1.The5-bit2ieldthatisnormallyusedforReg1iszero-extendedandusedasabitmasktoselectwhichbitsaretobesetintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.
Thisinstructionmakesbits[4:0]inanyCSRparticularlyeasytosetto“1”.Exception:
Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.
Encoding: ThisisanI-typeinstruction.
CSRReadandClearBitsImmediate
GeneralForm:CSRRCI RegD,Immed-5,Immed-12
Example:CSRRCI x9,3,status # x9=CSR[0x300]; CSR[0x300]&=~3
Description:TheImmed-122ieldisusedtoencodetheaddressofoneofthe4,096ControlandStatusRegisters(CSRs).ThepreviousvalueoftheCSRiscopiedtothedestinationregisterandthensomeselectedbitsoftheCSRareclearedto0.The5-bit2ieldthatisnormallyusedforReg1iszero-extendedandusedasabitmasktoselectwhichbitsaretobeclearedintheCSR.Otherbitsareunchanged.Thisisanatomicoperation.
Thisinstructionmakesbits[4:0]inanyCSRparticularlyeasytoclearto“0”.Exception:
Maycauseanillegalinstructionexceptionifthecurrentprivilegemodeisnothighenough.
RISC-VArchitectureSummary/Porter Page� of�245 323
Chapter8:PrivilegeModes
Encoding: ThisisanI-typeinstruction.
BasicCSRs:CYCLE,TIME,INSTRET
TherearethreeControlandStatusRegisters(CSRs)thatareparticularlyeasytounderstand,solet’sstartwiththem:
cycle–counterofclockcyclestime–currentrealtimeinstret–counterofinstructionsretired(i.e.,executed)
EachoftheseCSRsisautomaticallyincrementedasinstructionsareexecuted.Allarereadableatanyprivilegemode.Togethertheycanbeusedtomeasurecodespeedandperformance.
TheCSR“cycle”countsthenumberofclockcycles.TheCSR“time”measuresrealtimeinunitsof“ticks.”Thenumberoftickspersecondisimplementationdependentandcanbedeterminedelsewhere.TheCSR“instret”countsthenumberofinstructionsretired(i.e.,thenumberofinstructionscompleted).
TheseCSRsareallUserModeregisters,whichmeanstheycanbereadbycodeexecutinginanymode.Theseregistersareread-only,sotheycannotbewrittento.
Wecanresetthesecounters,buttheresetoperationrequireshigherprivilege;UserModecodeshouldnotbeabletowritetothesecounters.Toaccommodateupdatingbutonlyathigherprivileges,theseregistersare“mirrored”byMachineModeregisters.Themirroredversions(calledmcycle,minstret,…)havedifferentnamesanddifferentaddresses.Assuch,theycanonlybewrittentobycoderunninginMachineMode.
Toavoidpossibleover2low,thesecountersneedtobe64-bits.For64-bitand128-bitmachines,eachofthethreeCSRswillbelargeenoughtoavoidover2low.
For32-bitmachines,theRV32speci2icationbreakseachcounterintotwo32-bitCSRs.Inotherwords,theRV32speci2icationintroduces3additionalregisterstocontainthehigh-order32-bits:
RISC-VArchitectureSummary/Porter Page� of�246 323
Chapter8:PrivilegeModes
cycle–clockcycles(lower32bits)cycleh–clockcycles(higher32bits)time–realtime(lower32bits)timeh–realtime(higher32bits)instret–instructionsretired(lower32bits)instreth–instructionsretired(higher32bits)
Thefollowinginstructionsareshortformsofotherinstructions.Theyaredesignedtomakereadingtheabove-mentionedcountersespeciallyeasy.
Read“CYCLE”
GeneralForm:RDCYCLE RegD
Example:RDCYCLE x9 # x9=CSR[cycle]
Encoding: Thisisasimpli2iedformofamoregeneralinstruction.
Read“CYCLEH”
GeneralForm:RDCYCLEH RegD
Example:RDCYCLEH x9 # x9=CSR[cycleh]
Comment: Onlyon32-bitmachines.Encoding: Thisisasimpli2iedformofamoregeneralinstruction.
Read“TIME”
GeneralForm:RDTIME RegD
Example:RDTIME x9 # x9=CSR[time]
RISC-VArchitectureSummary/Porter Page� of�247 323
Chapter8:PrivilegeModes
Encoding: Thisisasimpli2iedformofamoregeneralinstruction.
Read“TIMEH”
GeneralForm:RDTIMEH RegD
Example:RDTIMEH x9 # x9=CSR[timeh]
Comment: Onlyon32-bitmachines.Encoding: Thisisasimpli2iedformofamoregeneralinstruction.
Read“INSTRET”
GeneralForm:RDINSTRET RegD
Example:RDINSTRET x9 # x9=CSR[instret]
Encoding: Thisisasimpli2iedformofamoregeneralinstruction.
Read“INSTRETH”
GeneralForm:RDINSTRETH RegD
Example:RDINSTRETH x9 # x9=CSR[instreth]
Comment: Onlyon32-bitmachines.Encoding: Thisisasimpli2iedformofamoregeneralinstruction.
ExampleCode:Thefollowingsequencecanbeusedtocorrectlyreada64-bitcounterona32-bitmachinewherethecounterisbrokenintotwo32-bitpieces.
RISC-VArchitectureSummary/Porter Page� of�248 323
Chapter8:PrivilegeModes
Thereisapossibilitythat,betweenreadingthetwohalves,over2lowfromthelower-order32bitsintotheupperbitsmightoccur.
again: rdcycleh x3 # Read high bits rdcycle x2 # Read low bits rdcycleh x4 # Check for overflow bne x3,x4,again # from low to high
CSRRegisterMirroring
Atanymomenttheprocessorisexecutinginsomemode,i.e.,ineitherUser(U),Supervisor(S),orMachine(M)Mode.
TheRISC-Vspecde2inesafewdozenControlandStatusRegisters(CSRs)andtheyarebrokeninto3groups.Thereisonegroupforeachmode,soeachCSRbelongstoeitherUserMode,SupervisorMode,orMachineMode.
WhenrunninginMachineModeallCSRsareaccessible,regardlessofwhichgrouptheybelongto,sinceMachineModeisthehighestprivilegelevel.InSupervisorMode,onlytheSupervisorandUserCSRscanbeaccessed.Finally,whenrunninginUserMode,onlyUserCSRsareaccessible.
Tostatethesamethinganotherway,aMachineCSRcanonlybereadorwrittenwhenrunninginMachineMode.ASupervisorCSRcanbewrittenwhenrunningineitherMachineModeorSupervisorMode.Finally,aUserCSRcanbereadorwrittenregardlessofthecurrentprivilegemode.
EachCSRhasaname.
Inmanycases,the2irstletteroftheregisternameindicateswhichgroupitisin(either“u”,“s”,or“m”).
Examplesthatfollowthe2irst-letterconventionaremstatusandmisa(accessibleonlyinMachineMode),sstatusandsatp(accessibleinSupervisorandMachineModes),andustatusanducause(accessibleinUser,Supervisor,andMachine
RISC-VArchitectureSummary/Porter Page� of�249 323
Chapter8:PrivilegeModes
Modes).Counterexamplesarecycle,time,andinstretwhichoughttobeginwith“u”sincetheyareaccessibleinUser,Supervisor,andMachineModes.
AparticularimplementationmayaddnewCSRsinadditiontothosedescribedintheRISC-Vstandard.EachCSRregisterhasanaddressandtheaddressesare12bitswide,allowingforupto4,096CSRs.
Details:TheaddressspaceforCSRsusesa12-bitaddress,allowingfor4,096differentCSRregisters.Thisaddressspaceispartitionedinto4regions,withupto1,024registerseach.Thereisoneblockofregistersforeachmode(U,S,andM)alongwithanextra“reserved”blockwhichwasformerlyusedforHypervisorMode.HypervisorModehasbeeneliminated.
Furthermore,eachregisteralsofallsintooneofthefollowingaccessrestrictioncategories:
registersde2inedbytheRISC-Vstandard •read-only •read-write extensions,notde2inedbythestandard •read-only •read-write registersusedfordebugmode •standard •non-standardextensions
ThisallowsanimplementationoftheRISC-Vspectoaddnew,non-standardCSRswithoutfearthattheiraddresseswillbeoverloadedinfuturerevisionstothestandard.
Themode(U,S,orM)andtheaccessrestrictioncategorycanbedeterminedsolelyfrombitsintheCSRaddress.Thus,thehardwarecaneasilycheckwhetheragiventypeofaccess(e.g.,aread-writeaccessfromcoderunninginUserMode)isallowed,simplybylookingattheaddress.TheencodingofbitsintheCSRaddressisratherconvoluted;consulttheof2icialdocumentationfordetails.
Someoftheregistersaremirrored,whichmeansthesameconceptualregisterisavailableattwoormoreaddresses.Forexample,thecycleregister,whichcontainsa
RISC-VArchitectureSummary/Porter Page� of�250 323
Chapter8:PrivilegeModes
counterofthecurrentnumberofclockcycles,ismirroredinbothUserModeandMachineMode.AmirroredCSRwillhavetwonamesandtwoaddresses:
Address RegisterName 0xC00 cycle(read-only) 0xB00 mcycle(read-write)
Thisgroupingandmirroringtechniquehasacoupleofadvantages.
Oneadvantageofmirroringtheregistersisthatasingleregistercanberead-onlyatoneprivilegelevelandyetwritableatahigherprivilegelevel.
Forexample,the“cycle”register(whichcountshowmanyclockcycleshaveoccurredsincelastbeingreset)isconceptuallyasingleregisterandwouldcertainlybeimplementedassuch.Theregistershouldbevisibleandreadabletoallcode,buttheresetoperation(whichwritestotheregister)shouldonlybeallowedwhenrunningatahigherprivilegelevel.Thus,wewantthe“cycle”registertoberead-onlyforUserModecode,butmodi2iablebyMachineModecode.
Toachievethis,theregisteris“mirrored”.Thereisaread-onlyCSRnamedcyclewhichcanbeaccessedatanyprivilegelevel,andthereisadifferentread-writeCSRnamedmcyclewhichcanonlybeaccessedwhenrunningattheMachineModeprivilegelevel.Actually,thereisonlyasingle,underlying“mirrored”register,soanyvaluewrittentomcyclewillbeimmediatelyvisiblewhenreadingfromcycle.
Accesstoregistersatahigherprivilegelevelthanthecurrentmodeisalwaysforbidden.Thisallowsanoperatingsystemtohideinformationfromuser-levelcode.Forexample,itmightbeusedinahypervisortohideinformationsothatanoperatingsystemcanbetrulyfooledaboutwhatsortofprocessoritisrunningon.
ThisgroupingmechanismisauniformapproachtopreventinglowerprivilegecodefrommodifyingCSRsthatmustbeprotected.Forexample,user-levelcodemustbepreventedfrommodifyingthevirtualmemorypagingschemeandthishappensnaturallybecausetheCSRassociatedwithpagetablesisnotaCSRthatisaccessibleinUserMode.
Someregisterscontainanumberofbit2ieldsandthese2ieldsneedtohavedifferentaccessibility.Forexample,inthemachinestatusregister,some2ieldsshouldberead-onlywhileother2ieldsareupdatable.Thisissupportedinthefollowingway.An
RISC-VArchitectureSummary/Porter Page� of�251 323
Chapter8:PrivilegeModes
attempttoupdatea2ieldthatisread-onlywillbeignored,eventhoughother2ieldsinthesameCSRcanbeupdated.
The“status”registerisusedindeterminingwhetherinterruptsareenabledandwhatvirtualmemorymechanismiscurrentlyineffect,amongotherthings.Thisregisterismirroredatallprivilegelevels,usingadifferentCSRateachlevel.
Address RegisterName 0x000 ustatus 0x100 sstatus 0x300 mstatus
Withinthestatusregister,some2ieldsareinvisiblewhenrunningatlowerprivilegelevels.Thisissupportedbygivingtheregisteradifferentnameforeachmode.Somebitsthatarede2inedinthemstatusregisterwillbereadsimplyaszerosinthesstatusregister,tore2lectthefactthatthesebitsmustbeaccessibleinMachineModebutnotinSupervisorMode.
AnOverviewofImportantCSRs
InthissectionwedescribethemostimportantControlandStatusRegisters(CSRs).Thisdescriptiongoeshand-in-handwithdescribingfeaturesofaRISC-VpricessornotcoveredelsewhereandofconcernonlytoprogrammersofOSandkernelcode.
Recallthatthe2irstletteroftheCSRregisternamewilltypicallybem,s,orutoindicatewhichprivilegelevelisrequiredinordertoaccessthisregister.Forexample,the“misa”registercanonlybereadwhenrunninginMachineMode,whilethe“ustatus”registermaybeaccessedfromanymode.
misa–MachineInstructionSetArchitecture(ISA)
ThisCSRgivesinformationaboutthebasicarchitectureofthemachine.Ittellstheregisterwidth(32,64,or128)andindividualbitsinthisCSRalsoindicatewhichofthevariousoptionsandextensionsdetailedbytheRISC-Vspeci2icationhavebeenimplemented.ThisregisterencodeswhetherthismachineisanRV32IM,anRV64IMAFDQ,orsomeothervariantoftheRISC-Varchitecture.
RISC-VArchitectureSummary/Porter Page� of�252 323
Chapter8:PrivilegeModes
Theregisterwidthofthemachine(either32,64,or128)isencodedinthemostsigni2icanttwobitsofthisCSR.Thisisclever,sinceitmakesitpossibleforthesamecodetobeexecutedondifferentmachineswithdifferentregisterwidths.Usingonlyinstructionstotestthesignoftheregisterandshifttheregisterleft,thecodecandeterminewhattheregistersizeis,andthenbranchaccordinglytodifferentcodeblocksforeachofthethreepossibilities.
Thisregisterisread-write.
Somemachinesmaysupportmultipleregisterwidths.Forexample,anRV64machinemaybecapableofrunningas(i.e.,emulating)anRV32machine.Uponpower-onorreset,themisaregisterwillbesettoindicatethewidestregisterwidththecoreiscapableofimplementing.Softwarecansetthisregistertoeffectivelyturn(forexample)anRV64machineintoanRV32machine.
Thelower-order26bitscorrespondtothelettersA,B,…Z(“A”=bit0,“B”=bit1,etc.)Eachbitwillbesettoindicatewhetherthisimplementationsupportsthecorrespondingextension.Forexample,bit5willbesetifthecoresupportsthe“F”singleprecision2loatingpointextension.
mvendorid–MachineVendorID
Forcommercialimplementations,thisCSRidenti2iesbynumberthevendor/manufacturer/organizationthathasproducedthischip.ThenumberusedhereistheIDissuedbyasemiconductorengineeringtradeorganizationcalledJEDEC.Forresearchandnon-commercialimplementations,thisregisterwillcontainzero.
Thisregisterisread-only.
marchid–MachinearchitectureID
ThisCSRidenti2iestheparticulararchitectureofthepartandisessentiallythe“partnumber”or“modelnumber”.Forcommercialimplementations,thisnumberisassignedbythevendor.Forsomenon-commercialoropen-sourceprojects,anumbermaybeassignedbytheRISC-VFoundation.Otherwise,thisregisterwillcontainzero.
Thisregisterisread-only.
RISC-VArchitectureSummary/Porter Page� of�253 323
Chapter8:PrivilegeModes
mimpid–MachineImplementationID
Givenaparticularvendor(asidenti2iedinmvendorid)andapart/modelnumber(asidenti2iedinmarchid),theremaybeseveralversions.Thisnumberidenti2iestheparticularimplementationorversionoftheprocessor.Itmaybezero.
Thisregisterisread-only.
cycle–CycleCounter(read-only)mcycle–CycleCounter(writable)
Thecycleregisterisaccessibleread-onlyinallmodes.Itcountshardwareclockcycles.ThecountercanberesetbywritingtothemcycleCSR,whichisonlyaccessibleinMachineMode.
instret–InstructionCounter(read-only)minstret–InstructionCounter(writable)
Theinstretregisterisaccessibleread-onlyinallmodes.Itcountsthenumberofinstructionsexecuted(ormoreprecisely,thenumberofinstructionscompleted“instructionsretired”).ThecountercanberesetbywritingtotheminstretCSR,whichisonlyallowableinMachineMode.
time–CurrentTime(read-only)
Thecurrentrealtimeinticks.Seethecommentsformtime.Thisregisterisashadowofmtime.ThetimeCSRisaccessibleatallprivilegelevels.
Details:Theseregistersallrequire64bits,whichisproblematiconRV32machines.Todealwith32-bitmachines,thereareadditionalregisters(whosenamesendin“h”)tocontaintheupper32bits.
loworderbits highorder32bits(RV32only) cycle cycleh mcycle mcycleh instret instreth minstret minstreth time timeh
RISC-VArchitectureSummary/Porter Page� of�254 323
Chapter8:PrivilegeModes
mtime–CurrentTimemtimecmp–TimeofNextInterrupt
Themtimeregisterprovidesthecurrentrealtimeasacountof“ticks”.Themeaningof“tick”(i.e.,thetick-rateofhowmanytickspersecondandthevalueoftime-zero,atwhichthecounterbegancounting)isdeterminedelsewhere.
Themtimecmpregisterisusedtotriggerthenexttimerinterrupt.Whenthemtimeregisterequals(orexceeds)thevalueinmtimecmp,atimerinterruptwillbetriggered.
TechnicallymtimeandmtimecmparenotControlandStatusRegisters(CSRs),buttheyarelistedheresincetheyaresimilar.Each“register”isa64bitcounterwhichisactuallyimplementedviamemory-mapping,unliketheCSRs.
Commentary:Puttingareal-timeclockinsidethecoreisnotpracticalsinceacorecanberunatdifferentfrequenciesatdifferenttimes(e.g.,under-clockingtoreducepowerconsumptionorover-clockingto…uh…causeittomalfunction).TypicallythereareseveralcoresonachipandtherealtimeclockisessentiallyanI/Odevicesharedbyallcores,providingconsistenttimevaluestoallcores.Thus,therealtimeclockisaccessedbytheuseofmemory-mappedaccesses,likeotherI/Odevices.
Typicallyprocessorchipsareclockedbyimpreciseclockcircuitry.Insystemsthataimtoprovideaccuraterealtimeclocks,theclockisnotimplementedonthesamechipastheprocessorcores.
Presumably,therealtimeclockhasonemtimecmpregisterpercore,allowingeachcoretodeterminewhenitsnexttimerinterruptwillhappen.
Commentary:Themtimecmpregisteris64bitsbutiftheRISC-Vcoreisonly32bits,thentherecouldbeaproblemsettingit.Itmustbeupdatedusingtwostore-word(SW)instructions,eachstoring32bits.Iftheprogrammeriscareless,the2irstloadmighttriggeraspurioustimerinterrupt.TheRISC-Vdocumentationprovidesthiscodetoavoidtheproblem:
RISC-VArchitectureSummary/Porter Page� of�255 323
Chapter8:PrivilegeModes
# The new value for mtimecmp is in registers x11:x10li x5,-1 # 0xffffffff for lower 32 bitssw x5,mtimecmp # The 64-bit value is ≥ old valuesw x11,mtimecmp+4 # The 64-bit value is ≥ new valuesw x10,mtimecmp
mhartid–MachineHardwareThread(HART)ID
RISC-Vdocumentationde2inesthetermHARTtomean“HardwareThread”.Thisregisterdoesnotre2lectahigherlevel(e.g.,operatingsystem)conceptofthread.
Inasingle-coresystemwithasingle,simpleFETCH-DECODE-EXECUTEpipeline,thereonlyoneHART.
Inamulti-coresystem,whereeachcorewillexecuteasingle2low-of-control,eachcorewillhaveitsownHART.Eachcore’sHARTwillexecuteconcurrentlywiththeothercores’HARTs.ThisCSRidenti2ieswhichcoreisexecuting.
Itmaybeimportanttoidentifyonethreadasa“masterthread”.OneHARTmustbegivenanIDofzero.
Thenumberofhardwarethreadsis2ixedbuttheapplicationsoftwarewillneedanunpredictableandchangingnumberofthreads.TheOSwillmaptraditionalOSthreadsontotheavailablehardwarethreads.
Someadvancedsuperscalarcoresmayimplement“hardwaremultithreading,”inwhichasinglecoreiscapableofexecutingmorethanaoneindependent2low-of-controlatatime.Inotherwords,multi-threadingisperformeddirectlyinhardware.Forexample,asinglecoremightbeabletoexecutetwohardwarethreadsatatime.Theadvantageofhardwaremultithreadingisthatthecomputationalresourcesofthecore(e.g.,multipliers,adders,etc.)canbeusedmoreef2iciently.Insuchasystem,eachhardwarethreadwouldbeidenti2iedwithauniqueHARTandeachcorewouldbeexecutingmorethanoneHARTsimultaneously.
Thisregisterisread-only.
RISC-VArchitectureSummary/Porter Page� of�256 323
Chapter8:PrivilegeModes
mstatus–MachineStatusRegistersstatus–SupervisorStatusRegisterustatus–UserStatusRegister
Conceptually,theprocessor“statusregister”isasingleregister,butitismirroredatallprivilegelevels.
Bymodifyingthisregister,thesoftwarecandothingslikeenable/disableinterruptsandchangetheISA.(Bysettingbitsinthestatusregister,onecancausetheprocessortoexecuteasifitwerea32bitRV32machine,eventhoughitisactuallya64bitRV64machine.)
Understandingsomeofthe2ieldinthisCSRrequiresunderstandinghowexceptionsareprocessedandtraphandlersareinvoked.
ThisCSRiscomplexandcontainsalargenumberof2ields.Assuch,wedelayfurtherdiscussionuntillater.
mtvec–MachineTrapVectorBaseAddressstvec–SupervisorTrapVectorBaseAddressutvec–UserTrapVectorBaseAddress
Whenatrapoccurs(asaresultofasynchronousexceptionorasynchronousinterrupt),ajumpwillbetakendirectlytothetraphandlerroutine.
Thereisasingletraphandlerforeachprivilegelevel.Theseregisterscontaintheaddressofthesethreetraphandlers.Inotherwords,whenanexceptionoccurs(andistobehandled,notignored),theprogramcounter(PC)issettothevalueinthisCSR,causingajumptothe2irstinstructionofthetraphandlercode.
Thenextsectiongivesadditionaldetail.
ExceptionProcessingandInvokingaTrapHandler
ExceptionsalwaystraptotheMachineModetraphandler2irst,regardlessoftheprivilegemodeatthetimetheexceptionoccurs.TheMachineModetraphandlerwillexecuteinMachineModeand(presumably)endwithanMRETinstruction,whichwillreturntothecodethatwasinterrupted.Theinterruptedcodemayhave
RISC-VArchitectureSummary/Porter Page� of�257 323
Chapter8:PrivilegeModes
beenrunninginUser,Supervisor,orMachineMode;theMRETinstructionwillrestoretheprivilegeleveltowhateveritwaswhentheexceptionoccurred.
However,wemaynotwanttohandletheexceptioninMachineMode;wemightwanttohandleitinSupervisorModeorevenUserMode.Assuch,thereisafacilityto“delegate”someorallexceptionstothelowerprivilegelevels.
Forexample,ifaparticulartypeofexceptionshouldbehandledbytheSupervisorModetraphandler,thenthisexceptionwillbedelegatedfromMachineModedowntoSupervisorMode.Thetraphandleraddresswillbedeterminedfromthestvecregister(notmtvec);thehandlerwillruninSupervisorMode;andthehandlerwillendwithanSRETinstruction.
Likewise,ifaparticulartypeofexceptionoughttobedealtwithinUserMode,thenanyexceptionofthistypewillbefurtherdelegatedfromSupervisortoUserMode.Thetraphandleraddresswillbedeterminedfromtheutvecregister;thehandlerwillruninUserMode;andthehandlerwillendwithanURETinstruction.
WhetheranexceptionistobedelegatedfromMachineModetoSupervisorMode,ordownfurthertoUserMode,isdeterminedbythesettingsofvariousbitsinotherCSRs.Wedescribethedelegation(“deleg”)CSRselsewhere.
Thereisonlyasingletraphandlerateachofthethreeprivilegelevels(atleastinthebasespeci2ication).Themtvec,stvec,andutvecseCSRsarenormallywritableandOSsoftwarewillstoretheaddressofthetraphandlerintoaCSRbeforetheexceptionoccurs.
Oncetheexceptionoccursandthehandlerisinvoked,itscodemustexamineotherCSRstodeterminethenatureoftheexception,i.e.,whichexceptioncausedthetraphandlertobeinvoked.
(Thetraphandleraddressesmaybehardwired,inwhichcasetheseCSRsareread-only.Itisalsoallowablefortheretobeimplementation-dependentrestrictionsonwhichvaluesthemtvec/stvec/utvecCSRscancontain.)
ImplementationsarefreetoaddadditionaltraphandlersasextensionstotheRISC-Vspec.Inparticular,itmaymakesensefortheprocessortohaveadifferenttraphandlerforeachkindofexception.Thiscanmakehandlingtrapsfaster,sinceiteliminatestheneedforsoftwaretodeterminewhichtypeofexceptioncausedthetrapandjumpconditionally.
RISC-VArchitectureSummary/Porter Page� of�258 323
Chapter8:PrivilegeModes
Thetraphandler(orhandlers,iftherearemorethanone)mustbeginonword-alignedaddresses.Thesemeansthatanyaddressstoredinthemtvec/stvec/utvecCSRsmusthave“00”astheleastsigni2icanttwobits.TheRISC-Vspecmakesuseofthesetwobitsasfollows.
Ifthelasttwobitsare“00”,thenitmeanstheCSRcontainstheaddressofasingletraphandleranditisuptothecodeinthehandlertodeterminewhichtypeofexceptionhasoccurred.
Ifthelasttwobitsare“01”,thenitmeansthereisacollectionoftraphandlers,oneforeachtypeofasynchronousinterrupt.Thisisgoodsinceweoftenwanttohandleasynchronousinterruptsreallyquickly.Moreparticularly,thereisajumptable(i.e.,anarrayofwords,whereeachwordcontainsaJUMPinstruction)andthemtvec/stvec/utvecCSRcontainstheaddressofthisjumptable.
[Details:Whenanasynchronousinterruptoccurs,theCSRisconsultedtolocatethejumptable.Theinterrupttypeisconsultedtodeterminewhichtableentryistobeused(i.e.,anindexintothearray).Thenajumpismadetotheassociatedentry,byloadingthePCwiththecomputedaddress.Presumably,eacharrayelementcontainsajumpinstructiontothe2irstinstructionofthehandler.Althoughtherecanbeadifferenttraphandlerforeachkindofasynchronousinterrupt(“timerinterrupt”,“externalinterrupt”,…),therewillonlybeonehandlerforallsynchronousexceptions(“illegalinstruction”,“addressmisaligned”,…).The2irstelementinthejumptable(index=0)willbeatraphandlerforallsynchronousexceptions,aswellasfora“user-modesoftwareinterrupt”.]
Theremainingbitpatterns“10”and“11”arenotused.
TheNon-MaskableInterrupt:HardwareFailure
Sometrapsare“maskable”andothersare“non-maskable”.Amaskableinterruptcaneitherbehandled,orcanbeignored,orcanbepassedfromahigherprivilegeleveltoalowerprivilegelevel.
Anon-maskableinterrupt(NMI)willbehandledimmediatelyandcannotbeignored.Onlyoneeventcancauseanon-maskableinterrupt:
•Ahardwarefailureisdetected(e.g.,byerrorcheckingcircuitry)
RISC-VArchitectureSummary/Porter Page� of�259 323
Chapter8:PrivilegeModes
IntheeventofahardwareerrorNMI,thefollowinghappens:
•TheprivilegemodeissettoMachineMode.•ThepreviousPCwillbesavedinthemepcregister.•ThePCissettosome2ixed,predetermined,implementationdependentvalue.•Themcauseregisterissettoindicatethenatureofthefailure.
Nootherprocessorstateisaltered,whichmaybehelpfulindiagnosingtheproblem.
ResetProcessing
Thefollowingeventsarenotconsideredtobeexceptionsandarenothandledastraps:
•Power-on-reset,triggeredwhenthesystemisturnedon•Pressingaphysicalrestart/resetbutton•Awatchdogtimertimesout•Powerlevelsdropbelowspecs,triggeringa“brownout”event•Alow-powersleepstateisendedandtheprocessor“wakesup”
Intheaboveevents,thefollowinghappens:
•TheprivilegemodeissettoMachineMode•TheMIE(Interruptenable)bitinthestatuswordissetto0.•TheMPRVbitissetto0,whichturnsoffanyaddresstranslation.•Themcauseregisterissettoindicatewhicheventhasoccurred.•ThePCissettosome2ixed,predetermined,implementationdependentvalue.
Theremainingprocessorstateisunde2ined.
Power-On-Reset:Whenpoweris2irstappliedtoaprocessor,theelectronicswilltypicallycausea“power-on-resetinterrupt”tooccur.Typically,thisinitialprocessingsequencewillforceajumptoaknown,predeterminedaddressinsomeread-onlyportionofmemorybyloadingtheProgramCounter(PC)withsomeknownvalue.Theeffectistoforceajumptotheinitialstartupbootstrapprogram.
WatchdogTimer:A“watchdogtimer”consistsofanindependentelectroniccircuitthatisusedtorestartacomputerthathasexperiencedacatastrophicsoftwarefailure,e.g.,gonebrain-deadinanin2initeloop.
RISC-VArchitectureSummary/Porter Page� of�260 323
Chapter8:PrivilegeModes
Thistechniquecanallowacomputertorecoverfromotherwisefatalmalfunctionssuchasprogrambugsandtransienthardwareerrors.Thewatchdogtimertechniquehassavedspaceprobesthathavehadfailuresin2light:theprobemaygosilentforseveraldaysbutthenwakesup,performsafullreset,andcontinuestocompletethemission.
Awatchdogtimerwillcauseanon-maskableinterruptafteracertainamountoftimehaselapsedwithoutthetimerbeingreset.Resettingthetimeriscalled“feedingthedog”.Whensoftwareisworkingnormally,theassumptionisthatthetimerwillberesetatregularintervals,thusfeedingthedog.Butifsomethinggoeswrongandthetimerisnotreset,theinterruptwilloccur.
ExceptionDelegation
Wenextlookatthemechanismthatallowsexceptionstobemaskedand/ordelegatedtolowerprivilegelevels.
Bydefault,everyexception(whethersynchronousorasynchronous)ishandledbyatraphandlerrunninginMachineMode.Ifthekernelprogrammerwantsthetraptobehandledatalowerprivilegelevel,thenonepossibilityisfortheprogrammertowritecodetoswitchthemodeandthenpasscontrolbacktothelowerprivilegelevel.
However,thistechniqueofsoftwaredelegationisnotveryef2icientanditisfastertohavesomeexceptionsautomaticallytraptoahandlerrunningatthelowerprivilegelevel.RISC-Vsupportsdelegationoftrapsbythehardware,asonepossibledesignoption.Ifthisoptionispresentintheimplementation,thentheexceptionwillbypasstheMachineModetraphandlerandgodirectlytoatraphandlerrunningatalowerlevel.
Thisiscontrolledbythefollowing4delegationCSRs:
RISC-VArchitectureSummary/Porter Page� of�261 323
Chapter8:PrivilegeModes
medeleg–MachineExceptionDelegationRegistersedeleg–SupervisorExceptionDelegationRegister andmideleg–MachineInterruptDelegationRegistersideleg–SupervisorInterruptDelegationRegister
Weareusingthisterminology:
“exception”=synchronousexception(e.g.,illegalinstruction) “interrupt”=asynchronousexception(e.g.,timerinterrupt)
IfthebitcorrespondingtotheexceptiontypeintheMachineModedelegationregisters(medelegormideleg)isset,thentheMachineModetraphandlerwillnotbeinvoked.Instead,thetrapwillbeimmediatelydelegatedtothenextlowestprivilegelevel,i.e.,toSupervisorMode.
Next,theappropriatebitintheSupervisorModedelegationregisters(sedelegandsideleg)willbechecked.Ifset,thetrapwillbefurtherdelegatedtoUserMode.
Thetrapcannotbefurtherdelegated,sotherearenoUserModedelegationCSRs.
TrapsarealwayshandledinMachineMode,unlessdelegatedtoalowerlevel.The“deleg”CSRstellwhichexceptionsandinterruptsaredelegatedtowhichprivilegelevel.
Forexample,thehandlingofsomeparticulartrapmightbedelegatedfromMachineModetoSupervisorModecode.ThetrapmightbefurtherdelegatedtoatraphandlerrunninginUserMode.
Theseregisterscanbewrittento.Theycontrolwhetherindividualexceptionsandinterruptswillbedelegated,i.e.,processedbyatraphandlerrunningatalowerprivilegelevel,orwillbeprocessedatthehigherlevel.
ThemedelegregistertellswhetheransynchronousexceptionwillbehandledbytheMachineModetraphandler(whoseaddressisinthemtvecCSR),orwillbedelegatedtothenextlowerlevel.IfSupervisorModeisimplemented(insomechipsitmaynotbe),thenthesedelegregistertellswhethertheexceptionwillbehandledinSupervisorModeorfurtherdelegatedtoUserMode.
RISC-VArchitectureSummary/Porter Page� of�262 323
Chapter8:PrivilegeModes
Thereisabitineach“exception”delegationregister(i.e.,medelegandsedeleg)foreachtypeofsynchronousexception,asfollows:
Bit Description0 instructionaddressmisaligned1 instructionaccessfault2 illegalinstruction3 breakpoint4 loadaddressmisaligned5 loadaccessfault6 store/atomicmemoryoperationmisaligned7 store/atomicmemoryoperationaccessfault8 environmentcallfromUMode9 environmentcallfromSMode10 (previouslyusedforHypervisorMode)11 environmentcallfromMMode
Ifthebitissetto1,thentheexceptionisdelegated,i.e.,handledbythenextlowerlevel.Ifthebitis0,thentheexceptionishandledatthislevel.
ThemidelegregistertellswhetheranasynchronousinterruptwillbehandledbytheMachineModetraphandler(whoseaddressisinthemtvecCSR),orwillbedelegatedtothenextlowerlevel.IfSupervisorModeisimplemented(insomechipsitmaynotbe),thenthesidelegregistertellswhethertheinterruptwillbehandledinSupervisorModeorfurtherdelegatedtoUserMode.
Thereisabitineach“interrupt”delegationregister(i.e.,midelegandsideleg)foreachtypeofasynchronousinterrupt,asfollows:
RISC-VArchitectureSummary/Porter Page� of�263 323
Chapter8:PrivilegeModes
Bit Description0 USIP–usersoftwareinterrupt1 SSIP–supervisorsoftwareinterrupt2 (previouslyusedforHypervisorMode)3 MSIP–machinesoftwareinterrupt4 UTIP–usertimerinterrupt5 STIP–supervisortimerinterrupt6 (previouslyusedforHypervisorMode)7 MTIP–machinetimerinterrupt8 UEIP–userexternalinterrupt9 SEIP–supervisorexternalinterrupt10 (previouslyusedforHypervisorMode)11 MEIP–machineexternalinterrupt
Ifthebitissetto1,thentheinterruptisdelegated,i.e.,handledbythenextlowerlevel.Ifthebitis0,thentheexceptionishandledatthislevel.
mie–MachineModeInterruptEnablemip–MachineModeInterruptPending
Therearethreesourcesofinterrupts(i.e.,asynchronousexceptions):
•Softwareinterrupt •Timerinterrupt •ExternalInterrupt
SoftwareinterruptsareusedforoneHARTtosignalanotherHART.
Timerinterruptsoccurwhenthereal-timeclock(mtime)reachesapresetlimitvalue(mtimecmp).
Externalinterruptsareforallotherdevicestointerruptaprocessor.
Themieregistercontainsabitforeachtypeofasynchronousinterrupt,asfollows:
RISC-VArchitectureSummary/Porter Page� of�264 323
Chapter8:PrivilegeModes
Bit Description0 USIE–usersoftwareinterruptenable1 SSIE–supervisor…3 MSIE–machine…
4 UTIE–usertimerinterruptenable5 STIE–supervisor…7 MTIE–machine…
8 UEIE–userexternalinterruptenable9 SEIE–supervisor…11 MEIE–machine…
Likewisemipregistercontainsabitforeachtypeofasynchronousinterrupt,withanalogousnames:
Bit Description0 USIP–usersoftwareinterruptpending1 SSIP–supervisor…3 MSIP–machine…
4 UTIP–usertimerinterruptpending5 STIP–supervisor…7 MTIP–machine…
8 UEIP–userexternalinterruptpending9 SEIP–supervisor…11 MEIP–machine…
(Bits2,6,and10wereusedforHypervisorModeandarenow“reserved”.Allremainingbitsareunused.)
Settingabitto1inthemip(interruptpending)registerindicatesthatthecorrespondinginterrupthasoccurredandshouldcauseatrapatsomepointinthefuture.Ifthesamebitisalsosetto1inthemie(interruptenable)register,thenthetrapprocessingwilloccurimmediately.
Inadditiontothependingbitsdiscussedhere,interruptsmayalsobegloballyenabledordisabled.SeetheMIE(interruptenable)bitinthemstatusregister.
RISC-VArchitectureSummary/Porter Page� of�265 323
Chapter8:PrivilegeModes
NotethatthereisabitinthestatusregistercalledMIEandaCSRwiththesamename:mie.Likewise,thereisanSIEbitinthestatusregister,aswellasaCSRcalledsie.
Moreprecisely,aninterruptwillbetaken(i.e.,thetraphandlerwillbeinvoked)ifandonlyifthecorrespondingbitinbothmieandmipregistersissetto1,andifinterruptsaregloballyenabled.
Thetimerinterruptpendingbit(MTIP)issetbyhardwarewhenthetimerexpires,i.e.,whenthecurrenttime(mtime)reachesorexceedsthetimerlimitregister(mtimecmp).
Thetimerinterruptpendingbit(MTIP)isresetbywritingtothetimercomparisonregister.Afterwritingtothetimercomparisonregister,thetimerinterruptwillnolongerbepending.Softwarecannotmodifythefollowinginterruptpendingbits;theyareread-onlyandset/clearedbyothercircuitry:
3 MSIP–machinesoftwareinterruptpending7 MTIP–machinetimerinterruptpending11 MEIP–machineexternalinterruptpending
However,softwarerunninginMachineModecanmodifythesebits:
0 USIP–usersoftwareinterruptpending1 SSIP–supervisorsoftwareinterruptpending4 UTIP–usertimerinterruptpending5 STIP–supervisortimerinterruptpending8 UEIP–userexternalinterruptpending9 SEIP–supervisorexternalinterruptpending
Ifoneofthesebitsissetandthatinterrupttypehasbeendelegatedtoalowerlevel(seethemedelegandmidelegregisters),thentheinterruptwillbesignaledatthelowerprivilegelevel.
RISC-VArchitectureSummary/Porter Page� of�266 323
Chapter8:PrivilegeModes
sip–SupervisorModeInterruptPendinguip–UserModeInterruptPending
sie–SupervisorModeInterruptEnableuie–UserModeInterruptEnable
Eachofthelowerprivilegelevelshasitsowninterruptpendingregisteranditsowninterruptenableregister.
Normallyinterruptsareprocessedatthehighestprivilegelevel.Forexample,ifatimerinterruptoccurswhilerunninginUserMode,thecorewillenterMachineModeandinvoketheMachineModetraphandler.Whencomplete,theMachineModetraphandlerwillreturntoexecutinginUserMode.
However,interruptscanbedelegatedfromahigherleveltoalowerlevel.Forexample,thetimerinterruptmightbedelegatedfromMachineModetoSupervisorMode.SoiftheinterruptoccursinUserMode,theSupervisorModetraphandlerwillbeinvokedandMachineModewillneverbeentered.
Ifaparticularinterrupttype(suchasthetimerinterrupt)hasbeendelegated(forexamplefromMachineModetoSupervisorMode),thenthecorrespondinginterruptpendingbitisshadowedfromtheMachineModeinterruptpending(mip)register.
Exactlywhatthismeansisunclear.PresumablytheMTIPbitinmipshadowedastheSTIPbit(andnottheMTIPbit)insip???
WhethertheinterruptcausesatrapinSupervisorModeornotisdeterminedbywhetheritisenabledintheSupervisorModesieregister,aswellaswhetherglobalinterruptsareenabledinSupervisorMode.
RISC-VArchitectureSummary/Porter Page� of�267 323
Chapter8:PrivilegeModes
TheStatusRegister
mstatus–MachineStatusRegistersstatus–SupervisorStatusRegisterustatus–UserStatusRegister
Conceptually,theprocessorstatusregisterisasingleregister,buttheregisterismirroredatthelowerprivilegelevels,sincesome2ieldswithinthestatusregistermustbeprotecteddifferentlyatdifferentprivilegelevels.
ThisCSRcontainsanumberof2ieldsthatcanbereadandupdated.Bymodifyingthese2ields,thesoftwarecandothingslikeenable/disableinterruptsandchangethevirtualmemorymodel.Forexample,bywritingtothisCSR,thesoftwarecanturnonvirtualmemoryandpage-tabletranslation.
Nextweshowthelayoutofthestatusregister,followedbyalistingoftheindividualbit2ields.
Asmentioned,thestatusregisterismirroredinthedifferentmodes.Some2ieldsarepresentonlyathigherprivilegelevels.(Suchbitsare2illedwithzerosatlowerprivilegelevels.)
Twoofthe2ieldsareonlyusedfor64and/or128bitmachines.Thesetwo2ieldsresideinbitspositions[35:32],sotheyarenotevenpresentin32-bitmachines.
RISC-VArchitectureSummary/Porter Page� of�268 323
Chapter8:PrivilegeModes
�
RISC-VArchitectureSummary/Porter Page� of�269 323
Chapter8:PrivilegeModes
Name #ofbits Description
UIE 1 UserModeInterruptEnableSIE 1 SupervisorModeInterruptEnableMIE 1 MachineModeInterruptEnable
UPIE 1 UserModePreviousInterruptEnableSPIE 1 SupervisorModePreviousInterruptEnableMPIE 1 MachineModePreviousInterruptEnable
SPP 1 SupervisorMode–PreviousPrivilegeModeMPP 2 MachineMode–PreviousPrivilegeMode
FS 2 FloatingPointStatus(dirty/clean/initial/off)XS 2 UserModeExtension(dirty/clean/initial/off)
MPRV 1 ModifyPrivilege SUM 1 PermitSupervisorUserMemoryAccessMXR 1 MakeExecutableReadable
TVM 1 TrapVirtualMemoryTW 1 Time-outWaitTSR 1 TrapSRETinstruction
SD 1 (FS==11)or(XS==11)
ForRV64andRV128only:
UXL 2 Emulation:RegistersizewheninUserModeSXL 2 Emulation:RegistersizewheninSupervisorMode
MIE,SIE,UIE—InterruptsEnabled
Recalltheterm“interrupt”meansasynchronousexceptions,i.e.,TimerInterrupts,SoftwareInterrupts,andExternalInterrupts.
IftheprocessorisinMachineMode,theninterruptsareenabledwheneverMIE=1;interruptprocessingforSupervisorandUsermodeisdisabled.
RISC-VArchitectureSummary/Porter Page� of�270 323
Chapter8:PrivilegeModes
IftheprocessorisinSupervisorMode,thensupervisorinterruptsareenabledwheneverSIE=1;interruptprocessingforUsermodeisdisabledandinterruptprocessingforMachineModeisalwaysenabledwhenrunninginSupervisorMode.
IftheprocessorisinUserMode,thenuserinterruptsareenabledwheneverUIE=1;interruptprocessingforMachineandSupervisorModeisalwaysenabledwhenrunninginUserMode.
MPIE,SPIE,UPIE,MPP,SPP—InterruptProcessing&PreviousState
MPIE–PreviousvalueofMIE(MachineModeInterruptEnable) SPIE–PreviousvalueofSIE(SupervisorModeInterruptEnable) UPIE–PreviousvalueofUIE(UserModeInterruptEnable)
MPP–MachineModetrap,previousPrivilegeMode(eitherM,S,orU)SPP–SupervisorModetrap,previousPrivilegeMode(eitherSorU)
Whenaninterruptoccursandatraphandleristobeinvoked,theprivilegemodewillchangeandthe“interruptenable”bitmustbesetto0topreventadditionaltrapprocessing(atthislevel)whilethetraphandlerisexecuting.
Theprocessormaybeexecutingatoneprivilegemode(sayUser)andthetrapmaybehandledatahigherlevel(saySupervisor).IfaMachineModetrapthenoccursbeforetheSupervisortraphandleriscomplete,thentheSupervisorTraphandlerwillbeinterruptedandtheMachineModetraphandlerwillexecute.Uponreturn,theSupervisorModetraphandlerwillresumeexecution;Finally,theinterruptedUserModecodewillberesumed.
Itisnecessarytosavethemachinestateinasortof“stack”,sothatuponcompletionofatraphandler,theprocessorcanreturntothepreviousstate.Sinceatraphandlerrunningatonelevel(saySupervisor)cannotbeinterruptedatthatsamelevel,thestackneednotbetoodeep.
Whenantraphandlerruns,allweneedtosaveisthepreviousmodeandtheinterruptenablebit.Notethepreviousinterruptenablebitmaybe0or1.Regardlessofitspreviousvalue,theprocessorwillchangetheinterruptenablebitto0whenthetraphandlerisinvokedandrestoreitwhenthetraphandlerreturns.
(Ofcourse,anon-maskableinterrupt,suchasahardwarefailure,orareset-typeevent,suchaspower-on-reset,mightoccuratanytime.Suchaneventcanneverbemaskedandwillalwaysbehandledregardlessofwhetherornotinterruptsare
RISC-VArchitectureSummary/Porter Page� of�271 323
Chapter8:PrivilegeModes
enabled.Butthereisnoproblemsincenon-maskableinterruptsareprocessedinMachineModeandthepreviouslyexecutingcodeissimplyabandoned.)
ConsiderwhattheprocessorwilldoupontheoccurrenceofaninterruptwhichwillbehandledbytheMachineModetraphandler.First,theMPIEbitwillbesettoholdthepreviousvalueoftheinterruptenablebitforMachineModepriortotheinterrupt(i.e.,tothepreviousvalueofMIE).InterruptswillthenbedisabledbysettingMIEto0.Also,MPPwillbesettotellwhatprivilegeleveltheprocessorwasrunningatwhentheinterruptoccurred.TheprivilegemodeineffectwhentheinterruptoccurredmayhavebeeneitherMachine,Supervisor,orUserMode.Thustwobitsarerequiredtosavethepreviousprivilegelevel.
Itispossiblethatanexceptionwilloccurduringtheexecutionofatraphandler.Thismeansthat,whenexceptionsareprocessed,thepreviouslyvalueofthe“interruptsenabled”bitcanbeeither0or1.Forexample,itmaybethatsomeinstructionisunimplementedandmustbeemulated.Sowhenthatinstructionisencountered—eitherduringnormalcodewithinterruptsenabled,orhandlercodewithinterruptdisabled—anewtraphandlerwillbeinvoked.Itwillrunwithinterruptsdisabledand,uponcompletion,the“interruptsenabled”bitwillberestoredtoitspreviousvalue.
TheMIE(“MachineInterruptEnable”)bitdetermineswhetheramaskableinterruptwillcausetrapprocessingorwillremainpending.Ifinterruptsaredisabled,thentrapprocessingwillbesignaledbutwillremainpendinguntilinterruptsareonceagainenabled.[???]
Likewise,whenaninterruptoccursthatistobehandledbytheSupervisorModetraphandler,SPIE(“SupervisorPreviousInterruptEnable”)willbesettothepreviousvalueoftheSIEinterruptenablebitpriortotheinterrupt.ThenSIE(“SupervisorInterruptEnable”)willbesetto0todisableinterruptsatthislevel.ThenSPP(“SupervisorPreviousPrivilege”)willbesettowhicheverprivilegeleveltheprocessorwasrunningatwhentheinterruptoccurred.SincetheonlychoicesareSupervisororUser,theSPP2ieldisonlyonebit.
Similarly,whenaninterruptoccursandishandledbytheUserModetraphandler,UPIE(“UserPreviousInterruptEnable”willbesettothepreviousvalueoftheUIE(“UserInterruptEnable”)interruptenablebitpriortotheinterrupt.TheinterruptedcodemusthavebeenrunninginUserMode,sinceweneverinterruptahigherlevelmodetorunatraphandleratalowerlevel.Therefore,thereisnoneedfora
RISC-VArchitectureSummary/Porter Page� of�272 323
Chapter8:PrivilegeModes
“UPP”(“UserPreviousPrivilege”)2ieldtoholdthepreviousmode;itmusthavebeen“UserMode”.
ConsideratrapwhichoccurswhilerunninginUserModetoatraphandlerinMachineMode.MPIEissettothepreviousvalueofMIE;thenMIEissetto0;alsoMPPissetto“UserMode”.
Afterthetraphasbeenhandled,anMRETinstructionwillbeexecuted.MPPwillbeconsultedtodeterminethatthepreviousmodewasUserMode.MIEissettoMPIE,restoringitsinitialvalue;themodeisrestoredtoUserMode;MPIEandMPParesettoarbitraryvalues,sincetheirinformationiseffectively“popped”offthestack.
Notethatthehandlermaychoosenottoreturntotheinterruptedcode;thisiscommoninoperatingsystems,particularlywhenservicingatimerinterrupt;thehandlerwillcauseathreadswitchandwillnotreturntotheinterruptedcodeuntilmuchlater.Whenthishappens,thesimplehardwarestacksystemisinadequate.Thehandlermustsavetheinformationinthestatusregisterandrestoreitlater.
TSR—TrapSRETInstruction
Ifthisbitis1,thenanyattempttoexecuteanSRETinstructionwillcausean“illegalinstruction”exception.If0,thenSRETmaybeexecutedwithoutinvokingatraphandler.
ThisbitisavailableonintheMachineModestatuswordmstatus,notinSupervisororUserModes.
TheabilitytointerceptandtrapanSRETinstructionmightbeusefultohypervisorcode.
TWandTVM—Time-outWaitandTrapVirtualMemory
ThesebitsareavailableonlyintheMachineModestatuswordmstatus,notinSupervisororUserModes.
Theirfunctionalityisnotdocumentedinthecurrentspec.Perhapstheyhavebeeneliminated.???
RISC-VArchitectureSummary/Porter Page� of�273 323
Chapter8:PrivilegeModes
SXLandUXL—SupervisorandUserRegisterSize
AnRV32processoralwaysusesthe32ISA(e.g.,32-bitregisters)regardlessofwhatmodeitisin.For64-bitand128-bitmachines,thereisafacilitytoexecutecodemeantforaRISC-Vprocessorwithasmallerregistersize,i.e.,inRV32orRV64mode.
TheabilityoflargermachinestoexecutetheRV32orRV64bitISAsisnotrequired,butifpresent,theSXLandUXLbitscontrolit.Ifitisnotimplemented,thenthesetwo2ieldsareread-only.
AnRV64bitprocessoralwaysusesthe64-bitISAwhenrunninginMachineMode.However,itcanexecutein32-bitmodewhenrunningineitherSupervisororUserMode.Thisiscontrolledbywriting01totheSXLorUXLbits.
AnRV128bitprocessoralwaysusesthe128-bitISAwhenrunninginMachineMode.However,itcanexecutein32-bitor64-bitmodewhenrunningineitherSupervisororUserMode.ThisiscontrolledbywritingtotheSXLorUXLbits.
TheencodingusedfortheSXLandUXL2ieldsis:
01 32-bitISA 10 64-bitISA 11 128-bitISA
FS,XS,andSD—FloatingStatusandExtensionStatusBits
Thesebitsofthestatusregisterareconcernedwithimprovingcontextswitchingtimes.First,wedescribe“FS”2irst,whichisconcernedwiththe2loatingpointregisters.
Whenthe2loatingpointregistershavebeenused,theywillneedtobesavedwhenthekernelswitchesfromonesoftwarethreadtoanother.Butwhentheregistersareunused,we’dliketoavoidtheoverheadofsavingandrestoringthem,sincethisistimeconsuming.
TheFSbit2ieldconsistsoftwobits,encodedasfollows:
RISC-VArchitectureSummary/Porter Page� of�274 323
Chapter8:PrivilegeModes
Value Meaning 00 Off 01 InitialValues 10 Clean,i.e.,updatedbutsavedinmemory 11 Dirty,i.e.,updatedbutnotsaved
Thesebitsaresetbythehardwareandcanalsobemodi2iedbysoftware.Theyalsocontrolwhethera2loatingpointinstructionwillcauseanexceptionorwillbeexecutednormally.
(Thesebitsareonlypresentifthe2loatingpointextensionisimplemented.Ifthereareno2loatingpointregisters,theFSbitsarehardwiredto00.)
TheFSbitsshouldbesetto00=“off”whenthe2loatingpointregistersarenotinuse.Anyattempttoreadormodifytheregisterswillcauseanillegalinstructionexception.Forathreadthatwillnotbeusingthe2loatingpointregisters,thisisagoodsetting,sincetheOScanavoidsavingandrestoringtheregisters.
Thecompletestateofthe2loatingpointsystemconsistsofthecontentsofofthe322loatingpointregistersandthe2loatingpointstatusregister(fcsr).Manyprogramsdon’tuse2loatingpoint,sothereismuchtobegainedbyavoidingthesave/restoreofallthisinformationoneachcontextswitch.
Whenathreadthatuses2loatingpointiscreatedandbeginslife,the2loatingpointregistersshouldbeinitializedbytheOStotheirinitialvalues(presumably+0.0)andthe2loatingpointstatusregistershouldbeinitialized(presumablytozero).
The“initial”state(01)indicatesthatthe2loatingpointregistersareusablebuttheyhavenotyetbeenalteredfromtheirinitialvalues.Thus,theyregistervaluesdonotneedtobesavedonthenextcontextswitch.
Whenevera2loatingpointregisterismodi2ied,hardwarewillchangethestateasre2lectedinFSto“11=dirty”.
The“clean”state(10)isusedtoindicatethattheregistersareinuseandhavebeenmodi2iedfromtheirinitialzerovalues,butthattheyhavenotbeenmodi2iedsincethelastcontextswitch.Inotherwords,theregisterscontainnon-zerovaluesbutthevalueslastsavedtomemoryarecurrentandaccuratelyre2lectthevaluesintheregisters.Thus,atthetimeofthenextcontextswitch,theregistersdonotneedtobe
RISC-VArchitectureSummary/Porter Page� of�275 323
Chapter8:PrivilegeModes
savedagain.(Thiswillbecommonsinceaprogramthatuses2loatingpointatsomepointwilllikelyspendmanytimesliceswithoutfurthermodifyingtheregs.)
InsomeOSes,itmaybethecasethatprocessesalwaysbeginlifewiththe2loatingpointsystemenabledbydefault,sothe“off”stateisnotparticularlyuseful.
ButinotherOSes,processeswillbeginlifewiththe2loatingpointsystemdisabled,undertheassumptionthatmostprocesseswillnotuse2loatingpoint.Thus,theOSwillsetFSto00=off.Ifanattemptismadetoreadorwritethe2loatingstate,thehardwarewillsignalanexception.
SomeOSesmayrespondbyenablingtheregisters,settingthemtotheirzerovalues,andretryingtheinstruction.OtherOSesmayrequirethe2loatingpointsystemtobeexplicitlyenabledbysomesystemcallandtreatanyaccessestodisabledstateasanerror.
Itisnecessaryforread(aswellaswrite)attemptstocauseanexceptionwhenthestateis00=off.TheOSmayoccasionallyleavetheoldstatefromanotherunrelatedthreadintheregistersandwemustpreventinformationfrombleedingoverorescapingfromonethreadtothenext.
Whenathread2inishesitstimeslice,theOScanlookatFStodeterminehowmuchstatemustbesavedatthecontextswitch:
ValueofFS Action 00=off Savingnotnecessary 01=initial Savingnotnecessary 10=clean Savingnotnecessary;previouslysavedvaluesarestillgood 11=dirty Mustsavetheregisters
Ifthestatewaspreviously11=dirty,thentheOSshouldchangeitto“clean”beforestoringthethread’sinformation,sinceitwillnowbetruethatthein-memorysavedstateisup-to-date.
WhentheOSisreadytoinitiateandscheduleanewthread,itcanlookatthestateoftheFSfromthepreviousthreadandthestateofthenewthreadtodeterminewhattodo.(Recall,thatwhentheOSpreviouslysavedtheregswhenthethreadwaslastde-scheduled,itchangedthestatefrom“dirty”to“clean”,sothestateofthenewthreadwillneverbe“dirty”atthispoint.)
RISC-VArchitectureSummary/Porter Page� of�276 323
Chapter8:PrivilegeModes
NextThread PreviousThread Action 00=off …any… Donothing 01=initial 01=initial Donothing(regsalreadyzeroed) 01=initial …other… Zerotheregs 10=clean …any… Restoreregsfrommemory 11=dirty <notpossible>
Whatwehavejustseenisthatsomesectionoftheprocessor(the2loatingpointsubsystem)requiredattentionateverycontextswitch,eithertosavestateortorestorestate.Theseactionsareexpensiveandwewishtoavoidthemwhenpossible.TheFSbitshelpdeterminewhenwecanavoidsaving/restoring.
SomeRISC-Vimplementationsmayaddnon-standardextensionswhicharenotdescribedinthespecandtheXSbitsareusedanalogouslyforthisadditionalstateinformation.
TheXSbitsworkthesamewayastheFSbitsbutcoverallotherstatethatshouldbesaved/restoreduponcontextswitch.Exactlywhatprocessorstateiscoveredby“XS”isleftas“implementationde2ined”.
Itispossiblethattherearemorethanonenon-standardextensions.TheXSbitsaremeanttoapplytoallofthem.Sothemeaningsarechangedslightlytore2lectthis.
Value Name Meaning 00 Off Allsub-systemsareoff 01 Initial Oneormoreareon;butnothingclean,nothingdirty 10 Clean Nothingisdirty,somepartsareclean 11 Dirty Atleastsomestatehasbeenmodi2ied
Itmaybethatthereareadditionalbitstodetermineexactlywhatpartsoftheprocessorstatearedirty/clean/initial,butthespecleavesthistoimplementationspeci2icdecisions.
Finally,uponcontextswitch,we’dliketobeabletoquicklydeterminewhetherwehavetodoanythingaboutthe2loatingpointregsandanynon-standardextensions.Mightweneedtosaveanything,orcanwejustignorethisissue?Thequestionwewanttoansweristhis:IsFS=11=dirty?OrisXS=11=dirty?
RISC-VArchitectureSummary/Porter Page� of�277 323
Chapter8:PrivilegeModes
TheSDbitinthestatusregisterallowstheOSsoftwaretoquicklyanswerthesequestions.Thebitissetbythehardwareandde2inedas:
SD=((FS==11)OR(XS==11))
Thisbitisplacedinthesignbitofthestatuswordsoaquickcheckcandeterminewhetheranythingisdirty.Ifso,theOSsoftwarecanlookdeepertodeterminewhatstatemustbesaved.
These2ields(FS,XS,SD)areavailableintheMachineModemstatusandSupervisorModesstatusregisters.[Thediagramofthestatusregistersincorrectlyshowstheminustatus.]
MPRV,SUM,andMXR—VirtualMemoryEnabling
TheRISC-Vchipsupportsvirtualmemory,viapagetables.Thepagetablesystemisdescribedinaseparatechapter.
Typically,user-levelcodewillrunwithvirtualmemorymappingturnedonandallmemoryaccesseswillbemappedfromvirtualaddressestophysicaladdresses.
However,user-levelcodewilloccasionallymakesystemcalls,switchingfromexecutioninatalowerprivilegelevel(e.g.,UserMode)toahigherprivilege(e.g.,MachineMode).Oftendataispassedbetweenuser-levelcodeandtheOSinmemory;forexample,theuser-levelcodemightpassanaddresspointertotheOS.ThispointerisavirtualaddressandtheOScodewhichservicesthesystemcallwillthengothememorytofetchthedata.
SincetheaddressisavirtualaddresswhiletheOSisrunningin(say)MachineModewithvirtualmemorymappingturnedoff,thereisaproblem.TheOScouldperformtheaddresstranslation(fromVirtualtoPhysical)insoftware,butthisistimeconsuminganderror-prone.
Tomakethistasksimpler,theMPRVbitcanbeused.NormallytheMPRVisclearedto0andallLOADandSTOREinstructionsdonebycoderunninginMachineModewillhavenoaddresstranslationperformed.However,theOScodecansetthisbitto1;anysubsequentLOADorSTOREinstructionswillbeperformedwithaddresstranslationturnedon.Moreprecisely,addresseswillbetranslatedasifthecurrent
RISC-VArchitectureSummary/Porter Page� of�278 323
Chapter8:PrivilegeModes
modewerethatspeci2iedbyMPP(theMachinePreviousPrivilegeModebitsinthemstatusregister),whichwasthemodefromwhichthemostrecenttraphandlerwasentered.
(SoasystemcallfromUserMode,wherepagingwason,wouldpassvirtualaddresses.Thehandler,runninginMachineMode,wouldneedaddresstranslationturnedonifitwantstoaccessthememoryusingtheseaddresses.HoweverasystemcallfromMachineModeitself(orSupervisorModewithtranslationturnedoff)wouldpassphysicaladdresses,sincetherewouldbenopaging.)
TheSUMbitisusedbycoderunninginSupervisorModeinsteadofMachineMode.IthasnoeffectoncoderunninginMachineModeorinUserMode.
Addresstranslationcanbeturnedonbywritingtothesatpregister.Ifturnedon,thenallUserModeandSupervisorModecodewillrunwithaddresstranslation.
Eachvirtualaddressspaceisdividedintobothuserandnon-userpages.Inotherwords,somepageswillbemarkedas“User”pagesandtheremainderwillbeinaccessibletoUserModecode.(TypicallytheportionoftheaddressspaceaccessibletoUserModecodeiscalledthe“bottomhalf”andtheportionaccessibleonlytoSupervisorModecodeiscalledthe“tophalf”,althoughthesetwoareasneednotbearrangedascontiguousblocksorlocatedinanyrelativeorder.)
NormallySupervisorcodewillonlyaccessnon-user(tophalf)pagesintheaddressspace.However,sometimesthesupervisorcodemustaccesstheuser(bottomhalf)portionoftheaddressspace.
TheSUMbitcanbeusedtoallowsupervisorcodetoaccesstheuserpages.IfSUM=0,thenSupervisorModecodecannotaccesspagesmarkedas“user”pages;itcanonlyaccessthenon-userportionoftheaddressspace.IfSUM=1,thenSupervisorModecodeisfreetoaddressbothuserandnon-userpages.Thisbitaffectsallloads,stores,andinstructionfetches.Normally,supervisorscodedoesnotneedtoaccessuserpages,sosupervisorwillrunwiththisbitcleartopreventinadvertentaccessestouserspace.Whenthesupervisorcodeneedstoaccessuserspace,e.g.,toretrievesyscallarguments,thisbitcanbetemporarilysetto1.
Sometimesatraphandlerwillneedtolookattheinstructionthatwasexecutingatthemomentofanexception.Forexample,iftheinstructionisunimplementedinthehardwareandmustbeemulated,thetraphandlerwillneedtoreadtheinstructiontodeterminehowtoemulateit.
RISC-VArchitectureSummary/Porter Page� of�279 323
Chapter8:PrivilegeModes
Memorypagescanbemarked“readable”and/or“executable.”Apagecontaininginstructionswillnormallybemarked“executable”butwillnotbemarked“readable”.Therefore,anyattempttofetchawordfromthispagewillcauseanexception.
Inordertoallowthetraphandlertofetchtheinstruction(asdata),thisexceptionmustbesuppressed.ThisisthefunctionoftheMXRbitinthestatusregister.Whensetto1,datareadsareallowedfrompagesmarked“executable”;whensetto0,suchreadswillcauseanexception.
AdditionalCSRs
mepc–ProgramCounterforMachineModeTrapHandlermscratch–ScratchRegisterforMachineModeTrapHandlermcause–CauseCodeforMachineModeTrapHandler
WhentheMachineModetraphandlerisinvoked,thepreviousvalueoftheprogramcounterisstoredintomepc;thisvaluewillbeusedattheendofthetraphandler,whenanMRETinstructionisexecuted.Themepcregisterisread/writeandcanbewrittenbysoftware(e.g.,byanOSduringathreadswitch)tocausetheMRETtojumpintoanarbitrarythread.
WhenaMachineModetraphandlerisinvoked,thecodemustbecarefulnottooverwriteandlosethepreviousvaluesofanyregisters,sincetheywerebeingusedbytheinterruptedcode.Insteadtheregistersmustbeimmediatelysaved.However,itisvirtuallyimpossibletodoanythingwithouttheuseofatleastonegeneralpurposeregister.
Thisisthepurposeofthemscratchregister,whichisaread/writeregister.ThetraphandlercanswapmscratchwithanygeneralpurposeregisterandthenusethenormalISAinstructionstosavetheremainingregisters.
RecallthattheCSRRWinstructionwillexchangethevalueinanarbitraryCSRwithageneralpurposeregister.Thisinstructionaccomplishesthetaskofswappingmscratchwithageneralpurposeregister,therebysavingthegeneralpurposeregister.Thisgivesthesoftwarearegisterloadedwithabasevalue,whichcansubsequentlybeusedtosaveallremainingprocessorstate.
RISC-VArchitectureSummary/Porter Page� of�280 323
Chapter8:PrivilegeModes
Themscratchregistercannotbechangedbytheexecutionofthecodeatlowerprivilegelevels.Thereforeitcanbereliedontocontainaknownvalue,uncorruptedbyuser-levelcode.Typicallyitmightcontainaframeorstackpointerorapointertoa“registersavearea.”Inanycase,uponentrytothetraphandler,mscratchcanbeswappedintoageneralpurposeregisterandthenusedtosavewhicheverregistersthetraphandlerwillbeneeding.
Commentary:SomeearlierISAdesignsusedageneralpurposeregisterforthisfunction.Theideawasthatoneregisterwould(byconvention)bereservedfortheOSkernel.Wheneveraninterruptoccurred,theOSkernelwouldbefreetousethisregisterinsavingthestateoftheinterruptedprocess.Therefore,thisregisterwasuselesstouser-levelcode.Thisregistercouldnotbeusedtoholdavaluesinceanytimeaninterruptoccurred,itwouldbemodi2iedbytheOStraphandler.
Thisdesignapproachreducedthenumberofavailablegeneralpurposeregistersavailabletouser-levelcode.
Furthermore,theOScouldnotdependontheregisterremainingunchangedduringuser-levelcodeexecution;thisnecessitatedloadingtheregisterwithaknownvaluebeforeitcouldbeused.
Also,ifnotconsistentlyzeroedbeforereturningfromOScodetouser-levelcode,datacouldpotentiallyleakfromtheOStouser-levelcode,presentingsecurityconcerns.
Themcauseregistercontainsanumericcodetoindicatewhatcausedthetrap.ItiswrittenbythehardwarewhenanexceptioncausestheMachineModetraphandlertobeinvoked.Herearethepossibleexceptioncausevalues:
RISC-VArchitectureSummary/Porter Page� of�281 323
Chapter8:PrivilegeModes
NumericCode Description0 Instructionaddressmisaligned1 Instructionaccessfault2 IllegalInstruction(orprivilegedinstr.violation)3 Breakpoint4 Loadaddressmisaligned5 Loadaccessfault6 Store/AMOaddressmisaligned7 Store/AMOaccessfault8 EnvironmentcallfromU-Mode9 EnvironmentcallfromS-Mode11 EnvironmentcallfromM-Mode12 Instructionpagefault13 Loadpagefault15 Store/AMOpagefault
MAX+0 0x80000000 UserSoftwareInterruptMAX+1 0x80000001 SupervisorSoftwareInterruptMAX+3 0x80000003 MachineSoftwareInterrupt
MAX+4 0x80000004 UserTimerInterruptMAX+5 0x80000005 SupervisorTimerInterruptMAX+7 0x80000007 MachineTimerInterrupt
MAX+8 0x80000008 UserExternalInterruptMAX+9 0x80000009 SupervisorExternalInterruptMAX+11 0x8000000B MachineExternalInterrupt
Details:Eachcausehasanumericcode.Thecodeschemeusesthesignbittoshowwhetherthecauseisasynchronousexception(MSB=0)oraninterrupt(MSB=1).Forasynchronousinterrupts,asmallcodenumber(0,1,2,…)isstoredinthelower-orderbits,andthesign-bitisalsoset.Sointheabovetable,MAXstandsfor231fora32-bitmachine,263fora64-bitmachine,etc.Tomakethisclear,wegivetheactualvalueinhexforRV32.
Commentary:A“privilegedinstructionviolation”isaninstructionthatexistsbutcannotbeexecutedinthecurrentprivilegemode.Anillegalinstructionisaninvalidinstructionencodingwhichdoesnotrepresentade2inedinstruction.Concerningexceptionsraisedbyeitheroftheseviolations,RISC-Vmakesno
RISC-VArchitectureSummary/Porter Page� of�282 323
Chapter8:PrivilegeModes
distinctionbetween“privilegedinstruction”and“illegalinstruction.”Bothcausethe“illegalinstruction”exception.
sepc–ProgramCounterforSupervisorModeTrapHandlersscratch–ScratchRegisterforSupervisorModeTrapHandlerscause–CauseCodeforSupervisorModeTrapHandler
uepc–ProgramCounterforUserModeTrapHandleruscratch–ScratchRegisterforUserModeTrapHandlerucause–CauseCodeforUserModeTrapHandler
Theseregistersarede2inedanalogously,tobeusedinthetraphandlersrunninginSupervisororUserMode.
Somecausevaluescannotshowupinthecauseregisters.Forexample,aSupervisorModetraphandlercannoteverbeconfrontedwithaMachineModetimerinterrupt.
Questions:Butcan’tanyinterruptbedelegated;woulditstillremainaMachineModetimerinterrupt,orwoulditbealteredtoaSupervisorModetimerinterrupt???
Thereisno“Loadaddressmisaligned”codeforthescauseregister,implyingthatsuchaexceptioncannotbehandledinSupervisorMode.Whynot???
Also,theof2icialdocumentationdoesnotincludeacausecodefor“EnvironmentcallfromS-Mode”asapossiblevalueinscause;isthisamistakeinTable4.2???
scounteren–CounterEnableRegisterforUserModemcounteren–CounterEnableRegisterforSupervisororUserMode
Coderunninginalowerprivilegelevel(suchasUserMode)mayperiodicallytrytoreadthevarioustimerandcounters.Coderunningatahigherlevel(suchasanOSorhypervisor)maywishtointerceptallaccessestocountersandtimersinordertofoolthelowerprivilegecode.Perhapsthehypervisorwantstopresenttheillusiontothelowerlevelcodethatitisrunningonabaremachinewhen,infact,itisnot.Thiscouldalsobeusefulforviruseswhichwanttohidetheirexistence.
TheseCSRregisterareusedtocontrolaccesstothefollowingcounter/timesCSRs:cycle,time,instret,hpmcounter3,hpmcounter4,…hpmcounter31.
RISC-VArchitectureSummary/Porter Page� of�283 323
Chapter8:PrivilegeModes
ThepurposeofthescounterenregisteristoallowSupervisorModecodetodeterminewhetheraccessestooneofthecounter/timerregistersbycoderunninginUserModeshouldcauseanexceptionornot.
ThepurposeofthemcounterenregisteristoallowMachineModecodetodeterminewhetheraccessestooneofthecounter/timerregistersbycoderunninginUserModeorSupervisorModeshouldcauseanexceptionornot.
Thereisonebitforeachofthe32counters/timers.0=disabled,causeanexception;1=enabled,noexception.
Emulationoftime/mtimeAccesses:AnotheruseofthisCSRisasfollows:RecallthattherealtimeclockmtimeisnotactuallyaCSR,butisexpectedtobeimplementedasamemory-mappedI/Odevice.HoweverthereisatrueCSRnamedtime,whichismeanttobeamirrorofmtime.BytrappingallreadstothetimeCSR,MachineModecodecanperformedtheI/OtoretrievethetimefrommtimeandproperlyemulatethereadtothetimeCSR.
SystemCallandReturnInstructions
EnvironmentCall(SystemCall)
GeneralForm:ECALL
Example:ECALL # no operands
Comment:Thisinstructionisusedtoinvokeatraphandler.Thisinstructioncausesoneofthefollowingexceptions,dependingonthecurrentprivilegelevel:
EnvironmentCallfromUserModeEnvironmentCallfromSupervisorModeEnvironmentCallfromMachineMode
Anyandallargumentsandreturnedvaluesmustbepassedinregisters.Encoding: ThisisavariantofanI-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of�284 323
Chapter8:PrivilegeModes
Break(InvokeDebugger)
GeneralForm:EBREAK
Example:EBREAK # no operands
Comment:Thisinstructionisusedtoinvokeadebugger,bycausingan“Breakpoint”exception.Typicallythedebuggingsoftwarewillinsertthisinstructionatvariousplacesintheapplicationcodesequence,inordertogaincontrolfromanexecutingprogram.
Encoding: ThisisavariantofanI-typeinstruction.
MRET:MachineModeTrapHandlerReturn
GeneralForm:MRET
Example:MRET # no operands
Comment:ThisinstructionisusedtoreturnfromatraphandlerthatisexecutinginMachineMode.TheMPP2ieldofthestatusregisterwillbeconsultedtodeterminewhichmodetoreturnto(eitherm,s,oru).ThevalueintheMPIE2ieldwillbecopiedtoMIE,whichwillrestorethe“interruptsenabled”statetowhatitwaswhenthehandlerwasinvoked.ThereturnwillbeeffectedbycopyingthesavedprogramcounterfrommepctotheProgramCounter(pc).
ThisinstructionmayonlybeexecutedwhenrunninginMachineMode.Encoding: ThisisavariantofanI-typeinstruction.
SRET:SupervisorModeTrapHandlerReturn
GeneralForm:SRET
RISC-VArchitectureSummary/Porter Page� of�285 323
Chapter8:PrivilegeModes
Example:SRET # no operands
Comment:ThisinstructionisnormallyusedtoreturnfromatraphandlerthatisexecutinginSupervisorMode.TheSPP2ieldofthestatusregisterwillbeconsultedtodeterminewhichmodetoreturnto(eithersoru).ThevalueintheSPIE2ieldwillbecopiedtoSIE,whichwillrestorethe“interruptsenabled”statetowhatitwaswhenthehandlerwasinvoked.ThereturnwillbeeffectedbycopyingthesavedprogramcounterfromsepctotheProgramCounter(pc).
ThisinstructionmaybeexecutedwhenrunningineitherSupervisorModeorMachineMode.
Encoding: ThisisavariantofanI-typeinstruction.
URET:UserModeTrapHandlerReturn
GeneralForm:URET
Example:URET # no operands
Comment:ThisinstructionisnormallyusedtoreturnfromatraphandlerthatisexecutinginUserMode.UserModetraphandlersalwaysreturntoUserModecode.ThevalueintheUPIE2ieldwillbecopiedtoUIE,whichwillrestorethe“interruptsenabled”statetowhatitwaswhenthehandlerwasinvoked.ThereturnwillbeeffectedbycopyingthesavedprogramcounterfromuepctotheProgramCounter(pc).
Thisinstructionmaybeexecutedinanymode.Encoding: ThisisavariantofanI-typeinstruction.
WFI:WaitForInterrupt
GeneralForm:WFI
RISC-VArchitectureSummary/Porter Page� of�286 323
Chapter8:PrivilegeModes
Example:WFI # no operands
Comment:Thisinstructioncausestheprocessortosuspendinstructionexecution.Whenanasynchronousinterruptoccurs,theprocessorwillwakeupandresumeexecution.Thetraphandlerwillbeinvokedand,uponreturntothecodesequencecontainingtheWFIinstruction,thenextinstructionfollowingtheWFIwillbeexecuted.
Typically,softwarewillexecutethisinstructionwithinan“idle”loop,whichisonlyexecutedwhenthereisnothingtobedone.Suspendinginstructionexecutionmaysavepower.Inasystemwithmultiplecores,thisinstructionmayalsoprovideahinttotheinterruptcircuitrytoroutefutureinterruptstothiscore,sinceitisidling.
Thisinstructionmaybeimplementedasanop.Forsimplerprocessors,itmaybebettertojustspininanidleloop.Complexsystemsmayhaveaninexhaustiblesupplyofbackgroundprocesses,makingalow-powerwait-statepointless.
Ifinterruptsare(globally)disabledwhenaWFIinstructionisexecuted,thenthisdisablingisignored.Instructionexecutionwillresumewhenaninterruptoccurs.Seethespecfordetails.
Encoding: ThisisavariantofanI-typeinstruction.
RISC-VArchitectureSummary/Porter Page� of�287 323
Chapter9:VirtualMemory
PhysicalMemoryAttributes(PMAs)
Mainmemoryisthelargerangeofbyte-addressablelocations.Eachlocationhasa“physicaladdress”.
Insystemswithoutvirtualmemorymapping,eachandeveryaddressgeneratedbyaninstructionisaphysicalmemoryaddress.However,manysystemswillsupportvirtualmemorymapping,wherethereisadistinctionbetweenvirtualaddressesandphysicaladdresses.Aninstruction(e.g.,inaREADorWRITEinstruction)willgenerateavirtualaddressandthememorymanagementunit(MMU)willtranslatethisvirtualaddressintoaphysicaladdress,whichisthenusedbythehardwaretosendorretrievedatato/fromthemainmemorystore.
Thephysicaladdressspacecanbepopulatedby:
•Normalmemory •Memory-mappedI/Odevices •Empty(i.e.,unpopulatedholes)
I/Odevicesarecommonly“memorymapped”,whichmeanstheysitonthesamebusasnormalmemory.Thedeviceisassignedaspeci2icphysicaladdress(orsetofaddresses).SoftwarecansenddatatoanI/Odeviceby“writing”toamemorylocationpopulatedbythedeviceandcanretrievedatafromthedeviceby“reading”fromsuchanaddress.Fromthepointofviewoftheexecutingsoftware,thedevicelooksabitlikeanyothermemorylocation,butreadingandwritingtomemory-mappedI/Olocationsisdone,nottostoredata,butforthepurposeofsendingdatato,receivingdatafrom,andotherwisecontrollinganI/Odevice.
Tohandledvariousscenarios,thephysicaladdressspacewillbepartitionedintoregions.Theregionswillbecontiguous,one-after-the-other,non-overlappingandwillcovertheentirephysicaladdressspace.Everybyteinthephysicaladdressspacewillbeinexactlyonephysicalmemoryregion.
RISC-VArchitectureSummary/Porter Page� of�288 323
Chapter9:VirtualMemory
Eachregioncanbecharacterizedbydifferentattributes.Forexample:
Istheregionpopulatedwithmainmemory?Istheregionpopulatedwithmemory-mappedI/Oregisters?
Istheregion(i.e.,unpopulated,empty)? Doestheregionsupportreads(i.e.,datafetches)? Doestheregionsupportwrites? Doestheregionsupportexecution(i.e.,instructionfetches)? Doestheregionsupportatomicoperations? Istheregioncached? Istheregionsharedwithothercores,orprivate?
Collectively,theseare“physicalmemoryattributes”(PMAs).Someregionswillhavetheirattributes2ixedandunchangeable,otherregionswillbecon2igurableuponpower-up(i.e.,cold-pluggable),andotherregionsmighthavetheirattributeschangeduringsystemoperations(i.e.,hot-pluggable).
Memoryattributesmustcheckedconstantlyduringexecution.Forexample,aREADfromanunpopulatedmemoryregionoughttocauseanexception.
WhereshallthePMAinformationbestored?Oneapproachistostoretheinformationinthevirtualmemorytables,andrequirethememorymanagementunit(MMU)tocheckandenforcetheconstraints.Theremaybeproblemswhenthepagesizeofthememorymanagementunitdoesn’tmatchthesizeoftheregions.
IntheRISC-Vdesign,thereisaseparatesubsysteminthecorewhichisconcernedwithcheckingandenforcingthephysicalmemoryattributes.Thissubsystemiscalledthe“PMAchecker”.
Whenaninstructionattemptstoaccessmemory,theaddressis2irsttranslatedfromvirtualtophysicaladdress(ifvirtualmemoryisturnedon).Then,thePMAcheckerwillverifythatthetypeofaccessislegal.Ifnot,anexceptionwillbegenerated.Thereareseveralexceptiontypesforphysicalmemoryattribute(PMA)violationsandthesearedifferentfromthevirtualmemoryfaultexceptions.
TheinformationaboutthevariousmemoryregionsandtheirassociatedPMAswillbekeptinprocessorregisters.Theseregisters(orpartsofthem)maybehardwiredforsomeregions(whichisappropriatewhentheregionitselfisprecon2iguredandnotmodi2iable,suchasanon-chipROM),ormaybereadable(forqueryingthebus
RISC-VArchitectureSummary/Porter Page� of� 289 323
Chapter9:VirtualMemory
todeterminewhatregionsarepresentandtheirPMAs),ormaybewritable(allowingthesoftwaretoestablishandmandatePMAs).Detailsareimplementationdependent.
ConcerningtheatomicoperationsofRISC-V,itisimportanttodeterminewhetheraparticularregioncansupportaparticularinstruction.
RegionspopulatedbymainmemorymustsupportalltheAMOinstructions(AtomicMemoryOperations)andLR/SCinstructions(assumingtheseinstructionsareevenimplemented,ofcourse).
RegionspopulatedbymemorymappedI/OdevicesarenotexpectedtosupporttheLR/SCinstructions.
Memory-mappedregionsmayormaynotsupporttheAMOoperations.IfsucharegionsupportsanyAMOinstructions,thenitmustsupportAMOSWAPataminimum.TheregionmayalsooptionallysupportotherAMOinstructions.HerearethelevelsofAMOsupportthatmaybeimplemented:
•NoAMOoperationssupported •OnlyAMOSWAPsupported •AMOSWAP,AMOAND,AMOOR,AMOXOR •AllAMOoperationsaresupported
ConcerningtheFENCEinstructions,recallthataccessesareclassi2iedasbeing“memory”or“I/O”.Tosupportthis,everyregionofthephysicalmemoryaddressspacemustbeclassi2iedaseither:
•mainmemory •I/O
Itispossiblethattwodifferentcores(orotherdevicessittingonthephysicalmemorybus)mayaccessthesamememorylocationsmore-or-lesssimultaneously.Accessestomainmemoryregionsare“relaxed”,whichmeansthattheexactorderingoftheoperationsisindeterminate(unlessatomicinstructionsareused,ofcourse).Theprogrammercanuseatomicinstructionstoforceparticularorderings,butfornormalinstructions,therearenoguaranteesaboutordering.Asinglecorewillseeitsownoperationsexecutedintheordertheywereissued,butreadsandwritesfromothercoresordevicescanappearatanytime,inanyorder.
RISC-VArchitectureSummary/Porter Page� of� 290 323
Chapter9:VirtualMemory
Accesstomemory-mappedregionsmayeitherbe“relaxed”or“strong”.Iftheregionisde2inedtohaverelaxedordering,thentheorderingofoperationsisunde2ined,asjustdescribed.Iftheregionhasstrongordering,thenitwillappeartoonecore(A)thattheoperationsexecutedbysomeothercore(B)wereallexecutedintheordertheywereactuallyissuedbycore(B).
Thissummaryisasimpli2icationoftheRISC-Vspec.Thespecisdesignedtoapplytoaspectrumofdifferentkindsofsystems,fromsimpletocomplex.Moderncomplexsystemsmayhavemultipleparallelchannelsbetweencores,devices,andmainmemorystores.ImaginethatallthedevicesareconnectedbyanetworkwiththepropertiesoftheInternet:messagesaresentbetweendevices,cantakedifferentpaths,andcanbearbitrarilyreorderedintransit.Ontheotherhand,olderandsimplersystemshaveasingle,simplebusthatdoesonethingatatimeandreorderingisinconceivable;thesesystemsmayusedevicedriversthatweredesignedwithoutconcernforreorderingandonegoalistosupportsucharchitecturesandlegacyprogramming.
CachePMAs
qqqqq
PMAEnforcement
TheenforcementofPhysicalMemoryAttributes(PMAs)isoptional.
ARISC-VcoremayoptionallyprovideaPhysicalMemoryProtection(PMP)unit.PhysicalmemoryprotectionisimplementedusingasmallnumberofMachineModeCSRsthatdescribethememoryregionsandtheirPMAs.
TheCSRregistersencodethefollowinginformationforeachregion:
•Thestartingaddressoftheregion •Whethertheregionis4,8,16,32,…bytesinlength •DoestheregionsupportREADoperations? •DoestheregionsupportWRITEoperations? •DoesheregionsupportEXECUTIONfetches?
RISC-VArchitectureSummary/Porter Page� of� 291 323
Chapter9:VirtualMemory
•Isthisregion“locked”?
Upto16regionscanbedescribedintheregisters.Thisinformationiscleverlyencodedin16registersplusacoupleofadditionalregisters.
ThePMPUnitisactivewhenrunninginSupervisororUserMode;allaccessesgeneratedarechecked.TheprotectionmayormaynotbeenforcedwhenrunninginMachineMode.
SincetheseregistersareMachineModeCSRs,theycanonlybeaccessedinMachineMode.
Thereisasingle“Lock”bitforeachoneofthe16regionsanditworksasfollows.Ifthelockbitis0,thenMachineModeaccessesarenotchecked;onlySupervisorandUserModeaccessesarechecked.
IftheLockbitisset,thenallMachineModebitsarecheckedaswell.
OnceaLockbitisset,itcannotbecleared.Alllockbitsareinitiallyclear;toclearalockbitrequiresthesystemtobefullyreset(e.g.,POWER-ON-RESET).Inotherwords,oncearegionbecomelocked,itstayslockedforever.Furthermore,allaccesses,regardlessofprivilegemode,willbechecked.
ThislockingschememightbeusefulforsecurebootstrappingcodewhichloadsOScodeintomemoryandthenlocksdowntheregioncontainingtheOS,topreventmalwarefromsubsequentlycorruptingtheOScode.
TheRISC-VspecsuggeststhattheCSRsthatdescribethememoryregionsmightgetswappedinandoutduringcontextswitches.
Forexample,oneprocess(adevicedriver)mightbeallowedtoaccessagivenmemoryregionwhileanotherprocess(auserthread)oughttobeforbiddenfromaccessingthesameregion.Asanotherexample,imagineahypervisorrunninginMachineModethatishostingtwodistinctoperatingsystems,whicharerunninginSupervisorMode.Eachoperatingsystemwillbegivenarangeofavailablemainmemorytouse.ToensurethateachOSstaysoutofthememoryregionusedbytheotherOS,theMachineModesoftwarewillusethePMPunittoenforceallaccesses.WhentimeslicingbetweentwoOSes,thehypervisorcoderunninginMachineModewillneedtoswaponesetofPMPregisterswiththeotherset.
RISC-VArchitectureSummary/Porter Page� of� 292 323
Chapter9:VirtualMemory
Todescribeeachmemoryregion,wemustsomehowspecifyastartingaddressandalength.Eachmemoryregionhascertainprotectionattributesassociatedwithitandthesemustalsobegiven.
Recallthattherecanbeupto16differentmemoryregions.Thereisessentiallyatablewith16entries,whichcandescribeupto16differentmemoryregions.
Theinformationabouteachregion(i.e.,eachtableentry)isencodedintoasinglefull-length“addressregister”(e.g.,32bits)andanadditional“con2igurationbyte”(8bits).Thelengthoftheregioniscleverlyencodedwithinthesebits,asdescribednext.
Thenamesoftheaddressregistersareasfollows.EachisafullCSR,e.g.,32bitsforRV32machines,andlongerforRV64.
pmpaddr0 pmpaddr1 … pmpaddr15
Thenamesofthecon2igurationbytesaregivennext.Eachisan8bitquantity.
pmp0cfg pmp1cfg … pmp15cfg
Thecon2igurationbytesarepackedinto4registers,withfourcon2igurationbytesperregister.(…atleastfortheRV32machines.For64-bitmachines,only2CSRsareneeded,since8con2igurationbytescanbepackedintoeachregister.)
Eachregionhasacon2igurationbyte,whereinthe8bitsareusedasfollows:
Numberofbits Meaning1 Regionisreadable1 Regioniswritable1 Regionisexecutable1 Regionislocked2 “A”2ield2 unused
RISC-VArchitectureSummary/Porter Page� of� 293 323
Chapter9:VirtualMemory
The“A”2ieldhasthesepossiblevalues:
Value Meaning00 Thisentryisnotused01 Top-of-rangeaddressencoding10 Regionis4byteslong11 Regionis>4byteslong
Notallofthe16regionsneedbepopulated;the00valueofthe“A”2ieldindicatesanunusedentryintheregiontable.
Eachregionmusthavealengththatisapowerof2.Furthermore,thelengthmustbe4orlarger;thatis:4,8,16,32,64,…Theregionmustbenaturallyaligned,givenitslength.Forexample,aregionofsize1,024muststartonanaddressthatisevenlydivisibleby1,024.
Iftheregionhassize4,the“A”2ieldwillbe01andtheaddress2ieldwillcontainthestartingaddressoftheregion.
Iftheregionhasalargersize(suchas8,16,32,64,…),the“A”2ieldwillbe11andtheaddressregisterwillcontainboththestartingaddressandlength.Forexample,consideraregionwithsize1,024.Howisthisencodedintheaddressregister?Theregionmustbenaturallyalignedsothestartingaddresswilllooklikethis,inbinary:
XXXXXXXXXXXXXXXXXXXXXX0000000000
Thelast10bitsofthestartingaddresswillalwaysbe0.Thelengthoftheregion(1,024inthisexample)willbeencodedinthesezerobits,asfollows:
XXXXXXXXXXXXXXXXXXXXXX0111111111
Thehardware,bylookingatthevalueintheaddressregisterandbylocatingtherightmost0bit,candirectlyinferthesizeoftheregion.
Okay,it’smorecomplexthatthis.For32bitmachines,theregiontablecanactuallybeusedtodescriberegionsanywhereina34bitaddressspace.Theextra2bitsofaddressareachievedasfollows.
RISC-VArchitectureSummary/Porter Page� of� 294 323
Chapter9:VirtualMemory
Foraregionofsize4bytes,thelast2bitsmustbe0,sotheyarenotrepresented.All32bitsintheaddressregisterareusedforbits[33:2]oftheaddress,withbits[1:0]implicitlybeing00.
Foraregionofsize8bytes,theaddressregisterwillcontain:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX0
Herethereare31signi2icantbits.Addingintheimplicitbits000(sincetheregionis8-bytesinsize),gives34bitsofstartingaddress.
Foraregionofsize16bytes,theaddressregisterwillcontain:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX01
Herethereare30signi2icantbits.Addingintheimplicitbits0000(sincetheregionis16-bytesinsize),gives34bitsofstartingaddress.
Andsoonforlargerregions.ForRV64,thissameshiftingby2bitsalsooccurs.Allphysicaladdressesarelimitedto56bits,soonlythelowerorder54bitsintheaddressregistersareused.
Thereisa2inalwayofdescribingaregion’sstartingpointandsize.Ifthe“A”2ieldis01(“top-of-range”)thentheaddressregisterisinterpreteddifferently.Thisencodingschemecanbeusedwhentheregionsarecontiguousandeachregionfollowsdirectlyafterthepreviousregion.Inthisencoding,theaddressregisterdoesnotspecifythestartoftheregion,buttheendingaddressoftheregion.(Actually,itisonepastthelastaddress.)Withthiscodeforthe“A”2ield,thestartingaddressisgivenbythepreviousaddressregister(oraddress0forthe2irsttableentry).
Whenthe“top-of-range”encodingisused,theshiftingby2bitsstilloccurs.Thus,allphysicaladdressesare34bitsandeachregionmustbeginona4bytealignedaddress.
Thereasonthat34-bitand56-bitphysicaladdressesarespeci2iedisthatthepagingsystemsusethesesizes.Inparticular,theSv32pagingschemeuses34-bitphysicaladdressesandtheSv39andSv48pagingschemesuse56-bitphysicaladdresses.
Wheneveranaccesstophysicalmemoryisattempted,theprocessorwill2irstconsulttheprotectiontable.Thelowestnumberedentrythatmatchestheaddress
RISC-VArchitectureSummary/Porter Page� of� 295 323
Chapter9:VirtualMemory
willbeusedtodetermineiftheaccessisallowed.Iftheaccessisnotallowed,thenasynchronousexceptionwillberaised(i.e.,“loadaccessexception”,“storeaccessexception”,or“instructionaccessexception”).Seethespecforcompletedetailsonwhichentryisselected.
VirtualMemory
VirtualMemory(i.e.,pagetablesandvirtual-to-physicaladdresstranslation)ishandledinSupervisorMode.
ThereareseveralvirtualmemoryschemesdescribedintheRISC-Vspec,called“bare”,“Sv32”,“Sv39”,and“Sv48”.The“bare”optionmeansnovirtualaddresstranslationoccurs.
A32bitmachine(RV32)maysupporta2levelpagetable(butnot3levelsor4levels).A64bitmachine(RV64)maysupporta3leveltableora4leveltable(butnota2leveltable).
PageTable Virtual Physical Depth AddressSpace AddressSpace
RV32: Bare Sv32 2levels 32bits,4GiBytes 34bits,16GiBytes
RV64: Bare Sv39 3levels 39bits,512GiBytes 56bits,64PiBytesSv48 4levels 48bits,256TiBytes 56bits,64PiBytesSv57 …tobedeZinedlater… Sv64 …tobedeZinedlater…
Thesizeofthephysicaladdressesisimplementationde2ined.Thenumbersgivenabovearethemaximums,butinaparticularimplementationthephysicaladdressspacewilloftenbesmaller.
Regardlessofwhichvirtualmemoryschemeisused,thepagesizeis4,096(4KiBytes).Thus,byteoffsetsintoapagerequire12bits,since212=4,096.Allpagesmustbealignedonapageboundary,sothelower12bitsofanypageaddressarealways000000000000.
RISC-VArchitectureSummary/Porter Page� of� 296 323
Chapter9:VirtualMemory
WhatisaPageTable?Apagetableisatreewhereeverynodeisapage.Thesepagesareeachkeptinmainmemoryandaddressedwithphysicaladdresses.
Thepagesattheleafsarecalled“datapages”andcontaintheactualbytesinthevirtualaddressspace.
Theinteriornodesconstitutethepagetableitselfandallowthedatapagestobelocated.Therootpageplustheinteriornodesconstitutethepagetableandarenotvisibletotheuserprocess.Thedatapagesconstitutethevirtualaddressspace(oratleastthepartofitthatispopulated)andcontainthedatabytesthattheuserprocesswillaccess.
Eachinteriorpagecontainsanarrayof“PageTableEntries”(PTEs).EachPTEpointstoachildnode.SomePTEsareconsider“leafPTEs”andpointdirectlytoadatapage.OtherPTEspointtolowerlevelsinthepagetabletree.
IntheSv32addressingscheme,thepagetableisonly2levelsdeep.PageTableEntriesare4byteslong.Therefore,wehave:
1rootpage,containing1,024PTEs 1,024interiorpages,with1,024PTEseach 1,024×1,024datapages,of4,096byteseach
Thus,avirtualaddressspacecontains4GiBytes.
Theterm“pagetable”isoftenusedimpreciselytomeaneitherasingle4,096byteinteriornodeinthetree,ortheentirecollectionofinteriorpagesmakingupthewholetree,notincludingthedatapages.
Thesatp(SupervisorAddressandTranslationRegister)istheCSRthatcontrolsaddresstranslation.
Thesatpregistercontainsthree2ields:
MODE Isaddresstranslationturnedonornot? ASID AddressSpaceIdenti2ier PPN PhysicalPageNumber,i.e.,addressofpagetable’srootpage
ForRV32systems,thesatpregisteris32bits,arrangedasfollows:
RISC-VArchitectureSummary/Porter Page� of� 297 323
Chapter9:VirtualMemory
�
ForRV64systems,thesatpregisteris64bits,arrangedasfollows:
�
TheMODE2ieldindicateswhethervirtualmemoryaddresstranslationisturnedonornotand,ifso,whichpagingtableschemeisbeingused.
ForRV32systems,theMODE2ieldcanhavethesevalues:
0 addresstranslationturnedoff(i.e.,“bare”mode)1 Sv32addresstranslationisturnedon
ForRV64systems,themode2ieldcanhavethesevalues:
0000 addresstranslationturnedoff(i.e.,“bare”mode)1000 Sv39addresstranslationisturnedon1001 Sv48addresstranslationisturnedonother unused/reservedforSv57,Sv64
The“bare”modeindicatesthatthereisnovirtual-to-physicaltranslation.Alladdressesarephysical.Regardlessofmode,theenforcementofphysicalmemoryattributes(i.e.,usingthe“pmp…”registers,describedelsewhere)isstilloperative.
RISC-VArchitectureSummary/Porter Page� of� 298 323
Chapter9:VirtualMemory
ThePhysicalPageNumber(PPN)2ieldcontainsthephysicaladdressoftherootpageofthepagetabletree.
Sinceallpagesmustbealignedona4,096byteboundary,thelast12bitsoftheaddressofanypagemustbe0.ForRV32,thePhysicalPageNumber(PPN)2ieldis22bitswide,whichissuf2icienttoaddressanypageina34bitphysicaladdressspace.ForRV64,thePPN2ieldis44bitswide,whichissuf2icienttoaddressanypageina56bitphysicaladdressspace.
Eachuserprocesswillpresumablyruninadistinctaddressspace.EachaddressspaceisgivenanAddressSpaceIdenti2ier(ASID).ForRV32machines,theASIDis9bits(allowingfor29=512differentaddressspaces).ForRV64machines,theASIDis16bits(allowingfor216=65,536differentaddressspaces).
ThesatpregistercontainstheASIDofthecurrentlyexecutingprocess.
OverviewofTLBs:Tomakeaddresstranslationfastenoughforvirtualmemorytobefeasible,pagetableentriesmustbecachedinan“addresstranslationcache”.Suchacacheistraditionallycalleda“TranslationLookasideBuffer”,andsoisabbreviatedas“TLB”.
TheTLBwillcontainasmallnumberofpagetableentries.WhenaFETCH,LOADorSTOREtomemoryoccurs,theaddresstranslationhardwarewill2irstchecktheaddresstranslationcache(TLB).IftheTLBcontainsamatchingentry,theaddresstranslationhardwarewilluseittogenerateaphysicaladdressimmediately,whichismuchfasterbecauseweavoidgoingtomainmemorytoreadthepagetabletreetolocatethedatapage.
TheTLBwhichisorganizedasanassociativememory,keyedonvirtualpagenumber.IfaTLBentryispresent,thentheentrywillsupplythephysicalpagenumber.Iftheentryisabsent,thentheaddresstranslationprocessmustgotomemorytofetchtheappropriateentryfromthepagetabletree.SomeTLBentrywillbeevictedandthenewentrywillbeplacedintheTLB.Theaddresstranslationwillthenproceed.
UpdatingtheTLB,whichincludeswalkingthein-memorypagetabletreeto2indtheappropriateentryandwritingtheevictedentrybacktomemory,isacomplexandtime-consumingtask.Forfasterperformance,TLBupdatingandpage-tablewalkingareperformedinhardwareonmanysystems.However,insimpler
RISC-VArchitectureSummary/Porter Page� of� 299 323
Chapter9:VirtualMemory
implementations,walkingthepagetableandupdatingtheTLBareperformedbysoftware.
Thewaysoftwareupdatingworksisasfollows:WhenthereisnovalidentryintheTLBduringanaddresstranslation,anexceptionwillbesignaled.Thehardwareneverattemptstowalkthein-memorypagetables.Theexceptionishandledbyatraphandler(whichmayberunninginMachineMode).Thetraphandlerwillessentiallyemulatethemissinghardwareinsoftwareandwillwalkthein-memorypagetableandupdateoftheTLBdirectly.Howthisoccursisasoftwareissue,notcoveredintheRISC-Vspec.
Fromtimetotimecontextswitcheswilloccurandthecurrentlyexecutingprocesswillchange.Sinceeachprocesscanoccupyadifferentaddressspace,theentriesintheTLBmaynotbeappropriateforthenewlyexecutingprocess.ThepurposeoftheAddressSpaceIDistomakesurethataprocessonlyusespagetableentriesthatapplytoit.EachentryintheTLBwillbetaggedwithanASID;onlyentriesthatmatchthecurrentlyexecutingprocess’sASID(asstoredinthesatpregister)willbeused.
ThespecdoesnotrequireASIDstobeimplementedand,iftheyareimplemented,thefullrangeofvaluesneednotbesupported.
Thespecmentionsthatwritingtothesatpregistermayrequirecarefulconcurrencycontrol.ItcouldbethataddresstranslationsusingolderpagetablesoperateconcurrentlyandanupdatetosatpmayrequiretheSFENCE.VMAinstructioninordertoensurecorrectnessofcode.
Theideaisthat,aslongastherearenottoomanydistinctaddressspacesandtheASID2ieldislargeenough,thentheASIDmechanismwillpreventoneprocessfromusingpagetableentriesfromthewrongaddressspace.Butwithalargernumberofaddressspaces,itmaybenecessaryto2lushtheaddress-translationcache,i.e.,to2lushtheTLB.Wemayalsoneedto2lushtheTLBwhenchangesaremadetoarunningprocess’saddressspace,i.e.,toapagetablethatiscurrentlyinuse.Forexample,ifweremoveadatapagefromtheaddressspace,itisnotsuf2icienttosimplyinvalidatetherelevantPageTableEntryinthepagetablenodestoredinmemory.WealsoneedtoensurethattherearenoentriesstillsittingintheTLB.
Comment:NotethatbystoringtheMode,ASID,andPhysicalPageNumberoftherootofthepagetableinasingleregister,itallowsallthreetobeupdatedatomicallywithasingleCSRinstruction.
RISC-VArchitectureSummary/Porter Page� of� 300 323
Chapter9:VirtualMemory
SFENCE.VMA:SupervisorFenceforVirtualMemory
GeneralForm:SFENCE.VMA vaddr,asid
Example:SFENCE.VMA x5,x8
Comment:Thisinstructionisusedtoimposeorderonaccessestomemoryandupdatestopagetables.Inparticular,itisdesignedto2lushtheaddresstranslationcache(TLB).
WhenevervirtualmemoryisturnedonandaFETCH,LOADorSTOREtomemoryismade,addresstranslationmustoccursotherewillalsobeimplicitREADsandWRITEstothepagetablesstoredinmemory.Normally,theaccesstothepagetableinmemorycanbeavoidedandtheaccesswillbemadetotheaddresstranslation(TLB)cacheinstead.Butwheneverchangestothepagetablesinmemoryaremade,weneedto2lushobsoleteentriesintheTLB.Thisisthepurposeofthisinstruction.
WhentheOSupdatesapagetablenode,itwillissueaWRITEtomemory.InordertomakesurethataWRITEtothepagetableisseenbeforeallsubsequentmemoryaccesses,thisinstructionmustbeexecuted.ItensuresthatallWRITEstomemorythatoccurearlierintheinstructionstreamwillbecompletedandvisiblebeforeanyimplicitREADsorWRITEStothepagetabletriggeredbysubsequentoperationsintheinstructionstream.SincetheTLBiscachingpagetableentries,thisinstructionmustalsoinvalidateaffectedentries,forcingthememorymanagementunittogobacktomemorytoretrievecurrentpagetableentries.
[ItwouldseemnecessarytoalsoforceallearlieraddresstranslationstocompletebeforetheWRITEtothepagetablesoccurs,butthespecdoesnotmentionthisorderingconstraint.???]
Toimposea2inerlevelofgranularity,thisinstructioncanspecifyaspeci2icvirtualaddressand/oraspeci2icaddressspace.Reg1containsavirtualaddress;Reg2containsanAddressSpaceID.
IftheASIDisspeci2iedasx0,thenentriesforalladdressspacesareaffected;otherwise,onlyentriesforthegivenaddressspaceareaffected.
RISC-VArchitectureSummary/Porter Page� of� 301 323
Chapter9:VirtualMemory
Ifthevirtualaddressisspeci2iedasx0,thenalllevelsofthepagetableareaffected;otherwise,onlytherelevantleafPTEinthepagetableisaffected.
IfboththeASIDandthevirtualaddressaregivenasx0,thenitcausesacomplete2lushoftheTLB.SimpleimplementationsarefreetoalwaysignoretheASIDandvirtualaddressspaceandperformaglobalTLB2lushwheneverthisinstructionisexecuted.
Itispossiblethatseveralcoresaresharingmemoryandthattwothreadsrunningondifferentcoresaresharingasingleaddressspaceandasinglepagetabledatastructureinmemory.Ideally,anyupdatetothispagetableshouldbeseenbyallthreadsonallcores.However,thisinstructiononlyaffectsasinglecore.Tosynchronizewithotherthreadsonothercores,theOSmustuseotherinstructions.
Encoding: ThisisavariantofanR-typeinstruction,wheretheRegD2ieldisunused.
Sv32(Two-LevelPageTables)
IntheSv32addresstranslationmode,avirtualaddressis32bits.Thisispartitionedintoa20-bitvirtualpagenumber(VPN)anda12bitoffset.The20-bitVirtualPageNumberistranslatedbythememorymanagementunitintoa22-bitPhysicalPageNumber.ThePhysicalPageNumberisthenconcatenatedwiththeoffsettogivethe34-bitphysicaladdress.
�
RISC-VArchitectureSummary/Porter Page� of� 302 323
Chapter9:VirtualMemory
Thesizeofeachpageis4,096bytes,andeachbytewithinapagecanbeaddressedby12bits,since212=4,096.Allpagesmustbeproperlyalignedona4,096byteboundary.
EachPageTableEntry(PTE)is4bytes.Thus,eachpagecanhold1,024entries.Withinapage,eachentrycanbeaddressedbya10-bitaddress,since210=1,024.
Thepagetableisorganizedasa2-leveltree,whereallnodesareinterior(i.e.,non-data)pages.Therootpageisatthetopofthetreeandcontains1,024pagetableentries.Eachentrycontainsapointertoasecondlevelpage.Thesecondlevelpagescontainpagetableentrieswhichpointtothedatapages.Thesecondlevelpagesarereferredtoas“leaf”pages;thedatapagescanbeconsideredtoliebelowtheleafpages.
APageTableEntry(PTE)is4bytesandcontainsthefollowing2ields:
FieldSize Meaning 22 PhysicalPageNumber 2 RSW(Unde2ined;ReservedforSoftware) 1 D(Dirty) 1 A(Accessed) 1 G(Globalmapping) 1 U(Useraccessible) 1 V(Valid) 3 XWR(Executable,Writable,Readable)
�
TheVirtualPageNumber,whichis20bits,isdividedintotwo2ieldsof10bitseach,calledVPN[1]andVPN[0].
RISC-VArchitectureSummary/Porter Page� of� 303 323
Chapter9:VirtualMemory
Theuppermost10bits(VPN[1])isusedtoselectanentryintherootpage.ThisentrycontainsaPhysicalPageNumber(PPN)2ield,whichaddressesasecondlevelpage.
Thesecond10bitsoftheVirtualPageNumber(VPN[0])isusedtoselectanentryinthesecondlevelpage.Thisyieldsthe2inalpagetableentry(calleda“leaf”pagetableentry),whichcontainsthePhysicalPageNumberofthepagecontainingtheactualdata.
Thisdiagramshowsthis.
�
Thereisasinglerootpageandupto1,024secondlevelpages.Eachpageinthepagetable(thatis,therootpageandallsecondlevelpages)contains1,024PageTableEntries.Theentriesatthebottomofthetreearecall“leaf”entriesandpointtothedatapages.Leafentriesaredistinguishedfromnon-leafentriesbytheXWR(Execute-Write-Read)permissionbits.IfXWR=000,thentheentryisnotaleafentry.
RISC-VArchitectureSummary/Porter Page� of� 304 323
Chapter9:VirtualMemory
�
Thereisafacilityfor“megapages”.Amegapageisadatapagecontaining4MiBytes.Anoffsetintoamegapagewillbe22bits,since222=4×1,024×1,024=4,194,304.Amegapagemustbealignedtoa4MiByteboundary.
TheXWRbitsinapagetableentryindicatewhetherthatentryisaleafentry(meaningthattheentrypointstoadatapage)oranon-leafentry(meaningthattheentrypointstoanotherpageinthepagetabletree).IftheXWRbitsare000,thentheentryisanon-leafentry,pointingtothenextlevelinthepagetabletree.IftheXWRbitsarenot000,thentheyindicatethattheentrypointstoadatapageandtheygivethe“execute”,“read”,and“write”permissionsforthedatapage.
Inthecaseofmegapages,thereisnosecondlevelpageinthetree.Instead,thePageTableEntryintherootpagepointsdirectlytoadatapage,i.e.,amegapage.
RISC-VArchitectureSummary/Porter Page� of� 305 323
Chapter9:VirtualMemory
�
Thereareacoupleofbene2itsofplacingmegapagesinavirtualaddressspacethatmayalsocontainnormal4,096bytedatapages.
First,wheneverdatafromapageis2irstreferenced,theMemoryManagementUnitmustwalkthepagetableto2indtheappropriateleafpagetableentry.Withmegapages,onlytherootpagetablemustbeconsulted,sincethereisnosecondlevelpageinvolved.Thisreducesthenumberofmemoryaccessesbyone.(ThiscouldmeansubstantialsavingsifthePTEneedstobere-loadedfrequently,duetoTLBcontentionand/orneedstobeupdatedinmemoryuponTLB2lushing).
Thesecondadvantageisthatasingleentryintheaddresstranslationcache(theTLB)willcover4MiBytes,insteadof4KiBytes.ThismeansthatfewerentriesintheTLBwillbeneededforeachaddressspace.TheTLBisasigni2icantbottleneckandreducingcontentioniscritical.
[Forexample,asimpleaddressspacemayhaveoneblockofmemorymarked“read/execute”(fortheprogramandconstants)andasecondblockmarked“read/write”(forstaticvariablesandthestack).Sinceneitherofthesewillnormallyexceed4MiBytes,only2entriesintheTLBwouldbeneededfortheentireaddressspace.TLBentriesarepreciousandfewinnumber,sothisreducespressureontheTLB.]
Howevermegapagescomewithdrawbacks.First,theywilloftenbelargerthannecessary.(Thisiscalledthe“internalfragmentationproblem”.)Theextraspaceinthemegapagesrepresentsrealphysicalmemoryand,ifnotneededbytheprocess,thismemoryiseffectivelywasted.Forexample,ifanaddressspaceonlyrequires64
RISC-VArchitectureSummary/Porter Page� of� 306 323
Chapter9:VirtualMemory
KiBytes,butisallocatedtwomegapages,thenapproximately99%ofthememoryiswasted,since222–216≈222.Ifconcurrentprocessesaredoinglotsofcomplexdatasharingbysharingvirtualmemorypagesthisfragmentationproblemcanbecomeworse.
Second,theOSkernelmustbeabletoallocatelargeblocksofcontiguousmemory.Thisisnotaproblemifthememoryisdividedintoacollectionofequalsizedpages.Butiftherearemultiplepagesizes(e.g.,1,024and4,194,304)thenthingsgettrickier.Afterall,thisisessentiallytheproblemthatvirtualmemorywasdesignedtoaddress.(Thisiscalledthe“externalfragmentationproblem”.)
Next,welookatthemeaningofthebitsinthePageTableEntry(PTE).
TheValidbit(V)indicateswhetherthisPTEisvalidornot(1=valid,0=invalid).IfaninvalidPTEisencounteredbytheMemoryManagementUnit,thenanaccessfaultissignaled.(Theexceptionwillbeeithera“instructionaccessexception”,“readaccessexception”,or“storeaccessexception”asappropriate).IfthePTEisinvalid,theremainingbitsareignoredandcanbeusedbysoftwareasdesired.
TheUserAccessiblebit(U)controlswhetherthedatacanbeaccessedinSupervisorModeorUserMode.Typically,aprocess’saddressspacewillbedividedintotwoparts,sometimescalledthe“bottomhalf”andthe“tophalf”.ThebottomhalfcanonlybeaccessbycoderunninginUserMode;datainthetophalfcanonlybeaccessedwhenrunninginSupervisorMode.U=1meansbottomhalf;U=0meantophalf.IfUserModecodeattemptstoaccessdatainthetophalf,anaccessfaultwilloccur.IfSupervisorModecodeattemptstoaccessdatainthebottomhalf,anaccessfaultwilloccur.
However,theSUMbitintheSupervisorStatusword(sstatus)canbeusedtoallowSupervisorModecodetoaccesspagesinthebottomhalf.Normally,SupervisorcodewillrunwithSUM=0,inwhichcaseanaccessfaultwilloccur.Butoccasionally(e.g.,whendataispassedtotheOSinakernelcall)SupervisorCodewillsetSUM=1forashorttime.Duringthistime,LOADandSTOREinstructionswilloperateasifrunninginUserMode.Thisallowsthekernelroutinestoretrieveorstoreparametersinthebottomhalf(i.e.,intheuserportionofthevirtualaddressspace).
TheUbitisonlymeaningfulforleafPTEs;fornon-leafPTEs,thisbitwillbe0.
TheGlobalMappingbit(G)concernspagesthataresharedamongalladdressspaces.G=1meansgloballysharedbyalladdressspaces;G=0meanslocaltoasingle
RISC-VArchitectureSummary/Porter Page� of� 307 323
Chapter9:VirtualMemory
addressspaceandnotshared.Ifapageisgloballyshared,itmeansthatitismappedintoalladdressspaces.Inotherwords,theAddressSpaceID(ASID)checkingissuppressed.Forexample,asetofcommonlyused,read-onlyfunctionsanddatacanbemappedintoalladdressspaces,makingtheseroutinesavailableatessentiallynocosttoalladdressspaces.Thesecouldbeuser-levelfunctionspresentinall“bottomhalves”orcouldbeSupervisorModecodethatisplacedintoall“tophalves”.
Thespecindicatesthatthisbitismeaningfulforbothleafandnon-leafPTEs.Whensetinanon-leafPTE,itmeanstheentirepagetabletreebelowisglobalandsharedbyalladdressspaces.
[???ItisnotclearwhytheGbitisspeci2iedasbeingmeaningfulinnon-leafPTEs.IfaleafPTEthatismarked“global”isstoredintheTLB,thenthatTLBentry(andthedatapageitrefersto)isshared.ButifthereisnoleafPTEintheTLB,thentheMemoryManagementUnitwillneedtowalkthepagetabletreetolocatetheleafPTE.Duringthiswalk,itwouldnotseemtomakeanydifferencewhethersomepagetablepageissharedornot;eachpageofthepagetabletreestillhastobeaccessed.PerhapstheRISC-Vspecenvisionsputtingnon-leafPTEsintheTLB.???]
TheAccessedbit(A)issetbytheMemoryManagementUnitwheneverabyteonthedatapageisread,written,orfetchedforexecution.Insometextbooksthisbitiscalledthe“referenced”bit.ThisbitisonlymeaningfulforleafPTEs;fornon-leafPTEs,thisbitwillbe0.
TheDirtybit(D)issetbytheMemoryManagementUnitwheneverabyteonthedatapageiswrittento.ThisbitisonlymeaningfulforleafPTEs;fornon-leafPTEs,thisbitwillbe0.
DetailsonUpdatingaPageTableEntry:WejustsaidthattheAccessedBitandtheDirtyBitaresetbythehardwarewheneveranybyteinthedatapageisaccessedorupdated.(ThesearetheonlybitsinthePageTableEntry(PTE)thatwouldeverbemodi2iedbyhardware,butanymodi2icationwillrequirethatthisPTE,whenevictedfromtheTLBinthefuture,mustbewrittenbacktothein-memorypagetable.)
HowevertheRISC-Vspecalsoallowsthehardwaretoworkasecondway.Inthisalternativeapproach,apagetableentryisnevermodi2iedbythehardware.Thissimpli2iesthehardwaresincePTEsneedneverbewrittenbacktomemory.Instead,the“A”and“D”bitsaremerelychecked.Iftheyarenotalreadyset
RISC-VArchitectureSummary/Porter Page� of� 308 323
Chapter9:VirtualMemory
correctly,anexceptionissignaled.Thensoftwareinthetraphandlercantakeactionstowritetheupdatedpagetableentrybacktomemory.
Insomecases,the“A”and“D”bitsmaynotbeneeded.Forexample,ifsomedatapageisalwaysresidentinmemoryandneverswappedouttobackingstore,thenthe“A”and“D”bitscanbeignored.Oralso,ifamemory-mappedI/Odeviceismappedintosomevirtualaddressspace,thenthe“A”and“D”bitscanbeignored.Insuchcases,thebitscanbepresetto“1”toobviatetheneedforupdating.
TheExecutable(X),Writable(W),andReadable(R)bitsindicatewhetherthisPTEisaleafornon-leafentryand,forleafentries,theyindicatetheaccesspermissionsforthedataonthedatapage.IfXWR=000,thenthisPTEisnotaleafentry;thePhysicalPageNumberintheentrypointstothenextlower-levelpageinthepagetabletree.
TheRSWbitsarenotusedbythehardwareandarefreelyavailabletosoftwaretobeusedasdesired.
TheSv32PageTableAlgorithm
Wheneveranaccess(read,write,orfetch-for-execution)occurs,thefollowingstepsaretakenbytheMemoryManagementUnit,tomapavirtualaddressintoaphysicaladdress.
(TheRISC-VspecgivesthisalgorithminageneralformapplicabletoSv32,Sv39,andSv48,i.e.,two-,three-,andfour-levelpagetables.Inordertomakeiteasiertounderstand,theversiongivenheresimpli2iesandspecializesthealgorithmtotwo-leveltables.)
Inputs: va–thevirtualaddresstobetranslated(32bits),withthefollowingparts: va.VPN[1]–theuppermost10bits;offsetintothetoplevelpagetable va.VPN[0]–thefollowing10bits;offsetintothesecondlevelpagetable va.OFFSET–thelower12bits;offsetintothedatapage Thedesiredaccesstype(READ,WRITE,orFETCH) Thecurrentprivilegemode(UorS;noaddresstranslationisdoneinMmode) satp–theSupervisorAddressTranslationandProtectionCSR satp.PPN–thepagenumberoftherootpage.
RISC-VArchitectureSummary/Porter Page� of� 309 323
Chapter9:VirtualMemory
status.SUM–the“PermitSupervisorUserMemoryAccess”bitinthestatusreg status.MXR–the“MakeExecutableReadable”bitinthestatusreg
Outputs: pa–physicaladdress(34bits) or accessexception–signalapagefaultexceptionandaborttranslation
Tempvariables: a–pointerto(i.e.,physicaladdressof)apageinmemory p–pointerto(i.e.,physicaladdressof)aPTEinmemory pte–aPageTableEntry,asfetchedfrommemory
Algorithm: Compute“a”,theaddressofroottop-levelpageinthepagetable a!satp.PPN||000000000000 Compute“p”,theaddressofthePTEinthetoplevelpage p!a+(va.VPN[1]||00) Readfrommemory“pte”,aPageTableEntryfromtheroot-levelpage pte!memory[p] IfthePhysicalMemoryProtection(PMP)systemindicatesproblems thensignalanexception IfthisPTEisnotvalid(i.e.,pte.V=0orpte.XWRisaninvalidvalue) thensignalanexception Ifpte.XWR≠000then… /*ThisPTEisaleafPTE,pointingtoamegapage.*/ Ifpte.PPN[0]≠0000000000,thensignalanexception Computethephysicaladdress pa!pte.PPN[1]||va.VPN[0]||va.OFFSET Else(i.e.,pte.XWR=000)… /*WedonothavealeafPTE,solookatthesecondlevel.*/ Compute“a”,theaddressofthesecondlevelpagetablepage a!pte.PPN[1]||pte.PPN[0]||000000000000 Compute“p”,theaddressofthePTEinthesecondlevelpage p!a+(va.VPN[0]||00) Readfrommemory“pte”,aPageTableEntryfromthesecondlevelpage pte!memory[p] IfthePhysicalMemoryProtection(PMP)systemindicatesproblems thensignalanexception IfthisPTEisnotvalid(i.e.,pte.V=0orpte.XWRisaninvalidvalue)
RISC-VArchitectureSummary/Porter Page� of� 310 323
Chapter9:VirtualMemory
thensignalanexception Ifpte.XWR=000,thenwedonothavealeafPTE… thensignalanexception Computethephysicaladdress pa!pte.PPN[1]||pte.PPN[0]||va.OFFSET /*“pte”containstheleafPageTableEntry.*/ MakesurethisaccessisallowedbycheckingtheX,W,R,andUbits…. Ifpte.R≠1,thensignalanexception Ifpte.W≠1andthisisaWRITE,thensignalanexception Ifpte.X≠1andthisisaFETCH,thensignalanexception Ifthecurrentmodeis“User”… Ifpte.U=0,thensignalanexception Ifthecurrentmodeis“Supervisor”… IfthisisaREAD&pte.R≠1&status.MXR≠1,thensignalanexception Ifpte.U=1andstatus.SUM=0,thensignalanexception Checkthe“accessed”bit… Ifpte.A≠1,thensetit Checkthe“dirty”bit… IfthisisaWRITEandpte.D≠1,thensetit Ifthepte.Aorpte.Dbitswerechanged… Either •Signalanexception(andletsoftwareupdatethepagetable) •WritethePageTableEntrybacktomemory. Thismustbeatomicwithrespecttotheearlierreadingofthepte. IfthePMPsystemindicatesproblems,thensignalanexception
Sv39(Three-LevelPageTables)
BoththeSv39andSv48virtualaddressingschemesarenaturalextensionsoftheSv32scheme.Theprimarydifferenceisthattheysupportlargervirtualaddressspacesandpagetablesthatare3-levels(forSv39)and4-levels(forSv48),insteadof2-levels.
IfyouunderstandSv32then–forallintentsandpurposes–youalreadyunderstandSv39andSv48.
WithSv39,virtualaddressesare39bitsandphysicaladdressesare56bits.
RISC-VArchitectureSummary/Porter Page� of� 311 323
Chapter9:VirtualMemory
�
WecansummarizetheSv39virtualaddressingschemeasfollows:Sv39isidenticaltoSv32,withtheseexceptions:
•ItmayonlybeimplementedonRV64systems,notonRV32systems. •Virtualaddressesare39bits. •Themaximumvirtualaddressspaceis239=512GiBytes. •Thephysicaladdressesare56bits. •Thepagesizeisunchanged(i.e.,4,096bytes). •Thepagetableshave3levels,insteadof2levels. •ThePageTableEntriesare64bits,insteadof32bits. •Eachpageofthepagetablecontains512PTEs,insteadof1,204PTEs. •Offsetsintothepagetablesare9bits(insteadof10bits)since29=512. •TheRSW,D,A,G,U,X,W,R,andVbitsworkidentically. •Megapagesare2MiBytes,insteadof4MiBytes. •“Gigapages”of1GiBytearealsosupported.
WithSv39,thepagetablewillhavethreelevels:
RISC-VArchitectureSummary/Porter Page� of� 312 323
Chapter9:VirtualMemory
�
Ona64-bitmachine,addressesareinitially64bitslong,sincethisistheregisterlength.ForSv39,onlytheleastsigni2icant39bitsareusedtoaddressintothevirtualaddressspace.Theremaining25bitsmustbethesign-extension(notzero-2illed)oftheleastsigni2icantbits.Otherwise,anexceptionwillbesignaled.
Sv48(Four-LevelPageTables)
WithSv48,virtualaddressesare48bitsandphysicaladdressesare56bits.
RISC-VArchitectureSummary/Porter Page� of� 313 323
Chapter9:VirtualMemory
�
WecansummarizetheSv48virtualaddressingschemeasfollows:Sv48isidenticaltoSv32andSv39,withtheseexceptions:
•ItmayonlybeimplementedonRV64systems,notonRV32systems. •IfSv48issupported,thenSv39mustalsobesupported. •Virtualaddressesare48bits. •Themaximumvirtualaddressspaceis248=256TiBytes. •Thephysicaladdressesare56bits. •Thepagesizeisunchanged(i.e.,4,096bytes). •Thepagetableshave4levels. •ThePageTableEntriesare64bits,thesameasSv39. •Eachpageofthepagetablecontains512PTEs,thesameasSv39. •Offsetsintothepagetablesare9bits,thesameasSv39. •TheRSW,D,A,G,U,X,W,R,andVbitsworkidentically. •Megapagesare2MiBytes,thesameasSv39. •Gigapagesare1GiByte,thesameasSv39. •“Terapages”of512GiBytesarealsosupported.
Ona64-bitmachine,addressesareinitially64bitslong,sincethisistheregisterlength.ForSv48,onlytheleastsigni2icant48bitsareusedtoaddressintothevirtual
RISC-VArchitectureSummary/Porter Page� of� 314 323
Chapter9:VirtualMemory
addressspace.Theremaining16bitsmustbethesign-extension(notzero-2illed)oftheleastsigni2icantbits.Otherwise,anexceptionwillbesignaled.
RISC-VArchitectureSummary/Porter Page� of� 315 323
Chapter10:WhatisNotCovered
Besidestheextensionswehavealreadydescribed,theRISC-Vspecmentionsseveraladditionalextensionswhichmayormaynotbeimplemented.Thespeci2icationsfortheseextensionsareinastateof2lux,sowewillsimplymentiontheirexistencehere.
TheDecimalFloatingPointExtension(“L”)
Mostprogrammersarefamiliarwithbinary2loatingpoint,butthereisalsosuchathingasarepresentationof2loatingpointnumbersusingadecimalbase.TheRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.
TheBitManipulationExtension(“B”)
TheRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.
TheDynamicallyTranslatedLanguagesExtension(“J”)
Highlevellanguagesoftenrequiredynamictypingchecking,garbagecollectionanddynamic(just-in-time)compiling,soincludingspecializedinstructionstosupportthesecanimproveperformance.However,theRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.
TheTransactionalMemoryExtension(“T”)
RISC-VArchitectureSummary/Porter Page� of�316 323
Chapter10:WhatisNotCovered
TheRISC-Vspeccontainsnothingspeci2icforthisextensionandonlymentionsitasfuturework.
ThePackedSIMDExtension(“P”)
RecallthatSIMDstandsfor“singleinstruction,multipledata”.Thegoalistoperformalargenumberof2loatingpointoperationssimultaneously.Thereisdataparallelism,butnotfullconcurrencysincethesameoperationisperformedonacollectionofvalues.Asingleoperation(suchas“multiply”)isperformedinparallelonNvalues,followedbythenextoperationafterallNmultiplicationshavecompleted.
Theideawiththisextensionisthateach2loatingpointregisterwillholdmorethanonevalue.Forthisextension,the2loatingpointregisterswilllikelybelargerthan32or64bits.Forexample,inoneimplementationeach2loatingpointregistermightbe1,024bitswide.Thuseachregistercanholdmorethanonevalue.Since16×64=1,024,eachregistercanhold16doubleprecisionvaluesatonce.Whenanarithmeticinstruction(suchas“add”or“multiply”)isexecuted,theoperationisperformedonall16valuessimultaneously.
Thestatusofthisextensionisunsettled.Itmaybedroppedaltogetherinfavorofthe“V”VectorSIMDExtension.
TheVectorSIMDExtension(“V”)
The“V”VectorSIMDextensionisdesignedtoaccommodatelargevectorsandSIMDoperation.Itincludesasetof32vectorregisters(v0,v1,…v31),anumberofnewcon2igurationCSRregisters,andadditionalinstructions.TheRISC-Vspeci2icationisdif2iculttounderstand.Furthermore,thisextensionissubjecttorevisionandisnotyet“frozen”.
PerformanceMonitoring
RISC-VArchitectureSummary/Porter Page� of� 317 323
Chapter10:WhatisNotCovered
RISC-Vde2inesacollectionofCSRregistersdevotedtomeasuringhardwareperformance:
mhpmcounter3 mhpmcounter4 … mhpmcounter31
mhpmevent3 mhpmevent4 … mhpmevent31
Thismechanismisnotdescribedinthespec,beyondsayingthatthe“events”tobecountedcanbeselectedbysettingmhpmevent3,mhpmevent4,….
Debug/TraceMode
TheRISC-Vspecmentionsanothermodecalled“DebugMode”andseveralrelatedControlandStatusRegisters(CSRs)namedtselect,tdata1,tdata2,andtdata3,butthespecdoesnotgiveanydetail.
RISC-VArchitectureSummary/Porter Page� of� 318 323
AcronymListABI ApplicationBinaryInterfaceAEE ApplicationExecutionEnvironmentAMO AtomicMemoryOperationASID AddressSpaceIDCSR ControlandStatusRegistercycle CSR[C00]:Clockcyclecounter,UserModecycleh CSR[C80]:Upperhalfofcycle,UserMode(RV32only)dcsr CSR[7B0]:Debugcontrolandstatusdpc CSR[7B1]:DebugPCdscratch CSR[7B2]:ScratchregisterDZ DivideByZero(abitwithintheFloatingPointFlags,FFLAGS)fcsr CSR[003]:FloatingPointControlandStatusReg(frm||f2lags)f2lags CSR[001]:Floatingpointing2lagsFFLAGS FloatingPointFlags(i.e.,NX,UF,OF,DZ,NV)frm CSR[002]:DynamicroundingmodeFRM FloatingPointRoundingModeFS Fieldinstatusregister;statusof2loating(clean/dirty/…)HART HardwareThreadHBI HypervisorBinaryInterfaceHEE HypervisorExecutionEnvironmentHEIP hypervisorexternalinterrupthpmcounter3 CSR[C03]:Eventcounter#3,UserModehpmcounter31 CSR[C1F]:EventCounter#31,UserModehpmcounter31h CSR[C9F]:Upperhalfofcounter,UserMode(RV32only)hpmcounter3h CSR[C83]:Upperhalfofcounter,UserMode(RV32only)hpmcounter4 CSR[C04]:EventCounter#4,UserModehpmcounter4h CSR[C84]:Upperhalfofcounter,UserMode(RV32only)HSIP hypervisorsoftwareinterruptHTIP hypervisortimerinterruptinstret CSR[C02]:Numberofinstructionsretired,UserModeinstreth CSR[C82]:Upperhalfofinstret,UserMode(RV32only)IR InstructionRegisterISA InstructionSetArchitecture LR LinkRegister marchid CSR[F12]:ArchitectureID mcause CSR[342]:Trapcausecode,MachineMode mcounteren CSR[306]:Counterenable,MachineMode
RISC-VArchitectureSummary/Porter Page� of� 319 323
AcronymList
mcycle CSR[B00]:Clockcyclecounter,MachineModemcycleh CSR[B80]:Upperhalfofcycle,MachineMode(RV32only) medeleg CSR[302]:Exceptiondelegationregister,MachineMode MEIP machineexternalinterrupt mepc CSR[341]:PreviousvalueofPC,MachineModemhartid CSR[F14]:HardwarethreadID mhpmcounter3 CSR[B03]:Eventcounter#3,MachineMode mhpmcounter31 CSR[B1F]:EventCounter#31,MachineMode mhpmcounter31h CSR[B9F]:Upperhalfofcounter,MachineMode(RV32only) mhpmcounter3h CSR[B83]:Upperhalfofcounter,MachineMode(RV32only) mhpmcounter4 CSR[B04]:EventCounter#4,MachineMode mhpmcounter4h CSR[B84]:Upperhalfofcounter,MachineMode(RV32only)mhpmevent3 CSR[323]:Eventselector#3,MachineModemhpmevent3 CSR[324]:Eventselector#4,MachineModemhpmevent31 CSR[33F]:Eventselector#31,MachineModemideleg CSR[303]:Interruptdelegationregister,MachineModemie CSR[304]:Interrupt-enableregister,MachineModeMIE Bitinstatusregister;MachineModeInterruptEnablemimpid CSR[F13]:ImplementationIDminstret CSR[B02]:Numberofinstructionsretired,MachineModeminstreth CSR[B82]:Upperhalfofinstret,MachineMode(RV32only)mip CSR[344]:Interruptpending,MachineModemisa CSR[301]:ISAandextensionsMMU MemoryManagementUnitMPIE Bitinstatusregister;MachineModePreviousInterruptEnableMPP Fieldinstatusregister;MachineMode–PreviousPrivilegeModeMPRV Bitinstatusregister;ModifyPrivilege mscratch CSR[340]:Tempregisterforuseinhandler,MachineModeMSIP machinesoftwareinterruptmstatus CSR[300]:Statusregister,MachineModeMTIP machinetimerinterruptmtval CSR[343]:Badaddressorbadinstruction,MachineModemtvec CSR[305]:Traphandlerbaseaddress,MachineModemvendorid CSR[F11]:VendorIDMXR Bitinstatusregister;MakeExecutableReadableNMI Non-maskableinterruptNSE NonstandardextensionNV InvalidOperation(abitwithintheFloatingPointFlags,FFLAGS)NX Inexact(abitwithintheFloatingPointFlags,FFLAGS)OF Over2low(abitwithintheFloatingPointFlags,FFLAGS)
RISC-VArchitectureSummary/Porter Page� of� 320 323
AcronymList
PC ProgramCounterPMA PhysicalMemoryAttributePMP PhysicalMemoryProtectionUnitpmpaddr0 CSR[3B0]:PMPAddress#0pmpaddr1 CSR[3B1]:PMPAddress#1pmpaddr15 CSR[3BF]:PMPAddress#15pmpcfg0 CSR[3A0]:PMPCon2igurationword#0pmpcfg1 CSR[3A1]:PMPCon2igurationword#1pmpcfg2 CSR[3A2]:PMPCon2igurationword#2pmpcfg3 CSR[3A3]:PMPCon2igurationword#3PPN PhysicalPageNumberPTE PageTableEntryRISC ReducedInstructionSetComputerRM RoundingModeBitsROM ReadOnlyMemoryRV128/RV128I Instructionspresentonlyon128bitmachinesRV32/RV32I Thebasicinstructionset,presentonallmachinesRV64/RV64I Instructionspresentonlyon64and128bitmachinessatp CSR[180]:Addresstranslationandprotectionsatp SupervisorAddressTranslationandProtectionCSRSBI SupervisorBinaryInterfacescause CSR[142]:Trapcausecode,SupervisorModescounteren CSR[106]:Counterenable,SupervisorModeSD Fieldinstatusregister,summarizesFSandXS2ieldssedeleg CSR[102]:Exceptiondelegationregister,SupervisorModeSEE SupervisorExecutionEnvironmentSEIP supervisorexternalinterruptsepc CSR[141]:PreviousvalueofPC,SupervisorModesideleg CSR[103]:Interruptdelegationregister,SupervisorModesie CSR[104]:Interrupt-enableregister,SupervisorModeSIE Bitinstatusregister;SupervisorModeInterruptEnableSIMD Singleinstruction,multipledatasip CSR[144]:Interruptpending,SupervisorModeSP StackPointerSPP Fieldinstatusregister;SupervisorMode–PrevPrivilegeModeSPIE Bitinstatusregister;SupervisorModePrevInterruptEnablesscratch CSR[140]:Tempregisterforuseinhandler,SupervisorModeSSIP supervisorsoftwareinterruptsstatus CSR[100]:Statusregister,SupervisorModeSTIP supervisortimerinterrupt
RISC-VArchitectureSummary/Porter Page� of� 321 323
AcronymList
stval CSR[143]:Badaddressorbadinstruction,SupervisorModestvec CSR[105]:Traphandlerbaseaddress,SupervisorModeSUM Bitinstatusregister;PermitSupervisorUserMemoryAccessSXL Fieldinstatusregister;Emulation:RegsizewheninSModetdata1 CSR[7A1]:Triggerdata#1tdata2 CSR[7A2]:Triggerdata#2tdata3 CSR[7A3]:Triggerdata#3time CSR[C01]:Currenttimeintickstimeh CSR[C81]:Upperhalfoftime(RV32only)TLB TranslationLookasideBuffertselect CSR[7A0]:TriggerregisterselectTSR Bitinstatusregister;TrapSRETinstruction TW Bitinstatusregister;Time-outWaitTVM Bitinstatusregister;TrapVirtualMemoryucause CSR[042]:Trapcausecode,UserModeUEIP userexternalinterruptuepc CSR[041]:PreviousvalueofPC,UserModeUF Under2low(abitwithintheFloatingPointFlags,FFLAGS)uie CSR[004]:Interrupt-enableregister,UserModeUIE Bitinstatusregister;UserModeInterruptEnableuip CSR[044]:Interruptpending,UserModeUPIE Bitinstatusregister;UserModePreviousInterruptEnableuscratch CSR[040]:Tempregisterforuseinhandler,UserModeUSIP usersoftwareinterruptustatus CSR[000]:Statusregister,UserModeUTIP usertimerinterruptutval CSR[043]:Badaddressorbadinstruction,UserModeutvec CSR[005]:Traphandlerbaseaddress,UserModeUXL Fieldinstatusregister;Emulation:RegsizewheninUserModeVPN VirtualPageNumberWFI WaitForInterruptXOR Exclusive-OrXS Fieldinstatusregister;statusofextension(clean/dirty/…)
RISC-VArchitectureSummary/Porter Page� of� 322 323
AbouttheAuthorProfessorHarryH.PorterIIIteachesintheDepartmentofComputerScienceatPortlandStateUniversity.Hehasproducedseveralvideocourses,notablyontheTheoryofComputation.Recentlyhebuiltacompletecomputerusingtherelaytechnologyofthe1940s.Thecomputerhaseightgeneralpurpose8bitregisters,a16bitprogramcounter,andacompleteinstructionset,allhousedinmahoganycabinetsasshown.PorteralsodesignedandconstructedtheBLITZSystem,acollectionofsoftwaredesignedtosupportauniversity-levelcourseonOperatingSystems.Usingthesoftware,studentsimplementasmall,butcomplete,time-sliced,VM-basedoperatingsystemkernel.Porterhashabitofdesigningandimplementingprogramminglanguages,themostrecentbeingalanguagespeci2icallytargetedatkernelimplementation.
PorterholdsanSc.B.fromBrownUniversityandaPh.D.fromtheOregonGraduateCenter.
PorterlivesinPortland,Oregon.Whennottryingto2igureouthowhiscomputerworks,heskis,hikes,travels,andspendstimewithhischildrenbuildingthings.
ProfessorPorter’swebsite:www.cs.pdx.edu/~harry
�
RISC-VArchitectureSummary/Porter Page� of�323 323