Lecture 04 RISC-V ISA
CSCE 513 Computer Architecture
Department of Computer Science and Engineering
Yonghong Yan
[email protected]
passlab.github.io/CSCE513
Acknowledgement
• Slides adapted from
  – Computer Science 152: Computer Architecture and Engineering, Spring 2016 by Dr. George Michelogiannakis from UCB
• Reference contents
  – CAQA A.9
  – COD textbook, chapter 2
Review: ISA Principles -- Iron-code Summary
• Section A.2: Use general-purpose registers with a load-store architecture.
• Section A.3: Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Section A.4: Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
  – Now we see 16-bit FP for deep learning in GPUs
    • http://www.nextplatform.com/2016/09/13/nvidia-pushes-deep-learning-inference-new-pascal-gpus/
• Section A.5: Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and shift.
• Section A.6: Compare equal, compare not equal, compare less, branch (with a PC-relative address at least 8 bits long), jump, call, and return.
• Section A.7: Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.
• Section A.8: Provide at least 16 general-purpose registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist IS
  – Often use separate floating-point registers.
  – The justification is to increase the total number of registers without raising problems in the instruction format or in the speed of the general-purpose register file. This compromise, however, is not orthogonal.
What is RISC-V
• RISC-V (pronounced "risk-five") is an ISA standard
  – An open source implementation of a reduced instruction set computing (RISC) based instruction set architecture (ISA)
  – There were RISC-I, II, III, IV before
• Most ISAs: X86, ARM, Power, MIPS, SPARC
  – Commercially protected by patents
  – Preventing practical efforts to reproduce the computer systems.
• RISC-V is open
  – Permitting any person or group to construct compatible computers
  – Use associated software
• Originated in 2010 by researchers at UC Berkeley
  – Krste Asanović, David Patterson and students
• 2017: version 2 of the user space ISA is fixed
  – User-Level ISA Specification v2.2
  – Draft Compressed ISA Specification v1.79
  – Draft Privileged ISA Specification v1.10
https://riscv.org/
https://en.wikipedia.org/wiki/RISC-V
Goals in Defining RISC-V
• A completely open ISA that is freely available to academia and industry
• A real ISA suitable for direct native hardware implementation, not just simulation or binary translation
• An ISA that avoids "over-architecting" for
  – a particular microarchitecture style (e.g., microcoded, in-order, decoupled, out-of-order) or
  – implementation technology (e.g., full-custom, ASIC, FPGA), but which allows efficient implementation in any of these
• RISC-V ISA includes
  – A small base integer ISA, usable by itself as a base for customized accelerators or for educational purposes, and
  – Optional standard extensions, to support general-purpose software development
  – Optional customer extensions
• Support for the revised 2008 IEEE-754 floating-point standard
RISC-V ISA Principles
• Generally kept very simple and extendable
• Separated into multiple specifications
  – User-Level ISA spec (compute instructions)
  – Compressed ISA spec (16-bit instructions)
  – Privileged ISA spec (supervisor-mode instructions)
  – More…
• ISA support is given by RV + word-width + extensions supported
  – E.g. RV32I means 32-bit RISC-V with support for the I(nteger) instruction set
User Level ISA
• Defines the normal instructions needed for computation
  – A mandatory Base integer ISA
    • I: Integer instructions:
      – ALU
      – Branches/jumps
      – Loads/stores
  – Standard Extensions
    • M: Integer Multiplication and Division
    • A: Atomic Instructions
    • F: Single-Precision Floating-Point
    • D: Double-Precision Floating-Point
    • C: Compressed Instructions (16 bit)
    • G = IMAFD: Integer base + four standard extensions
  – Optional extensions
RISC-V ISA
• Both 32-bit and 64-bit address space variants
  – RV32 and RV64
• Easy to subset/extend for education/research
  – RV32IM, RV32IMA, RV32IMAFD, RV32G
• SPEC on the website
  – www.riscv.org
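The naming rule (RV + word-width + extensions, with G standing for IMAFD) can be sketched as a small parser. This is only an illustration: the function name `parse_isa_name` is hypothetical, and it accepts only the RV32 and RV64 widths named in these slides.

```python
import re


def parse_isa_name(name):
    """Split a name such as 'RV32IMA' into (word_width, extension_letters).

    Hypothetical helper for illustration; only RV32/RV64 are accepted.
    """
    match = re.fullmatch(r"RV(32|64)([A-Z]+)", name.upper())
    if match is None:
        raise ValueError(f"not a recognised ISA name: {name!r}")
    width = int(match.group(1))
    # G is shorthand for the base integer ISA plus M, A, F and D.
    letters = match.group(2).replace("G", "IMAFD")
    return width, list(letters)


if __name__ == "__main__":
    print(parse_isa_name("RV32I"))
    print(parse_isa_name("RV32G"))
```

With this sketch, RV32G and RV32IMAFD parse to the same width and extension list, which is the equivalence the slide states.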
RV32/64 Processor State
• Program counter (pc)
• 32 integer registers, each 32 or 64 bits wide (x0-x31)
  – x0 always contains a 0
  – x1 holds the return address on a call.
• 32 floating-point (FP) registers (f0-f31)
  – Each can contain a single- or double-precision FP value (32-bit or 64-bit IEEE FP)
• FP status register (fsr; named fcsr in the v2.2 user-level specification), used for FP rounding mode & exception reporting
RV64G In One Table [figure]
Load/Store Instructions [figure]
ALU Instructions [figure]
Control Flow Instructions [figure]
RISC-V Dynamic Instruction Mix for SPECint2006 [figure]
RISC-V Hybrid Instruction Encoding
• 16, 32, 48, 64 … bits length encoding
• Base instruction set (RV32) always has fixed 32-bit instructions; lowest two bits = 11 (binary)
• All branches and jumps have targets at 16-bit granularity (even in base ISA where all instructions are fixed 32 bits)
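A minimal sketch of how a decoder can tell the instruction length from the low bits of the first 16-bit parcel. The slide only states that 32-bit base instructions end in binary 11; the patterns used here for 48-bit and 64-bit instructions are taken from the length-encoding figure in the user-level specification v2.2, and the function name is hypothetical.

```python
def instruction_length_bits(first_parcel):
    """Return the instruction length in bits, given its lowest 16 bits."""
    if first_parcel & 0b11 != 0b11:
        return 16                      # compressed: low two bits are not 11
    if first_parcel & 0b11100 != 0b11100:
        return 32                      # bits [4:2] are not 111
    if first_parcel & 0b111111 == 0b011111:
        return 48                      # bits [5:0] are 011111
    if first_parcel & 0b1111111 == 0b0111111:
        return 64                      # bits [6:0] are 0111111
    raise ValueError("longer or reserved encoding, not handled in this sketch")


if __name__ == "__main__":
    # Low 16 bits of the word 0x00650333 (add x6, x10, x6).
    print(instruction_length_bits(0x0333))
```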
Four Core RISC-V Instruction Formats
[Figure: the four core instruction formats, with fields labeled additional opcode bits / immediate, Reg. Source 2, Reg. Source 1, additional opcode bits, Destination Reg., and the 7-bit opcode field (low 2 bits = 11 in binary)]
• Aligned on a four-byte boundary in memory.
• There are variants!
• Sign bit of immediates always on bit 31 of instruction.
• Register fields never move.
https://github.com/riscv/riscv-opcodes/blob/master/opcodes
With Variants
[Figure: the same formats with their variants, which are based on the handling of the immediates]
RISC-V Encoding Summary [figure]
Immediate Encoding Variants
• 32-bit immediate produced by each base instruction format
  – Instruction bit: inst[y]
RISC-V Addressing Summary
[Figure: addressing modes, including displacement addressing]
R-Format Encoding Example
add x6, x10, x6
Field    funct7   rs2     rs1     funct3  rd      opcode
Width    7 bits   5 bits  5 bits  3 bits  5 bits  7 bits
Decimal  0        6       10      0       6       51
Binary   0000000  00110   01010   000     00110   0110011
0000 0000 0110 0101 0000 0011 0011 0011 (binary) = 00650333 (hex)
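The field packing in this example can be checked with a few shifts. This is a sketch; `encode_r_type` is a hypothetical helper name, and the field values are the ones given on the slide.

```python
def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (
        (funct7 << 25)   # bits 31..25
        | (rs2 << 20)    # bits 24..20
        | (rs1 << 15)    # bits 19..15
        | (funct3 << 12) # bits 14..12
        | (rd << 7)      # bits 11..7
        | opcode         # bits 6..0
    )


if __name__ == "__main__":
    # add x6, x10, x6: funct7=0, rs2=6, rs1=10, funct3=0, rd=6, opcode=51
    word = encode_r_type(0, 6, 10, 0, 6, 51)
    print(f"{word:08x}")
```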
RISC-V I-Format Instructions
Field    immediate  rs1     funct3  rd      opcode
Width    12 bits    5 bits  3 bits  5 bits  7 bits
• Immediate arithmetic and load instructions
  – rs1: source or base address register number
  – immediate: constant operand, or offset added to base address
    • 2s-complement, sign extended
• Design Principle: Good design demands good compromises
  – Different formats complicate decoding, but allow 32-bit instructions uniformly
  – Keep formats as similar as possible
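A sketch of I-format packing and of recovering the sign-extended 12-bit immediate. The helper names are hypothetical. The ADDI opcode (0010011) and funct3 (000) used in the example come from the user-level specification, not from this slide.

```python
def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack an I-format instruction; imm is a signed 12-bit value."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def decode_i_immediate(word):
    """Extract bits 31..20 and sign-extend them (2s-complement)."""
    imm = (word >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


if __name__ == "__main__":
    # addi x5, x6, -1 (opcode and funct3 assumed from the specification)
    word = encode_i_type(-1, 6, 0b000, 5, 0b0010011)
    print(f"{word:08x}", decode_i_immediate(word))
```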
RISC-V S-Format Instructions
Field    imm[11:5]  rs2     rs1     funct3  imm[4:0]  opcode
Width    7 bits     5 bits  5 bits  3 bits  5 bits    7 bits
• Different immediate format for store instructions
  – rs1: base address register number
  – rs2: source operand register number
  – immediate: offset added to base address
    • Split so that rs1 and rs2 fields always in the same place
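The split immediate can be illustrated by packing and reassembling it. This is a sketch with hypothetical helper names. The store opcode (0100011) and the funct3 value for a word store (010) are assumptions taken from the user-level specification.

```python
def encode_s_type(imm, rs2, rs1, funct3, opcode):
    """Pack an S-format instruction; imm is a signed 12-bit offset."""
    if not -2048 <= imm <= 2047:
        raise ValueError("offset does not fit in 12 signed bits")
    imm &= 0xFFF
    return (
        ((imm >> 5) << 25)      # imm[11:5] in bits 31..25
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | ((imm & 0x1F) << 7)   # imm[4:0] in bits 11..7
        | opcode
    )


def decode_s_immediate(word):
    """Rejoin the two immediate pieces and sign-extend the result."""
    imm = (((word >> 25) & 0x7F) << 5) | ((word >> 7) & 0x1F)
    return imm - 0x1000 if imm & 0x800 else imm


if __name__ == "__main__":
    # sw x6, 8(x10): rs2 = 6 (value), rs1 = 10 (base), offset = 8
    word = encode_s_type(8, 6, 10, 0b010, 0b0100011)
    print(f"{word:08x}", decode_s_immediate(word))
```

Note that rs1 and rs2 sit at the same shifts (15 and 20) as in the R-format, which is the point of splitting the immediate.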
Integer Computational Instructions (ALU)
• I-type (Immediate), all immediates in all instructions are sign extended
  – ADDI: adds sign extended 12-bit immediate to rs1
  – SLTI(U): set less than immediate
  – ANDI/ORI/XORI: Logical operations
  – SLLI/SRLI/SRAI: Shifts by constants
• I-type instructions end with I
Integer Computational Instructions (ALU)
• I-type (Immediate), all immediates in all instructions are sign extended
  – LUI/AUIPC: load upper immediate / add upper immediate to pc (these two carry a 20-bit U-immediate)
    • LUI writes the 20-bit immediate to the top of the destination register.
    • Used to build large immediates.
    • 12-bit immediates are signed, so the sign has to be accounted for when building 32-bit immediates in a 2-instruction sequence (LUI high-20b, ADDI low-12b)
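The sign adjustment for the LUI + ADDI sequence can be sketched as follows, assuming a 32-bit register (RV32). The helper names are hypothetical. The key step is that when the low 12 bits have their top bit set, ADDI will subtract, so the upper 20 bits must be increased by one to compensate.

```python
MASK32 = 0xFFFFFFFF


def split_hi_lo(value):
    """Split a 32-bit constant into (hi20 for LUI, signed lo12 for ADDI)."""
    value &= MASK32
    lo = value & 0xFFF
    if lo >= 0x800:          # ADDI sign-extends, so this becomes negative
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF   # compensates for a negative lo
    return hi, lo


def lui_addi(hi, lo):
    """Model LUI rd, hi followed by ADDI rd, rd, lo on a 32-bit register."""
    return ((hi << 12) + lo) & MASK32


if __name__ == "__main__":
    hi, lo = split_hi_lo(0x12345FFF)
    print(hex(hi), lo, hex(lui_addi(hi, lo)))
```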
Integer Computational Instructions
• R-type (Register)
  – rs1 and rs2 are the source registers, rd the destination
  – ADD/SUB
  – SLT, SLTU: set less than
  – SRL, SLL, SRA: shift logical or arithmetic left or right
• NOP is encoded as ADDI x0, x0, 0
Control Transfer Instructions
• No architecturally visible delay slots
• Unconditional Jumps: PC + offset → target
  – JAL: Jump and link, also writes PC+4 to the link register (x1 by convention), UJ-type
    • Offset scaled by 1-bit left shift, so it can jump to a 16-bit instruction boundary (same for branches)
  – JALR: Jump and link register, where Imm (12 bits) + rs1 = target
Control Transfer Instructions
• No architecturally visible delay slots
• Conditional Branches: SB-type and PC + offset → target
  – 12-bit signed immediate split across two fields
  – Branches compare two registers; PC + (immediate << 1) → target (signed offset in multiples of two)
  – Branches do not have a delay slot
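A sketch of how a branch target follows from the split immediate. The exact bit placement used below (inst[31] = imm[12], inst[7] = imm[11], inst[30:25] = imm[10:5], inst[11:8] = imm[4:1]) is taken from the user-level specification rather than from this slide, and the helper names are hypothetical. The decoder returns the byte offset, that is, the stored 12-bit value already shifted left by one.

```python
def decode_branch_offset(word):
    """Reassemble the SB-type immediate and return the signed byte offset."""
    offset = (
        (((word >> 31) & 0x1) << 12)    # imm[12]
        | (((word >> 7) & 0x1) << 11)   # imm[11]
        | (((word >> 25) & 0x3F) << 5)  # imm[10:5]
        | (((word >> 8) & 0xF) << 1)    # imm[4:1]; bit 0 is always zero
    )
    return offset - 0x2000 if offset & 0x1000 else offset


def branch_target(pc, word):
    """Target of a taken branch on a 32-bit machine: PC + offset."""
    return (pc + decode_branch_offset(word)) & 0xFFFFFFFF


if __name__ == "__main__":
    # 0xFE000EE3 is beq x0, x0 with a byte offset of -4.
    print(decode_branch_offset(0xFE000EE3), hex(branch_target(0x1000, 0xFE000EE3)))
```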
Loads and Stores
• Store instructions (S-type)
  – MEM(rs1 + imm) = rs2
• Loads (I-type)
  – rd = MEM(rs1 + imm)
Specifications and Software From riscv.org and github.com/riscv
• Specification from RISC-V website
  – https://riscv.org/specifications/
• RISC-V software includes
  – GNU Compiler Collection (GCC) toolchain (with GDB, the debugger)
    • https://github.com/riscv/riscv-tools
  – LLVM toolchain
  – A simulator ("Spike")
    • https://github.com/riscv/riscv-isa-sim
  – Standard simulator QEMU
    • https://github.com/riscv/riscv-qemu
• Operating systems support exists for Linux
  – https://github.com/riscv/riscv-linux
• A JavaScript ISA simulator to run a RISC-V Linux system on a web browser
  – https://github.com/riscv/riscv-angel
RISC-V Implementations
• For RISC-V implementation, UCB created Chisel, an open-source hardware construction language that is a specialized dialect of Scala.
  – Chisel: Constructing Hardware In a Scala Embedded Language
  – https://chisel.eecs.berkeley.edu/
• In-order Rocket core and chip generator
  – https://github.com/freechipsproject/rocket-chip
• Out-of-order BOOM core
  – https://github.com/ucb-bar/riscv-boom
• UCB Sodor cores for education (single cycle, and 1-5 stages pipeline)
  – https://github.com/ucb-bar/riscv-sodor
RISC-V Implementations
• A list from
  – https://riscv.org/risc-v-cores/
• The Indian IIT-Madras is developing six RISC-V open-source CPU designs (SHAKTI) for six distinct usages
  – https://shaktiproject.bitbucket.io/index.html
• SiFive HiFive Unleashed
  – First Linux RISC-V Board
    • First shipment: June 2018
  – https://www.sifive.com/
  – https://github.com/sifive/freedom
Additional Information
Calling Convention
• C Datatypes and Alignment
  – RV32 employs an ILP32 integer model, while RV64 is LP64
  – Floating-point types are IEEE 754-2008 compatible
  – All of the data types are kept naturally aligned when stored in memory
  – char is implicitly unsigned
  – In RV64, 32-bit types, such as int, are stored in integer registers as proper sign extensions of their 32-bit values; that is, bits 63..31 are all equal
    • This restriction holds even for unsigned 32-bit types
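The RV64 rule for 32-bit values can be made concrete with a small model of the register contents. This is a sketch; `rv64_register_image` is a hypothetical name. Note that even an unsigned 32-bit value with its top bit set is held with bits 63..32 all ones.

```python
def rv64_register_image(value32):
    """Return the 64-bit register contents holding a 32-bit value in RV64."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:                 # bit 31 set: replicate it upward
        return value32 | 0xFFFFFFFF00000000
    return value32


def upper_bits_all_equal(reg64):
    """Check that bits 63..31 are all zeros or all ones."""
    return (reg64 >> 31) in (0, 0x1FFFFFFFF)


if __name__ == "__main__":
    image = rv64_register_image(0x80000000)  # e.g. an unsigned int
    print(f"{image:016x}", upper_bits_all_equal(image))
```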
Calling Convention
• RVG Calling Convention
  – If the arguments to a function are conceptualized as fields of a C struct, each with pointer alignment, the argument registers are a shadow of the first eight pointer-words of that struct
    • Floating-point arguments that are part of unions or array fields of structures are passed in integer registers
    • Floating-point arguments to variadic functions (except those that are explicitly named in the parameter list) are passed in integer registers
  – The portion of the conceptual struct that is not passed in argument registers is passed on the stack
    • The stack pointer sp points to the first argument not passed in a register
  – Arguments smaller than a pointer-word are passed in the least-significant bits of argument registers
  – When primitive arguments twice the size of a pointer-word are passed on the stack, they are naturally aligned
    • When they are passed in the integer registers, they reside in an aligned even-odd register pair, with the even register holding the least-significant bits
  – Arguments more than twice the size of a pointer-word are passed by reference
Calling Convention
• The stack grows downward and the stack pointer is always kept 16-byte aligned
• Values are returned from functions in integer registers v0 and v1 and floating-point registers fv0 and fv1 (named a0, a1 and fa0, fa1 in the v2.2 user-level specification)
  – Floating-point values are returned in floating-point registers only if they are primitives or members of a struct consisting of only one or two floating-point values
  – Other return values that fit into two pointer-words are returned in v0 and v1
  – Larger return values are passed entirely in memory; the caller allocates this memory region and passes a pointer to it as an implicit first parameter to the callee
Memory Model
• RISC-V: Relaxed memory model
Control and Status Register (CSR) Instructions
• CSR Instructions
• Timer and counters
Data Formats and Memory Addresses
• Data formats: 8-b bytes, 16-b halfwords, 32-b words and 64-b doublewords
• Some issues
  – Byte addressing
  – Word alignment: suppose the memory is organized in 32-bit words. Can a word address begin only at 0, 4, 8, ....?
[Figure: byte addresses 0 to 7 across two words. Reading from most significant byte to least significant byte, big endian numbers the bytes of a word 0 1 2 3, and little endian (RISC-V) numbers them 3 2 1 0]
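The byte ordering in the figure can be reproduced with Python's standard `struct` module. This is a sketch; the helper names are hypothetical.

```python
import struct


def bytes_in_memory(word, little_endian=True):
    """Return the bytes of a 32-bit word in increasing address order."""
    fmt = "<I" if little_endian else ">I"
    return list(struct.pack(fmt, word))


def is_word_aligned(address):
    """A 32-bit word is naturally aligned when its address is a multiple of 4."""
    return address % 4 == 0


if __name__ == "__main__":
    word = 0x0A0B0C0D  # 0x0A is the most significant byte
    print("little endian (RISC-V):", [hex(b) for b in bytes_in_memory(word)])
    print("big endian:            ", [hex(b) for b in bytes_in_memory(word, False)])
```

In the little-endian layout the least significant byte (0x0D) sits at the lowest address, so the most significant byte ends up at byte address 3.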
ISA Design
• RISC-V has 32 integer registers and can have 32 floating-point registers
  – Register number 0 is a constant 0
  – Register number 1 is the return address (link register)
• The memory is addressed by 8-bit bytes
• The instructions must be aligned to 32-bit addresses
• Like many RISC designs, it is a "load-store" machine
  – The only instructions that access main memory are loads and stores
  – All arithmetic and logic operations occur between registers
• RISC-V can load and store 8 and 16-bit items, but it lacks 8 and 16-bit arithmetic, including comparison-and-branch instructions
• The 64-bit instruction set includes 32-bit arithmetic
ISA Design for Performance
• Features to increase a computer's speed, while reducing its cost and power usage
  – placing most-significant bits at a fixed location to speed sign-extension, and a bit-arrangement designed to reduce the number of multiplexers in a CPU
ISA Design
• Intentionally lacks condition codes, and even lacks a carry bit
  – To simplify CPU designs by minimizing interactions between instructions
• Builds comparison operations into its conditional-jumps
ISA Design
• The lack of a carry bit complicates multiple-precision arithmetic
  – GMP, MPFR
• Does not detect or flag most arithmetic errors, including overflow, underflow and divide by zero
  – No special instruction set support for overflow checks on integer arithmetic operations.
    • Most popular programming languages do not support checks for integer overflow, partly because most architectures impose a significant runtime penalty to check for overflow on integer arithmetic and partly because modulo arithmetic is sometimes the desired behavior
  – Floating-Point Control and Status Register
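Without a carry flag, software recovers the carry with an unsigned comparison after the add. The sketch below models that idea on 32-bit values: the comparison plays the role of an SLTU instruction following an ADD. The function names are hypothetical and the inputs are assumed to be in the range 0 to 2^32 - 1.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry_out(a, b):
    """32-bit add plus the carry recovered by an unsigned compare."""
    total = (a + b) & MASK32          # what ADD leaves in the register
    carry = 1 if total < a else 0     # what SLTU total, a would produce
    return total, carry


def add64_on_rv32(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit numbers held as (low, high) 32-bit halves."""
    lo, carry = add_with_carry_out(a_lo, b_lo)
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


if __name__ == "__main__":
    print(add_with_carry_out(0xFFFFFFFF, 1))
    print(add64_on_rv32(0xFFFFFFFF, 0, 1, 0))
```

The extra compare per limb is the cost the slide refers to for multiple-precision libraries such as GMP.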
ISA Design
• Lacks the "count leading zero" and bit-field operations normally used to speed software floating-point in a pure-integer processor
• No branch delay slot, a position after a branch instruction that can be filled with an instruction which is executed regardless of whether the branch is taken or not
  – This feature can improve performance of pipelined processors,
  – Omitted in RISC-V because it complicates both multicycle CPUs and superscalar CPUs
• Lacks address-modes that "write back" to the registers
  – For example, it does not do auto-incrementing
ISA Design
• A load or store can add a twelve-bit signed offset to a register that contains an address. A further 20 bits (yielding a 32-bit address) can be generated at an absolute address
  – RISC-V was designed to permit position-independent code. It has a special instruction to generate 20 upper address bits that are relative to the program counter. The lower twelve bits are provided by normal loads, stores and jumps
  – LUI (load upper immediate) places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros
  – AUIPC (add upper immediate to pc) is used to build pc-relative addresses: it forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd