Lecture 04: RISC-V ISA

CSCE 513 Computer Architecture

Department of Computer Science and Engineering
Yonghong Yan

[email protected]
passlab.github.io/CSCE513

Acknowledgement

• Slides adapted from
  – Computer Science 152: Computer Architecture and Engineering, Spring 2016, by Dr. George Michelogiannakis from UCB
• Reference contents
  – CAQA A.9
  – COD textbook, chapter 2

Review: ISA Principles -- Iron-code Summary

• Section A.2: Use general-purpose registers with a load-store architecture.
• Section A.3: Support these addressing modes: displacement (with an address offset size of 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect.
• Section A.4: Support these data sizes and types: 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
  – Now we see 16-bit FP for deep learning in GPUs
    • http://www.nextplatform.com/2016/09/13/nvidia-pushes-deep-learning-inference-new-pascal-gpus/
• Section A.5: Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and shift.
• Section A.6: Compare equal, compare not equal, compare less, branch (with a PC-relative address at least 8 bits long), jump, call, and return.
• Section A.7: Use fixed instruction encoding if interested in performance, and use variable instruction encoding if interested in code size.
• Section A.8: Provide at least 16 general-purpose registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist IS
  – Often use separate floating-point registers.
  – The justification is to increase the total number of registers without raising problems in the instruction format or in the speed of the general-purpose register file. This compromise, however, is not orthogonal.

What is RISC-V

• RISC-V (pronounced "risk-five") is an ISA standard
  – An open-source implementation of a reduced instruction set computing (RISC) based instruction set architecture (ISA)
  – There were RISC-I, II, III, IV before
• Most ISAs: x86, ARM, Power, MIPS, SPARC
  – Commercially protected by patents
  – Preventing practical efforts to reproduce the computer systems
• RISC-V is open
  – Permitting any person or group to construct compatible computers
  – Use associated software
• Originated in 2010 by researchers at UC Berkeley
  – Krste Asanović, David Patterson and students
• 2017: version 2 of the userspace ISA is fixed
  – User-Level ISA Specification v2.2
  – Draft Compressed ISA Specification v1.79
  – Draft Privileged ISA Specification v1.10

https://riscv.org/
https://en.wikipedia.org/wiki/RISC-V

Goals in Defining RISC-V

• A completely open ISA that is freely available to academia and industry
• A real ISA suitable for direct native hardware implementation, not just simulation or binary translation
• An ISA that avoids "over-architecting" for
  – a particular microarchitecture style (e.g., microcoded, in-order, decoupled, out-of-order) or
  – implementation technology (e.g., full-custom, ASIC, FPGA), but which allows efficient implementation in any of these
• RISC-V ISA includes
  – A small base integer ISA, usable by itself as a base for customized accelerators or for educational purposes, and
  – Optional standard extensions, to support general-purpose software development
  – Optional custom extensions
• Support for the revised 2008 IEEE-754 floating-point standard

RISC-V ISA Principles

• Generally kept very simple and extendable
• Separated into multiple specifications
  – User-Level ISA spec (compute instructions)
  – Compressed ISA spec (16-bit instructions)
  – Privileged ISA spec (supervisor-mode instructions)
  – More…
• ISA support is given by RV + word-width + extensions supported
  – E.g. RV32I means 32-bit RISC-V with support for the I(nteger) instruction set
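The naming rule (RV + word-width + extensions) can be sketched as a small parser. This is only an illustration, not part of any specification: the function name `parse_isa_name` and its return layout are assumptions made here, only single-letter extensions are handled, and G is expanded to IMAFD as defined on the User Level ISA slide.

```python
import re

def parse_isa_name(name):
    """Split a name such as 'RV32IMA' into (word_width, [extension letters]).

    Hypothetical helper for illustration; handles single-letter extensions only.
    """
    m = re.fullmatch(r"RV(32|64)([A-Z]+)", name.upper())
    if m is None:
        raise ValueError(f"not a recognized RISC-V ISA name: {name!r}")
    width = int(m.group(1))
    letters = m.group(2).replace("G", "IMAFD")  # G is shorthand for IMAFD
    return width, list(letters)

print(parse_isa_name("RV32I"))   # (32, ['I'])
print(parse_isa_name("RV64G"))   # (64, ['I', 'M', 'A', 'F', 'D'])
```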

User Level ISA

• Defines the normal instructions needed for computation
  – A mandatory base integer ISA
    • I: Integer instructions:
      – ALU
      – Branches/jumps
      – Loads/stores
  – Standard extensions
    • M: Integer Multiplication and Division
    • A: Atomic Instructions
    • F: Single-Precision Floating-Point
    • D: Double-Precision Floating-Point
    • C: Compressed Instructions (16 bit)
    • G = IMAFD: Integer base + four standard extensions
  – Optional extensions

RISC-V ISA

• Both 32-bit and 64-bit address space variants
  – RV32 and RV64
• Easy to subset/extend for education/research
  – RV32IM, RV32IMA, RV32IMAFD, RV32G
• SPEC on the website
  – www.riscv.org

RV32/64 Processor State

• Program counter (pc)
• 32 32/64-bit integer registers (x0-x31)
  – x0 always contains a 0
  – x1 to hold the return address on a call
• 32 floating-point (FP) registers (f0-f31)
  – Each can contain a single- or double-precision FP value (32-bit or 64-bit IEEE FP)
• FP status register (fsr; named fcsr in the v2.x specifications), used for FP rounding mode & exception reporting

RV64G In One Table

[Figure omitted]

Load/Store Instructions

[Figure omitted]

ALU Instructions

[Figure omitted]

Control Flow Instructions

[Figure omitted]

RISC-V Dynamic Instruction Mix for SPECint2006

[Figure omitted]

RISC-V Hybrid Instruction Encoding

• 16, 32, 48, 64, … bits length encoding
• Base instruction set (RV32) always has fixed 32-bit instructions, lowest two bits = 11 (binary)
• All branches and jumps have targets at 16-bit granularity (even in base ISA where all instructions are fixed 32 bits)
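A decoder can tell the instruction length from the low-order bits of the first 16-bit parcel. The sketch below assumes the length-encoding convention of the User-Level ISA specification; the slide itself only states the "low two bits = 11" rule for 32-bit instructions, so the 48-bit and 64-bit patterns are taken from the specification rather than from the slide. The function name is hypothetical.

```python
def instruction_length_bits(low16):
    """Return the instruction length in bits, given its lowest 16-bit parcel."""
    if low16 & 0b11 != 0b11:
        return 16                      # compressed (C extension) encodings
    if low16 & 0b11100 != 0b11100:
        return 32                      # low two bits 11, bits [4:2] not 111
    if low16 & 0b111111 == 0b011111:
        return 48
    if low16 & 0b1111111 == 0b0111111:
        return 64
    raise ValueError("longer or reserved encoding, not handled in this sketch")

# Low half of 0x00650333 (add x6, x10, x6) is 0x0333.
print(instruction_length_bits(0x0333))  # 32
```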

Four Core RISC-V Instruction Formats

[Figure omitted: the four core instruction formats (R, I, S, U). Field labels: additional opcode bits; Reg. Source 2; Reg. Source 1; additional opcode bits/immediate; Destination Reg.; 7-bit opcode field (but low 2 bits = 11 in binary)]

• Aligned on a four-byte boundary in memory.
• There are variants!
• Sign bit of immediates always on bit 31 of instruction.
• Register fields never move.

https://github.com/riscv/riscv-opcodes/blob/master/opcodes

With Variants

[Figure omitted: the instruction formats with their variants, same field labels as above]

• Based on the handling of the immediates

RISC-V Encoding Summary

[Figure omitted]

Immediate Encoding Variants

• 32-bit immediate produced by each base instruction format
  – Instruction bit: inst[y]
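How each format turns instruction bits inst[y] into a sign-extended immediate can be sketched in code. The bit positions below follow the User-Level ISA specification rather than anything printed in this transcript (the figure is missing), and the helper names `sign_extend`, `bits` and `decode_immediate` are assumptions for illustration. Note that the sign always comes from inst[31].

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value

def bits(inst, hi, lo):
    """Extract inst[hi:lo] (inclusive) as an unsigned integer."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_immediate(inst, fmt):
    """Return the signed immediate of a 32-bit instruction word."""
    if fmt == "I":
        return sign_extend(bits(inst, 31, 20), 12)
    if fmt == "S":
        return sign_extend(bits(inst, 31, 25) << 5 | bits(inst, 11, 7), 12)
    if fmt == "SB":   # branch offset, bit 0 is implicitly zero
        imm = (bits(inst, 31, 31) << 12 | bits(inst, 7, 7) << 11 |
               bits(inst, 30, 25) << 5 | bits(inst, 11, 8) << 1)
        return sign_extend(imm, 13)
    if fmt == "U":
        return sign_extend(bits(inst, 31, 12) << 12, 32)
    if fmt == "UJ":   # jump offset, bit 0 is implicitly zero
        imm = (bits(inst, 31, 31) << 20 | bits(inst, 19, 12) << 12 |
               bits(inst, 20, 20) << 11 | bits(inst, 30, 21) << 1)
        return sign_extend(imm, 21)
    raise ValueError(f"unknown format: {fmt}")

print(decode_immediate(0xFFF00093, "I"))   # addi x1, x0, -1  -> -1
print(decode_immediate(0xFE000EE3, "SB"))  # beq x0, x0, -4   -> -4
```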

RISC-V Addressing Summary

[Figure omitted; surviving callout: "…, i.e., displacement addressing"]

R-Format Encoding Example

add x6, x10, x6

funct7    rs2      rs1      funct3   rd       opcode
7 bits    5 bits   5 bits   3 bits   5 bits   7 bits
0         6        10       0        6        51
0000000   00110    01010    000      00110    0110011

0000 0000 0110 0101 0000 0011 0011 0011 (binary) = 00650333 (hexadecimal)
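The field packing in this example can be checked with a few shifts. This is a minimal sketch; `encode_r` is a hypothetical helper name, and the field values are the ones given in the table for add x6, x10, x6.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the six R-format fields into one 32-bit instruction word."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
            (funct3 << 12) | (rd << 7) | opcode)

# add x6, x10, x6: funct7=0, rs2=x6, rs1=x10, funct3=0, rd=x6, opcode=51
word = encode_r(0, 6, 10, 0, 6, 0b0110011)
print(f"{word:#010x}")  # 0x00650333
```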

RISC-V I-Format Instructions

immediate   rs1      funct3   rd       opcode
12 bits     5 bits   3 bits   5 bits   7 bits

• Immediate arithmetic and load instructions
  – rs1: source or base address register number
  – immediate: constant operand, or offset added to base address
    • 2s-complement, sign extended
• Design Principle: Good design demands good compromises
  – Different formats complicate decoding, but allow 32-bit instructions uniformly
  – Keep formats as similar as possible

RISC-V S-Format Instructions

imm[11:5]   rs2      rs1      funct3   imm[4:0]   opcode
7 bits      5 bits   5 bits   3 bits   5 bits     7 bits

• Different immediate format for store instructions
  – rs1: base address register number
  – rs2: source operand register number
  – immediate: offset added to base address
    • Split so that rs1 and rs2 fields always in the same place
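The split immediate of the S format can be illustrated by encoding a store. In this sketch `encode_s` is a hypothetical helper, and the funct3 and opcode values used for sw (010 and 0100011) come from the base ISA opcode tables, not from this slide.

```python
def encode_s(imm, rs2, rs1, funct3, opcode):
    """Pack an S-format instruction; the 12-bit immediate is split in two."""
    imm &= 0xFFF                       # keep the 12-bit two's-complement field
    return (((imm >> 5) << 25) |       # imm[11:5] goes to inst[31:25]
            (rs2 << 20) | (rs1 << 15) | (funct3 << 12) |
            ((imm & 0x1F) << 7) |      # imm[4:0] goes to inst[11:7]
            opcode)

# sw x6, 8(x10)
print(f"{encode_s(8, 6, 10, 0b010, 0b0100011):#010x}")   # 0x00652423
# sw x6, -4(x10): rs1 and rs2 stay in place, only the immediate bits change
print(f"{encode_s(-4, 6, 10, 0b010, 0b0100011):#010x}")  # 0xfe652e23
```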

Integer Computational Instructions (ALU)

• I-type (Immediate); all immediates in all instructions are sign extended
  – ADDI: adds sign-extended 12-bit immediate to rs1
  – SLTI(U): set less than immediate
  – ANDI/ORI/XORI: logical operations
  – SLLI/SRLI/SRAI: shifts by constants
• I-type instructions end with I

Integer Computational Instructions (ALU)

• Immediate instructions; all immediates in all instructions are sign extended
  – LUI/AUIPC (encoded in the U format): load upper immediate / add upper immediate to pc
    • Writes 20-bit immediate to top of destination register.
    • Used to build large immediates.
    • 12-bit immediates are signed, so have to account for sign when building 32-bit immediates in a 2-instruction sequence (LUI high-20b, ADDI low-12b)
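The sign adjustment for the LUI/ADDI pair can be worked through in code. This is a sketch under the assumption of a 32-bit register; `split_hi_lo` and `lui_addi` are hypothetical names. When the low 12 bits have their top bit set, ADDI will subtract, so the upper 20 bits must be one larger to compensate.

```python
def split_hi_lo(value):
    """Split a 32-bit constant into (hi20, lo12) for LUI hi20 ; ADDI lo12."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:                      # ADDI sign-extends this to a negative
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF  # upper part, bumped by one if lo < 0
    return hi, lo

def lui_addi(hi, lo):
    """Model LUI followed by ADDI on a 32-bit register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF

hi, lo = split_hi_lo(0x12345FFF)
print(hex(hi), lo)              # 0x12346 -1
print(hex(lui_addi(hi, lo)))    # 0x12345fff
```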

Integer Computational Instructions

• R-type (Register)
  – rs1 and rs2 are the source registers, rd the destination
  – ADD/SUB: add and subtract
  – SLT, SLTU: set less than
  – SRL, SLL, SRA: shift logical or arithmetic, left or right
• ADDI x0, x0, 0 (the canonical NOP encoding)

Control Transfer Instructions

NO architecturally visible delay slots

• Unconditional jumps: PC + offset → target
  – JAL: Jump and link, also writes PC+4 to x1, UJ-type
    • Offset scaled by 1-bit left shift, so it can jump to a 16-bit instruction boundary (same for branches)
  – JALR: Jump and link register, where Imm (12 bits) + rs1 = target

Control Transfer Instructions

NO architecturally visible delay slots

• Conditional branches: SB-type and PC + offset → target
  – 12-bit signed immediate split across two fields
  – Branches compare two registers; PC + (immediate << 1) → target (signed offset in multiples of two)
  – Branches do not have a delay slot
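The target rule PC + (immediate << 1) also fixes how far a branch can reach. A minimal sketch, with the hypothetical helper `branch_target` taking the 12-bit signed field value:

```python
def branch_target(pc, imm12):
    """Target of a taken branch: pc + (imm12 << 1), imm12 a signed 12-bit field."""
    if not -2048 <= imm12 <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return pc + (imm12 << 1)

pc = 0x1000
print(hex(branch_target(pc, -2)))               # 0xffc, two 16-bit parcels back
# Reach of a conditional branch relative to pc, in bytes:
print(branch_target(0, -2048), branch_target(0, 2047))  # -4096 4094
```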

Loads and Stores

• Store instructions (S-type)
  – MEM(rs1 + imm) = rs2
• Loads (I-type)
  – rd = MEM(rs1 + imm)

Specifications and Software from riscv.org and github.com/riscv

• Specification from RISC-V website
  – https://riscv.org/specifications/
• RISC-V software includes
  – GNU Compiler Collection (GCC) toolchain (with GDB, the debugger)
    • https://github.com/riscv/riscv-tools
  – LLVM toolchain
  – A simulator ("Spike")
    • https://github.com/riscv/riscv-isa-sim
  – Standard simulator QEMU
    • https://github.com/riscv/riscv-qemu
• Operating systems support exists for Linux
  – https://github.com/riscv/riscv-linux
• A JavaScript ISA simulator to run a RISC-V Linux system on a web browser
  – https://github.com/riscv/riscv-angel

RISC-V Implementations

• For RISC-V implementation, UCB created Chisel, an open-source hardware construction language that is a specialized dialect of Scala.
  – Chisel: Constructing Hardware In a Scala Embedded Language
  – https://chisel.eecs.berkeley.edu/
• In-order Rocket core and chip generator
  – https://github.com/freechipsproject/rocket-chip
• Out-of-order BOOM core
  – https://github.com/ucb-bar/riscv-boom
• UCB Sodor cores for education (single cycle, and 1-5 stage pipelines)
  – https://github.com/ucb-bar/riscv-sodor

RISC-V Implementations

• A list from
  – https://riscv.org/risc-v-cores/
• IIT Madras in India is developing six RISC-V open-source CPU designs (SHAKTI) for six distinct usages
  – https://shaktiproject.bitbucket.io/index.html
• SiFive HiFive Unleashed
  – First Linux RISC-V board
    • First shipment: June 2018
  – https://www.sifive.com/
  – https://github.com/sifive/freedom

Additional Information

Calling Convention

• C Datatypes and Alignment
  – RV32 employs an ILP32 integer model, while RV64 is LP64
  – Floating-point types are IEEE 754-2008 compatible
  – All of the data types are kept naturally aligned when stored in memory
  – char is implicitly unsigned
  – In RV64, 32-bit types, such as int, are stored in integer registers as proper sign extensions of their 32-bit values; that is, bits 63..31 are all equal
    • This restriction holds even for unsigned 32-bit types
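The sign-extension rule for 32-bit values in RV64 registers is easy to get wrong for unsigned types, so a short sketch may help. The helper name `to_rv64_register` is an assumption; it returns the 64-bit register image of a 32-bit C value.

```python
def to_rv64_register(value32):
    """64-bit register image of a 32-bit value on RV64: copy bit 31 upward."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:
        value32 |= 0xFFFFFFFF00000000   # bits 63..32 repeat bit 31
    return value32

print(hex(to_rv64_register(1)))            # 0x1
# Even an unsigned int holding 4294967295 is kept sign-extended in the register:
print(hex(to_rv64_register(0xFFFFFFFF)))   # 0xffffffffffffffff
```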

Calling Convention

• RVG Calling Convention
  – If the arguments to a function are conceptualized as fields of a C struct, each with pointer alignment, the argument registers are a shadow of the first eight pointer-words of that struct
    • Floating-point arguments that are part of unions or array fields of structures are passed in integer registers
    • Floating-point arguments to variadic functions (except those that are explicitly named in the parameter list) are passed in integer registers
  – The portion of the conceptual struct that is not passed in argument registers is passed on the stack
    • The stack pointer sp points to the first argument not passed in a register
  – Arguments smaller than a pointer-word are passed in the least-significant bits of argument registers
  – When primitive arguments twice the size of a pointer-word are passed on the stack, they are naturally aligned
    • When they are passed in the integer registers, they reside in an aligned even-odd register pair, with the even register holding the least-significant bits
  – Arguments more than twice the size of a pointer-word are passed by reference

Calling Convention

• The stack grows downward and the stack pointer is always kept 16-byte aligned
• Values are returned from functions in integer registers v0 and v1 and floating-point registers fv0 and fv1
  – Floating-point values are returned in floating-point registers only if they are primitives or members of a struct consisting of only one or two floating-point values
  – Other return values that fit into two pointer-words are returned in v0 and v1
  – Larger return values are passed entirely in memory; the caller allocates this memory region and passes a pointer to it as an implicit first parameter to the callee
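The two stack rules (grows downward, sp stays 16-byte aligned) can be combined in one line of arithmetic. A minimal sketch; `allocate_frame` is a hypothetical helper and the addresses are made up for illustration.

```python
def allocate_frame(sp, frame_bytes):
    """Return the new sp after reserving frame_bytes, rounded to 16 bytes."""
    return (sp - frame_bytes) & ~0xF   # move down, then clear the low 4 bits

sp = 0x7FFFFFF0                        # illustrative, already aligned
new_sp = allocate_frame(sp, 20)        # 20 bytes requested, 32 reserved
print(hex(new_sp), new_sp % 16)        # 0x7fffffd0 0
```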

Memory Model

• RISC-V: Relaxed memory model

Control and Status Register (CSR) Instructions

• CSR instructions
• Timer and counters

Data Formats and Memory Addresses

Data formats: 8-b bytes, 16-b halfwords, 32-b words and 64-b doublewords

Some issues
• Byte addressing
  – Byte addresses within a 32-bit word, listed from most significant byte to least significant byte:
    • Big Endian: 0 1 2 3
    • Little Endian (RISC-V): 3 2 1 0
• Word alignment
  – Suppose the memory is organized in 32-bit words. Can a word address begin only at 0, 4, 8, ....?

[Figure omitted: byte addresses 0 1 2 3 4 5 6 7]
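The two byte orders can be seen by laying out one word in memory. This sketch uses Python's built-in `int.to_bytes`; the example value and the helper `is_word_aligned` are chosen here for illustration.

```python
word = 0x0A0B0C0D                      # 0x0A is the most significant byte

little = word.to_bytes(4, "little")    # RISC-V order: LSB at the lowest address
big = word.to_bytes(4, "big")          # MSB at the lowest address

print(little.hex())  # 0d0c0b0a
print(big.hex())     # 0a0b0c0d

def is_word_aligned(address):
    """True if a 32-bit word starting at this byte address is naturally aligned."""
    return address % 4 == 0

print(is_word_aligned(8), is_word_aligned(6))  # True False
```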

ISA Design

• RISC-V has 32 integer registers and can have 32 floating-point registers
  – Register number 0 is a constant 0
  – Register number 1 is the return address (link register)
• The memory is addressed by 8-bit bytes
• The instructions must be aligned to 32-bit addresses
• Like many RISC designs, it is a "load-store" machine
  – The only instructions that access main memory are loads and stores
  – All arithmetic and logic operations occur between registers
• RISC-V can load and store 8 and 16-bit items, but it lacks 8 and 16-bit arithmetic, including comparison-and-branch instructions
• The 64-bit instruction set includes 32-bit arithmetic

ISA Design for Performance

• Features to increase a computer's speed, while reducing its cost and power usage
  – Placing most-significant bits at a fixed location to speed sign-extension, and a bit-arrangement designed to reduce the number of multiplexers in a CPU

ISA Design

• Intentionally lacks condition codes, and even lacks a carry bit
  – To simplify CPU designs by minimizing interactions between instructions
• Builds comparison operations into its conditional jumps

ISA Design

• The lack of a carry bit complicates multiple-precision arithmetic
  – GMP, MPFR
• Does not detect or flag most arithmetic errors, including overflow, underflow and divide by zero
  – No special instruction set support for overflow checks on integer arithmetic operations.
    • Most popular programming languages do not support checks for integer overflow, partly because most architectures impose a significant runtime penalty to check for overflow on integer arithmetic and partly because modulo arithmetic is sometimes the desired behavior
  – Floating-Point Control and Status Register
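Without a carry bit, a multiple-precision add has to recover the carry with an unsigned comparison, which is what a SLTU after the ADD computes. The sketch below models 64-bit registers with masking; the function names are hypothetical and this is one common idiom, not the only way libraries such as GMP handle it.

```python
MASK64 = (1 << 64) - 1

def add_with_carry_out(a, b):
    """Model ADD then SLTU on 64-bit registers: returns (sum, carry)."""
    total = (a + b) & MASK64          # ADD wraps around modulo 2**64
    carry = 1 if total < a else 0     # SLTU carry, total, a
    return total, carry

def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit numbers held as (lo, hi) pairs of 64-bit words."""
    lo, carry = add_with_carry_out(a_lo, b_lo)
    hi = (a_hi + b_hi + carry) & MASK64
    return lo, hi

print(add128(MASK64, 0, 1, 0))  # (0, 1): the carry moved into the high word
```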

ISA Design

• Lacks the "count leading zero" and bit-field operations normally used to speed software floating-point in a pure-integer processor
• No branch delay slot, a position after a branch instruction that can be filled with an instruction which is executed regardless of whether the branch is taken or not
  – This feature can improve performance of pipelined processors
  – Omitted in RISC-V because it complicates both multicycle CPUs and superscalar CPUs
• Lacks address modes that "write back" to the registers
  – For example, it does not do auto-incrementing

ISA Design

• A load or store can add a twelve-bit signed offset to a register that contains an address. A further 20 bits (yielding a 32-bit address) can be generated at an absolute address
  – RISC-V was designed to permit position-independent code. It has a special instruction to generate 20 upper address bits that are relative to the program counter. The lower twelve bits are provided by normal loads, stores and jumps
  – LUI (load upper immediate) places the U-immediate value in the top 20 bits of the destination register rd, filling in the lowest 12 bits with zeros
  – AUIPC (add upper immediate to pc) is used to build pc-relative addresses: it forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the pc, then places the result in register rd
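The AUIPC description can be modeled directly. This is a sketch for a 32-bit address space; `auipc` and `load_address` are hypothetical names, and the pc value and immediates are made up for illustration. The upper 20 bits come from AUIPC and the lower 12 from the signed offset of the following load, store or jump.

```python
MASK32 = 0xFFFFFFFF

def auipc(pc, imm20):
    """rd = pc + (imm20 << 12), with the low 12 bits of the offset zero."""
    return (pc + ((imm20 & 0xFFFFF) << 12)) & MASK32

def load_address(base, imm12):
    """Effective address of a load: base register plus signed 12-bit offset."""
    if not -2048 <= imm12 <= 2047:
        raise ValueError("offset does not fit in 12 signed bits")
    return (base + imm12) & MASK32

pc = 0x00010000
base = auipc(pc, 0x3)                   # pc + 0x3000
print(hex(base))                        # 0x13000
print(hex(load_address(base, -4)))      # 0x12ffc, i.e. pc + 0x2ffc
```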
