CS 110 Computer Architecture: Review for Midterm I

Post on 16-Mar-2020


CS 110 Computer Architecture
Review for Midterm I

Instructor: Sören Schwertfeger

http://shtech.org/courses/ca/

School of Information Science and Technology SIST

ShanghaiTech University

Slides based on UC Berkeley's CS61C

Midterm I

• Date: Tuesday, Apr. 11
• Time: 10:15 - 12:15 (normal lecture slot)
• Venue: Teaching Center 201 + 203
• One empty seat between students
• Closed book:
  – You can bring one A4 page with notes (both sides; English preferred; Chinese is OK). Write your Chinese and Pinyin name on the top!
  – You will be provided with the MIPS "green sheet"
  – No other material allowed!

Midterm I

• Switch cell phones off! (not silent mode – off!)
  – Put them in your bags.
• Bags under the table. Nothing except paper, pen, 1 drink, 1 snack on the table!
• No other electronic devices are allowed!
  – No earplugs, music, smartwatch…
• Anybody touching any electronic device will FAIL the course!
• Anybody found cheating (copying your neighbor's answers, additional material, ...) will FAIL the course!

Midterm I

• Ask questions today!
• Discussion is a Q&A session
  – Suggest topics for review on Piazza!
  – Next week: example questions.
• This review session does not/cannot cover all possible topics!
• No lab next week… No HW next week…

Content

• Main topics
  – Number representation
  – C
  – MIPS
• Plus general "Computer Architecture" knowledge
• Everything up to and including lecture 8 (CALL)

First finish last week's lecture…

Hyperthreading

• Duplicate all elements that hold the state (registers)
• Use the same CL (combinational logic) blocks
• Use muxes to select which state to use every clock cycle
• => run 2 totally independent threads (same memory -> shared memory!)
• Speedup?
  – No obvious speedup; make use of CL blocks in case of unavailable resources (e.g. wait for memory)

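[Figure: five-stage MIPS datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back) with instruction memory, +4 adder, register file (rs, rt, rd), immediate, ALU and data memory; the PC and the registers appear twice, once per thread.]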

Intel Nehalem i7

• Hyperthreading:
  – About 5% die area
  – Up to 30% speed gain (BUT also < 0% possible)
• Pipeline: 20-24 stages!
• Out-of-order execution
  1. Instruction fetch.
  2. Instruction dispatch to an instruction queue.
  3. Instruction waits in the queue until its input operands are available => an instruction can leave the queue before earlier, older instructions.
  4. The instruction is issued to the appropriate functional unit and executed by that unit.
  5. The results are queued.
  6. Write to register only after all older instructions have their results written.

Old School Machine Structures

[Figure: layers of abstraction, top to bottom]
Software:
  Application (ex: browser)
  Compiler, Assembler, Operating System (Mac OSX)
Instruction Set Architecture
Hardware:
  Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  transistors

New-School Machine Structures (It's a bit more complicated!)

• Parallel Requests: assigned to computer, e.g., search "cats"
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates functioning in parallel at same time

[Figure: software/hardware stack from Warehouse-Scale Computer and Smart Phone down through Computer (cores, memory (cache), input/output, main memory) and Core (instruction unit(s), functional unit(s) computing A0+B0 … A3+B3) to logic gates, captioned "Harness Parallelism & Achieve High Performance" and annotated with Project 1, Project 2 and Project 3.]

6 Great Ideas in Computer Architecture

1. Abstraction (Layers of Representation/Interpretation)
2. Moore's Law (Designing through trends)
3. Principle of Locality (Memory Hierarchy)
4. Parallelism
5. Performance Measurement & Improvement
6. Dependability via Redundancy

#2: Moore's Law

Gordon Moore, Intel cofounder, predicts: 2X transistors/chip every 2 years

Great Idea #3: Principle of Locality / Memory Hierarchy

Great Idea #4: Parallelism

Great Idea #5: Performance Measurement and Improvement

• Tuning application to underlying hardware to exploit:
  – Locality
  – Parallelism
  – Special hardware features, like specialized instructions (e.g., matrix manipulation)
• Latency
  – How long to set the problem up
  – How much faster does it execute once it gets going
  – It is all about time to finish

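Great Idea #6: Dependability via Redundancy

• Redundancy so that a failing piece doesn't make the whole system fail

[Figure: three units compute 1+1; two return 2 and one FAILs by returning 1. Because 2 of 3 agree, the system still outputs 1+1=2.]

• Increasing transistor density reduces the cost of redundancy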

Key Concepts

• Inside computers, everything is a number
• But numbers are usually stored with a fixed size
  – 8-bit bytes, 16-bit half words, 32-bit words, 64-bit double words, …
• Integer and floating-point operations can lead to results too big/small to store within their representations: overflow/underflow

Number Representation

• Value of the i-th digit is d × Base^i, where i starts at 0 and increases from right to left:
• 123_ten = 1_ten × 10_ten^2 + 2_ten × 10_ten^1 + 3_ten × 10_ten^0
          = 1 × 100_ten + 2 × 10_ten + 3 × 1_ten
          = 100_ten + 20_ten + 3_ten
          = 123_ten
• Binary (Base 2), Hexadecimal (Base 16), Decimal (Base 10): different ways to represent an integer
  – We use 1_two, 5_ten, 10_hex to be clearer (vs. 1_2, 4_8, 5_10, 10_16)

Number Representation

• Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
• FFF_hex = 15_ten × 16_ten^2 + 15_ten × 16_ten^1 + 15_ten × 16_ten^0
          = 3840_ten + 240_ten + 15_ten
          = 4095_ten
• 1111 1111 1111_two = FFF_hex = 4095_ten
• May put blanks between groups of binary, octal, or hexadecimal digits to make them easier to parse, like commas in decimal
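The positional rule on the two Number Representation slides can be sketched in C. This is only an illustration: the helper names `digit_value` and `positional_value` are made up here, and the loop uses Horner's rule, which yields the same result as summing d × Base^i digit by digit.

```c
#include <assert.h>
#include <ctype.h>

/* Value of one digit character (0-9, A-F, a-f), or -1 if it is not a digit. */
static int digit_value(char ch) {
    unsigned char c = (unsigned char)ch;
    if (isdigit(c)) return c - '0';
    if (isxdigit(c)) return tolower(c) - 'a' + 10;
    return -1;
}

/* Hypothetical helper: evaluate a digit string in a base from 2 to 16.
   Blanks between digit groups are skipped; invalid input yields 0
   (a simplification made for this sketch). */
unsigned long positional_value(const char *digits, unsigned base) {
    assert(digits != 0 && base >= 2 && base <= 16);
    unsigned long value = 0;
    for (const char *p = digits; *p != '\0'; p++) {
        if (*p == ' ') continue;
        int d = digit_value(*p);
        if (d < 0 || (unsigned)d >= base) return 0;
        value = value * base + (unsigned long)d;  /* shift left one digit, add new digit */
    }
    return value;
}
```

Calling `positional_value("FFF", 16)` and `positional_value("1111 1111 1111", 2)` should both give 4095, matching the slide.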

Signed Integers and Two's-Complement Representation

• Signed integers in C; want ½ numbers < 0, want ½ numbers > 0, and want one 0
• Two's complement treats 0 as positive, so a 32-bit word represents 2^32 integers from -2^31 (–2,147,483,648) to 2^31 - 1 (2,147,483,647)
  – Note: one negative number with no positive version
  – Book lists some other options, all of which are worse
  – Every computer uses two's complement today
• Most-significant bit (leftmost) is the sign bit, since 0 means positive (including 0), 1 means negative
  – Bit 31 is most significant, bit 0 is least significant

Two's-Complement Integers

0000 0000 0000 0000 0000 0000 0000 0000_two = 0_ten
0000 0000 0000 0000 0000 0000 0000 0001_two = 1_ten
0000 0000 0000 0000 0000 0000 0000 0010_two = 2_ten
...
0111 1111 1111 1111 1111 1111 1111 1101_two = 2,147,483,645_ten
0111 1111 1111 1111 1111 1111 1111 1110_two = 2,147,483,646_ten
0111 1111 1111 1111 1111 1111 1111 1111_two = 2,147,483,647_ten
1000 0000 0000 0000 0000 0000 0000 0000_two = –2,147,483,648_ten
1000 0000 0000 0000 0000 0000 0000 0001_two = –2,147,483,647_ten
1000 0000 0000 0000 0000 0000 0000 0010_two = –2,147,483,646_ten
...
1111 1111 1111 1111 1111 1111 1111 1101_two = –3_ten
1111 1111 1111 1111 1111 1111 1111 1110_two = –2_ten
1111 1111 1111 1111 1111 1111 1111 1111_two = –1_ten

The leftmost bit is the sign bit.

Ways to Make Two's Complement

• For an N-bit word, complement to 2_ten^N
  – For the 4-bit number 3_ten = 0011_two, the two's complement (i.e. -3_ten) would be
    16_ten - 3_ten = 13_ten, or 10000_two – 0011_two = 1101_two
• Here is an easier way:
  – Invert all bits and add 1
  – Computers actually do it like this, too

      0011_two    3_ten
      1100_two    bitwise complement
    +    1_two
      --------
      1101_two    -3_ten
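The "invert all bits and add 1" rule can be tried out with a small C sketch. The helper names `negate_nbit` and `to_signed_nbit` are hypothetical, and bit patterns are kept in a `uint32_t` and masked to the chosen width so that the 4-bit example from the slide can be reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: two's-complement negation of an n-bit pattern (1 <= n <= 32). */
uint32_t negate_nbit(uint32_t bits, unsigned n) {
    assert(n >= 1 && n <= 32);
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return (~bits + 1u) & mask;   /* invert all bits, add 1, keep the low n bits */
}

/* Hypothetical helper: read an n-bit pattern as a signed two's-complement value. */
int32_t to_signed_nbit(uint32_t bits, unsigned n) {
    assert(n >= 1 && n <= 32);
    uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    uint32_t sign = 1u << (n - 1);
    bits &= mask;
    /* Sign bit set: the value is bits - 2^n (computed in 64 bits to stay in range). */
    if (bits & sign) return (int32_t)((int64_t)bits - ((int64_t)1 << n));
    return (int32_t)bits;
}
```

With n = 4, `negate_nbit(0x3, 4)` gives 0xD (1101_two), which `to_signed_nbit` reads back as -3. Negating 1000_two returns 1000_two again, which is the one negative number with no positive version.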

Two's-Complement Examples

• Assume for simplicity 4-bit width, -8 to +7 represented

      3     0011          3     0011          7     0111
     +2     0010       +(-2)    1110         +1     0001
     --     ----       -----  ------         --     ----
      5     0101          1   1 0001         -8     1000   Overflow!

     -3     1101         -8     1000
   +(-2)    1110       +(-1)    1111
   -----  ------       -----  ------
     -5   1 1011         +7   1 0111   Overflow!

• No overflow: carry into MSB = carry out of MSB
• Overflow: carry into MSB ≠ carry out of MSB
• Overflow occurs when the magnitude of the result is too big to fit into the result representation
• Carry in = carry from less significant bits; carry out = carry to more significant bits
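The carry rule for detecting overflow can be checked against the five 4-bit examples with a short C sketch. The function name `add_overflows` and its signature are assumptions made for illustration; real hardware compares the two carries directly in the adder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: add two n-bit patterns (2 <= n <= 31) and report signed
   overflow by comparing the carry into the MSB with the carry out of the MSB. */
bool add_overflows(uint32_t a, uint32_t b, unsigned n, uint32_t *sum) {
    assert(sum != 0 && n >= 2 && n <= 31);
    uint32_t mask = (1u << n) - 1u;   /* the low n bits */
    uint32_t low_mask = mask >> 1;    /* every bit below the MSB */
    a &= mask;
    b &= mask;
    uint32_t carry_in  = ((a & low_mask) + (b & low_mask)) >> (n - 1);  /* into MSB */
    uint32_t carry_out = (a + b) >> n;                                  /* out of MSB */
    *sum = (a + b) & mask;            /* the carry out itself is discarded */
    return carry_in != carry_out;
}
```

For example, adding 0111_two and 0001_two with n = 4 reports overflow and leaves 1000_two in `*sum`, while 0011_two plus 1110_two reports no overflow and leaves 0001_two.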

Suppose we had a 5-bit word. What integers can be represented in two's complement?

☐ 0 to +31
☐ -16 to +15
☐ -32 to +31


Components of a Computer

[Figure: processor (control; datapath with PC, registers, arithmetic & logic unit (ALU)) connected to memory (program and data bytes) through the processor-memory interface (enable?, read/write, address, write data, read data); input and output attach through the I/O-memory interfaces.]

C Programming

Quiz: Pointers

    void foo(int *x, int *y)
    {
        int t;
        if ( *x > *y ) { t = *y; *y = *x; *x = t; }
    }
    int a=3, b=2, c=1;
    foo(&a, &b);
    foo(&b, &c);
    foo(&a, &b);
    printf("a=%d b=%d c=%d\n", a, b, c);

Result is:
A: a=3 b=2 c=1
B: a=1 b=2 c=3
C: a=1 b=3 c=2
D: a=3 b=3 c=3
E: a=1 b=1 c=1
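One way to check an answer to the pointer quiz is to wrap the statements in a function and inspect the final values. The sketch below copies `foo` from the quiz; the `struct abc` type and `run_pointer_quiz` wrapper are hypothetical additions so that the three values can be returned instead of printed.

```c
#include <assert.h>

/* foo from the quiz: if *x > *y, swap them, so the smaller value ends up in *x. */
void foo(int *x, int *y) {
    assert(x != 0 && y != 0);
    int t;
    if (*x > *y) { t = *y; *y = *x; *x = t; }
}

/* Hypothetical wrapper type and function, not part of the quiz. */
struct abc { int a, b, c; };

struct abc run_pointer_quiz(void) {
    int a = 3, b = 2, c = 1;
    foo(&a, &b);   /* 3 > 2: swap, so a=2 b=3 c=1 */
    foo(&b, &c);   /* 3 > 1: swap, so a=2 b=1 c=3 */
    foo(&a, &b);   /* 2 > 1: swap, so a=1 b=2 c=3 */
    struct abc result = { a, b, c };
    return result;
}
```

The three calls behave like one pass of a bubble sort followed by a second partial pass, which is why the values come out in ascending order.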

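Arrays and Pointers

    int foo(int array[], unsigned int size)
    {
        …
        printf("%zu\n", sizeof(array));    /* What does this print? 8 */
    }

    int main(void)
    {
        int a[10], b[5];
        int c[] = {1, 3, 2, 5, 6};
        … foo(a, 10) … foo(c, 5) …
        printf("%zu\n", sizeof(c));        /* What does this print? 20 */
    }

• The printf in foo prints 8 because array is really a pointer (the size of a pointer is architecture dependent, but likely to be 8 bytes on modern 64-bit machines)
• The printf in main prints 20 because c is a real array of 5 ints (assuming 4-byte int)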

Quiz:

    int x[] = { 2, 4, 6, 8, 10 };
    int *p = x;
    int **pp = &p;
    (*pp)++;
    (*(*pp))++;
    printf("%d\n", *p);

Result is:
A: 2
B: 3
C: 4
D: 5
E: None of the above
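The pointer-to-pointer quiz can be traced step by step in a small function. The name `pointer_quiz` and its parameters are made up for this sketch; the three middle statements are the ones from the quiz.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: runs the quiz statements on a caller-supplied array
   and returns what the printf would print. */
int pointer_quiz(int *x, size_t len) {
    assert(x != NULL && len >= 2);
    int *p = x;
    int **pp = &p;
    (*pp)++;       /* increments p itself: p now points at x[1] */
    (*(*pp))++;    /* increments the int p points to: x[1] goes up by one */
    return *p;
}
```

Called on the quiz array { 2, 4, 6, 8, 10 }, the first increment moves the pointer and the second changes the array element, so the function returns 5 and leaves x[1] equal to 5.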

C Memory Management

• Program's address space contains 4 regions:
  – stack: local variables inside functions, grows downward
  – heap: space requested for dynamic data via malloc(); resizes dynamically, grows upward
  – static data: variables declared outside functions, does not grow or shrink. Loaded when program starts, can be modified.
  – code: loaded when program starts, does not change

Memory address (32 bits assumed here):

    ~FFFF FFFF_hex    stack
                      heap
                      static data
    ~0000 0000_hex    code

The Stack

• Every time a function is called, a new frame is allocated on the stack
• Stack frame includes:
  – Return address (who called me?)
  – Arguments
  – Space for local variables
• Stack frames are contiguous blocks of memory; stack pointer indicates start of stack frame
• When function ends, stack frame is tossed off the stack; frees memory for future stack frames
• We'll cover details later for MIPS processor

    fooA() { fooB(); }
    fooB() { fooC(); }
    fooC() { fooD(); }

[Figure: stack holding, in call order, the fooA frame, fooB frame, fooC frame and fooD frame, with the stack pointer at the fooD frame.]

Question!

    int x = 2;
    int result;

    int foo(int n)
    {
        int y;
        if (n <= 0) { printf("End case!\n"); return 0; }
        else {
            y = n + foo(n-x);
            return y;
        }
    }
    result = foo(10);

Right after the printf executes but before the return 0, how many copies of x and y are there allocated in memory?

A: #x = 1, #y = 1
B: #x = 1, #y = 5
C: #x = 5, #y = 1
D: #x = 1, #y = 6
E: #x = 6, #y = 6
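To reason about this question it can help to count how many activations of foo are live at the deepest point of the recursion. The sketch below instruments the function from the slide; the counters `live_frames` and `max_frames` and the wrapper `max_live_frames` are additions for illustration only, and the printf is dropped to keep the sketch free of output.

```c
#include <assert.h>

static int x = 2;            /* declared outside any function: one copy in static data */
static int live_frames = 0;  /* instrumentation, not on the slide */
static int max_frames = 0;   /* instrumentation, not on the slide */

int foo(int n) {
    int y;                   /* one y per live stack frame */
    assert(x > 0);           /* otherwise the recursion would never terminate */
    live_frames++;
    if (live_frames > max_frames) max_frames = live_frames;
    if (n <= 0) {            /* the "End case!" branch */
        live_frames--;
        return 0;
    }
    y = n + foo(n - x);
    live_frames--;
    return y;
}

/* Hypothetical wrapper: deepest number of simultaneously live frames for foo(n). */
int max_live_frames(int n) {
    live_frames = 0;
    max_frames = 0;
    (void)foo(n);
    return max_frames;
}
```

For foo(10) the calls are foo(10), foo(8), foo(6), foo(4), foo(2) and foo(0), so six frames (and six copies of y) exist when the end case is reached, while x exists once.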

Faulty Heap Management

• What is wrong with this code?
• Memory leak!

    int foo() {
        int *value = malloc(sizeof(int));
        *value = 42;
        return *value;    /* value is never freed */
    }
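One possible repair for the leaking function is to copy the result out and free the block before returning. This is a sketch, not the slide's own solution: the name `foo_fixed` and the use of -1 to signal a failed allocation are assumptions.

```c
#include <stdlib.h>

/* Hypothetical fixed version of foo: the heap block is released before returning. */
int foo_fixed(void) {
    int *value = malloc(sizeof *value);
    if (value == NULL) {
        return -1;          /* assumption: -1 signals allocation failure */
    }
    *value = 42;
    int result = *value;    /* copy the value out while the block is still valid */
    free(value);            /* every malloc needs a matching free */
    return result;
}
```

In a function this small the heap allocation is unnecessary at all; a plain local `int` on the stack would avoid the problem entirely.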

Using Memory You Don't Own

• What is wrong with this code?

    int* init_array(int *ptr, int new_size) {
        ptr = realloc(ptr, new_size*sizeof(int));
        memset(ptr, 0, new_size*sizeof(int));
        return ptr;
    }

    int* fill_fibonacci(int *fib, int size) {
        int i;
        init_array(fib, size);
        /* fib[0] = 0; */ fib[1] = 1;
        for (i=2; i<size; i++)
            fib[i] = fib[i-1] + fib[i-2];
        return fib;
    }

Using Memory You Don't Own

• Improperly matched usage of memory handles

    int* init_array(int *ptr, int new_size) {
        ptr = realloc(ptr, new_size*sizeof(int));    /* Remember: realloc may move entire block */
        memset(ptr, 0, new_size*sizeof(int));
        return ptr;
    }

    int* fill_fibonacci(int *fib, int size) {
        int i;
        /* oops, forgot: fib = */ init_array(fib, size);
        /* fib[0] = 0; */ fib[1] = 1;                /* What if array is moved to new location? */
        for (i=2; i<size; i++)
            fib[i] = fib[i-1] + fib[i-2];
        return fib;
    }

And In Conclusion, …

• Pointers are an abstraction of machine memory addresses
• Pointer variables are held in memory, and pointer values are just numbers that can be manipulated by software
• In C, close relationship between array names and pointers
• Pointers know the type of the object they point to (except void *)
• Pointers are powerful but potentially dangerous

And In Conclusion, …

• C has three main memory segments in which to allocate data:
  – Static Data: Variables outside functions
  – Stack: Variables local to function
  – Heap: Objects explicitly malloc-ed/free-d.
• Heap data is biggest source of bugs in C code

In the News… Intel Hyper-Scale

• Intel's Moore's Law interpretation: cost per transistor halves every 2 years
• Hyperscaling
• Multiple dies on one carrier

MIPS

Addition and Subtraction of Integers: Example 1

• How to do the following C statement?
      a = b + c + d - e;
  b → $s1; c → $s2; d → $s3; e → $s4; a → $s0
• Break into multiple instructions (evaluates a = ((b + c) + d) - e;):
      add $t0, $s1, $s2   # temp = b + c
      add $t0, $t0, $s3   # temp = temp + d
      sub $s0, $t0, $s4   # a = temp - e
• A single line of C may break up into several lines of MIPS.
• Notice the use of temporary registers: don't want to modify the variable registers $s
• Everything after the hash mark on each line is ignored (comments)

Overflow Handling in MIPS

• Some languages detect overflow (Ada), some don't (most C implementations)
• MIPS solution is 2 kinds of arithmetic instructions:
  – These cause overflow to be detected
    • add (add)
    • add immediate (addi)
    • subtract (sub)
  – These do not cause overflow detection
    • add unsigned (addu)
    • add immediate unsigned (addiu)
    • subtract unsigned (subu)
• Compiler selects appropriate arithmetic
  – MIPS C compilers produce addu, addiu, subu

Question: We want to translate *x = *y + 1 into MIPS (x, y are int pointers stored in $s0, $s1)

A:  addi $s0,$s1,1

B:  lw   $s0,1($s1)
    sw   $s1,0($s0)

C:  lw   $t0,0($s1)
    addi $t0,$t0,1
    sw   $t0,0($s0)

D:  sw   $t0,0($s1)
    addi $t0,$t0,1
    lw   $t0,0($s0)

E:  lw   $s0,1($t0)
    sw   $s1,0($t0)

Executing a Program

[Figure: processor (control; datapath with PC, registers, arithmetic & logic unit (ALU)) sends the instruction address to memory (program and data bytes) and reads back the instruction bits.]

• The PC (program counter) is an internal register inside the processor holding the byte address of the next instruction to be executed.
• The instruction is fetched from memory, then the control unit executes the instruction using the datapath and memory system, and updates the program counter (default is to add +4 bytes to the PC, to move to the next sequential instruction)

Question!

            addi $s0,$zero,0
    Start:  slt  $t0,$s0,$s1
            beq  $t0,$zero,Exit
            sll  $t1,$s0,2
            addu $t1,$t1,$s5
            lw   $t1,0($t1)
            add  $s4,$s4,$t1
            addi $s0,$s0,1
            j    Start
    Exit:

What is the code above?
A: while loop
B: do…while loop
C: for loop
D: A or C
E: Not a loop
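A rough C reading of the assembly in this question may help when comparing the loop forms. The register-to-variable mapping is an assumption made for this sketch ($s0 = i, $s1 = n, $s5 = base address of an int array, $s4 = running sum), and the function names are made up.

```c
#include <assert.h>

/* Reading 1: for loop. The test (slt/beq) runs before the first iteration. */
int sum_with_for(const int *a, int n, int sum) {
    assert(a != 0 || n <= 0);
    for (int i = 0; i < n; i++) {   /* addi $s0,$zero,0 ; slt/beq ; addi $s0,$s0,1 */
        sum += a[i];                /* sll/addu/lw compute and load a[i] ; add $s4,$s4,$t1 */
    }
    return sum;
}

/* Reading 2: while loop with the same test-first structure. */
int sum_with_while(const int *a, int n, int sum) {
    assert(a != 0 || n <= 0);
    int i = 0;
    while (i < n) {
        sum += a[i];
        i++;
    }
    return sum;
}
```

Both functions compile to the same test-at-the-top shape. A do…while loop would differ, because its body runs once before the condition is checked.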

MIPS Function Call Conventions

• Registers are faster than memory, so use them
• $a0–$a3: four argument registers to pass parameters ($4 - $7)
• $v0, $v1: two value registers to return values ($2, $3)
• $ra: one return address register to return to the point of origin ($31)

Instruction Support for Functions (1/4)

C:
    ... sum(a,b); ... /* a,b: $s0,$s1 */
    }
    int sum(int x, int y) {
        return x+y;
    }

MIPS:
    address (shown in decimal)
    1000
    1004
    1008
    1012
    1016
    …
    2000
    2004

In MIPS, all instructions are 4 bytes, and stored in memory just like data. So here we show the addresses of where the programs are stored.

Instruction Support for Functions (2/4)

C:
    ... sum(a,b); ... /* a,b: $s0,$s1 */
    }
    int sum(int x, int y) {
        return x+y;
    }

MIPS:
    address (shown in decimal)
    1000       add  $a0,$s0,$zero    # x = a
    1004       add  $a1,$s1,$zero    # y = b
    1008       addi $ra,$zero,1016   # $ra=1016
    1012       j    sum              # jump to sum
    1016       …                     # next instruction
    …
    2000 sum:  add  $v0,$a0,$a1
    2004       jr   $ra              # new instr. "jump register"

Instruction Support for Functions (3/4)

C:
    ... sum(a,b); ... /* a,b: $s0,$s1 */
    }
    int sum(int x, int y) {
        return x+y;
    }

MIPS:
    2000 sum:  add  $v0,$a0,$a1
    2004       jr   $ra              # new instr. "jump register"

• Question: Why use jr here? Why not use j?
• Answer: sum might be called from many places, so we can't return to a fixed place. The procedure calling sum must be able to say "return here" somehow.

Instruction Support for Functions (4/4)

• Single instruction to jump and save return address: jump and link (jal)
• Before:
    1008 addi $ra,$zero,1016   # $ra=1016
    1012 j    sum              # goto sum
• After:
    1008 jal  sum              # $ra=1012, goto sum
• Why have a jal?
  – Make the common case fast: function calls are very common.
  – Don't have to know where code is in memory with jal!

Question

• Which statement is FALSE?

A: MIPS uses jal to invoke a function and jr to return from a function
B: jal saves PC+1 in $ra
C: The callee can use temporary registers ($ti) without saving and restoring them
D: The caller can rely on save registers ($si) without fear of callee changing them

Stack Before, During, After Call

Basic Structure of a Function

Prologue:
    entry_label:
        addi $sp,$sp, -framesize
        sw   $ra, framesize-4($sp)   # save $ra
        save other regs if need be

Body: (call other functions…)
        ...

Epilogue:
        restore other regs if need be
        lw   $ra, framesize-4($sp)   # restore $ra
        addi $sp,$sp, framesize
        jr   $ra

[Figure: memory diagram showing ra saved in the stack frame.]

Instruction Formats

• I-format: used for instructions with immediates, lw and sw (since offset counts as an immediate), and branches (beq and bne)
  – (but not the shift instructions; later)
• J-format: used for j and jal
• R-format: used for all other instructions
• It will soon become clear why the instructions have been partitioned in this way

R-Format Instructions (1/5)

• Define "fields" of the following number of bits each: 6 + 5 + 5 + 5 + 5 + 6 = 32

    6        5    5    5    5       6

• For simplicity, each field has a name:

    opcode   rs   rt   rd   shamt   funct

• Important: On these slides and in the book, each field is viewed as a 5- or 6-bit unsigned integer, not as part of a 32-bit integer
  – Consequence: 5-bit fields can represent any number 0-31, while 6-bit fields can represent any number 0-63

I-Format Instructions (2/4)

• Define "fields" of the following number of bits each: 6 + 5 + 5 + 16 = 32 bits

    6        5    5    16

  – Again, each field has a name:

    opcode   rs   rt   immediate

  – Key Concept: Only one field is inconsistent with R-format. Most importantly, opcode is still in the same location.

I-Format Example (2/2)

• MIPS Instruction:

    addi $21,$22,-50

Decimal/field representation:

    8        22      21      -50

Binary/field representation:

    001000   10110   10101   1111111111001110

Hexadecimal representation: 22D5 FFCE_hex
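The field packing in the I-format example can be reproduced with shifts and ORs. The sketch below uses a hypothetical helper named `encode_i_format`; the field positions (opcode in bits 31-26, rs in 25-21, rt in 20-16, immediate in 15-0) follow the 6 + 5 + 5 + 16 layout given on the I-format slide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the four I-format fields into one 32-bit instruction word. */
uint32_t encode_i_format(unsigned opcode, unsigned rs, unsigned rt, int16_t imm) {
    assert(opcode < 64 && rs < 32 && rt < 32);   /* 6-bit and 5-bit unsigned fields */
    return ((uint32_t)opcode << 26)
         | ((uint32_t)rs << 21)
         | ((uint32_t)rt << 16)
         | (uint32_t)(uint16_t)imm;              /* keep the low 16 bits of the two's-complement immediate */
}
```

For `addi $21,$22,-50` the call is `encode_i_format(8, 22, 21, -50)`, which gives 0x22D5FFCE, the hexadecimal representation shown on the slide. Note that rs is the source register $22 and rt is the destination $21.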

Branch Example (1/2)

• MIPS Code:

    Loop: beq   $9,$0,End
          addu  $8,$8,$10     # 1
          addiu $9,$9,-1      # 2
          j     Loop          # 3
    End:

• I-Format fields:
    opcode = 4 (look up on Green Sheet)
    rs = 9 (first operand)
    rt = 0 (second operand)
    immediate = ???

• Start counting from the instruction AFTER the branch: End is 3 instructions further on, so immediate = 3

Branch Example (2/2)

• MIPS Code:

    Loop: beq   $9,$0,End
          addu  $8,$8,$10
          addiu $9,$9,-1
          j     Loop
    End:

Field representation (decimal), bits 31 to 0:

    4        9       0       3

Field representation (binary), bits 31 to 0:

    000100   01001   00000   0000000000000011
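The rule "start counting from the instruction after the branch" amounts to a small calculation on byte addresses. The sketch below is illustrative only: the function names are made up, and the addresses used in the example (Loop placed at address 0) are an assumption, since the slide does not give any.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: immediate field for a branch at branch_addr jumping to target_addr.
   The offset is counted in instructions (words) from the instruction after the branch. */
int16_t branch_offset(uint32_t branch_addr, uint32_t target_addr) {
    int64_t diff = (int64_t)target_addr - ((int64_t)branch_addr + 4);
    assert(diff % 4 == 0);                              /* instructions are word aligned */
    int64_t words = diff / 4;
    assert(words >= INT16_MIN && words <= INT16_MAX);   /* must fit in the 16-bit field */
    return (int16_t)words;
}

/* Inverse: the address a taken branch goes to, PC + 4 + (immediate * 4). */
uint32_t branch_target(uint32_t branch_addr, int16_t imm) {
    return branch_addr + 4u + (uint32_t)((int32_t)imm * 4);
}
```

If the beq sits at address 0, then End is at address 16 and `branch_offset(0, 16)` gives 3, the immediate shown in the field representation. A backward branch simply yields a negative immediate.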

J-Format Instructions (2/4)

• Define two "fields" of these bit widths (bits 31 to 0):

    6        26

• As usual, each field has a name:

    opcode   target address

• Key Concepts:
  – Keep opcode field identical to R-Format and I-Format for consistency
  – Collapse all other fields to make room for large target address

Summary

• I-Format: instructions with immediates, lw/sw (offset is immediate), and beq/bne
  – But not the shift instructions
  – Branches use PC-relative addressing
• J-Format: j and jal (but not jr)
  – Jumps use absolute addressing
• R-Format: all other instructions

    I:  opcode   rs   rt   immediate
    J:  opcode   target address
    R:  opcode   rs   rt   rd   shamt   funct

Assembler Pseudo-Instructions

• Certain C statements are implemented unintuitively in MIPS
  – e.g. assignment (a=b) via add $zero
• MIPS has a set of "pseudo-instructions" to make programming easier
  – More intuitive to read, but get translated into actual instructions later
• Example:
    move dst,src
  translated into
    addi dst,src,0

Multiply and Divide

• Example pseudo-instruction:
    mul $rd,$rs,$rt
  – Consists of mult, which stores the output in the special hi and lo registers, and a move from these registers to $rd
    mult $rs,$rt
    mflo $rd
• mult and div have nothing important in the rd field since the destination registers are hi and lo
• mfhi and mflo have nothing important in the rs and rt fields since the source is determined by the instruction (see COD)
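What mult and mflo do together can be modelled in C with a 64-bit product split into two 32-bit halves. This is a sketch under assumptions: the `struct hi_lo` type and the function names are made up, and the final cast back to `int32_t` relies on the usual two's-complement conversion, which is implementation-defined in older C standards.

```c
#include <stdint.h>

/* Hypothetical model of the special hi and lo registers. */
struct hi_lo {
    uint32_t hi;   /* upper 32 bits of the product */
    uint32_t lo;   /* lower 32 bits of the product */
};

/* Model of mult: signed 32 x 32 -> 64-bit product, split across hi and lo. */
struct hi_lo mult(int32_t rs, int32_t rt) {
    int64_t product = (int64_t)rs * (int64_t)rt;
    struct hi_lo r;
    r.hi = (uint32_t)((uint64_t)product >> 32);
    r.lo = (uint32_t)(uint64_t)product;
    return r;
}

/* Model of the mul pseudo-instruction: mult followed by mflo. */
int32_t mul(int32_t rs, int32_t rt) {
    return (int32_t)mult(rs, rt).lo;
}
```

The model makes it visible that mul keeps only the low word: `mult(0x10000, 0x10000)` leaves 1 in hi and 0 in lo, so a product that does not fit in 32 bits is silently truncated by mflo.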

Question

Which of the following place the address of LOOP in $v0?

1)        la   $t1, LOOP
          lw   $v0, 0($t1)

2)        jal  LOOP
    LOOP: addu $v0, $ra, $zero

3)        la   $v0, LOOP

        1  2  3
    A)  T, T, T
    B)  T, T, F
    C)  F, T, T
    D)  F, T, F
    E)  F, F, T

Steps in Compiling a C Program

• Compiler converts a single HLL file into a single assembly language file.
• Assembler removes pseudo-instructions, converts what it can to machine language, and creates a checklist for the linker (relocation table). A .s file becomes a .o file.
  – Does 2 passes to resolve addresses, handling internal forward references
• Linker combines several .o files and resolves absolute addresses.
  – Enables separate compilation, libraries that need not be compiled, and resolves remaining addresses
• Loader loads executable into memory and begins execution.

Pseudo-instruction Replacement

• Assembler treats convenient variations of machine language instructions as if real instructions

    Pseudo:                Real:
    subu $sp,$sp,32        addiu $sp,$sp,-32
    sd   $a0, 32($sp)      sw    $a0, 32($sp)
                           sw    $a1, 36($sp)
    mul  $t7,$t6,$t5       mult  $t6,$t5
                           mflo  $t7
    addu $t0,$t6,1         addiu $t0,$t6,1
    ble  $t0,100,loop      slti  $at,$t0,101
                           bne   $at,$0,loop
    la   $a0, str          lui   $at,left(str)
                           ori   $a0,$at,right(str)
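The la replacement works because a 32-bit address can be split into two 16-bit halves that fit the immediate fields of lui and ori. The C sketch below models that split; the function names are hypothetical, with `upper_half` and `lower_half` standing in for what the table calls left(str) and right(str).

```c
#include <stdint.h>

/* Upper 16 bits of an address: the immediate for lui, left(str) in the table. */
uint16_t upper_half(uint32_t addr) {
    return (uint16_t)(addr >> 16);
}

/* Lower 16 bits of an address: the immediate for ori, right(str) in the table. */
uint16_t lower_half(uint32_t addr) {
    return (uint16_t)(addr & 0xFFFFu);
}

/* Model of the lui/ori pair: lui places upper in bits 31-16 and clears bits 15-0,
   then ori fills in the lower 16 bits. */
uint32_t lui_ori(uint16_t upper, uint16_t lower) {
    uint32_t reg = (uint32_t)upper << 16;   /* lui */
    return reg | lower;                     /* ori zero-extends its immediate */
}
```

For an arbitrary example address such as 0x10010004 (chosen only for illustration), the halves are 0x1001 and 0x0004, and recombining them gives back the original address. ori is used here because it zero-extends its immediate, so the lower half never disturbs the upper bits.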

Question

At what point in the process are all the machine code bits generated for the following assembly instructions:
1) addu $6, $7, $8
2) jal fprintf

A: 1) & 2) After compilation
B: 1) After compilation, 2) After assembly
C: 1) After assembly, 2) After linking
D: 1) After assembly, 2) After loading
E: 1) After compilation, 2) After linking