Review Problem 0
As you wait for class to start, answer the following question: What is important in a computer? What features do you look for when buying one?
Review Problem 1
What aspects of a microprocessor can affect performance?
Review Problem 2
If a 200 MHz machine runs ½ billion instructions in 10 seconds, what is the CPI of the machine?
If a second machine with the same CPI runs the program in 5 seconds, what is its clock rate?
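One way to check an answer here is the relation execution time = instruction count × CPI / clock rate. The sketch below is only a numeric check under that relation; the function names `cpi` and `clock_rate_hz` are illustrative and do not come from the slides.

```python
def cpi(clock_hz, seconds, instructions):
    """Cycles per instruction = total clock cycles / instruction count."""
    return clock_hz * seconds / instructions


def clock_rate_hz(cpi_value, instructions, seconds):
    """Clock rate needed to finish the given instruction count in the given time."""
    return cpi_value * instructions / seconds


# First machine: 200 MHz, half a billion instructions, 10 seconds.
first_cpi = cpi(200e6, 10, 0.5e9)
# Second machine: same CPI and program, 5 seconds.
second_clock = clock_rate_hz(first_cpi, 0.5e9, 5)

print(first_cpi)           # cycles per instruction
print(second_clock / 1e6)  # clock rate in MHz
```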
Review Problem 3
A program is 20% multiplication, 50% memory access, 30% other. You can quadruple multiplication speed, or double memory speed.
How much faster with 4x mult:
How much faster with 2x memory:
How much faster with both 4x mult & 2x memory:
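The three cases can be checked with Amdahl's law: divide each fraction of the original run time by its speedup factor and take the reciprocal of the sum. This is a minimal sketch; `speedup` and `MIX` are illustrative names, and it assumes the percentages are fractions of execution time.

```python
def speedup(fractions, factors):
    """Overall speedup when each fraction of run time is sped up by its factor."""
    new_time = sum(f / s for f, s in zip(fractions, factors))
    return 1.0 / new_time


MIX = (0.20, 0.50, 0.30)  # multiplication, memory access, other

mult_only = speedup(MIX, (4, 1, 1))  # 4x multiplication
mem_only = speedup(MIX, (1, 2, 1))   # 2x memory
both = speedup(MIX, (4, 2, 1))       # both improvements together

for label, value in (("4x mult", mult_only), ("2x memory", mem_only), ("both", both)):
    print(f"{label}: {value:.3f}x")
```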
Review Problem 4
In assembly, compute the average of $a0, $a1, $a2, $a3, and put into $v0
Review Problem 5
In assembly, replace the value in $a0 with its absolute value.
Review Problem 6
Register $a0 has the address of a 3-integer array. Set $v0 to 1 if the array is sorted (smallest to largest), 0 otherwise.
Review Problem 7
Sometimes it can be useful to have a program loop infinitely. We can do that, regardless of location, with the instruction:
LOOP: BEQ $7, $7, LOOP
Convert this instruction to machine code.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
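A result can be checked by packing the I-type fields in software. The sketch below assumes the standard MIPS32 I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) and a branch offset counted in words from the instruction after the branch; `encode_i_type` is an illustrative helper, not something from the slides.

```python
def encode_i_type(opcode, rs, rt, imm16):
    """Pack opcode[31:26], rs[25:21], rt[20:16], immediate[15:0] into one word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


BEQ_OPCODE = 0b000100

# The offset is relative to PC + 4 and counted in words, so branching back
# to the branch itself needs an offset of -1 (0xFFFF in 16-bit two's complement).
word = encode_i_type(BEQ_OPCODE, 7, 7, -1)

print(f"{word:032b}")
print(hex(word))  # 0x10e7ffff
```

Because the offset is PC-relative, the same word loops in place wherever it is loaded, which is the "regardless of location" property the problem mentions.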
Review Problem 8
What does the number 11001₂ represent?
Review Problem 9
Perform the following binary computations.
  1 0 1 1 0
+ 0 0 1 1 1

  1 0 0 1
- 0 0 1 1
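Hand-computed results can be checked with a short script. This sketch treats the operands as fixed-width bit strings and discards any carry or borrow out of the top bit; the helper names `add_bits` and `sub_bits` are illustrative.

```python
def add_bits(a, b, width):
    """Add two binary strings and keep the low `width` bits of the result."""
    total = int(a, 2) + int(b, 2)
    return format(total & ((1 << width) - 1), f"0{width}b")


def sub_bits(a, b, width):
    """Subtract b from a and keep the low `width` bits (two's complement wraparound)."""
    difference = int(a, 2) - int(b, 2)
    return format(difference & ((1 << width) - 1), f"0{width}b")


print(add_bits("10110", "00111", 5))
print(sub_bits("1001", "0011", 4))
```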
Review Problem 10
How would the ALU be used to help with each of the following branches? The first is filled in for you:
beq ($rs == $rt): subtract $rt from $rs, use zero flag
bne ($rs != $rt):
bgez ($rs ≥ 0):
bgtz ($rs > 0):
blez ($rs ≤ 0):
bltz ($rs < 0):
Review Problem 11
Design a 4-bit sra (shift arithmetic right) unit. Note that sra $t0, 1 = $t0/2, sra $t0, 2 = $t0/4, …
Review Problem 12
Write MIPS assembly to compute $t1 = $t0*5 without using a multiply or divide instruction.
Review Problem 13
What is the value of the following floating-point number?
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 1 1 0
Review Problem 14
What is done for these ops during each of the CPU’s execute steps at right?
add $t0, $t1, $t2
sw $t3, 16[$t4]
lw $t5, 8[$t6]
Execute steps: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction
Review Problem 15
Immediate vals for ADDI are sign-extended, while those for ORI are extended with zeros. Build a sign-extend unit that can handle both.
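A behavioral model can help when checking a gate-level design. In the sketch below, each of the upper 16 output bits is the AND of a control input and bit 15 of the immediate; the control name `sign_extend` and the function `extend16` are assumptions made for illustration, not signal names from the slides.

```python
def extend16(imm16, sign_extend):
    """Extend a 16-bit immediate to 32 bits.

    sign_extend=True  -> copy bit 15 into bits 31..16 (ADDI-style)
    sign_extend=False -> fill bits 31..16 with zeros (ORI-style)
    """
    imm16 &= 0xFFFF
    bit15 = (imm16 >> 15) & 1
    upper_bit = bit15 if sign_extend else 0  # models: upper = sign_extend AND imm[15]
    upper = 0xFFFF0000 if upper_bit else 0
    return upper | imm16


print(hex(extend16(0x8001, True)))
print(hex(extend16(0x8001, False)))
```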
Review Problem 16
Develop a single-cycle CPU that can do LW and SW (only). Make it as simple as possible.
Review Problem 17
What mods are needed to support jump register: PC = Reg[RS]
[Figure: single-cycle datapath with Instruction Fetch Unit, Register File (Aw, Aa, Ab, Da, Dw, Db, WrEn), Sign Extend, ALU (Zero output), and Data Memory (WrEn, Addr, Din, Dout); instruction fields Rs [25:21], Rt [20:16], Rd [15:11], Imm16 [15:0]; control signals Branch, Jump, ALUSrc, RegDst, ALUcntrl, RegWr, MemWr, MemToReg. Instruction Fetch Unit detail: PC, Instruction Memory (Addr[31:2], Addr[1:0] = “00”), Adder with Cin, Sign Extend of imm16, Concatenate of PC[31:28] with Target Instr[25:0], selected by Branch, Zero, and Jump.]
Review Problem 18
Show the datapath changes and control settings needed to implement “addi $rs, $rt, imm16”
[Figure: single-cycle datapath with Instruction Fetch Unit, Register File (Aw, Aa, Ab, Da, Dw, Db, WrEn), Sign Extend, ALU (Zero output), and Data Memory (WrEn, Addr, Din, Dout); instruction fields Rs [25:21], Rt [20:16], Rd [15:11], Imm16 [15:0]. Instruction Fetch Unit detail: PC, Instruction Memory, Adder with Cin, Sign Extend of imm16, Concatenate of PC[31:28] with Target Instr[25:0], selected by Branch, Zero, and Jump.]
Control settings to fill in:
RegDst
ALUSrc
MemToReg
RegWr
MemWr
Branch
Jump
ALUCntrl
Review Problem 19
To allow a CPU to spend a cycle waiting, we use a NOP (No operation) function. What are the control settings for the NOP instruction?
[Figure: single-cycle datapath with Instruction Fetch Unit, Register File (Aw, Aa, Ab, Da, Dw, Db, WrEn), Sign Extend, ALU (Zero output), and Data Memory (WrEn, Addr, Din, Dout); instruction fields Rs [25:21], Rt [20:16], Rd [15:11], Imm16 [15:0]. Instruction Fetch Unit detail: PC, Instruction Memory, Adder with Cin, Sign Extend of imm16, Concatenate of PC[31:28] with Target Instr[25:0], selected by Branch, Zero, and Jump.]
Control settings to fill in:
RegDst
ALUSrc
MemToReg
RegWr
MemWr
Branch
Jump
ALUCntrl
Review Problem 20
Design the PC for a multi-cycle CPU from basic components (AND, OR, INV, MUX, DFF, etc.)
Review Problem 21
Show the RTL and datapath to support jr $rs (jump to address in register $rs)
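[Figure: multi-cycle datapath with PC, Memory (WrEn, Addr, Din, Dout), IR (fields [25-21], [20-16], [15-11], [15-0]), MDR, Registers (Aw, Aa, Ab, Da, Dw, Db, WrEn), A and B registers, Sign Extend, shift left by 2, constant 4, ALU, and ALUOut.]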
Review Problem 22
What mods to the multi-cycle datapath are needed to support load indirect (note – 2 mem accesses): Reg[IR[20:16]] = MEM[MEM[Reg[A] + sign-extend(IR[15:0])]]
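[Figure: multi-cycle datapath with PC, Memory (WrEn, Addr, Din, Dout), IR (fields [25-21], [20-16], [15-11], [15-0]), MDR, Registers (Aw, Aa, Ab, Da, Dw, Db, WrEn), A and B registers, Sign Extend, shift left by 2, constant 4, ALU, and ALUOut.]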
Review Problem 23
Draw the complete state diagram (Mealy Machine) for the Mem, Reg, and IR write enables for a machine that does just R-Type and Branch instructions.

| State | Mem WE | Reg WE | IR WE |
|---|---|---|---|
| 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 |
| 3Br | 0 | 0 | 0 |
| 3Rt | 0 | 0 | 0 |
| 4Rt | 0 | 1 | 0 |
Review Problem 24
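We can decide whether SrcA=PC means SrcA is 0 or 1. How should we decide how to best convert the symbolic control values below to specific boolean values?

| State | PC WE | Mem WE | Reg WE | IR WE | ALU SrcA | ALU SrcB | ALU Op | Dest | MemIn | RegIn | PCSrc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | PC | 4 | + | X | PC | X | ALU |
| 2 | 0 | 0 | 0 | 0 | PC | <<2 | + | X | X | X | X |
| 3Br | (zero) | 0 | 0 | 0 | A | B | - | X | X | X | ALUout |
| 3Rt | 0 | 0 | 0 | 0 | A | B | (IR) | X | X | X | X |
| 4Rt | 0 | 0 | 1 | 0 | X | X | X | Rd | X | ALUout | X |
| 3St | 0 | 0 | 0 | 0 | A | SE | + | X | X | X | X |
| 4St | 0 | 1 | 0 | 0 | X | X | X | X | ALUout | X | X |
| 3Lo | 0 | 0 | 0 | 0 | A | SE | + | X | X | X | X |
| 4Lo | 0 | 0 | 0 | 0 | X | X | X | X | ALUout | X | X |
| 5Lo | 0 | 0 | 1 | 0 | X | X | X | Rt | X | MDR | X |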
Review Problem 25
A program executes 30% ALU, 30% Load, 20% Store, and 20% Branch. What is the CPI of our multicycle processor for this program?
What is the CPI of our single-cycle processor for this program?
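The multicycle CPI is a weighted average of cycles per instruction class. The sketch below assumes the cycle counts implied by the state table in Review Problem 24 (branch ends in state 3Br, R-type in 4Rt, store in 4St, load in 5Lo) and that a single-cycle machine has a CPI of 1 by construction; the dictionary names are illustrative.

```python
# Fraction of executed instructions in each class (from the problem statement).
MIX = {"ALU": 0.30, "Load": 0.30, "Store": 0.20, "Branch": 0.20}

# Assumed cycles per class, read off the last state each class reaches
# in the Review Problem 24 state table.
MULTICYCLE_CYCLES = {"ALU": 4, "Load": 5, "Store": 4, "Branch": 3}


def average_cpi(mix, cycles):
    """Weighted average of cycles per instruction over the instruction mix."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


multicycle_cpi = average_cpi(MIX, MULTICYCLE_CYCLES)
single_cycle_cpi = average_cpi(MIX, {kind: 1 for kind in MIX})

print(round(multicycle_cpi, 2), round(single_cycle_cpi, 2))
```

CPI alone does not decide which machine is faster, since the single-cycle clock period must cover the slowest instruction.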
Review Problem 26
The pipelined CPU has the stage delays shown. Is it better to speed up the ALU by 10ns, or the Data Memory by 2ns?
Does your answer change for a single-cycle or multi-cycle CPU?
[Figure: pipelined datapath with PC, Instr. Memory, Register File, Data Memory, and Register File, separated by pipeline registers; stage delays labeled 25ns, 20ns, 30ns, 20ns, 20ns]
Review Problem 27
If we built our register file to have two write ports (i.e. can write two registers at once) would this help our pipelined CPU?
Review Problem 28
What registers are being read and written in the 5th cycle of a pipelined CPU running this code?
add $1, $2, $3
nor $4, $5, $6
sub $7, $8, $9
slt $10, $11, $12
nand $13, $14, $15
[Figure: pipeline diagram with one row per instruction, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr, with each row starting one cycle after the previous one]
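The stage occupancy in any cycle can be tabulated mechanically. This sketch assumes the k-th instruction enters Ifetch in cycle k, that source registers are read in Reg/Dec and the destination is written in Wr, and that there are no stalls; `PROGRAM` and `activity_in_cycle` are illustrative names.

```python
STAGES = ("Ifetch", "Reg/Dec", "Exec", "Mem", "Wr")

# (mnemonic, destination, source 1, source 2) as register numbers.
PROGRAM = [
    ("add", 1, 2, 3),
    ("nor", 4, 5, 6),
    ("sub", 7, 8, 9),
    ("slt", 10, 11, 12),
    ("nand", 13, 14, 15),
]


def activity_in_cycle(program, cycle):
    """Return (registers read, registers written) during the given 1-based cycle."""
    reads, writes = [], []
    for index, (_, dest, src1, src2) in enumerate(program):
        stage = cycle - 1 - index  # 0 = Ifetch ... 4 = Wr
        if stage == STAGES.index("Reg/Dec"):
            reads.extend([src1, src2])
        elif stage == STAGES.index("Wr"):
            writes.append(dest)
    return reads, writes


reads, writes = activity_in_cycle(PROGRAM, 5)
print("read:", reads, "written:", writes)
```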
Review Problem 29
Do the jump instructions (j, jr) have problems with hazards?
Review Problem 30
What forwarding happens on the following code?
lw $t0, 0($t1)
add $t2, $t3, $t3
nor $0, $t0, $t4
bne $t2, $0, END
sub $t5, $t2, $t4
[Figure: pipeline diagram with one row per instruction, each passing through Ifetch, Reg/Dec, Exec, Mem, Wr, with each row starting one cycle after the previous one]
Review Problem 31
What should we do to this code to run it on a CPU with delay slots?
and $t0, $t1, $t2
ori $t0, $t0, 7
add $t3, $t4, $t5
lw $t6, 0($t3)
bgez $t6, FOO
j BAR
Review Problem 32
Why might a compiler do this transformation?
/* Before */
for (j=0; j<2000; j++)
  for (i=0; i<2000; i++)
    x[i][j]+=1;
/* After */
for (i=0; i<2000; i++)
  for (j=0; j<2000; j++)
    x[i][j]+=1;
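A useful quantity to compare is the distance in memory between consecutive inner-loop accesses. The sketch below assumes `x` is a C two-dimensional array with 2000 columns stored in row-major order (the slide does not show the declaration) and measures that distance in elements; `element_offset` is an illustrative helper.

```python
N = 2000  # assumed number of columns in x


def element_offset(i, j, columns=N):
    """Offset of x[i][j] from x[0][0], in elements, for a row-major array."""
    return i * columns + j


# "Before": the inner loop steps i, so consecutive accesses are one row apart.
before_stride = element_offset(1, 0) - element_offset(0, 0)
# "After": the inner loop steps j, so consecutive accesses are adjacent.
after_stride = element_offset(0, 1) - element_offset(0, 0)

print(before_stride, after_stride)
```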
Review Problem 33
Level Hit Time Hit Rate
L1 1 cycle 95%
L2 10 cycles 90%
Main Memory 50 cycles 99%
Disk 50,000 cycles 100%
If you can speed up any level’s hit time by a factor of two, which is the best to speed up?
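One way to compare the options is to compute the average memory access time (AMAT) with each level's hit time halved in turn. The sketch assumes a simple model in which every access that reaches a level pays that level's hit time and continues to the next level only on a miss; `amat` and `amat_with_halved_level` are illustrative names.

```python
# (name, hit time in cycles, hit rate) from the table above, nearest level first.
LEVELS = [
    ("L1", 1, 0.95),
    ("L2", 10, 0.90),
    ("Main Memory", 50, 0.99),
    ("Disk", 50_000, 1.00),
]


def amat(levels):
    """Sum each level's hit time weighted by the probability of reaching it."""
    total, reach_probability = 0.0, 1.0
    for _, hit_time, hit_rate in levels:
        total += reach_probability * hit_time
        reach_probability *= 1.0 - hit_rate
    return total


def amat_with_halved_level(levels, target):
    """AMAT after halving the hit time of the named level only."""
    return amat([(n, t / 2 if n == target else t, r) for n, t, r in levels])


results = {name: amat_with_halved_level(LEVELS, name) for name, _, _ in LEVELS}
best_level = min(results, key=results.get)

print(f"baseline: {amat(LEVELS):.3f} cycles")
for name, value in results.items():
    print(f"halve {name}: {value:.3f} cycles")
print("best:", best_level)
```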
Review Problem 34
The length (number of blocks) in a direct mapped cache is always a power of 2. Why?
Review Problem 35
For the following access pattern, what is the smallest direct mapped cache that will not use the same cache location twice?
01391741024
Review Problem 36
How many total bits are required for a direct-mapped cache with 64 KB of data and 8-byte blocks, assuming a 32-bit address?
Index bits:
Bits/block:
Data:
Valid:
Tag:
Total size:
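The fields above can be filled in programmatically as a check. This sketch assumes byte addressing, one valid bit per block, and no dirty or replacement bits; `direct_mapped_cache_bits` is an illustrative helper.

```python
def direct_mapped_cache_bits(data_bytes, block_bytes, address_bits):
    """Return (index bits, tag bits, bits per block, total bits)."""
    blocks = data_bytes // block_bytes
    index_bits = blocks.bit_length() - 1        # log2, blocks is a power of two
    offset_bits = block_bytes.bit_length() - 1  # log2, block size is a power of two
    tag_bits = address_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + 1 + tag_bits  # data + valid + tag
    return index_bits, tag_bits, bits_per_block, blocks * bits_per_block


index_bits, tag_bits, bits_per_block, total_bits = direct_mapped_cache_bits(
    data_bytes=64 * 1024, block_bytes=8, address_bits=32
)
print(index_bits, tag_bits, bits_per_block, total_bits)
```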
Review Problem 37
Assume we have three caches, each with four one-word blocks: Direct mapped, 2-way set assoc. (w/LRU), and fully associative.
How many misses will each have on this address pattern: Byte addresses: 0, 32, 0, 24, 32
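A small simulator can confirm a hand trace. The sketch assumes 4-byte words, caches that start empty, and LRU replacement within each set (including the fully associative case); `count_misses` is an illustrative function, with `ways=1` modeling direct mapped and `ways=num_blocks` modeling fully associative.

```python
def count_misses(byte_addresses, num_blocks, ways, block_bytes=4):
    """Count misses for a set-associative cache with LRU replacement."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]  # each list is ordered LRU -> MRU
    misses = 0
    for address in byte_addresses:
        block = address // block_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.remove(block)   # hit: will be re-appended as most recent
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)      # evict the least recently used block
        lines.append(block)
    return misses


PATTERN = [0, 32, 0, 24, 32]
direct_mapped = count_misses(PATTERN, num_blocks=4, ways=1)
two_way = count_misses(PATTERN, num_blocks=4, ways=2)
fully_associative = count_misses(PATTERN, num_blocks=4, ways=4)
print(direct_mapped, two_way, fully_associative)
```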
Review Problem 38
Level Hit Time Hit Rate
L1
L2 10 cycles 90%
Main Memory 40 cycles 99%
Disk 4,000 cycles 100%
Which is the best L1 cache for this system?
Direct Mapped: 1 cycle, 80% hit rate
2-way Set Associative: 2 cycle, 90% hit rate
Fully Associative: 3 cycle, 95% hit rate
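The candidates can be compared by average memory access time. The sketch assumes each level's hit time is paid whenever that level is reached and that a miss adds the access time of the levels below; the names `amat`, `LOWER_LEVELS`, and `L1_OPTIONS` are illustrative.

```python
def amat(levels):
    """Average access time for a list of (hit time, hit rate), nearest level first."""
    hit_time, hit_rate = levels[0]
    if len(levels) == 1:
        return hit_time
    return hit_time + (1.0 - hit_rate) * amat(levels[1:])


# L2, main memory, and disk from the table above.
LOWER_LEVELS = [(10, 0.90), (40, 0.99), (4_000, 1.00)]

L1_OPTIONS = {
    "Direct Mapped": (1, 0.80),
    "2-way Set Associative": (2, 0.90),
    "Fully Associative": (3, 0.95),
}

results = {name: amat([l1] + LOWER_LEVELS) for name, l1 in L1_OPTIONS.items()}
best_l1 = min(results, key=results.get)

for name, value in results.items():
    print(f"{name}: {value:.2f} cycles")
print("best:", best_l1)
```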
Review Problem 39
Can a direct-mapped cache ever have fewer cache misses than a fully associative cache of the same capacity? Why/why not?
Review Problem 40
Assume we have separate instruction and data L1 caches. For each feature, state which cache is most likely to have it:
Large blocksize
Write-back
2-cycle hit time
Review Problem 41
In a system with 8-bit addresses, virtual memory, and a direct mapped cache, where can these memory accesses be found?
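Memory accesses:
01101001
01100011
11011100
10001011

| Valid | Tag | Data |
|---|---|---|
| 1 | 011011 | 01010101 |
| 1 | 111110 | 00001111 |
| 0 | 111001 | 11110000 |
| 1 | 111001 | 11111111 |

| Valid | Tag | Physical page # |
|---|---|---|
| 1 | 0101 | 1010 |
| 1 | 0110 | 1111 |
| 0 | 1000 | 0001 |
| 1 | 1100 | 0111 |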
Review Problem 42
For a dynamic branch predictor, why is the Branch History Table a direct-mapped cache? Why not fully associative or set associative?
Review Problem 43
How would various branch predictors do on the bold branch in the following code?
A 1-bit predictor will be correct ___%
A 2-bit predictor will be correct ___%
A 2-bit correlating predictor with 2-bit global branch history will be correct ___%
while (1) {
  if (i<2) counter++; /* Branch when I = 2 or 3 */
  i=(i+1)&3; /* I counts 0,1,2,3,0,1,2,3,… */
}
Review Problem 44
Show the constraint graph for this code, indicating the type of hazard for each edge.
1: lw $t1, 4($a0)
2: add $t2, $t1, $a0
3: lw $t3, 8($a1)
4: sub $t4, $t3, $a2
5: sw $t5, 0($a3)
6: beq $s0, $s1, FOO
Review Problem 45
Would loop unrolling & register renaming be useful for the following code? If so, what would the resulting code look like?
while (i<400) {
  if (x[i]==CONST) counter++; /* Count number of CONSTs in array */
  i++;
}
Review Problem 46
Schedule the code for a 4-way VLIW. Assume no delay slots, and all instructions in parallel with a branch/jump still execute.
1: lw $t1, 4($a0)
2: add $t2, $t1, $a0
3: lw $t3, 8($a1)
4: sub $t2, $t3, $a2
5: sw $t5, 0($a3)
6: beq $s0, $s1, FOO
7: and $t7, $s0, $s1
Schedule table columns: ALU1, ALU2, Load/Store, Branch/Jump