February 7, 2019 10:21 am EE457 Quiz - Spring 2019 1/12 C Copyright 2019 Gandhi Puvvada
EE457 Quiz (~10%)
Closed-book, closed-notes exam; no cheat sheets.
Ordinary calculators may be used, but not smartphone calculators. Verilog guides are not needed and are not allowed. Smartphones, tablets, and any other computing or Internet devices are not allowed.
This is a Crowdmark exam. Please do not write in the margins or on the back side. Use an HB or 1H pencil.
Spring 2019
Instructor: Gandhi Puvvada
Thursday, 2/7/2019 (a 3-hour exam), 05:00 PM - 08:00 PM (180 min) in MHP101
Viterbi School of Engineering, University of Southern California
Ques#  Topic                          Page#   Time      Points  Score
1      State Diagram, RTL Design      2-4     55 min.   82
2      Signed and Unsigned numbers    5-6     30 min.   70
3      CPU Performance                7-8     30 min.   54
4      Byte-addressable processors    9       25 min.   58
5      Single-Cycle CPU               10-11   30 min.   50
Total                                 1+10+1  170 min.  314
Perfect Score                                           300
1 ( 14+20+20+2+12+14 = 82 points) 60 min. State Diagram and RTL design
General advice: While writing RTL, you can have multiple independent "if" statements. It is unnecessary (and often problematic) to try to fit everything in one long "if" statement. Use either curly braces or begin..end if multiple statements are under one "if" statement or if there are nested "if" statements.
1.1 This is simpler than our Min/Max Lab, as we need to find only the Max (maximum), but we also need to find where that Max occurred for the first time in the array (when we process it with I running from 0 to 15) and store that index I in IM (Index of Max). Here we use a single comparator to compare M[I] with the running Max. Since the array elements are all unsigned 8-bit numbers, if any M[I] is 255 (decimal), then we should be able to conclude our search for Max and IM early! Note: We do not count the hardware to detect if M[I] is 255 as a comparison unit, as this can be done using just one 8-bit AND gate. Complete your design below. What if all 16 elements of the array are 255? Perhaps your design would finish in one clock!
1.2 Redesign the above design with the following additional requirement. Besides Max and IM, we also want to know, in K, the number of array elements M[I] which are different from the Max. If all M[I] bear the same value, then that is the Max, and in that case K would become zero. If all M[I] are different from each other, then K would be 15 (and not 16). Note that K can be 15 even if there are repetitions in the array, as long as the Max does not repeat (it occurs only once in the array). Go through the following suggestions and implement Bruin #2's design and Trojan #2's design.
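(14 pts)
[Figure: incomplete state diagram for Q1.1. States: INI, LOAD, CMx, DONE. Reset enters INI, which waits for Start; DONE returns to the INI state. RTL as printed: I <= 0; Max <= M[I]; I <= I + 1; IM <= ______; The transition conditions C1 and C2 are left for you to define.]
C1 = ______   C2 = ______

Example array 1:
I:     0    1    2    3    ...  13   14   15
M[I]:  110  120  255  199  ...  5    255  100
Max = 255, IM = 2

Example array 2:
I:     0    1    2    3    ...  13   14   15
M[I]:  100  100  200  199  ...  5    200  100
Max = 200, IM = 2
K = 14, assuming the Max of 200 occurred only twice. The repetitions of 100 did not matter in the count of K.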
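As a rough software cross-check of the Q1.1 behavior (not the RTL solution itself), the search can be modeled in Python. The function name `find_max_first` and the sample array are illustrative assumptions; the elided middle entries of the example arrays are not reproduced, so the filler values below are made up.

```python
def find_max_first(m):
    """Return (Max, IM, elements_read) for a list of unsigned 8-bit values.

    IM is the index where Max first occurs. The scan stops as soon as the
    running Max reaches 255, since no later element can exceed it.
    """
    mx, im = m[0], 0           # seed the running Max with M[0]
    read = 1
    if mx == 255:
        return mx, im, read    # the first element is already the largest possible
    for i in range(1, len(m)):
        read += 1
        if m[i] > mx:          # strictly greater keeps the FIRST occurrence
            mx, im = m[i], i
        if mx == 255:
            break              # early conclusion
    return mx, im, read


# Hypothetical 16-element array (filler values chosen for illustration only).
sample = [110, 120, 255, 199] + [1] * 9 + [5, 255, 100]
print(find_max_first(sample))       # (255, 2, 3)
print(find_max_first([255] * 16))   # (255, 0, 1)
```

The count of elements read is only a software measure; it is not meant to equal the number of clocks your state machine takes.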
1.2.1 Bruin #1 (a simple-minded Bruin): I will take two passes through the array taking a total of 16+16 = 32 clocks. The first pass (I running from 0 to 15) is dedicated to finding the Max and the second pass (again I running from 0 to 15) is dedicated to finding K (count of elements different from Max).
Bruin #2 (a slightly better Bruin): I can save a clock by processing the array first for I running from 0 to 15 and then running from 15 to 0 for finding K. In my design, M[15] is processed only once (in the last clock of the first pass). I will have two states, CMx to find Max and CK to count K. Complete this Bruin's design. Note: When (I==15), if (M[I] != Max) you would increment K for the first time in the CMx state. Also, since you do not want to process M[15] more than once, you need to decrement I from 15 to 14 as you prepare to exit to the CK (Count K) state. Two incomplete state diagrams are given below for you to complete. In the second, we combined the CMx and CK states into one CMxCK state. You can use a flag F to indicate in which phase of the CMxCK state you are currently operating. In each completion, carefully decide when you update Max, I, IM, and K (and, in the second design, the flag F also).
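(Left design: 20 pts. Right design: 20 pts.)
[Figure: incomplete state diagram with a combined CMxCK state. States: INI, LOAD, CMxCK, DONE. RTL as printed: I <= 0; Max <= M[I]; I <= I + 1; F <= 0; IM <= ______; Transition condition: C5. Annotations: "When F == 0, find Max. When I = 15, set F and perhaps count K once." "When F == 1, count (or continue to count) K for I going 14 => 0."]
[Figure: incomplete state diagram with separate CMx and CK states. States: INI, LOAD, CMx, CK, DONE. RTL as printed: I <= 0; Max <= M[I]; I <= I + 1; IM <= ______; Transition conditions: C3 and C4. Annotations: CMx: "Find Max. When I = 15, decrement I and perhaps count K once." CK: "Count (or continue to count) K for I going 14 => 0."]
C3 = ______   C4 = ______   C5 = ______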
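Bruin #2's up-then-down traversal can be sketched in software to see what the two passes should produce. This is a behavioral model under stated assumptions, not either state diagram: the function name `bruin2_two_pass`, the `reads` counter, and the sample array are hypothetical.

```python
def bruin2_two_pass(m):
    """Return (Max, IM, K, reads): up-pass finds Max/IM, down-pass counts K."""
    n = len(m)
    mx, im = m[0], 0
    reads = 1                        # M[0] is read once to seed Max
    for i in range(1, n):            # pass 1: I = 1 .. n-1
        reads += 1
        if m[i] > mx:
            mx, im = m[i], i
    # M[n-1] is handled only once, at the end of pass 1. Software compares it
    # against the final Max; in RTL the Max register still holds its old value
    # during that clock, so the design has to account for that.
    k = 1 if m[n - 1] != mx else 0
    for i in range(n - 2, -1, -1):   # pass 2: I = n-2 .. 0
        reads += 1
        if m[i] != mx:
            k += 1
    return mx, im, k, reads


# Hypothetical array: the Max of 200 occurs twice, so K = 16 - 2 = 14.
sample = [100, 100, 200, 199] + [7] * 9 + [5, 200, 100]
print(bruin2_two_pass(sample))       # (200, 2, 14, 31)
```

For 16 elements the model reads 31 elements in total, one fewer than two full passes.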
Bruin #3: I would add to the above design, the improvement of early conclusion when M[I] is 255 like in Q#1.1.
Trojan #1: You are Trojan #1. Would you agree with Bruin #3? Yes / No
Trojan #2: We are Trojans, and we can do much better. We will solve the above problem in one pass in 16 clocks. We start with the presumption that the maximum occurs only once, which leads to the maximum value of K, namely 15. We initialize K to its maximum value of 15 whenever we find a new maximum, and we decrement K by one whenever we find the current Max repeating (M[I] == Max). If we find a new max after the previous max repeated a couple of times (consequently causing K to be decremented a couple of times), we again reinitialize K to its maximum value of 15, thinking that this max may be the new unique (non-repeating) maximum. Please go through the example table below and complete the exercise table below before attempting to work on the state diagram.
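(2 pts, 12 pts, 14 pts)
[Tables: an "Example" table and an "Exercise" table appeared here; their contents did not survive extraction.]
[Figure: incomplete state diagram for Trojan #2's design. States: INI, LOAD, CMxCK, DONE. RTL as printed: I <= 0; Max <= M[I]; I <= I + 1; IM <= ______; K <= ______; Transition condition: C6.]
C6 = ______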
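Trojan #2's one-pass idea can be checked in software against a straightforward reference count. The sketch below is a behavioral model, not the state diagram; the names `trojan2_one_pass` and `reference` and the sample array are assumptions introduced for illustration.

```python
def trojan2_one_pass(m):
    """One pass: return (Max, IM, K) using the reset-and-decrement rule for K."""
    n = len(m)
    mx, im, k = m[0], 0, n - 1       # presume Max is unique: K = 15 when n = 16
    for i in range(1, n):
        if m[i] > mx:                # new maximum: restart K at its largest value
            mx, im, k = m[i], i, n - 1
        elif m[i] == mx:             # current Max repeats: one fewer "different" element
            k -= 1
    return mx, im, k


def reference(m):
    """Direct definition: first index of the max, and count of elements != max."""
    mx = max(m)
    return mx, m.index(mx), sum(1 for v in m if v != mx)


# Hypothetical array: 100 repeats early, then 200 becomes the Max and occurs twice.
sample = [100, 100, 200, 199] + [7] * 9 + [5, 200, 100]
print(trojan2_one_pass(sample))      # (200, 2, 14)
print(reference(sample))             # (200, 2, 14)
```

The early repetition of 100 decrements K once, but the later reset at the new maximum discards that decrement, which is why only repetitions of the final Max affect the result.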
2 ( 15+10+20+15+10 = 70 points) 30 min. Signed and unsigned numbers
Given below is the Q#2.1.1 (significant part of the statement of the question and its solution) from the Fall 2018 Quiz that you were asked to go through.
2.1 Using the same 5-bit adder/subtracter reproduced below, now compare 8X (X3 X2 X1 X0 0 0 0) with Y (Y3 Y3 Y3 Y3 Y2 Y1 Y0 for signed and 0 0 0 Y3 Y2 Y1 Y0 for unsigned) to produce 8XgtY and 8XhiY.
2.1.1 Mr. Trojan says that 8XhiY can be produced without using the 5-bit adder by using simple gates. This will have much shorter logic delay. Show his design on the side.
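To see why simple gates can suffice for the unsigned comparison, note that 8X is at least 16 whenever X is 2 or more, while Y is at most 15. The sketch below exhaustively checks one possible gate expression; it is an illustration derived here, not necessarily the expression in the course solution, and the name `hi_8x_y_gates` is hypothetical.

```python
def hi_8x_y_gates(x, y):
    """Candidate gate-level form of 8XhiY for unsigned 4-bit X and Y.

    8X > Y holds when X >= 2 (any of X3, X2, X1 set), or when X0 is set
    and Y < 8 (Y3 clear).
    """
    x3, x2, x1, x0 = (x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1
    y3 = (y >> 3) & 1
    return bool((x3 | x2 | x1) | (x0 & (1 - y3)))


# Compare against the arithmetic definition for all 256 input pairs.
mismatches = [(x, y) for x in range(16) for y in range(16)
              if hi_8x_y_gates(x, y) != (8 * x > y)]
print(mismatches)   # []
```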
Now we need to produce 4XgtY and 4XhiY instead of 2XgtY and 2XhiY as shown above. If we were given a 6-bit subtracter in place of the 5-bit subtracter, it would have been fairly straightforward, but you are given the same 5-bit subtracter below. Perhaps you may be able to ignore Y0 (or deal with it outside the subtracter as needed) so that the rest of the bits can be handled by the 5-bit subtracter. Notice that the internal carries C3, C2, and C1 are also brought out for your possible use, and Z3 is produced instead of the previous Z4.
gt = greater than; hi = higher than
(15 pts)
[Figure: 5-bit adder/subtracter built from full-adder cells (inputs a, b, cin; outputs s, cout) with an ADD/SUB control input. Labels visible: result bits R4 R3 R2 R1 R0, internal carries C1, C2, C3, carries C0, C4, C5, Raw Carry, Carry, V, and VDD.]
(10 pts)
2.2 Using the same 5-bit adder/subtracter reproduced below, compare 8X with Y again, and show how to produce 8XltY ("lt" means "less than" treating the numbers as signed numbers), and 8XloY ("lo" means "lower than" treating the numbers as unsigned numbers).
2.2.1 Again Mr. Trojan says that 8XloY can be produced without using the 5-bit adder by using simple gates. This will have much shorter logic delay. Show his design on the side.
2.3 Reproduced below are some diagrams from your ALU lab using 4-bit numbers as examples.
(20 pts)
[Figure: the same 5-bit adder/subtracter as in Q2.1, with the same labels.]
(15 pts)
(4+6 pts)
[Figure: UNSIGNED 4-bit number circle, 0000 (0) through 1111 (15), with arrows toward SMALLER and LARGER magnitude. Error point: between 1111 and 0000, where the C bit is set.]
[Figure: SIGNED 4-bit number circle, 0000 (+0) through 0111 (+7) and 1000 (-8) through 1111 (-1), with arrows toward SMALLER and LARGER magnitude. Error point: between 0111 and 1000, where the V bit is set.]
Two 16-bit numbers below are being added. The X bits in A and the Y bits in B are independent of each other, and they can be any combination of 1s and 0s.
A[15:0] = 100X_XXXX_XXXX_1111 and B[15:0] = 010Y_YYYY_YYYY_1111
If they are unsigned numbers, the sum would overflow. True / False / Can't tell
If they are signed numbers, the sum would overflow. True / False / Can't tell
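One way to build confidence in an answer to this kind of question is to enumerate every combination of the free bits. The sketch below does that in Python; it assumes the nine X bits and nine Y bits occupy bit positions 12 down to 4, as the bit patterns indicate, and the variable names are illustrative.

```python
def to_signed16(v):
    """Interpret a 16-bit pattern as a two's-complement value."""
    return v - 0x10000 if v & 0x8000 else v


unsigned_overflow = set()   # collects every outcome seen (True and/or False)
signed_overflow = set()
for x in range(512):                     # 9 free X bits, positions 12..4
    a = 0x8000 | (x << 4) | 0xF          # 100X_XXXX_XXXX_1111
    for y in range(512):                 # 9 free Y bits, positions 12..4
        b = 0x4000 | (y << 4) | 0xF      # 010Y_YYYY_YYYY_1111
        unsigned_overflow.add(a + b > 0xFFFF)
        s = to_signed16(a) + to_signed16(b)
        signed_overflow.add(not (-0x8000 <= s <= 0x7FFF))

print(unsigned_overflow, signed_overflow)   # {False} {False}
```

The enumeration agrees with the range argument: the largest unsigned sum is 0x9FFF + 0x5FFF = 0xFFFE, and as signed numbers A is negative while B is positive, so their sum always fits.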
3 ( (16+8) + 20 + 5 + 5 = 54 points) 30 min. Performance
The textbook authors have cautioned us about the problems associated with MIPs and MFLOPs. Consider the following 8 actions. Write UP/DOWN/SAME in the four columns on the right.

#  Action                                                                     MFLOPs  MIPs  Relative MIPs  ET
1  Improving integer ADD, SUB instructions
2  Improving Floating Point ADD, SUB instructions
3  Adding Floating Point multiplier/divider hardware (previously non-existing)
4  Improving existing Floating Point multiplier/divider hardware
5  Adding unnecessary NOPs in compilation
6  Adding several unnecessary Floating Point add/sub instructions in compilation
7  Removing existing Floating Point multiply/divide hardware and requiring the compiler to use software routines to perform the same using several floating point add/sub instructions
8  Increasing the processor frequency by 20%. No other change.

(16 pts; 8 pts bonus if all correct)
Blank area (use it for rough work)
3.1 An ISA vendor (like ARM) has licensed its ISA to both TI (Texas Instruments) and LSI Logic. The implementation technology and the hardware architecture used by TI and LSI are quite different. They (TI and LSI) have provided limited information as shown in the tables. For each of the two, we want to calculate the Weighted Average CPI and the Native MIPs rating. If the information is not adequate, state what information is needed (and why it is needed), assume a reasonable value for the missing information, and calculate the Weighted Average CPI and Native MIPs rating for each of them.
____________________________________________
3.1.1 Mr. Bruin says, "It is just coincidental that the two CPUs had the same frequency of occurrence of instructions in the dynamic instruction trace". Miss Bruin does not agree with him. Whom should we allow to transfer to USC and why?
____________________________________________
3.2 It ___________ (is / isn't) possible to change the Relative MIPs of a processor without changing the ET (Execution Time). Explain:
____________________________________________
CPU#1 from TI
Type  CPI        Frequency
A     CPIA = 1   fA = 40%
B     CPIB = 2   fB = 30%
C     CPIC = 3   fC = 20%
D     CPID = 4   fD = 10%

CPU#2 from LSI Logic
Type  nanoseconds per instruction  Frequency
A     nsA = 2                      fA = 40%
B     nsB = 4                      fB = 30%
C     nsC = 6                      fC = 20%
D     nsD = 8                      fD = 10%

(20 pts)
Use this space as you please for any calculations associated with this question.
(5 pts)
(5 pts)
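The weighted averages in Q3.1 follow directly from the two tables; what each table leaves out is the clock. The sketch below works the arithmetic in Python. The two clock values marked ASSUMED are hypothetical placeholders chosen only to make the calculation concrete; they are not given in the quiz.

```python
freq_pct = {"A": 40, "B": 30, "C": 20, "D": 10}   # dynamic frequency, percent
cpi_ti = {"A": 1, "B": 2, "C": 3, "D": 4}         # CPU#1 (TI): clocks per instruction
ns_lsi = {"A": 2, "B": 4, "C": 6, "D": 8}         # CPU#2 (LSI): ns per instruction


def weighted_avg(values, weights_pct):
    """Frequency-weighted average, with weights given in percent."""
    return sum(values[t] * weights_pct[t] for t in values) / 100


avg_cpi_ti = weighted_avg(cpi_ti, freq_pct)   # 2.0 clocks per instruction
avg_ns_lsi = weighted_avg(ns_lsi, freq_pct)   # 4.0 ns per instruction

# CPU#2: time per instruction is known, so native MIPS needs no clock.
mips_lsi = 1000 / avg_ns_lsi                  # 250.0 (1000 ns per microsecond)

# CPU#1: native MIPS needs the clock rate, which is not given.
ASSUMED_TI_CLOCK_MHZ = 1000.0                 # hypothetical value (1 GHz)
mips_ti = ASSUMED_TI_CLOCK_MHZ / avg_cpi_ti   # 500.0 under that assumption

# CPU#2: CPI needs the clock period, which is not given.
ASSUMED_LSI_CLOCK_NS = 2.0                    # hypothetical value
avg_cpi_lsi = avg_ns_lsi / ASSUMED_LSI_CLOCK_NS   # 2.0 under that assumption

print(avg_cpi_ti, mips_ti, avg_cpi_lsi, mips_lsi)
```

Changing either assumed clock value changes only the quantity that depends on it; the weighted average CPI for TI and the native MIPS for LSI are fixed by the tables.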
4 ( 19 +39 = 58 points) 25 min. Memory addresses
4.1 You are aware that the Intel 80486 processor is a 32-bit data, 32-bit logical address, byte-addressable processor and the Intel i860 processor is a 64-bit data, 32-bit logical address, byte-addressable processor. The address spaces are ______ (A/B), where A: 4 GB for each; B: 16 GB (2^32 locations, each 32 bits wide) for the 80486 and 32 GB (2^32 locations, each 64 bits wide) for the i860.
Intel follows the ___________ (Little Endian / Big Endian) system. In the Intel 80486 processor system address space, byte 1234_567FH is the ____________ (most / least) significant byte of the 32-bit word with system address __________________ (state in hexadecimal). State the next three 32-bit word addresses (in hex), next to the 32-bit word containing the above byte: _________________________ _________________________ _________________________
Let us now repeat the above question for the Intel i860 processor. In the Intel i860 processor system address space, byte 1234_567FH is the ____________ (most / least) significant byte of the 64-bit word with system address __________________ (state in hexadecimal). State the next three 64-bit word addresses (in hex), next to the 64-bit word containing the above byte: _________________________ _________________________ _________________________
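The word-address arithmetic behind this question can be sketched in Python: clear the low address bits to get the aligned word address, and read the byte's position from those same low bits. The function name `word_info` is a hypothetical helper; the only facts relied on are that words are aligned to their size and that, in a little-endian system, the highest-addressed byte of a word is its most significant byte.

```python
def word_info(byte_addr, word_bytes):
    """Return (aligned word address, byte offset in word, next three word addresses)."""
    base = byte_addr & ~(word_bytes - 1)      # clear the low log2(word_bytes) bits
    offset = byte_addr - base                 # 0 = lowest-addressed byte of the word
    next_words = [base + k * word_bytes for k in (1, 2, 3)]
    return base, offset, next_words


for word_bytes in (4, 8):                     # 32-bit words, then 64-bit words
    base, offset, nxt = word_info(0x1234567F, word_bytes)
    is_msb = offset == word_bytes - 1         # little endian: top offset is the MSB
    print(word_bytes, hex(base), offset, is_msb, [hex(a) for a in nxt])
# 4 0x1234567c 3 True ['0x12345680', '0x12345684', '0x12345688']
# 8 0x12345678 7 True ['0x12345680', '0x12345688', '0x12345690']
```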
4.2 Shown on the side is the memory interface to a byte-wide memory chip in a memory system based on the minimum number of byte-wide banks for an exotic USC512 processor (512-bit data, 32-bit logical address, byte-addressable processor). USC processors are similar to Intel processors (Byte Enable pins and Endianness). The address pins on this processor are (select): (i) A[31:0] (ii) A[31:3], /BE[7:0] (iii) A[31:4], /BE[15:0] (iv) A[31:5], /BE[31:0] (v) A[31:6], /BE[63:0] (vi) A[31:7], /BE[127:0]
Fill-in the 6 blanks (marked by the 6 arrows) in the figure on the side. Also find the system addresses corresponding to the lowest-addressed two bytes of this memory chip. The lowest-addressed two bytes of this chip map to the system byte addresses (in hex) _______________________________ _________________________________________________.
The system addresses mapping to any location in this memory chip will have the same upper _____ (state a number) bits namely ________________ (state their labels in the form X[13:2]).
The system addresses mapping to any location in this memory chip will have the same lower _____ (state a number) bits namely ________________ (state their labels in the form Y[13:2]).
If this chip goes bad, until you replace it, you should avoid using memory addresses ______ (X/Y), where X: which map to this bad chip only; Y: which map to the composite space occupied by this chip as well as similar-sized spaces in all other banks. (Note: Here the words "composite space" mean a contiguous or continuous address range.) An address range with "holes" (bad spots) is useless!
State the range of the unusable address range in hex: _____________________________________
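(19 pts; sub-point labels as printed: 8, 8, 1, 2)
[Figure: memory interface to the byte-wide memory chip for Q4.2. Labels visible: address lines A31 through A24 and BE6 on the chip-select side; chip pins CS, WE, RD, A[ __ : __ ], and D[7:0]; system-side connections D[ __ : __ ] and A[23: __ ]; chip size ______ KB. The six blanks are marked by arrows.]
Note: Shift in address for 80486: ______   Shift in address for USC512: ______
(39 pts; sub-point labels as printed: 3, 11, 5, 2, 2, 2, 4)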
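For a byte-addressable processor with a wide data bus, the low bits of a byte address select the byte lane (bank) and the remaining bits select the bus-wide word. The Python sketch below shows that split for power-of-two bus widths. The helper name `split_address` and the sample address are hypothetical, and nothing here is read from the figure.

```python
def split_address(byte_addr, bus_bytes):
    """Split a byte address into (lane_bits, lane, word_index).

    bus_bytes must be a power of two: 4 for a 32-bit bus, 64 for a 512-bit bus.
    lane is the byte bank selected; word_index is the address seen by each bank.
    """
    lane_bits = bus_bytes.bit_length() - 1    # log2 of the bus width in bytes
    lane = byte_addr & (bus_bytes - 1)        # which byte-wide bank
    word_index = byte_addr >> lane_bits       # address shifted right by lane_bits
    return lane_bits, lane, word_index


addr = 0x12345686                             # hypothetical byte address
for bus_bytes in (4, 64):
    lane_bits, lane, word_index = split_address(addr, bus_bytes)
    print(bus_bytes, lane_bits, lane, hex(word_index))
```

With a 4-byte bus the split uses 2 lane bits, and with a 64-byte bus it uses 6, which is why consecutive locations in one byte-wide bank are bus-width bytes apart in the system address space.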
5 ( 15 + 29 + 6 = 50 points) 30 min. Single-cycle CPU:
You are familiar with the addi and the ordinary jump J (Jump with the 26-bit jump address field), Jal (Jump and Link), Jr rs, (Jump register rs), and the Beq (Branch if Equal) instructions.
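As a reminder of what each of these instructions does to the PC, here is a small Python model of next-PC selection in the standard MIPS style (no branch delay slot, as in the textbook single-cycle design). The function names and the example encodings are illustrative assumptions; this is not the data path to be completed in Q5.1.

```python
def sign_extend16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(kind, pc, instr=0, rs_value=0, zero=False):
    """Return the next PC for kind in {'j', 'jal', 'jr', 'beq', 'other'}."""
    pc4 = (pc + 4) & 0xFFFFFFFF
    if kind in ("j", "jal"):
        # {PC+4[31:28], 26-bit jump address field, 00}; jal also writes PC+4 to $31
        return (pc4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    if kind == "jr":
        return rs_value & 0xFFFFFFFF          # target comes from register rs
    if kind == "beq" and zero:
        # PC+4 plus the sign-extended immediate shifted left by 2
        return (pc4 + (sign_extend16(instr & 0xFFFF) << 2)) & 0xFFFFFFFF
    return pc4                                # addi and untaken beq fall through


pc = 0x00400000
j_instr = (0x02 << 26) | 0x0100004            # hypothetical J encoding
print(hex(next_pc("j", pc, j_instr)))                 # 0x400010
print(hex(next_pc("beq", pc, 0xFFFF, zero=True)))     # 0x400000
print(hex(next_pc("jr", pc, rs_value=0x00400020)))    # 0x400020
print(hex(next_pc("other", pc)))                      # 0x400004
```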
5.1 The data path on the next page is nearly complete. Complete the connections to the 7 loose ends which are marked with numbered arrows.
Control Signal Table: Complete the four rows and three columns. Whenever possible, use don't cares.
5.2 Mr. _____________ (Trojan/Bruin) suggested that the following part of the textbook figure shown on the left needs to be revised at the area pointed to by the arrow, as shown on the right. What is the issue here?
____________________________________________
Instruction  MemRead  MemWrite  ALUSrc  ALUOp1  ALUOp0  RegDst  MemtoReg  RegWrite  Branch  JUMP  Jal  JR
R-format     0        0         0       1       0       1       0         1         0
lw           1        0         1       0       0       0       1         1         0
sw           0        1         1       0       0       X       X         0         0
addi
beq          0        0         0       0       1       X       X         0         1
J            0        0         X       X       X       X       X         0         X
Jal
JR rs
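(11+4 pts)
(25+4 pts: 20 pts for the bottom 3 rows and right 3 columns, 5 pts for addi)
(6 pts)
[Figure for Q5.2: "Textbook figure" (left) and "Suggested revision" (right).]
[Figure: single-cycle data path with seven loose ends marked by arrows numbered 1 through 7. Labels visible: Control unit outputs Jump, JR, Jal, PCSrc, RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; Zero; ALU control; multiplexers (inputs 1 and 0) selected by Jump, JR, and Jal; Jump Address [31:0]; Instruction [31:0]; PC+4 [31:28].]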