EE457 Quiz (~10%) -


Transcript of EE457 Quiz (~10%) -

Page 1: EE457 Quiz (~10%) -

February 7, 2019 10:21 am EE457 Quiz - Spring 2019 1/12 © Copyright 2019 Gandhi Puvvada

EE457 Quiz (~10%)

Closed-book Closed-notes Exam; No cheat sheets;

Ordinary calculators may be used but not the smart phone with calculators. Verilog Guides are not needed and are not allowed. Smart phones, tablets (and any kind of computing/Internet devices) are not allowed.

This is a Crowdmark exam. Please do not write on margins or on backside. Use HB or 1H pencil.

Spring 2019

Instructor: Gandhi Puvvada

Thursday, 2/7/2019 (A 3-hour exam) 05:00 PM - 08:00 PM (180 min) in MHP101

Viterbi School of Engineering, University of Southern California

Ques#  Topic                        Page#   Time      Points  Score
1      State Diagram, RTL Design    2-4     55 min.   82
2      Signed and Unsigned numbers  5-6     30 min.   70
3      CPU Performance              7-8     30 min.   54
4      Byte-addressable processors  9       25 min.   58
5      Single-Cycle CPU             10-11   30 min.   50
Total                               1+10+1  170 min.  314
Perfect Score                                         300

Page 2: EE457 Quiz (~10%) -


1 ( 14+20+20+2+12+14 = 82 points) 60 min. State Diagram and RTL design

General advice: While writing RTL, you can have multiple independent "if" statements. It is unnecessary (and often problematic) to try to fit everything in one long "if" statement. Use either curly parentheses or begin..end if multiple statements are under one "if" statement or if there are nested "if" statements.

1.1 This is simpler than our Min/Max Lab, as we need to find only the Max (Maximum), but we also need to find where that Max occurred for the first time in the array (when we process it with I running from 0 to 15) and store that index I in IM (Index of Max). Here we use a single comparator to compare M[I] with the running Max. Since the array elements are all unsigned 8-bit numbers, if any M[I] is 255 (decimal), then we should be able to conclude our search for Max and IM early! Note: We do not count the hardware to detect if M[I] is 255 (decimal) as a comparison unit as this can be done using just one 8-bit AND gate. Complete your design below. What if all 16 elements of the array are 255? Perhaps your design would finish in one clock!
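The search described in 1.1 can be checked against a small behavioral model. The following is only a sketch in Python, not the RTL design the question asks for: the function name `find_max_first_index` is hypothetical, the count it returns is the number of elements examined (not a clock count), and the zero-valued filler elements in the usage note are an assumption, since the example arrays in the exam elide the middle entries.

```python
def find_max_first_index(m):
    """Behavioral sketch: return (Max, IM, elements_examined) for unsigned 8-bit data.

    Max is the largest element, IM is the index of its FIRST occurrence,
    and the scan stops early as soon as the running Max reaches 255.
    """
    max_val, im = m[0], 0      # start with M[0] as the running Max
    examined = 1
    i = 1
    while max_val != 255 and i < len(m):
        examined += 1
        if m[i] > max_val:     # strictly greater, so the first occurrence is kept
            max_val, im = m[i], i
        i += 1
    return max_val, im, examined
```

For instance, with hypothetical zero filler in the elided positions, `find_max_first_index([110, 120, 255, 199] + [0] * 9 + [5, 255, 100])` stops at index 2, and an array of sixteen 255s is settled after looking at M[0] alone.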

1.2 Redesign the above design with the additional requirement as follows. Besides Max and IM, we also want to know, in K, the number of array elements M[I] which are different from the Max. If all M[I] bear the same value, then that is the Max, and in that case, K would become zero. If all M[I] are different from each other, then K would be 15 (and not 16). Note that K can be 15 even if there are repetitions in the array as long as the Max does not repeat (it occurs only once in the array). Go through the following suggestions and implement Bruin #2’s design and Trojan #2’s design.

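14 pts

[Figure: incomplete state diagram for Q#1.1. Labels in the figure: Reset, Start, states INI, LOAD, CMx and DONE, "To INI state"; RTL statements "I <= 0;", "Max <= M[I]; I <= I + 1;" and the blank "IM <= ____;"; transition conditions C1 and C2, to be filled in as "C2 = ____" and "C1 = ____".]

[Figure: example for Max and IM. I = 0, 1, 2, 3, ..., 13, 14, 15; M[I] = 110, 120, 255, 199, ..., 5, 255, 100. Max = 255, IM = 2.]

[Figure: example for Max, IM and K. I = 0, 1, 2, 3, ..., 13, 14, 15; M[I] = 100, 100, 200, 199, ..., 5, 200, 100. Max = 200, IM = 2. K = 14 assuming the Max of 200 occurred only twice. The repetitions of 100 did not matter in the count of K.]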

Page 3: EE457 Quiz (~10%) -


1.2.1 Bruin #1 (a simple-minded Bruin): I will take two passes through the array taking a total of 16+16 = 32 clocks. The first pass (I running from 0 to 15) is dedicated to finding the Max and the second pass (again I running from 0 to 15) is dedicated to finding K (count of elements different from Max).

Bruin #2 (a slightly better Bruin): I can save a clock by processing the array first for I running from 0 to 15 and then running from 15 to 0 for finding K. In my design, M[15] is processed only once (in the last clock of the first pass). I will have two states, CMx to find Max and CK to count K. Complete this Bruin’s design. Note: When (I==15), if (M[I] != Max) you would increment K for the first time in the CMx state. Also, since you do not want to process M[15] more than once, you need to decrement I from 15 to 14 as you prepare to exit to the CK (Count K) state. Two incomplete state diagrams are given below for you to complete. In the second we combined CMx and CK states into one CMxCK state. You can use a Flag F to indicate in which phase of CMxCK state you are currently operating. In each completion, carefully decide when you update Max, I, IM, K (and in the second design the flag F also).

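[Figure: two incomplete state diagrams for Bruin #2, labeled "Left design" (20 pts) and "Right design" (20 pts).]

[Diagram with separate CMx and CK states. Labels: Reset, Start, states INI, LOAD, CMx, CK and DONE; RTL statements "I <= 0;", "Max <= M[I]; I <= I + 1;" and the blank "IM <= ____;"; transition conditions C3 and C4. CMx: Find Max. When I = 15, decrement I and perhaps count K once. CK: Count (or continue to count) K for I going 14 => 0.]

[Diagram with the combined CMxCK state. Labels: Reset, Start, states INI, LOAD, CMxCK and DONE; RTL statements "I <= 0;", "Max <= M[I]; I <= I + 1;", "F <= 0;" and the blank "IM <= ____;"; transition condition C5. When F == 0, find Max. When I = 15, set F and perhaps count K once. When F == 1, count (or continue to count) K for I going 14 => 0.]

C4 = ____ C3 = ____ C5 = ____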

Page 4: EE457 Quiz (~10%) -


Bruin #3: I would add to the above design, the improvement of early conclusion when M[I] is 255 like in Q#1.1.

Trojan #1: You are Trojan #1. Would you agree with Bruin #3? Yes / No

Trojan #2: We are Trojans, and we can do much better. We will solve the above problem in one pass in 16 clocks. We start with the presumption that the maximum occurs only once, which leads to the maximum value of K, namely 15. We initialize K to its maximum value of 15 whenever we find a new maximum, and we decrement K by one whenever we find the current Max repeating (M[I] == Max). If we find a new max after the previous max repeated a couple of times (consequently causing K to be decremented a couple of times), we again reinitialize K to its maximum value of 15, thinking that this max may be the new unique (non-repeating) maximum.

Please go through the example table below and complete the exercise table below before attempting to work on the state diagram.
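Trojan #2's counting rule can be sanity-checked with a short behavioral model. This is a sketch in Python rather than the state diagram itself; the function names `one_pass_max_im_k` and `two_pass_reference` are hypothetical, and the two-pass function only mirrors the idea of Bruin #1 (find Max first, then count) so the two results can be compared.

```python
def one_pass_max_im_k(m):
    """One-pass sketch of Trojan #2's rule: return (Max, IM, K)."""
    n = len(m)
    max_val, im, k = m[0], 0, n - 1        # presume the Max is unique: K = 15 for n = 16
    for i in range(1, n):
        if m[i] > max_val:                 # new maximum: reinitialize K to its maximum
            max_val, im, k = m[i], i, n - 1
        elif m[i] == max_val:              # current Max repeats: one fewer "different" element
            k -= 1
    return max_val, im, k


def two_pass_reference(m):
    """Two-pass reference: find Max first, then count elements different from it."""
    max_val = max(m)
    return max_val, m.index(max_val), sum(1 for x in m if x != max_val)
```

With hypothetical zero filler for the elided middle entries, the array `[100, 100, 200, 199] + [0] * 9 + [5, 200, 100]` gives Max = 200, IM = 2 and K = 14 from both functions, in line with the example stated earlier in this question.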

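Point labels: 2 pts, 12 pts, 14 pts

[Figure: "Example" table and "Exercise" table for Trojan #2's method; table contents not recovered.]

[Figure: incomplete state diagram for Trojan #2. Labels: Reset, Start, states INI, LOAD, CMxCK and DONE; RTL statements "I <= 0;", "Max <= M[I]; I <= I + 1;" and the blanks "IM <= ____;" and "K <= ____;"; transition condition C6, to be filled in as "C6 = ____".]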

Page 5: EE457 Quiz (~10%) -


2 ( 15+10+20+15+10 = 70 points) 30 min. Signed and unsigned numbers

Given below is the Q#2.1.1 (significant part of the statement of the question and its solution) from the Fall 2018 Quiz that you were asked to go through.

2.1 Using the same 5-bit adder/subtracter reproduced below, now compare 8X (X3 X2 X1 X0 0 0 0) with Y (Y3 Y3 Y3 Y3 Y2 Y1 Y0 for signed and 0 0 0 Y3 Y2 Y1 Y0 for unsigned) to produce 8XgtY and 8XhiY.

2.1.1 Mr. Trojan says that 8XhiY can be produced without using the 5-bit adder by using simple gates. This will have much shorter logic delay. Show his design on the side.

Now we need to produce 4XgtY and 4XhiY instead of 2XgtY and 2XhiY as shown above. If we were given a 6-bit subtracter in place of the 5-bit subtracter, it would have been fairly straightforward but you are given the same 5-bit subtracter below. Perhaps you may be able to ignore Y0 (or deal with it outside the subtracter as needed) so that the rest of the bits can be handled by the 5-bit subtracter. Notice that the internal carries C3, C2, and C1 are also brought out for your possible use and Z3 is produced instead of the previous Z4.

[Figure label: question and solution]

gt = greater than; hi = higher than

15 pts

[Figure: 5-bit adder/subtracter made of full-adder cells, each with inputs a, b, cin and outputs s, cout. Labels: ADD/SUB, Raw Carry, Carry, V, C5, C4, C3, C2, C1, C0, result bits R4 R3 R2 R1 R0, VDD.]

10 pts

Page 6: EE457 Quiz (~10%) -


2.2 Using the same 5-bit adder/subtracter reproduced below, compare 8X with Y again, and show how to produce 8XltY ("lt" means "less than" treating the numbers as signed numbers), and 8XloY ("lo" means "lower than" treating the numbers as unsigned numbers).

2.2.1 Again Mr. Trojan says that 8XloY can be produced without using the 5-bit adder by using simple gates. This will have much shorter logic delay. Show his design on the side.

2.3 Reproduced below are some diagrams from your ALU lab using 4-bit numbers as examples.

20 pts

[Figure: 5-bit adder/subtracter made of full-adder cells, each with inputs a, b, cin and outputs s, cout. Labels: ADD/SUB, Raw Carry, Carry, V, C5, C4, C3, C2, C1, C0, result bits R4 R3 R2 R1 R0, VDD.]

15 pts

4+6 pts

[Figure: UNSIGNED number circle for 4-bit numbers. Codes 0000 through 1111 are labeled 0 through 15, with "SMALLER mag." and "LARGER mag." directions marked. Error point: C bit is set.]

[Figure: SIGNED number circle for 4-bit numbers. Codes 0000 through 0111 are labeled +0 through +7 and codes 1000 through 1111 are labeled -8 through -1, with "SMALLER mag." and "LARGER mag." directions marked. Error point: V bit is set.]

Two 16-bit numbers below are being added. The X-bits in A and the Y-bits in B are independent of each other and they can be any combination of 1s and 0s.

A[15:0] = 100X_XXXX_XXXX_1111 and B[15:0] = 010Y_YYYY_YYYY_1111

If they are unsigned numbers, the sum would overflow. True / False / can’t tell

If they are signed numbers, the sum would overflow. True / False / can’t tell
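One way to double-check an answer to this kind of question is to enumerate every combination of the free bits. The sketch below does that in Python; the helper names `patterns`, `to_signed16` and `overflow_summary` are hypothetical, and the bit layout (three fixed upper bits, nine free bits, low nibble 1111) is read directly from the A and B patterns in the question.

```python
def patterns(fixed_high3):
    """Yield every 16-bit value with the given top 3 bits, 9 free bits, and low nibble 1111."""
    for free in range(1 << 9):
        yield (fixed_high3 << 13) | (free << 4) | 0xF


def to_signed16(v):
    """Interpret a 16-bit pattern as a two's-complement signed number."""
    return v - 0x10000 if v & 0x8000 else v


def overflow_summary():
    """Return two sets of booleans: the unsigned and signed overflow outcomes seen."""
    unsigned_ovf, signed_ovf = set(), set()
    for a in patterns(0b100):              # A = 100X_XXXX_XXXX_1111
        for b in patterns(0b010):          # B = 010Y_YYYY_YYYY_1111
            unsigned_ovf.add(a + b > 0xFFFF)
            s = to_signed16(a) + to_signed16(b)
            signed_ovf.add(not (-0x8000 <= s <= 0x7FFF))
    return unsigned_ovf, signed_ovf
```

A returned set containing only one value means the outcome is the same for every choice of X and Y bits (True or False); a set containing both values corresponds to "can't tell".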

Page 7: EE457 Quiz (~10%) -


3 ( (16+8) + 20 + 5 + 5 = 54 points) 30 min. Performance

The textbook authors have cautioned us about the problems associated with MIPs and MFLOPs. Consider the following 8 actions. Write UP/DOWN/SAME in the four columns on the right. (16 pts; 8 pts bonus if all correct)

Columns to fill in for each action: MFLOPs | MIPs | Relative MIPs | ET

1 Improving integer ADD, SUB instructions

2 Improving Floating Point ADD, SUB instructions

3 Adding Floating Point multiplier/divider hardware (previously non-existing)

4 Improving existing Floating Point multiplier/divider hardware

5 Adding unnecessary NOPs in compilation

6 Adding several unnecessary Floating Point add/sub instructions in compilation

7 Removing existing Floating Point multiply/divide hardware and requiring the compiler to use software routines to perform the same using several floating point add/sub instructions

8 Increasing the processor frequency by 20%. No other change.

Blank area (use it for rough work)

Page 8: EE457 Quiz (~10%) -


3.1 An ISA vendor (like ARM) has licensed his ISA to both TI (Texas Instruments) and LSI Logic. The implementation technology and the hardware architecture used by TI and LSI are quite different. They (TI and LSI) have provided limited information as shown in the tables. For each of the two, we wanted to calculate Weighted Average CPI and Native MIPs rating. If information is not adequate, state what information is needed (and why it is needed) and also assume a reasonable value for the missing information and calculate Weighted Average CPI and Native MIPs rating for each of them. ________________________________________

3.1.1 Mr. Bruin says, "It is just coincidental that the two CPUs had the same frequency of occurrence of instructions in the dynamic instruction trace". Miss Bruin does not agree with him. Whom should we allow to transfer to USC and why? ________________________________________

3.2 It ___________ (is / isn’t) possible to change Relative MIPs of a processor without changing the ET (Execution Time). Explain: ________________________________________

CPU#1 from TI

Type  CPI        Frequency
A     CPIA = 1   fA = 40%
B     CPIB = 2   fB = 30%
C     CPIC = 3   fC = 20%
D     CPID = 4   fD = 10%

CPU#2 from LSI Logic

Type  nanoseconds per instruction  Frequency
A     nsA = 2                      fA = 40%
B     nsB = 4                      fB = 30%
C     nsC = 6                      fC = 20%
D     nsD = 8                      fD = 10%

20 pts
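The weighted averages asked for in 3.1 follow directly from the two tables. The sketch below shows the arithmetic in Python; the helper names are hypothetical, and any clock frequency or clock period passed to them is an assumed value, because the tables give neither.

```python
# Values copied from the two tables; frequencies are fractions of the dynamic trace.
FREQ = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
CPI_TI = {"A": 1, "B": 2, "C": 3, "D": 4}      # clocks per instruction (CPU#1)
NS_LSI = {"A": 2, "B": 4, "C": 6, "D": 8}      # nanoseconds per instruction (CPU#2)


def weighted_average(per_type, freq=FREQ):
    """Sum of (value for each instruction type) x (its frequency of occurrence)."""
    return sum(per_type[t] * freq[t] for t in freq)


def native_mips_from_cpi(avg_cpi, clock_mhz):
    """Native MIPS = clock rate in MHz / average CPI (clock_mhz is an assumption)."""
    return clock_mhz / avg_cpi


def native_mips_from_ns(avg_ns):
    """Native MIPS = 1000 / average nanoseconds per instruction."""
    return 1000.0 / avg_ns


def cpi_from_ns(avg_ns, clock_period_ns):
    """Average CPI = average time per instruction / assumed clock period."""
    return avg_ns / clock_period_ns
```

For example, `weighted_average(CPI_TI)` gives the average CPI of CPU#1, but a MIPS rating for it needs an assumed clock rate; `weighted_average(NS_LSI)` gives the average time per instruction of CPU#2, which yields a MIPS rating directly but needs an assumed clock period before a CPI can be stated.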

use this space as you please for any calculations associated with this question.

5pts

5pts

Page 9: EE457 Quiz (~10%) -


4 ( 19 +39 = 58 points) 25 min. Memory addresses

4.1 You are aware that the Intel 80486 processor is a 32-bit data 32-bit logical address byte addressable processor and the Intel i860 processor is a 64-bit data 32-bit logical address byte addressable processor. The address spaces are ______ (A/B) where A: 4 GB for each, B: 16 GB (2^32 locations, each 32 bits wide) for 80486 and 32 GB (2^32 locations, each 64 bits wide) for i860.

Intel follows ___________ (Little Endian / Big Endian) system.

In the Intel 80486 processor system address space, byte 1234_567FH is the ____________ (most / least) significant byte of the 32-bit word with system address __________________ (state in hexadecimal). State the next three 32-bit word addresses (in hex), next to the 32-bit word containing the above byte: _________________________ _________________________ _________________________

Let us now repeat the above question for the Intel i860 processor. In the Intel i860 processor system address space, byte 1234_567FH is the ____________ (most / least) significant byte of the 64-bit word with system address __________________ (state in hexadecimal). State the next three 64-bit word addresses (in hex), next to the 64-bit word containing the above byte: _________________________ _________________________ _________________________
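The address arithmetic behind these blanks can be sketched as follows. This is an illustrative Python helper, not part of the exam; the name `locate_byte` is hypothetical, and it assumes a little-endian, byte-addressable machine with naturally aligned words whose size in bytes is a power of two.

```python
def locate_byte(byte_addr, word_bytes):
    """Locate a byte inside its aligned word on a little-endian machine.

    Returns (word_address, byte_offset, position, next_three_word_addresses).
    """
    base = byte_addr & ~(word_bytes - 1)       # clear the low address bits
    offset = byte_addr - base                  # byte lane within the word
    if offset == word_bytes - 1:
        position = "most significant"          # highest address holds the MSB (little endian)
    elif offset == 0:
        position = "least significant"
    else:
        position = "middle"
    next_words = [base + k * word_bytes for k in (1, 2, 3)]
    return base, offset, position, next_words
```

Calling it as `locate_byte(0x1234567F, 4)` models the 32-bit case and `locate_byte(0x1234567F, 8)` the 64-bit case; formatting the results with `hex()` gives the addresses in the form the question asks for.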

4.2 Shown on the side is the memory interface to a byte-wide memory chip in a memory system based on minimum number of byte-wide banks for an exotic USC512 processor (512-bit data, 32-bit logical address, byte-addressable processor). USC processors are similar to Intel processors (Byte Enable pins and Endianness). The address pins on this processor are (select) (i) A[31:0] (ii) A[31:3], /BE[7:0] (iii) A[31:4], /BE[15:0] (iv) A[31:5], /BE[31:0] (v) A[31:6], /BE[63:0] (vi) A[31:7], /BE[127:0]
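The relation between data-bus width, byte-enable pins and the remaining address pins can be sketched in a few lines. This Python helper is only an illustration under the assumption that the processor follows the Intel-style convention mentioned above (one active-low byte enable per byte lane, with the low address bits replaced by the byte enables); the name `bus_pins` is hypothetical.

```python
def bus_pins(data_bits, addr_bits=32):
    """Return the address-pin and byte-enable-pin labels for a byte-addressable CPU."""
    lanes = data_bits // 8                 # one byte-wide bank (and one /BE pin) per lane
    low_bits = lanes.bit_length() - 1      # log2(lanes); lanes is assumed to be a power of two
    return f"A[{addr_bits - 1}:{low_bits}]", f"/BE[{lanes - 1}:0]"
```

As a check against a known case, `bus_pins(32)` returns `("A[31:2]", "/BE[3:0]")`, matching the 80486 pinout; the same call with `data_bits=512` gives the corresponding labels for the USC512.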

Fill-in the 6 blanks (marked by the 6 arrows) in the figure on the side. Also find the system addresses corresponding to the lowest-addressed two bytes of this memory chip. The lowest-addressed two bytes of this chip map to the system byte addresses (in hex) _______________________________ _________________________________________________.

The system addresses mapping to any location in this memory chip will have the same upper _____ (state a number) bits namely ________________ (state their labels in the form X[13:2]).

The system addresses mapping to any location in this memory chip will have the same lower _____ (state a number) bits namely ________________ (state their labels in the form Y[13:2]).

If this chip goes bad, until you replace it, you should avoid using memory addresses ______ (X/Y) where X: which map to this bad chip only, Y: which map to the composite space occupied by this chip as well as similar sized spaces in all other banks (Note: Here the words "composite space" mean contiguous or continuous address range). Address range with "holes" (bad spots) is useless!

State the range of the unusable address range in hex: _____________________________________

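Point labels on this page: 19 pts, 8 pts, 8 pts, 1 pt, 2 pts; 39 pts, 3 pts, 11 pts, 5 pts, 2 pts, 2 pts, 2 pts, 4 pts

[Figure: memory interface of the byte-wide memory chip, with six blanks marked by arrows. Labels: A31, A30, A29, A28, A27, A26, A25, A24; BE6; chip pins CS, WE, RD, A[ : ] and D[7:0]; system-side connections D[ : ] and A[23: ]; chip size ______ KB.]

Note: Shift in address for 80486: ______ Shift in address for USC512: ______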

Page 10: EE457 Quiz (~10%) -


5 ( 15 + 29 + 6 = 50 points) 30 min. Single-cycle CPU:

You are familiar with the addi and the ordinary jump J (Jump with the 26-bit jump address field), Jal (Jump and Link), Jr rs, (Jump register rs), and the Beq (Branch if Equal) instructions.
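As a reminder of how these instructions steer the program counter, the sketch below models next-PC selection in Python. It assumes the standard MIPS encodings (26-bit jump field, 16-bit signed branch offset, both counted in words) and a single-cycle machine without delay slots; the function name `next_pc` and its parameters are hypothetical, and the register-file side of Jal (writing the return address) is not modeled.

```python
def next_pc(pc, instr, kind, rs_value=0, zero=False):
    """Return the next PC for one instruction of the given kind.

    kind is one of "j", "jal", "jr", "beq" or anything else for a sequential instruction.
    """
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if kind in ("j", "jal"):
        # Jump address = PC+4[31:28] : instruction[25:0] : 00
        return (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    if kind == "jr":
        return rs_value & 0xFFFFFFFF               # target comes from register rs
    if kind == "beq" and zero:                     # branch taken only when ALU Zero is set
        imm = instr & 0xFFFF
        if imm & 0x8000:                           # sign-extend the 16-bit offset
            imm -= 0x10000
        return (pc_plus_4 + (imm << 2)) & 0xFFFFFFFF
    return pc_plus_4                               # addi, R-format, lw, sw, untaken beq
```

For example, a `beq` with offset -1 that is taken from address `0x00400000` branches back to itself, while the same instruction with `zero=False` falls through to `0x00400004`.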

5.1 The data path on the next page is nearly complete. Complete the connections to the 7 loose ends which were marked with numbered arrows.

Control Signal Table: Complete the four rows and three columns. Whenever possible, use don’t cares.

5.2 Mr. _____________ (Trojan/Bruin) suggested that the following part of the textbook figure shown on the left needs to be revised at the area pointed to by the arrow as shown on the right. What is the issue here? ________________________________________

Instruction  MemRead  MemWrite  ALUSrc  ALUOp1  ALUOp0  RegDst  MemtoReg  RegWrite  Branch  JUMP  Jal  JR
R-format     0        0         0       1       0       1       0         1         0
lw           1        0         1       0       0       0       1         1         0
sw           0        1         1       0       0       X       X         0         0
addi
beq          0        0         0       0       1       X       X         0         1
J            0        0         X       X       X       X       X         0         X
Jal
JR rs

Point labels: 11+4 pts (5.1 data path); 25+4 pts (control signal table: 20 pts for bottom 3 rows and right 3 columns, 5 points for addi); 6 pts (5.2)

[Figure for 5.2: "Textbook figure" (left) and "Suggested revision" (right); figure contents not recovered.]

Page 11: EE457 Quiz (~10%) -

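[Figure: single-cycle CPU data path with 7 loose ends marked by numbered arrows 1 through 7. Signal labels in the figure: Control, Jump, JR, Jal, PCSrc, RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, Zero, ALU control; two-input multiplexers (inputs 1 and 0) selected by Jump, JR and Jal; Jump Address [31:0]; Instruction [31:0]; PC+4 [31:28].]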

Page 12: EE457 Quiz (~10%) -