EE457 Quiz (~10%)

February 12, 2015, 11:14 am. EE457 Quiz - Spring 2015. (C) Copyright 2015 Gandhi Puvvada.

Closed-book, closed-notes exam; no cheat sheets.

Calculators are allowed. Verilog Guides are allowed but not needed.

Spring 2015
Instructor: Gandhi Puvvada

Friday, 2/13/2015, 09:00 AM - 11:50 AM (2 hours 50 min. = 170 min.)

Location: THH301

Viterbi School of Engineering

University of Southern California

Ques#  Topic                           Page#  Time      Points  Score
1      State Diagram, RTL Design       2-3    40 min.   69
2      MIPS processor ISA              4      20 min.   31
3      Byte-addressable processors     5      25 min.   46
4      Unsigned and Signed numbers     6      20 min.   39
5      CPU Performance                 7      20 min.   24
6      Single-Cycle CPU                8-9    25 min.   40
Total                                  9      150 min.  249
Perfect Score                                           235

Student’s Last Name: _______________________________________

Student’s First Name: _______________________________________

Student’s DEN Bb username: [email protected]


1 ( 3 + 22 + 44 = 69 points) 40 min. State Diagram and RTL design:

This is similar to your Lab 1 Min/Max finder. Here the array is a 7x4 array containing 7 unsigned 4-bit numbers. The numbers can repeat but we are told that there are at least 3 different numbers. Example: M[0] to M[6]= 4, A, 7, 9, A, 2, 9.

We need to find the HMx (= Highest Maximum = A here in this example), NHMx (= Next Highest Maximum = 9 here in this example and not A), and NNHMx (= Next to Next Highest Maximum = 7 here in this example and not 9).
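As a software reference for the expected result only (not the RTL that the question asks for), the sketch below computes the three values for the example array. The function name `top_three_distinct` is hypothetical, and sorting a set is simply a compact way to state the specification; it says nothing about how the state machine should arrive at the same values.

```python
def top_three_distinct(values):
    """Return the three largest distinct values as (HMx, NHMx, NNHMx)."""
    distinct = sorted(set(values), reverse=True)  # duplicates collapse in the set
    if len(distinct) < 3:
        raise ValueError("need at least 3 different numbers")
    return tuple(distinct[:3])


# Example array from the question, M[0] to M[6], written in hexadecimal.
M = [0x4, 0xA, 0x7, 0x9, 0xA, 0x2, 0x9]
HMx, NHMx, NNHMx = top_three_distinct(M)
print(hex(HMx), hex(NHMx), hex(NNHMx))  # prints 0xa 0x9 0x7
```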

Since we are writing generic RTL, please use curly braces {} if you have multiple statements under an "if" part or an "else if" part or an "else" part.

When the next M[I] is greater than the HMx do you just update HMx, or would you further pass the previous value of HMx to NHMx _______ (Y / N) and perhaps further pass the previous value of NHMx to NNHMx _______ (Y / N).

1.1 Part 1: Here you are given three comparators to compare M[I] with all the three (HMx, NHMx, NNHMx) simultaneously. You can write M[I]>HMx, M[I]=HMx, M[I]<HMx, M[I]>=HMx, M[I]!=HMx, M[I]<=HMx and 6+6=12 others.

We do not use the Load state of our Lab #1 here to initialize all the three to M[0] because if M[0] is _____ (F / 0) none of the three will update with any other M[I] anymore. So in the initial state INI, we load them with the worst values in the 4-bit number range as shown below. The three (HMx/NHMx/NNHMx) will be different from each other in DONE state _______ (T / F)

Please complete the state diagram below.

For the above example array, this design takes _______ clocks in the COM state.

[Figure: example register contents contrasting a wrong three with the correct three: HMx = A, NHMx = 9, NNHMx = 7.]

[State diagram to complete. States: INI, COM, DONE. Arrow labels shown: RESET, START and ACK (each appearing twice).
INI: HMx <= 0; NHMx <= 0; NNHMx <= 0; I <= 0;
COM: if (M[I] > HMx) HMx <= M[I];
Use appropriate operator: > / = / != / < / >= / <=]

Margin points, in order: 2+1 pts, 4+2 pts, 16 pts.


1.2 Part 2: Here you are given only one comparator to compare M[I] with one of the three (HMx,NHMx,NNHMx) one at a time. Please complete the state diagram below.

1.2.1 After completing the above design, state the number of clocks spent in the above three COMx states for each of the following two sorted arrays of data (order M[0] to M[6]).

1.2.2 HMx and the RESET: The low-active RESET ______________________ [clears (makes HMx = 0000) / sets (makes HMx = 1111) / initializes it to 2 (makes HMx = 0010) / none of these].

1.2.3 The datapath is designed to deliver to the data register NHMx _____ (state a number) different values namely ___________________ for ______________ (synchronous / asynchronous) deposition.

1.2.4 M.E and A.I rules apply to the conditions associated with the state transition arrows ______________________________ (diverging from a / converging on to) a state.

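[State diagram to complete. States: INI, COM1, COM2, COM3, DONE. Arrow labels shown: RESET, START and ACK (each appearing twice).
INI: HMx <= 0; NHMx <= 0; NNHMx <= 0; I <= 0;
COM1: if (M[I] > HMx) HMx <= M[I];
COM2: if (M[I] > NHMx) NHMx <= M[I];
COM3: if (M[I] > NNHMx) NNHMx <= M[I];
Use appropriate operator: > / = / != / < / >= / <=]

Margin points, in order: 26 pts, 6+3 pts, 3 pts, 4 pts, 2 pts.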


2 ( 6 + 6 + 3 + 4 + 6 + 6 = 31 points) 20 min. MIPS processor ISA

2.1 List 4 different instructions that use the Immediate format consisting of a 16-bit immediate field. Give one example instruction for each of the 4.
________________________________________________________________________

2.2 The MIPS designers chose to use fixed-length instructions (all 32 bits in size). This single decision has impacted the ISA substantially and created some limitations (though there were work-arounds for all of them). Cite four such limitations.
________________________________________________________________________

2.3 Because of the absence of ______________________________________ in the MIPS architecture compared to the usual CISC architecture, unlike in a CISC processor (where they have a common ADD and a common SUB instruction for both signed and unsigned numbers), the MIPS processor ISA has two add instructions and two subtract instructions (ADD and SUB with overflow trap for the signed numbers, and the ADDU and SUBU instructions with no trap for overflow for unsigned numbers).

2.4 The JAL instruction of MIPS is much simpler compared to the CALL instruction of most CISC processors. Explain briefly:
________________________________________________________________________

2.5 The offset in words in a beq (branch if equal) instruction is specified with respect to the incremented PC. What is the reason to specify the offset with respect to the incremented PC (and why not the unincremented PC)?
________________________________________________________________________

2.6 Consider a CISC instruction such as an "add" instruction with two source memory operands and one destination memory operand. For example, consider add 300($3), 200($2), 100($1); Ignore the fact that this instruction is lengthy and does not fit into the 32-bit size. Between the two, the multi-cycle CPU and the single-cycle CPU, in which is it easier to implement such a CISC instruction? Explain based on execution complexity rather than the instruction fetching difficulty.
________________________________________________________________________



3 ( 22 + 4 + 6 + 14 = 46 points) 25 min.

3.1 ___________________________ (Fortunately / Unfortunately) Byte-addressability ___________________ (increases / decreases) the processor's address space. ___________________________ (Fortunately / Unfortunately) Byte-addressability makes text-processing ____________ (better / difficult). A 64-bit data 32-bit logical address processor such as the Intel i860 needs _________ (more / less) than 32 pins to convey ________ (address / data) to the outside world. It uses _____ encoded address pins labeled as A31 to ______ and ______ byte-enables labeled as ____ to /BE0. These byte-enables replace the logical address lines ____ to A0. The i860 address space in Giga Bytes is ____________ GB. The highest addressed 64-bit long word in the i860 address space is __________________ (specify in hexadecimal notation).

If the main memory of the i860 is built with byte-wide memory chips, they have to be organized in ____ (state a number) byte-wide banks at minimum in a ______________ (Lower-order / Higher-order) interleaved fashion to facilitate fetching of any byte or any _________ or any __________ or any _________.

In such an arrangement of memory with the minimum number of byte-wide banks, the address pin A15 of the i860 processor is connected to the address pin _______ of a byte-wide static RAM chip. We noted that there is a shift in address (shift in connecting address lines to address pins) of _____ bits.

The byte accesses within a long word ______________ (shall be / shall not be) mutually exclusive. The long-word addresses are ___________ (1-apart / 2-apart / 4-apart / 8-apart / 16-apart / 32-apart / 64-apart).

3.2 You _________ (do / do not) talk about a byte-addressable 8-bit processor because _________________________________________________________________________________________________________________________________________________________________

3.3 In the decimal system, if we are asked to state a natural range of one thousand containing 46321, we come up with 46000 to 46999. Note that ranges of a thousand such as 46020 to 47019 are unnatural. Similarly, in a 20-bit binary address system, state a 32K natural range containing the 20-bit address 1011_0110_1011_0010_1010: _____________________________________________.
State the above range in hexadecimal: _________________________
How big is this range 2C000 Hex - 2DFFF Hex? _________ KB. Is it a natural binary range? Y / N
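The idea of a natural range can be sketched with the decimal example from the question. This is an illustrative sketch only: the helper names `natural_range` and `is_natural` are hypothetical, and the check assumes a range is natural exactly when its lowest value is a multiple of its size. The same arithmetic carries over to binary addresses when the size is a power of two.

```python
def natural_range(value, size):
    """Return (low, high) of the size-aligned range that contains value."""
    low = (value // size) * size  # round down to a multiple of size
    return low, low + size - 1


def is_natural(low, high):
    """A range is treated as natural when it starts on a multiple of its own size."""
    size = high - low + 1
    return low % size == 0


print(natural_range(46321, 1000))  # prints (46000, 46999)
print(is_natural(46000, 46999))    # prints True
print(is_natural(46020, 47019))    # prints False
```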

3.4 Shown on the side is the memory interface to a byte-wide SRAM memory chip in a memory system based on minimum number of byte-wide banks for an i860 processor.

Fill in the 3 blanks (marked by the 3 arrows) in the figure on the side. Also find the system addresses corresponding to the lowest-addressed two bytes of this memory chip. The lowest-addressed two bytes of this chip map to the system byte addresses (in hex) _______________________________ _________________________________________________.

The system addresses mapping to any location in this memory chip will have the same upper ________ (state a number) bits, namely ______________ (state their labels in the form X[13:2]).

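[Figure for 3.4: memory interface to a byte-wide SRAM chip with pins CS, WE, RD, A[ ] and D[7:0]. Signals shown: A31 to A20, A19, A[18:3], BE6. Blanks to fill: chip address pins A[ ], system data lines D[ ], and chip size ____KB.]

Margin points, in order: 22 pts, 4 pts, 6 pts, 14 pts.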


4 ( 6 + 10 + 23 = 39 points) 20 min.

Unsigned numbers and Signed numbers represented in 2’s complement system

4.1 To check if a 4-bit X is equal to a 4-bit Y, we perform (X-Y) and check to see if the result produced is all zeros. Further observation in this context. ________ (A/B/C/D/E) (i.e. read the 5 statements below and select one)

(A) This procedure is good if we perform unsigned subtraction.

(B) This procedure is good if we perform signed subtraction.

(C) Since the subtraction procedure is one and the same, namely (X-Y) is performed by (X+Y’+1) whether X and Y are signed numbers or unsigned numbers, we need to check V or C4 also. If V is a zero, we confirm that the signed subtraction result of zero is right and conclude that the signed X and the signed Y are equal. If the raw C4 is a 1, then we confirm that the result of zero is correct and conclude that the unsigned X and the unsigned Y are equal.

(D) Irrespective of whether we treat X and Y as unsigned or signed numbers, when they are equal, our adder/subtracter will definitely produce all zeros as result when we perform subtraction and also make V a zero and make the raw C4 a 1, validating the subtraction result and concluding that X and Y are equal whether they are treated as unsigned numbers or signed numbers.

(E) None of the above (i.e. none of A, B, C, D)

4.2 To decrement a 4-bit number X (i.e., to produce (X-1)), we perform (X + one’s complement of 1 + 1), which is (X+1110+1), which is (X+1111). Yes, this makes sense because all 1’s is a minus 1 in the 2’s complement signed number representation system. Can we use this procedure for unsigned number X also? ____ Y / N. If Yes, what is the unsigned decrementation overflow indicator? _________ (V = 1 / V = 0 / Raw C4 = 1 / Raw C4 = 0 / none of these). For signed number decrementation, what is the overflow indicator? _________ (V = 1 / V = 0 / Raw C4 = 1 / Raw C4 = 0 / none of these).

Irrespective of whether the above procedure of performing (X+1111) for decrementation is right or wrong, which values of X (out of the 16 possible 4-bit values) should cause overflow when decrementing a 4-bit signed X represented in 2’s complement number system? ___________ And which values of X (out of the 16 possible 4-bit values) should cause overflow when decrementing a 4-bit unsigned X? ___________
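The bit-level arithmetic described in 4.2 can be checked with a short sketch. It only restates the identity given in the question, namely that adding the one's complement of 1 plus 1 is the same 4-bit addition as adding 1111, and that 1111 reads as minus 1 in 2's complement. The names `decrement_4bit` and `as_signed` are hypothetical, and the sketch deliberately does not model the overflow indicators that the question asks about.

```python
MASK = 0b1111  # keep every result to 4 bits


def decrement_4bit(x):
    """Form X + (one's complement of 1) + 1, truncated to 4 bits."""
    ones_complement_of_1 = ~1 & MASK  # 1110
    return (x + ones_complement_of_1 + 1) & MASK


def as_signed(bits):
    """Interpret a 4-bit pattern as a 2's complement signed number."""
    return bits - 16 if bits & 0b1000 else bits


# The two forms in the question give the same 4-bit pattern for every X.
for x in range(16):
    assert decrement_4bit(x) == (x + 0b1111) & MASK

print(format(decrement_4bit(0b0101), "04b"))  # prints 0100
print(as_signed(0b1111))                      # prints -1
```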

4.3 A 4-bit decrementer is shown below on the left to decrement a 4-bit number X (=X3X2X1X0) and produce the decremented value Y (=Y3Y2Y1Y0). It is good to perform Y=X-1 for ______________________________________________ (unsigned numbers only / signed numbers only / both unsigned and signed numbers). Add 1 or 2 overflow detection(s) for the choice(s) made by you above. Label them UO for unsigned overflow (i.e. UO = 1 means unsigned overflow has occurred) and SO for signed overflow (i.e. SO = 1 means signed overflow has occurred).

[Figure: 4-bit decrementer with inputs X3 X2 X1 X0 and outputs Y3 Y2 Y1 Y0.]

The left-side design is placed in a box below for you to use in building B = 2A-4, where B is a 6-bit signed number and A is a 4-bit signed number. Produce SO for this operation. Hint: Some of the B bits are produced outside the box.

[Figure: box with inputs X3 X2 X1 X0 and outputs Y3 Y2 Y1 Y0.]

Margin points, in order: 6 pts, 10 pts, 3 pts, 8 pts, 12 pts.


5 ( 14 + 10 = 24 points) 20 min.

Performance

5.1 Irony of MIPS:

1. XYZ reported that when he improved his compiler, ET (execution time) went down, but MIPS also went down. Is this possible or not possible? _______________ Explain using the words native or relative MIPS.
________________________________________________________________________

2. ABC reported that when he improved his ISA by adding a very useful instruction, ET (execution time) went down, but MIPS also went down. Is this possible or not possible? ____________. Explain using the words native or relative MIPS.
________________________________________________________________________

3. State the relation between MIPS and a subset of the three items IC, CPI, and CP (whose product produces ET) and state which of these three might have changed to cause MIPS to fall in each of the above two cases.
________________________________________________________________________

5.2 In a processor ISA, we have multiply and divide instructions, and several other instructions. Currently multiply instructions take 10 clocks whereas the divide instructions take 20 clocks. In the dynamic execution trace of the benchmark, the frequency of occurrence of the multiply instruction is 20% whereas the frequency of occurrence of the divide instruction is only 10%. The hardware architecture team came up with a proposal to reduce the clocks taken by the multiply instruction by 5 clocks if you agree to allow an increase of clocks taken by the divide instruction by 5 clocks.
Do we have enough data to conclude if this is an advantageous change? Yes / No
Do we have enough information to calculate the speedup? Yes / No
Explain both your answers and also arrive at quantitative data when enough information is available.
________________________________________________________________________

Margin points, in order: 4 pts, 4 pts, 6 pts, 10 pts.


6 ( 16 + 24 = 40 points) 25 min. Single-cycle CPU:

You are familiar with the ordinary jump instruction j (with the 26-bit jump address field) and the indirect jump jr $rs (where the jump address is the content of $rs). A new unconditional jump indirect memory (jim) instruction is proposed where the jump address is obtained by reading the contents of a memory location whose address is obtained by adding the contents of $rs to the 16-bit offset field. The jim and the lw instructions are very similar: they are identical up to reading the memory location's content. lw takes the content and deposits it in $rt whereas jim uses the content as the jump address to jump to.

jim offset($rs); (PC) <= Memory[sign-extended offset + ($rs)];
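The register-transfer behavior described above can be sketched as a small software model. This is an assumption-laden illustration, not the datapath or control answer: the `state` dictionary, the helper names, and the memory modeled as a dictionary keyed by address are all hypothetical, and the PC + 4 update for lw is the usual MIPS sequential behavior rather than something stated in the question.

```python
def sign_extend_16(value):
    """Sign-extend a 16-bit offset field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(regs, rs, offset):
    """Shared by lw and jim: sign-extended offset + ($rs), wrapped to 32 bits."""
    return (regs[rs] + sign_extend_16(offset)) & 0xFFFFFFFF


def lw(state, rt, rs, offset):
    """lw deposits the memory content in $rt, then falls through to PC + 4."""
    addr = effective_address(state["regs"], rs, offset)
    state["regs"][rt] = state["mem"][addr]
    state["pc"] = (state["pc"] + 4) & 0xFFFFFFFF


def jim(state, rs, offset):
    """jim uses the memory content as the jump address: (PC) <= Memory[...]."""
    addr = effective_address(state["regs"], rs, offset)
    state["pc"] = state["mem"][addr]


# Hypothetical machine state: $2 holds a base address, memory holds a target.
state = {"pc": 0x00400000, "regs": [0] * 32, "mem": {0x1008: 0x00400020}}
state["regs"][2] = 0x1000

lw(state, rt=5, rs=2, offset=8)
print(hex(state["regs"][5]), hex(state["pc"]))  # prints 0x400020 0x400004

jim(state, rs=2, offset=8)
print(hex(state["pc"]))  # prints 0x400020
```

Both functions share `effective_address` and the memory read, which mirrors the statement that the two instructions are identical up to reading the memory location; they differ only in where the value that was read ends up.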

The data path on the next page supports only the ordinary jump. Let us propose two new control signals called jr and jim to control a pair of muxes intercepting the path towards PC to inject the $rs data or memory data as the jump address. Two rows and two columns are added to the control signal table. Since jim is very much similar to lw, you should be able to copy most of lw control signals into the jim’s row.

6.1 Modify the single-cycle CPU block diagram on the next page to add the intercepting mux pair and their connections and labels.

6.2 Control Signal Table: Complete the three rows and three columns. Whenever possible, use don’t cares.

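Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  jump  jr  jim
R-format     1       0       0         1         0        0         0       1       0       __    __  __
lw           0       1       1         1         1        0         0       0       0       __    __  __
sw           X       1       X         0         0        1         0       0       0       __    __  __
beq          X       0       X         0         0        0         1       0       1       __    __  __
j            __      __      __        __        __       __        __      __      __      __    __  __
jr           __      __      __        __        __       __        __      __      __      __    __  __
jim          __      __      __        __        __       __        __      __      __      __    __  __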

Blank area for rough work:

It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you. The next three topics, pipelined CPU, cache and virtual memory are interesting and challenging too. They are the focus of the midterm exam. Then we cover advanced topics. Best! Gandhi, TAs: Jizhe, Sanmukh, Mentors: Madhusudhan, Minnu, HW Graders: Bhuvana, Yixian, Lab graders: Jun, Arnav


[Figure for 6.1: Single-Cycle CPU block diagram.]
