CS6461 – Computer Architecture Fall 2015 Instructor Morris Lancaster Adapted from Professor Stephen Kaisler’s Slides . Lecture 2 - Basic System Design


CS6461 – Computer Architecture
Fall 2015

Instructor Morris Lancaster
Adapted from Professor Stephen Kaisler’s Slides

Lecture 2 - Basic System Design


04/19/23 CS6461 Computer Architecture - 2014, Dept. of Computer Science


Hierarchical System Architecture


Technology Trends

• Processor
  – logic capacity: 2x increase in performance every 1.5 - 2 years
  – clock rate: about 25% per year
  – overall performance: 1000x in last decade

• Main Memory
  – DRAM capacity: 2x every 2 years; 1000x size in last decade
  – memory speed: about 10% per year
  – cost / bit: improves about 25% per year

• Disk
  – capacity: > 2x increase in capacity every 1.5 years
  – cost / bit: improves about 60% per year
  – 120x capacity in last decade
  – Disk architecture not much different from IBM’s 10 MByte disks of the early 1980s

• Network Bandwidth
  – Bandwidth: 1 Gbit/s standard to the desktop in many places
  – Bandwidth: probably 1 Tbit/s by end of decade, but may require new infrastructure
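The rates above can be turned into decade totals with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes growth compounds smoothly at the quoted rate. Decade totals quoted on the slide may combine several effects, so they need not match a single doubling rate.

```python
def growth_over(years, doubling_period_years):
    """Total growth factor when a quantity doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


def compound_growth(years, annual_rate):
    """Total growth factor at a fixed annual rate (0.25 means 25% per year)."""
    return (1 + annual_rate) ** years


if __name__ == "__main__":
    # Doubling every 1.5 years versus every 2 years, over one decade.
    print(growth_over(10, 1.5))
    print(growth_over(10, 2))
    # Clock rate improving about 25% per year, over one decade.
    print(compound_growth(10, 0.25))
```

Doubling every 2 years gives exactly 32x per decade, and doubling every 1.5 years gives roughly 100x.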


Intel Processor Evolution


Processor Clock Speed


Cost Per GFLOP


Number of Servers Comprising the WWW


Technology Progress


Growth factors:

• Transistors/chip: >100,000 since 1971
• Disk density: >100,000,000 since 1956
• Disk speed: 12.5 since 1956

The disk speed barrier dominates everything!

[Chart: Compound Annual Growth Rate (axis 0% to 45%) for Transistors/Chip since 1971, Disk Density since 1956, and Disk Speed since 1956]
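A compound annual growth rate can be recovered from a total growth factor and the number of years it covers. The sketch below is a rough illustration only: `cagr` is a name chosen for this example, and the end year of 2014 is an assumption, since the slides do not state when the growth factors were measured.

```python
END_YEAR = 2014  # assumption: the slides do not give the measurement year


def cagr(growth_factor, years):
    """Compound annual growth rate implied by a total growth factor."""
    return growth_factor ** (1 / years) - 1


if __name__ == "__main__":
    # Growth factors and start years as quoted on the slide.
    series = {
        "Transistors/chip": (100_000, 1971),
        "Disk density": (100_000_000, 1956),
        "Disk speed": (12.5, 1956),
    }
    for name, (factor, start_year) in series.items():
        rate = cagr(factor, END_YEAR - start_year)
        print(f"{name}: {rate:.1%} per year")
```

Under that assumption the rates come out at roughly 31%, 37%, and 4% per year, which shows how far disk speed lags behind the other two.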


The “1,000,000:1” disk-speed barrier

• RAM access times ~5-7.5 nanoseconds
  – CPU clock cycle <1 nanosecond
  – Interprocessor communication can be ~1,000X slower than on-chip

• Disk seek times ~2.5-3 milliseconds
  – Limit = ½ rotation (at 15,000 RPM)
  – i.e., 1/30,000 minutes
  – i.e., 1/500 seconds = 2 ms

Tiering brings it closer to ~1,000:1 in practice, but even so the difference is VERY BIG
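The numbers above can be checked with a short calculation. This is a minimal sketch with hypothetical function names; the 15,000 RPM figure is inferred from the slide's "1/30,000 minutes" per half rotation.

```python
def half_rotation_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def speed_ratio(disk_seconds, ram_seconds):
    """How many times slower a disk access is than a RAM access."""
    return disk_seconds / ram_seconds


if __name__ == "__main__":
    print(half_rotation_ms(15_000))    # half a rotation at 15,000 RPM, in ms
    print(speed_ratio(2.5e-3, 5e-9))   # 2.5 ms seek versus 5 ns RAM access
```

A 2.5 ms seek against a 5 ns RAM access is a ratio of about 500,000:1, the same order of magnitude as the "1,000,000:1" in the slide title.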


State of the Art

• State-of-the-art PC (on your desk) now:
  – Processor clock speed: ~4 GigaHertz
  – Memory capacity: 2 to 8 GigaBytes (some Windows 7 editions limit to 8 GBytes; Windows 8 limits to 128 GBytes on x64)
  – Disk capacity: 1 TByte for <$79; 2 TBytes for <$129 – Wow!!
  – In five years, we will need new units!

• Mega -> Giga -> Tera -> Peta -> Exa (Big Data!)


Intel 4004 Die Photo

• (2250 transistors, 12 mm², 108 kHz, 1970)


Intel 80486 Die Photo

• (1,200,000 transistors, 81 mm², 25 MHz, 1989)


Pentium Die Photo

• (3,100,000 transistors, 296 mm², 60 MHz, 1993)


I/O System Side

Each bus and adapter has its own specifications.

• Interfaces are where the problems are - between functional units and between the computer and the outside world
• Need to design against constraints of performance, power, area and cost


Issues

• Performance:
  – the key to computing for most intensive problems
  – what’s the secret? TIME, TIME, TIME (analogy to Real Estate: Location, Location, Location)

• Response Time:
  – How long does it take for my job/program to run?
  – How long does it take to execute my job/program? [NOTE: These are not equivalent. Why not?]
  – How long must I wait for a database query?

• Throughput:
  – How many jobs can the machine run at once?
  – What is the average execution rate?
  – How much work is getting done?
  – How long does it take to handle an interrupt?

• Execution Times:
  – Elapsed Time: counts everything, disk and memory accesses, I/O waits, etc. Sometimes a useful number, but not good for comparison purposes
  – CPU Time: counts instruction execution times, but not I/O time; basis for MIPS/MFLOPS; often divided into system time and user time

• Q? What are MIPS and MFLOPS good measures of, if anything?
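The difference between elapsed time and CPU time is easy to observe. The sketch below uses Python's standard `time` module; `measure` and `mips` are hypothetical helper names for this example, and a `sleep` stands in for an I/O wait.

```python
import time


def measure(work):
    """Run `work` and return (elapsed_seconds, cpu_seconds)."""
    start_wall = time.perf_counter()   # wall-clock time: counts everything
    start_cpu = time.process_time()    # user + system CPU time of this process
    work()
    elapsed = time.perf_counter() - start_wall
    cpu = time.process_time() - start_cpu
    return elapsed, cpu


def mips(instruction_count, cpu_seconds):
    """Millions of instructions per second, based on CPU time only."""
    return instruction_count / (cpu_seconds * 1_000_000)


if __name__ == "__main__":
    # Sleeping stands in for an I/O wait: it adds elapsed time but almost no CPU time.
    elapsed, cpu = measure(lambda: time.sleep(0.05))
    print(f"elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

Because MIPS is computed from CPU time, a program that spends most of its life waiting on I/O can report a high MIPS figure and still have a poor response time.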


Let’s start to design the machine for the CS211 CISC Computer!

[State diagram. States: Init (Initialize Machine), Fetch Instr., XEQ Instr., Incr. PC. Transition labels: Reset, Register-to-Register, Load/Store, Branch, Branch Not Taken, Branch Taken.]
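A state diagram like this maps naturally onto a small state machine. The sketch below is one plausible reading, not the course's reference design: the exact transitions are an assumption (in particular, that a taken branch skips the PC increment because the branch itself loads the PC), and all names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()      # entered on Reset: initialize the machine
    FETCH = auto()     # fetch the next instruction
    XEQ = auto()       # execute the instruction
    INCR_PC = auto()   # advance the PC to the next instruction


def next_state(state, branch_taken=False):
    """Return the state that follows `state` (assumed transitions)."""
    if state is State.INIT:
        return State.FETCH
    if state is State.FETCH:
        return State.XEQ
    if state is State.XEQ:
        # Assumption: a taken branch has already loaded the PC, so skip the increment.
        return State.FETCH if branch_taken else State.INCR_PC
    return State.FETCH  # INCR_PC always returns to FETCH
```

A simulator's main loop can then call `next_state` once per step and dispatch on the result.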


Analyze LDR/STR Instructions

• From our analysis of LDR/LDA/STR instructions, what do we know?
  – Memory Address Register (MAR)
  – Memory Buffer Register (MBR)
  – Program Counter (PC)
  – 4 GPRs (given)
  – Instruction Register (IR)
  – Register Select Register (RSR)
  – Instruction Operation Register (Opcode)
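To see why MAR and MBR fall out of the load/store analysis, it helps to trace the register transfers. The sketch below is a simplified illustration: the class name, method names, and the 2048-word memory size are assumptions, and indexing and indirect addressing are ignored.

```python
class Machine:
    """Tiny register-transfer model of load and store (hypothetical names)."""

    def __init__(self, memory_words=2048):  # memory size is an assumption
        self.memory = [0] * memory_words
        self.mar = 0              # Memory Address Register
        self.mbr = 0              # Memory Buffer Register
        self.gpr = [0, 0, 0, 0]   # the 4 general purpose registers

    def ldr(self, r, address):
        """Load: MAR <- address; MBR <- memory[MAR]; GPR[r] <- MBR."""
        self.mar = address
        self.mbr = self.memory[self.mar]
        self.gpr[r] = self.mbr

    def str_(self, r, address):
        """Store: MAR <- address; MBR <- GPR[r]; memory[MAR] <- MBR."""
        self.mar = address
        self.mbr = self.gpr[r]
        self.memory[self.mar] = self.mbr
```

Every memory access goes through the same two registers, which is why they appear in the list regardless of which instruction is being analyzed.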


How do these hook together?

[Block diagram. Labels: Memory, MAR, MBR, R0-R3, X1-X3, PC, RFI, IR, OpCode, ALU, Carry, Condition Codes]

How many registers do I need to access RF? See Mul/Div instructions.

How do I hook in the index registers?


Execution Structure

[Block diagram. Labels: Arithmetic Unit, Logical Unit, Shifter, ALU Control, Opcode, Data1, Data2, Count, Carry, ARR, LRR, SRR, MUX, MBR, R0-R3, PC, IR, ALU-Result]

xRR = result registers, hold result of operation for store on next cycle


Comments on Multiplexors

• Both the arithmetic unit and the logic unit are “active” and produce outputs.
  – The mux determines whether the final result comes from the arithmetic or logic unit.
  – The output of the other one is effectively ignored.

• Our hardware scheme may seem like wasted effort, but it’s not really.
  – “Deactivating” one or the other wouldn’t save that much time.
  – We have to build hardware for both units anyway, so we might as well run them together.

• This is a very common use of multiplexers in logic design.
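The "compute both, then select" idea can be mimicked in software. The sketch below is an illustration only: the function name and the choice of ADD and AND as the two operations are assumptions, and the 18-bit word size follows the word size mentioned elsewhere in these slides.

```python
WORD_BITS = 18                  # word size used elsewhere in these slides
MASK = (1 << WORD_BITS) - 1


def alu(a, b, select_logic):
    """Both units always produce a result; the mux picks one of them."""
    arith_result = (a + b) & MASK   # arithmetic unit output (ADD as an example)
    logic_result = a & b            # logic unit output (AND as an example)
    # The multiplexer: one select line chooses which output is passed on.
    return logic_result if select_logic else arith_result
```

In hardware both results exist at the same time on separate wires; the select line only decides which one reaches the output.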


Shifter

• A shifter is most useful for arithmetic operations since shifting is equivalent to multiplication (left shift) or division (right shift) by powers of two.
  – Shifting is necessary, for example, during floating point arithmetic.

• The simplest shifter is the shift register, which can shift by one position per clock cycle.

• So, the number of shifts equals the number of clock cycles consumed.

• A barrel shifter shifts by any number of positions in a single cycle and allows rotations as well.
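The cost difference between the two shifter styles can be sketched in software. This is a rough model with hypothetical function names, assuming the 18-bit word used elsewhere in these slides; the serial version counts one "cycle" per bit position, while the rotate is written as a single expression the way a barrel shifter does it in one step.

```python
WORD_BITS = 18
MASK = (1 << WORD_BITS) - 1


def shift_left_serial(value, count):
    """Shift-register style: one position per clock cycle."""
    cycles = 0
    for _ in range(count):
        value = (value << 1) & MASK   # bits shifted past the top are lost
        cycles += 1
    return value, cycles


def rotate_left(value, count):
    """Barrel-shifter style rotate: bits leaving the top re-enter at the bottom."""
    count %= WORD_BITS
    return ((value << count) | (value >> (WORD_BITS - count))) & MASK
```

Shifting 3 left by 2 positions yields 12, the same as multiplying by 4, and takes two cycles in the serial model.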


Adder

• The adder is probably the most studied digital circuit.
  – There are a great many ways to perform binary addition, each with its own area/delay trade-offs.
  – Adder delay is dominated by carry chain.

• Full Adder:
  – Computes one-bit sum, carry:
  – s_i = a_i XOR b_i XOR c_i
  – c_{i+1} = a_i b_i + a_i c_i + b_i c_i
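The full-adder equations translate directly into code, and chaining them bit by bit gives a ripple-carry adder. The sketch below is one of the many possible adder organizations the slide alludes to; the function names and the 18-bit default width are assumptions for this example.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ c                          # s_i = a_i XOR b_i XOR c_i
    carry = (a & b) | (a & c) | (b & c)    # c_{i+1} = a_i b_i + a_i c_i + b_i c_i
    return s, carry


def ripple_add(x, y, bits=18):
    """Add two words one bit at a time, passing the carry along the chain."""
    carry = 0
    result = 0
    for i in range(bits):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result, carry  # final carry is the carry out of the top bit
```

The loop makes the carry chain visible: bit i cannot finish until bit i-1 has produced its carry, which is why adder delay is dominated by carry propagation.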


Instruction Path

• Program Counter (PC)
  – Keeps track of program execution
  – Address of next instruction to read from memory
  – May have auto-increment feature or use ALU

• Instruction Register (IR)
  – Current instruction
  – Includes ALU operation and address of operand
  – Also holds target of jump instruction
  – Immediate operands

• Relationship to Data Path
  – PC may be incremented through ALU or separate adder
  – Contents of IR may also be required as input to ALU


Questions?

• How will you do Scalar Integer Multiply/Divide?
  – Just use the Java operators, but must be sure to do it only on 18 bits
  – Think about using an Integer-like class with just 18 bits? (java.lang.Integer itself is final and cannot be subclassed.)

• There is no negating instruction. How will you compute the negative of a number?

• Should you use the Adder to increment the PC or just provide a separate adder circuit?

• How will you detect overflow/underflow when doing adding/subtracting?
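One possible answer to the negation and overflow questions is sketched below. It assumes an 18-bit two's complement word, which the slides do not confirm, and every name in it is hypothetical. Negation is done as "invert all bits and add one"; overflow is flagged when both operands have the same sign and the result has the opposite sign.

```python
WORD_BITS = 18
MASK = (1 << WORD_BITS) - 1
SIGN = 1 << (WORD_BITS - 1)


def to_signed(x):
    """Interpret an 18-bit pattern as a two's complement integer (assumption)."""
    return x - (1 << WORD_BITS) if x & SIGN else x


def negate(x):
    """Two's complement negation: invert every bit, then add one."""
    return ((x ^ MASK) + 1) & MASK


def add_with_overflow(a, b):
    """Add two 18-bit words; overflow if the result's sign differs from both inputs."""
    result = (a + b) & MASK
    overflow = ((a ^ result) & (b ^ result) & SIGN) != 0
    return result, overflow
```

The same masking step is what "do it only on 18 bits" amounts to when the host language's integers are wider than the simulated word.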


Simple Procedure Calls

• Using a procedure involves the following sequence of actions:

1. Put arguments in places known to procedure (registers)

2. Transfer control to procedure, saving the return address (JSR)

3. Acquire storage space, if required, for use by the procedure

4. Perform the desired task

5. Put results in places known to calling program (registers or elsewhere)

6. Return control to calling point (RFS)
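Steps 2 and 6 can be modeled with two tiny operations. The sketch below is an assumption-laden illustration: it supposes that JSR keeps the return address in r3 (the slides later mention an implicit use of r3) and that RFS simply jumps back to it. The class and method names are hypothetical, and any return-code handling is left out.

```python
class Cpu:
    """Minimal model of call and return (hypothetical names)."""

    def __init__(self):
        self.pc = 0
        self.r = [0, 0, 0, 0]   # r0..r3; arguments are passed in registers

    def jsr(self, target):
        """Step 2: save the return address, then transfer control."""
        self.r[3] = self.pc + 1   # assumption: return address is kept in r3
        self.pc = target

    def rfs(self):
        """Step 6: return control to the instruction after the call."""
        self.pc = self.r[3]
```

With only one register holding the return address, a procedure that calls another procedure must save r3 first, which is part of step 3 (acquiring storage).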


Simple Procedure Calls


Example: Finding the absolute value of an integer

        jsr abs              ; assume integer in r0
        ….                   ; instruction after subroutine call
        …

abs     str r0,0,<tempInt>   ; store r0 in <tempInt>, some location
        ldr r1,0,smask       ; mask for sign bit = 100 000 000 000 000 000
        and r1,r0            ; AND r1 and r0: if r0 sign bit is set it will be set in r1
        jz r1,0,pos          ; test if sign = 0, i.e., r0 bit 0 is 0
        src r0,1,1,1         ; shift r0 logical left 1 bit
        src r0,1,0,1         ; shift r0 logical right 1 bit – sets sign bit to 0

pos     rfs 1                ; return with 1 => true and r0 has absolute integer
        …
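The same sequence can be traced in a few lines of Python. This is only a sketch of what the assembly above does, with hypothetical names: it tests the sign bit, then shifts left and right by one to clear it. Note that clearing the sign bit gives the magnitude only if negative numbers are stored as sign plus magnitude; that representation is an assumption here, and a two's complement value would need a true negation instead.

```python
WORD_BITS = 18
MASK = (1 << WORD_BITS) - 1
SIGN_MASK = 1 << (WORD_BITS - 1)   # 100 000 000 000 000 000 in binary


def abs_by_shifts(r0):
    """Clear the sign bit of an 18-bit word, mirroring the assembly steps."""
    if (r0 & SIGN_MASK) == 0:    # and + jz: sign bit clear, nothing to do
        return r0
    r0 = (r0 << 1) & MASK        # logical left 1: the sign bit falls off the top
    r0 = r0 >> 1                 # logical right 1: a zero enters the sign position
    return r0
```

In a language with a direct bit-clear this would be one masking operation; the two shifts stand in for an instruction the machine does not have.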


Soooo!

• Convoluted?? Yes!

• Why??

1. No jump less than or greater than instructions!

2. Did we really need them or were they a matter of convenience? E.g., how many instructions did we save by not having them?

3. Implicit use of r3