7/30/2019 f37 Book Intarch Pres Pt1
1/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 1
Part IBackground and Motivation
7/30/2019 f37 Book Intarch Pres Pt1
2/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 2
About This Presentation
This presentation is intended to support the use of the textbookComputerArchitecture:FromMicroprocessorstoSupercomputers,Oxford University Press, 2005, ISBN 0-19-515455-X. It is updatedregularly by the author as part of his teaching of the upper-divisioncourse ECE 154, Introduction to Computer Architecture, at the
University of California, Santa Barbara. Instructors can use theseslides freely in classroom teaching and for other educationalpurposes. Any other use is strictly prohibited. BehroozParhami
Edition Released Revised Revised Revised Revised
First June 2003 July 2004 June 2005 Mar. 2006 Jan. 2007
Jan. 2008 Jan. 2009 Jan. 2011
Second
7/30/2019 f37 Book Intarch Pres Pt1
3/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 3
I Background and Motivation
Topics in This PartChapter 1 Combinational Digital CircuitsChapter 2 Digital Circuits with MemoryChapter 3 Computer System TechnologyChapter 4 Computer Performance
Provide motivation, paint the big picture, introduce tools: Review components used in building digital circuits Present an overview of computer technology Understand the meaning of computer performance
(or why a 2 GHz processor isnt 2 as fast as a 1 GHz model)
7/30/2019 f37 Book Intarch Pres Pt1
4/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 4
1 Combinational Digital Circuits
First of two chapters containing a review of digital design: Combinational, or memoryless, circuits in Chapter 1 Sequential circuits, with memory, in Chapter 2
Topics in This Chapter1.1 Signals, Logic Operators, and Gates1.2 Boolean Functions and Expressions1.3 Designing Gate Networks1.4 Useful Combinational Parts1.5 Programmable Combinational Parts1.6 Timing and Circuit Considerations
7/30/2019 f37 Book Intarch Pres Pt1
5/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 5
1.1 Signals, Logic Operators, and Gates
Figure 1.1 Some basic elements of digital logic circuits, withoperator signs used in this book highlighted.
x y
ANDName XORORNOTGraphicalsymbol
x y
Operatorsign andalternate(s)
x yx yxy
x y
x
xorx_
x
y or xyArithmeticexpression
x
y
2xyx
y
xy1x
Outputis 1 iff:
Input is 0Both inputs
are 1sAt least one
input is 1Inputs arenot equal
7/30/2019 f37 Book Intarch Pres Pt1
6/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 6
The Arithmetic Substitution Method
z
= 1z NOT converted to arithmetic formxy AND same as multiplication(when doing the algebra, set zk= z)
xy=x+ yxy OR converted to arithmetic formxy=x+ y 2xy XOR converted to arithmetic form
Example: Prove the identity xyz x y z?1
LHS = [xyz x] [y z]
= [xyz+ 1x (1x)xyz] [1y+ 1z (1y)(1z)]
= [xyz+ 1x] [1 yz]
= (xyz+ 1x) + (1 yz) (xyz+ 1x)(1 yz)
= 1 +xy2z2xyz
= 1 = RHS This is addition,not logical OR
7/30/2019 f37 Book Intarch Pres Pt1
7/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 7
Variations in Gate Symbols
Figure 1.2 Gates with more than two inputs and/or with
inverted signals at input or output.
OR NORNANDAND XNOR
7/30/2019 f37 Book Intarch Pres Pt1
8/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 8
Gates as Control Elements
Figure 1.3 An AND gate and a tristate buffer act as controlled switchesor valves. An inverting buffer is logically the same as a NOT gate.
Enable/Pass signale
Data inx
Data outxor 0
Data inx
Enable/Pass signale
Data outxor high impedance
(a) AND gate for controlled transfer (b) Tristate buffer
(c) Model for AND switch.
x
e
No dataorx
0
1x
e
ex
0
1
0
(d) Model for tristate buffer.
7/30/2019 f37 Book Intarch Pres Pt1
9/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 9
Wired OR and Bus Connections
Figure 1.4 Wired OR allows tying together of severalcontrolled signals.
e
e
e Data out(x, y, z,
or highimpedance)
(b) Wired OR of tristate outputs
e
e
e
Data out
(x, y, z, or 0)
(a) Wired OR of product terms
z
x
y
z
x
y
z
x
y
z
x
y
7/30/2019 f37 Book Intarch Pres Pt1
10/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 10
Control/Data Signals and Signal Bundles
Figure 1.5 Arrays of logic gates represented by a single gate symbol.
/8
/
8
/
8
Compl
/
32
/
k
/32
Enable
/
k
/k/k
(b) 32 AND gates (c) k XOR gates(a) 8 NOR gates
7/30/2019 f37 Book Intarch Pres Pt1
11/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 11
1.2 Boolean Functions and Expressions
Ways of specifying a logic function
Truth table: 2nrow, dont-care in input or output
Logic expression: w(xyz), product-of-sums,sum-of-products, equivalent expressions
Word statement: Alarm will sound if the dooris opened while the security system is engaged,
or when the smoke detector is triggered
Logic circuit diagram: Synthesis vs analysis
7/30/2019 f37 Book Intarch Pres Pt1
12/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 12
Table 1.2 Laws (basic identities) of Boolean algebra.
Name of law OR version AND version
Identity x 0 =x x1 =x
One/Zero x 1 = 1 x0 = 0
Idempotent xx= x x x=x
Inverse xx = 1 x x = 0
Commutative xy= yx xy= yx
Associative (xy) z=x (yz) (xy) z=x(yz)Distributive x (y z) = (xy) (xz) x(yz) = (xy) (xz)
DeMorgans (xy) =xy (x y) =xy
Manipulating Logic Expressions
7/30/2019 f37 Book Intarch Pres Pt1
13/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 13
Proving the Equivalence of Logic Expressions
Example 1.1 Truth-table method: Exhaustive verification
Arithmetic substitution
x
y=x+ yxyxy=x+ y 2xy
Case analysis: two cases,x= 0 orx= 1
Logic expression manipulation
Example: xy?xyxyx+ y2xy?(1x)y + x(1y)(1x)yx(1y)
7/30/2019 f37 Book Intarch Pres Pt1
14/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 14
1.3 Designing Gate Networks
AND-OR, NAND-NAND, OR-AND, NOR-NOR
Logic optimization: cost, speed, power dissipation
(a) AND-OR circuit
z
x
y
x
y
z
(b) Intermediate circuit (c) NAND-NAND equi valent
z
x
y
x
y
z
z
x
y
x
y
z
Figure 1.6 A two-level AND-OR circuit and two equivalent circuits.
(abc) = abc
7/30/2019 f37 Book Intarch Pres Pt1
15/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 15
Seven-Segment Display of Decimal Digits
Figure 1.7 Seven-segment displayof decimal digits. The three opensegments may be optionally used.The digit 1 can be displayed in twoways, with the more common right-side version shown.
Optional segment
7/30/2019 f37 Book Intarch Pres Pt1
16/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 16
BCD-to-Seven-Segment Decoder
Example 1.2
Figure 1.8 The logic circuit that generates the enable signal for thelowermost segment (number 3) in a seven-segment display unit.
x3
x2
x1
x0
Signals toenable orturn on thesegments
4-bit input in [0, 9]e
0
e5
e6
e4
e2
e1
e3
1
24
5
0
3
6
7/30/2019 f37 Book Intarch Pres Pt1
17/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 17
1.4 Useful Combinational Parts
High-level building blocks
Much like prefab parts used in building a house
Arithmetic components (adders, multipliers, ALUs)will be covered in Part III
Here we cover three useful parts:multiplexers, decoders/demultiplexers, encoders
7/30/2019 f37 Book Intarch Pres Pt1
18/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 18
Multiplexers
Figure 1.9 Multiplexer (mux), or selector, allows one of several inputsto be selected and routed to output depending on the binary value of a
set of selection or address signals provided to it.
x
x
y
z
1
0x
x
z
y
x
x
y
z
1
0
y
/32
/32
/32 1
01
0
3
2
z
y1 0
1
0
1
0
y1
y0
y0
(a) 2-to-1 mux (b) Switch view (c) Mux symbol
(d) Mux array (e) 4-to-1 mux with enable (e) 4-to-1 mux design
0
1
y
1 1 1
0
00
x
x
x
x
10
2
3
x
x
x
x
0
1
2
3
z
e (Enable)
1 0
x2
7/30/2019 f37 Book Intarch Pres Pt1
19/91
Computer Architecture, Background and Motivation Slide 19
Decoders/Demultiplexers
Figure 1.10 A decoder allows the selection of one of 2a options usingan a-bit address as input. A demultiplexer (demux) is a decoder that
only selects an output if its enable signal is asserted.
y1
y0
x0
x3
x2
x1
1
0
3
2
y1
y0
x0
x3
x2
x1e1
0
3
2
y1
y0
x0
x3
x
2
x1
(a) 2-to-4 decoder (b) Decoder symbol(c) Demultiplexer, ordecoder with enable
(Enable)1
1 0
1
1 0
11
1
Jan. 2011
7/30/2019 f37 Book Intarch Pres Pt1
20/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 20
Encoders
Figure 1.11 A 2a-to-a encoder outputs an a-bit binary number
equal to the index of the single 1 among its 2a inputs.
(a) 4-to-2 encoder (b) Encoder symbol
x0
x3
x2
x1
y1
y0
1
0
3
2
x0
x3
x2
x1
y1
y0
1 0
1
0
0
0
7/30/2019 f37 Book Intarch Pres Pt1
21/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 21
1.5 Programmable Combinational Parts
Programmable ROM (PROM)
Programmable array logic (PAL)
Programmable logic array (PLA)
A programmable combinational part can do the job ofmany gates or gate networks
Programmed by cutting existing connections (fuses)
or establishing new connections (antifuses)
7/30/2019 f37 Book Intarch Pres Pt1
22/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 22
PROMs
Figure 1.12 Programmable connections and their use in a PROM.
. . .
.
.
.
Inputs
Outputs
(a) ProgrammableOR gates
w
x
y
z
(b) Logic equivalentof part a
w
x
y
z
(c) Programmable read-onlymemory (PROM)
Decoder
7/30/2019 f37 Book Intarch Pres Pt1
23/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 23
PALs and PLAs
Figure 1.13 Programmable combinational logic: general structure andtwo classes known as PAL and PLA devices. Not shown is PROM with
fixed AND array (a decoder) and programmable OR array.
ANDarray(ANDplane)
ORarray(OR
plane)
. . .
. . .
.
.
.
Inputs
Outputs
(a) General programmablecombinational logic
(b) PAL: programmableAND array, fixed OR array
8-inputANDs
(c) PLA: programmableAND and OR arrays
6-inputANDs
4-inputORs
7/30/2019 f37 Book Intarch Pres Pt1
24/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 24
1.6 Timing and Circuit Considerations
Gate delay d: a fraction of, to a few, nanoseconds
Wire delay, previously negligible, is now important(electronic signals travel about 15 cm per ns)
Circuit simulation to verify function and timing
Changes in gate/circuit output, triggered by changes in itsinputs, are not instantaneous
7/30/2019 f37 Book Intarch Pres Pt1
25/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 25
Glitching
Figure 1.14 Timing diagram for a circuit that exhibits glitching.
x= 0
y
z
a =xy
f=
az
2d 2d
Using the PAL in Fig. 1.13b to implement f=xyz
AND-OR
(PAL)
AND-OR
(PAL)
x
y
z
a
f
7/30/2019 f37 Book Intarch Pres Pt1
26/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 26
CMOS Transmission Gates
Figure 1.15 A CMOS transmission gate and its use in building
a 2-to-1 mux.
z
x
x
0
1
(a) CMOS transmission gate:
circuit and symbol
(b) Two-input mux built of two
transmission gates
TG
TGTG
yP
N
7/30/2019 f37 Book Intarch Pres Pt1
27/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 27
2 Digital Circuits with Memory
Second of two chapters containing a review of digital design: Combinational (memoryless) circuits in Chapter 1 Sequential circuits (with memory) in Chapter 2
Topics in This Chapter
2.1 Latches, Flip-Flops, and Registers2.2 Finite-State Machines2.3 Designing Sequential Circuits2.4 Useful Sequential Parts2.5 Programmable Sequential Parts2.6 Clocks and Timing of Events
7/30/2019 f37 Book Intarch Pres Pt1
28/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 28
2.1 Latches, Flip-Flops, and Registers
Figure 2.1 Latches, flip-flops, and registers.
RQ
QS
DQ
Q
C
Q
Q
D
C
(a) SR latch (b) D latch
Q
C Q
DQ
C Q
D
(e) k-bit register(d) D flip-flop symbol(c) Master-slave D flip-flop
Q
C Q
D
FF
/ /k k
Q
C Q
D
FF
R
S
7/30/2019 f37 Book Intarch Pres Pt1
29/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 29
Latches vs Flip-Flops
Figure 2.2 Operations of D latch andnegative-edge-triggered D flip-flop.
D
C
D latch: Q
D FF: Q
Setuptime Setuptime HoldtimeHoldtimeD
C
Q
Q
D
C
Q
Q
FF
7/30/2019 f37 Book Intarch Pres Pt1
30/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 30
Reading and Modifying FFs in the Same Cycle
Figure 2.3 Register-to-register operation with edge-triggeredflip-flops.
/ /k k
Q
C Q
D
FF
/ /k k
Q
C Q
D
FF
Computation module(combinational logic)
ClockPropagation delayCombinational delay
7/30/2019 f37 Book Intarch Pres Pt1
31/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 31
2.2 Finite-State Machines
Example 2.1
Figure 2.4 State table and state diagram for a vendingmachine coin reception unit.
DimeDimeQuarter
Dime
Quarter
DimeQuarter
DimeQuarter
ResetReset
Reset
Reset
Reset
StartQuarter
S00
S10
S20S
25
S30
S35
S10
S25
S00
S00
S00S
00
S00
S00
S20
S35
S35
S35
S35
S35
S35S30
S35
S35
------- Input -------
Dime
Quarter
Reset
Currentstate
S00S35
is the initial state
is the final state
Next state
DimeQuarter
S00
S10
S20
S25
S30
S35
7/30/2019 f37 Book Intarch Pres Pt1
32/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 32
Sequential Machine Implementation
Figure 2.5 Hardware realization of Moore and Mealysequential machines.
Next-state
logic
State register/n
/m
/l
Inputs Outputs
Next-stateexcitationsignals
Presentstate
Outputlogic
Only for Mealy machine
7/30/2019 f37 Book Intarch Pres Pt1
33/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 33
2.3 Designing Sequential Circuits
Example 2.3
Figure 2.7 Hardware realization of a coin reception unit (Example 2.3).
Output
Q
C Q
De
Inputs
Q
C Q
D
Q
C Q
D
FF2
FF1
FF0
q
d
Quarter in
Dime in
Finalstateis 1xx
7/30/2019 f37 Book Intarch Pres Pt1
34/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 34
2.4 Useful Sequential Parts
High-level building blocks
Much like prefab closets used in building a house
Other memory components will be covered inChapter 17 (SRAM details, DRAM, Flash)
Here we cover three useful parts:shift register, register file (SRAM basics), counter
7/30/2019 f37 Book Intarch Pres Pt1
35/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 35
Shift Register
Figure 2.8 Register with single-bit left shift and parallel loadcapabilities. For logical left shift, serial data in line is connected to 0.
Parallel data in
/k
/k
/k
Shift
Q
C Q
D
FF1
0
Serial data in
/k1 LSBs
Load
Parallel data out
Serial data outMSB
01001110
7/30/2019 f37 Book Intarch Pres Pt1
36/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 36
Register File and FIFO
Figure 2.9 Register file with random access and FIFO.
De
coder
/k
/k
/h
Writeenable
Read address 0
Read address 1
Readdata 0
Writedata
Readenable
2 k-bit registersh
/k
/k
/k
/k
/k
/k
/k
/h
Write
address
Muxes
Readdata 1
/k
/h
/h
/h
/k
/h
Write enable
Readaddr 0
/k
/kRead
addr 1
Write
dataWriteaddr
Readdata 0
Read enable
Readdata 1
(a) Register file with random access
(b) Graphic symbolfor register file
Q
C Q
D
FF
/k
Q
C Q
D
FF
Q
C Q
D
FF
Q
C Q
D
FF
/k
Push
/k
Input Output
Pop
Full
Empty
(c) FIFO symbol
7/30/2019 f37 Book Intarch Pres Pt1
37/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 37
SRAM
Figure 2.10 SRAM memory is simply a large, single-port register file.
Column mux
Row
decoder
/h
Address
Square oralmost squarememory matrix
Row buffer
Row
Column
gbits data out
/g
/h
Write enable
/g
Data in
Address
Data out
Outputenable
Chipselect
.
.
.
. . .
. . .
(a) SRAM block diagram (b) SRAM read mechanism
7/30/2019 f37 Book Intarch Pres Pt1
38/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 38
Binary Counter
Figure 2.11 Synchronous binary counter with initialization capability.
Count register
Mux
Incrementer
0
Input
Load
IncrInit
x+ 1
x
0 1
1c
inc
out
7/30/2019 f37 Book Intarch Pres Pt1
39/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 39
2.5 Programmable Sequential Parts
Programmable array logic (PAL)
Field-programmable gate array (FPGA)
Both types contain macrocells and interconnects
A programmable sequential part contain gates andmemory elements
Programmed by cutting existing connections (fuses)
or establishing new connections (antifuses)
7/30/2019 f37 Book Intarch Pres Pt1
40/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 40
PAL and FPGA
Figure 2.12 Examples of programmable sequential logic.
(a) Portion of PAL with storable output (b) Generic structure of an FPGA
8-inputANDs
DC
FF
Mux
Mux
0 1
0 1
I/O blocks
Configurablelogic block
Programmableconnections
CLB
CLB
CLB
CLB
7/30/2019 f37 Book Intarch Pres Pt1
41/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 41
2.6 Clocks and Timing of Events
Clock is a periodic signal: clock rate = clock frequencyThe inverse of clock rate is the clock period: 1 GHz 1 nsConstraint: Clock period tprop + tcomb + tsetup + tskew
Figure 2.13 Determining the required length of the clock period.
Other inputs
Combinational
logic
Clock period
FF1 beginsto change
FF1 changeobserved
Must be wide enoughto accommodate
worst-case delays
Clock1 Clock2
Q
C Q
DFF2
Q
C Q
DFF1
7/30/2019 f37 Book Intarch Pres Pt1
42/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 42
Synchronization
Figure 2.14 Synchronizers are used to prevent timing problems
arising from untimely changes in asynchronous signals.
Asynch
input
Asynch
input
Synch
version
Synch
version
Asynchinput
Synchversion
Clock
(a) Simple synchronizer (b) Two-FF synchronizer
(c) Input and output waveforms
Q
C Q
DFF
Q
C Q
DFF2
Q
C Q
DFF1
7/30/2019 f37 Book Intarch Pres Pt1
43/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 43
Level-Sensitive Operation
Figure 2.15 Two-phase clocking with nonoverlapping clock signals.
Combi-national
logic1
1
Clock period
Q
C Q
DLatch
1
Q
C Q
DLatch
Other inputs
Combi-national
logic2
2
Clocks withnonoverlappinghighs
Other inputs
Q
C Q
LatchD
7/30/2019 f37 Book Intarch Pres Pt1
44/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 44
3 Computer System Technology
Interplay between architecture, hardware, and software Architectural innovations influence technology Technological advances drive changes in architecture
Topics in This Chapter3.1 From Components to Applications3.2 Computer Systems and Their Parts3.3 Generations of Progress3.4 Processor and Memory Technologies
3.5 Peripherals, I/O, and Communications3.6 Software Systems and Applications
7/30/2019 f37 Book Intarch Pres Pt1
45/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 45
3.1 From Components to Applications
Figure 3.1 Subfields or views in computer system engineering.
High-
levelview
Computerdesigner
Circuitdesigner
Applicationdesigner
System
designer
Logicd
esigner
Software Hardware
Computer organization
Low-
levelview
Applicatio
ndomains
Electroniccomponents
Computer architecture
7/30/2019 f37 Book Intarch Pres Pt1
46/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 46
What Is (Computer) Architecture?
Figure 3.2 Like a building architect, whose place at theengineering/arts and goals/means interfaces is seen in this diagram, a
computer architect reconciles many conflicting or competing demands.
ArchitectInterface
Interface
Goals
Means
ArtsEngineering
Clients taste:
mood, style, . . .
Clients requirements:
function, cost, . . .
The world of arts:aesthetics, trends, . . .
Construction technology:material, codes, . . .
7/30/2019 f37 Book Intarch Pres Pt1
47/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 47
3.2 Computer Systems and Their Parts
Figure 3.3 The space of computer systems, with what we normallymean by the word computer highlighted.
Computer
Analog
Fixed-function Stored-program
Electronic Nonelectronic
General-purpose Special-purpose
Number cruncher Data manipulator
Digital
7/30/2019 f37 Book Intarch Pres Pt1
48/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 48
Price/Performance Pyramid
Figure 3.4 Classifying computers by computational
power and price range.
Embedded
PersonalWorkstation
Server
Mainframe
Super $Millions
$100s Ks$10s Ks
$1000s
$100s
$10s
Differences in scale,
not in substance
7/30/2019 f37 Book Intarch Pres Pt1
49/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 49
Automotive Embedded Computers
Figure 3.5 Embedded computers are ubiquitous, yet invisible. They
are found in our automobiles, appliances, and many other places.
Engine
Impact sensors
Navigation &entertainment
Centralcontroller
BrakesAirbags
7/30/2019 f37 Book Intarch Pres Pt1
50/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 50
Personal Computers and Workstations
Figure 3.6 Notebooks, a common class of portable computers,are much smaller than desktops but offer substantially the samecapabilities. What are the main reasons for the size difference?
Di it l C t S b t
7/30/2019 f37 Book Intarch Pres Pt1
51/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 51
Digital Computer Subsystems
Figure 3.7 The (three, four, five, or) six main units of a digitalcomputer. Usually, the link unit (a simple bus or a more elaboratenetwork) is not explicitly included in such diagrams.
Memory
Link Input/Output
To/from network
Processor
Control
Datapath
Input
Output
CPU I/O
7/30/2019 f37 Book Intarch Pres Pt1
52/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 52
3.3 Generations of ProgressTable 3.2 The 5 generations of digital computers, and their ancestors.
Generation(begun)
Processortechnology
Memoryinnovations
I/O devicesintroduced
Dominantlook & fell
0 (1600s) (Electro-)mechanical
Wheel, card Lever, dial,punched card
Factoryequipment
1 (1950s) Vacuum tube Magnetic
drum
Paper tape,
magnetic tape
Hall-size
cabinet2 (1960s) Transistor Magnetic
coreDrum, printer,text terminal
Room-sizemainframe
3 (1970s) SSI/MSI RAM/ROMchip
Disk, keyboard,video monitor
Desk-sizemini
4 (1980s) LSI/VLSI SRAM/DRAM Network, CD,mouse,sound
Desktop/laptop micro
5 (1990s) ULSI/GSI/WSI, SOC
SDRAM,flash
Sensor/actuator,point/click
Invisible,embedded
7/30/2019 f37 Book Intarch Pres Pt1
53/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 53
Figure 3.8 The manufacturing process for an IC part.
IC Production and Yield
15-30cm
30-60 cm
Siliconcrystalingot
SlicerProcessing:20-30 steps
Blank waferwith defects
x xx x xx x
x xx x
0.2 cm
Patterned wafer
(100s of simple or scoresof complex processors)
DicerDie
~1 cm
Gooddie
~1 cm
Dietester
Microchipor other part
MountingPart
testerUsable
partto ship
7/30/2019 f37 Book Intarch Pres Pt1
54/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 54
Figure 3.9 Visualizing the dramatic decrease in yieldwith larger dies.
Effect of Die Size on Yield
120 dies, 109 ood 26 dies, 15 ood
Die yield =def(number of good dies) / (total number of dies)
Die yield = Wafer yield [1 + (Defect density Die area) / a]a
Die cost = (cost of wafer) / (total number of dies die yield)= (cost of wafer) (die area / wafer area) / (die yield)
7/30/2019 f37 Book Intarch Pres Pt1
55/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 55
3.4 Processor and Memory Technologies
Figure 3.11 Packaging of processor, memory, and other components.
PC board
Backplane
Memory
CPU
Bus
Connector
(b) 3D packaging of the future(a) 2D or 2.5D packaging now common
Stacked layersglued together
Interlayer connections
deposited on theoutside of the stackDie
TIPS Tb
7/30/2019 f37 Book Intarch Pres Pt1
56/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 56
Figure 3.10 Trends in processor performance and DRAMmemory chip capacity (Moores law).
MooresLaw
1Mb
19901980 2000 2010kIPS
MIPS
GIPS
Proc
essorperformanc
e
Calendar year
8028668000
80386
80486
68040
Pentium
Pentium II
R10000
1.6 / yr
10 / 5 yrs2 / 18 mos
64Mb
4Mb
64kb
256kb
256Mb
1Gb
16Mb
4 / 3 yrs
Processor
Memory
kb
Mb
Gb
Mem
orychipcapacity
7/30/2019 f37 Book Intarch Pres Pt1
57/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 57
Pitfalls of Computer Technology Forecasting
DOS addresses only 1 MB of RAM because we cannot
imagine any applications needing more. Microsoft, 1980
640K ought to be enough for anybody. Bill Gates, 1981
Computers in the future may weigh no more than 1.5
tons. Popular MechanicsI think there is a world market for maybe fivecomputers. Thomas Watson, IBM Chairman, 1943
There is no reason anyone would want a computer in
their home. Ken Olsen, DEC founder, 1977
The 32-bit machine would be an overkill for a personalcomputer. Sol Libes, ByteLines
7/30/2019 f37 Book Intarch Pres Pt1
58/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 58
3.5 Input/Output and Communications
Figure 3.12 Magnetic and optical disk memory units.
(a) Cutaway view of a hard disk drive (b) Some removable storage media
Typically2-9 cm
Floppydisk
CD-ROM
Magnetictape
cartridge
. .
......
1210
7/30/2019 f37 Book Intarch Pres Pt1
59/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 59
Figure 3.13 Latency and bandwidth characteristics of differentclasses of communication links.
CommunicationTechnologies
3
6
9
12
9 6 3 3
Bandwidth(b/s)
Latency (s)
10
10
10
10
10 10 10 1 10
Processorbus
I/Onetwork
System-areanetwork(SAN) Local-area
network(LAN)
Metro-areanetwork(MAN)
Wide-areanetwork(WAN)
Geographically distributed
Same geographic location
(ns) (s) (ms) (min) (h)
7/30/2019 f37 Book Intarch Pres Pt1
60/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 60
3.6 Software Systems and Applications
Figure 3.15 Categorization of software, with examples in each class.
Software
Application:word processor,
spreadsheet,circuit simulator,
. . .
Operating system Translator:MIPS assembler,
C compiler,. . .
System
Manager:virtual memory,
security,file system,
. . .
Coordinator:scheduling,
load balancing,diagnostics,
. . .
Enabler:disk driver,
display driver,printing,
. . .
7/30/2019 f37 Book Intarch Pres Pt1
61/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 61
Figure 3.14 Models and abstractions in programming.
High- vs Low-Level Programming
Compiler
Assembler
Interpreter
temp=v[i]v[i]=v[i+1]v[i+1]=temp
Swap v[i]and v[i+1]
add $2,$5,$5add $2,$2,$2add $2,$4,$2lw $15,0($2)lw $16,4($2)sw $16,0($2)sw $15,4($2)jr $31
00a5102000421020008210208c6200008cf20004acf20000ac62000403e00008
Veryhigh-levellanguageobjectivesor tasks
High-levellanguagestatements
Assemblylanguageinstructions,mnemonic
Machinelanguageinstructions,binary (hex)
One task =many statements
One statement =several instructions
Mostly one-to-one
More abstract, machine-independent;easier to write, read, debug, or maintain
More concrete, machine-specific, error-prone;harder to write, read, debug, or maintain
7/30/2019 f37 Book Intarch Pres Pt1
62/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 62
4 Computer Performance
Performance is key in design decisions; also cost and power It has been a driving force for innovation Isnt quite the same as speed (higher clock rate)
Topics in This Chapter4.1 Cost, Performance, and Cost/Performance4.2 Defining Computer Performance4.3 Performance Enhancement andAmdahls Law4.4 Performance Measurement vs Modeling4.5 Reporting Computer Performance4.6 The Quest for Higher Performance
7/30/2019 f37 Book Intarch Pres Pt1
63/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 63
4.1 Cost, Performance, and Cost/Performance
19801960 2000 2020$1
Comput
ercost
Calendar year
$1 K
$1 M
$1 G
C t/P f
7/30/2019 f37 Book Intarch Pres Pt1
64/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 64
Figure 4.1 Performance improvement as a function of cost.
Cost/Performance
Performance
Cost
Superlinear:economy ofscale
Sublinear:diminishingreturns
Linear(ideal?)
7/30/2019 f37 Book Intarch Pres Pt1
65/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 65
4.2 Defining Computer Performance
Figure 4.2 Pipeline analogy shows that imbalance between processingpower and I/O capabilities leads to a performance bottleneck.
ProcessingInput Output
CPU-bound task
I/O-bound task
Six Passenger Aircraft to Be Compared
7/30/2019 f37 Book Intarch Pres Pt1
66/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 66
Six Passenger Aircraft to Be Compared
B 747
DC-8-50
Performance of Aircraft: An Analogy
7/30/2019 f37 Book Intarch Pres Pt1
67/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 67
Performance of Aircraft: An Analogy
Table 4.1 Key characteristics of six passenger aircraft: all figuresare approximate; some relate to a specific model/configuration of
the aircraft or are averages of cited range of values.
Aircraft PassengersRange(km)
Speed(km/h)
Price($M)
Airbus A310 250 8 300 895 120
Boeing 747 470 6 700 980 200
Boeing 767 250 12 300 885 120
Boeing 777 375 7 450 980 180
Concorde 130 6 400 2 200 350
DC-8-50 145 14 000 875 80
Speed of sound 1220 km / h
7/30/2019 f37 Book Intarch Pres Pt1
68/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 68
Different Views of Performance
Performance from the viewpoint of a passenger: Speed
Note, however, that flight time is but one part of total travel time.Also, if the travel distance exceeds the range of a faster plane,a slower plane may be better due to not needing a refueling stop
Performance from the viewpoint of an airline: Throughput
Measured in passenger-km per hour (relevant if ticket price wereproportional to distance traveled, which in reality it is not)
Airbus A310 250 895 = 0.224 M passenger-km/hrBoeing 747 470 980 = 0.461 M passenger-km/hrBoeing 767 250 885 = 0.221 M passenger-km/hrBoeing 777 375 980 = 0.368 M passenger-km/hrConcorde 130 2200 = 0.286 M passenger-km/hrDC-8-50 145 875 = 0.127 M passenger-km/hr
Performance from the viewpoint of FAA: Safety
C t Eff ti C t/P f
7/30/2019 f37 Book Intarch Pres Pt1
69/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 69
Cost Effectiveness: Cost/Performance
Table 4.1 Key characteristics of six passengeraircraft: all figures are approximate; some relate to
a specific model/configuration of the aircraft or areaverages of cited range of values.
Aircraft Passen-gers
Range(km)
Speed(km/h)
Price($M)
A310 250 8 300 895 120
B 747 470 6 700 980 200
B 767 250 12 300 885 120
B 777 375 7 450 980 180
Concorde 130 6 400 2 200 350
DC-8-50 145 14 000 875 80
Cost /Performance
536
434
543
489
1224
630
Smaller
valuesbetter
Throughput
(M P km/hr)
0.224
0.461
0.221
0.368
0.286
0.127
Larger
valuesbetter
C t f P f d S d
7/30/2019 f37 Book Intarch Pres Pt1
70/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 70
Concepts of Performance and Speedup
Performance = 1 / Execution time is simplified to
Performance = 1 / CPU execution time
(Performance of M1) / (Performance of M2) = Speedup of M1 over M2
= (Execution time of M2) / (Execution time M1)
Terminology: M1 isxtimes as fast as M2 (e.g., 1.5 times as fast)M1 is 100(x 1)% faster than M2 (e.g., 50% faster)
CPU time = Instructions (Cycles per instruction) (Secs per cycle)
= Instructions CPI / (Clock rate)
Instruction count, CPI, and clock rate are not completely independent,so improving one by a given factor may not lead to overall executiontime improvement by the same factor.
Elaboration on the CPU Time Form la
7/30/2019 f37 Book Intarch Pres Pt1
71/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 71
Elaboration on the CPU Time Formula
CPU time = Instructions (Cycles per instruction) (Secs per cycle)
= Instructions Average CPI / (Clock rate)
Clock period
Clock rate: 1 GHz = 109 cycles / s (cycle time 109 s = 1 ns)200 MHz = 200 106 cycles / s (cycle time = 5 ns)
Average CPI: Is calculated based on the dynamic instruction mixand knowledge of how many clock cycles are neededto execute various instructions (or instruction classes)
Instructions: Number of instructions executed, not number ofinstructions in our program (dynamic count)
Dynamic Instruction Count
7/30/2019 f37 Book Intarch Pres Pt1
72/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 72
Dynamic Instruction Count
250 instructions
fori = 1, 100 do
20 instructions
forj = 1, 100 do
40 instructions
fork = 1, 100 do
10 instructions
endforendfor
endfor
How many instructionsare executed in this
program fragment?
Each for consists of two instructions:increment index, check exit condition
2 + 40 + 1200 instructions100 iterations124,200 instructions in all
2 + 10 instructions100 iterations1200 instructions in all
2 + 20 + 124,200 instructions100 iterations
12,422,200 instructions in all
12,422,450 Instructions
fori = 1, nwhile x > 0
Static count = 326
Faster Clock Shorter Running Time
7/30/2019 f37 Book Intarch Pres Pt1
73/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 73
Figure 4.3 Faster steps do not necessarilymean shorter travel time.
Faster Clock Shorter Running Time
1 GHz
2 GHz
4 steps
Solution
20 steps
Suppose addition takes 1 ns
Clock period = 1 ns; 1 cycleClock period = ns; 2 cycles
In this example, addition timedoes not improve in going from1 GHz to 2 GHz clock
7/30/2019 f37 Book Intarch Pres Pt1
74/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 74
0
10
20
30
40
50
0 10 20 30 40 50Enhancement factor (p )
Speedup(s)
f = 0
f = 0.1
f = 0.05
f = 0.02
f = 0.01
4.3 Performance Enhancement: Amdahls Law
Figure 4.4 Amdahls law: speedup achieved if a fraction fof atask is unaffected and the remaining 1fpart runsp times as fast.
s =
min(p, 1/f)1
f+(1f)/p
f = fractionunaffectedp = speedup
of the rest
Amdahls Law Used in Design
7/30/2019 f37 Book Intarch Pres Pt1
75/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 75
Example 4.1
Amdahl s Law Used in Design
A processor spends 30% of its time on flp addition, 25% on flp mult,and 10% on flp division. Evaluate the following enhancements, each
costing the same to implement:
a. Redesign of the flp adder to make it twice as fast.
b. Redesign of the flp multiplier to make it three times as fast.
c. Redesign the flp divider to make it 10 times as fast.
Solution
a. Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18
b. Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20c. Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10
What if both the adder and the multiplier are redesigned?
Amdahls Law Used in Management
7/30/2019 f37 Book Intarch Pres Pt1
76/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 76
Example 4.2
Amdahl s Law Used in Management
Members of a university research group frequently visit the library.Each library trip takes 20 minutes. The group decides to subscribe
to a handful of publications that account for 90% of the library trips;
access time to these publications is reduced to 2 minutes.
a. What is the average speedup in access to publications?
b. If the group has 20 members, each making two weekly trips tothe library, what is the justifiable expense for the subscriptions?
Assume 50 working weeks/yr and $25/h for a researchers time.
Solution
a. Speedup in publication access time = 1 / [0.1 + 0.9 / 10] = 5.26
b. Time saved = 20 2 50 0.9 (20 2) = 32,400 min = 540 h
Cost recovery = 540 $25 = $13,500 = Max justifiable expense
7/30/2019 f37 Book Intarch Pres Pt1
77/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 77
4.4 Performance Measurement vs Modeling
Figure 4.5 Running times of six programs on three machines.
Execution time
ProgramA E FB C D
Machine 1
Machine 2
Machine 3
Generalized Amdahls Law
7/30/2019 f37 Book Intarch Pres Pt1
78/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 78
Generalized Amdahl s Law
Original running time of a program = 1 = f1 + f2 + . . . + fk
New running time after the fraction fi is speeded up by a factorpi
f1 f2 fk+ + . . . +
p1 p2 pk
Speedup formula
1
S =f1
f2
fk
+ + . . . +
p1 p2 pk
If a particular fractionis slowed down ratherthan speeded up,use sjfj instead offj/pj,where sj> 1 is theslowdown factor
Performance Benchmarks
7/30/2019 f37 Book Intarch Pres Pt1
79/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 79
Performance Benchmarks
Example 4.3
You are an engineer at Outtel, a start-up aspiring to compete with Intelvia its new processor design that outperforms the latest Intel processor
by a factor of 2.5 on floating-point instructions. This level of performance
was achieved by design compromises that led to a 20% increase in the
execution time of all other instructions. You are in charge of choosing
benchmarks that would showcase Outtels performance edge.a. What is the minimum required fraction fof time spent on floating-point
instructions in a program on the Intel processor to show a speedup of2 or better for Outtel?
Solutiona. We use a generalized form of Amdahls formula in which a fraction f
is speeded up by a given factor (2.5) and the rest is slowed down byanother factor (1.2): 1/ [1.2(1f) + f/2.5] 2 f 0.875
Performance Estimation
7/30/2019 f37 Book Intarch Pres Pt1
80/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 80
Performance Estimation
Average CPI = All instruction classes (Class-ifraction) (Class-iCPI)
Machine cycle time = 1 / Clock rate
CPU execution time = Instructions (Average CPI) / (Clock rate)
Table 4.3 Usage frequency, in percentage, for variousinstruction classes in four representative applications.
Application Instrn class
Datacompression
C languagecompiler
Reactorsimulation
Atomic motionmodeling
A: Load/Store 25 37 32 37
B: Integer 32 28 17 5
C: Shift/Logic 16 13 2 1
D: Float 0 0 34 42
E: Branch 19 13 9 10
F: All others 8 9 6 4
CPI and IPS Calculations
7/30/2019 f37 Book Intarch Pres Pt1
81/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 81
CPI and IPS Calculations
Example 4.4 (2 of 5 parts)
Consider two implementations M1
(600 MHz) and M2
(500 MHz) ofan instruction set containing three classes of instructions:
Class CPI for M1 CPI for M2 CommentsF 5.0 4.0 Floating-pointI 2.0 3.8 Integer arithmeticN 2.4 2.0 Nonarithmetic
a. What are the peak performances of M1 and M2 in MIPS?b. If 50% of instructions executed are class-N, with the rest divided
equally among F and I, which machine is faster? By what factor?
Solution
a. Peak MIPS for M1 = 600 / 2.0 = 300; for M2 = 500 / 2.0 = 250
b. Average CPI for M1 = 5.0 / 4 + 2.0 / 4 + 2.4 / 2 = 2.95;
for M2 = 4.0/4 + 3.8/4 + 2.0/2 = 2.95 M1 is faster; factor 1.2
MIPS Rating Can Be Misleading
7/30/2019 f37 Book Intarch Pres Pt1
82/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 82
MIPS Rating Can Be Misleading
Example 4.5
Two compilers produce machine code for a program on a machinewith two classes of instructions. Here are the number of instructions:
Class CPI Compiler 1 Compiler 2
A 1 600M 400M
B 2 400M 400M
a. What are run times of the two programs with a 1 GHz clock?
b. Which compiler produces faster code and by what factor?
c. Which compilers output runs at a higher MIPS rate?
Solution
a. Running time 1 (2) = (600M 1 + 400M 2) / 109 = 1.4 s (1.2 s)
b. Compiler 2s output runs 1.4 / 1.2 = 1.17 times as fast
c. MIPS rating 1, CPI = 1.4 (2, CPI = 1.5) = 1000 / 1.4 = 714 (667)
7/30/2019 f37 Book Intarch Pres Pt1
83/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 83
4.5 Reporting Computer Performance
Table 4.4 Measured or estimated execution times for three programs.
Time onmachine X
Time onmachine Y
Speedup ofY over X
Program A 20 200 0.1
Program B 1000 100 10.0
Program C 1500 150 10.0
All 3 progs 2520 450 5.6
Analogy: If a car is driven to a city 100 km away at 100 km/hrand returns at 50 km/hr, the average speed is not (100 + 50) / 2but is obtained from the fact that it travels 200 km in 3 hours.
C i th O ll P f
7/30/2019 f37 Book Intarch Pres Pt1
84/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 84
Table 4.4 Measured or estimated execution times for three programs.
Time onmachine X
Time onmachine Y
Speedup ofY over X
Program A 20 200 0.1
Program B 1000 100 10.0
Program C 1500 150 10.0
Geometric mean does not yield a measure of overall speedup,but provides an indicator that at least moves in the right direction
Comparing the Overall Performance
Speedup ofX over Y
10
0.1
0.1
Arithmetic mean
Geometric mean6.7
2.15
3.4
0.46
Effect of Instruction Mix on Performance
7/30/2019 f37 Book Intarch Pres Pt1
85/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 85
Example 4.6 (1 of 3 parts)
Consider two applications DC and RS and two machines M1 and M2:
Class Data Comp. Reactor Sim. M1s CPI M2s CPIA: Ld/Str 25% 32% 4.0 3.8B: Integer 32% 17% 1.5 2.5C: Sh/Logic 16% 2% 1.2 1.2D: Float 0% 34% 6.0 2.6
E: Branch 19% 9% 2.5 2.2F: Other 8% 6% 2.0 2.3
a. Find the effective CPI for the two applications on both machines.
Solution
a. CPI of DC on M1: 0.25 4.0 + 0.32 1.5 + 0.16 1.2 + 0 6.0 +
0.19 2.5 + 0.08 2.0 = 2.31
DC on M2: 2.54 RS on M1: 3.94 RS on M2: 2.89
7/30/2019 f37 Book Intarch Pres Pt1
86/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 86
4.6 The Quest for Higher Performance
State of available computing power ca. the early 2000s:
Gigaflops on the desktop
Teraflops in the supercomputer center
Petaflops on the drawing board
Note on terminology (see Table 3.1)Prefixes for large units:
Kilo = 103, Mega = 106, Giga = 109, Tera = 1012, Peta = 1015
For memory:
K = 210
= 1024, M = 220
, G = 230
, T = 240
, P = 250
Prefixes for small units:micro = 106, nano = 109, pico = 1012, femto = 1015
Performance Trends and Obsolescence
7/30/2019 f37 Book Intarch Pres Pt1
87/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 87
Figure 3.10 Trends in processorperformance and DRAM memorychip capacity (Moores law).
Performance Trends and Obsolescence
1Mb
19901980 2000 2010kIPS
MIPS
GIPS
TIPS
Processorperformance
Calendar year
8028668000
80386
80486
68040
Pentium
Pentium II
R10000
1.6 / yr
10 / 5 yrs2 / 18 mos
64Mb
4Mb
64kb
256kb
256Mb
1Gb
16Mb
4 / 3 yrs
Processor
Memory
kb
Mb
Gb
Tb
Mem
orychipcapacity
Can I call you back? Wejust bought a new computerand were trying to set it upbefore its obsolete.
Super- PFLOPS Massively parallelprocessors
7/30/2019 f37 Book Intarch Pres Pt1
88/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 88
Figure 4.7 Exponential growth of supercomputer performance.
computers
19901980 2000 2010
Superc
omputerperforma
nce
Calendar year
CrayX-MP
Y-MP
CM-2
MFLOPS
GFLOPS
TFLOPS
Vectorsupercomputers
CM-5
CM-5
$240M MPPs
$30M MPPs
processors
The Most Powerful Computers
7/30/2019 f37 Book Intarch Pres Pt1
89/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 89
Figure 4.8 Milestones in the DOEs Accelerated Strategic ComputingInitiative (ASCI) program with extrapolation up to the PFLOPS level.
p
20001995 2005 2010
Perform
ance(TFLOPS)
Calendar year
ASCI Red
ASCI Blue
ASCI White
1+ TFLOPS, 0.5 TB
3+ TFLOPS, 1.5 TB
10+ TFLOPS, 5 TB
30+ TFLOPS, 10 TB
100+ TFLOPS, 20 TB
1
10
100
1000 Plan Develop Use
ASCI
ASCI Purple
ASCI Q
Performance is Important, But It Isnt Everything
7/30/2019 f37 Book Intarch Pres Pt1
90/91
Jan. 2011 Computer Architecture, Background and Motivation Slide 90
Figure 25.1Trend incomputationalperformanceper watt of
power usedin general-purposeprocessorsand DSPs.
p y g
19901980 2000 2010kIPS
MIPS
GIPS
TIPS
Performance
Calendar year
Absolute
processor
performance
GP processor
performanceper Watt
DSP performance
per Watt
Roadmap for the Rest of the Book
7/30/2019 f37 Book Intarch Pres Pt1
91/91
Ch. 5-8: A simple ISA,variations in ISA
Ch. 9-12: ALU design
Ch. 13-14: Data pathand control unit design
Ch. 15-16: Pipelining
and its limits
Ch. 17-20: Memory(main, mass, cache, virtual)
Ch. 21-24: I/O, buses,
interrupts, interfacing
Fasten your seatbeltsas we begin our ride!
Ch. 25-28: Vector andparallel processing
Top Related