Lecture 5: Instruction Set Architecture Computer Engineering 585 Fall 2001.
-
Upload
alan-holmes -
Category
Documents
-
view
218 -
download
0
Transcript of Lecture 5: Instruction Set Architecture Computer Engineering 585 Fall 2001.
Lecture 5: Instruction Set Architecture
Computer Engineering 585Fall 2001
Summary, #1• Designing to Last through Trends
Capacity Speed
Logic 2x in 3 years 2x in 3/2 years
DRAM 4x in 3 years 2x in 10 years
Disk 4x in 3 years 2x in 10 years
• 6yrs to graduate => 16X CPU speed, DRAM/Disk size
• Time to run the task– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, …– Throughput, bandwidth
• “X is n times faster than Y” means ExTime(Y) Performance(X)
--------- = --------------
ExTime(X) Performance(Y)
Summary, #2 Amdahl’s Law:
CPI Law:
Execution time is the REAL measure of computer performance!
Good products created when have: Good benchmarks, good ways to summarize
performance Die Cost goes roughly with die area4
Can PC industry support engineering/research investment?
Speedupoverall =ExTimeold
ExTimenew
=
1
(1 - Fractionenhanced) + Fractionenhanced
Speedupenhanced
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
CPU time = Seconds = Instructions x Cycles x Seconds
Program Program Instruction Cycle
Computer Architecture Is …the attributes of a [computing] system as
seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls the logic design, and the physical implementation.
Amdahl, Blaaw, and Brooks, 1964
SOFTWARESOFTWARE
Computer Architecture’s Changing Definition
1950s to 1960s: Computer Architecture Course: Computer Arithmetic
1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
1990s-2000s: Computer Architecture Course:Design of CPU, memory system, I/O system, Multiprocessors
Instruction Set Architecture (ISA)
instruction set
software
hardware
Interface DesignA good interface:
• Lasts through many implementations (portability, compatibility)
• Is used in many different ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
Interfaceimp 1
imp 2
imp 3
use
use
use
time
Evolution of Instruction Sets Single Accumulator (EDSAC 1950)
Accumulator + Index Registers(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
High-level Language Based Concept of a Family(B5000 1963) (IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets Load/Store Architecture
RISC
(Vax, Intel 432 1977-80) (CDC 6600, Cray 1 1963-76)
(Mips,Sparc,HP-PA,IBM RS6000, . . .1987)
A "Typical" RISC
32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, DP take
pair) 3-address, reg-reg arithmetic instruction Single address mode for load/store:
base + displacement no indirection
Simple branch conditions Delayed branch
see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
Evolution of Instruction Sets Major advances in computer architecture are
typically associated with landmark instruction set designs Ex: Stack vs GPR (System 360)
Design decisions must take into account: technology machine organization programming languages compiler technology operating systems
And they in turn influence these
Example: MIPS
Op
31 26 01516202125
Rs1 Rd immediate
Op
31 26 025
Op
31 26 01516202125
Rs1 Rs2
target
Rd Opx
Register-Register
561011
Register-Immediate
Op
31 26 01516202125
Rs1 Rs2/Opx immediate
Branch
Jump / Call
Architecture, Implementation Architecture deals with functions
provided to the programmer: addressing, addition, interrupt, and I/O
Implementation deals with method used to achieve this function, such as a parallel datapath and a microprogrammed control
Realization is means used to materialize this method: electrical, magnetic or mechanical devices; power and packaging.
Clock Architecture
1 23
12
4567
8910
11
Architecture
Variant Realizations
Architecture: Two arms – small one for hour, longer one for minutes, may be alarm.
Realization: Shape of clock arms and dial, numbers. Mechanical or digital mechanism. Energy source a wound spring or a battery.
Instruction Set Design: (1) Ease of Use
consistency: with a partial knowledge of the system, one can predict the remainder. e.g. including square-root as an instruction should almost fully define everything else. FP op halve was added to IBM 360 as an afterthought and lacked post-normalization.
orthogonality: Two independent concerns should be handled as such. e.g. clock architecture -- (1) luminous dial (2) alarm.
IBM 650, low order addr bits determine amount of shift. Yet, if address exceeds address space, a violation occurs.
transparency: an architectural function is transparent if its implementation does not produce any architecturally visible side-effects. e.g. pipelining should not affect the compiler-visible machine.
generality: Designer should not limit a function by his/her own notions about its use. Intel 8080 has a restart op intended to restart after an interrupt. Its larger use is a return from a subroutine, since it was designed in all its generality.
open-endedness: provision for future expansion.
completeness: all functions of a given class are provided. special case: symmetry: inverse is also provided.
Instruction Set Design: (2) Program size: memory size; CPU-MM
bandwidth; frequently used (written-down) instructions should be short.
(3) Execution speed: time required to execute an instruction Can they be pipelined? Are they uniform in
execution length? Control and cache are often in the critical path of a
processor design. Uniform length requirements at loggerheads with
(2) above. (4) Complexity of control unit: Some
instructions should not even be in the instruction set. (RISC)
Instruction Set Classification
internal CPU operand storage mechanism: registers, stack, accumulator
# explicit operands / instruction: 0, 1, 2, 3
presumed operand locations: memory, stack
Operations type and size of operands
Instruction: Opcode ---- Operands: ADD R1, 20
Instruction Formats
#Ops Instruction
Semantics Machine
4 NI op A B C C = A op B IBM 650 µ-code
3 op A B C C=A op B RISC, Cray
2 op A B A=A op B IBM370, VAX
1 op A Acc = Acc op A
PDP8, M6809
0 op X=X op Y stack machines transputer, B5500
Stack/Reg/Acc Architectures
Stack Accumulator
Reg-Mem Reg-Reg
PUSH A LOAD A LOAD R1, A LOAD R1, A
PUSH B ADD B ADD R1, B LOAD R2, B
ADD STORE C STORE C, R1
ADD R3, R1, R2
POP C STORE R3, C
C = A+B
Stack: short inst, post-fix model of expression evaluation; sequential operand access --- hard for
compilers, Implementation issues --- how deep,
exception handling e.g. when empty? Accumulator: short inst and relatively
small machine state, (easier context-switch); high memory traffic.
Reg-Reg: Easiest for compiler optimization -- most general model.
long instructions and large state.
Endian-ness of Memory AddressingCohen's article: On Holy Wars and a Plea for Peace,IEEE Computer, Oct 81.
CPUwords, pages
Memorybits, bytes
What order are they composed in order to form the nextobject in the hierarchy?
LSB (less-significant unit) travels first little endians (Lilliputians)MSB (more-significant unit) travels first big endians (Blefuscians)
Endian-ness Big-endian: IBM 360, MIPS, Motorola
680xx, SPARC, DLX Little-endian: DEC VAX,Compaq/HP
Alpha, Intel 80x86 Selectable: PowerPC, MIPS: mode bit: 0-
Big, 1-LittleA Content
s
4 0x10
5 0x20
6 0x30
7 0x40
Word at Addr 4: 0X10203040 (Big) 0x40302010 (Little)
Memory addressing contd: data alignment
Most machines are byte addressable.
Object Aligned at byte addr Misaligned at byte addr
Byte 0,1,2,3,4,5,6,7 Never
Half word 0,2,4,6 1,3,5,7
Word 0,4 1,2,3,5,6,7
Double word 0 1,2,3,4,5,6,7
1 B
Decoder
1KX1B memory
.. …1K
1K to 132 to 1
decoder 32
32X32B memory
32 B
32 to 1 multiplexor
5 MSBAddr bits
5 LSBAddr bits
10 addr. bits
Physical Rationale for Alignment
Costs of misalignment
0 1 2 3
a2=1
4 5 6 7
Memory Multiplexor
3 addr bits: a3, a2, a1
a3=0 a3=1
a2=0