1 204521 Digital System Architecture Instruction Set Architecture, the DLX and the 80x86 FALL 2000...
-
Upload
piers-townsend -
Category
Documents
-
view
217 -
download
0
Transcript of 1 204521 Digital System Architecture Instruction Set Architecture, the DLX and the 80x86 FALL 2000...
1204521 Digital System Architecture
Instruction Set Architecture, the DLX and the 80x86
FALL 2000
Pradondet Nilagupta
(orginal note from Dr. Robert F. Hodson)
(based on notes by Randy Katz)
2204521 Digital System Architecture
Review from last time Design Space of ISA
Five Primary Dimensions• Number of explicit operands ( 0, 1, 2, 3 )
• Operand Storage Where besides memory?
• Effective Address How is memory location specified?
• Type & Size of Operands byte, int, float, vector, . . .
How is it specified?
• Operations add, sub, mul, . . .
How is it specifed?
Other Aspects• Successor How is it specified?
• Conditions How are they determined?
• Encodings Fixed or variable? Wide?
• Parallelism
3204521 Digital System Architecture
ISA MetricsAesthetics:
• Orthogonality– No special registers, few special cases, all operand modes a
vailable with any data type or instruction type
• Completeness– Support for a wide range of operations and target application
s
• Regularity– No overloading for the meanings of instruction fields
• Streamlined– Resource needs easily determined
Ease of compilation (programming?)
Ease of implementation
Scalability
4204521 Digital System Architecture
Basic ISA ClassesAccumulator:
1 address add A acc ฌacc + mem[A]
1+x address addx A acc ฌacc + mem[A + x]
Stack:
0 address add tos ฌtos + next
General Purpose Register:
2 address add A B EA(A) ฌEA(A) + EA(B)
3 address add A B C EA(A) ฌEA(B) + EA(C)
Load/Store:
3 address add Ra Rb Rc Ra ฌRb + Rc
load Ra Rb Ra ฌmem[Rb]
store Ra Rb mem[Rb] ฌRa
5204521 Digital System Architecture
Stack Machines
• Instruction set: +, -, *, /, . . .
push A, pop A
• Example: a*b - (a+c*b)push a
push b
*
push a
push c
push b
*
+
-
A BA
A*B
-
+
aa b
*
b
*
c
A*BA*B
A*B
AAC
A*BA A*B
A C B B*C
6204521 Digital System Architecture
The Case Against Stacks
• Performance is derived from the existence of several fast registers, not from the way they are organized
• Data does not always “surface” when needed– Constants, repeated operands, common subexpressions
so TOP and Swap instructions are required
• Code density is about equal to that of GPR instruction sets
– Registers have short addresses
– Keep things in registers and reuse them
• Slightly simpler to write a poor compiler, but not an optimizing compiler
7204521 Digital System Architecture
Variable format, 2 and 3 address instruction
• 32-bit word size, 16 GPR (four reserved)
• Rich set of addressing modes (apply to any operand)
• Rich set of operations
– bit field, stack, call, case, loop, string, system
• Rich set of data types (B, W, L, Q, O, F, D, G, H)
• Condition codes
VAX-11
OpCode A/M A/M A/M
Byte 0 1 n m
8204521 Digital System Architecture
Kinds of Addressing Modes
• Register direct Ri
• Immediate (literal) v
• Direct (absolute) M[v]
• Register indirect M[Ri]
• Base+Displacement M[Ri + v]
• Base+Index M[Ri + Rj]
• Scaled Index M[Ri + Rj*d + v]
• Autoincrement M[Ri++]
• Autodecrement M[Ri - -]
• Memory Indirect (deferred) M[ M[Ri] ]
• [Indirection Chains] Ri Rj v
memory
reg. file
9204521 Digital System Architecture
A "Typical" RISC
• 32-bit fixed format instruction (3 formats)
• 32 32-bit GPR (R0 contains zero, Double Precision takes a register pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
– no indirection
• Simple branch conditions
• Delayed branch
see: SPARC, MIPS, MC88100, AMD2900, i960, i860 PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
10204521 Digital System Architecture
Example: MIPS
Op
31 26 01516202125
Rs1 Rd immediate
Op
31 26 025
Op
31 26 01516202125
Rs1 Rs2
target
Rd Opx
Register-Register
561011
Register-Immediate
Op
31 26 01516202125
Rs1 Rs2/Opx immediate
Branch
Jump / Call
11204521 Digital System Architecture
Example: DLX
Op Rs1 Rd immediate
Op
Op Rs1 Rs2
Offset Added to PC
Rd
R-Type
I-Type
J-Type
Function
Rd <-- Rs1 Function Rs2
Load, Stores, Conditional Branched
Jump, Jump&Link,RTE
115556
6 5 5 16
6 26
12204521 Digital System Architecture
DLX Architecture• Introduced by Hennessey and Patterson in 19
90.
• DLX illustrates a typical RISC architecture very similar to the MIPS architecture.
– 32-bit byte addresses (algined)
– 32-bit fixed length instructions
– 3 instruction formats
– Load/store architecture
– Simple branch conditions (no condition codes).
• DLX registers– 32 32-bit GPRs (R0 = 0)
– 32 32-bit (or 16 64-bit) FPRs
– Special purpose registers (e.g., FP Status and PC)
13204521 Digital System Architecture
DLX Instruction SetAppendix C.3
• Data transfer– Load/store word
– Load/store halfword or byte (singed/unsigned loads)
– Load/store floating point single/double
– Register moves (many varieties)
• Arithmetic and Logic– Add/subtract (signed or unsigned, reg. or imm.)
– Multiply/divide (signed or unsigned, operands in FP reg.)
– And, or, xor (reg. or imm.)
– Load high word (loads upper half of a reg. with imm.)
– Shifts (LL, RL, RA) (reg. or imm.)
– Set conditionals (LT, GT, LE, GE, EQ, NE) (reg. or imm.)
14204521 Digital System Architecture
DLX Instruction Set
• Control – Conditional branch on register (compare with zero)
– Conditional on FP status bit (bit true or false)
– Jump, jump register (26 bit imm. or reg.)
– Jump and link, jump and link register (26 bit imm. or reg.)
– Trap, return from exception (trap to and return from O.S.)
• Floating Point– Add, subtract, multiply, divide (single or double)
– FP converts (convert between single, double, and integer)
– FP compares (single or double, sets bit in FP status)
15204521 Digital System Architecture
Examples of DLX Instructions
• Data Transfer– LW R1, 30(R2) Regs[R1] <= Mem[30 + Regs[R2]]
– SD 40(R3), F0 Mem[40 + Regs[3]] <= Regs[F0]
Mem[41 + Regs[3]] <= Regs[F1]
– How would you perform a register move? a no-op?
• Arithmetic and Logic– LHI R1, #42 Regs[R1] <= 42##0
– SLT R1, R2, R3 if (Regs[R2] < Regs[R3]) Regs[1] <= 1
else Regs[1] <= 0
- How would you load a 32 bit immediate into a register?
16
16204521 Digital System Architecture
Examples of DLX Instructions
• Control– JALR R2
Regs[31] <= PC+4, PC <= Regs[R2]
– JR R3 PC <= Regs[R3]
– How would you implement a subroutine call and return?
• Floating Point– MULF F1, F2, F3 Regs[F1] <= Regs[F2] + Regs[F3]
– LTD F1, R2 If (Regs[R1] < Regs[R2]) then set
a bit in the FP status.
– Why don’t they have LTD be a 3 operand instruction, compares 2 floating point registers the third to zero or one?
– What would be difficult about adding a floating-point multiply and add instruction to DLX?
17204521 Digital System Architecture
DLX Instruction Formats
Op
31 26 01516202125
rs1 rd immediate
Op
31 26 025
Op
31 26 01516202125
rs1 rs2
offset added to PC
rd
Register-Register (R-type)
561011
Register-Immediate (I-type)
Jump / Call (J-type)
func
(ALU imm. operations, loads and stores, conditional branch, jump (and link)
(jump, jump and link, trap and return from exception)
(ALI reg. operations, read/write special registers and moves)
18204521 Digital System Architecture
DLX Addressing Modes
• Displacement– Register Deferred if Displacement is 0
– PC Relative if Jump or Branch
– Absolute if R0 is the base (R0 is always 0)
• Immediate– Constants contained with the instruction
• Register Direct– For R-Type Instructions
• Addressing Mode Encoded in the Opcode– LW (Displacement), ADD (Register), ANDI (Immediate)
19204521 Digital System Architecture
DLX Data Types
• Signed/Unsigned Integer– Byte, HalfWord, Word, DoubleWord
• Floating Point– Single & Double Precision
– IEEE Standard 754 (0.f X 2 )E
20204521 Digital System Architecture
DLX Load/Stores
• Loads– Word, Byte, Unsigned Byte, Halfword, Float, Double
• Stores– Word, Byte, Double, Halfword, Float
• Examples– LW R1, 30(R2) R1 <-- MEM[30+R2]
– LB R1,40(R3) R1 <-- MEM[40+R3]0 ## MEM[40+R3]
– SW 500(R4), R3 MEM[500+R4] <-- R3
24
21204521 Digital System Architecture
DLX Arithmetic/Logical
• Add/Subtract– immediate,unsigned,immediate/unsigned
• Multiply/Divide– signed, unsigned
• Logical– And, Or, Xor
• Shift– left/right, logial/arithmetic
22204521 Digital System Architecture
Additional ALU Functions
• Set Condition Code Instructions– SLT, SGT, SLE, SGE, SEQ, SNE (Signed Test)
– SGT R1, R2, R3 R1 <-- R2 >R3
• DLX does limits the number of instructionsthat set condition codes
– simplifies compiler instruction scheduling
– pipelining must insure a transfer instruction access to a previous instructions condition codes
– No PSW
23204521 Digital System Architecture
DLX Control
• Branch– EQ/NEQ to Zero, FP comparison Bit T/F
• Jumps– Offset, Register, Jump&Link
• Traps– Transfer to OS at vectored address
• RFE– Return from exception
24204521 Digital System Architecture
DLX Floating Point
• Add/Subtract/Multiply/Divide– Single/Double
• Convert– F2D, F2I, D2F, D2I, I2F, I2D
• Compares– Single/Double, LT, GT, LE, GE, EQ, NE
25204521 Digital System Architecture
DLX Instruction Usage
Instr. gcc espresso spice nasa7int load/store 43% 29% 23% 1%int. arith 26% 30% 33% 22%control 17% 13% 11% 4%logical 10% 23% 5% 1%fp load/store 0% 0% 8% 33%fp arith 0% 0% 5% 39%misc 4% 0% 5% 0%
26204521 Digital System Architecture
DLX Summary
• Simple load/store architecture– Only accesses memory on loads/stores– All other operations use registers and immediate
• Designed for pipeline efficiency– Fixed length instruction encoding– Simple instructions
• Easy to compile to– Simple, frequently used instructions– Orthogonal instruction set– Few addressing modes
• Reduces execution time by– reducing CPI– reducing clock rate
27204521 Digital System Architecture
History of the Intel 80x86
• 1971: Intel invents microprocessor - 4004
• 1975: 8080 introduced– 8-bit microprocessor
– Accumulator machine
• 1978: 8086 introduced– 16 bit microprocessor
– Accumulator plus dedicated registers
• 1980: IBM selects 8088 as basis for IBM PC– 8088 is 8-bit external bus version of 8086
• 1980: 8087 floating point coprocessor – adds 60 floating point instructions
– 80 bit floating point registers
– uses hybrid stack/register scheme
28204521 Digital System Architecture
History of the Intel 80x86
• 1982: 80286 introduced– 24-bit address
– memory mapping & protection
• 1985: 80386 introduced– 32-bit address
– 32-bit GP registers
• 1989: 80486 introduced
• 1992: Pentium introduced
• 1995: Pentium Pro introduced
• 1996: Pentium with MMX extensions– 57 new instructions
– Primarily for multimedia applications
• 1997: Pentium II (Pentium Pro with MMX)
29204521 Digital System Architecture
Intel 80x86 Integer Registers
30204521 Digital System Architecture
Intel 80x86 Floating Point Registers
• Operations on the top of stack and one register within the stack
31204521 Digital System Architecture
Usage of Intel 80x86 Floating Point Registers
NASA7 SpiceStack (2nd operand ST(1)) 0.3% 2.0%
Register (2nd operand ST(i), i>1) 23.3% 8.3%
Memory 76.3% 89.7%
Above are dynamic instruction percentages (i.e., based on counts of executed instructions)
Stack unused by Solaris compilers for fastest execution
32204521 Digital System Architecture
80x86 Addressing/Protection
1 MB
16 MB
4 GB
33204521 Digital System Architecture
• 8086 in black; 80386 extensions in color
80x86 Instruction Format
(Base reg + 2Scale x Index reg)
34204521 Digital System Architecture
80x86 Instructions
• Data movement (move, push, pop)
• Arithmetic and logic (logic ops, tests CCs, shifts, integer and decimal arithmetic)
• Control flow (branches, jumps, calls, returns)
• String instructions (move and compare)
• FP data movement (load, load const., store)
• Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
• Comparisons (can send result to ALU)
• Transcendental functions (sin, cos, log, etc.)
35204521 Digital System Architecture
80x86 Instruction Encoding: Mod, Reg, R/M Field
r w=0 w=1 r/m mod=0 mod=1 mod=2 mod=3
16b 32b 16b 32b 16b 32b 16b 32b
0 ALAX EAX 0 addr=BX+SI =EAX same same same same same
1 CLCX ECX 1 addr=BX+DI =ECX addr addr addr addr as
2 DLDX EDX 2 addr=BP+SI =EDX mod=0 mod=0 mod=0 mod=0 reg
3 BLBX EBX 3 addr=BP+SI =EBX +d8 +d8 +d16 +d32 field
4 AH SP ESP 4 addr=SI =(sib) SI+d8 (sib)+d8 SI+d8 (sib)+d32 “
5 CH BP EBP 5 addr=DI =d32 DI+d8 EBP+d8 DI+d16 EBP+d32 “
6 DH SI ESI 6 addr=d16 =ESI BP+d8 ESI+d8 BP+d16 ESI+d32 “
7 BH DI EDI 7 addr=BX =EDI BX+d8 EDI+d8 BX+d16 EDI+d32 “
First address specifier: Reg=3 bits, R/M=3 bits, Mod=2 bits
w from opcode
r/m field depends on mod and machine mode
36204521 Digital System Architecture
80x86 Instruction EncodingSc/Index/Base field
Index Base
0 EAX EAX
1 ECX ECX
2 EDX EDX
3 EBX EBX
4 no index ESP
5 EBP if mod=0, d32if modฐ0, EBP
6 ESI ESI
7 EDI EDI
Base + Scaled Index ModeUsed when: mod = 0,1,2 in 32-bit mode AND r/m = 4!
2-bit Scale Field3-bit Index Field3-bit Base Field
37204521 Digital System Architecture
80x86 Addressing Mode Usage for 32-bit Mode
Addressing ModeGcc Espr.NASA7Spice Avg.
Register indirect10% 10% 6% 2% 7%
Base + 8-bit disp46% 43% 32% 4% 31%
Base + 32-bit disp 2% 0% 24% 10% 9%
Indexed 1% 0% 1% 0% 1%
Based indexed + 8b disp 0% 0% 4% 0% 1%
Based indexed + 32b disp 0% 0% 0% 0% 0%
Base + Scaled Indexed 12% 31% 9% 0% 13%
Base + Scaled Index + 8b disp 2% 1% 2% 0% 1%
Base + Scaled Index + 32b disp 6% 2% 2% 33% 11%
32-bit Direct 19% 12% 20% 51% 26%
38204521 Digital System Architecture
80x86 Length DistributionL
en
gth
in
by
tes
% instructions at each length
0% 10% 20% 30%
1
2
3
4
5
6
7
8
9
10
11
24%
23%
21%
3%
12%
13%
3%
0%
0%
1%
19%
17%
16%
1%
15%
27%
4%
0%
0%
1%
24%
24%
27%
4%
13%
6%
2%
0%
0%
0%
25%
24%
29%
3%
12%
4%
2%
0%
0%
0%
Espresso
Gcc
Spice
NASA7
39204521 Digital System Architecture
Instruction Counts: 80x86 vs. DLXSPEC pgm x86 DLX
DLX86
gcc 3,771,327,742 3,892,063,460 1.03
espresso 2,216,423,413 2,801,294,286 1.26
spice 15,257,026,309 16,965,928,788 1.11
nasa7 15,603,040,963 6,118,740,321 0.39
• DLX tends to perform more instructions for integer programs, while the 80x86 performs more instructions for floating point programs
• 80x86 performs many more data transfers– Two to four times more for floating point programs
– About 1.25 times more for integer programs
40204521 Digital System Architecture
Intel Compiler vs. Compilers YOU Can Buy
• 66 MHz Pentium Comparison SpecInt92 SpecFP92
Intel Internal Optimizing Compiler 64.6 59.7
Best 486 Compiler (June 1993) 57.6 39.9
Typical 486 Compiler in 1990, 41.0 32.5 when Intel started project
• Integer Intel 1.1X faster, FP 1.5X faster
• 486 Comparison SpecInt92 SpecFP92
Intel Internal Optimizing Compiler 35.5 17.5
Best 486 Compiler (June 1993) 32.2 16.0
Typical 486 Compiler in 1990, 23.0 12.8 when Intel started project
• Integer: Intel 1.1X faster, FP 1.1X faster
41204521 Digital System Architecture
Intel Summary
• Archeology: history of instruction design in a single product
– Address size: 16 bit vs. 32-bit
– Protection: Segmentation vs. paged
– Temp. storage: accumulator vs. stack vs. registers
• “Golden Handcuffs” of binary compatability affect design 20 years later, as Moore predicted
• Not too difficult to make faster, as Intel has shown
• HP/Intel announcement of common future instruction set by 2000 means end of 80x86???
• “Beauty is in the eye of the beholder”– At 50M/year sold, it is a beautiful business