Intel’s IA-32 Architecture - Walla Walla University


Cptr280, Autumn 2017

Intel’s IA-32 Architecture

Cptr280

Dr Curtis Nelson

History of the Intel 80x86

• 1971 - Intel invents the microprocessor, the 4004
• 1974 - 8080 introduced
– 8-bit microprocessor
• 1978 - 8086 introduced
– 16-bit microprocessor
• 1980 - IBM selects 8088 as basis for IBM PC
– 8088 is 8-bit external bus version of 8086
• 1980 - 8087 floating point coprocessor
– Adds 60 floating point instructions
– 80-bit floating point registers
– Uses hybrid stack/register scheme


• 1982 - 80286 introduced
– 24-bit address space
– Memory mapping and protection
• 1985 - 80386 introduced
– 32-bit address space
– 32-bit general purpose registers
– New addressing modes
• 1989 - 80486 introduced
• 1992 - Pentium introduced
• 1995 - Pentium Pro introduced
• 1996 - Pentium with MMX (multimedia) extensions
– 57 new instructions
– Primarily for multimedia applications
• 1997 - Pentium II (Pentium Pro with MMX)


• 1999 - Pentium III introduced
– Supports Intel’s Internet Streaming SIMD technology
– Additional multimedia instructions
– Four 32-bit floating point operations in parallel
– Useful in speech recognition, video encoding/decoding
• 2000 - Itanium introduced
– Release of IA-64 (RISC-like) architecture
– Explicitly Parallel Instruction Computing (EPIC)
– 128-bit bundle with three instructions
– 128 general purpose registers and 128 floating point registers
– Done by a partnership between HP and Intel
– Able to run both UNIX and Microsoft Windows
• Intel’s architecture is the product of a desire for backward compatibility
– Highly irregular architecture
– Over 50 million sold per year



• 2003 - AMD extends the architecture
– Increases address space to 64 bits
– Widens all registers to 64 bits
• 2004 - Intel capitulates
– Embraces AMD64 (calls it EM64T)
– Adds more multimedia extensions
• Conclusion (Hennessy and Patterson):
– Intel’s processor development history illustrates the impact of the “golden handcuffs” of compatibility
– “Adding new features as someone might add clothing to a packed bag”
– “An architecture that is difficult to explain and impossible to love”


IA-32 Overview

• Complexity:
– Instructions from 1 to 15 bytes long
– One operand must act as both a source and destination
– One operand can come from memory
– Complex addressing modes
• Saving grace:
– The most frequently used instructions are not too difficult to build
– Compilers avoid the portions of the architecture that are slow


IA-32 Registers and Data Addressing

• Registers in the 32-bit subset that originated with 80386

Name      Use (bits 31..0)
EAX       GPR 0
ECX       GPR 1
EDX       GPR 2
EBX       GPR 3
ESP       GPR 4
EBP       GPR 5
ESI       GPR 6
EDI       GPR 7
CS        Code segment pointer
SS        Stack segment pointer (top of stack)
DS        Data segment pointer 0
ES        Data segment pointer 1
FS        Data segment pointer 2
GS        Data segment pointer 3
EIP       Instruction pointer (PC)
EFLAGS    Condition codes

IA-32 Instruction Formats

• Typical formats: (notice the different lengths)

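Fields are listed left to right, with field widths in bits in parentheses.

a. JE EIP + displacement
   JE (4) | Condition (4) | Displacement (8)

b. CALL
   CALL (8) | Offset (32)

c. MOV EBX, [EDI + 45]
   MOV (6) | d (1) | w (1) | r/m Postbyte (8) | Displacement (8)

d. PUSH ESI
   PUSH (5) | Reg (3)

e. ADD EAX, #6765
   ADD (4) | Reg (3) | w (1) | Immediate (32)

f. TEST EDX, #42
   TEST (7) | w (1) | Postbyte (8) | Immediate (32)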


X86 Operand Types

• x86 instructions typically have two operands, where one operand is both a source and a destination operand
• Possible combinations include

Source/destination type    Second source type
Register                   Register
Register                   Immediate
Register                   Memory
Memory                     Register
Memory                     Immediate

• No memory-memory or immediate-immediate
• Immediates can be 8, 16, or 32 bits long
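The operand rules above can be written as a small lookup. This is only an illustrative sketch; the names `ALLOWED_PAIRS` and `is_valid_operand_pair` are hypothetical.

```python
# Allowed (source/destination kind, second source kind) pairs from the table.
ALLOWED_PAIRS = {
    ("register", "register"),
    ("register", "immediate"),
    ("register", "memory"),
    ("memory", "register"),
    ("memory", "immediate"),
}


def is_valid_operand_pair(dest_kind, src_kind):
    """Return True if a two-operand instruction may use this combination."""
    return (dest_kind, src_kind) in ALLOWED_PAIRS


print(is_valid_operand_pair("register", "memory"))  # ADD EAX, [EBX] style
print(is_valid_operand_pair("memory", "memory"))    # not allowed
```

Because the first operand is also the destination, an immediate can never appear in that position, which is why no pair starts with "immediate".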

80x86 Instructions

• Data movement (move, push, pop)
• Arithmetic and logic (logic ops, tests CCs, shifts, integer and decimal arithmetic)
• Control flow (branches, jumps, calls, returns)
• String instructions (move and compare)
• FP data movement (load, load constant, store)
• Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
• Comparisons (can send result to ALU)
• Transcendental functions (sin, cos, log, etc.)


Top 10 80x86 Instructions

Rank    Instruction               Integer average (% of total executed)
1       load                      22%
2       conditional branch        20%
3       compare                   16%
4       store                     12%
5       add                       8%
6       and                       6%
7       sub                       5%
8       move register-register    4%
9       call                      1%
10      return                    1%
Total                             96%

Addressing Modes

• The x86 offers several different addressing modes for accessing memory
• Register indirect
– Address in register (mem[R1])
• Base with displacement (8, 16, or 32-bit displacement)
– Address in base register plus displacement (mem[R1+100])
• Base plus scaled index
– Address is Base + 2^scale × Index
– scale = 0, 1, 2 or 3
• Base plus scaled index with displacement (8, 16, or 32-bit displacement)
– Address is Base + 2^scale × Index + disp.
– scale = 0, 1, 2 or 3
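All four modes are special cases of one formula, Base + 2^scale × Index + disp. A minimal sketch of that calculation follows, assuming 32-bit addresses that wrap around; the function name and the register values are made up for illustration.

```python
def effective_address(base=0, index=0, scale=0, disp=0):
    """Compute Base + 2**scale * Index + disp, wrapped to 32 bits."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2 or 3")
    return (base + (index << scale) + disp) & 0xFFFFFFFF


r1 = 0x1000  # hypothetical base register contents
idx = 5      # hypothetical index register contents

print(hex(effective_address(base=r1)))                            # register indirect
print(hex(effective_address(base=r1, disp=100)))                  # base with displacement
print(hex(effective_address(base=r1, index=idx, scale=2)))        # base plus scaled index
print(hex(effective_address(base=r1, index=idx, scale=2, disp=8)))  # with displacement
```

With scale = 2 the index is multiplied by 4, which is the common case of stepping through an array of 32-bit words.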


80x86 Instruction Format

• Instructions vary from 1 to 15 bytes in length (the architectural maximum)

80x86 Length Distribution

[Figure: bar chart of the percentage of instructions at each length in bytes (1 to 11) for four benchmarks: Espresso, Gcc, Spice, and NASA7. Horizontal axis: % instructions at each length, 0% to 30%.]


Pentium Pro vs. MIPS R10000

Benchmark    Pentium Pro    MIPS R10000    MIPS÷Pro
SPECint95    8.7            8.9            1.02
SPECfp95     6.0            17.2           2.87

• The Pentium Pro and MIPS R10000 have comparable performance on integer computations

• The MIPS R10000 has much better performance than the Pentium Pro for floating point computations
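The MIPS÷Pro column is simply the ratio of the two SPEC ratings. A short sketch that recomputes it from the ratings quoted above (the variable and function names are illustrative only):

```python
# SPEC95 ratings from the table: (Pentium Pro, MIPS R10000).
ratings = {
    "SPECint95": (8.7, 8.9),
    "SPECfp95": (6.0, 17.2),
}


def mips_over_pro(pro, mips):
    """Relative performance of the R10000, rounded to two decimals."""
    return round(mips / pro, 2)


for name, (pro, mips) in ratings.items():
    print(f"{name}: {mips_over_pro(pro, mips):.2f}")
```

A ratio near 1 means the two processors are roughly equal on that benchmark suite; a ratio near 3 means the R10000 is almost three times as fast.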

Summary

• Instruction complexity is only one variable
– Lower instruction count vs. higher CPI vs. lower clock rate
• Design principles
– Simplicity favors regularity
– Smaller is faster
– Good design demands compromise
– Make the common case fast
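The trade-off between instruction count, CPI, and clock rate can be made concrete with the standard relation CPU time = instruction count × CPI ÷ clock rate. The sketch below uses invented numbers purely for illustration; they are not measurements of any real processor.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instruction count * cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical machine A: fewer, more complex instructions, higher CPI.
time_a = cpu_time(instruction_count=1.0e9, cpi=2.0, clock_rate_hz=200e6)

# Hypothetical machine B: more, simpler instructions, lower CPI, same clock.
time_b = cpu_time(instruction_count=1.5e9, cpi=1.2, clock_rate_hz=200e6)

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
```

In this made-up case the machine that executes 50% more instructions still finishes first, because its lower CPI more than compensates. With different numbers the result could go the other way, which is why instruction complexity is only one variable.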