Computer architecture Lecture 4: Processor instruction list Piotr Bilski.


Computer architecture

Lecture 4: Processor instruction list

Piotr Bilski

Execution of program

• The processor executes machine instructions after decoding them

• The programmer writes a program in a symbolic low-level or high-level language

• During compilation the symbolic language is translated into machine-language instructions

Elements of the machine instructions

• Operation code

• Argument references (operation input data)

• Result reference (if needed)

• Reference to the next instruction

Example 16-bit format: bits 0–3 hold the operation code, bits 4–15 hold the argument references.
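As a rough illustration of a format with a 4-bit operation code (bits 0–3) and 12 bits of argument references (bits 4–15), the sketch below splits a 16-bit instruction word into its two fields. It assumes that bit 0 is the most significant bit; the function name `decode` is made up for this example.

```python
def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, argument)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    opcode = word >> 12        # bits 0-3: the four leftmost bits (assumed MSB-first numbering)
    argument = word & 0x0FFF   # bits 4-15: the remaining twelve bits
    return opcode, argument


opcode, argument = decode(0b0010_0010_0000_0001)
print(f"opcode={opcode:04b} argument={argument:012b}")
```

With a different bit-numbering convention the shift and mask would change, but the idea of extracting fixed-width fields stays the same.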

Arguments and results are stored in:

• Memory (main, cache, virtual)

• Processor registers (accumulator, general purpose registers)

• Input/output devices (hard drive, printer)

Instruction types

• Data processing (logical and arithmetic operations)

• Data storage (instructions related to the memory access)

• Data transmission (input/output operations)

• Control (result testing, non-sequential code execution – jumps, branches)

Relation between the symbolic and machine instructions

x = x + c;

LOAD 1001

ADD 1002

STORE 1001

Memory location 1001 holds x and location 1002 holds c; the ALU performs the addition.
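The correspondence between `x = x + c` and the LOAD / ADD / STORE sequence can be traced with a small simulation. This is a minimal Python sketch of a one-accumulator machine; the `run` function, the dictionary used as memory, and the sample values 5 and 3 are assumptions made for illustration.

```python
def run(program, memory):
    """Execute LOAD/ADD/STORE instructions on a machine with one accumulator."""
    acc = 0
    for opcode, address in program:
        if opcode == "LOAD":       # accumulator <- memory[address]
            acc = memory[address]
        elif opcode == "ADD":      # accumulator <- accumulator + memory[address]
            acc += memory[address]
        elif opcode == "STORE":    # memory[address] <- accumulator
            memory[address] = acc
        else:
            raise ValueError(f"unknown instruction {opcode}")
    return memory


memory = {1001: 5, 1002: 3}        # x = 5, c = 3 (sample values)
program = [("LOAD", 1001), ("ADD", 1002), ("STORE", 1001)]
run(program, memory)
print(memory[1001])                # x now holds x + c
```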

Number of the addresses in the instruction

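Expression to compute: Y = (A-B)/(C+D*E)

3 addresses

Instruction     Action
SUB Y,A,B       Y ← A - B
MPY T,D,E       T ← D * E
ADD T,T,C       T ← T + C
DIV Y,Y,T       Y ← Y / T

2 addresses

Instruction     Action
MOVE Y,A        Y ← A
SUB Y,B         Y ← Y - B
MOVE T,D        T ← D
MPY T,E         T ← T * E
ADD T,C         T ← T + C
DIV Y,T         Y ← Y / T

1 address

Instruction     Action
LOAD D          AC ← D
MPY E           AC ← AC * E
ADD C           AC ← AC + C
STOR Y          Y ← AC
LOAD A          AC ← A
SUB B           AC ← AC - B
DIV Y           AC ← AC / Y
STOR Y          Y ← AC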
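To check that the three-address and two-address sequences really compute Y = (A-B)/(C+D*E), each instruction can be mirrored by one Python statement. This is only a sketch: the function names and the sample operand values are assumptions, and ordinary Python variables stand in for memory locations.

```python
def three_address(a, b, c, d, e):
    """Mirror the three-address sequence, one statement per instruction."""
    y = a - b      # SUB Y,A,B
    t = d * e      # MPY T,D,E
    t = t + c      # ADD T,T,C
    y = y / t      # DIV Y,Y,T
    return y


def two_address(a, b, c, d, e):
    """Mirror the two-address sequence; each result overwrites the first operand."""
    y = a          # MOVE Y,A
    y = y - b      # SUB Y,B
    t = d          # MOVE T,D
    t = t * e      # MPY T,E
    t = t + c      # ADD T,C
    y = y / t      # DIV Y,T
    return y


print(three_address(20, 4, 2, 3, 2), two_address(20, 4, 2, 3, 2))
```

Fewer addresses per instruction means shorter instructions but more of them: four instructions in the three-address form versus six in the two-address form.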

Number of the addresses in the instruction (cont.)

Example: a = b + c

• Three addresses: ADD a,b,c

• Two addresses: MOVE a,b; ADD a,c

• One address: LOAD b; ADD c; STOR a

Instruction list design problems

• How many (and which) operations should the processor execute?

• What data types (arguments, results)?

• What instruction format (length, number of addresses)?

• How many (and which) registers?

• Which addressing modes?

Operands

• Addresses (unsigned integers)

• Numbers (numerical data) – fixed and floating point precision, decimal

• Characters (ASCII / IRA, EBCDIC codes etc.)

• Logical data (single bits)

Computer as the data storage

• Multi-byte data can be stored in memory in little-endian, big-endian, or bi-endian order

• The models differ in the order of the bytes stored in memory; for example, the hexadecimal number 76859432 can be written in two ways:

Address     Big endian     Little endian
263         76             32
264         85             94
265         94             85
266         32             76
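The two byte orders can be reproduced with Python's built-in `int.to_bytes`. The sketch below maps the number 76859432 (hexadecimal) onto addresses 263 to 266 in both orders; the helper name `memory_layout` is an assumption made for this example.

```python
value = 0x76859432


def memory_layout(number: int, byteorder: str, base_address: int = 263) -> dict[int, str]:
    """Map consecutive addresses to the bytes of a 32-bit number (as hex strings)."""
    data = number.to_bytes(4, byteorder=byteorder)
    return {base_address + i: f"{byte:02X}" for i, byte in enumerate(data)}


print("big endian:   ", memory_layout(value, "big"))
print("little endian:", memory_layout(value, "little"))
```

In big-endian order the most significant byte lands at the lowest address; in little-endian order the least significant byte does.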

Little and big endian

Big endian

• Easy to sort character sequences (strings)

• Allows printing ASCII characters without any conversion

• Integers and characters are stored in the same order

• Used in: Sun SPARC, RISC processors, Motorola 680x0

Little endian

• Easy to convert a longer number to a shorter one

• Arithmetic operations are easier to execute

• Used in: Intel 80x86, Pentium, Alpha

Bi-endian

• Understands both standards

• Used in: PowerPC

Examples of little and big endian in the file types

Big endian:

• Adobe Photoshop

• IMG (GEM Raster)

• JPEG

• MacPaint

• SGI (Silicon Graphics)

• Sun Raster

Little endian:

• BMP (Windows, OS/2 Bitmaps)

• GIF

• PCX (PC Paintbrush)

• TGA (Targa)

• Microsoft RTF (Rich Text Format)

Bi-endian:

• Microsoft RIFF (.WAV & .AVI)

• TIFF

• XWD (X Window Dump)

Pentium data types

• Data sizes are multiples of a byte (byte – 1 B, word – 2 B, double word – 4 B, etc.)

• Floating point formats comply with the IEEE 754 standard

• Data do not have to be stored at evenly aligned addresses

• Unsigned integers (8, 16, 32, 64 bits), used for example as addresses

• Signed integers (8, 16, 32, 64 bits) in two’s complement representation

• Floating point numbers (single, double, and extended double precision)

Pentium data types (cont.)

• General (arbitrary content, 16, 32 or 64 bits long)

• Unpacked BCD (binary-coded decimal): one decimal digit per byte

• Packed BCD: two decimal digits per byte

• Pointer (32-bit address)

• Bit field

• Byte string
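The difference between the unpacked (one digit per byte) and packed (two digits per byte) decimal representations listed above can be shown with a short Python sketch. The function names are invented for this example, only non-negative integers are handled, and the digits are written most significant first purely for readability, which is an assumption rather than a statement about the processor's memory layout.

```python
def unpacked_bcd(number: int) -> bytes:
    """Encode a non-negative integer with one decimal digit per byte."""
    if number < 0:
        raise ValueError("only non-negative numbers are handled in this sketch")
    return bytes(int(ch) for ch in str(number))


def packed_bcd(number: int) -> bytes:
    """Encode a non-negative integer with two decimal digits per byte."""
    if number < 0:
        raise ValueError("only non-negative numbers are handled in this sketch")
    digits = str(number)
    if len(digits) % 2:            # pad to an even number of digits
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])   # high nibble, low nibble
        for i in range(0, len(digits), 2)
    )


print(unpacked_bcd(1234).hex())    # four bytes, one digit each
print(packed_bcd(1234).hex())      # two bytes, two digits each
```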

PowerPC data types

• Data 8, 16, 32, 64 bits long

• Aligning data to even byte addresses is not required (though sometimes used)

• PowerPC is bi-endian

• Stored: unsigned and signed numbers (byte (8 b), half-word (16 b), word (32 b), double word (64 b)), floating point numbers (IEEE 754), byte strings (up to 128 B)

Operation classification

• Data transfer (STORE, LOAD, SET, PUSH, POP)

• Arithmetic (ADD, SUB, NEG, INC, MULT)

• Logical (AND, OR, NOT, TEST, SHIFT, ROTATE)

• Control passing (JUMP, HALT, EXEC)

• Input/output (READ, WRITE)

• Conversion (TRANS, CONV)

Data transfer

• Aim: to move data from one location to another

• Requires: determining the memory location (virtual address?), checking the cache memory, and issuing the read/write operation

• Example instructions: LOAD, STORE (in short, long, half-word versions, etc.)

Logical operations

• Operands are treated as bit strings

• The most popular operations: AND, OR, XOR, NOT

• Bit strings can be used as masks:

      A1 = 10100101
AND   A2 = 11110000
      ----------------
           10100000

      A1 = 10100101
XOR   A2 = 11111111
      ----------------
           01011010
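The two mask operations shown above can be verified directly with Python's bitwise operators. This is a minimal sketch; the variable names are chosen for this example only.

```python
A1 = 0b10100101

kept_high_half = A1 & 0b11110000   # AND keeps only the bits selected by the mask
inverted = A1 ^ 0b11111111         # XOR with all ones flips every bit

print(f"{kept_high_half:08b}")
print(f"{inverted:08b}")
```

AND with a mask clears the unselected bits, while XOR with all ones gives the same result as NOT on an 8-bit value.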

Logical operations (cont.)

• Logical shifting

• Arithmetic shifting

[Figure: diagrams of logical and arithmetic shifts]
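The difference between the two kinds of shift is easiest to see on a right shift of an 8-bit value: a logical shift fills the vacated bits with zeros, while an arithmetic shift repeats the sign bit so that a two's complement number keeps its sign. The sketch below models both; the 8-bit width and the function names are assumptions made for illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1


def logical_shift_right(value: int, count: int = 1) -> int:
    """Shift right and fill the vacated high bits with zeros."""
    return (value & MASK) >> count


def arithmetic_shift_right(value: int, count: int = 1) -> int:
    """Shift right and fill the vacated high bits with copies of the sign bit."""
    if not 0 <= count <= BITS:
        raise ValueError("shift count out of range for this sketch")
    value &= MASK
    shifted = value >> count
    if value >> (BITS - 1):                        # sign bit set: negative number
        shifted |= (MASK << (BITS - count)) & MASK  # refill the top bits with ones
    return shifted


print(f"{logical_shift_right(0b10100101):08b}")
print(f"{arithmetic_shift_right(0b10100101):08b}")
```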

Changing execution order

• These operations change the order in which instructions are executed

• They include jumps, procedure calls, and repeating an operation in a loop

• Control passing can be conditional or unconditional

Conditional branches

• First method: a multiple-bit condition code stores the outcome of an operation (for example the sign of the result, an overflow, or a zero result), and the conditional jump tests that code

• Second method: the jump condition is embedded in the jump instruction itself

• Jumps can go in both directions (forward and backward)

Branch example

Address   Instruction
351
352
353       SUB X, Y
354       BRZ 373
...       ........
372       BR 353
373
...       ........
395       Rest of the code
396

BRZ – make a jump if the result is zero

BR – make a jump unconditionally

The condition code set by the SUB operation determines whether the BRZ operation jumps
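The interplay of SUB, BRZ and BR in this example can be traced with a toy simulation. The sketch below is an assumption-laden model: it treats `SUB X, Y` as X ← X - Y, models the condition code as a single zero flag, and steps over addresses 355 to 371 as no-ops because their contents are not given. The function name `run_loop` and the step limit are invented for this example.

```python
def run_loop(x: int, y: int, max_steps: int = 1000) -> tuple[int, int]:
    """Trace 353 SUB X,Y / 354 BRZ 373 / 372 BR 353 until the loop exits at 373."""
    pc = 353
    zero_flag = False
    iterations = 0
    for _ in range(max_steps):
        if pc == 353:                 # SUB X, Y: subtract and set the condition code
            x -= y
            zero_flag = (x == 0)
            iterations += 1
            pc = 354
        elif pc == 354:               # BRZ 373: jump forward only if the result was zero
            pc = 373 if zero_flag else 355
        elif pc == 372:               # BR 353: unconditional backward jump
            pc = 353
        elif pc == 373:               # first instruction after the loop
            return x, iterations
        else:                         # 355..371: unspecified body, stepped over here
            pc += 1
    raise RuntimeError("step limit reached; the loop did not terminate")


print(run_loop(12, 4))
```

The step limit guards against operand values for which the subtraction never produces zero.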

Procedures

• Procedures are self-contained modules in the source code

• Using them makes the code more flexible

• Require two instructions: call and return

• The same procedure can be called many times from different locations

• Procedures can be nested

Procedure and return location

• Procedure can be called from multiple locations in the program

• Nesting of calls is possible

• Calling a procedure requires storing the return address:

– In a register

– At the beginning of the called procedure

– On the stack (the best option; it supports nested and recursive procedures)

Procedure call

Stack

• A separate memory area for storing data, organized as a LIFO structure

• Many processors have a register that works as the stack pointer (for example, Motorola 68000)

• Main stack operations: PUSH, POP

Example of the stack implementation

[Figure: stack contents with the stack pointer and end-of-stack markers, shown before and after PUSH and POP operations]

Working with stack

• Operation: a+b-(c/d)

• The same operation in reverse Polish notation: ab+cd/-

Stack contents after each step (top of the stack first):

a
b, a
a+b
c, a+b
d, c, a+b
c/d, a+b
a+b-c/d
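Evaluating the reverse Polish expression ab+cd/- with a stack can be sketched in a few lines of Python: operands are pushed, and each operator pops two values and pushes the result. The function name `eval_rpn`, the space-separated token format, and the sample values are assumptions made for this example.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens, values):
    """Evaluate a reverse Polish expression using an explicit stack."""
    stack = []
    for token in tokens:
        if token in OPERATORS:
            right = stack.pop()            # the top of the stack is the right operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))   # PUSH the result
        else:
            stack.append(values[token])    # PUSH an operand
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


values = {"a": 5, "b": 3, "c": 8, "d": 4}
print(eval_rpn("a b + c d / -".split(), values))
```

No parentheses are needed because the position of each operator already fixes the order of evaluation.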

Stack frame

• The set of procedure parameters together with the return address

• Allows nested procedure calls, with input and output parameters stored on the stack

Stack frame illustration

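[Figure: stack contents in two situations. While procedure A runs, the stack holds one frame with x1, x2, the return point, and the previous frame pointer. After procedure A calls B, a second frame with y1, y2, the return point, and the previous frame pointer is placed on top of it. FP and SP mark the frame pointer and the stack pointer in each case.]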

Stack frame in Pentium processor

• Used by the ENTER and CALL commands

• The ENTER command supports compilers in implementing nested procedures

• The LEAVE command restores the previous stack state

• The frame pointer is stored in the EBP register, the stack pointer in the ESP register

• Example of the stack frame setup after a CALL (equivalent to ENTER):

PUSH EBP

MOV EBP, ESP

SUB ESP, space_in_memory

MMX instructions

• Introduced in 1996 to the Pentium processors

• The first version comprised 57 SIMD instructions

• Used to execute operations on integer numbers

• Purpose: multimedia applications (computer games, graphics and sound processing)

• MMX uses four new data types: packed byte, packed word, packed double word, packed quad word

MMX instructions examples

• Arithmetic: PADD, PMUL, PMADD

• Logical: PAND, PANDN, POR, PXOR

• Comparison: PCMPEQ, PCMPGT

• Conversion: PUNPCKH, PUNPCKL

• Instructions carry a suffix that indicates the data type used in the operation: B, W, D, Q

Additional MMX registers

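• Eight 64-bit registers, MM0 to MM7

• For backward compatibility, the MMX registers are accessible to older software as the floating point registers

[Figure: layout of a 64-bit MMX register (bits 63 to 0). As packed bytes, the first byte occupies bits 7–0 and the eighth byte bits 63–56; the same register can also be divided into four words.]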

Example MMX operation

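MMX arithmetic

• Saturation instead of overflow

Addition with overflow (the carry out of the 16-bit word is lost):

      1111 0000 0000 0000
    + 0011 0000 0000 0000
    = 1 0010 0000 0000 0000    stored result: 0010 0000 0000 0000

Addition with saturation (the result is clamped to the largest 16-bit value):

      1111 0000 0000 0000
    + 0011 0000 0000 0000
    = 1 0010 0000 0000 0000    stored result: 1111 1111 1111 1111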
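The difference between wraparound on overflow and saturation can be reproduced for the 16-bit addition above. This is a sketch for unsigned values only; the function names are assumptions and do not correspond to actual MMX instruction mnemonics.

```python
def add_wraparound(a: int, b: int, bits: int = 16) -> int:
    """Unsigned addition where the carry out of the word is discarded."""
    return (a + b) & ((1 << bits) - 1)


def add_saturating(a: int, b: int, bits: int = 16) -> int:
    """Unsigned addition where the result is clamped to the largest value."""
    return min(a + b, (1 << bits) - 1)


a, b = 0b1111_0000_0000_0000, 0b0011_0000_0000_0000
print(f"{add_wraparound(a, b):016b}")
print(f"{add_saturating(a, b):016b}")
```

Saturation is useful in multimedia code: an overly bright pixel stays white instead of wrapping around to a dark value.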

Why should we use MMX?

Operation                                       Acceleration*
Echo effect                                     5.9
Matrix transposition                            2
Arithmetic and logical operations on vectors    6
Fractals drawing (2D)                           1.5
Bilinear texture mapping (3D)                   7
Median filter                                   3.8
Haar transform 2x2                              2.2
Calculating L1 norm                             3.3
3D transformation                               3.1

* compared to C code using the traditional architecture

SSE instructions

• Introduced in 1999 (Pentium 3)

• 70 new instructions for floating point operations

• Eight additional 128-bit registers, addressed directly: XMM0 – XMM7 (plus the control register MXCSR)

• Every register stores four 32-bit floating point numbers

SSE (cont.)

• New data type: 4-element vector of single precision floating point numbers

• Operations can be packed (PS – on all elements of the vector) or scalar (SS – only on the first element)

• Example:

xmm0 = [X1 X2 X3 X4]
xmm1 = [Y1 Y2 Y3 Y4]

ADDPS(xmm0, xmm1) = [X1+Y1 X2+Y2 X3+Y3 X4+Y4]
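The packed (PS) and scalar (SS) variants can be contrasted with a plain Python model in which a register is a list of four floats. This is only an illustration of the semantics, not SIMD code: the functions `addps` and `addss` are hypothetical stand-ins named after the instructions, and the scalar version assumes that the remaining three elements of the destination are left unchanged.

```python
def addps(x, y):
    """Packed add: operate on all four elements of the vectors."""
    return [a + b for a, b in zip(x, y)]


def addss(x, y):
    """Scalar add: operate on the first element only, keep the rest of x."""
    return [x[0] + y[0]] + list(x[1:])


xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [10.0, 20.0, 30.0, 40.0]
print(addps(xmm0, xmm1))
print(addss(xmm0, xmm1))
```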

3DNow! instructions

• Introduced in 1998 by AMD

• Provide a set of 21 new SIMD instructions for floating point calculations

• Used in multimedia applications (high resolution graphics, computer games, CAD/CAM)

• Extensions exist: Enhanced 3DNow!, 3DNow! Professional

SSE2 instructions

• Introduced in 2001 (Intel Pentium IV, Athlon 64, Sempron 754, Transmeta Efficeon)

• Set of the additional 144 instructions, supported by 16 128-bit registers (XMM0 – XMM15)

• Operates on 64-bit floating point numbers (x87 coprocessors work with 80-bit numbers) and on integer data held in 128-bit registers

Next Sets of Instructions

• SSE3 (Prescott New Instructions) – 13 new instructions, including complex number arithmetic (since 2004, Pentium IV Prescott, Athlon 64 E)

• SSSE3 (Supplemental Streaming SIMD Extension 3) – 16 new instructions operating on integers (since 2005 Xeon, Intel Core 2, AMD Phenom)

• SSE4 – 54 new instructions in two groups (47 and 7), including integer number instructions modifying EFLAGS register (new!), implemented in Intel Core 2, Celeron Conroe, Penryn

Next Sets of Instructions (cont.)

• SSE5 – planned for implementation by AMD in 2009 as a competitor to Intel’s SSE4. Eventually replaced by three groups: XOP, FMA4, CVT16 (AVX compatible), implemented in Bulldozer processors in 2011. Instructions can take as many as 4 operands.

• AVX (Advanced Vector Extensions) – implemented by Intel in 2011: 16 new 256-bit registers (YMM0-YMM15) + 19 instructions working exclusively on these registers

Assembler

• Low level programming language

• Uses both instructions and symbolic pointers to data

• Every processor has its own assembler

Example of the assembly program

Program: L = I + J + K

Machine language:

101   0010 0010 0000 0001
102   0001 0010 0000 0010
103   0001 0010 0000 0011
104   0011 0010 0000 0100
201   0000 0000 0000 0010
202   0000 0000 0000 0011
203   0000 0000 0000 0100
204   0000 0000 0000 0000

Symbolic:

101   LDA 201
102   ADD 202
103   ADD 203
104   STA 204
201   DAT 2
202   DAT 3
203   DAT 4
204   DAT 0

Assembler:

FORMUL   LDA   I
         ADD   J
         ADD   K
         STA   L
I        DATA  2
J        DATA  3
K        DATA  4
L        DATA  0
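The step from the assembler listing to the machine language listing can be sketched as a tiny two-pass assembler. The operation codes (LDA = 0010, ADD = 0001, STA = 0011) are read off the machine language listing; everything else is an assumption made for this example: the function names, the treatment of the addresses 101 and 201 as hexadecimal (which matches the address bits 0010 0000 0001), and the placement of code and data at those two origins.

```python
OPCODES = {"LDA": 0b0010, "ADD": 0b0001, "STA": 0b0011}

CODE = [("LDA", "I"), ("ADD", "J"), ("ADD", "K"), ("STA", "L")]
DATA = [("I", 2), ("J", 3), ("K", 4), ("L", 0)]


def assemble(code, data, code_origin=0x101, data_origin=0x201):
    """Translate symbolic instructions into 16-bit words keyed by address."""
    # Pass 1: assign an address to every data label.
    symbols = {label: data_origin + i for i, (label, _) in enumerate(data)}
    words = {}
    # Pass 2: encode each instruction as a 4-bit opcode plus a 12-bit address.
    for i, (mnemonic, label) in enumerate(code):
        words[code_origin + i] = (OPCODES[mnemonic] << 12) | symbols[label]
    for label, value in data:
        words[symbols[label]] = value & 0xFFFF
    return words


def bits(word):
    """Format a 16-bit word as four groups of four bits."""
    text = f"{word:016b}"
    return " ".join(text[i:i + 4] for i in range(0, 16, 4))


for address, word in sorted(assemble(CODE, DATA).items()):
    print(f"{address:03X}  {bits(word)}")
```

The symbol table built in the first pass is what lets the programmer write `LDA I` instead of `LDA 201`.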