CS/EE 5810 / CS/EE 6810

F00: 1

Multimedia

New Architecture Direction

• “… media processing will become the dominant force in computer architecture and microprocessor design”

• “… new media-rich applications … involve significant real-time processing of continuous media streams and make heavy use of vectors of packed 8-, 16-, and 32-bit integer and f.p.”

– “How Multimedia Workloads will Change Processor Design,” Diefendorff & Dubey, IEEE Computer (9/97)

• Needs include high memory bandwidth, high network bandwidth, continuous media data types, real-time response, fine-grain parallelism

• Also significant focus on system bus performance

– Common bridge to the memory system and I/O

– Critical performance component for SMP server platforms

Multimedia Workloads

• Multimedia

– Video conferencing

– Video authoring

– Animation

– Games

• Algorithms

– Image compression (JPEG)

– Video compression (MPEG)

– 3-D graphics

– Encryption

Multimedia Characteristics

• Real-time response

– Video, audio

• Continuous media data types

– 8-16 bits sufficient for many applications

• Data parallelism

– E.g. apply the same operation to a whole image

– Vector or SIMD work well here

• Coarse-grained parallelism

– E.g. video encoding/decoding, audio encoding/decoding

• Small loops

– Most time spent in kernels

– Amenable to hand-optimization

• High memory bandwidth

– Video, 3-D graphics

– Caches not large enough

Multimedia ISA Extensions

• HP PA-RISC

– MAX-2

• SUN SPARC

– VIS

• Intel x86

– MMX

• MIPS

– MDMX

• PowerPC

– AltiVec

MMX

• “MMX Technology Extension to the Intel Architecture” Alex Peleg and Uri Weiser, IEEE Micro, August 1996

• Goals

– Improve performance of multimedia applications

» Graphics, MPEG video

» Image processing, speech recognition

– Remain completely compatible with Intel x86 ISA

– Minimize cost

• Approach

– Use packed data types

– Exploit SIMD parallelism

– Make use of existing wide data paths

Data Types and Operands

• Three fixed-point integer types packed into a 64-bit quadword

– Packed byte: 8 8-bit bytes

– Packed word: 4 16-bit words

– Packed doubleword: 2 32-bit doublewords

• User-controlled fixed point

• Eight 64-bit GP registers (mm0-mm7)

• MMX shares FPU

– Can’t do FP and MMX at the same time

• Random access to registers

– Lesson learned from the stack-based x87 FP unit design
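The packed types above are just three ways of slicing the same 64 bits. The following is a minimal sketch in portable C, not MMX code: it models a quadword as a `uint64_t` and uses shifts to pick out lanes, with lane 0 taken as the least significant. The helper names (`get_byte`, `get_word`, `get_dword`, `padd_words`) are made up for this illustration; `padd_words` is meant to mimic what a wraparound packed word add does, four independent 16-bit additions with no carry between lanes.

```c
#include <stdint.h>

/* Hypothetical helpers: view one 64-bit quadword as 8 bytes,
 * 4 words, or 2 doublewords. Lane 0 is the least significant lane. */
uint8_t  get_byte (uint64_t q, int i) { return (uint8_t)(q >> (8 * i)); }
uint16_t get_word (uint64_t q, int i) { return (uint16_t)(q >> (16 * i)); }
uint32_t get_dword(uint64_t q, int i) { return (uint32_t)(q >> (32 * i)); }

/* Software model of a packed word add with wraparound:
 * four independent 16-bit additions, no carry between lanes. */
uint64_t padd_words(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        /* The cast truncates the sum to 16 bits, so it wraps modulo 65536. */
        uint16_t s = (uint16_t)(get_word(a, i) + get_word(b, i));
        r |= (uint64_t)s << (16 * i);
    }
    return r;
}
```

In this model, adding 1 to a lane holding 0xFFFF yields 0 in that lane and leaves its neighbor untouched, which is the property that distinguishes a packed add from an ordinary 64-bit add.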

MMX Operations

• 57 MMX instructions work on all data types

• Support for saturation arithmetic

– Simplifies handling of underflow and overflow

– Matches physical behavior

• Packed operations

– Addition/subtraction, multiplication, compares, shifts

• Conversion operations

– Pack/unpack

• Performance improvement

– Fewer loads and stores

– Fewer arithmetic operations, but more conversion
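To make the saturation bullet concrete, here is a small scalar sketch in C of what one lane of a saturating add computes, next to the ordinary wraparound add. The function names are invented for this example and it models a single lane only; the packed instructions apply the same rule to every lane at once.

```c
#include <stdint.h>

/* Wraparound add on one unsigned byte lane: the result is taken modulo 256. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Saturating add on one unsigned byte lane: the result is clamped at 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Saturating add on one signed 16-bit lane: clamped to [-32768, 32767]. */
int16_t add_sat_s16(int16_t a, int16_t b) {
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

Brightening a pixel of value 200 by 100 shows why this "matches physical behavior": the wraparound add gives 44, a dark pixel, while the saturating add gives 255, the brightest representable value, with no explicit overflow check in the loop.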

MMX Operations

Packed multiply-add to doubleword (four 16-bit words in, two 32-bit doublewords out)

      A3        A2        A1        A0
       ×         ×         ×         ×
      B3        B2        B1        B0

   A3×B3 + A2×B2       A1×B1 + A0×B0

Packed compare greater-than word (each word lane becomes all ones if true, all zeros if false)

      51         3         5        23
       >         >         >         >
      73         2         5         6

    00…0      11…1      00…0      11…1
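The two diagrams above can be read as the following scalar models in C. This is a sketch of the semantics only, not of how the hardware or any library implements them; the function names are made up, and array index 0 is taken to be the rightmost lane in the diagrams (A0, B0).

```c
#include <stdint.h>

/* Model of packed multiply-add to doubleword: multiply four signed
 * 16-bit lanes pairwise, then sum adjacent products into two 32-bit lanes.
 * Caveat: the single overflowing input (all four operands equal to -32768)
 * is not handled by this sketch. */
void pmaddwd_model(const int16_t a[4], const int16_t b[4], int32_t out[2]) {
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

/* Model of packed compare greater-than word: each lane of the result is
 * all ones (0xFFFF) where a > b as signed values, and all zeros otherwise. */
void pcmpgtw_model(const int16_t a[4], const int16_t b[4], uint16_t mask[4]) {
    for (int i = 0; i < 4; i++) {
        mask[i] = (a[i] > b[i]) ? 0xFFFFu : 0x0000u;
    }
}
```

The compare produces a mask rather than a branch condition, so the result can be combined with AND/OR operations to select data per lane without any conditional jumps.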

Using MMX

• Assembly language coding

• Use of libraries

– E.g. IDCT, DCT, matrix multiply…

• Use of C macros (“intrinsics”)

– Generates optimized assembly code

– Performs register allocation and instruction scheduling

» MMX64 t0, t1;

» t0 = padd(t0, t1);

– Requires intimate knowledge of MMX

• Could a compiler generate MMX code?

Chroma Keying

• Weatherman example

» for (i = 0; i < imagesize; i++)
      new_image[i] = (x[i] == blue) ? y[i] : x[i];

– MMX version, 8 pixels per iteration (mm1 is assumed to hold 8 copies of the blue key value)

    movq    mm3, mem1   ; load 8 pixels from weatherman
    movq    mm4, mem2   ; load 8 pixels from map
    pcmpeqb mm1, mm3    ; generate select mask (0xFF where pixel == blue)
    pand    mm4, mm1    ; AND map with mask
    pandn   mm1, mm3    ; AND weatherman with inverse mask
    por     mm4, mm1    ; OR masked images together
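The branch-free select in the MMX sequence can be mirrored in portable C on a 64-bit word. This is a software model for illustration under a few assumptions, not MMX code: pixels are 8 bits each, `blue8` holds the key color replicated into all eight byte lanes, and the function names are invented here. Each line of `chroma_key8` is annotated with the instruction it stands in for.

```c
#include <stdint.h>

/* Model of a packed byte compare-equal: each byte lane of the result is
 * 0xFF where the corresponding lanes of a and b match, else 0x00. */
uint64_t cmpeq_bytes(uint64_t a, uint64_t b) {
    uint64_t mask = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint64_t la = (a >> (8 * lane)) & 0xFF;
        uint64_t lb = (b >> (8 * lane)) & 0xFF;
        if (la == lb) {
            mask |= (uint64_t)0xFF << (8 * lane);
        }
    }
    return mask;
}

/* Chroma-key 8 pixels at once: where the weatherman pixel equals the
 * key color, take the map pixel; otherwise keep the weatherman pixel. */
uint64_t chroma_key8(uint64_t weatherman, uint64_t map, uint64_t blue8) {
    uint64_t mask     = cmpeq_bytes(blue8, weatherman); /* pcmpeqb */
    uint64_t from_map = map & mask;                     /* pand    */
    uint64_t from_man = ~mask & weatherman;             /* pandn   */
    return from_map | from_man;                         /* por     */
}
```

For example, with key color 0xBB, weatherman pixels 0x0102BB04BB060708 and map pixels 0xA1A2A3A4A5A6A7A8 combine to 0x0102A304A5060708: only the two lanes that held the key color are replaced by map pixels.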