Client and Server processors


Page 1: Client and Server processors

EE 382 Processor Design Winter 98/99 Michael Flynn 1

Client and Server processors

• Client incorporates
  – Multimedia (sound and video)
  – Imaging (3D)
  – Security and network accessibility
  – Wireless communications

• Server incorporates
  – High-speed processing
  – Management of large memory and file store complexes


Client processors

• Modern processors are being enhanced to support multimedia, security, etc.

• Most of the recent interesting processor developments have been in client processors
  – largest market, not dominated by clock speed, and more amenable to low-power implementation
  – "system on a die": includes DSP arithmetic and as much structured memory as possible


Multi Media

• Includes video, audio, and 3D graphic imaging, as well as subsidiary functions such as music (composition and rendering), voice recognition, handwriting recognition, and animation

• Closely coupled to the display / presentation technology (raster line or pixel density, audio speaker fidelity / range)


Still Images/ Video/ Audio

The problem is compression and meeting real-time constraints
  – a B/W still image, 512x512 pixels, represents about 1/4 MB (8 b/pixel); color (3 B/pixel) almost 1 MB; use 1 MB as a typical image
  – video requires 30 frames/sec, so 30 MB/sec; 1 hour is 108 GB
  – voice requires 44k samples/sec at 3 B/sample over 2 or more channels, about 1/4 MB/sec
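The raw-rate arithmetic above can be checked with a short sketch. The function and variable names are illustrative only, and decimal units (1 MB = 10^6 bytes) are assumed for the video and audio figures, which the slide does not state.

```python
# Raw (uncompressed) data rates behind the slide's figures.
# Assumption: 1 MB = 10**6 bytes for the rates; all names are illustrative.

def image_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed image in bytes."""
    return width * height * bytes_per_pixel

bw_image = image_bytes(512, 512, 1)      # 8 bits/pixel grayscale
color_image = image_bytes(512, 512, 3)   # 3 bytes/pixel color

typical_image = 1_000_000                # the slide rounds a color image up to 1 MB
video_rate = typical_image * 30          # bytes/sec at 30 frames/sec
video_hour = video_rate * 3600           # bytes in one hour of video

audio_rate = 44_000 * 3 * 2              # samples/sec x bytes/sample x channels

print(bw_image, color_image)             # bytes per still image
print(video_rate / 1e6, "MB/sec")
print(video_hour / 1e9, "GB/hour")
print(audio_rate / 1e6, "MB/sec")
```

The grayscale image is exactly 1/4 of 2^20 bytes, while the color image is 3/4 of that, which is why the slide rounds up to 1 MB as a working figure.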


Still Images

• Lossless vs. lossy compression
  – a simple bounded Huffman code gives 3:1 lossless compression

• JPEG is the standard
  – offers (say) 25:1 lossy compression
  – tradeoffs: image quality, file size, computational complexity


JPEG

Image is partitioned into 8x8 pixel blocks
  – Transform into the frequency domain by DCT (the high-frequency components are at the high index values of the resulting 8x8 matrix and are often 0)
  – Quantize (the lossy step): map values to a few numbers
  – Zigzag access, to reach the low-frequency (nonzero) values first
  – Huffman (run-length) encode the values
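The quantize, zigzag, and run-length steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the JPEG reference procedure: a single uniform step size `q` stands in for JPEG's 8x8 quantization table, the final Huffman coding of the pairs is omitted, and the helper names and the `"EOB"` marker are hypothetical.

```python
# Sketch of quantize -> zigzag -> run-length on one 8x8 coefficient block.
# Assumptions: one uniform step size q (real JPEG uses an 8x8 table),
# and the Huffman coding of the resulting pairs is left out.

N = 8

def quantize(block, q):
    """Map each coefficient to a small integer (this is the lossy step)."""
    return [[int(round(v / q)) for v in row] for row in block]

def zigzag_order(n=N):
    """Return (row, col) pairs ordered from low to high frequency."""
    # Cells on one anti-diagonal share row + col; the traversal direction
    # alternates from one diagonal to the next, giving the zigzag path.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def run_length(values):
    """Encode as (zero_run, value) pairs, then an end-of-block marker."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # all trailing zeros collapse into one marker
    return pairs

# Made-up coefficients: a few large low-frequency values, the rest small.
coeffs = [[2] * N for _ in range(N)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[1][1] = 124, -33, 17, 6

q_block = quantize(coeffs, 16)
scan = [q_block[r][c] for r, c in zigzag_order()]
codes = run_length(scan)
print(codes)
```

Because quantization drives the small high-frequency values to zero and the zigzag scan visits them last, the block collapses to a handful of pairs followed by the end-of-block marker.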


Discrete cosine transform (DCT)

Map X (spatial domain) to Y (frequency domain)
  – more compact representation; use 8x8 pixel blocks
  – $y[u,v] = \frac{4\,C(u)\,C(v)}{n^2} \sum_{j=0}^{n-1} \sum_{k=0}^{n-1} x[j,k] \cos\frac{(2j+1)u\pi}{2n} \cos\frac{(2k+1)v\pi}{2n}$
  – $C(w) = 1$ for $w = 1, 2, \ldots$ and $C(w) = 1/\sqrt{2}$ for $w = 0$
  – better than the discrete Fourier transform, but needs more computation
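A direct, unoptimized sketch of the transform, following the slide's formula and its $4/n^2$ normalization, may help make the index roles concrete. The function name `dct2` is an assumption, and practical encoders use fast factorizations rather than this four-deep loop.

```python
import math

def dct2(x):
    """2-D DCT of an n x n block, using the slide's normalization."""
    n = len(x)

    def c(w):
        # C(w) = 1/sqrt(2) for w = 0, otherwise 1
        return 1 / math.sqrt(2) if w == 0 else 1.0

    y = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for j in range(n):
                for k in range(n):
                    total += (x[j][k]
                              * math.cos((2 * j + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * k + 1) * v * math.pi / (2 * n)))
            y[u][v] = 4 * c(u) * c(v) / n ** 2 * total
    return y

# A flat block has no spatial variation, so only y[0][0] is nonzero.
flat = [[10] * 8 for _ in range(8)]
y = dct2(flat)
print(round(y[0][0], 6))
```

For a flat block every coefficient except `y[0][0]` is zero up to rounding error, which is the "more compact representation" the slide refers to.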


DCT basis functions


Block diagram of JPEG encoder


Video

Popular standards:
  – H.263 (video conferencing)
  – MPEG-1 (VHS quality)
  – MPEG-2 (broadcast quality)
  – MPEG-4 (uses VOPs to achieve high quality with good compression); more complex, an emerging standard


Typical compression

• Image size, quality and delay are factors

• Lossless 3:1

• JPEG 25:1

• MPEG-1 100:1; uses 352x288 CIF; 1-2 sec delay

• H.263 maybe 300:1; QCIF 176x144; 1/4 sec delay

• MPEG-2 4xCIF; uses lower Q; longer delay


MPEG frames

Three types of frames:
  – I: intra-picture, like lossy JPEG
  – P: predicted picture, motion prediction based on an earlier I; coded as a motion vector plus error terms, and because the error terms are small, quantizing gives good compression
  – B: bidirectional picture, motion prediction based on past and future I or P
  – the result is a GOP (group of pictures), e.g. IPBBPBBPBBPBBPBBI


I frames

• In MPEG typically use 1 I per 15 frames

• In H263 maybe 1 I per 300 frames

• An I frame takes (maybe) 4x as many bits to represent as a P or B frame.


MPEG block diagram


P frames

• Motion prediction is computationally intensive; it is based on macroblocks (2x2 groups of 8x8 blocks)

• Each macroblock has 16x16 luminance samples, one 8x8 Cr block, and one 8x8 Cb block; color is interleaved (called 4:2:0)


Motion estimation process


Forward motion compensation


Motion estimation

• Computation intensive

• Compute the SAD (sum of absolute differences) for all neighboring macroblock positions (stepping by 1 pixel).

• $\mathrm{SAD} = \sum |x_{i,j} - y_{i,k}|$ across all macroblocks

• Find location that minimizes SAD
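The search described above can be sketched as an exhaustive (full-search) block matcher. This is only an illustration: the function names, the +/-7 pixel search window, the 8x8 block size used in the example, and the synthetic test frames are all assumptions, and real encoders usually prune the search rather than test every offset.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size) for j in range(size)
    )

def best_motion_vector(cur, ref, cx, cy, size=16, search=7):
    """Full search: try every 1-pixel offset within +/- search pixels."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best  # (minimum SAD, dx, dy)

def f(x, y):
    """Synthetic, non-repeating pixel pattern used only for this example."""
    return (x * x + 3 * y * y + x * y) % 251

# The current frame is the reference shifted by 3 pixels in x and -2 in y.
ref = [[f(x, y) for x in range(32)] for y in range(32)]
cur = [[f(x - 3, y + 2) for x in range(32)] for y in range(32)]

result = best_motion_vector(cur, ref, 12, 12, size=8, search=7)
print(result)
```

With a 16x16 macroblock and a +/-7 window, each macroblock needs 225 candidate positions of 256 absolute differences each, which is where the slide's "computation intensive" comes from.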


Bidirectional motion compensation


Block diagram of MPEG encoder


Instructions /pixel

• JPEG: about 320 to compress; 280 to decode

• MPEG-1: about 1100 to compress; about 80 to decode

• Note the problem in motion estimation: MPEG-1 compression needs 352 x 288 x 1100 x 30 instr/sec = 3.3 GIPS.

• MPEG-2 uses bigger frames, better motion estimation and color; maybe 20 GIPS
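The MPEG-1 instruction-rate estimate can be reproduced directly from the slide's per-pixel figures. The variable names are illustrative; the decode figure is derived here from the slide's 80 instructions/pixel and is not stated on the slide itself.

```python
# Instruction-rate estimate for real-time MPEG-1 at CIF resolution.
width, height = 352, 288          # CIF frame
frames_per_sec = 30
encode_instr_per_pixel = 1100     # slide's estimate for compression
decode_instr_per_pixel = 80       # slide's estimate for decoding

pixels_per_sec = width * height * frames_per_sec
encode_ips = pixels_per_sec * encode_instr_per_pixel
decode_ips = pixels_per_sec * decode_instr_per_pixel

print(f"encode: {encode_ips / 1e9:.1f} GIPS")
print(f"decode: {decode_ips / 1e9:.2f} GIPS")
```

The encode/decode asymmetry comes almost entirely from motion estimation, which only the encoder performs.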


Video memory

• Even if we have enough arithmetic bandwidth, memory (cache) access is a problem. A single CIF frame occupies 200-400 kB and won't fit into an L2 cache smaller than (say) 1 or 2 MB. Worse is the behavior of the L1 D-cache: there are NO hits after a line is used.

• Solution: prefetch and stride prediction caches at L1.


Audio

Frequency range 20 Hz to 20 kHz, sampled at 2x

Sample rates
• 8k telephone
• 22k personal computers
• 32k digital audio and TV
• 44k CDs
• 48k HDTV, DAT


Audio

• Dynamic range: 0 to 120 dB, or about 20 bits

• Phasing: 2 or more channels to locate source

• Clipping: the ear tolerates about 200 ms of delay; beyond 300 ms it becomes annoying.

• Bit rates: 44k x 20 x 2 = 1.7 Mbps or (PCs) 22k x 16 = 352 kbps
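The dynamic-range and bit-rate figures above follow from simple arithmetic, sketched here. The sketch assumes the usual amplitude convention, dB = 20 log10(ratio), which the slide does not spell out; variable names are illustrative.

```python
import math

# Assumption: amplitude decibels, so each extra bit adds 20*log10(2) dB.
db_per_bit = 20 * math.log10(2)
bits_for_120_db = 120 / db_per_bit

cd_bit_rate = 44_000 * 20 * 2     # samples/sec x bits/sample x channels
pc_bit_rate = 22_000 * 16         # samples/sec x bits/sample, one channel

print(f"{db_per_bit:.2f} dB per bit, {bits_for_120_db:.1f} bits for 120 dB")
print(f"{cd_bit_rate / 1e6:.2f} Mbps, {pc_bit_rate / 1e3:.0f} kbps")
```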


Audio

• Can do better by compression: use ADPCM and send the difference between adjacent pulses. The G.722 standard samples at 16k with ADPCM to fit into 64 kbps.

• G.728 uses a linear predictive coder and achieves 16 kbps. It models voice as a linear filter, matches the sample against a codebook, and sends the index into the receiver's codebook.
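The idea of sending differences between adjacent samples can be sketched with plain DPCM. Note the substitution: this is not ADPCM and not G.722, since the step size here is fixed rather than adaptive, and the function names, sample values, and step size are made up for illustration.

```python
def dpcm_encode(samples, step):
    """Quantize the difference between each sample and the decoder's estimate."""
    codes, predicted = [], 0
    for s in samples:
        code = round((s - predicted) / step)  # small integer to transmit
        codes.append(code)
        predicted += code * step              # mirror what the decoder rebuilds
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the quantized differences."""
    out, value = [], 0
    for code in codes:
        value += code * step
        out.append(value)
    return out

samples = [0, 9, 21, 30, 33, 31, 23, 11, -2, -13]   # made-up waveform
step = 4
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
print(codes)
print(decoded)
```

Because the encoder predicts from its own reconstructed value rather than the true previous sample, the quantization error stays bounded by half a step instead of accumulating. An adaptive coder would additionally vary `step` with the signal.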


MPEG audio

• Compresses digital audio signals (PCM)

• Uses 32 sub-band filters (512 taps), with samples shifted in 32 at a time. The computation is $s(i) = \sum_{n} x(t-n)\,H_i(n)$, summed over 512 values of $n$ per sample, where $H_i(n)$ is the impulse response of the $i$th filter. Thus we have 512 multiply-accumulates per sample, about 22 Mops/sec.
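The filter sum and the operation count can be sketched as below. This is a direct FIR evaluation for one sub-band under assumed names; the 44k sample rate is taken from the CD figure elsewhere in these slides, and the two-tap filter in the example is a toy stand-in for a 512-tap response.

```python
def subband_sample(x, h_i, t):
    """s(i) at time t: one multiply-accumulate per filter tap.

    Requires t >= len(h_i) - 1 so that x[t - n] never indexes before
    the start of the signal.
    """
    return sum(x[t - n] * h_i[n] for n in range(len(h_i)))

# Toy example: a 2-tap averaging filter in place of a 512-tap response.
toy = subband_sample([1, 3, 5], [0.5, 0.5], 2)
print(toy)

taps = 512
sample_rate = 44_000                  # assumed CD-rate input
macs_per_sec = taps * sample_rate     # multiply-accumulates per second
print(f"{macs_per_sec / 1e6:.1f} M multiply-accumulates/sec")
```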


MPEG Audio

• Sample rates 32, 44, 48 kHz

• Mono, stereo or joint stereo

• Bit rates 64kbps to 128kbps, several layers and coder complexity to get better bit rates and/or better quality.

• Computationally requires probably 5 - 100 million multiply-adds per second (16b).


MPEG audio encoder