Penn ESE534 Spring2012 -- DeHon 1 ESE534: Computer Organization Day 21: April 4, 2012 Lossless Data Compression


ESE534: Computer Organization

Day 21: April 4, 2012
Lossless Data Compression


Today

• Basic Idea
• Example
• Systolic
• Tree
• Reclaiming space in tree
• CAMs


Lossless vs. Lossy

• Lossless – can reconstruct source perfectly (bit-identical)

• Examples
  – Huffman
  – Run Length Coding
  – Lempel-Ziv
  – Unix compress/gzip

• Lossy – capture important elements of original, but maybe not all bits

• Examples
  – MP3
  – JPEG
  – MPEG


uncompress(compress(x))=x
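The round-trip property can be checked directly. The following is a minimal sketch using Python's standard zlib module, a DEFLATE implementation from the Lempel-Ziv family; the choice of zlib and the helper name round_trips are assumptions for illustration, not something the slides prescribe.

```python
import zlib


def round_trips(x: bytes) -> bool:
    """Return True if decompressing the compressed bytes recovers x exactly."""
    return zlib.decompress(zlib.compress(x)) == x


# Repetitive input, so the compressed form is much shorter than the source.
sample = b"I AM SAM SAM I AM " * 20
compressed = zlib.compress(sample)

print(len(sample), len(compressed), round_trips(sample))
```

A lossy codec such as JPEG would fail this check by design: its output is close to the source but not bit-identical.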


Dictionary Idea

• Send id for long string rather than all the characters


Dictionary Example

• “the instruction controls the behavior of the ALU, data memory, and interconnect on each cycle.”
• Characters?
  – Bits at 8b/character
• Encoding with dictionary? Bits?

Code  Word
000   ALU
001   memory
010   interconnect
011   instruction
100   data
101   cycle
110   control
111   the
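One way to put numbers on these questions is sketched below. The slide does not say how text outside the dictionary is sent, so this sketch assumes one extra flag bit per token to tell a 3-bit code from an 8-bit literal character, and it lets a dictionary word match anywhere in the text (so "control" matches inside "controls"). The function names are hypothetical.

```python
SENTENCE = ("the instruction controls the behavior of the ALU, "
            "data memory, and interconnect on each cycle.")

# Code table from the slide.
CODES = {
    "ALU": "000", "memory": "001", "interconnect": "010", "instruction": "011",
    "data": "100", "cycle": "101", "control": "110", "the": "111",
}


def raw_bits(text, char_bits=8):
    """Bits needed to send every character uncompressed."""
    return len(text) * char_bits


def dictionary_bits(text, codes, char_bits=8):
    """Bits needed when dictionary words are replaced by their codes.

    Assumption: every token carries 1 flag bit (code vs. literal character).
    """
    words = sorted(codes, key=len, reverse=True)  # try longest words first
    bits = 0
    i = 0
    while i < len(text):
        for w in words:
            if text.startswith(w, i):
                bits += 1 + len(codes[w])   # flag + dictionary code
                i += len(w)
                break
        else:
            bits += 1 + char_bits           # flag + literal character
            i += 1
    return bits


print(raw_bits(SENTENCE), dictionary_bits(SENTENCE, CODES))
```

Under these assumptions the 94-character sentence costs 752 bits raw and 373 bits with the dictionary: 10 dictionary hits at 4 bits each plus 37 literal characters at 9 bits each.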


Dictionary Usability

• When can we do this?

• What might prevent us from pulling this trick?


Big Idea

• Use data already sent as the dictionary
  – Don’t need to pre-arrange dictionary
  – Adapt to common phrases/idioms in a particular document


Example

• First line of Dr. Seuss’s Green Eggs and Ham
  – I AM SAM SAM I AM

• Recurring substrings?


Example

• An encoding:
• I AM S<2,3> <5,4><0,4>
• Decode.
• Characters in original?
  – Bits based on 8b characters?
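The decode step can be sketched as follows. This assumes that in <x,y> the x is a 0-based position counted from the start of the text and y is a length, and that the space shown between <2,3> and <5,4> on the slide is typographic spacing rather than a literal character; with those assumptions the encoding reproduces the original line. The token list and the decode function are illustrative names.

```python
# Literal characters are strings; <x,y> references are (position, length) tuples.
TOKENS = ["I", " ", "A", "M", " ", "S", (2, 3), (5, 4), (0, 4)]


def decode(tokens):
    """Expand literals and <position,length> back-references into text."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            pos, length = tok
            for k in range(length):
                out.append(out[pos + k])   # copy from text already decoded
        else:
            out.append(tok)
    return "".join(out)


text = decode(TOKENS)
print(repr(text), len(text), len(text) * 8)
```

The decoded text has 17 characters, so 136 bits at 8b per character.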


Example

• An encoding:
• I AM S<2,3> <5,4><0,4>
• Encode:
  – Add 1 bit to identify character vs <x,y>
    • 9b characters
  – <x,y>: 1b says this + 4b for x, 4b for y
    • Also 9b
• How many bits?
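The bit count follows from counting tokens. The sketch below assumes the encoding consists of six literal characters followed by three <x,y> pairs (treating the space between <2,3> and <5,4> as typographic only); encoded_bits is a hypothetical helper name.

```python
# Literal characters are strings; <x,y> pairs are tuples.
TOKENS = ["I", " ", "A", "M", " ", "S", (2, 3), (5, 4), (0, 4)]


def encoded_bits(tokens, char_bits=8, x_bits=4, y_bits=4):
    """Total bits when every token carries a 1-bit character-vs-pair flag."""
    total = 0
    for tok in tokens:
        if isinstance(tok, tuple):
            total += 1 + x_bits + y_bits   # flag + position + length
        else:
            total += 1 + char_bits         # flag + 8b character
    return total


original_bits = 17 * 8   # 17 characters at 8b each
print(encoded_bits(TOKENS), original_bits)
```

With 9 tokens at 9 bits each, the encoding takes 81 bits against 136 bits for the raw characters.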


Technical Issue

• How many bits to assign to x and y?
• Issues?
• What if the document is huge?
  – What problems might that pose?


Windows

• Pragmatic solution
  – Only keep the last D characters
  – D is window size
  – Need log₂(D) bits to specify a position
  – Parameterize encoder based on D
  – Typically larger D → greater compression
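The position width can be computed as below. The log₂(D) figure is exact when D is a power of two; this sketch rounds up for other sizes, which is an assumption about how a non-power-of-two window would be handled. The function name is hypothetical.

```python
def position_bits(D: int) -> int:
    """Bits needed to name one of D window positions (0 .. D-1)."""
    if D < 1:
        raise ValueError("window size must be at least 1")
    return (D - 1).bit_length()


for D in (8, 512, 4096):
    print(D, position_bits(D))
```

Doubling the window costs one more bit in every position field, which is the price paid for the larger set of candidate matches.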


Encoding

• Greedy simplification
  – Encode by successively selecting the longest match between the head of the remaining string to send and the current window


Algorithm Concept

• While data to send
  – Find largest match in window
  – If length=1
    • Send character
  – Else
    • Send <x,y> = <match-pos,length>
  – Shift data encoded into window
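The loop above can be sketched in software as follows. Several details are assumptions, since the slide leaves them open: x is an index into the current window (0 is the oldest character kept), a match must lie entirely inside the window, a failed search (length 0) is treated like length 1 and sent as a literal character, and ties go to the earliest window position. The names encode and decode are illustrative.

```python
def encode(data: str, D: int = 8):
    """Greedy window encoder: literals are strings, matches are (pos, len)."""
    tokens = []
    i = 0
    while i < len(data):
        window = data[max(0, i - D):i]        # last D characters already sent
        best_pos, best_len = 0, 0
        for pos in range(len(window)):        # find largest match in window
            length = 0
            while (pos + length < len(window)
                   and i + length < len(data)
                   and window[pos + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len <= 1:
            tokens.append(data[i])            # send character
            i += 1
        else:
            tokens.append((best_pos, best_len))   # send <match-pos,length>
            i += best_len                     # shift encoded data into window
    return tokens


def decode(tokens, D: int = 8) -> str:
    """Inverse of encode: rebuild the text using the same window rule."""
    out = ""
    for tok in tokens:
        if isinstance(tok, tuple):
            pos, length = tok
            window = out[max(0, len(out) - D):]
            out += window[pos:pos + length]
        else:
            out += tok
    return out


tokens = encode("I AM SAM SAM I AM", D=8)
print(tokens)
print(decode(tokens, D=8))
```

The inner search compares the head of the remaining string against every window position, which is the part a hardware implementation would want to do in parallel.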


Run Algorithm

• Use D=8
• I AM SAM SAM I AM

• How many bits?


What’s challenging to implement?

• While data to send
  – Find largest match in window
  – If length=1
    • Send character
  – Else
    • Send <x,y> = <match-pos,length>
  – Shift data encoded into window


Systolic Algorithm

• Give each character in window a PE
• Broadcast characters to PE
• While (some PE has match-out=true)
    match-out = new-search*match OR cont-search*match*match-in
    len = len+1
    Broadcast next character
• Send <pos-last-match-out,len>
• Shift last set of characters into window
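A cycle-by-cycle software model of this search is sketched below. It assumes that match-in for a PE is the match-out of its left neighbor from the previous cycle, that the first broadcast of a search is the new-search cycle, and that the reported position is the lowest-numbered PE still asserting match-out, which marks the last character of the match. The function name is hypothetical.

```python
def systolic_match(window: str, lookahead: str):
    """Return (position of last matching PE, match length) for one search."""
    n = len(window)
    match_out = [False] * n          # one flag per PE
    length = 0
    for ch in lookahead:             # broadcast one character per cycle
        match = [w == ch for w in window]
        if length == 0:
            # new-search cycle: match-out = match
            nxt = match
        else:
            # cont-search cycle: match-out = match AND left neighbor's match-out
            nxt = [match[i] and i > 0 and match_out[i - 1] for i in range(n)]
        if not any(nxt):
            break                    # no PE extends the match; keep old flags
        match_out = nxt
        length += 1
    if length == 0:
        return None, 0
    return match_out.index(True), length


print(systolic_match("I AM S", "AM SAM I AM"))
```

Every PE does one character comparison and one AND per cycle, so a match of length L takes about L cycles regardless of the window size.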


Systolic Hardware

• While (some PE has match-out=true)
    match-out = new-search*match OR cont-search*match*match-in
    len = len+1
    Broadcast next character


Simulate Systolic

• Each student is a PE in the window
  – Identify left and right neighbors
  – Raise right hand for match-out
  – Note left neighbor’s hand at end of previous cycle to know match-in
• I AM SAM SAM I AM


Contemplate Solution

• How complicated is each PE?
• How fast is each PE?
• How fast does encoding operate?
• How much area do we need?
• How much energy?


Contemplate Solution

• What’s inefficient or unsatisfying about this solution?


Tree Based


Idea

• Avoid need to track multiple substrings
• Compress storage
BY
• Storing common prefixes together in a tree


Tree Example

• THEN AND THERE, THEY STOOD…

[Figure: prefix tree for THEN, THERE, THEY (T → H → E; N, R, Y under E; E under R)]

Idea

• Avoid need to track multiple substrings
• Compress storage
BY
• Storing common prefixes together in a tree


Tree Algorithm

• Root for each character
• Follow tree according to input until no more match
• Send <name of last tree node>
• Extend tree with new character
• Start over with this character
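The steps above can be sketched as follows. This is an LZW-style reading of the slide, with several assumptions: roots are named by their 8-bit character code (0 to 255), new nodes are named 256, 257, and so on in creation order, the tree is stored as a map from (parent name, character) to child name, and the tree grows without bound. The function name tree_encode is hypothetical.

```python
def tree_encode(data: str):
    """Encode data as a list of tree-node names, growing the tree as it goes."""
    children = {}          # (parent node name, character) -> child node name
    next_name = 256        # names 0..255 are the per-character roots
    names = []
    i = 0
    while i < len(data):
        node = ord(data[i])                    # start at the root for this character
        i += 1
        # Follow the tree according to the input until no more match.
        while i < len(data) and (node, data[i]) in children:
            node = children[(node, data[i])]
            i += 1
        names.append(node)                     # send name of last tree node
        if i < len(data):
            children[(node, data[i])] = next_name   # extend tree with new character
            next_name += 1
        # The next pass starts over with data[i], the character that failed.
    return names


names = tree_encode("I AM SAM SAM I AM")
print(len(names), names)
```

On the lecture's example string this sends 11 node names for 17 characters. Each input character causes one child lookup, so the work per character is a single (parent, character) search rather than a scan over the whole window.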


Run Algorithm

• I AM SAM SAM I AM


Encoding

• Encoding bits assuming D=512
  – So, 9b to encode tree node


Finite Window

• How can we maintain a finite window in this case?


Finite Window

• Clear and start over
• LRU on tree nodes
• Maintain two areas
  – Encode from one (perhaps both)
  – Add to new
  – When new fills,
    • New → old, clear old
• Pick old leaf node to replace


Complexity

• How much work per character to encode?


Tree Node Representation

[Figure: prefix tree for THEN, THERE, THEY (T → H → E; N, R, Y under E; E under R)]

Tree Node Representation

• Encode in memory

[Figure: prefix tree for THEN, THERE, THEY (T → H → E; N, R, Y under E; E under R)]

Content Addressable Memory

• What’s a CAM?


PLA


CAM

• Memory with Programmable Addresses
  – Capacity < 2^(match bits)

• PLA with both planes writeable
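A behavioral sketch of a CAM in software is given below; it models what the memory does, not the circuit. The class name, its methods, and the use of (parent node, character) pairs as keys for tree-child lookup are assumptions made for illustration; the slides do not state the key format.

```python
class CAM:
    """Content-addressable memory model: look up a value by its stored key."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []            # (key, value) pairs; hardware compares all at once

    def write(self, key, value):
        """Program an entry, overwriting any existing entry with the same key."""
        for idx, (k, _) in enumerate(self.entries):
            if k == key:
                self.entries[idx] = (key, value)
                return
        if len(self.entries) >= self.capacity:
            raise MemoryError("CAM is full")
        self.entries.append((key, value))

    def lookup(self, key):
        """Return the value stored under key, or None if no entry matches."""
        for k, v in self.entries:    # sequential here, parallel in hardware
            if k == key:
                return v
        return None


# Tree for THEN, THERE, THEY: node names are arbitrary small integers.
cam = CAM(capacity=8)
cam.write((0, "H"), 1)   # T  -> TH   (node 0 is the root for T)
cam.write((1, "E"), 2)   # TH -> THE
cam.write((2, "N"), 3)   # THE -> THEN
cam.write((2, "R"), 4)   # THE -> THER
cam.write((2, "Y"), 5)   # THE -> THEY
cam.write((4, "E"), 6)   # THER -> THERE

print(cam.lookup((2, "R")), cam.lookup((2, "X")))
```

The capacity is far smaller than the number of possible keys, which is the point of the "Capacity < 2^(match bits)" bullet: only the (parent, character) pairs that actually occur need an entry.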


PLA and Memory


Contemplate

• What value do Bunton and Borriello get from using a CAM?


Admin

• Reading for Monday on Web
• FM1 for Monday
  – Implement tree version on processor and estimate energy


Big Ideas
[MSB Ideas]

• Can often compress data without loss of information
• Exploit structure in data to encode
• Build dictionary based on data already sent
• Code repeating substrings compactly in terms of data already seen in recent past