File Compression Techniques Alex Robertson. Outline History Lossless vs Lossy Basics Huffman Coding...

File Compression Techniques

Alex Robertson

Outline

History, Lossless vs Lossy, Basics, Huffman Coding, Getting Advanced, Lossy Explained, Limitations, Future

History, where this all started

The Problem!

1940s: Shannon-Fano coding

Properties

Different codes have different numbers of bits.

Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits.

Though the codes are of different bit lengths, they can be uniquely decoded.

Lossless vs Lossy

Lossless (e.g. DEFLATE): for data where every little detail is important.

Lossy (e.g. JPEG, MP3): for data where some detail can be lost without being noticed.

Understanding the Basics

Properties

Different codes have different numbers of bits.

Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits.

Though the codes are of different bit lengths, they can be uniquely decoded.

Encode “SATA”

S = 10, A = 0, T = 11
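
As a minimal sketch of the encoding step, the code table from this slide can be applied symbol by symbol. The names `CODES` and `encode` are illustrative and not from the talk.

```python
# Code table from the slide: the most frequent symbol (A) gets the shortest code.
CODES = {"S": "10", "A": "0", "T": "11"}


def encode(message, codes):
    """Concatenate the variable-length code of each symbol."""
    return "".join(codes[symbol] for symbol in message)


print(encode("SATA", CODES))  # 100110
```

Four symbols take 6 bits here, compared with 8 bits for a fixed 2-bit code per symbol.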

Prefix Rule

S = 01, A = 0, T = 00

With these codes SATA, SAAAA, and STT all encode to the same bits:

010000

No code can be the prefix of another code.

If 0 is a code, no other code can start with 0 (0* can’t be a code).
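
The ambiguity can be checked mechanically. The sketch below uses hypothetical helper names (`is_prefix_free`, `encode`) to test a code table against the prefix rule and to show the three messages colliding.

```python
from itertools import permutations


def is_prefix_free(codes):
    """Return True if no code word is a prefix of another code word."""
    words = list(codes.values())
    return not any(b.startswith(a) for a, b in permutations(words, 2))


def encode(message, codes):
    return "".join(codes[symbol] for symbol in message)


BAD = {"S": "01", "A": "0", "T": "00"}   # A's code is a prefix of S's and T's
GOOD = {"S": "10", "A": "0", "T": "11"}  # table from the earlier slide

print(is_prefix_free(BAD), is_prefix_free(GOOD))  # False True
for message in ("SATA", "SAAAA", "STT"):
    print(message, encode(message, BAD))  # every line ends in 010000
```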

Make a Tree

Create a Tree

A = 010, B = 11, C = 00, D = 10, R = 011

Decode

01011011010000101001011011010

A = 010, B = 11, C = 00, D = 10, R = 011

Violates the property:

Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits.
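
A rough sketch of decoding with this table: because the codes are prefix-free, the decoder can read bits left to right and emit a symbol as soon as the bits collected so far match a code. The function name `decode` is an assumption for illustration.

```python
CODES = {"A": "010", "B": "11", "C": "00", "D": "10", "R": "011"}


def decode(bits, codes):
    """Decode a bit string produced with a prefix-free code table."""
    inverse = {code: symbol for symbol, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete code word has been read
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(decoded)


print(decode("01011011010000101001011011010", CODES))  # ABRACADABRA
```

The message decodes correctly, but A, the most frequent letter, has one of the longest codes, which is the violation the slide points out.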

Huffman Coding

Create a Tree

Encode “ABRACADABRA”

Determine frequencies, then:

1. The two least frequent “nodes” are located.

2. A parent node is created from those two nodes and is given a weight equal to the sum of the two child node frequencies.

3. One of the child nodes is given the 0 bit and the other the 1 bit.

4. Repeat the above steps until only one node is left.
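
The steps above can be sketched with a priority queue. This is a minimal illustration, not the presenter's implementation; `huffman_codes` and the tie-breaking counter are assumptions. Ties between equal weights may be broken differently by other implementations, so individual codes can differ while the total length stays the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(message):
    """Build a Huffman code table {symbol: bit string} for a message."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), symbol)
            for symbol, weight in Counter(message).items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Steps 1-2: take the two least frequent nodes and join them.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")  # step 3: 0 bit to one child
            walk(node[1], prefix + "1")  # and 1 bit to the other
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


codes = huffman_codes("ABRACADABRA")
encoded = "".join(codes[symbol] for symbol in "ABRACADABRA")
print(codes, len(encoded))  # total length is 23 bits
```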

Does it work?

Re-encode

01011011010000101001011011010 = 29 bits

It Works!

01011001110011110101100 = 23 bits

ABRACADABRA = 11 characters * 7 bits each = 77 bits

but…

It Works… With Issues.

Header includes the probability table

Not the best in certain cases

Example: ‘A’ 100 times

Huffman only reduces this to 100 bits (not counting the header), since every symbol needs at least one whole bit.

Moving Forward

Arithmetic Method

No specific code per symbol

The whole message becomes a single, continuously changing floating-point output number

Example

“BILL GATES”

Character   Probability   Range
SPACE       1/10          0.0 <= r < 0.1
A           1/10          0.1 <= r < 0.2
B           1/10          0.2 <= r < 0.3
E           1/10          0.3 <= r < 0.4
G           1/10          0.4 <= r < 0.5
I           1/10          0.5 <= r < 0.6
L           2/10          0.6 <= r < 0.8
S           1/10          0.8 <= r < 0.9
T           1/10          0.9 <= r < 1.0
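
A minimal sketch of how the table is used: the encoder starts with the interval [0, 1) and narrows it to each symbol's range in turn. The function names are illustrative, and `Fraction` is used instead of floats (an assumption made here to avoid rounding error; practical coders use scaled integer arithmetic).

```python
from fractions import Fraction

# (symbol, count) pairs in the same order as the table above.
COUNTS = [(" ", 1), ("A", 1), ("B", 1), ("E", 1), ("G", 1),
          ("I", 1), ("L", 2), ("S", 1), ("T", 1)]


def build_ranges(counts):
    """Assign each symbol a half-open slice [low, high) of [0, 1)."""
    total = sum(n for _, n in counts)
    ranges, low = {}, Fraction(0)
    for symbol, n in counts:
        high = low + Fraction(n, total)
        ranges[symbol] = (low, high)
        low = high
    return ranges


def encode(message, ranges):
    """Narrow [low, high) once per symbol; return the final low bound."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = ranges[symbol]
        high = low + width * sym_high
        low = low + width * sym_low
    return low


def decode(value, length, ranges):
    """Recover `length` symbols by repeatedly locating value's range."""
    decoded = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in ranges.items():
            if sym_low <= value < sym_high:
                decoded.append(symbol)
                value = (value - sym_low) / (sym_high - sym_low)
                break
    return "".join(decoded)


RANGES = build_ranges(COUNTS)
number = encode("BILL GATES", RANGES)
print(float(number))               # 0.2572167752
print(decode(number, 10, RANGES))  # BILL GATES
```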

Dictionary Based

Implemented in the late 70s

Uses previously seen words as a dictionary.

the quick brown fox jumped over the lazy dog

I bought a Mississippi Banana in Mississippi.
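
The slide does not name a specific algorithm, so the following is only a simplified sketch in the spirit of LZ77 (an assumption): repeated text is replaced by a back-reference (distance back, length) to an earlier occurrence. The token format and parameter names are hypothetical.

```python
def lz_encode(text, window=255, min_match=3):
    """Return a list of literal characters and (offset, length) references."""
    tokens, i = [], 0
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(text)
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))  # copy from earlier text
            i += best_len
        else:
            tokens.append(text[i])               # emit a literal character
            i += 1
    return tokens


def lz_decode(tokens):
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):   # copy one character at a time so
                out.append(out[-offset])  # overlapping references work
        else:
            out.append(token)
    return "".join(out)


text = "I bought a Mississippi Banana in Mississippi."
tokens = lz_encode(text)
print(tokens)
print(lz_decode(tokens) == text)  # True
```

The second “Mississippi” collapses into a single reference to the first one, which is the effect the example sentence is meant to show.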

Lossy Compression

Lossy Formula Lossless Formula

My Sound!

Mathematical Limitations

Claude E. Shannon

http://www.data-compression.com/theory.html
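
The slide gives only Shannon's name, so as an assumption about what the limit refers to, here is a sketch of Shannon entropy: the average number of bits per symbol that no lossless symbol-by-symbol code can beat for a given frequency distribution. The function name is illustrative.

```python
from collections import Counter
from math import log2


def entropy_bits_per_symbol(message):
    """Shannon entropy of the symbol frequencies in the message."""
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in Counter(message).values())


message = "ABRACADABRA"
lower_bound = len(message) * entropy_bits_per_symbol(message)
print(round(lower_bound, 2))               # a little over 22 bits
print(entropy_bits_per_symbol("A" * 100))  # zero bits per symbol
```

The 23-bit Huffman result for ABRACADABRA sits just above this bound, while for 100 copies of ‘A’ the bound is zero and Huffman still spends 100 bits.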

Example

DEFLATE

http://en.wikipedia.org/wiki/DEFLATE
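
DEFLATE combines dictionary-based matching with Huffman coding. As a small illustration, Python's standard `zlib` module, which produces DEFLATE streams, can compress and restore a repetitive input; the sample data is made up for this sketch.

```python
import zlib

data = b"ABRACADABRA" * 100  # 1100 bytes of highly repetitive input

compressed = zlib.compress(data)        # DEFLATE stream in a zlib wrapper
restored = zlib.decompress(compressed)  # lossless: the input comes back exactly

print(len(data), len(compressed))
print(restored == data)  # True
```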

Future

Hardware is getting better

Theories are the same

Thank You

Questions