Data Compression Advanced Data Structures and...

24
Data Compression Associate Professor Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Computer Science Department 2015 – 2016 Advanced Data Structures and Algorithms DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 1

Transcript of Data Compression Advanced Data Structures and...

Page 1: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Data Compression

Associate Professor Dr. Raed Ibraheem Hamed

University of Human Development, College of Science and

Technology Computer Science Department

2015 – 2016

Advanced Data Structures and Algorithms

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 1

Page 2: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

2

Introduction

What is Compression? Compression is the process of

encoding data more efficiently to achieve a reduction in file

size

Advantages of Compression

1) When compressed, the quantity of bits used to store the information

is reduced.

2) Files that are smaller in size will result in shorter transmission times

when they are transferred on the Internet.

3) Compressed files also take up less storage space.

4) File compression can zip up several small files into a single file for

more convenient email transmission.

Page 3: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Realize the need for data compression.

Differentiate between lossless and lossy compression.

Understand three lossless compression encoding

techniques: run-length, Huffman, and Lempel Ziv.

After reading this topic, the reader should

be able to:

OBJECTIVES

Understand two lossy compression methods: JPEG and

MPEG.

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 3

Page 4: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003

Figure 15-1

Data compression methods

Data compression means sending or storing a smaller number of bits.

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 4

Page 5: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 5

Lossless Compression

Methods

Page 6: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003

Lossless compression

In lossless data compression, the integrity of the data ispreserved.

The original data and the data after compression anddecompression are exactly the same because thecompression and decompression algorithms are exactly theinverse of each other.

Example:◦ Run-length encoding

◦ Huffman encoding

◦ Lempel Ziv (L Z) encoding (dictionary-based encoding)

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 6

Page 7: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003

Run-length encoding

• It does not need knowledge of the frequency of occurrence ofsymbols and can be very efficient if data are represented as 0s and1s.

• For example:

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 7DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 6

Page 8: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003

Run-length encoding for two symbols

We can encode one symbol which is more frequent than the other.

This example only encode 0’s between 1’s.

There is no 0 between 1’s

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 8DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 7

14

120

4

Page 9: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 9

Code the run length of 0’s using k bits. Transmit the code.

Do not transmit runs of 1’s.

Two consecutive 1’s are implicitly separately by a zero-length

run of zero.

Example: suppose we use k = 4 bits to encode the run length

(maximum run length of 15) for following bit patterns.

Binary Run-length encoding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 9DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 7

Page 10: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 10DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 10

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 8

Data files frequently contain the same character repeated many

times in a row. For example, text files use multiple spaces to

separate sentences, indent paragraphs, format tables & charts, etc.

Example: run-length encoding for a data sequence having

frequent runs of zeros

Note: many single zeros in the data can make the encoded file larger than the original.

Page 11: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 11

The following algorithm generates Huffman code:

Find (or assume) the probability of each values occurrence.

Initialization: Put all nodes in an list, keep it sorted at all times (e.g., ABCDE).

Take the two symbols with the lowest probability, and placethem as leaves on a binary tree.

Form a new row in the table replacing the these two symbolswith a new symbol. This new symbol forms a branch node inthe tree. Draw it in the tree with branches to its leaf(component) symbols

Assign the new symbol a probability equal to the sum of thecomponent symbol’s probability.

Huffman coding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 11DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 11DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 11

Page 12: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 12

Repeat the above until there is only one symbol left. This is theroot of the tree.

Nominally assign 1’s to the right hand branches and 0’s to theleft hand branches at each node.

Read the code for each symbol from the root of the tree.

Huffman coding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 12DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 12DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 12DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 11

David Huffman

Page 13: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003

Table 15.1 Frequency of characters

Character A B C D E------------------------------------------------------Frequency 17 12 12 27 32

Huffman coding

o In Huffman coding, you assign shorter codes to symbols that occurmore frequently and longer codes to those that occur lessfrequently.

o The process of building the tree begins by counting theoccurrences of each symbol in the text to be encoded.

o For example:

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 13DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 13DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 9

Page 14: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-4

Huffman coding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 14

Page 15: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-5

Final tree and code

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 15

Page 16: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-6

Huffman encoding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 16

Page 17: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-7

Huffman decoding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 17

Page 18: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003

Huffman coding

The beauty of Huffman coding is that no code in theprefix of another code.

There is no ambiguity in encoding.

The receiver can decode the received data withoutambiguity.

Huffman code is called instantaneous (immediate)code because the decoder can unambiguously decodethe bits instantaneously with the minimum number ofbits.

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 18DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 18DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 18DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 15

Page 19: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003

Lempel Ziv encoding

LZ encoding is an example of a category ofalgorithms called dictionary-based encoding.

The idea is to create a dictionary (table) of stringsused during the communication session.

The compression algorithm extracts the smallestsubstring that cannot be found in the dictionaryfrom the remaining non-compressed string.

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 19DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 19DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 19DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 16

Abraham Lempel Jacob Ziv

Page 20: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-8:Part I

Example of Lempel Ziv encoding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 20

Page 21: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-8:Part 2

Example of Lempel Ziv encoding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 21

Page 22: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-9: Part I

Example of Lempel Ziv decoding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 22

Page 23: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

Figure 15-9: Part II

Example of Lempel Ziv decoding

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 23

Page 24: Data Compression Advanced Data Structures and Algorithmsraedh.weebly.com/uploads/1/3/4/6/13465115/lectures_of_adsa__15.pdf2015 –2016 Advanced Data Structures and Algorithms DEPARTMENT

©Brooks/Cole, 2003DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 24DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 24DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 24DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 24DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 24

DEPARTMENT OF COMPUTER SCIENCE- ADSA - UHD 22