Greedy Algorithms (Huffman Coding) Slide 1. Huffman Coding A technique to compress data effectively...

Greedy Algorithms(Huffman Coding)

Huffman Coding

• A technique to compress data effectively – Usually between 20%-90% compression

• Lossless compression– No information is lost– When decompress, you get the original file

Original file

Compressed file

Huffman coding

Huffman Coding: Applications

• Saving space– Store compressed files instead of original files

• Transmitting files or data– Send compressed data to save transmission time and power

• Encryption and decryption– Cannot read the compressed file without knowing the “key”

Original file

Compressed file

Huffman coding

Main Idea: Frequency-Based Encoding

• Assume in this file only 6 characters appear

– E, A, C, T, K, N

• The frequencies are: Character Frequency

E 10,000

A 4,000

N 100• Option I (No Compression)

– Each character = 1 Byte (8 bits)– Total file size = 14,700 * 8 = 117,600 bits

• Option 2 (Fixed size compression)– We have 6 characters, so we need

3 bits to encode them– Total file size = 14,700 * 3 = 44,100 bits

Character Fixed Encoding

Main Idea: Frequency-Based Encoding(Cont’d)

• Assume in this file only 6 characters appear

– E, A, C, T, K, N

• The frequencies are: Character Frequency

E 10,000

A 4,000

N 100• Option 3 (Huffman compression)– Variable-length compression– Assign shorter codes to more frequent characters and

longer codes to less frequent characters

– Total file size:

HuffmanEncoding

T 1110

K 11110

N 11111

(10,000 x 1) + (4,000 x 2) + (300 x 3) + (200 x 4) + (100 x 5) + (100 x 5) = 20,700 bits

Huffman Coding

• A variable-length coding for characters– More frequent characters shorter codes

– Less frequent characters longer codes

• It is not like ASCII coding where all characters have the same coding length (8 bits)

• Two main questions – How to assign codes (Encoding process)?

– How to decode (from the compressed file, generate the original file) (Decoding process)?

Decoding for fixed-length codes is much easier

Character Fixed-length Encoding

010001100110111000

010 001 100 110 111 000

Divide into 3’s

C A T K N E

Decode

Decoding for variable-length codes is not that easy…

Character Variable-length Encoding

… …

000001

It means what???

EEEC EAC AEC

Huffman encoding guarantees to avoid this uncertainty …Always have a single decoding

Huffman Algorithm

• Step 1: Get Frequencies– Scan the file to be compressed and count the occurrence of

each character

– Sort the characters based on their frequency

• Step 2: Build Tree & Assign Codes– Build a Huffman-code tree (binary tree)

– Traverse the tree to assign codes

• Step 3: Encode (Compress)– Scan the file again and replace each character by its code

• Step 4: Decode (Decompress)

– Huffman tree is the key to decompress the file

Step 1: Get Frequencies

Eerie eyes seen near lake.Input File:

Step 2: Build Huffman Tree & Assign Codes

• It is a binary tree in which each character is a leaf node– Initially each node is a separate root

• At each step– Select two roots with smallest frequency and connect them to

a new parent (Break ties arbitrary) [The greedy choice]

– The parent will get the sum of frequencies of the two child nodes

• Repeat until you have one root

Example

Each char. has a leaf node with its frequency

Find the smallest two frequencies…Replace them with their parent

Now we have a single root…This is the Huffman Tree

Lets Analyze Huffman Tree

• All characters are at the leaf nodes

• The number at the root = # of characters in the file

• High-frequency chars (E.g., “e”) are near the root

• Low-frequency chars are far from the root

Lets Assign Codes

• Traverse the tree

– Any left edge add label 0

– As right edge add label 1

• The code for each character is its root-to-leaf label sequence

Lets Assign Codes

Coding Table

Huffman Algorithm

each character

– Huffman tree is the key to decompess the file

Generate the encoded file

Step 3: Encode (Compress) The File

Eerie eyes seen near lake.

Input File:

Coding Table

000010 1100000110 ….

Notice that no code is prefix to any other code Ensures the decoding will be unique (Unlike Slide 8)

Step 4: Decode (Decompress)

• Must have the encoded file + the coding tree

• Scan the encoded file– For each 0 move left in the tree– For each 1 move right– Until reach a leaf node Emit that character and go back to the root

000010 1100000110 ….

Eerie …

Generate the original file

Huffman Algorithm

each character

– Huffman tree is the key to decompess the file

Greedy Algorithms (Huffman Coding) Slide 1. Huffman Coding A technique to compress data effectively...

Documents

Transcript of Greedy Algorithms (Huffman Coding) Slide 1. Huffman Coding A technique to compress data effectively...

03: Huffman Coding - vernier.frederic.free.frvernier.frederic.free.fr/Teaching/InfoTermS/InfoNumerique/Vassil... · 03: Huffman Coding CSCI 6990 Data Compression Vassil Roussev 15

Lecture 12: Compression & coding › ~ffh8x › d › soi19S › Lecture12.pdf · Huffman coding A Huffman code is a prefix code with a binary tree structure. It is generated on-the-fly

Image Compression Techniques -A revie Renu.pdf · 2019-02-28 · compression: D Area Coding 1. Run length encoding 2. Huffman encoding 3. LZW coding 4. Area coding sequences, 2 Lossy

IMAGE COMPRESSION USING DEEP AUTOENCODER - · PDF fileIMAGE COMPRESSION USING DEEP AUTOENCODER A PROJECT REPORT ... 2.1.1 Image compression using Huffman coding ... 2.1.3 Representing

Data Compression Engine Enhancement Using Huffman Coding Algorithm2

Huffman Coding - uoanbar.edu.iq

Huffman Coding

Data Compression (Huffman)

Data compression huffman coding algoritham

Image Compression (in addition to notes) - BGUiip/slides/compression.pdf · Image Compression (in addition to notes) ... transform image coding (DFT, DCT, ... • Huffman coding for

Lecture 17: Huffman Coding - Department of Computer …€¦ · · 2003-04-09Lecture 17: Huffman Coding CLRS- 16.3 Outline of this Lecture Codes and Compression. Huffman coding.

FAX Image Compression - Haifacs.haifa.ac.il/~nimrod/Compression/Image/I1fax2009.pdf · MODIFIED HUFFMAN METHOD (MHM)– Unidimensional coding method based on the coding of the lenght

Huffman Coding Technique

UNIT II TEXT COMPRESSION. a. Outline Compression techniques Run length coding Huffman coding Adaptive Huffman Coding Arithmetic coding Shannon-Fano coding.

IMAGE COMPRESSION - EVTEKusers.evtek.fi/~erkkir/ImageTechnology2013/1 ImagetechPDF/7... · Lossless image compression • Image can be decompressed back to original ... Huffman coding

Compression & Huffman Codes

Compression and Huffman Coding - University of Calgarypages.cpsc.ucalgary.ca/~marina/319/CPSC319_Compression.pdfHuffman Coding Non-determinism of the algorithm Implementations: Singly-linked

Implementation of Lossless Huffman Coding: Image ... · Huffman coding is an entropy encoding algorithm used for lossless data compression. It provides the least amount of information

ARITHMETIC CODING FOR DATA COIUPRESSION - … · The state of the art in data compression is arithmetic coding, not better- known Huffman method. ... for Huffman coding. For example,

th SEMESTER (CBCS) Analog and Digital Communication …vtu.ac.in/pdf/cbcs/5sem/medel678syll.pdf · Image Compression: Fundamentals ... Basic compression methods – Huffman coding,