Huffman Coding Technique


Prefix codes (prefix-free codes)

Prefix codes are codes in which the bit string representing one particular symbol is never a prefix of the bit string representing any other symbol. Every message encoded by a prefix code is uniquely decipherable, since no codeword is a prefix of any other codeword.

For example, take C = { a = 1, b = 110, c = 10, d = 111 }. Then "bad" is encoded into 1101111. But this code is not prefix-free (a = 1 is a prefix of the other three codewords), so 1101111 can be deciphered as bad, but it can also be deciphered as acda or acad.

But if the code is taken as C = { a = 0, b = 110, c = 10, d = 111 }, then "bad" is encoded into 1100111, and the deciphered data is:

1100111 = 110 | 0 | 111 = b a d,

which is unique.

Thus, prefix codes allow unambiguous decoding of the message, which is essential for a code to be useful.
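Decoding is then a simple left-to-right scan. A minimal MATLAB sketch using the second code above (the containers.Map table and variable names are illustrative):

    code = containers.Map({'0','10','110','111'}, {'a','c','b','d'});
    bits = '1100111';                  % the encoding of 'bad'
    buf = ''; out = '';
    for b = bits
        buf = [buf b];                 % extend the current candidate codeword
        if isKey(code, buf)            % prefix property: first match is the symbol
            out = [out code(buf)];
            buf = '';
        end
    end
    disp(out)                          % prints 'bad'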

Construction of Huffman codes is based on two observations regarding optimum prefix codes:

1. In an optimum code, symbols with higher probability should have shorter codewords than symbols with lower probability.
2. In an optimum code, the two symbols that occur least frequently will have codewords of the same length (otherwise, truncating the longer codeword to the length of the shorter one would still produce a decodable code, contradicting optimality).

Algorithm of Huffman coding (a code sketch follows this list):

1. Take the characters and their frequencies, and sort this list by increasing frequency.
2. All the characters are vertices (leaves) of the tree.
3. Remove the two vertices with the smallest frequencies and make them the children of a new vertex whose frequency is the sum of theirs; insert the new vertex into the sorted list.
4. Repeat step 3 until only one vertex (the root) remains.
5. Label each left edge 0 and each right edge 1; the codeword of a character is the sequence of labels on the path from the root to its leaf.
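A minimal MATLAB sketch of this algorithm (save as huffman_codes.m; the function name, interface, and tie-breaking are illustrative, and this is not the MATLAB program referenced later in these notes):

    function codes = huffman_codes(symbols, freqs)
    % Sketch of the list algorithm above: repeatedly merge the two
    % least-frequent nodes and prepend a 0 or 1 to every leaf under
    % the merged halves, so each codeword grows from leaf to root.
    n = numel(freqs);
    w = freqs(:)';                     % working node weights
    members = num2cell(1:n);           % leaf indices under each node
    codes = repmat({''}, 1, n);        % codeword built up per symbol
    while numel(w) > 1
        [~, order] = sort(w);          % find the two smallest nodes
        i = order(1); j = order(2);
        for k = members{i}, codes{k} = ['0' codes{k}]; end
        for k = members{j}, codes{k} = ['1' codes{k}]; end
        w(i) = w(i) + w(j);            % merge node j into node i
        members{i} = [members{i} members{j}];
        w(j) = []; members(j) = [];    % drop the merged-away node
    end
    for k = 1:n
        fprintf('%s = %s\n', symbols{k}, codes{k});
    end
    end

For instance, huffman_codes({'a','b','c','d','e'}, [20 40 20 10 10]) prints one optimal code for those frequencies; a different tie-breaking order may produce a different but equally optimal code.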


The resulting code: b = 0, a = 11, c = 101, d = 1000, e = 1001

    Flow chart

    Example:


Formalized description

Input: Alphabet $A = \{a_1, a_2, \ldots, a_n\}$, which is the symbol alphabet of size $n$, and set $W = \{w_1, w_2, \ldots, w_n\}$, which is the set of the (positive) symbol weights (usually proportional to probabilities), i.e. $w_i = \mathrm{weight}(a_i)$, $1 \le i \le n$.

Output: Code $C = \{c_1, c_2, \ldots, c_n\}$, which is the set of (binary) codewords, where $c_i$ is the codeword for $a_i$, $1 \le i \le n$.

Goal: To find a prefix code $C$ that minimizes the weighted path length $L(C) = \sum_{i=1}^{n} w_i \cdot \mathrm{length}(c_i)$.

Average code length: $\bar{L} = \sum_{i=1}^{n} w_i l_i$, where $l_i = \mathrm{length}(c_i)$ and the weights are probabilities.

Entropy: $H(A) = -\sum_{i=1}^{n} w_i \log_2 w_i$, where $w_i$ is the probability (or normalized weight) of the symbol $a_i$.
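As a concrete check, a small MATLAB sketch of these quantities, using the probabilities from the worked image example later in these notes and, as an assumption for illustration, the codeword lengths of one possible Huffman code for them:

    w   = [0.4 0.2 0.2 0.1 0.1];   % symbol probabilities (sum to 1)
    len = [1 2 3 4 4];             % length(c_i) of one possible Huffman code
    L   = sum(w .* len);           % weighted path length L(C) = 2.2 bits
    H   = -sum(w .* log2(w));      % entropy H(A), about 2.122 bits
    fprintf('L = %.3f bits, H = %.3f bits\n', L, H)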

    Examples:


Efficiency of Huffman Codes

Redundancy ($r$) is the difference between the average length of a code and its entropy: $r = \bar{L} - H(A)$.

In this example, if we use fixed-length codes, then we have to spend three bits per sample, which gives a code redundancy of 3 − 2.122 = 0.878 bits per sample.
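(For comparison: one possible Huffman code for this source, worked out in the example below, has an average length of 2.2 bits, so its redundancy is 2.2 − 2.122 = 0.078 bits per sample.)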


For a Huffman code, the redundancy is zero when the probabilities are negative powers of two.
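For example, with probabilities $\{1/2, 1/4, 1/8, 1/8\}$ Huffman coding assigns codeword lengths $\{1, 2, 3, 3\}$, so

$\bar{L} = \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{8}(3) + \tfrac{1}{8}(3) = 1.75$ bits $= H(A)$,

and the redundancy $r = \bar{L} - H(A) = 0$.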

    Minimum Variance Huffman Codes

When two or more symbols in a Huffman tree have the same probability, different merge orders produce different Huffman codes.

Two code trees with the same symbol probabilities:
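As a worked illustration (using the probabilities from the image example later in these notes, $\{0.4, 0.2, 0.2, 0.1, 0.1\}$): depending on how ties are broken, Huffman's procedure can produce codeword lengths $\{1, 2, 3, 4, 4\}$ or $\{2, 2, 2, 3, 3\}$. Both codes have the same average length,

$\bar{L} = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2$ bits,

but their codeword-length variances $\sum_i w_i (l_i - 2.2)^2$ differ: 1.36 for the first code and 0.16 for the second. The second, minimum variance code keeps the instantaneous bit rate closer to the average, which eases buffering over fixed-rate channels.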

Encoding of data: Given the characters and their frequencies, perform the algorithm and generate a code. Write the characters using the code.


Decoding of data: Given the Huffman tree, figure out what each character is (possible because of the prefix property).

    ENCODING:

Algorithm:

1. Find the grey-level probabilities for the image by finding the histogram.
2. Order the input probabilities (histogram magnitudes) from smallest to largest.
3. Combine the smallest two by addition.
4. GOTO step 2, until only two probabilities are left.
5. By working backward along the tree, generate code by alternating assignment of 0 and 1.

Coding Procedures for an N-symbol source:

1. Source reduction
   a. List all probabilities in descending order.
   b. Merge the two symbols with the smallest probabilities into a new compound symbol.
   c. Repeat the above two steps N-2 times.

2. Codeword assignment
   a. Start from the smallest (most reduced) source and work back to the original source.
   b. Each merging point corresponds to a node in the binary codeword tree.

Example: Consider an image with 3 bits/pixel, giving 8 possible gray levels. The image is 10 rows by 10 columns.

Step 1: Find the histogram for the image and convert it into probabilities by normalizing by the total number of pixels (a short MATLAB check follows this list):

Gray level 1 has 20 pixels
Gray level 2 has 40 pixels


Gray level 3 has 20 pixels
Gray level 4 has 10 pixels
Gray level 5 has 10 pixels
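The normalization in MATLAB, using the pixel counts listed above (variable names are illustrative):

    counts = [20 40 20 10 10];        % pixels at gray levels 1..5 (100 total)
    prob   = counts / sum(counts)     % -> 0.2  0.4  0.2  0.1  0.1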

Step 2: The probabilities are ordered:

a5 = 0.1
a4 = 0.1
a3 = 0.2
a1 = 0.2
a2 = 0.4

    Step 3: Combine the smallest two by addition.

Step 4: Repeat steps 2 and 3, reordering (if necessary) and adding the two smallest probabilities, until only two values remain.

Step 5: The actual code assignment is made. Start on the right-hand side of the tree and assign 0s and 1s.
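These steps can be reproduced with the huffman_codes sketch from the algorithm section (symbol names illustrative; tie-breaking may yield a different but equally optimal code):

    huffman_codes({'a1','a2','a3','a4','a5'}, [0.2 0.4 0.2 0.1 0.1]);
    % with this sketch's tie-breaking: a2 = 0 (1 bit) and every other
    % symbol gets 3 bits; the average length is 2.2 bits either way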


The gray level represented by 1 bit, a2, is the most likely to occur (40% of the time) and thus carries the least information in the information-theoretic sense.

    DECODING:

The process of decompression is simply a matter of translating the stream of prefix codes to individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node terminates the search for that particular byte value). For this, however, the Huffman tree must be reconstructed somehow; otherwise, the information needed to reconstruct the tree must be sent a priori.
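If the Communications Toolbox is available, the encode/decode round trip can be sketched with its built-in functions (huffmandict, huffmanenco, huffmandeco); the symbols and sample signal here are illustrative:

    symbols = 1:5;  prob = [0.2 0.4 0.2 0.1 0.1];
    dict = huffmandict(symbols, prob);   % code table (equivalently, the tree)
    sig  = [2 1 4 2 3 5 2];              % sample symbol stream
    enc  = huffmanenco(sig, dict);       % stream of prefix codewords
    dec  = huffmandeco(enc, dict);       % bit-by-bit traversal back to symbols
    isequal(sig, dec)                    % true: decoding is lossless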


    MATLAB CODE:
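A minimal sketch of the encoder for the image example above, assuming the Communications Toolbox (huffmandict/huffmanenco); img is a stand-in 10×10 image with gray levels 1-5, not the image from the original experiment:

    img    = randi(5, 10, 10);                        % stand-in 10x10 image
    levels = unique(img(:));                          % gray levels present
    counts = arrayfun(@(g) sum(img(:) == g), levels); % histogram of gray levels
    prob   = counts / numel(img);                     % normalize to probabilities
    dict   = huffmandict(levels, prob);               % build the Huffman code
    enc    = huffmanenco(img(:), dict);               % encode the pixel stream
    fprintf('%d pixels -> %d bits (%.2f bits/pixel)\n', ...
            numel(img), numel(enc), numel(enc)/numel(img));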



Advantages:

• The algorithm is easy to implement.
• It produces a lossless compression of images.
• It reduces the size of data by 20%-90% in general.
• Huffman codes express the most common source symbols using shorter strings of bits than are used for less common source symbols.
• The running time of Huffman's method is fairly efficient.
• Huffman coding is the most efficient method of its kind: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code.
• It is generally beneficial to minimize the variance of codeword length.

    Disadvantages:


• Although Huffman's original algorithm is optimal for symbol-by-symbol coding (i.e., a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbol-by-symbol restriction is dropped, or when the probability mass functions are unknown, not identically distributed, or not independent (e.g., "cat" is more common than "cta"). Other methods such as arithmetic coding and LZW coding often have better compression capability.

• If no characters occur more frequently than others, there is no advantage over a fixed-length code, such as ASCII.

Applications

• Huffman coding is a technique used to compress files for transmission.
• It uses statistical coding: more frequently used symbols have shorter code words.
• It works well for text and fax transmissions.
• It is an application that uses several data structures.
• Both the .mp3 and .jpg file formats use Huffman coding at one stage of the compression.
• An alternative method that achieves higher compression but is slower is patented by IBM, making Huffman codes attractive.