
Huffman Coding & Decoding

SUBMITTED BY

Ankit Saxena

    Ashutosh

    Akshay

    Jayakrishna

    UNDER THE ESTEEMED GUIDANCE OF

    Mrs. MITA PODDAR

    Cranes Software International Ltd.

    (Cranes Varsity - Training Division)

    # 5, Service Road, Domlur Layout,

    Airport Road, Bangalore-560071.

    Ph: 25352636/37, www.cranessoftware.com


    CERTIFICATE

This is to certify that the project on Huffman Coding & Decoding using the C language has been completed successfully by the following students, as per the RTOS course requirement:

Ankit Saxena

    Ashutosh

    Akshay

    Jayakrishna

    UNDER THE ESTEEMED GUIDANCE OF

    Mrs. MITA PODDAR


    ACKNOWLEDGEMENT

First of all, we would like to thank the ALMIGHTY for His blessings in completing the project successfully.

We would like to thank CRANES VARSITY for providing an opportunity to successfully carry out a project in the C module.

Our profound thanks and deep sense of gratitude go to our respected guide, Mrs. MITA PODDAR, for her valuable as well as timely guidance, and for being a constant source of inspiration throughout the course of the project.

We also thank the staff members of CRANES VARSITY for their valuable cooperation during the project work.

Last but not least, a word of thanks to our beloved parents for their love, blessings and encouragement.


    INDEX

1. Introduction

2. History

3. Huffman properties

4. Huffman code discussion

5. Implementation issues

6. Portability issues

7. Algorithm

8. Programming code

9. Advantages and disadvantages

10. Applications

11. Conclusion

12. References


    1. INTRODUCTION:

Huffman coding [24] is based on the frequency of occurrence of a data item (e.g., a pixel in images). The principle is to use a smaller number of bits to encode the data that occurs more frequently.

Codes are stored in a code book, which may be constructed for each image or for a set of images. In all cases the code book plus the encoded data must be transmitted to enable decoding.

Huffman coding is an entropy-encoding algorithm used for data compression that finds the optimal system of encoding strings based on the relative frequency of each character. It was developed by David A. Huffman as a Ph.D. student at MIT in 1952, and published in "A Method for the Construction of Minimum-Redundancy Codes".

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix-free code (that is, the bit string of no symbol is a prefix of the bit string of any other symbol) that expresses the most common characters in the shortest way possible. It has been proven that Huffman coding is the most effective compression method of this type: no other mapping of source symbols to strings of bits will produce a smaller output when the actual symbol frequencies agree with those used to create the code. (Huffman coding is such a widespread method for creating prefix-free codes that the term "Huffman code" is widely used as a synonym for "prefix-free code" even when such a code was not produced by Huffman's algorithm.)

For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding.

2. HISTORY:

In 1951, David Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and quickly proved this method the most efficient.

In doing so, the student outdid his professor, who had worked with information theory inventor Claude Shannon to develop a similar code. Huffman avoided the major flaw of Shannon-Fano coding by building the tree from the bottom up instead of from the top down.

    3. HUFFMAN PROPERTIES:

The frequencies used can be generic ones for the application domain, based on average experience, or they can be the actual frequencies found in the text being compressed. (The latter variation requires that a frequency table or some other hint as to the encoding be stored with the compressed text; implementations employ various tricks to store these tables efficiently.)

Huffman coding is optimal when the probability of each input symbol is a negative power of two. Prefix-free codes tend to have a slight inefficiency on small alphabets, where probabilities often fall between these optimal points. Expanding the alphabet size by coalescing multiple symbols into "words" before Huffman coding can help a bit. The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^-1 (one half), making the upper limit of inefficiency unbounded. To prevent this, run-length encoding can be used to preprocess the symbols.

Extreme cases of Huffman codes are connected with Fibonacci and Lucas numbers and the Wythoff array.

Arithmetic coding produces slight gains over Huffman coding, but in practice these gains have not been large enough to offset arithmetic coding's higher computational complexity and patent royalties. (As of November 2001, IBM owned patents on the core concepts of arithmetic coding in several jurisdictions.)

    4. HUFFMAN CODE DISCUSSION:

This is one of those pages documenting an effort that never seems to end. I thought it would end, but I keep coming up with things to try. This effort grew from a little curiosity. One day, my copy of "Numerical Recipes In C" fell open to the section on Huffman coding. The algorithm looked fairly simple, but the source code that followed looked pretty complicated and relied on the vector library used throughout the book.

The complexity of the source in the book caused me to search the web for clearer source. Unfortunately, all I found was source further obfuscated by either C++ or Java language structures. Instead of searching any further, I decided to write my own implementation using what I hope is easy-to-follow ANSI C.

I thought that I could put everything to rest after implementing the basic Huffman algorithm. I thought wrong. Mark Nelson of DataCompression.info had mentioned that there are canonical Huffman codes, which require less information to be stored in encoded files so that they may be decoded later. Now I have an easy-to-follow (I hope) ANSI C implementation of encoding and decoding using canonical Huffman codes.

As time passes, I've been tempted to make other enhancements to my implementation, and I've created different versions of the code. Depending on what you're looking for, one version might suit you better than another.

The rest of this page discusses the results of my effort.

    Algorithm Overview

Huffman coding is a statistical technique which attempts to reduce the number of bits required to represent a string of symbols. The algorithm accomplishes its goal by allowing symbols to vary in length. Shorter codes are assigned to the most frequently used symbols, and longer codes to the symbols which appear less frequently in the string (that's where the statistical part comes in). Arithmetic coding is another statistical coding technique.

    Building a Huffman Tree

The Huffman code for an alphabet (set of symbols) may be generated by constructing a binary tree with nodes containing the symbols to be encoded and their probabilities of occurrence. The tree may be constructed as follows:

Step 1. Create a parentless node for each symbol. Each node should include the symbol and its probability.

Step 2. Select the two parentless nodes with the lowest probabilities.

Step 3. Create a new node which is the parent of the two lowest-probability nodes.

Step 4. Assign the new node a probability equal to the sum of its children's probabilities.

Step 5. Repeat from Step 2 until there is only one parentless node left.

The code for each symbol may be obtained by tracing a path to the symbol from the root of the tree. A 1 is assigned for a branch in one direction and a 0 for a branch in the other direction. For example, a symbol which is reached by branching right twice, then left once, may be represented by the pattern '110'. The figure below depicts codes for the nodes of a sample tree, followed by a sketch of the construction loop.

              *
             / \
           (0) (1)
               / \
            (10) (11)
                 / \
            (110) (111)
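A minimal C sketch of Steps 1 through 5 may make the loop concrete. The node layout, the selection-by-scan, and all names here are illustrative, not the project's own code:

#include <stdlib.h>

struct hnode
{
    double prob;                 /* probability of this subtree     */
    int symbol;                  /* meaningful for leaves only      */
    struct hnode *left, *right;  /* children; NULL for a leaf       */
};

/* Repeatedly join the two parentless nodes with the lowest
   probabilities (Steps 2-4) until one root remains (Step 5).
   'pool' initially holds one leaf per symbol (Step 1). */
struct hnode *build_tree(struct hnode *pool[], int n)
{
    while (n > 1)
    {
        int lo1 = 0, lo2 = 1, i;
        struct hnode *parent;

        if (pool[lo2]->prob < pool[lo1]->prob)
        {
            lo1 = 1;
            lo2 = 0;
        }
        for (i = 2; i < n; i++)
        {
            if (pool[i]->prob < pool[lo1]->prob)
            {
                lo2 = lo1;
                lo1 = i;
            }
            else if (pool[i]->prob < pool[lo2]->prob)
            {
                lo2 = i;
            }
        }
        parent = malloc(sizeof *parent);
        parent->prob = pool[lo1]->prob + pool[lo2]->prob;  /* Step 4 */
        parent->symbol = -1;
        parent->left = pool[lo1];                          /* Step 3 */
        parent->right = pool[lo2];
        pool[lo1] = parent;       /* the parent replaces one child  */
        pool[lo2] = pool[--n];    /* the last node fills the other  */
    }
    return pool[0];
}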


Once a Huffman tree is built, canonical Huffman codes, which require less information to rebuild, may be generated by the following steps:

Step 1. Remember the lengths of the codes resulting from a Huffman tree generated per the above.

Step 2. Sort the symbols to be encoded by the lengths of their codes (use the symbol value to break ties).

Step 3. Initialize the current code to all zeros and assign code values to symbols from the longest to the shortest code as follows:

A. If the current code length is greater than the length of the code for the current symbol, right shift off the extra bits.

B. Assign the code to the current symbol.

C. Increment the code value.

D. Get the symbol with the next longest code.

E. Repeat from A until all symbols are assigned codes.
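As a sketch, the assignment in Step 3 can be written in a few lines of C. Codes are assumed to fit in an unsigned long purely for illustration (as discussed later, a 257-symbol alphabet can in principle need 256-bit codes); the struct and names are mine, not the project's:

struct csym
{
    int sym;             /* symbol value                       */
    int len;             /* code length from the Huffman tree  */
    unsigned long code;  /* canonical code, assigned below     */
};

/* 'syms' must already be sorted longest code first, with ties
   broken by symbol value (Step 2). */
void assign_canonical(struct csym syms[], int n)
{
    unsigned long code = 0;    /* Step 3: start from all zeros */
    int curlen = syms[0].len;  /* the longest code length      */
    int i;

    for (i = 0; i < n; i++)
    {
        if (curlen > syms[i].len)
        {
            /* Step A: right shift off the extra bits */
            code >>= (unsigned)(curlen - syms[i].len);
            curlen = syms[i].len;
        }
        syms[i].code = code;   /* Step B: assign the code      */
        code++;                /* Step C: increment            */
    }                          /* Steps D-E: loop to the next  */
}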

    Encoding Data

Once a Huffman code has been generated, data may be encoded simply by replacing each symbol with its code.

    Decoding Data

If you know the Huffman code for some encoded data, decoding may be accomplished by reading the encoded data one bit at a time. Once the bits read match the code for a symbol, write out the symbol and start collecting bits again. See Decoding Encoded Files below for details.

5. IMPLEMENTATION ISSUES:

What is a Symbol?

One of the first questions that needs to be resolved before you start is "What is a symbol?". For my implementation, a symbol is any 8-bit combination, as well as an End Of File (EOF) marker. This means that there are 257 possible symbols in any code.

    Handling End-of-File (EOF)

The EOF is of particular importance, because it is likely that an encoded file will not have a number of bits that is an integral multiple of 8. Most file systems require that files be stored in bytes, so it's likely that encoded files will have spare bits. If you don't know where the EOF is, the spare bits may be decoded as an extra symbol.

At the time I sat down to implement Huffman's algorithm, there were two ways that I could think of for handling the EOF. It could either be encoded as a symbol, or ignored. Later I learned about the "bijective" method for handling the EOF. For information on the "bijective" method, refer to SCOTT's "one to one" compression discussion.


Ignoring the EOF requires that a count of the number of symbols encoded be maintained, so that decoding can stop after all real symbols have been decoded and any spare bits can be ignored.

Encoding the EOF has the advantage of not requiring a count of the number of symbols encoded in a file. When I originally started out, I thought that a 257th symbol would allow for the possibility of a 17-bit code, and I didn't want to have to deal with 17-bit values in C. As it turns out, a 257th symbol creates the possibility of a 256-bit code, and I ended up writing a library that could handle 256-bit codes anyway.

Consequently, I have two different implementations: a 0.1 version that contains a count of the number of symbols to be decoded, and versions 0.2 and later that encode the EOF.
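The "encode the EOF" option amounts to counting over a 257-symbol alphabet, with index 256 reserved for the EOF marker. The sketch below is illustrative only; the array and function names are not the project's own:

#include <stdio.h>
#include <string.h>

#define EOF_SYMBOL  256   /* the 257th symbol */
#define NUM_SYMBOLS 257

/* Count symbol occurrences, giving the EOF marker a count of 1 so
   that it receives a code like any other symbol; both encoder and
   decoder assume there is exactly one EOF. */
void count_symbols(FILE *fp, unsigned int count[NUM_SYMBOLS])
{
    int c;

    memset(count, 0, NUM_SYMBOLS * sizeof count[0]);
    while ((c = getc(fp)) != EOF)
        count[c]++;
    count[EOF_SYMBOL] = 1;
}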

    Code Generation

The source code that I have provided generates a unique Huffman tree based on the number of occurrences of symbols within the file to be encoded. The result is a Huffman code that yields an optimal compression ratio for the file to be encoded. The algorithm to generate a Huffman tree and the extra steps required to build a canonical Huffman code are outlined above.

Using character counts to generate a tree means that a character may not occur more often than it can be counted. The counters used in my implementation are of type unsigned int, therefore a character may not occur more than UINT_MAX times. My implementation checks for this and issues an error. If larger counts are required, the program is easily modifiable.
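The overflow check described here is essentially a guarded increment. This sketch pairs with the counting sketch above and is likewise only illustrative, not the project's own code:

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Guarded increment: refuse to let a symbol counter wrap around. */
void tally(unsigned int count[], int sym)
{
    if (count[sym] == UINT_MAX)
    {
        fprintf(stderr, "symbol %d occurs more than UINT_MAX times\n", sym);
        exit(EXIT_FAILURE);
    }
    count[sym]++;
}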

    Code Length

In general, a Huffman code for an N-symbol alphabet may yield symbols with a maximum code length of N - 1. Following the rules outlined above, it can be shown that if, at every step, the two parentless nodes with the lowest probabilities are combined and only one of the combined nodes already has children, an N-symbol alphabet (for even N) will have two codes of length N - 1.

Example: Given a 6-symbol alphabet with the following symbol probabilities: A = 1, B = 2, C = 4, D = 8, E = 16, F = 32.

Step 1. Combine A and B into AB with a probability of 3.

Step 2. Combine AB and C into ABC with a probability of 7.

Step 3. Combine ABC and D into ABCD with a probability of 15.

Step 4. Combine ABCD and E into ABCDE with a probability of 31.

Step 5. Combine ABCDE and F into ABCDEF with a probability of 63.

The following tree results:


              ABCDEF
              /    \
          (0)F    ABCDE
                  /    \
             (10)E    ABCD
                      /   \
                (110)D    ABC
                          /  \
                   (1110)C    AB
                             /  \
                      (11110)A  B(11111)

In order to handle a 256-character alphabet, which may require code lengths of up to 255 bits, I created a library that performs standard bit operations on arrays of unsigned characters. Versions prior to 0.3 use a library designed specifically for 256-bit arrays. Later versions use a library designed for arbitrary-length arrays. Though I haven't used my libraries outside of this application, they are written in the same portable ANSI C as the rest of my Huffman code library.

    Writing Encoded Files

I chose to write my encoded files in two parts. The first part contains information used to reconstruct the Huffman code (a header), and the second part contains the encoded data.

    Header

In order to decode files, the decoding algorithm must know what code was used to encode the data. Being unable to come up with a clean way to store the tree itself, I chose to store information about the encoded symbols.

To reconstruct a traditional Huffman code, I chose to store a list of all the symbols and their counts. By using the symbol counts and the same tree generation algorithm that the encoding algorithm uses, a tree matching the encoding tree may be constructed.

To save some space, I only stored the non-zero symbol counts, and the end of the count data is indicated by an entry for character zero with a count of zero. The EOF count is not stored in my implementations that encode the EOF; both the encoder and decoder assume that there is only one EOF.

Canonical Huffman codes usually take less information to reconstruct than traditional Huffman codes. To reconstruct a canonical Huffman code, you only need to know the length of the code for each symbol and the rules used to generate the code. The header generated by my canonical Huffman algorithm consists of the code length for each symbol. If the EOF is not encoded, the total number of encoded symbols is also included in the header.
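As a sketch, the traditional-code header described above (non-zero symbol counts, terminated by a zero character with a zero count) could be written as follows. The function and its layout are illustrative; a real implementation would also need a convention for a genuinely non-zero count of character zero:

#include <stdio.h>

/* Write a (symbol, count) pair for every symbol that occurs, then a
   (0, 0) terminator entry. Native unsigned ints are written as-is,
   which is one reason such files are not portable across byte
   orders (see the portability section below). */
void write_header(FILE *fp, const unsigned int count[256])
{
    unsigned int zero = 0;
    unsigned char c;
    int i;

    for (i = 0; i < 256; i++)
    {
        if (count[i] != 0)
        {
            c = (unsigned char)i;
            fwrite(&c, sizeof c, 1, fp);
            fwrite(&count[i], sizeof count[i], 1, fp);
        }
    }
    c = 0;
    fwrite(&c, sizeof c, 1, fp);        /* character zero ...       */
    fwrite(&zero, sizeof zero, 1, fp);  /* ... with a count of zero */
}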

    Encoded Data


The encoding of the original data immediately follows the header. One natural by-product of a canonical Huffman code is a table containing symbols and their codes. This table allows for fast lookup of codes. If symbol codes are stored in tree form, the tree must be searched for each symbol to be encoded. Instead of searching the leaves of the Huffman tree each time a symbol is to be encoded, my traditional Huffman implementation builds a table of codes for each symbol. The table is built by performing a depth-first traversal of the Huffman tree and storing the codes for the leaves as they are reached.

With a table of codes, writing encoded data is simple: read a symbol to be encoded, and write the code for that symbol. Since symbols may not be an integral number of bytes in length, care needs to be taken when writing each symbol; bits need to be aggregated into bytes. In my 0.1 version of the code, all the aggregation is done in-line. My versions 0.2 and later use a library to handle writing any number of bits to a file.
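The depth-first table construction might look like the following sketch, which reuses the illustrative hnode type from the tree-building sketch earlier and stores each code as a string of '0'/'1' characters. It would be called as build_table(root, buf, 0) with a buffer of at least MAX_CODE_LEN + 1 chars:

#include <string.h>

#define MAX_CODE_LEN 255            /* 256 symbols: at most 255 bits */

char codes[256][MAX_CODE_LEN + 1];  /* one code string per symbol    */

/* Walk the tree depth first; 'path' records the branches taken so
   far, and each leaf's accumulated path becomes its code. */
void build_table(const struct hnode *t, char path[], int depth)
{
    if (t->left == NULL && t->right == NULL)  /* leaf: store its code */
    {
        path[depth] = '\0';
        strcpy(codes[t->symbol], path);
        return;
    }
    path[depth] = '0';                        /* one branch direction */
    build_table(t->left, path, depth + 1);
    path[depth] = '1';                        /* the other direction  */
    build_table(t->right, path, depth + 1);
}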

Decoding Encoded Files

Like encoding a file, decoding a file is a two-step process. First the header data is read in, and the Huffman code for each symbol is reconstructed. Then the encoded data is read and decoded.

I have read that the fastest method for decoding symbols is to read the encoded file one bit at a time and traverse the Huffman tree until a leaf containing a symbol is reached. However, I have also read that it is faster to store the codes for each symbol in an array sorted by code length and search for a match every time a bit is read in. I have yet to see a proof for either side.

I do know that the tree method is faster for the worst-case encoding where all symbols are 8 bits long. In this case the 8-bit code will lead to a symbol 8 levels down the tree, but a binary search on 256 symbols is O(log2(256)), or 8 steps, and the search may have to be repeated each time a new bit is read in.

Since conventional Huffman encoding naturally leads to the construction of a tree for decoding, I chose the tree method there. The encoded file is read one bit at a time, and the tree is traversed according to each of the bits. When a bit causes a leaf of the tree to be reached, the symbol contained in that leaf is written to the decoded file, and traversal starts again from the root of the tree.

Canonical Huffman encoding naturally leads to the construction of an array of symbols sorted by the size of their codes. Consequently, I chose the array method for decoding files encoded with a canonical Huffman code. The encoded file is read one bit at a time, with each bit accumulating in a string of undecoded bits. Then all the codes of a length matching the string length are compared to the string. If a match is found, the string is decoded as the matching symbol and the bit string is cleared. The process repeats itself until all symbols have been decoded.
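A sketch of that accumulate-and-match loop, reusing the illustrative csym table from the canonical-code sketch above; next_bit() and emit() stand in for the real bit-level and file I/O, and codes are again assumed to fit in an unsigned long:

int next_bit(void);   /* returns 0 or 1, or -1 at end of input */
void emit(int sym);   /* writes one decoded symbol             */

void decode_canonical(const struct csym syms[], int n)
{
    unsigned long bits = 0;  /* the undecoded bit string, as a value */
    int len = 0;             /* its current length                   */
    int b, i;

    while ((b = next_bit()) != -1)
    {
        bits = (bits << 1) | (unsigned long)b;
        len++;
        for (i = 0; i < n; i++)
        {
            if (syms[i].len == len && syms[i].code == bits)
            {
                emit(syms[i].sym);   /* match: decode the string */
                bits = 0;            /* and clear it             */
                len = 0;
                break;
            }
        }
    }
}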

    6. PORTABILITY ISSUES:


All the source code that I have provided is written in strict ANSI C. I would expect it to build correctly on any machine with an ANSI C compiler. I have tested the code compiled with GCC on Linux and MinGW on Windows 98 and 2000.

The code makes no assumptions about the size of types or the byte order (endianness), so it should be able to run on all platforms. However, type-size and byte-order issues will prevent files that are encoded on one platform from being decoded on another platform. The code also assumes that an array of unsigned char will be allocated in a contiguous block of memory.

7. ALGORITHM:

Step 1. Start.

Step 2. Enter the message.

Step 3. Calculate the number of occurrences of each character.

Step 4. Create a linked list in which each node holds a character and its probability.

Step 5. Sort the linked list by probability.

Step 6. Delete the two nodes with the least probabilities, i.e., the first and second nodes.

Step 7. Add the two probabilities and concatenate the two strings.

Step 8. Make a binary tree node from the summed probability and the concatenated string, and use it as the parent of the two deleted nodes.

Step 9. The left child is the first deleted node and the right child is the second deleted node.

Step 10. Repeat from Step 6 until the linked list is empty.

Step 11. To encode one character of the message, start at the root: if the character is in the left subtree, print 1, otherwise print 0; repeat down to the leaf node.

Step 12. Apply the same procedure to each character of the given message.

Step 13. To decode, read the bit stream: if the bit is 1, go to the left child, and if the bit is 0, go to the right child; on reaching a leaf node, print the string in that leaf.

Step 14. Repeat Step 13 until the bit stream is exhausted.

Step 15. Stop.


    8. PROGRAMMING CODE:

// Header File
/***************huff_header.h************************/
/***********************************************************************
Project Title   : HUFFMAN CODING AND DECODING
Project Guide   : MITA PODDAR
Project Members : K. Anil Kumar, M. Praveen, K. V. Rajesh Kumar.
***********************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Node of the frequency-sorted linked list: a symbol string and its
   probability. */
struct list
{
    char ch[100];
    float f;
    struct list *link;
};

/* Node of the Huffman tree: a (concatenated) symbol string, its
   probability, and its children. */
struct t_list
{
    char a[100];
    float info;
    struct t_list *left, *right;
};

typedef struct t_list tree;
typedef struct list node;

void display(void);
void insert(char *, float);
void insert_tree(char *, float);
void sort_list(void);
void traves_tree(char[]);
void send_data(char[]);
node *delete(void);
void tree_list(void);
void print_order(tree *);
void decode(char *);

extern node *first;
extern tree *root;
extern tree *tt[40];

// LIBRARY FILE
/*************************huff_lib.c*******************************/
#include "huff_header.h"

node *first = NULL;
tree *root = NULL;
tree *tt[40] = {NULL};
int z = 0;
char data[1000];

/* Insert a symbol and its probability at the front of the list. */
void insert(char *c, float f)
{
    node *newpt;

    newpt = (node *)malloc(sizeof(node));
    strcpy(newpt->ch, c);
    newpt->f = f;
    newpt->link = NULL;
    if (first == NULL)
        first = newpt;
    else
    {
        newpt->link = first;
        first = newpt;
    }
}

/* Print every symbol in the list with its probability. */
void display(void)
{
    node *temp = first;

    while (temp != NULL)
    {
        printf("\t%s\t%.4f\n", temp->ch, temp->f);
        temp = temp->link;
    }
}

/* Selection sort of the list into ascending order of probability,
   so the two least probable symbols are always at the front. */
void sort_list(void)
{
    node *temp, *temp1, swap;

    for (temp = first; temp->link != NULL; temp = temp->link)
    {
        for (temp1 = temp->link; temp1 != NULL; temp1 = temp1->link)
        {
            if (temp1->f < temp->f)
            {
                strcpy(swap.ch, temp->ch);
                swap.f = temp->f;
                strcpy(temp->ch, temp1->ch);
                temp->f = temp1->f;
                strcpy(temp1->ch, swap.ch);
                temp1->f = swap.f;
            }
        }
    }
}

/* Remove and return the head of the list (the least probable
   symbol), or NULL if the list is empty. */
node *delete(void)
{
    node *temp = first;

    if (temp == NULL)
        return NULL;
    first = temp->link;
    return temp;
}

/* Build the Huffman tree: repeatedly take the two least probable
   nodes off the list, join them under a new parent, and re-insert
   the combined symbol until the list is exhausted. */
void tree_list(void)
{
    node *n1, *n2;
    int i, z = 0;   /* local z indexes tt[], shadowing the global */
    float k;
    char str[200];  /* concatenation of the two symbol strings    */
    tree *temp, *temp2, *le, *ri;

    while (first != NULL)
    {
        n1 = delete();
        n2 = delete();
        if (n2 == NULL)
            break;
        k = n1->f + n2->f;
        strcpy(str, n1->ch);
        strcat(str, n2->ch);
        temp = (tree *)malloc(sizeof(tree));
        strcpy(temp->a, str);
        temp->info = k;
        temp->left = NULL;
        temp->right = NULL;
        temp2 = temp;
        root = temp;
        /* reuse an already-built subtree if its symbol matches */
        for (i = 0; i < z; i++)
        {
            if (strcmp(tt[i]->a, n1->ch) == 0)
                temp->left = tt[i];
            else if (strcmp(tt[i]->a, n2->ch) == 0)
                temp->right = tt[i];
        }
        if (temp->left == NULL)    /* n1 is a leaf: make a new node */
        {
            le = (tree *)malloc(sizeof(tree));
            strcpy(le->a, n1->ch);
            le->info = n1->f;
            le->left = NULL;
            le->right = NULL;
            temp->left = le;
        }
        if (temp->right == NULL)   /* n2 is a leaf: make a new node */
        {
            ri = (tree *)malloc(sizeof(tree));
            strcpy(ri->a, n2->ch);
            ri->info = n2->f;
            ri->left = NULL;
            ri->right = NULL;
            temp->right = ri;
        }
        if (first != NULL)
        {
            insert(str, k);        /* put the combined symbol back */
            sort_list();
        }
        tt[z++] = temp2;
        printf("\n\n\n");
        display();
    }
    printf("\n\n\n");
    print_order(root);
    printf("\n\n");
}

/* Pre-order traversal: print each node's symbol and probability. */
void print_order(tree *cur_node)
{
    if (cur_node != NULL)
    {
        printf("\t%s\t\t%.4f\n", cur_node->a, cur_node->info);
        print_order(cur_node->left);
        print_order(cur_node->right);
    }
}

/* Print the code for symbol c: 1 for a left branch, 0 for a right
   branch, walking down from the root to the leaf. */
void traves_tree(char c[2])
{
    tree *temp = root;

    while (strcmp(temp->a, c) != 0)
    {
        if (strstr(temp->left->a, c))
        {
            printf("1");
            temp = temp->left;
        }
        else if (strstr(temp->right->a, c))
        {
            printf("0");
            temp = temp->right;
        }
    }
}

/* Append the code for symbol c to the global bit string data[]. */
void send_data(char c[2])
{
    tree *temp = root;

    while (strcmp(temp->a, c) != 0)
    {
        if (strstr(temp->left->a, c))
        {
            data[z++] = '1';
            temp = temp->left;
        }
        else if (strstr(temp->right->a, c))
        {
            data[z++] = '0';
            temp = temp->right;
        }
    }
}

/* Decode the bit string c: 1 means left, 0 means right; on reaching
   a leaf, print its symbol and restart from the root. */
void decode(char *c)
{
    int i;
    tree *temp = root;

    for (i = 0; i < (int)strlen(c); i++)
    {
        if (c[i] == '1')
            temp = temp->left;
        else
            temp = temp->right;
        if (temp->left == NULL && temp->right == NULL)
        {
            printf("%s", temp->a);
            temp = root;
        }
    }
}

// APPLICATION FILE
/***********************huff_app.c**************************************/
#include "huff_header.h"

extern char data[1000];

int main(void)
{
    /* NOTE: the printed listing is truncated at two page breaks in
       the source; the frequency-counting and encoding loops below
       are a reconstruction of the evident intent (see the algorithm
       in Section 7), not the original text. */
    char ch[100], b[100][2], s[2];
    int i, n, f, j, k, m = -1;
    float z;

    printf("\t enter the string for encoding \n");
    /* gets() in the original; fgets() is the safe equivalent */
    fgets(ch, sizeof(ch), stdin);
    ch[strcspn(ch, "\n")] = '\0';
    n = strlen(ch);
    for (i = 0; i < n; i++)
    {
        k = 0;
        for (j = 0; j <= m; j++)     /* already collected? */
            if (b[j][0] == ch[i])
                k = 1;
        if (k == 1)
            continue;
        f = 0;                       /* count occurrences  */
        for (j = 0; j < n; j++)
            if (ch[j] == ch[i])
                f++;
        m++;
        b[m][0] = ch[i];
        b[m][1] = '\0';
        z = (float)f / (float)n;
        insert(b[m], z);             /* add to the list    */
    }
    sort_list();
    display();
    tree_list();                     /* build the tree     */
    for (i = 0; i <= m; i++)         /* code per symbol    */
    {
        printf("\n code for %s : ", b[i]);
        traves_tree(b[i]);
    }
    for (i = 0; i < n; i++)          /* encode the message */
    {
        s[0] = ch[i];
        s[1] = '\0';
        send_data(s);
    }
    printf("\n\n encoded bit stream : %s\n", data);
    printf("\n decoded message : ");
    decode(data);
    printf("\n\n");
    return 0;
}


9. ADVANTAGES:

1. The algorithm is easy to implement.

2. It produces a lossless compression of images.

DISADVANTAGES

1. Relatively slow.

2. Depends upon the statistical model of the data.

3. Decoding is difficult due to the different code lengths.

4. Overhead due to the Huffman tree.

    10. APPLICATIONS:

1. Arithmetic coding can be viewed as a generalization of Huffman coding; indeed, in practice arithmetic coding is often preceded by Huffman coding, as it is easier to find an arithmetic code for a binary input than for a non-binary input.

2. Huffman coding is in wide use because of its simplicity, high speed and lack of encumbrance by patents.

3. Huffman coding today is often used as a "back-end" to some other compression method.

4. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding.


    11. CONCLUSION:

We have studied various techniques for compression and compared them on the basis of their use in different applications and their advantages and disadvantages. We have concluded that arithmetic coding is very efficient, encoding the more frequently occurring sequences of pixels with fewer bits, and reduces the file size dramatically.

RLE is simple to implement and fast to execute. The LZW algorithm is better to use for TIFF, GIF and textual files; it is an easy-to-implement, fast and lossless algorithm, whereas the Huffman algorithm is used in JPEG compression. The Huffman algorithm produces an optimal and compact code, but it is relatively slow and its statistical model adds to the overhead.

The algorithms discussed above use lossless compression techniques. The JPEG technique, which is used mostly for image compression, is a lossy compression technique. JPEG 2000 is an advancement of the JPEG standard which uses wavelets.


    12. REFERENCES:

A copy of the section from "Numerical Recipes In C" which started this whole effort may be found at http://lib-www.lanl.gov/numerical/bookcpdf/c20-4.pdf.

A copy of one of David Huffman's original publications about his algorithm may be found at http://compression.graphicon.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf.

A discussion of Huffman codes, including canonical Huffman codes, may be found at http://www.compressconsult.com/huffman/.

Further discussion of Huffman coding, with links to other documentation and libraries, may be found at http://datacompression.info/Huffman.shtml.