DATA COMPRESSION AND HUFFMAN ALGORITHM.ppt

  • 7/27/2019 DATA COMPRESSION AND HUFFMAN ALGORITHM.ppt

    DATA COMPRESSION AND HUFFMAN ALGORITHM

    Technical Seminar Paper Submitted by

    Presented by

    Vineet Agarwala

    NATIONAL INSTITUTE OF SCIENCE & TECHNOLOGY

    IT200118155

    Technical Seminar Under the guidance of

    Anisur Rahman

    DATA COMPRESSION

    Virtually all forms of data - text, numerical, image, video - contain redundant elements.

    Data can be compressed by eliminating the redundant elements.

    A code is substituted for the eliminated redundant element, where the code is shorter than the eliminated element.

    When compressed data is retrieved from storage or received over a communications link, it is expanded back to its original form, based on the code.

    Compression is used:

    to save storage space

    to reduce communications transmission requirements

    The art or science of compactly representing information

    Digital realm: using fewer bits to represent information

    Data = information + redundancy

    REDUNDANCY

    Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy.

    "Ask not what your country can do for you -- ask what you can do for your country."

    Ignoring the difference between capital and lower-case letters, roughly half of the phrase is redundant. Nine words -- ask, not, what, your, country, can, do, for, you -- give us almost everything we need for the entire quote.
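
    The redundancy in the quote can be made concrete with a small sketch. It builds a word dictionary and replaces each word by its index, which is one simple way a compressor could exploit repeated words. The function name build_dictionary and the choice to drop punctuation are assumptions made for illustration only.

```python
import re

QUOTE = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")

def build_dictionary(text):
    """Return (unique words in first-seen order, index of each word)."""
    # Lower-case the text so "Ask" and "ask" count as the same word,
    # and keep only alphabetic words (punctuation is ignored here).
    words = re.findall(r"[a-z]+", text.lower())
    dictionary = []
    for word in words:
        if word not in dictionary:
            dictionary.append(word)
    indices = [dictionary.index(word) for word in words]
    return dictionary, indices

DICTIONARY, INDICES = build_dictionary(QUOTE)
print(len(INDICES), "words,", len(DICTIONARY), "unique")
print(DICTIONARY)
print(INDICES)
```

    The quote has 17 words but only 9 distinct ones, so the list of indices plus the 9-word dictionary is enough to rebuild the wording of the sentence.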

    Compression Techniques

    Lossless

    Data can be completely recovered after decompression

    Recovered data is identical to original

    Exploits redundancy in data

    Lossy

    Data cannot be completely recovered after decompression

    Some information is lost forever

    Gives more compression than lossless

    Discards insignificant data components

    IMAGE COMPRESSION

    Image compression can be lossy or lossless.

    Methods for lossless image compression are:

    Run-length encoding

    Entropy coding

    Adaptive dictionary algorithms such as LZW

    Methods for lossy compression are:

    Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette. This method can be combined with dithering to blur the color borders.

    Transform coding. This is the most commonly used method. A Fourier-related transform such as the DCT or the wavelet transform is applied, followed by quantization and entropy coding.

    Fractal compression.
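
    The color-palette method listed above can be sketched in a few lines. This is a minimal illustration, not how any particular image format does it: pixels are assumed to be (R, G, B) tuples, the palette is simply the most frequent colors, and every pixel is mapped to the nearest palette entry by squared distance. The names build_palette, nearest_index and to_indexed are hypothetical.

```python
from collections import Counter

def build_palette(pixels, size):
    """Pick the `size` most common colors as the palette."""
    return [color for color, _ in Counter(pixels).most_common(size)]

def nearest_index(color, palette):
    """Index of the palette color closest to `color` (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])),
    )

def to_indexed(pixels, size):
    """Return (palette, one palette index per pixel)."""
    palette = build_palette(pixels, size)
    return palette, [nearest_index(p, palette) for p in pixels]

# Illustrative data: mostly red and blue, plus one slightly-off red pixel.
PIXELS = [(255, 0, 0)] * 5 + [(0, 0, 255)] * 3 + [(250, 5, 5)]
PALETTE, INDEXED = to_indexed(PIXELS, 2)
print(PALETTE)
print(INDEXED)
```

    The off-red pixel is stored as plain red, which is where the loss comes from: its exact color cannot be recovered from the palette index.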

    JPEG (TRANSFORM COMPRESSION)

    JPEG is named after its origin, the Joint Photographic Experts Group.

    This involves reducing the number of bits per sample or discarding some of the samples entirely.

    MULTIMEDIA COMPRESSION

    Multimedia compression is a general term referring to the compression of any type of multimedia, most notably graphics, audio, and video.

    MPEG (Moving Picture Experts Group): the future of this technology is to encode the compression and decompression algorithms directly into integrated circuits.

    The approach used by MPEG can be divided into two types of compression: within-the-frame and between-frame.

    DATA COMPRESSION ALGORITHMS

    LOSSLESS COMPRESSION

    Run Length Encoding

    Huffman Coding

    Delta

    LZW

    LOSSY COMPRESSION

    CS&Q (coarser sampling and/or quantization)

    JPEG

    MPEG

    RUN-LENGTH ENCODING

    Example of run-length encoding: each run of zeros is replaced by two characters in the compressed file, a zero to indicate that compression is occurring, followed by the number of zeros in the run.

    Data files frequently contain the same character repeated many times in a row.
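
    The zero-run scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data is a list of byte values, the run count must itself fit in one byte (so runs are capped at 255), and the sample values are made up for the demonstration.

```python
def rle_zeros_encode(data):
    """Replace each run of zeros by the pair (0, run length)."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 0
            # Count consecutive zeros; cap at 255 so the count fits in a byte.
            while i < len(data) and data[i] == 0 and run < 255:
                run += 1
                i += 1
            out.extend([0, run])
        else:
            out.append(data[i])
            i += 1
    return out

def rle_zeros_decode(encoded):
    """Expand every (0, n) pair back into n zeros."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == 0:
            out.extend([0] * encoded[i + 1])
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return out

SAMPLE = [17, 8, 54, 0, 0, 0, 97, 5, 16, 0, 45, 23, 0, 0, 0, 0, 0, 3, 67, 0, 0, 8]
ENCODED = rle_zeros_encode(SAMPLE)
print(ENCODED)
```

    Note that a single isolated zero grows from one value to two, so the scheme only pays off when zeros tend to come in runs.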

    HUFFMAN ENCODING

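    This method is named after D.A. Huffman, who developed the procedure in the 1950s.

    In the example file, more than 96% of the data consists of only 31 characters out of 127. Huffman encoding exploits this by giving the most frequent characters the shortest codes.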

    HUFFMAN ENCODING EXAMPLE

    Character frequencies

    A: 20% (.20)

    B: 9% (.09)

    C: 15% (.15)

    D: 11% (.11)

    E: 40% (.40)

    F: 5% (.05)

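    [Figure: first step of building the Huffman tree. The two least frequent characters, B (.09) and F (.05), are merged into a node BF (.14), with the two branches labeled 0 and 1. The remaining nodes are E (.40), A (.20), C (.15) and D (.11).]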

    HUFFMAN ENCODING EXAMPLE (CONTD.)

    Codes

    A: 010

    B: 0000

    C: 011

    D: 001

    E: 1

    F: 0001
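
    The code table above is a prefix code: no code is the beginning of another, so a bit stream can be decoded without separators. A minimal sketch of encoding and decoding with exactly this table follows; the function names and the sample word "FADE" are chosen for illustration.

```python
# Code table copied from the slide.
CODES = {"A": "010", "B": "0000", "C": "011", "D": "001", "E": "1", "F": "0001"}

def encode(text):
    """Concatenate the code of every character."""
    return "".join(CODES[ch] for ch in text)

def decode(bits):
    """Read bits one at a time until they match a complete code."""
    inverse = {code: ch for ch, code in CODES.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code")
    return "".join(out)

BITS = encode("FADE")
print(BITS, len(BITS), "bits")
print(decode(BITS))
```

    "FADE" takes 11 bits here, against 12 bits with a fixed 3-bit code for six characters; the saving grows when the frequent character E dominates the text.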

    Huffman tree (code bits shown on the branches):

    ABCDEF 1.0
      0: ABCDF .60
        0: BFD .25
          0: BF .14
            0: B .09
            1: F .05
          1: D .11
        1: AC .35
          0: A .20
          1: C .15
      1: E .40
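
    The tree construction in this example can be reproduced with a short sketch: repeatedly merge the two lowest-frequency nodes until one root remains. Frequencies are given as whole percentages to avoid rounding issues, and the tie-breaking counter is an implementation detail assumed here. Which branch gets 0 and which gets 1 is arbitrary, so the bit patterns may differ from the slide while the code lengths stay the same.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code for a {symbol: weight} mapping."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), sym) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent nodes into one internal node.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a symbol
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes

FREQS = {"A": 20, "B": 9, "C": 15, "D": 11, "E": 40, "F": 5}  # percentages
CODES = huffman_codes(FREQS)
LENGTHS = {sym: len(code) for sym, code in CODES.items()}
AVG_BITS_X100 = sum(FREQS[s] * LENGTHS[s] for s in FREQS)
print(CODES)
print("average bits per character:", AVG_BITS_X100 / 100)
```

    With these frequencies the average works out to 2.34 bits per character, compared with 3 bits for a fixed-length code over six characters.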

    Run Length Encoding

    Original: CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC (38 symbols)

    Run length encoded: CT5A3GTCG6TG3C5GCCT7C (21 symbols)
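
    The encoding above follows a simple rule that can be inferred from the example: runs of three or more identical symbols become a count followed by the symbol, and shorter runs are copied unchanged. A minimal sketch under that assumption is shown below; the threshold parameter min_run is a name introduced here, and the decoder only works when the original text contains no digits.

```python
import re
from itertools import groupby

def rle_encode(text, min_run=3):
    """Write runs of at least `min_run` symbols as <count><symbol>."""
    parts = []
    for symbol, group in groupby(text):
        n = len(list(group))
        parts.append(f"{n}{symbol}" if n >= min_run else symbol * n)
    return "".join(parts)

def rle_decode(encoded):
    """Expand every <count><symbol> pair; other symbols pass through."""
    return re.sub(r"(\d+)(\D)", lambda m: m.group(2) * int(m.group(1)), encoded)

SEQUENCE = "CTAAAAAGGGTCGTTTTTTGCCCGGGGGCCTCCCCCCC"
ENCODED = rle_encode(SEQUENCE)
print(ENCODED, len(ENCODED))
```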

    Run Length Encoding (Contd.)

    WWWBWWWWWBWWWBWWWWBWWWWWBWWWBWWWWWBWWBWWWWWWBBBWWWWWWWBWBWWWWW

    WWBWWBBWWWWWBWWWWBWWWWBWWWWB

    WWWBWWWWWBWWWBWWWWB... is encoded as

    3WB5WB3WB4WB...

    3151314...: a possible optimization is to store only the run lengths, since W and B alternate, but

    #W3151314...: the optimization requires an escape character.
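
    The run-lengths-only idea can be sketched for two-symbol data as follows. This illustration keeps the starting symbol alongside the list of run lengths, which is one plausible reading of the "#W" prefix on the slide; that reading, the function names and the two-letter alphabet parameter are assumptions.

```python
from itertools import groupby

def run_lengths(text):
    """Return (first symbol, list of run lengths) for two-symbol data."""
    if not text:
        return "", []
    return text[0], [len(list(group)) for _, group in groupby(text)]

def expand(first, lengths, alphabet=("W", "B")):
    """Rebuild the text by alternating symbols, starting with `first`."""
    other = alphabet[1] if first == alphabet[0] else alphabet[0]
    symbols = (first, other)
    return "".join(symbols[i % 2] * n for i, n in enumerate(lengths))

PREFIX = "WWWBWWWWWBWWWBWWWWB"
FIRST, LENGTHS = run_lengths(PREFIX)
print(FIRST, LENGTHS)
```

    Writing the lengths as bare digits only works while every run is shorter than 10; longer runs would need a separator or a fixed-width count.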

    Run Length Encoding (Contd.)

    Is run length encoding practical for images?

    No: chances of three or more identical consecutive pixels are low for most real images, especially images with large color depth.

    Yes: some images do have lots of identical consecutive pixels, especially images with low color depth. RLE is used for fax machines, and by BMP, TIFF and PCX files.

    http://www.cs.ucr.edu/~eamonn/new_photo.php

    LZW Compression

    LZW compression is named after its developers, A. Lempel and J. Ziv, with later modifications by Terry A. Welch. It is the foremost technique for general purpose data compression due to its simplicity and versatility.

    LZW Compression (contd.)

    LZW compression flowchart. The variable CHAR is a single byte. The variable STRING is a variable-length sequence of bytes. Data are read from the input file (boxes 1 and 2) as single bytes, and written to the compressed file (box 4) as 12-bit codes.
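
    Since the flowchart itself did not survive, here is a minimal sketch of the textbook LZW loop using the same STRING and CHAR roles. It is an assumption that the slide's flowchart matches this standard form. Codes 0 to 255 stand for single bytes, new strings get codes from 256 upward, and the table is capped at 4096 entries so every code fits in 12 bits. Packing the codes into 12-bit fields on disk is left out.

```python
MAX_CODES = 4096  # 12-bit codes: values 0..4095

def lzw_compress(data):
    """Return the list of LZW codes for a bytes object."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    string = b""
    for byte in data:
        char = bytes([byte])
        if string + char in table:
            string += char                      # keep extending the match
        else:
            codes.append(table[string])         # emit code for longest match
            if len(table) < MAX_CODES:
                table[string + char] = len(table)
            string = char
    if string:
        codes.append(table[string])
    return codes

def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = prev + prev[:1]             # code defined by this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        if len(table) < MAX_CODES:
            table[len(table)] = prev + entry[:1]
        prev = entry
    return b"".join(out)

TEXT = b"TOBEORNOTTOBEORTOBEORNOT"
CODES = lzw_compress(TEXT)
print(len(TEXT), "bytes ->", len(CODES), "codes")
```

    The decompressor rebuilds the same table from the codes alone, so the table never has to be stored in the compressed file.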

    CONCLUSION

    Is it possible to create a data compression algorithm that will always compress data?

    Is there an optimal data compression algorithm?

    Lossless: No, compression rates depend on the data.

    Lossy: No, the quality of compression is subjective.

    Is data compression really that important?
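
    On the first question above, a standard counting argument suggests the answer is no for lossless compression: there are more bit strings of a given length than there are shorter bit strings, so no reversible scheme can shorten every input. The sketch below just does the counting; the function names are chosen for illustration.

```python
def count_strings(n):
    """Number of distinct bit strings of length exactly n."""
    return 2 ** n

def count_shorter(n):
    """Number of distinct bit strings of length 0 .. n-1."""
    return sum(2 ** k for k in range(n))

for n in (1, 8, 16):
    # There is always one fewer shorter string than inputs of length n,
    # so at least one input of length n cannot be made shorter.
    print(n, count_strings(n), count_shorter(n))
```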