Huffman Coding, Arithmetic Coding, and JBIG2
Illustrations
Arber Borici
2010
University of Northern British Columbia
Huffman Coding
- Entropy encoder for lossless compression
- Input: symbols and their corresponding probabilities
- Output: prefix-free codes with minimum expected length
- Prefix property: no code in the output is a prefix of another code
- Optimal among symbol-by-symbol prefix codes
Huffman Coding: Algorithm
1. Create a forest of leaf nodes, one per symbol.
2. Take the two nodes with the lowest probabilities and make them siblings. The new internal node's probability is the sum of its two children's probabilities.
3. The new internal node participates in the forest like any other node.
4. Repeat steps 2–3 until a single tree remains.
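The steps above can be sketched in Python with a min-heap (names here are illustrative). Ties between equal-weight nodes may be broken either way; this sketch happens to build a different tree from the one in the slides that follow, but both codes are optimal: each encodes ARBER in 10 bits.

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build a Huffman code table from a symbol -> frequency mapping."""
    # Heap entries are (weight, tiebreaker, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lowest-probability trees
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):          # internal node: recurse
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                                # leaf: record the code path
            codes[tree] = prefix or "0"      # lone-symbol edge case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes(Counter("ARBER"))
print(codes)  # {'A': '00', 'B': '01', 'E': '10', 'R': '11'}
```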
Huffman Coding: Example

Consider the string ARBER. The frequencies and probabilities of the symbols A, B, E, and R are:

Symbol       A    B    E    R
Frequency    1    1    1    2
Probability  20%  20%  20%  40%

The initial forest thus comprises four leaf nodes. Now we apply the Huffman algorithm.
Generating Huffman Codes

Starting from the leaves A (0.2), B (0.2), E (0.2), and R (0.4):

1. Merge A and B into internal node 1, with probability 0.4.
2. Merge E and node 1 into internal node 2, with probability 0.6.
3. Merge R and node 2 to form the root, with probability 1.

Labelling each left branch 0 and each right branch 1 and reading the path from the root to each leaf yields:

Symbol  Code
A       000
B       001
E       01
R       1
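As a sanity check on the code table, two lines of Python compare the expected code length with the source entropy; a Huffman code is always within one bit of the entropy, and here it spends 2.0 bits per symbol against an entropy of about 1.92 bits.

```python
from math import log2

probs = {"A": 0.2, "B": 0.2, "E": 0.2, "R": 0.4}
codes = {"A": "000", "B": "001", "E": "01", "R": "1"}   # table from the slides

entropy = -sum(p * log2(p) for p in probs.values())           # H ≈ 1.922 bits
expected_len = sum(probs[s] * len(codes[s]) for s in probs)   # L ≈ 2.0 bits
```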
Huffman Codes: Decoding

To decode the bit stream 0001001011, walk the tree from the root, taking the 0-branch or 1-branch for each bit, and emit a symbol whenever a leaf is reached:

000 → A,  1 → R,  001 → B,  01 → E,  1 → R

The stream 0001001011 therefore decodes to ARBER. The prefix property ensures unique decodability.
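The decoding walk can be sketched directly on the code table instead of the tree (a minimal illustration, not a production decoder):

```python
def huffman_decode(bits, codes):
    """Scan the bit string left to right, emitting a symbol whenever the
    accumulated bits match a codeword; the prefix property guarantees
    at most one codeword can ever match."""
    inverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # unique match thanks to prefix-freeness
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

codes = {"A": "000", "B": "001", "E": "01", "R": "1"}
print(huffman_decode("0001001011", codes))  # ARBER
```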
Arithmetic Coding
- Entropy coder for lossless compression
- Encodes the entire input as a single real interval
- Slightly more efficient than Huffman coding
- Implementation is harder: several practical implementation variants have been proposed
Arithmetic Coding: Algorithm
Assign each symbol an interval [low, high) based on cumulative probabilities. Given an input string, start from the interval [L, H) of the first symbol; for each subsequent symbol, with cumulative bounds [s_low, s_high), rescale:

New Low  = L + s_low × (H – L)
New High = L + s_high × (H – L)
Arithmetic Coding: Example

Consider the string ARBER. The intervals of the symbols A, B, E, and R are:

Symbol  A        B          E          R
Low     0        0.2        0.4        0.6
High    0.2      0.4        0.6        1

That is, A: [0, 0.2); B: [0.2, 0.4); E: [0.4, 0.6); R: [0.6, 1).
Arithmetic Coding: Example (continued)

Encoding A R B E R narrows the interval step by step; at each step the current interval is subdivided 20% / 20% / 20% / 40% among A, B, E, and R:

Symbol  Interval after coding it
A       [0, 0.2)
R       [0.12, 0.2)
B       [0.136, 0.152)
E       [0.1424, 0.1456)
R       [0.14432, 0.1456)

For example, after A the interval [0, 0.2) is subdivided at 0.04, 0.08, and 0.12, so the second symbol, R, maps to [0.12, 0.2); that interval is in turn subdivided at 0.136, 0.152, and 0.168, so B maps to [0.136, 0.152), and so on.
Arithmetic Coding: Example

The final interval for the input string ARBER is [0.14432, 0.1456).

In bits, one chooses a number in the interval and encodes its fractional part. For the sample interval, one may choose the point 0.14432, whose fractional part in binary is:

001001001111001000100111110100000010100010100001110 (51 bits)
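The interval-narrowing loop above fits in a few lines of Python (a minimal sketch with floating-point intervals; practical coders use integers, as noted below):

```python
# Cumulative-probability intervals from the table above.
intervals = {"A": (0.0, 0.2), "B": (0.2, 0.4), "E": (0.4, 0.6), "R": (0.6, 1.0)}

def arithmetic_encode(text, intervals):
    """Narrow [low, high) once per symbol; the final interval encodes the string."""
    low, high = 0.0, 1.0
    for sym in text:
        s_low, s_high = intervals[sym]
        span = high - low
        low, high = low + s_low * span, low + s_high * span
    return low, high

low, high = arithmetic_encode("ARBER", intervals)
print(low, high)  # ≈ 0.14432 and 0.1456, matching the worked example
```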
Arithmetic Coding
- Practical implementations work with absolute frequencies (integers) and rescaled integer intervals, since the low and high values otherwise become vanishingly small.
- An END-OF-STREAM symbol (with a very small probability) is usually required.
- Decoding is straightforward: find the symbol interval containing the encoded value, emit that symbol, rescale the value into that interval, and repeat until the END-OF-STREAM symbol is reached.
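The decoding loop just described can be sketched as follows (for simplicity this illustration passes the message length in explicitly instead of using an END-OF-STREAM symbol, and uses floats rather than integer arithmetic):

```python
intervals = {"A": (0.0, 0.2), "B": (0.2, 0.4), "E": (0.4, 0.6), "R": (0.6, 1.0)}

def arithmetic_decode(value, intervals, n_symbols):
    """Find the symbol interval containing `value`, emit that symbol,
    rescale `value` into [0, 1) relative to the interval, and repeat."""
    out = []
    for _ in range(n_symbols):                    # length passed in here; a real
        for sym, (lo, hi) in intervals.items():   # coder stops at END-OF-STREAM
            if lo <= value < hi:
                out.append(sym)
                value = (value - lo) / (hi - lo)
                break
    return "".join(out)

# 0.14496 is the midpoint of the final interval [0.14432, 0.1456).
print(arithmetic_decode(0.14496, intervals, 5))  # ARBER
```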
JBIG2

- Lossless and lossy bi-level image compression standard
- Emerged from JBIG1 (Joint Bi-Level Image Experts Group)
- Supports three coding modes: generic, halftone, and text
- The image is segmented into regions, which can be encoded using different methods
JBIG2: Segmentation

The image on the left is segmented into a binary image, text, and a grayscale image.

[Figure: a page segmented into binary, grayscale, and text regions]
JBIG2: Encoding

- Arithmetic coding (QM coder)
- Context-based prediction, with larger contexts than JBIG1
- Progressive compression (display)
- Predictive contexts use previously coded information
- Adaptive coder

[Figure: context template — X is the pixel to be coded; A marks an adaptive pixel, which can be moved]
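To make context-based prediction concrete, here is a toy adaptive model (not the QM coder, and a much smaller template than JBIG2's): it keeps per-context counts and predicts each pixel from its causal neighbours.

```python
from collections import defaultdict

def context_of(image, r, c):
    """Context = three causal neighbours (left, above, above-left); pixels
    outside the image count as 0. Real JBIG2 templates are larger (10+
    pixels) and include a movable adaptive pixel."""
    def get(i, j):
        inside = 0 <= i < len(image) and 0 <= j < len(image[0])
        return image[i][j] if inside else 0
    return (get(r, c - 1), get(r - 1, c), get(r - 1, c - 1))

def estimate(image):
    """Scan the image, predicting P(pixel = 1 | context) from adaptive
    per-context counts; an arithmetic coder would then code each pixel
    with this probability."""
    counts = defaultdict(lambda: [1, 1])      # Laplace-smoothed [zeros, ones]
    probs = []
    for r in range(len(image)):
        for c in range(len(image[0])):
            ctx = context_of(image, r, c)
            n0, n1 = counts[ctx]
            probs.append(n1 / (n0 + n1))      # prediction before seeing pixel
            counts[ctx][image[r][c]] += 1     # adapt the model
    return probs
```

On highly regular bi-level data the predicted probabilities quickly become skewed, which is exactly what lets the arithmetic coder spend far less than one bit per pixel.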
JBIG2: Halftone and Text

- Halftone images are coded as multi-level images, along with pattern and grid parameters.
- Each text symbol is encoded in a dictionary, along with its relative coordinates.
Color Separation
Images comprising discrete colors can be treated as multi-layered binary images: each color and the image background form one binary layer. If there are N colors, where one color represents the image background, then there are N − 1 binary layers: a map with a white background and four colors thus yields 4 binary layers.
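The separation step can be sketched in a few lines of Python (an illustrative helper, not part of any standard):

```python
def color_layers(image, background=0):
    """Split an indexed-colour image into one binary layer per
    non-background colour: layer[i][j] = 1 where that colour occurs."""
    colors = sorted({px for row in image for px in row} - {background})
    return {
        color: [[1 if px == color else 0 for px in row] for row in image]
        for color in colors
    }

# A 3x4 image with background 0 and two other colours -> two binary layers.
image = [[0, 1, 1, 0],
         [2, 2, 0, 1],
         [0, 0, 2, 2]]
layers = color_layers(image)
print(len(layers))  # 2
```

Each binary layer can then be handed to a bi-level coder such as JBIG2.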
Color Separation: Example

The following Excel graph comprises 34 colors plus the white background:

[Figure: the graph and three of its binary layers — Layer 1, Layer 5, and Layer 12]
Comparison with JBIG2 and JPEG
First test image: Our Method 96%; JBIG2 94%; JPEG 91%
Second test image: Our Method 98%; JBIG2 97%; JPEG 92%
Encoding Example
[Figure: a bi-level image encoded against a codebook; some blocks are uncompressible]

Original size: 64 × 3 = 192 bits

The space saved is one minus the size of the encoded stream over the original size:

1 – (1 + 20 + 64) / 192 ≈ 56%
Definitions (cont.)

- Compression ratio is defined as the number of bits after a coding scheme has been applied to the source data, over the original source data size. It is expressed as a percentage or, when the source data is an image, usually as bits per pixel (bpp).
- JBIG2 is the standard binary image compression scheme, based mainly on arithmetic coding with context modeling.
- Other methods in the literature are designed for specific classes of binary images.
- Our objective: design a coding method that works regardless of the nature of a binary image.