Entropy 2011
Transcript of a Harvard QR48 lecture, February 2010
-
Coding and Entropy
-
Squeezing out the Air
Suppose you want to ship pillows in boxes and are charged by the size of the box.
Lossless data compression
Entropy = lower limit of compressibility
-
Claude Shannon (1916-2001)
A Mathematical Theory of Communication (1948)
-
Communication over a Channel

  Source  -->  Coded Bits  -->  Channel  -->  Received Bits  -->  Decoded Message
    S             X                                Y                     T
  (symbols)     (bits)                           (bits)               (symbols)

Encode symbols into bits before putting them in the channel; decode the bits back into symbols when they come out of the channel.
E.g., the transformation from S into X changes yea --> 1, nay --> 0; changing Y into T does the reverse.
For now, assume no noise in the channel, i.e. X = Y.
-
Example: Telegraphy
Source: English letters --> Morse Code

  Washington: D --> -..   ~~~ wire ~~~>   Baltimore: -.. --> D
-
Low and High Information Content Messages
The more frequent a message is, the less information it conveys when it occurs.
Two weather forecast messages:

  Bos: [forecast graphic]
  LA: [forecast graphic]

In LA, sunny is a low-information message and cloudy is a high-information message.
-
Harvard Grades
Less information in Harvard grades now than in the recent past:

  %     A    A-   B+   B    B-   C+
  2005  24   25   21   13   6    2
  1995  21   23   20   14   8    3
  1986  14   19   21   17   10   5
-
Fixed Length Codes (Block Codes)
Example: 4 symbols, A, B, C, D
A=00, B=01, C=10, D=11
In general, with n symbols, codes need to be of length lg n, rounded up.
For English text, 26 letters + space = 27 symbols, so length = 5, since 2^4 < 27 < 2^5
(replace all punctuation marks by space).
AKA block codes
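As a quick check of the "lg n, rounded up" rule, here is a minimal Python sketch (the function name block_code_length is illustrative, not from the lecture):

    import math

    def block_code_length(n):
        # Bits needed for a fixed-length (block) code over n symbols
        return math.ceil(math.log2(n))

    print(block_code_length(4))   # 4 symbols A, B, C, D -> 2 bits
    print(block_code_length(27))  # 26 letters + space -> 5 bits, since 2^4 < 27 < 2^5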
-
Modeling the Message Source
Characteristics of the stream of messages coming from the source affect the choice of the coding method.
We need a model for a source of English text that can be described and analyzed mathematically.

  [Diagram: Source --> Destination]
-
How can we improve on block codes?
Simple 4-symbol example: A, B, C, D
If that is all we know, we need 2 bits/symbol.
What if we know symbol frequencies?
Use shorter codes for more frequent symbols; Morse Code does something like this.
Example:

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code       0    100  101  110
-
Prefix Codes
No codeword is a prefix of another, so there is only one way to decode left to right.

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code       0    100  101  110
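To see the left-to-right decoding in action, here is a minimal Python sketch of a greedy decoder for this table (names like decode_prefix are illustrative):

    CODE = {'A': '0', 'B': '100', 'C': '101', 'D': '110'}
    DECODE = {bits: sym for sym, bits in CODE.items()}

    def decode_prefix(bitstring):
        # Because no codeword is a prefix of another, the first
        # codeword that matches is the only possible one.
        out, buf = [], ''
        for bit in bitstring:
            buf += bit
            if buf in DECODE:          # unique match: emit and reset
                out.append(DECODE[buf])
                buf = ''
        assert buf == '', "input ended mid-codeword"
        return ''.join(out)

    print(decode_prefix('0100101110'))  # -> 'ABCD'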
-
Minimum Average Code Length?
Average bits per symbol:

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code 1     0    100  101  110
  Code 2     0    10   110  111

  Code 1: .7(1) + .1(3) + .1(3) + .1(3) = 1.6 bits/symbol (down from 2)
  Code 2: .7(1) + .1(2) + .1(3) + .1(3) = 1.5 bits/symbol
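A small Python sketch reproducing both averages (variable names are mine, not the lecture's):

    freqs = {'A': 0.7, 'B': 0.1, 'C': 0.1, 'D': 0.1}
    code1 = {'A': '0', 'B': '100', 'C': '101', 'D': '110'}
    code2 = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

    def avg_length(freqs, code):
        # Expected bits per symbol: sum of frequency * codeword length
        return sum(p * len(code[s]) for s, p in freqs.items())

    print(f"{avg_length(freqs, code1):.1f}")  # 1.6
    print(f"{avg_length(freqs, code2):.1f}")  # 1.5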
-
Entropy of this code
-
Self-Information
If a symbol S has frequency p, its self-information is H(S) = lg(1/p) = -lg p.

  S     A     B     C     D
  p     .25   .25   .25   .25
  H(S)  2     2     2     2

  p     .7    .1    .1    .1
  H(S)  .51   3.32  3.32  3.32
-
First-Order Entropy of Source = Average Self-Information

  S        A     B     C     D      sum of -p lg p
  p        .25   .25   .25   .25
  -lg p    2     2     2     2
  -p lg p  .5    .5    .5    .5     2

  p        .7    .1    .1    .1
  -lg p    .51   3.32  3.32  3.32
  -p lg p  .357  .332  .332  .332   1.353

First-order entropy = the sum of -p lg p over all symbols.
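A minimal Python sketch of first-order entropy, reproducing both rows of the table (the function name is illustrative):

    import math

    def first_order_entropy(probs):
        # H = sum over symbols of -p * lg p, in bits per symbol
        return sum(-p * math.log2(p) for p in probs if p > 0)

    print(first_order_entropy([.25, .25, .25, .25]))  # 2.0
    print(first_order_entropy([.7, .1, .1, .1]))      # ~1.357 (1.353 with the slide's rounding)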
-
Entropy, Compressibility, Redundancy
Lower entropy: more redundant, more compressible, less information.
Higher entropy: less redundant, less compressible, more information.
A source of yeas and nays takes 24 bits per symbol but contains at most one bit per symbol of information:

  010110010100010101000001 = yea
  010011100100000110101001 = nay
-
Entropy and Compression

  Symbol     A    B    C    D
  Frequency  .7   .1   .1   .1
  Code       0    10   110  111

Average length for this code = .7(1) + .1(2) + .1(3) + .1(3) = 1.5
No code taking only symbol frequencies into account can be better than first-order entropy.
First-order entropy of this source = .7 lg(1/.7) + .1 lg(1/.1) + .1 lg(1/.1) + .1 lg(1/.1) = 1.353
First-order entropy of English is about 4 bits/character, based on typical English texts.
Efficiency of code = (entropy of source)/(average code length) = 1.353/1.5 ≈ 0.902
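Putting the two quantities together, a short sketch of the efficiency calculation (illustrative names; the slide's 0.902 uses the rounded entropy 1.353):

    import math

    freqs = {'A': 0.7, 'B': 0.1, 'C': 0.1, 'D': 0.1}
    code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

    avg_len = sum(p * len(code[s]) for s, p in freqs.items())  # 1.5
    entropy = sum(-p * math.log2(p) for p in freqs.values())   # ~1.357
    print(f"efficiency = {entropy / avg_len:.3f}")             # ~0.905 (0.902 with the slide's rounding)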
-
A Simple Prefix Code: Huffman Codes
Suppose we know the symbol frequencies. We can calculate the (first-order) entropy. Can we design a code to match?
There is an algorithm that transforms a set of symbol frequencies into a variable-length prefix code that achieves average code length approximately equal to the entropy.
David Huffman, 1951
-
Huffman Code Example
Symbols and frequencies: A .35, B .05, C .2, D .15, E .25
Build the tree by repeatedly merging the two lowest-frequency nodes:

  B (.05) + D (.15)  --> BD    (.2)
  BD (.2) + C (.2)   --> BCD   (.4)
  A (.35) + E (.25)  --> AE    (.6)
  AE (.6) + BCD (.4) --> ABCDE (1.0)
-
Huffman Code Example (continued)
Same tree, with the two branches of each merge labeled 0 and 1. Reading the labels from the root down to each leaf gives the codewords:

  A  00
  B  100
  C  11
  D  101
  E  01

  Entropy = 2.12, average length = 2.20 bits/symbol
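For concreteness, here is a compact Python sketch of the standard Huffman construction using a min-heap. Tie-breaking is arbitrary, so the exact codewords may differ from the slide's tree, but the codeword lengths and the 2.20 average match (the function name huffman is mine):

    import heapq

    def huffman(freqs):
        # Repeatedly merge the two lowest-frequency nodes.
        # Heap entries: (frequency, tiebreak id, {symbol: codeword so far}).
        heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(freqs.items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            p0, _, zero = heapq.heappop(heap)  # lowest frequency: branch 0
            p1, _, one = heapq.heappop(heap)   # next lowest: branch 1
            merged = {s: '0' + c for s, c in zero.items()}
            merged.update({s: '1' + c for s, c in one.items()})
            heapq.heappush(heap, (p0 + p1, next_id, merged))
            next_id += 1
        return heap[0][2]

    freqs = {'A': .35, 'B': .05, 'C': .2, 'D': .15, 'E': .25}
    code = huffman(freqs)
    print(code)  # e.g. {'C': '00', 'B': '010', 'D': '011', 'E': '10', 'A': '11'}
    avg = sum(p * len(code[s]) for s, p in freqs.items())
    print(f"{avg:.2f}")  # 2.20 bits/symbol, as on the slide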
-
Efficiency of Huffman Codes
Huffman codes are as efficient as possible if only first-order information (symbol frequencies) is taken into account.
A Huffman code is always within 1 bit/symbol of the entropy.
-
Second-Order Entropy
Second-order entropy of a source is calculated by treating digrams as single symbols, according to their frequencies.
Occurrences of q and u are not independent, so it is helpful to treat qu as one symbol.
Second-order entropy of English is about 3.3 bits/character.
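A minimal sketch of one reading of this definition: estimate digram frequencies from a sample text, compute the entropy of that distribution, and halve it, since each digram covers two characters (all names and the toy sample are illustrative assumptions):

    import math
    from collections import Counter

    def digram_entropy_per_char(text):
        # Treat overlapping digrams as single symbols; the entropy of
        # that distribution spans two characters, so divide by 2.
        digrams = [text[i:i + 2] for i in range(len(text) - 1)]
        total = len(digrams)
        counts = Counter(digrams)
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        return h / 2

    sample = "the quick brown fox jumps over the lazy dog " * 10  # toy stand-in
    print(digram_entropy_per_char(sample))  # real English text gives about 3.3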
-
How English Would Look Based on frequencies alone
(The number is the order of the approximation: 0 = letters at random, 1 = letter frequencies, 2 = digram frequencies, 3 = trigram frequencies.)

0: xfoml rxkhrjffjuj zlpwcfwkcyj ffjeyvkcqsghyd qpaamkbzaacibzlhjqd
1: ocroh hli rgwr nmielwis eu ll nbnesebya th eei alhenhttpa oobttva
2: On ie antsoutinys are t inctore st be s deamy achin d ilonasive tucoowe at
3: IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA
-
How English Would Look Based on word frequencies
(1 uses word frequencies alone; 2 uses word-to-word transition frequencies.)

1) REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE
2) THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED
-
What is the entropy of English?
Entropy is the limit of the information per symbol using single symbols, digrams, trigrams, and so on.
Not really calculable, because English is a finite language!
Nonetheless it can be determined experimentally using Shannon's game (a subject guesses the next letter of a text; the guesses measure the information).
Answer: a little more than 1 bit/character.
-
Shannon's Remarkable 1948 paper
-
Shannon's Source Coding Theorem
No code can achieve efficiency greater than 1, but
for any source, there are codes with efficiency as close to 1 as desired.
The proof does not give a method to find the best codes. It just sets a limit on how good they can be.
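In symbols (notation mine, assuming symbols drawn independently with fixed frequencies and first-order entropy H): every uniquely decodable code has average length L with H <= L, while Huffman-coding blocks of n symbols at a time achieves

    H \le \frac{L_n}{n} < H + \frac{1}{n}

so the per-symbol efficiency H / (L_n / n) approaches 1 as n grows, without ever exhibiting a single best code.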
-
Huffman coding used widely
E.g., JPEGs use Huffman codes for the pixel-to-pixel changes in color values.
Colors usually change gradually, so there are many small numbers (0, 1, 2, ...) in this sequence.
JPEGs sometimes use a fancier compression method called arithmetic coding.
Arithmetic coding produces 5% better compression.
-
Why don't JPEGs use arithmetic coding?
Because it is patented by IBM.

  United States Patent 4,905,297
  Langdon, Jr., et al.  February 27, 1990
  Arithmetic coding encoder and decoder system

  Abstract: Apparatus and method for compressing and de-compressing binary decision data by arithmetic coding and decoding wherein the estimated probability Qe of the less probable of the two decision events, or outcomes, adapts as decisions are successively encoded. To facilitate coding computations, an augend value A for the current number line interval is held to approximate ...

What if Huffman had patented his code?