ENTROPY (transcript of luc.devroye.org/HenriMertens-Entropy.pdf)
ENTROPY
(Notes by Henri Mertens)
ENTROPY: THE BASICS OF INFORMATION THEORY

Shannon's theory from 1948.

- Shannon's view
- Lower bounds for compression
- Entropy E = Σ p_i log2(1/p_i)
- The global view
- Back to prefix codes
- Lempel-Ziv compression
Shannon's view

input file → COMPRESSOR → compressed file
(reproduced exactly: a lossless compression)

Expected length of compressed file = Σ p_i l_i, where p_i = probability of seeing input file i and l_i = length of the compressed version of file i.

[Figure: input files 1, 2, 3, 4 with their probabilities p_i.]
So, the best compression method, given the p_i's, is the Huffman code.

BUT . . . the tree is too large! And the p_i's are often not known precisely.

Nevertheless, we know a lot about the best compression method: Shannon's theorem.

E ≤ min over all binary trees of Σ p_i l_i ≤ E + 1,

where E = Σ p_i log2(1/p_i) = (binary) entropy.
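As a quick numeric illustration of the theorem, here is a minimal sketch (the function name is my own) computing the binary entropy of a distribution:

```python
import math

def entropy(p):
    """Binary entropy E = sum of p_i * log2(1/p_i)."""
    return sum(pi * math.log2(1 / pi) for pi in p if pi > 0)

# For a dyadic distribution the bounds are tight at the lower end:
p = [0.5, 0.25, 0.125, 0.125]
E = entropy(p)   # 1.75
# Huffman depths for these p_i are 1, 2, 3, 3, so sum p_i * l_i = 1.75 = E.
```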
KRAFT'S INEQUALITY

Let l_i be the depths of the leaves in a binary tree. Then

Σ 2^(-l_i) ≤ 1.

Proof (by induction: exercise).

[Figure: a binary tree annotated with its leaf depths.]
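A one-line check of the inequality (a sketch; the depths below describe one concrete tree):

```python
# Leaf depths of the binary tree with one leaf at depth 1, one at depth 2,
# and two at depth 3 (a full tree, so the Kraft sum is exactly 1).
depths = [1, 2, 3, 3]
kraft_sum = sum(2.0 ** -l for l in depths)
assert kraft_sum <= 1.0   # Kraft's inequality
```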
PROOF OF SHANNON'S THEOREM (LOWER BOUND)

Σ p_i l_i − E = Σ p_i l_i + Σ p_i log2(p_i)
             = −Σ p_i log2(2^(−l_i) / p_i)
             ≥ −(1/ln 2) Σ p_i (2^(−l_i)/p_i − 1)     (using ln x ≤ x − 1)
             = −(1/ln 2) (Σ 2^(−l_i) − 1)
             ≥ 0                                      (by Kraft's inequality)

So Σ p_i l_i ≥ E.

Recall: ln x ≤ x − 1 for all x > 0.
[Figure: the graph of ln x lying below the line x − 1.]
(UPPER BOUND)

By the converse of Kraft: if Σ 2^(-l_i) ≤ 1, then there exists a binary tree with leaves at these depths (exercise).

So, given the p_i, set

l_i = ⌈log2(1/p_i)⌉.

Then Σ 2^(-l_i) ≤ Σ p_i = 1.

So, we can use these l_i to make a tree. Let that tree define the code; it is called the Shannon-Fano code. The expected length is

Σ p_i l_i = Σ p_i ⌈log2(1/p_i)⌉ ≤ Σ p_i log2(1/p_i) + Σ p_i = E + 1.

So, there is a code with expected length ≤ E + 1.
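A sketch checking both steps numerically for one distribution (names are my own):

```python
import math

def shannon_fano_lengths(p):
    """l_i = ceil(log2(1/p_i)), so 2^(-l_i) <= p_i and sum 2^(-l_i) <= 1."""
    return [math.ceil(math.log2(1 / pi)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]
lengths = shannon_fano_lengths(p)
assert sum(2.0 ** -l for l in lengths) <= 1.0      # converse of Kraft applies
E = sum(pi * math.log2(1 / pi) for pi in p)
expected_len = sum(pi * l for pi, l in zip(p, lengths))
assert E <= expected_len <= E + 1                  # Shannon's bounds hold
```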
The global view

If the input consists of M independent symbols from an alphabet A, with symbol probabilities p_i (an unrealistic assumption), then each symbol can be coded via Huffman, and we have a total expected length ≤ M × (E + 1), where E = entropy of one symbol. Lower bound: ≥ M × E (entropy of file = sum of entropies of the symbols).

It helps to group the symbols in groups of k (a small number), and Huffman-code each group.

Or: one could use Lempel-Ziv compression (see later).
Solution 1: Huffman coding on groups of characters

E.g.: group the letters in sets of 3: (a,e,b), (b,c,a), . . .
Get the p_i by counting occurrences in a file.
Construct the Huffman code.
Code and decode as for prefix codes.
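A compact sketch of the construction (standard greedy merging of the two least frequent subtrees; names are mine):

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a prefix code from a {symbol: count} dict by repeatedly
    merging the two least frequent subtrees (the Huffman algorithm)."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)                    # unique tiebreaker for equal counts
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + bits for s, bits in c1.items()}
        merged.update({s: "1" + bits for s, bits in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Get the p_i by counting occurrences, as on the slide:
code = huffman_code(Counter("abracadabra"))
```

The resulting code is prefix-free, and the most frequent symbol ('a' here) gets the shortest codeword.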
RECALL (decoding a prefix code)

Start at the root and descend to a leaf. Decode. Repeat.

Compressed sequence: 0000 1111 0101 0101
[Figure: the sequence decoded symbol by symbol against the code tree.]

Time to decode = length of the compressed sequence.
The decoder:

Given the compressed sequence s.
Given the binary tree t with the code: its root is t; its leaves contain symbols of the alphabet A.

x ← t    (traveling pointer in t)
while |s| > 0:
    b ← get next bit from s
    if b = 0 then x ← left[x] else x ← right[x]
    if x is a leaf then output key[x]; x ← t
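The same loop in Python, assuming the tree is stored as nested tuples (a hypothetical encoding; any node representation works):

```python
# A leaf is ("leaf", symbol); an internal node is ("node", left, right).
# This tree encodes the (hypothetical) code a -> 0, b -> 10, c -> 11.
t = ("node", ("leaf", "a"), ("node", ("leaf", "b"), ("leaf", "c")))

def decode(s, t):
    """Walk the tree bit by bit; output a symbol and reset at each leaf."""
    out, x = [], t
    for bit in s:
        x = x[1] if bit == "0" else x[2]   # left on 0, right on 1
        if x[0] == "leaf":
            out.append(x[1])
            x = t                          # back to the root
    return "".join(out)

# decode("010110", t) -> "abca"
```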
LEMPEL-ZIV COMPRESSION (Lempel and Ziv, 1977) (Solution 2)

Examples: zip, jpeg, most compression methods.

Feature: Under generous input file assumptions, the expected length of the compressed sequence is close to E.

Method: Parse the input into the smallest pieces never seen before.

INPUT (parsed):  a | aa | b | ab | abc | c | bb | . . .
PIECE #:         1    2   3    4     5   6    7

Piece 0 is the empty piece. Each new piece = an earlier piece + one new final symbol.
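A sketch of the parsing rule (LZ78-style; the example string is my own, chosen so the parse starts (0,a) (1,a) (0,b) as on the later slides):

```python
def lz_parse(text):
    """Split text into the smallest pieces never seen before.
    Each piece is emitted as (pointer to an earlier piece, new final symbol);
    piece 0 is the empty piece. A trailing partial piece is dropped here."""
    dictionary = {"": 0}            # piece string -> piece number
    pieces, cur = [], ""
    for ch in text:
        if cur + ch in dictionary:
            cur += ch               # still a known piece; keep extending
        else:
            pieces.append((dictionary[cur], ch))
            dictionary[cur + ch] = len(dictionary)
            cur = ""
    return pieces

pieces = lz_parse("aaabab")         # pieces: a | aa | b | ab
```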
THE BINARY SEQUENCE

For the k-th piece, we have (y, x):
y = an integer in {0, . . . , k−1} (the earlier piece it extends),
x = a symbol from the alphabet A.

y needs ⌈log2 k⌉ bits.

piece #:     1  2  3  4  5  6  7  8  9  10  11  12
⌈log2 k⌉:    0  1  2  2  3  3  3  3  4   4   4   4

x needs a fixed number of bits: ⌈log2 |A|⌉.

In the output, all bits are clearly identified.
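The pointer-width row above is just ⌈log2 k⌉; a one-liner reproduces it:

```python
import math

# Bits needed for the pointer of the k-th piece: an integer in {0, ..., k-1}.
pointer_bits = [math.ceil(math.log2(k)) for k in range(1, 13)]
# Matches the table: 0 1 2 2 3 3 3 3 4 4 4 4
```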
DATA STRUCTURE FOR CODING / DECODING: THE DIGITAL SEARCH TREE
(an "|A|-ary trie")

INPUT (parsed):  a | aa | b | ab | abc | c | bb | . . .
PIECE #:         1    2   3    4     5   6    7

[Figure: the digital search tree built from the pieces; each piece is a path from the root, and each new piece adds one leaf.]
In the parsing phase: start at the root and descend to a leaf; add a symbol (and a new leaf) to add a piece.

Exercise: If the input is of size n, then write the parsing algorithm that produces
(a) the tree, and
(b) the sequence (0,a) (1,a) (0,b) . . .
and show that it takes time O(n).
In the decoding phase: keep a table of pointers.

piece #   pointer   symbol
   1         0         a
   2         1         a
   3         0         b
   4         1         b
   5         4         c
   6         0         c
   7         3         b
  . . .

Decode a piece by following pointers back to piece 0: e.g. piece 10 = (6, a) expands via piece 6 = (0, c) to "ca".

Exercise: Write an O(n) algorithm for decoding.
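A sketch of the table-of-pointers decoder (it inverts the parse; the example pairs are my own, written in the (pointer, symbol) form used above):

```python
def lz_decode(pieces):
    """Rebuild the text: piece k = (pointer, symbol) expands to
    table[pointer] + symbol, using only pieces built so far."""
    table = [""]                    # piece 0 is the empty piece
    out = []
    for pointer, symbol in pieces:
        piece = table[pointer] + symbol
        table.append(piece)
        out.append(piece)
    return "".join(out)

text = lz_decode([(0, "a"), (1, "a"), (0, "b"), (1, "b")])   # "aaabab"
```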