Lecture 2: Basic Information Theory TSBK01 Image Coding and Data Compression Jörgen Ahlberg Div. of...
Lecture 2: Basic Information Theory
TSBK01 Image Coding and Data Compression
Jörgen Ahlberg, Div. of Sensor Technology
Swedish Defence Research Agency (FOI)
Today
1. What is information theory about?
2. Stochastic (information) sources.
3. Information and entropy.
4. Entropy for stochastic sources.
5. The source coding theorem.
Part 1: Information Theory
Claude Shannon: A Mathematical Theory of Communication,
The Bell System Technical Journal, 1948.
Sometimes referred to as "Shannon-Weaver", since the standalone publication has a foreword by Weaver. Be careful!
Quotes about Shannon
"What is information? Sidestepping questions about meaning, Shannon showed that it is a measurable commodity."
"Today, Shannon's insights help shape virtually all systems that store, process, or transmit information in digital form, from compact discs to computers, from facsimile machines to deep space probes."
"Information theory has also infiltrated fields outside communications, including linguistics, psychology, economics, biology, even the arts."
[Block diagram: Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink/receiver]
- Source: any source of information.
- Source coder: change to an efficient representation, i.e., data compression.
- Channel coder: change to an efficient representation for transmission, i.e., error control coding.
- Channel: anything transmitting or storing information – a radio link, a cable, a disk, a CD, a piece of paper, …
- Channel decoder: recover from channel distortion.
- Source decoder: uncompress.
Fundamental Entities
[Same block diagram: Source → Source coder → Channel coder → Channel → Channel decoder → Source decoder → Sink/receiver]
- H: the information content of the source.
- R: the rate from the source coder.
- C: the channel capacity.
Shannon 2: Source coding and channel coding can be optimized independently, and binary symbols can be used as intermediate format. Assumption: arbitrarily long delays.
Fundamental Theorems
[Same block diagram, annotated with H, R, and C]
Shannon 1: Error-free transmission is possible if R ≥ H (the source coding theorem, simplified) and C ≥ R (the channel coding theorem, simplified).
Part 2: Stochastic Sources
A source outputs symbols X1, X2, …
Each symbol takes its value from an alphabet A = (a1, a2, …).
Model: P(X1, …, XN) is assumed to be known for all combinations.

Source → X1, X2, …

Example 1: A text is a sequence of symbols, each taking its value from the alphabet A = (a, …, z, A, …, Z, 1, 2, …, 9, !, ?, …).
Example 2: A (digitized) grayscale image is a sequence of symbols, each taking its value from the alphabet A = (0, 1) or A = (0, …, 255).
Two Special Cases
1. The memoryless source: each symbol is independent of the previous ones.
   P(X1, X2, …, Xn) = P(X1) · P(X2) · … · P(Xn)
2. The Markov source: each symbol depends on the previous one.
   P(X1, X2, …, Xn) = P(X1) · P(X2|X1) · P(X3|X2) · … · P(Xn|Xn−1)
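The two factorizations above can be sketched in code. The symbol probabilities and the transition probabilities below are illustrative assumptions, not values given in the lecture:

```python
# Joint probability of a symbol sequence under the two source models.
# All probability values below are hypothetical, chosen for illustration.

# Memoryless source: P(x1,...,xn) = P(x1) * P(x2) * ... * P(xn)
p = {"a": 0.5, "b": 0.3, "c": 0.2}  # assumed symbol probabilities

def prob_memoryless(seq):
    prod = 1.0
    for x in seq:
        prod *= p[x]
    return prod

# Markov source: P(x1,...,xn) = P(x1) * P(x2|x1) * ... * P(xn|x_{n-1})
trans = {("a", "a"): 0.3, ("a", "b"): 0.7,  # assumed transition probabilities
         ("b", "c"): 1.0,
         ("c", "a"): 0.5, ("c", "b"): 0.3, ("c", "c"): 0.2}

def prob_markov(seq, p_first):
    prod = p_first
    for prev, cur in zip(seq, seq[1:]):
        prod *= trans.get((prev, cur), 0.0)  # missing arc = probability 0
    return prod

print(prob_memoryless("abc"))   # 0.5 * 0.3 * 0.2
print(prob_markov("abc", 0.5))  # 0.5 * 0.7 * 1.0
```

Note how memory changes the result: the same sequence "abc" is far more probable under the Markov model, because b always follows a deterministic path to c.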
The Markov Source
A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.
[State diagram: a ternary source with alphabet A = (a, b, c); the arcs carry the transition probabilities 0.3, 0.7, 1.0, 0.5, 0.3, 0.2.]
The Markov Source
Assume we are in state a, i.e., Xk = a. The probabilities for the next symbol are:
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0
The Markov Source
So, if Xk+1 = b, we know that Xk+2 will equal c:
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1
The Markov Source
If all the states can be reached, the stationary probabilities for the states can be calculated from the given transition probabilities.
Markov models can also represent sources with dependencies more than one step back: use a state diagram with several symbols in each state.
Stationary probabilities? Those are the probabilities πi = P(Xk = ai) for any k when Xk−1, Xk−2, … are not given.
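A minimal sketch of computing the stationary probabilities by power iteration, i.e., applying the transition matrix until π = πP. The rows for states a and b follow the state diagram in the slides; the row for state c is an assumed assignment of the remaining arc probabilities 0.5, 0.3, 0.2:

```python
# Stationary probabilities pi_i = P(X_k = a_i) of a Markov source,
# found by power iteration: repeatedly apply P until pi = pi * P.
P = [[0.3, 0.7, 0.0],   # from a: stay in a with 0.3, go to b with 0.7
     [0.0, 0.0, 1.0],   # from b: always go to c
     [0.5, 0.3, 0.2]]   # from c: assumed split of the remaining probabilities

pi = [1/3, 1/3, 1/3]    # any starting distribution works
for _ in range(200):
    pi = [sum(pi[k] * P[k][l] for k in range(3)) for l in range(3)]

print([round(x, 4) for x in pi])  # stationary distribution (pi_a, pi_b, pi_c)
```

For an ergodic chain the iteration converges to the unique fixed point of π = πP, regardless of the starting distribution.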
Analysis and Synthesis
Stochastic models can be used for analysing a source: find a model that represents the real-world source well, and then analyse the model instead of the real world.
Stochastic models can be used for synthesizing a source: use a random number generator in each step of a Markov model to generate a sequence simulating the source.
Show plastic slides!
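The synthesis idea can be sketched directly: drive the Markov model with a random number generator. Transitions from a and b follow the slides' state diagram; the transitions from c are an assumed split of the remaining probabilities:

```python
import random

# Synthesis: generate a sequence simulating the Markov source by drawing
# each next state from the current state's transition probabilities.
trans = {"a": [("a", 0.3), ("b", 0.7)],
         "b": [("c", 1.0)],                            # b always goes to c
         "c": [("a", 0.5), ("b", 0.3), ("c", 0.2)]}    # assumed split from c

def synthesize(n, state="a", seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        out.append(state)
        symbols, weights = zip(*trans[state])
        state = rng.choices(symbols, weights=weights)[0]
    return "".join(out)

print(synthesize(20))
```

Any structural property of the model should show up in the output; here, for example, every b is followed by a c.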
Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
- If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
- If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
- If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
- Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
So, let's look at it the way Shannon did. Assume a memoryless source with
- alphabet A = (a1, …, an)
- symbol probabilities (p1, …, pn).
How much information do we get when finding out that the next symbol is ai?
According to Shannon, the self-information of ai is
i(ai) = −log pi
Why? Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB. For both events to happen, the probability is pA · pB. However, the amounts of information should be added, not multiplied. Logarithms satisfy this! And we want the information to increase with decreasing probability, so we use the negative logarithm.
Self Information
[Examples 1 and 2: numerical self-information calculations.]
Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get Hartleys; if you pick the 2-log (like everyone else), you'll get bits.
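A short sketch of the definition and the choice of logarithm base:

```python
import math

# Self-information i(a_i) = -log p_i, in units set by the logarithm base:
# base 2 gives bits, base e gives nats, base 10 gives Hartleys.
def self_information(p, base=2):
    return -math.log(p, base)

print(self_information(0.5))          # fair coin flip: exactly 1 bit
print(self_information(1.0))          # certain outcome: zero information
print(self_information(0.9))          # likely outcome: less than 1 bit
print(self_information(0.5, math.e))  # the same flip measured in nats
```

The additivity motivation can be checked directly: two independent fair coin flips have joint probability 0.25, and self_information(0.25) equals twice self_information(0.5).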
Entropy
On average over all the symbols, we get:
H(X) = −Σ pi log pi
H(X) is called the first-order entropy of the source. This can be regarded as the degree of uncertainty about the following symbol.

Example: Binary Memoryless Source
BMS → 0 1 1 0 1 0 0 0 …
Let P(1) = p. Then H(X) = −p log p − (1 − p) log(1 − p), often denoted h(p).
[Plot of h(p) for 0 ≤ p ≤ 1: the uncertainty (information) is greatest when p = 0.5.]
Entropy: Three Properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. The difference log N − H is called the redundancy of the source.
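The three properties can be checked numerically. The skewed distribution below is a made-up example:

```python
import math

# First-order entropy H(X) = -sum p_i log2 p_i, illustrating the properties
# 0 <= H <= log2 N, maximum for equiprobable symbols, redundancy = log2 N - H.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4             # N = 4 equiprobable symbols
skewed = [0.7, 0.1, 0.1, 0.1]    # hypothetical skewed distribution, same N

print(entropy(uniform))                 # log2 4 = 2 bits: maximum entropy
print(entropy(skewed))                  # strictly less than 2 bits
print(math.log2(4) - entropy(skewed))   # redundancy of the skewed source
```

The redundancy is exactly what a good source coder can remove: the more skewed the distribution, the fewer bits per symbol are actually needed.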
Part 4: Entropy for Memory Sources
Assume a block of source symbols (X1, …, Xn) and define the block entropy:
Hn = −Σ P(X1, …, Xn) log P(X1, …, Xn)
where the summation is over all possible combinations of n symbols.
The entropy for a memory source is defined as:
H = lim n→∞ (1/n) Hn
That is, let the block length go towards infinity, and divide by n to get the number of bits/symbol.
Entropy for a Markov Source
The entropy for a state Sk can be expressed as
Hk = −Σl Pkl log Pkl
where Pkl is the transition probability from state k to state l. Averaging over all states with the stationary probabilities πk, we get the entropy for the Markov source as
H = Σk πk Hk
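A sketch of this computation for the ternary source used earlier. The rows for a and b follow the slides' state diagram; the row for c and the stationary probabilities derived from it are assumptions:

```python
import math

# Entropy of a Markov source: H_k = -sum_l P_kl log2 P_kl per state,
# then H = sum_k pi_k * H_k, weighted by the stationary probabilities.
P = [[0.3, 0.7, 0.0],   # from a (per the slides)
     [0.0, 0.0, 1.0],   # from b (per the slides): deterministic, so H_b = 0
     [0.5, 0.3, 0.2]]   # from c: assumed split

def state_entropy(row):
    return -sum(p * math.log2(p) for p in row if p > 0)

# Stationary distribution solved from pi = pi * P for this assumed matrix.
pi = [0.2841, 0.3182, 0.3977]

H = sum(pi_k * state_entropy(row) for pi_k, row in zip(pi, P))
print(round(H, 4))   # bits per symbol
```

Note that state b contributes nothing to H: a deterministic transition carries no uncertainty, exactly as the slides argued with P(Xk+2 = c | Xk+1 = b) = 1.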
The Run-length Source
Certain sources generate long runs or bursts of equal symbols.
Example: [two-state diagram with states A and B]
If a run ends at each step with probability ρ, the probability for a burst of length r is P(r) = (1 − ρ)^(r−1) · ρ.
Entropy: HR = −Σ r=1…∞ P(r) log P(r). If the average run length is μ, then HR/μ = HM.
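The relation HR/μ = HM can be checked numerically. The run-ending probability below is an arbitrary assumed value:

```python
import math

# Run-length source: the run ends at each step with probability rho, so the
# run length is geometric, P(r) = (1-rho)^(r-1) * rho, with mean mu = 1/rho.
# Numerically, H_R / mu should match the per-step binary entropy h(rho).
rho = 0.2                    # assumed run-ending probability
mu = 1 / rho                 # average run length

# Truncated entropy of the run-length distribution (the tail is negligible).
H_R = -sum((1 - rho) ** (r - 1) * rho * math.log2((1 - rho) ** (r - 1) * rho)
           for r in range(1, 2000))

h = -(rho * math.log2(rho) + (1 - rho) * math.log2(1 - rho))
print(H_R / mu, h)           # the two values should agree
```

Intuitively: one run of average length μ carries HR bits, so per symbol the source delivers HR/μ bits, the same as the underlying per-step entropy.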
Part 5: The Source Coding Theorem
The entropy is the smallest number of bits allowing error-free representation of the source.
Why is this? Let's take a look at typical sequences!
Typical Sequences
Assume a long sequence from a binary memoryless source with P(1) = p.
Among n bits, there will be approximately w = n · p ones.
Thus, there are M = (n over w) such typical sequences!
Only these sequences are interesting; all other sequences appear with smaller probability the larger n is.

How many are the typical sequences?
For large n, M = (n over np) ≈ 2^(n·h(p)), so enumeration needs log M ≈ n · h(p) bits, i.e., h(p) = H(X) bits per symbol!
How many bits do we need?
Thus, we need H(X) bits per symbol to code any typical sequence!
The Source Coding Theorem
Does tell us
- that we can represent the output from a source X using H(X) bits/symbol,
- that we cannot do better.
Does not tell us
- how to do it.
Summary
- The mathematical model of communication: source, source coder, channel coder, channel, …; rate, entropy, channel capacity.
- Information-theoretical entities: information, self-information, uncertainty, entropy.
- Sources: BMS, Markov, RL.
- The Source Coding Theorem.