Recognition
stimulus (input) → Observer (information transmission channel) → response
• Response: which category does the stimulus belong to?
What is the “information value” of recognizing the category?
[Figure: receiving a signal shrinks the space of possible signals; learning only "not here" may reduce the area to 63/64 (little information), while a more specific signal reduces it to 1/2 or even 1/64 (much information)]
• The amount of information gained by receiving the signal is proportional to the ratio of these two areas
Prior information (possible space of signals)
Posterior information (possible space after the signal is received)
The less likely the outcome, the more information is gained!
The information in a symbol s should be inversely proportional to the probability of the symbol p.
Basics of Information Theory
Claude Elwood Shannon (1916-2001)
Also built a juggling machine, rocket-powered Frisbees, motorized Pogo sticks, a device that could solve the Rubik's Cube puzzle, …
• Observe the output message
• Try to reconstruct the input message (gain new information)
Measuring the information
1) Information must be positive: i(p) > 0
2) Information from independent events (i.e. when probabilities multiply) must add: i(p1 p2) = i(p1) + i(p2)

Both requirements are satisfied by taking the information in an event to be
i(p) = \log(1/p) = -\log(p)
Multiplication turns into addition, and the result is always positive (since p < 1).
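As a minimal sketch, these two requirements can be checked numerically in Python (the probabilities and the helper name are just illustrative):

```python
import math

def information(p):
    """Information (in bits) carried by an event of probability p."""
    return -math.log2(p)

p1, p2 = 0.5, 0.125                 # illustrative probabilities
print(information(p1))              # 1.0 bit  (always positive, since p < 1)
print(information(p2))              # 3.0 bits (the rarer event carries more information)
# independent events: probabilities multiply, information adds
print(information(p1 * p2))         # 4.0 bits = 1.0 + 3.0
```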
If the message consists of (very many) M characters from alphabet S, which consists of n symbols, there will be about Mp1 occurrences of the first symbol, Mp2 of the second, etc.

Then the probability of the message is
P = p_1^{Mp_1} \cdot p_2^{Mp_2} \cdots p_n^{Mp_n} = \left( p_1^{p_1} \cdot p_2^{p_2} \cdots p_n^{p_n} \right)^M

The information (logarithm of the inverse probability) in the whole message will be
\log(1/P) = -M \sum_{i=1}^{n} p_i \log(p_i)

and the information per character (entropy of the alphabet S)
H(S) = \frac{1}{M} \log(1/P) = -\sum_{i=1}^{n} p_i \log(p_i)
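A small simulation (with a made-up three-symbol alphabet) illustrates this: for a long random message, the information per character computed from log2(1/P) approaches the entropy of the alphabet:

```python
import math
import random

probs = {'a': 0.5, 'b': 0.25, 'c': 0.25}    # illustrative alphabet S with n = 3 symbols
M = 100_000                                  # number of characters in the message

# generate a random message of M characters drawn from S
message = random.choices(list(probs), weights=list(probs.values()), k=M)

# information of the whole message: log2(1/P) = -sum over characters of log2 p(character)
total_info = -sum(math.log2(probs[ch]) for ch in message)
print(total_info / M)                        # information per character of the message

# entropy of the alphabet: H(S) = -sum p_i log2 p_i
H = -sum(p * math.log2(p) for p in probs.values())
print(H)                                     # 1.5 bits; the two values agree for large M
```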
1 bit of information reduces the area of possible messages to half
When the logarithm is taken to base 2 (log2), the entropy is measured in bits.

Entropy
H(S) = -\sum_{i=1}^{n} p_i \log(p_i)
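A minimal Python helper for this formula, using log base 2 so the result is in bits (the helper name and example distributions are illustrative):

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution (zero-probability symbols contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits
print(entropy([1.0, 0.0]))          # 0.0 bits: a certain outcome carries no information
```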
Information gained when deciding among N (equally likely) alternatives:

Number of stimulus alternatives N    Number of bits (log2 N)
2^1 = 2                              1
2^2 = 4                              2
2^3 = 8                              3
2^4 = 16                             4
2^5 = 32                             5
2^6 = 64                             6
2^7 = 128                            7
2^8 = 256                            8
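The table follows directly from log2 N; a one-line check:

```python
import math

for n in (2, 4, 8, 16, 32, 64, 128, 256):
    print(f"{n:3d} alternatives -> {math.log2(n):.0f} bits")
```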
H = -\sum_{i=1}^{n} p_i \log_2(p_i)

For experiments with two possible outcomes with probabilities p1 and p2, the total probability must be 1, so p2 = 1 - p1, and

H = -p1 log2 p1 - (1 - p1) log2 (1 - p1)

Since \lim_{p \to 0} p \log_2 p = 0, H = 0 for p1 = 0 (the second outcome certain) or p1 = 1 (the first outcome certain).

For p1 = 0.5, p2 = 0.5:
H = -0.5 log2 0.5 - 0.5 log2 0.5 = -log2 0.5 = 1 bit
Entropy H (information) is maximum when the outcome is the least predictable !
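A short sketch of the two-outcome case (the helper name is illustrative), showing H = 0 at the certain outcomes and the maximum of 1 bit at p1 = 0.5:

```python
import math

def binary_entropy(p1):
    """Entropy in bits of an experiment with two outcomes of probabilities p1 and 1 - p1."""
    h = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0:                    # lim p->0 of p*log2(p) = 0
            h -= p * math.log2(p)
    return h

for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(p1, round(binary_entropy(p1), 3))
# H = 0 when either outcome is certain; H peaks at 1 bit for p1 = 0.5
```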
For a given alphabet S, the entropy (i.e. the information) will be highest when all symbols are equally likely:

H_{MAX} = -\sum_{i=1}^{n} p_i \log(p_i) = \sum_{i=1}^{n} \frac{1}{n} \log(n) = \log n
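A quick numerical illustration (with arbitrary example distributions) that for a fixed alphabet size n the uniform distribution reaches the bound log2 n while skewed distributions stay below it:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 4
print(math.log2(n))                          # 2.0 bits: the upper bound log2 n
print(entropy([0.25, 0.25, 0.25, 0.25]))     # 2.0 bits: the uniform distribution reaches it
print(entropy([0.7, 0.1, 0.1, 0.1]))         # ~1.357 bits: a skewed distribution stays below
print(entropy([0.97, 0.01, 0.01, 0.01]))     # ~0.242 bits: a nearly certain outcome, far below
```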
1st or 2nd half ?
Equal prior probability of each category.
need 3 binary numbers (3 bits) to describe 2^3 = 8 categories
need more bits when dealing with symbols that are not all equally likely
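A toy sketch of the "1st or 2nd half?" game: with 2^3 = 8 equally likely categories, three yes/no answers always suffice (the categories below are just the numbers 0 to 7):

```python
categories = list(range(8))                  # 8 equally likely categories (0..7)

def questions_needed(target, items):
    """Identify target by repeatedly asking 'is it in the 1st or the 2nd half?'."""
    count = 0
    while len(items) > 1:
        half = len(items) // 2
        items = items[:half] if target in items[:half] else items[half:]
        count += 1
    return count

print(max(questions_needed(t, categories) for t in categories))   # 3 questions = log2(8) bits
```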
The Bar Code
[Figure: bar code example, 5 bits]
The number of "digits" (fingers) k = 6: H = log2 k = log2 6 ≈ 2.58 bits.
If the order of the fingers ("digits") is meaningful, there can be 2^5 = 32 combinations: H' = log2 32 = 5 bits !!!!!
More efficient (but less "robust") code.
With no noise in the channel, p(xi|yi) = 1 and p(xi,yj) = 0 for i ≠ j
[Diagram: noiseless binary channel, p(x) → p(y|x) → p(y); p(y1|x1) = p(y2|x2) = 1 and p(y2|x1) = p(y1|x2) = 0, so p(y1) = p(x1) and p(y2) = p(x2)]
With noise, p(xi|yi) < 1 and p(xi,yj) > 0 for i ≠ j
[Diagram: noisy binary channel with p(x1) = 0.8, p(x2) = 0.2 and transition probabilities p(y1|x1) = 5/8, p(y2|x1) = 3/8, p(y1|x2) = 1/4, p(y2|x2) = 3/4]
p(y1) = (5/8 × 0.8) + (1/4 × 0.2) = 0.55
p(y2) = (3/8 × 0.8) + (3/4 × 0.2) = 0.45
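The output probabilities are simply the transition probabilities weighted by the input probabilities; a minimal check of the numbers above:

```python
# transition probabilities p(y|x): rows are inputs x1, x2; columns are outputs y1, y2
p_y_given_x = [[5/8, 3/8],
               [1/4, 3/4]]
p_x = [0.8, 0.2]                                       # input probabilities

p_y = [sum(p_x[i] * p_y_given_x[i][k] for i in range(2)) for k in range(2)]
print([round(p, 2) for p in p_y])                      # [0.55, 0.45], as in the diagram
```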
Information transfer through a communication channel
[Diagram: transmitter (source) → channel → receiver, with noise entering in the channel; p(X) → p(Y|X) → p(Y)]

Two-element (binary) channel
[Diagram: x1 → y1 with p(y1|x1), x1 → y2 with p(y2|x1), x2 → y1 with p(y1|x2), x2 → y2 with p(y2|x2); p(x1), p(x2) at the input, p(y1), p(y2) at the output]
Binary Channel
             response 1   response 2   number of stimuli
stimulus 1   N11          N12          Nstim 1
stimulus 2   N21          N22          Nstim 2
number of
responses    Nres 1       Nres 2       N (total number of stimuli or responses)

p(xj) = Nstim j / N
p(yk) = Nres k / N
joint probability that both xj and yk happen: p(xj, yk) = Njk / N
p(xj|yk) = Njk / Nres k
Stimulus-Response Confusion Matrix (called stimulus xj, received response yk):

         y1        y2        …    yn        total
x1       N11       N12       …    N1n       Nstim 1
x2       N21       N22       …    N2n       Nstim 2
…
xn       Nn1       Nn2       …    Nnn       Nstim n
total    Nres 1    Nres 2    …    Nres n    N

number of j-th stimuli: Σk Njk = Nstim j
number of k-th responses: Σj Njk = Nres k
number of called stimuli = number of responses: Σk Nres k = Σj Nstim j = N

probability of the xj-th symbol: p(xj) = Nstim j / N
joint probability that both xj and yk happen: p(xj, yk) = Njk / N
conditional probability that xj was sent when yk was received: p(xj|yk) = Njk / Nres k
probability of the yk-th symbol: p(yk) = Nres k / N
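These definitions map directly onto code; a sketch (using NumPy, with made-up counts for a 2-by-2 confusion matrix) that derives all four probabilities from the counts Njk:

```python
import numpy as np

# made-up stimulus-response counts N_jk: 2 stimuli (rows), 2 responses (columns), 20 trials
N = np.array([[8., 2.],
              [1., 9.]])

total = N.sum()                     # N = 20: total number of called stimuli (= responses)
p_x  = N.sum(axis=1) / total        # p(x_j)   = Nstim_j / N   -> [0.5, 0.5]
p_y  = N.sum(axis=0) / total        # p(y_k)   = Nres_k  / N   -> [0.45, 0.55]
p_xy = N / total                    # p(x_j, y_k) = N_jk / N
p_x_given_y = N / N.sum(axis=0)     # p(x_j | y_k) = N_jk / Nres_k (each column sums to 1)

print(p_x, p_y)
print(p_xy)
print(p_x_given_y)
```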
entropy of the input
H(X) = -\sum_{j=1}^{n} p(x_j) \log_2 p(x_j)

entropy of the output
H(Y) = -\sum_{k=1}^{n} p(y_k) \log_2 p(y_k)

joint entropy of the input and the output
H(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j, y_k) \log_2 p(x_j, y_k)

maximum entropy given the input and the output
Hmax(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j) p(y_k) \log_2 [p(x_j) p(y_k)]
This happens when the input and the output are independent (the joint probabilities are given by products of the individual probabilities); there is then no relation of the output to the input, i.e. no information transfer.
information transferred by the system
I(X;Y) = Hmax(X,Y) - H(X,Y)
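A sketch (using NumPy) that collects these definitions into one helper; the name channel_summary is just illustrative. Applied to the confusion matrices in the worked examples below, it reproduces the hand-computed numbers.

```python
import numpy as np

def channel_summary(counts):
    """Entropies and transferred information from a stimulus-response count matrix."""
    p_xy = np.asarray(counts, dtype=float) / np.sum(counts)   # joint probabilities p(x_j, y_k)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)             # stimulus and response probabilities

    def H(p):                                                 # entropy (bits) of a probability array
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    H_joint = H(p_xy.ravel())                                 # H(X,Y)
    H_max = H(np.outer(p_x, p_y).ravel())                     # Hmax(X,Y): independent input and output
    return {"H(X)": H(p_x), "H(Y)": H(p_y),
            "H(X,Y)": H_joint, "Hmax(X,Y)": H_max,
            "I(X;Y)": H_max - H_joint}
```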
Run the experiment 20 times, get it always RIGHT:

         stim 1   stim 2
resp 1   10       0        10
resp 2   0        10       10
         10       10       20

input probabilities: p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint probabilities p(xj, yk):
0.5   0
0     0.5

joint entropy of the input and the output
H(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j, y_k) \log_2 p(x_j, y_k) = 0.5×1 + 0×0 + 0.5×1 + 0×0 = 1 bit

probabilities of independent events p(xj)p(yk):
0.25   0.25
0.25   0.25

maximum entropy given the input and the output
Hmax(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j) p(y_k) \log_2 [p(x_j) p(y_k)] = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits

transferred information
I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 1 = 1 bit
Run the experiment 20 times, get it always WRONG:

         stim 1   stim 2
resp 1   0        10       10
resp 2   10       0        10
         10       10       20

input probabilities: p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint probabilities p(xj, yk):
0     0.5
0.5   0

joint entropy of the input and the output
H(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j, y_k) \log_2 p(x_j, y_k) = 0×0 + 0.5×1 + 0×0 + 0.5×1 = 1 bit

probabilities of independent events p(xj)p(yk):
0.25   0.25
0.25   0.25

maximum entropy given the input and the output
Hmax(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j) p(y_k) \log_2 [p(x_j) p(y_k)] = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits

transferred information
I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 1 = 1 bit
Run the experiment 20 times, get it 10 times right and 10 times wrong:

         stim 1   stim 2
resp 1   5        5        10
resp 2   5        5        10
         10       10       20

input probabilities: p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint probabilities p(xj, yk):
0.25   0.25
0.25   0.25

joint entropy of the input and the output
H(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j, y_k) \log_2 p(x_j, y_k) = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits

probabilities of independent events p(xj)p(yk):
0.25   0.25
0.25   0.25

maximum entropy given the input and the output
Hmax(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j) p(y_k) \log_2 [p(x_j) p(y_k)] = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits

transferred information
I(X;Y) = Hmax(X,Y) - H(X,Y) = 2 - 2 = 0 bits
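A compact numerical check of the three 2-by-2 cases above (always right, always wrong, at chance); the helper name transferred_information is illustrative and the computation just repeats Hmax(X,Y) - H(X,Y):

```python
import numpy as np

def transferred_information(counts):
    """I(X;Y) = Hmax(X,Y) - H(X,Y) from a stimulus-response count matrix."""
    p_xy = np.asarray(counts, dtype=float) / np.sum(counts)
    p_ind = np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))      # p(x_j) p(y_k): independent case
    H = lambda p: float(-sum(q * np.log2(q) for q in p.ravel() if q > 0))
    return H(p_ind) - H(p_xy)

print(transferred_information([[10,  0], [ 0, 10]]))   # always right          -> 1.0 bit
print(transferred_information([[ 0, 10], [10,  0]]))   # always wrong          -> 1.0 bit
print(transferred_information([[ 5,  5], [ 5,  5]]))   # right half the time   -> 0.0 bits
```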
stimulus categories (rows) × response categories (columns):

                      y1   y2   y3   y4   y5   number of stimuli
x1                    20    5    0    0    0   25
x2                     5   15    5    0    0   25
x3                     0    6   17    2    0   25
x4                     0    0    5   12    8   25
x5                     0    0    0    6   19   25
number of responses   25   26   27   20   27   125
Stimulus-response matrix of counts:

        y1      y2      …   yn
x1      N11     N12     …   N1n
x2      N21     N22     …   N2n
…
xn      Nn1     Nn2     …   Nnn

Matrix of Joint Probabilities (the stimulus-response matrix divided by the total number of stimuli):
number of called stimuli = number of responses = N
p(xi, yj) = Nij / N

        y1          y2          …   yn
x1      p(x1,y1)    p(x1,y2)    …   p(x1,yn)
x2      p(x2,y1)    p(x2,y2)    …   p(x2,yn)
…
xn      p(xn,y1)    p(xn,y2)    …   p(xn,yn)
stimulus/response confusion matrix:

            y1   y2   y3   y4   y5   number of stimuli   probability of stimulus
x1          20    5    0    0    0   25                  25/125 = 0.2
x2           5   15    5    0    0   25                  25/125 = 0.2
x3           0    6   17    2    0   25                  25/125 = 0.2
x4           0    0    5   12    8   25                  25/125 = 0.2
x5           0    0    0    6   19   25                  25/125 = 0.2
number of
responses   25   26   27   20   27   125

probability of response: 25/125 = 0.2, 26/125 = 0.208, 27/125 = 0.216, 20/125 = 0.16, 27/125 = 0.216
matrix of joint probabilities p(xj, yk)
total number of stimuli (responses) N = 125; joint probability p(xj, yk) = Njk / N

        y1             y2             y3             y4             y5
x1      20/125=0.16    5/125=0.04     0              0              0
x2      5/125=0.04     15/125=0.12    5/125=0.04     0              0
x3      0              6/125=0.048    17/125=0.136   2/125=0.016    0
x4      0              0              5/125=0.04     12/125=0.096   8/125=0.064
x5      0              0              0              6/125=0.048    19/125=0.152
joint entropy of the stimulus/response system
H(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} p(x_j, y_k) \log_2 p(x_j, y_k) = 3.43 bits
when xi and yj are independent events (i.e. output does not depend on input), the joint probability would be given by a product of probabilities of these independent events P(xi,yj) = p(xi) p(yj), and the entropy of the system would be maximum Hmax (the system would be entirely useless for transmission of the information, since its output would not depend on its input)
maximum joint entropy of the stimulus/response system (for independent input and output, P(xj, yk) = p(xj) p(yk))
Hmax(X,Y) = -\sum_{j=1}^{n} \sum_{k=1}^{n} P(x_j, y_k) \log_2 P(x_j, y_k) = 4.63 bits
The information that is transmitted by the system is given by the difference between the maximum joint entropy of the matrix of independent events Hmax(X,Y) and the joint entropy of the real system derived from the confusion matrix, H(X,Y):
I(X;Y) = Hmax(X,Y) - H(X,Y) = 4.63 - 3.43 = 1.2 bits
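The same computation for the 5-by-5 confusion matrix, as a short NumPy sketch, reproduces the numbers above up to rounding:

```python
import numpy as np

counts = np.array([[20,  5,  0,  0,  0],
                   [ 5, 15,  5,  0,  0],
                   [ 0,  6, 17,  2,  0],
                   [ 0,  0,  5, 12,  8],
                   [ 0,  0,  0,  6, 19]], dtype=float)

p_xy = counts / counts.sum()                               # joint probabilities (N = 125)
p_ind = np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))       # p(x_j) p(y_k): independent case

H = lambda p: float(-sum(q * np.log2(q) for q in p.ravel() if q > 0))
H_joint, H_max = H(p_xy), H(p_ind)
print(H_joint, H_max, H_max - H_joint)                     # ≈ 3.432, 4.636, 1.204 bits
```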
Capacity of human channel for one-dimensional stimuli
Magic number 7±2 (between 2 and 3 bits) (George Miller, 1956)
• Human perception seems to distinguish only among 7 (plus or minus 2) different entities along one perceptual dimension
• To recognize more items:
  – long training (musicians)
  – use more than one perceptual dimension (e.g. pitch and loudness)
  – chunk the items into larger chunks (phonemes to words, words to phrases, …)