
Recognition

stimulus input → Observer (information transmission channel) → response

• Response: which category does the stimulus belong to?

What is the “information value” of recognizing the category?

Information

[Figure: the space of possible signals shrinks as "NOT HERE" signals are received: area reduced to 1/2, to 63/64, to 1/64.]

• The amount of information gained by receiving the signal is proportional to the ratio of these two areas:
  prior information (the possible space of signals) vs. posterior (the possible space after the signal is received).

The less likely the outcome, the more information is gained!

The information in a symbol s should be inversely proportional to the probability of the symbol p.

Basics of Information Theory
Claude Elwood Shannon (1916-2001)

Also built a juggling machine, rocket-powered Frisbees, motorized Pogo sticks, a device that could solve the Rubik's Cube puzzle, ...

• Observe the output message
• Try to reconstruct the input message (gain new information)

Measuring the information

1) Information must be positive: i(p) > 0

2) Information from independent events (i.e. when probabilities multiply) must add: i(p1·p2) = i(p1) + i(p2)

Both requirements are satisfied by taking the information in an event to be

i(p) = log(1/p) = −log(p)

Multiplication turns into addition, and the result is always positive (since p < 1).
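A minimal sketch of this definition in Python (the function name and the example probabilities are mine, not from the slides):

```python
import math

def self_information(p: float, base: float = 2.0) -> float:
    """Information i(p) = log(1/p) of an event that occurs with probability p."""
    return -math.log(p, base)

# 1) positivity: i(p) > 0 for p < 1
print(self_information(0.5))                            # 1.0 bit
# 2) additivity for independent events: i(p1*p2) = i(p1) + i(p2)
print(self_information(0.5 * 0.25))                     # 3.0 bits
print(self_information(0.5) + self_information(0.25))   # 1.0 + 2.0 = 3.0 bits
```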

If the message consists of (very many) M characters from an alphabet S, which consists of n symbols, there will be about M·p1 occurrences of the first symbol, M·p2 of the second, etc.

Then the probability of the message is

P = p1^(M·p1) · p2^(M·p2) ··· pn^(M·pn) = (p1^p1 · p2^p2 ··· pn^pn)^M

The information (logarithm of the inverse probability) in the whole message will be

log(1/P) = −M · Σi=1..n pi · log(pi)

and the information per character (entropy of the alphabet S) is

H(S) = (1/M) · log(1/P) = −Σi=1..n pi · log(pi)
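A quick numeric check of this derivation (my own sketch; the alphabet probabilities are an assumed example): for a message that contains exactly M·pi copies of each symbol, the information per character log2(1/P)/M equals the entropy of the alphabet.

```python
import math

p = [0.5, 0.25, 0.25]   # assumed example alphabet probabilities
M = 1000                # message length, chosen so that M*pi is an integer

# log2(1/P) for P = p1^(M*p1) * p2^(M*p2) * ... * pn^(M*pn)
log2_inv_P = -sum(M * pi * math.log2(pi) for pi in p)

# entropy of the alphabet
H = -sum(pi * math.log2(pi) for pi in p)

print(log2_inv_P / M)   # 1.5 bits per character
print(H)                # 1.5 bits
```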

1 bit of information reduces the area of possible messages to half.

When the logarithm is taken to base 2 (log2), the entropy is in bits.

Entropy

H(S) = −Σi=1..n pi · log(pi)

Information gained when deciding among N (equally likely) alternatives:

Number of stimulus alternatives N    Number of bits (log2 N)
2^1 = 2                              1
2^2 = 4                              2
2^3 = 8                              3
2^4 = 16                             4
2^5 = 32                             5
2^6 = 64                             6
2^7 = 128                            7
2^8 = 256                            8
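The table can be reproduced directly; a tiny sketch (not from the slides):

```python
import math

for bits in range(1, 9):
    n = 2 ** bits              # number of equally likely alternatives
    print(n, math.log2(n))     # e.g. 256 -> 8.0 bits
```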

H = −Σi=1..n pi · log2(pi)

Experiments with two possible outcomes with probabilities p1 and p2:

The total probability must be 1, so p2 = 1 − p1

H = −p1 log2 p1 − (1 − p1) log2(1 − p1)

Since lim p→0 of p log2 p = 0, we get H = 0 for p1 = 0 (the second outcome certain) or p1 = 1 (the first outcome certain).

For p1 = 0.5, p2 = 0.5:

H = −0.5 log2 0.5 − 0.5 log2 0.5 = −log2 0.5 = 1 bit

Entropy H (information) is maximum when the outcome is the least predictable!
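A small sketch of this binary-entropy curve (the function name is mine, not from the slides):

```python
import math

def binary_entropy(p1: float) -> float:
    """H = -p1*log2(p1) - (1-p1)*log2(1-p1), with p*log2(p) taken as 0 at p = 0."""
    h = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0.0:
            h -= p * math.log2(p)
    return h

print(binary_entropy(0.0))   # 0.0 bits: the second outcome is certain
print(binary_entropy(1.0))   # 0.0 bits: the first outcome is certain
print(binary_entropy(0.5))   # 1.0 bit: least predictable, maximum entropy
```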

For a given alphabet S, the entropy (i.e. the information) is highest when all symbols are equally likely (pi = 1/n):

HMAX = −Σi=1..n pi · log(pi) = Σi=1..n (1/n) · log(n) = log n
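A quick illustration (the example distributions are mine, not from the slides): a uniform distribution over n = 8 symbols reaches log2 8 = 3 bits, while a skewed one stays well below that.

```python
import math

def entropy(probs):
    """H = -sum(p * log2 p), skipping zero-probability symbols."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 8
print(entropy([1.0 / n] * n))          # 3.0 bits = log2(8), the maximum
print(entropy([0.9] + [0.1 / 7] * 7))  # ~0.75 bits, well below log2(8)
```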

1st or 2nd half?

With equal prior probability of each category, we need 3 binary digits (3 bits) to describe 2^3 = 8 categories.

We need more bits when dealing with symbols that are not all equally likely.

The Bar Code

[Figure: a bar code encoding 5 bits]

The number of "digits" is k = 6: H = log2 k = log2 6 ≈ 2.58 bits.

If the order of the fingers ("digits") is meaningful, there can be 2^5 = 32 combinations: H' = log2 32 = 5 bits !!!!!

More efficient (but less "robust") code.
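The two counts compared in a couple of lines (a sketch, not from the slides):

```python
import math

print(math.log2(6))    # ~2.58 bits: six distinguishable "digit" values
print(math.log2(32))   # 5.0 bits: 2**5 ordered finger combinations
```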

With no noise in the channel, p(xi|yi) = 1 and p(xi,yj) = 0 for j ≠ i.

[Diagram: p(x) → p(y|x) → p(y); each input reaches its own output with probability 1 and the other output with probability 0, so p(y1) = p(x1) and p(y2) = p(x2).]

With noise, p(xi|yi) < 1 and p(xi,yj) > 0 for j ≠ i.

[Diagram: p(x1) = 0.8, p(x2) = 0.2, with transition probabilities p(y1|x1) = 5/8, p(y2|x1) = 3/8, p(y1|x2) = 1/4, p(y2|x2) = 3/4.]

p(y1) = (5/8 × 0.8) + (1/4 × 0.2) = 0.55
p(y2) = (3/8 × 0.8) + (3/4 × 0.2) = 0.45
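The same arithmetic as a short sketch (the variable names are mine):

```python
# transition probabilities p(y_k | x_j): one row per input symbol x_j
p_y_given_x = [[5/8, 3/8],   # outputs y1, y2 given x1
               [1/4, 3/4]]   # outputs y1, y2 given x2
p_x = [0.8, 0.2]             # input probabilities p(x1), p(x2)

# p(y_k) = sum over j of p(y_k | x_j) * p(x_j)
p_y = [sum(p_y_given_x[j][k] * p_x[j] for j in range(len(p_x)))
       for k in range(2)]
print(p_y)   # approximately [0.55, 0.45]
```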

Information transfer through a communication channel

transmitter (source) → channel (+ noise) → receiver
       p(X)                p(Y|X)             p(Y)

Two-element (binary) channel:

[Diagram: input x1 reaches output y1 with probability p(y1|x1) and output y2 with probability p(y2|x1); input x2 reaches y2 with probability p(y2|x2) and y1 with probability p(y1|x2).]

Binary Channel

                       response 1   response 2   number of stimuli
stimulus 1             N11          N12          Nstim 1
stimulus 2             N21          N22          Nstim 2
number of responses    Nres 1       Nres 2       N (total number of stimuli, or responses)

probability of stimulus xj:                    p(xj) = Nstim j / N
joint probability that both xj and yk happen:  p(xj, yk) = Njk / N
conditional probability:                       p(xj | yk) = Njk / Nres k
probability of response yk:                    p(yk) = Nres k / N

Stimulus-Response Confusion Matrix
(rows: called stimulus, columns: received response)

         y1        y2        ...   yn        total
x1       N11       N12       ...   N1n       Nstim 1
x2       N21       N22       ...   N2n       Nstim 2
...
xn       Nn1       Nn2       ...   Nnn       Nstim n
total    Nres 1    Nres 2    ...   Nres n    N

number of j-th stimuli: Σk Njk = Nstim j
number of k-th responses: Σj Njk = Nres k
number of called stimuli = number of responses: Σk Nres k = Σj Nstim j = N

probability of the xj-th symbol: p(xj) = Nstim j / N
joint probability that both xj and yk happen: p(xj, yk) = Njk / N
conditional probability that xj was sent when yk was received: p(xj | yk) = Njk / Nres k
probability of the yk-th symbol: p(yk) = Nres k / N

entropy of the input

H(X) = −Σj=1..n p(xj) log2 p(xj)

entropy of the output

H(Y) = −Σk=1..n p(yk) log2 p(yk)

joint entropy of the input and the output

H(X,Y) = −Σj=1..n Σk=1..n p(xj,yk) log2 p(xj,yk)

maximum entropy given the input and the output

Hmax(X,Y) = −Σj=1..n Σk=1..n p(xj)·p(yk) log2 [p(xj)·p(yk)]

This happens when the input and the output are independent (the joint probabilities are given by the products of the individual probabilities). There is then no relation of the output to the input, i.e. no information transfer.

information transferred by the system

I(X;Y) = Hmax(X,Y) − H(X,Y)
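These definitions translate directly into code; a sketch (the function name is mine, and the example matrix corresponds to the "always right" experiment that follows):

```python
import math

def transferred_information(joint):
    """I(X;Y) = Hmax(X,Y) - H(X,Y) from a matrix of joint probabilities p(xj,yk)."""
    p_x = [sum(row) for row in joint]          # marginal p(xj)
    p_y = [sum(col) for col in zip(*joint)]    # marginal p(yk)
    h_joint = -sum(p * math.log2(p) for row in joint for p in row if p > 0)
    h_max = -sum(px * py * math.log2(px * py)
                 for px in p_x for py in p_y if px * py > 0)
    return h_max - h_joint

print(transferred_information([[0.5, 0.0],
                               [0.0, 0.5]]))   # 1.0 bit
```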

Run the experiment 20 times and get it always RIGHT:

         stim 1   stim 2
resp 1   10       0        10
resp 2   0        10       10
         10       10       20

joint probabilities p(xj,yk):

         0.5   0
         0     0.5

input probabilities:  p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint entropy of the input and the output

H(X,Y) = −Σj=1..n Σk=1..n p(xj,yk) log2 p(xj,yk) = 0.5×1 + 0×0 + 0.5×1 + 0×0 = 1 bit

transferred information

I(X;Y) = Hmax(X,Y) − H(X,Y) = 2 − 1 = 1 bit

probabilities of independent events p(xj)·p(yk):

         0.25   0.25
         0.25   0.25

maximum entropy given the input and the output

Hmax(X,Y) = −Σj=1..n Σk=1..n p(xj)·p(yk) log2 [p(xj)·p(yk)] = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits

Run the experiment 20 times and get it always WRONG:

         stim 1   stim 2
resp 1   0        10       10
resp 2   10       0        10
         10       10       20

joint probabilities p(xj,yk):

         0     0.5
         0.5   0

input probabilities:  p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint entropy of the input and the output

H(X,Y) = −Σj=1..n Σk=1..n p(xj,yk) log2 p(xj,yk) = 0×0 + 0.5×1 + 0×0 + 0.5×1 = 1 bit

transferred information

I(X;Y) = Hmax(X,Y) − H(X,Y) = 2 − 1 = 1 bit

probabilities of independent events p(xj)·p(yk):

         0.25   0.25
         0.25   0.25

maximum entropy given the input and the output

Hmax(X,Y) = −Σj=1..n Σk=1..n p(xj)·p(yk) log2 [p(xj)·p(yk)] = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits


Run the experiment 20 times and get it 10 times right and 10 times wrong:

         stim 1   stim 2
resp 1   5        5        10
resp 2   5        5        10
         10       10       20

joint probabilities p(xj,yk):

         0.25   0.25
         0.25   0.25

input probabilities:  p(x1) = 0.5, p(x2) = 0.5
output probabilities: p(y1) = 0.5, p(y2) = 0.5

joint entropy of the input and the output

H(X,Y) = −Σj=1..n Σk=1..n p(xj,yk) log2 p(xj,yk) = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits

transferred information

I(X;Y) = Hmax(X,Y) − H(X,Y) = 2 − 2 = 0 bits

probabilities of independent events p(xj)·p(yk):

         0.25   0.25
         0.25   0.25

maximum entropy given the input and the output

Hmax(X,Y) = −Σj=1..n Σk=1..n p(xj)·p(yk) log2 [p(xj)·p(yk)] = 0.25×2 + 0.25×2 + 0.25×2 + 0.25×2 = 2 bits
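All three experiments can be checked at once with a short, self-contained script (the helper name is mine, not from the slides): each 2×2 count matrix is converted to joint probabilities and I(X;Y) = Hmax(X,Y) − H(X,Y) is computed.

```python
import math

def information_from_counts(counts):
    """I(X;Y) from a stimulus-response count matrix."""
    n = sum(sum(row) for row in counts)
    joint = [[c / n for c in row] for row in counts]   # p(xj,yk) = Njk / N
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_joint = -sum(p * math.log2(p) for row in joint for p in row if p > 0)
    h_max = -sum(px * py * math.log2(px * py)
                 for px in p_x for py in p_y if px * py > 0)
    return h_max - h_joint

print(information_from_counts([[10, 0], [0, 10]]))   # always right:  1.0 bit
print(information_from_counts([[0, 10], [10, 0]]))   # always wrong:  1.0 bit
print(information_from_counts([[5, 5], [5, 5]]))     # half and half: 0.0 bits
```

Note that the consistently wrong channel transfers just as much information as the consistently right one; only the half-and-half case, where the output is independent of the input, transfers nothing.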

Stimulus categories (rows) × response categories (columns):

            y1   y2   y3   y4   y5   number of stimuli
x1          20   5    0    0    0    25
x2          5    15   5    0    0    25
x3          0    6    17   2    0    25
x4          0    0    5    12   8    25
x5          0    0    0    6    19   25
number of
responses   25   26   27   20   27   125

Matrix of Joint Probabilities
(the stimulus-response count matrix divided by the total number of stimuli)

         y1    y2    ...   yn
x1       N11   N12   ...   N1n
x2       N21   N22   ...   N2n
...
xn       Nn1   Nn2   ...   Nnn

becomes the matrix of joint stimulus-response probabilities

         y1         y2         ...   yn
x1       p(x1,y1)   p(x1,y2)   ...   p(x1,yn)
x2       p(x2,y1)   p(x2,y2)   ...   p(x2,yn)
...
xn       p(xn,y1)   p(xn,y2)   ...   p(xn,yn)

number of called stimuli = number of responses = N
p(xi,yj) = Nij / N

stimulus/response confusion matrix

stimuli        y1             y2              y3              y4             y5              number of stimuli   probability of stimulus
x1             20             5               0               0              0               25                  25/125 = 0.2
x2             5              15              5               0              0               25                  25/125 = 0.2
x3             0              6               17              2              0               25                  25/125 = 0.2
x4             0              0               5               12             8               25                  25/125 = 0.2
x5             0              0               0               6              19              25                  25/125 = 0.2
number of
responses      25             26              27              20             27              125
probability
of response    25/125 = 0.2   26/125 = 0.208  27/125 = 0.216  20/125 = 0.16  27/125 = 0.216

matrix of joint probabilities p(xj,yk)
(total number of stimuli (responses) N = 125; joint probability p(xj,yk) = Njk / N)

      y1              y2              y3              y4              y5
x1    20/125 = 0.16   5/125 = 0.04    0               0               0
x2    5/125 = 0.04    15/125 = 0.12   5/125 = 0.04    0               0
x3    0               6/125 = 0.048   17/125 = 0.136  2/125 = 0.016   0
x4    0               0               5/125 = 0.04    12/125 = 0.096  8/125 = 0.064
x5    0               0               0               6/125 = 0.048   19/125 = 0.152

joint entropy of the stimulus/response system

H(X,Y) = −Σj=1..n Σk=1..n p(xj,yk) log2 p(xj,yk) = 3.43 bits

If xi and yj were independent events (i.e. the output did not depend on the input), the joint probabilities would be given by the products of the individual probabilities, P(xi,yj) = p(xi)·p(yj), and the entropy of the system would reach its maximum Hmax. Such a system would be entirely useless for transmitting information, since its output would not depend on its input.

[The same matrix of joint probabilities p(xj,yk) as above.]

maximum entropy of the stimulus/response system (independent events)

Hmax(X,Y) = −Σj=1..n Σk=1..n P(xj,yk) log2 P(xj,yk) = 4.63 bits, with P(xj,yk) = p(xj)·p(yk)

The information transmitted by the system is given by the difference between the maximum joint entropy of the matrix of independent events, Hmax(X,Y), and the joint entropy of the real system derived from the confusion matrix, H(X,Y):

I(X;Y) = Hmax(X,Y) − H(X,Y) = 4.63 − 3.43 = 1.2 bits
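The whole 5×5 example can be verified with a few lines of NumPy (a sketch; numpy is assumed to be available):

```python
import numpy as np

# stimulus-response count matrix from the example (rows: stimuli, columns: responses)
counts = np.array([[20,  5,  0,  0,  0],
                   [ 5, 15,  5,  0,  0],
                   [ 0,  6, 17,  2,  0],
                   [ 0,  0,  5, 12,  8],
                   [ 0,  0,  0,  6, 19]], dtype=float)

joint = counts / counts.sum()   # p(xj,yk) = Njk / N, with N = 125
p_x = joint.sum(axis=1)         # stimulus probabilities (all 0.2)
p_y = joint.sum(axis=0)         # response probabilities
indep = np.outer(p_x, p_y)      # independent-event products p(xj)*p(yk)

def H(p):
    """Joint entropy -sum(p * log2 p), ignoring zero entries."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

print(H(joint))             # ~3.43 bits
print(H(indep))             # ~4.64 bits (the slides quote 4.63)
print(H(indep) - H(joint))  # ~1.2 bits transmitted
```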

Capacity of the human channel for one-dimensional stimuli

Magic number 7±2 (between 2 and 3 bits) (George Miller, 1956)

• Human perception seems to distinguish only among 7 (plus or minus 2) different entities along one perceptual dimension.

• To recognize more items:
  – long training (musicians)
  – use more than one perceptual dimension (e.g. pitch and loudness)
  – chunk the items into larger chunks (phonemes into words, words into phrases, ...)