A very short introduction to information theory

How heavy is 1 kg of information? A very short introduction to information theory Dr. Jossy Sayir, Dept of Engineering and EBI


Page 1

How heavy is 1 kg of information? A very short introduction to information theory

Dr. Jossy Sayir, Dept of Engineering and EBI

Page 2

About the speaker…

•  Affiliated lecturer at the Department of Engineering

•  Fellow of Robinson College and Director of Studies at Newnham College

•  Currently teaching:

•  2nd year probability

•  3rd year information theory

•  4th year coding theory

•  Research in information theory, coding and communications

•  Research fellow at the European Bioinformatics Institute

Page 3

About the talk…

•  Material from 3rd year course on information theory (without the maths)

•  Claude Shannon’s “A Mathematical Theory of Communication” (1948)

•  Big Bang of the information age

•  Modern basis for data communication, storage, processing

•  You use information theory in your mobile phone, when you use Skype, when you use the internet, when you listen to music, etc.

•  Information, like weight or energy, can be measured and quantified

From the Bell Labs webpage

Page 4

Cambridge College 20 Questions

Page 5

Cambridge College 20 Questions

•  Guess a college in as few “yes/no” questions as possible

•  31 colleges

•  How many questions?
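Each yes/no answer can at best halve the set of remaining candidates, which gives a lower bound on the number of questions. A minimal sketch of that count, where the helper name `min_questions` is made up for this illustration:

```python
def min_questions(n_items: int) -> int:
    """Smallest number of yes/no questions that can always separate n_items."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without rounding issues.
    return (n_items - 1).bit_length()

print(min_questions(31))  # 5, since 2**4 = 16 < 31 <= 32 = 2**5
```

With 31 colleges, four questions can distinguish at most 16 outcomes, so at least five are needed in the worst case.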

Page 6

Cambridge College 20 Questions

Page 7

Guessing tree

Old? (16) (pre-1600)
    By the river? (8)
        North of Garret Hostel Lane? (3): Trinity? → St John's? → Magdalene
        South (5)
            Royal Flush? (2): King's? → Queens'
            Trinity Hall? → Clare? → St Catharine's
    Inland… (8)
        East of Sidney / Regent St? (4)
            Our Saviour? (2): Jesus? → Christ's
            Emmanuel? → Sidney Sussex
        West (4)
            Trumpington St.? (2): Pembroke? → Peterhouse
            Corpus Christi? → Gonville and Caius
New… (15) (post-1800)
    General entry? (7)
        North of Madingley Rd? (3): Churchill? → Girton? → Fitzwilliam
        South (4)
            West? (2): The best? (Robinson) → If not, Selwyn
            East? (2): Downing? → Homerton
    Restricted entry… (8)
        Women only? (3): Newnham? → Lucy Cavendish? → Murray Edwards
        Co-ed (5)
            Grad? (2): Darwin? → Clare Hall
            Wolfson? → Hughes Hall? → St Edmund's

Page 8

20 Questions - analysis

•  This tree could be improved to get to 5 questions

•  Can you think of how you could ask all questions at once?

•  What college do you think I would pick?
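One possible answer to the "all questions at once" puzzle is to number the colleges and ask for the binary digits of that number, so that the five questions are fixed in advance and do not depend on earlier answers. The sketch below assumes an arbitrary agreed numbering 0..30; the function names are hypothetical.

```python
def answers_for(index: int, n_questions: int = 5) -> list[bool]:
    """Answers to 'Is bit k of the college number a 1?' for k = 0..n_questions-1."""
    return [bool((index >> k) & 1) for k in range(n_questions)]

def college_from(answers: list[bool]) -> int:
    """Rebuild the college number from the list of yes/no answers."""
    return sum(1 << k for k, yes in enumerate(answers) if yes)

# Colleges are numbered 0..30 in any agreed order (an assumption of this sketch).
for index in range(31):
    assert college_from(answers_for(index)) == index
```

Because no question depends on a previous answer, all five can be written down and answered in one go.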

Page 9

Admissions Numbers Engineering 2014 cycle

[Chart: Undergraduate Admissions Statistics, Applications, Offers & Acceptances by UCAS Apply Centre, 2014 admissions cycle. Engineering applications (direct and open) for each of 29 colleges, Christ's College to Wolfson College in alphabetical order, on an axis from 0 to 200. Total: 2,161.]

Page 10

Information measures…

•  Hartley’s information: if a question has $N$ possible answers, the information content in its answer is $\log N$

•  Shannon’s information: if an answer has probability $p$, it’s as if it were one of $N_p = 1/p$ equally likely answers and hence its information content is $\log(1/p) = -\log p$

•  Shannon’s “entropy” formula: $H = -\sum_n p_n \log_b p_n$

•  What base is the log?
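The entropy formula can be evaluated directly. The following is a minimal sketch; the function name and the two example distributions are illustrative assumptions, not data from the talk. The base only sets the unit: base 2 gives bits, base e gives nats.

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H = -sum_n p_n log_b p_n; terms with p_n = 0 contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

uniform = [1 / 31] * 31              # every college equally likely
skewed = [0.5] + [0.5 / 30] * 30     # one favourite, the rest equally likely

print(entropy(uniform))   # equals log2(31), just under 5 bits
print(entropy(skewed))    # smaller, because the answer is more predictable
```

For equally likely answers the result reduces to Hartley's $\log N$; any prior knowledge about the likely answer lowers it.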

Page 11

English text

•  Entropy of English (from single-letter frequencies) is H = 4.17 bits per letter

•  Better than 5, but do we really need 4-5 yes/no questions to guess the next letter in English text?

[Chart: relative frequency of each letter a to z in English text, on an axis from 0 to 0.14]

Wikipedia: “The frequency of letters in text has been studied for use in cryptanalysis, and frequency analysis in particular, dating back to the Iraqi mat…”

Page 12

Source Coding

•  English text can be compressed well below 2 bits per letter by modern data compression algorithms

•  All sources (images, sound, video, files) are compressed before transmission (lossy or lossless)

•  Ideal data compression removes all redundancy, so that the result is perfectly unpredictable

•  Can you compress the 6 numbers between 1 and 59 resulting from a lottery draw?

•  “Compressing Sets and Multisets of Sequences” Christian Steinruecken
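For the lottery question, a rough count shows how much there is to gain. The sketch below assumes that the six numbers are distinct, that their order does not matter, and that all draws are equally likely; the variable names are illustrative.

```python
import math

draws = math.comb(59, 6)                       # unordered draws of 6 distinct numbers from 59
naive_bits = 6 * math.ceil(math.log2(59))      # each number stored separately in 6 bits
ideal_bits = math.log2(draws)                  # bits needed if all draws are equally likely

print(draws)        # 45057474
print(naive_bits)   # 36
print(ideal_bits)   # about 25.4
```

Under these assumptions, treating the draw as a set rather than as six independent numbers saves roughly 10 bits.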

Page 13

Interlude

Page 14

Reed Solomon Coding

Page 15

SUDOKU

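5 3 . . 7 . . . .
6 . . 1 9 5 . . .
. 9 8 . . . . 6 .
8 . . . 6 . . . 3
4 . . 8 . 3 . . 1
7 . . . 2 . . . 6
. 6 . . . . 2 8 .
. . . 4 1 9 . . 5
. . . . 8 . . 7 9

(. marks an empty cell)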

Page 16

SUDOKU

5 3 4 6 7 8 9 1 2
6 7 2 1 9 5 3 4 8
1 9 8 3 4 2 5 6 7
8 5 9 7 6 1 4 2 3
4 2 6 8 5 3 7 9 1
7 1 3 9 2 4 8 5 6
9 6 1 5 3 7 2 8 4
2 8 7 4 1 9 6 3 5
3 4 5 2 8 6 1 7 9

Page 17

SUDOKU

5 3 4 5 7 8 9 1 2
6 7 2 1 9 5 3 4 8
1 9 8 3 4 1 5 6 7
9 5 9 7 6 1 4 2 3
4 2 7 8 5 3 7 9 1
7 1 3 9 2 4 8 5 6
9 6 1 5 3 8 2 8 4
2 8 7 4 1 9 6 2 5
3 4 5 2 8 6 1 7 9
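The slides do not spell out a procedure, but one way to read the last grid is that the Sudoku rules act as redundancy: a changed digit shows up as a repeated digit in some row, column or box. The sketch below only detects such violations; the function names are hypothetical and the two grids are copied from the slides above.

```python
def parse(text):
    """Turn nine space-separated strings of nine digits into a 9x9 grid."""
    return [[int(ch) for ch in row] for row in text.split()]

def violations(grid):
    """Return the rows, columns and 3x3 boxes that contain a repeated digit."""
    units = []
    for i in range(9):
        units.append((("row", i), [grid[i][c] for c in range(9)]))
        units.append((("col", i), [grid[r][i] for r in range(9)]))
        br, bc = 3 * (i // 3), 3 * (i % 3)
        units.append((("box", i),
                      [grid[br + r][bc + c] for r in range(3) for c in range(3)]))
    return [name for name, cells in units if len(set(cells)) != 9]

solved = parse("534678912 672195348 198342567 859761423 426853791 "
               "713924856 961537284 287419635 345286179")
corrupted = parse("534578912 672195348 198341567 959761423 427853791 "
                  "713924856 961538284 287419625 345286179")

print(violations(solved))      # []
print(violations(corrupted))   # a non-empty list of broken rows, columns and boxes
```

Locating and repairing the wrong cells takes more work than this check, but the violated units already narrow down where the errors can be.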

Page 18

1-error correction

0 1 1 0

0 1 0 1

1 0 1 1

0 1 0 1

Page 19

1-error correction

0 1 1 0 0

0 1 0 1 0

1 0 1 1 1

0 1 0 1 0

1 1 0 1 1

Page 20

1-error correction

0 1 0

1 1

1 0 1 1

1 0 0

1 1 0 1

Can we still correct erasures?

Page 21

1-error correction

0 1 1 0 0

0 1 0 1 0

1 1 1 1 1

0 1 0 1 0

1 1 0 1 1
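The grids on these slides are consistent with a simple row-and-column parity scheme: each row gets a parity bit, then a parity row is added over all columns, and a single flipped bit is found where the one odd row meets the one odd column. A minimal sketch under that reading, with hypothetical function names, using the 4×4 data block from the earlier slide:

```python
def encode(data):
    """Append a parity bit to each row, then a parity row over all columns."""
    rows = [row + [sum(row) % 2] for row in data]
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]

def correct_single_error(block):
    """Flip the bit where an odd-parity row meets an odd-parity column, if any."""
    bad_rows = [r for r, row in enumerate(block) if sum(row) % 2]
    bad_cols = [c for c, col in enumerate(zip(*block)) if sum(col) % 2]
    fixed = [row[:] for row in block]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        fixed[bad_rows[0]][bad_cols[0]] ^= 1
    return fixed

data = [[0, 1, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 1],
        [0, 1, 0, 1]]
sent = encode(data)
received = [row[:] for row in sent]
received[2][1] ^= 1              # the single error in the grid above
assert correct_single_error(received) == sent
```

Here `sent` reproduces the 5×5 grid with parity bits, and `received` is the grid with the third row changed to all ones.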

Page 22

Dimensions and rate

•  Load a $K = f \times f$ grid of data

•  Add $2f+1$ redundancy bits

•  Total length: $N = f^2 + 2f + 1 = (f+1)^2$

•  Information rate: $R = K/N = f^2/(f+1)^2$

•  For example, for f=2 we encode K=4 information digits, add 5 redundancy digits for an information rate of 4/9 and correct 1 error

•  Can we do better?

1 1 0

0 1 1

1 0 1
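The rate formula is easy to tabulate for a few grid sizes. This is only a sketch of the arithmetic, with a hypothetical function name:

```python
from fractions import Fraction

def rate(f: int) -> Fraction:
    """Information rate K/N of an f x f data grid with 2f+1 parity bits."""
    return Fraction(f * f, (f + 1) ** 2)

for f in (2, 4, 10, 100):
    print(f, rate(f), float(rate(f)))
```

The rate approaches 1 as f grows, but the block still corrects only one error, so larger grids buy efficiency at the cost of protection per bit.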

Page 23

(7,4) Hamming Code

[Circle diagram, information bits: 0 1 0 1]

Page 24

(7,4) Hamming Code

[Circle diagram, with redundancy bits added: 0 1 0 1 1 1 0]

Page 25

(7,4) Hamming Code

[Circle diagram, with one bit changed: 0 1 0 0 1 1 0]

Page 26

(7,4) Hamming Code Dimensions

•  Decoding rule: flip the bit in the intersection of the circles that have the wrong parity

•  We encode K=4 information digits and add 3 redundancy digits to transmit N=7 digits

•  We can always correct 1 error, at an information rate 4/7 (much better than 4/9)
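The decoding rule can be sketched with three overlapping circles A, B and C. The assignment of bit positions to circle regions below is an assumption made for this sketch and may differ from the layout drawn on the slides; the function names are hypothetical.

```python
from itertools import product

# Circle membership of each of the 7 positions (an assumed layout):
# positions 0-3 are data bits, positions 4-6 are the parity bits of circles A, B, C.
CIRCLES = {
    "A": (0, 1, 3, 4),
    "B": (0, 2, 3, 5),
    "C": (1, 2, 3, 6),
}

def encode(data):
    """Append three parity bits so that every circle holds an even number of ones."""
    d0, d1, d2, d3 = data
    return [d0, d1, d2, d3, d0 ^ d1 ^ d3, d0 ^ d2 ^ d3, d1 ^ d2 ^ d3]

def decode(word):
    """Correct at most one flipped bit and return the four data bits."""
    failing = {name for name, members in CIRCLES.items()
               if sum(word[i] for i in members) % 2}
    fixed = list(word)
    for pos in range(7):
        inside = {name for name, members in CIRCLES.items() if pos in members}
        if failing and inside == failing:
            fixed[pos] ^= 1      # the bit lying in exactly the failing circles
    return fixed[:4]

# Exhaustive check: every message survives every single-bit error.
for data in product((0, 1), repeat=4):
    sent = encode(list(data))
    for pos in range(7):
        received = list(sent)
        received[pos] ^= 1
        assert decode(received) == list(data)
```

The scheme works because each of the seven positions lies in a different non-empty combination of circles, so the set of failing parity checks identifies the flipped bit uniquely.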

Page 27

The best card trick ever…

Page 28

Analysis

•  The “guesser” needs to guess one of 52-4=48 possible cards

•  The “guesser” needs to receive $\log_2 48 \approx 5.58$ bits of information

•  The “helper” has a choice among 5 cards to return to the member of the public, followed by a choice of $4 \times 3 \times 2 = 24$ orderings of the remaining 4 cards, totaling $5 \times 24 = 5! = 120$ possibilities

•  The “channel” between the helper and the guesser has a capacity to transmit $\log_2 120 \approx 6.91$ bits of information

•  There is ample capacity to comfortably transmit the information the guesser needs

•  All you need is a clever code that the “helper” and the “guesser” can work out easily in their heads
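The counting argument above can be checked in a few lines. The variable names are illustrative, and the sketch does not construct the code itself, only the two quantities being compared.

```python
import math

hidden_card_options = 52 - 4            # cards the guesser cannot rule out
helper_messages = math.factorial(5)     # 5 choices of returned card x 4! orderings

needed_bits = math.log2(hidden_card_options)
capacity_bits = math.log2(helper_messages)

print(round(needed_bits, 2), round(capacity_bits, 2))  # 5.58 6.91
assert helper_messages >= hidden_card_options
```

Since 120 distinguishable messages exceed the 48 possible cards, a code exists in principle; the remaining challenge is finding one simple enough to use mentally.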

Page 29

An unusual storage channel…

Double Helix Serves Double Duty

Nick Goldman, a molecular biologist at the European Bioinformatics Institute in Hinxton, England, used a technique with error-correction software to store and retrieve data in synthetic DNA molecules.

Page 30

Channel Coding

•  Every communication or storage channel has a capacity that can be measured and computed

•  Clever coding can achieve any desired error probability for rates below capacity

•  Above channel capacity, there is a minimum error probability that cannot be beaten
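The slides do not name a specific channel. As one standard textbook illustration (an assumption of this sketch, not taken from the talk), the binary symmetric channel flips each transmitted bit independently with probability p, and its capacity is 1 minus the binary entropy of p.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity in bits per channel use of a binary symmetric channel."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.11, 0.5):
    print(p, round(bsc_capacity(p), 3))
```

A noiseless channel (p = 0) carries one bit per use, while at p = 0.5 the output is independent of the input and the capacity drops to zero.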

Page 31

What we’ve learned…

•  Shannon’s legacy:

From the Bell Labs webpage

Source → Remove Bad Redundancy → Add Good Redundancy → Channel (Communications, Storage) → Recover Source

•  Information is measurable, like weight and energy

•  How much is 1 kg of information?

•  On DNA, 2 petabytes per gram (1 petabyte is 1000 terabytes)

•  1 kg ≈ the internet (1200 petabytes)
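Taking the two figures on this slide at face value, the comparison is a one-line calculation. The constant names are illustrative.

```python
PB_PER_GRAM = 2          # DNA storage density quoted on the slide
INTERNET_PB = 1200       # size of the internet quoted on the slide

capacity_per_kg_pb = PB_PER_GRAM * 1000
grams_for_internet = INTERNET_PB / PB_PER_GRAM

print(capacity_per_kg_pb)   # 2000
print(grams_for_internet)   # 600.0
```

With these numbers, 1 kg of DNA holds about 2000 petabytes, so the quoted 1200 petabytes would fit in roughly 600 g, which is the same order of magnitude as the slide's "1 kg ≈ the internet".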