30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 1
CS 621 Artificial Intelligence
Lecture 12 – 30/08/05
Prof. Pushpak Bhattacharyya
Fundamentals of Information Theory
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 2
Weather(O)  Temp(T)  Humidity(H)  Windy(W)  Decision(D)
Sunny       High     High         F         N
Sunny       High     High         T         N
Cloudy      High     High         F         Y
Rain        Med      High         F         Y
Rain        Cold     Low          F         Y
Rain        Cold     Low          T         N
Cloudy      Cold     Low          T         Y
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 3
Weather(O)  Temp(T)  Humidity(H)  Windy(W)  Decision(D)
Sunny       Med      High         F         N
Sunny       Cold     Low          F         Y
Rain        Med      Low          F         Y
Sunny       Med      Low          T         Y
Cloudy      Med      High         T         Y
Cloudy      High     Low          F         Y
Rain        High     High         T         N
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 4
Decision tree (figure): root node "Outlook" with three branches:
  Sunny  -> Humidity: High -> No, Low -> Yes
  Cloudy -> Yes
  Rain   -> Windy: T -> No, F -> Yes
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 5
Rule Base
R1: If outlook is sunny and humidity is high, then the decision is No.
R2: If outlook is sunny and humidity is low, then the decision is Yes.
R3: If outlook is cloudy, then the decision is Yes.
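The same rule base can be written directly as code. A minimal sketch in Python (the function name and return values are illustrative, not from the lecture):

def decide(outlook, humidity):
    # R1: outlook sunny and humidity high -> No
    if outlook == "Sunny" and humidity == "High":
        return "No"
    # R2: outlook sunny and humidity low -> Yes
    if outlook == "Sunny" and humidity == "Low":
        return "Yes"
    # R3: outlook cloudy -> Yes
    if outlook == "Cloudy":
        return "Yes"
    # The rainy case is handled by the Windy test in the tree, not by R1-R3.
    return None

print(decide("Sunny", "High"))   # -> No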
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 6
Making Sense of Information
• Classification
• Clustering
• Giving a short, simple description
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 7
Short Description
Occam's Razor principle
(The shortest/simplest description is the best for generalization.)
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 8
Representation Language
• Decision tree.
• Neural network.
• Rule base.
• Boolean expression.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 9
Information & Entropy
The example data, presented as rows of labeled records, carries little order/structure compared to the succinct descriptions (the decision tree and the rule base).
We define "information", and measure the lack of structure in information by "Entropy".
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 10
Define Entropy of S (Labeled data)
E(S) = - ( P+ log2P+ + P- log2P- )
P+ = proportion of positively labeled data.
P- = proportion of negatively labeled data.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 11
Example
P+ = 9/14
P- = 5/14
E(S) = - 9/14 log2(9/14) - 5/14 log2(5/14)
≈ 0.940
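A small sketch of this calculation in Python (the helper name is illustrative; a term is skipped when its proportion is zero, matching the convention 0 log 0 = 0):

import math

def binary_entropy(p_plus, p_minus):
    """E(S) = -(P+ log2 P+ + P- log2 P-)."""
    e = 0.0
    for p in (p_plus, p_minus):
        if p > 0:
            e -= p * math.log2(p)
    return e

# The labeled data above: 9 positive and 5 negative examples out of 14.
print(round(binary_entropy(9/14, 5/14), 3))   # -> 0.94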
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 12
Partitioning the Data
“Windy” as the attribute
Windy ∈ {T, F}
Partition the data into the Windy = T and Windy = F subsets.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 13
Partitioning the Data (Contd)
Partitioning by focusing on a particular attribute produces an "information gain", i.e., a reduction in entropy.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 14
Partitioning the Data (Contd)
Information gain when we choose Windy ∈ {T, F}:
Windy = T: 3 positive, 3 negative
Windy = F: 6 positive, 2 negative
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 15
Partitioning the Data (Contd)
Partition diagram (figure): a "Windy" node with two branches:
  T -> 3 positive, 3 negative
  F -> 6 positive, 2 negative
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 16
Partitioning the Data (Contd)
Gain(S, A) = E(S) - ∑ ( |Sv| / |S| ) E(Sv),   summed over v ∈ Values(A)
E(S) = 0.940
For the Windy partition:
E(Windy = F) = - 6/8 log2(6/8) - 2/8 log2(2/8) ≈ 0.811
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 17
Partitioning the Data (Contd)
E(Windy = T) = - 3/6 log2(3/6) - 3/6 log2(3/6) = 1.0
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 18
Partitioning the Data (Contd)
Gain(S, Windy) = 0.940 - (8/14 × 0.811 + 6/14 × 1.0) ≈ 0.048
Exercise: find the information gain for each of the attributes Outlook, Temp, Humidity and Windy (the sketch below can be reused for this).
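A sketch of this computation in Python, with the 14 examples transcribed from the tables above (the one hard-to-read Windy entry is taken as "F"); the same gain() helper answers the exercise:

import math
from collections import Counter

# The 14 examples, as (Outlook, Temp, Humidity, Windy, Decision).
DATA = [
    ("Sunny",  "High", "High", "F", "N"), ("Sunny",  "High", "High", "T", "N"),
    ("Cloudy", "High", "High", "F", "Y"), ("Rain",   "Med",  "High", "F", "Y"),
    ("Rain",   "Cold", "Low",  "F", "Y"), ("Rain",   "Cold", "Low",  "T", "N"),
    ("Cloudy", "Cold", "Low",  "T", "Y"), ("Sunny",  "Med",  "High", "F", "N"),
    ("Sunny",  "Cold", "Low",  "F", "Y"), ("Rain",   "Med",  "Low",  "F", "Y"),
    ("Sunny",  "Med",  "Low",  "T", "Y"), ("Cloudy", "Med",  "High", "T", "Y"),
    ("Cloudy", "High", "Low",  "F", "Y"), ("Rain",   "High", "High", "T", "N"),
]
ATTR = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Windy": 3}

def entropy(rows):
    """E(S) over the Decision labels of the given rows."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain(rows, attr):
    """Gain(S, A) = E(S) - sum over v of (|Sv|/|S|) * E(Sv)."""
    i = ATTR[attr]
    remainder = 0.0
    for v in {r[i] for r in rows}:
        sv = [r for r in rows if r[i] == v]
        remainder += (len(sv) / len(rows)) * entropy(sv)
    return entropy(rows) - remainder

print(round(gain(DATA, "Windy"), 3))   # -> 0.048, as computed above
# gain(DATA, "Outlook"), gain(DATA, "Temp"), gain(DATA, "Humidity") answer the exercise.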
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 19
ID3 Algorithm
Calculating the gain for every attribute, choosing the one with maximum gain, and repeating this recursively on each resulting subset until we arrive at the decision tree is called the "ID3" algorithm for building a classifier.
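A compact recursive sketch of this idea, reusing DATA, ATTR and gain() from the previous snippet (the structure and names are illustrative, not ID3's exact pseudo-code):

from collections import Counter

def id3(rows, attributes):
    """Recursively choose the attribute with maximum gain at each node."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                 # pure node: all examples share a label
        return labels[0]
    if not attributes:                        # no attributes left: take the majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a))
    i = ATTR[best]
    tree = {best: {}}
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best])
    return tree

print(id3(DATA, list(ATTR)))   # root is Outlook, matching the tree shown earlier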
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 20
Origin of Information Theory
1) C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, 1948.
2) T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley, 1991.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 21
Example
Motivation with the example of a horse race:
8 horses: h1, h2, ..., h8.
Person P would like to bet on one of the horses. The horses have the following probabilities of winning:
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 22
Example (Contd 1)
• h1 = 1/2     h5 = 1/64
• h2 = 1/4     h6 = 1/64
• h3 = 1/8     h7 = 1/64
• h4 = 1/16    h8 = 1/64
∑ P(hi) = 1
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 23
Example (Contd 2)
Send a message specifying the horse on which to bet.
If the situation is "unbiased", i.e., all horses have an equal probability of winning, then we need 3 binary digits (bits), since 3 = log2 8.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 24
Example (Contd 3)
Compute the bias:
E(S) = - ∑ Pi log2 Pi,   i = 1, ..., 8
Pi = probability of hi winning
E(S) = - (1/2 log2 1/2 + 1/4 log2 1/4 + ... + 1/64 log2 1/64) = 2
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 25
Example (Contd 4)
On average we do not need more than 2 bits to communicate the desired horse.
Actual length of the code?
Designing an optimal code is a separate problem.
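One illustrative way to see that 2 bits suffice on average (a sketch, not the lecture's construction): assign each horse a codeword of length ceil(-log2 pi), so likelier horses get shorter codewords, and compare the expected length with the entropy.

import math

# Winning probabilities for h1 ... h8 from the example above.
probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]

# Entropy of the race: the lower bound on the average message length.
H = -sum(p * math.log2(p) for p in probs)

# Shannon-style code lengths: ceil(-log2 p) bits per horse.
lengths = [math.ceil(-math.log2(p)) for p in probs]
expected = sum(p * l for p, l in zip(probs, lengths))

print(H)         # -> 2.0 bits
print(lengths)   # -> [1, 2, 3, 4, 6, 6, 6, 6]
print(expected)  # -> 2.0 bits on average

These lengths satisfy the Kraft inequality, so a prefix code with exactly these lengths exists; because every probability here is a power of 2, the expected length equals the entropy.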
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 26
Example 2 (Letter Guessing Game)
Letter:       p     t     k     a     i     u
Probability:  1/8   1/4   1/8   1/8   1/4   1/8
A 20-questions style game.
E(S) = - ∑ Pi log2 Pi,   i ∈ {p, t, k, a, i, u}
     = 2.5
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 27
On average we need no more than 2.5 questions.
Design a code:
Letter:       p     t     k     a     i     u
Probability:  1/8   1/4   1/8   1/8   1/4   1/8
Code:         100   00    101   110   01    111
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 28
Q1) Is the letter t or i?
Q2) Is it a consonant?
Expected number of questions = ∑ Pi × Ni
where Ni = number of questions for situation i.
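A quick check of this formula against the code designed on the previous slide (a sketch; the number of questions for a letter equals its codeword length):

# Codewords from the slide above; one question is answered per code bit.
code = {"p": "100", "t": "00", "k": "101", "a": "110", "i": "01", "u": "111"}
prob = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/8, "i": 1/4, "u": 1/8}

expected_questions = sum(prob[x] * len(code[x]) for x in code)
print(expected_questions)   # -> 2.5, matching the entropy of 2.5 bits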
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 29
What has all this got to do with AI ?
Why entropy?
Why design codes?
Why communicate ?
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 30
Bridge
Multiparty participation is intelligent information processing.
Information theory sets up theoretical limits on communicability.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 31
Summary
• Haphazard presentation of data is not acceptable to the mind.
• Focusing attention on an attribute automatically leads to information gain.
• Defined entropy.
• In parallel, defined information gain.
• Related this to message communication.