30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 1
CS 621 Artificial Intelligence
Lecture 12 – 30/08/05
Prof. Pushpak Bhattacharyya
Fundamentals of Information Theory
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 2
Weather(O)  Temp(T)  Humidity(H)  Windy(W)  Decision(D)
Sunny       High     High         F         N
Sunny       High     High         T         N
Cloudy      High     High         F         Y
Rain        Med      High         F         Y
Rain        Cold     Low          F         Y
Rain        Cold     Low          T         N
Cloudy      Cold     Low          T         Y
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 3
Weather(O)  Temp(T)  Humidity(H)  Windy(W)  Decision(D)
Sunny       Med      High         F         N
Sunny       Cold     Low          F         Y
Rain        Med      Low          F         Y
Sunny       Med      Low          T         Y
Cloudy      Med      High         T         Y
Cloudy      High     Low          F         Y
Rain        High     High         T         N
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 4
Decision tree (figure): root node "Outlook" with three branches:
  Sunny  -> Humidity: High -> No, Low -> Yes
  Cloudy -> Yes
  Rain   -> Windy: T -> No, F -> Yes
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 5
Rule Base
R1: If outlook is sunny and humidity is high, then the decision is No.
R2: If outlook is sunny and humidity is low, then the decision is Yes.
R3: If outlook is cloudy, then the decision is Yes.
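The same rule base can be written directly as code. A minimal sketch in Python (the function name and return values are illustrative, not from the lecture):

def decide(outlook, humidity):
    # R1: outlook sunny and humidity high -> No
    if outlook == "Sunny" and humidity == "High":
        return "No"
    # R2: outlook sunny and humidity low -> Yes
    if outlook == "Sunny" and humidity == "Low":
        return "Yes"
    # R3: outlook cloudy -> Yes
    if outlook == "Cloudy":
        return "Yes"
    # The rainy case is handled by the Windy test in the tree, not by R1-R3.
    return None

print(decide("Sunny", "High"))   # -> No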
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 6
Making Sense of Information
• Classification
• Clustering
• Giving a short, simple description
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 7
Short Description
Occam's Razor principle
(The shortest/simplest description is the best for generalization.)
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 8
Representation Language
• Decision tree.
• Neural network.
• Rule base.
• Boolean expression.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 9
Information & Entropy
The example data, presented as rows of labeled records, carries little order/structure compared to the succinct descriptions (the decision tree and the rule base).
We define "information", and measure the lack of structure in information by "Entropy".
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 10
Define Entropy of S (Labeled data)
E(S) = - ( P+ log2P+ + P- log2P- )
P+ = proportion of positively labeled data.
P- = proportion of negatively labeled data.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 11
Example
P+ = 9/14
P- = 5/14
E(S) = - 9/14 log2(9/14) - 5/14 log2(5/14)
≈ 0.940
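A small sketch of this calculation in Python (the helper name is illustrative; a term is skipped when its proportion is zero, matching the convention 0 log 0 = 0):

import math

def binary_entropy(p_plus, p_minus):
    """E(S) = -(P+ log2 P+ + P- log2 P-)."""
    e = 0.0
    for p in (p_plus, p_minus):
        if p > 0:
            e -= p * math.log2(p)
    return e

# The labeled data above: 9 positive and 5 negative examples out of 14.
print(round(binary_entropy(9/14, 5/14), 3))   # -> 0.94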
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 12
Partitioning the Data
“Windy” as the attribute
Windy ∈ {T, F}
Partition the data into the Windy = T and Windy = F subsets.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 13
Partitioning the Data (Contd)
Partitioning by focusing on a particular attribute produces an "information gain", i.e., a reduction in entropy.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 14
Partitioning the Data (Contd)
Information gain when we choose Windy ∈ {T, F}:
Windy = T: 3 positive, 3 negative
Windy = F: 6 positive, 2 negative
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 15
Partitioning the Data (Contd)
Partition diagram (figure): a "Windy" node with two branches:
  T -> 3 positive, 3 negative
  F -> 6 positive, 2 negative
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 16
Partitioning the Data (Contd)
Gain(S, A) = E(S) - ∑ ( |Sv| / |S| ) E(Sv),   summed over v ∈ Values(A)
E(S) = 0.940
For the Windy partition:
E(Windy = F) = - 6/8 log2(6/8) - 2/8 log2(2/8) ≈ 0.811
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 17
Partitioning the Data (Contd)
E(Windy = T) = - 3/6 log2(3/6) - 3/6 log2(3/6) = 1.0
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 18
Partitioning the Data (Contd)
Gain(S, Windy) = 0.940 - (8/14 × 0.811 + 6/14 × 1.0) ≈ 0.048
Exercise: find the information gain for each of the attributes Outlook, Temp, Humidity and Windy (the sketch below can be reused for this).
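A sketch of this computation in Python, with the 14 examples transcribed from the tables above (the one hard-to-read Windy entry is taken as "F"); the same gain() helper answers the exercise:

import math
from collections import Counter

# The 14 examples, as (Outlook, Temp, Humidity, Windy, Decision).
DATA = [
    ("Sunny",  "High", "High", "F", "N"), ("Sunny",  "High", "High", "T", "N"),
    ("Cloudy", "High", "High", "F", "Y"), ("Rain",   "Med",  "High", "F", "Y"),
    ("Rain",   "Cold", "Low",  "F", "Y"), ("Rain",   "Cold", "Low",  "T", "N"),
    ("Cloudy", "Cold", "Low",  "T", "Y"), ("Sunny",  "Med",  "High", "F", "N"),
    ("Sunny",  "Cold", "Low",  "F", "Y"), ("Rain",   "Med",  "Low",  "F", "Y"),
    ("Sunny",  "Med",  "Low",  "T", "Y"), ("Cloudy", "Med",  "High", "T", "Y"),
    ("Cloudy", "High", "Low",  "F", "Y"), ("Rain",   "High", "High", "T", "N"),
]
ATTR = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Windy": 3}

def entropy(rows):
    """E(S) over the Decision labels of the given rows."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain(rows, attr):
    """Gain(S, A) = E(S) - sum over v of (|Sv|/|S|) * E(Sv)."""
    i = ATTR[attr]
    remainder = 0.0
    for v in {r[i] for r in rows}:
        sv = [r for r in rows if r[i] == v]
        remainder += (len(sv) / len(rows)) * entropy(sv)
    return entropy(rows) - remainder

print(round(gain(DATA, "Windy"), 3))   # -> 0.048, as computed above
# gain(DATA, "Outlook"), gain(DATA, "Temp"), gain(DATA, "Humidity") answer the exercise.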
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 19
ID3 Algorithm
Calculating the gain for every attribute, choosing the one with maximum gain, and repeating this recursively on each resulting subset until we arrive at the decision tree is called the "ID3" algorithm for building a classifier.
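A compact recursive sketch of this idea, reusing DATA, ATTR and gain() from the previous snippet (the structure and names are illustrative, not ID3's exact pseudo-code):

from collections import Counter

def id3(rows, attributes):
    """Recursively choose the attribute with maximum gain at each node."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                 # pure node: all examples share a label
        return labels[0]
    if not attributes:                        # no attributes left: take the majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a))
    i = ATTR[best]
    tree = {best: {}}
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best])
    return tree

print(id3(DATA, list(ATTR)))   # root is Outlook, matching the tree shown earlier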
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 20
Origin of Information Theory
1) C. E. Shannon, "A Mathematical Theory of Communication", Bell System Technical Journal, 1948.
2) T. M. Cover and J. A. Thomas, "Elements of Information Theory", Wiley, 1991.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 21
Example
Motivation with the example of a horse race:
8 horses: h1, h2, ..., h8.
Person P would like to bet on one of the horses. The horses have the following probabilities of winning:
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 22
Example (Contd 1)
• h1 = 1/2     h5 = 1/64
• h2 = 1/4     h6 = 1/64
• h3 = 1/8     h7 = 1/64
• h4 = 1/16    h8 = 1/64
∑ P(hi) = 1
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 23
Example (Contd 2)
Send a message specifying the horse on which to bet.
If the situation is "unbiased", i.e., all horses have an equal probability of winning, then we need 3 binary digits (bits), since 3 = log2 8.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 24
Example (Contd 3)
Compute the bias:
E(S) = - ∑ Pi log2 Pi,   i = 1, ..., 8
Pi = probability of hi winning
E(S) = - (1/2 log2 1/2 + 1/4 log2 1/4 + ... + 1/64 log2 1/64) = 2
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 25
Example (Contd 4)
On average we do not need more than 2 bits to communicate the desired horse.
Actual length of the code?
Designing an optimal code is a separate problem.
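One illustrative way to see that 2 bits suffice on average (a sketch, not the lecture's construction): assign each horse a codeword of length ceil(-log2 pi), so likelier horses get shorter codewords, and compare the expected length with the entropy.

import math

# Winning probabilities for h1 ... h8 from the example above.
probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]

# Entropy of the race: the lower bound on the average message length.
H = -sum(p * math.log2(p) for p in probs)

# Shannon-style code lengths: ceil(-log2 p) bits per horse.
lengths = [math.ceil(-math.log2(p)) for p in probs]
expected = sum(p * l for p, l in zip(probs, lengths))

print(H)         # -> 2.0 bits
print(lengths)   # -> [1, 2, 3, 4, 6, 6, 6, 6]
print(expected)  # -> 2.0 bits on average

These lengths satisfy the Kraft inequality, so a prefix code with exactly these lengths exists; because every probability here is a power of 2, the expected length equals the entropy.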
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 26
Example 2 (Letter Guessing Game)
Letter:       p     t     k     a     i     u
Probability:  1/8   1/4   1/8   1/8   1/4   1/8
A 20-questions style game.
E(S) = - ∑ Pi log2 Pi,   i ∈ {p, t, k, a, i, u}
     = 2.5
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 27
On average we need no more than 2.5 questions.
Design a code:
Letter:       p     t     k     a     i     u
Probability:  1/8   1/4   1/8   1/8   1/4   1/8
Code:         100   00    101   110   01    111
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 28
Q1) Is the letter t or i?
Q2) Is it a consonant?
Expected number of questions = ∑ Pi × Ni
where Ni = number of questions for situation i.
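A quick check of this formula against the code designed on the previous slide (a sketch; the number of questions for a letter equals its codeword length):

# Codewords from the slide above; one question is answered per code bit.
code = {"p": "100", "t": "00", "k": "101", "a": "110", "i": "01", "u": "111"}
prob = {"p": 1/8, "t": 1/4, "k": 1/8, "a": 1/8, "i": 1/4, "u": 1/8}

expected_questions = sum(prob[x] * len(code[x]) for x in code)
print(expected_questions)   # -> 2.5, matching the entropy of 2.5 bits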
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 29
What has all this got to do with AI ?
Why entropy?
Why design codes?
Why communicate ?
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 30
Bridge
Multiparty participation is intelligent information processing.
Information theory sets up theoretical limits on communicability.
30-08-05 Prof. Pushpak Bhattacharyya, IIT Bombay 31
Summary
• Haphazard presentation of data is not acceptable to the mind.
• Focusing attention on an attribute automatically leads to information gain.
• Defined entropy.
• In parallel, defined information gain.
• Related this to message communication.