Entropy-based & ChiMerge Data Discretization
Feb. 12, 2008
Team #4: Seunghyun Kim, Craig Dunham, Suryo Muljono, Albert Lee
Entropy-based discretization
• Table 6.1 Class-labeled training tuples from the AllElectronics customer database (page 299).
RID  age          income  student  credit_rating  Class: buys_computer
1    Youth        High    No       Fair           No
2    Youth        High    No       Excellent      No
3    Middle_aged  High    No       Fair           Yes
4    Senior       Medium  No       Fair           Yes
5    Senior       Low     Yes      Fair           Yes
6    Senior       Low     Yes      Excellent      No
7    Middle_aged  Low     Yes      Excellent      Yes
8    Youth        Medium  No       Fair           No
9    Youth        Low     Yes      Fair           Yes
10   Senior       Medium  Yes      Fair           Yes
11   Youth        Medium  Yes      Excellent      Yes
12   Middle_aged  Medium  No       Excellent      Yes
13   Middle_aged  High    Yes      Fair           Yes
14   Senior       Medium  No       Excellent      No
Entropy-based (Cont’d)
• Information gain
• Info(D) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940 bits
• Info_age(D) = (5/14) × [−(2/5) log2(2/5) − (3/5) log2(3/5)]
  + (4/14) × [−(4/4) log2(4/4)]
  + (5/14) × [−(3/5) log2(3/5) − (2/5) log2(2/5)]
  = 0.694 bits
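These two quantities can be checked with a short Python sketch (the class labels and age values are transcribed from Table 6.1 above; Info_age(D) works out to ≈ 0.694, the value used in the gain computation on the next slide):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Class labels (buys_computer) for the 14 tuples of Table 6.1, in RID order.
buys = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
# Corresponding values of the age attribute, in the same order.
age = ["Youth", "Youth", "Middle_aged", "Senior", "Senior", "Senior",
       "Middle_aged", "Youth", "Youth", "Senior", "Youth", "Middle_aged",
       "Middle_aged", "Senior"]

info_D = entropy(buys)          # entropy of the whole set D

# Expected information after partitioning D on age:
# the entropy of each partition, weighted by its relative size.
info_age = 0.0
for v in set(age):
    part = [b for a, b in zip(age, buys) if a == v]
    info_age += (len(part) / len(buys)) * entropy(part)

print(round(info_D, 3))    # 0.94
print(round(info_age, 3))  # 0.694
```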
Entropy-based (Cont’d)
• Gain(A) = Info(D) – Info_A(D)
• Gain(age) = Info(D) – Info_age(D) = 0.940 – 0.694 = 0.246 bits
• Gain(income) = Info(D) – Info_income(D) = 0.940 – 0.911 = 0.029 bits
• Gain(student) = Info(D) – Info_student(D) = 0.940 – 0.788 = 0.152 bits
• Gain(credit_rating) = Info(D) – Info_credit_rating(D) = 0.940 – 0.892 = 0.048 bits
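As a sketch, the four gains can be recomputed directly from Table 6.1 (the dataset literal below is transcribed from the table; tiny last-digit differences arise because the slide rounds each Info_A(D) to three decimals before subtracting):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Table 6.1: (age, income, student, credit_rating, buys_computer), RIDs 1-14.
D = [
    ("Youth", "High", "No", "Fair", "No"),
    ("Youth", "High", "No", "Excellent", "No"),
    ("Middle_aged", "High", "No", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Excellent", "No"),
    ("Middle_aged", "Low", "Yes", "Excellent", "Yes"),
    ("Youth", "Medium", "No", "Fair", "No"),
    ("Youth", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "Yes", "Fair", "Yes"),
    ("Youth", "Medium", "Yes", "Excellent", "Yes"),
    ("Middle_aged", "Medium", "No", "Excellent", "Yes"),
    ("Middle_aged", "High", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Excellent", "No"),
]
attrs = {"age": 0, "income": 1, "student": 2, "credit": 3}
info_D = entropy([row[-1] for row in D])

def gain(attr):
    """Information gain of splitting D on the given attribute."""
    i = attrs[attr]
    info_A = 0.0
    for v in set(row[i] for row in D):
        part = [row[-1] for row in D if row[i] == v]
        info_A += (len(part) / len(D)) * entropy(part)
    return info_D - info_A

for a in attrs:
    print(a, round(gain(a), 3))
```

Age has the highest gain (≈ 0.247 unrounded, 0.246 with the slide's rounded intermediates), so it is chosen as the splitting attribute.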
Entropy-based (Cont’d)
[Figure: decision tree for the AllElectronics customer database; root split on Age with branches Youth, Middle_aged, Senior]
Entropy-based (Cont’d)
[Figure: completed decision tree for the AllElectronics customer database. Root: Age. Youth branch → Student? (Student → yes, Non-student → no). Middle_aged branch → yes. Senior branch → Credit? (Excellent → no, Fair → yes)]