Entropy-based & ChiMerge Data Discretization

Entropy-based & ChiMerge Data Discretization Feb. 12, 2008 Team #4: Seunghyun Kim Craig Dunham Suryo Muljono Albert Lee


Transcript of Entropy-based & ChiMerge Data Discretization

Page 1: Entropy-based &  ChiMerge  Data  Discretization

Entropy-based & ChiMerge Data Discretization

Feb. 12, 2008

Team #4: Seunghyun Kim, Craig Dunham, Suryo Muljono, Albert Lee

Page 2: Entropy-based &  ChiMerge  Data  Discretization

Entropy-based discretization

• Table 6.1 Class-labeled training tuples from the AllElectronics customer database (page 299).

RID  age          income  student  credit_rating  Class: buys_computer
1    Youth        High    No       Fair           No
2    Youth        High    No       Excellent      No
3    Middle_aged  High    No       Fair           Yes
4    Senior       Medium  No       Fair           Yes
5    Senior       Low     Yes      Fair           Yes
6    Senior       Low     Yes      Excellent      No
7    Middle_aged  Low     Yes      Excellent      Yes
8    Youth        Medium  No       Fair           No
9    Youth        Low     Yes      Fair           Yes
10   Senior       Medium  Yes      Fair           Yes
11   Youth        Medium  Yes      Excellent      Yes
12   Middle_aged  Medium  No       Excellent      Yes
13   Middle_aged  High    Yes      Fair           Yes
14   Senior       Medium  No       Excellent      No

Page 3: Entropy-based &  ChiMerge  Data  Discretization

Entropy-based (Cont’d)

• Information gain

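• Info(D) = $-\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14}$ = 0.940 bits (9 "yes" and 5 "no" tuples out of 14)

• Info_age(D) = $\frac{5}{14}\left(-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right) + \frac{4}{14}\left(-\frac{4}{4}\log_2\frac{4}{4}\right) + \frac{5}{14}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right)$ = 0.694 bits (terms for Youth, Middle_aged, and Senior, in that order)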
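The two quantities on this slide can be checked with a few lines of Python. This is a minimal sketch, not the presenters' code: the helper names `info` and `info_after_split` are made up for illustration, and the data are the age and class columns of Table 6.1.

```python
from collections import Counter
from math import log2

# (age, buys_computer) pairs for RIDs 1-14 of Table 6.1
ROWS = [
    ("youth", "no"), ("youth", "no"), ("middle_aged", "yes"), ("senior", "yes"),
    ("senior", "yes"), ("senior", "no"), ("middle_aged", "yes"), ("youth", "no"),
    ("youth", "yes"), ("senior", "yes"), ("youth", "yes"), ("middle_aged", "yes"),
    ("middle_aged", "yes"), ("senior", "no"),
]


def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_after_split(rows):
    """Weighted entropy after partitioning (value, label) rows by value."""
    partitions = {}
    for value, label in rows:
        partitions.setdefault(value, []).append(label)
    n = len(rows)
    return sum(len(part) / n * info(part) for part in partitions.values())


info_d = info([label for _, label in ROWS])
info_age = info_after_split(ROWS)
print(f"Info(D)     = {info_d:.3f} bits")    # 0.940
print(f"Info_age(D) = {info_age:.3f} bits")  # 0.694
```

The Middle_aged partition contains only "yes" tuples, so it contributes zero entropy to the weighted sum.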

Page 4: Entropy-based &  ChiMerge  Data  Discretization

Entropy-based (Cont’d)

• Gain(A) = Info(D) – Info_A(D).

• Gain(age) = Info(D) – Info_age(D) = 0.940 – 0.694 = 0.246 bits

• Gain(income) = Info(D) – Info_income(D) = 0.940 – 0.911 = 0.029 bits

• Gain(student) = Info(D) – Info_student(D) = 0.940 – 0.788 = 0.152 bits

• Gain(credit_rating) = Info(D) – Info_credit_rating(D) = 0.940 – 0.892 = 0.048 bits

• Age has the highest information gain, so it is chosen as the first splitting attribute.
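The four gains can be recomputed from Table 6.1 in one pass. The sketch below is illustrative only: `entropy` and `gain` are hypothetical helper names, and the attribute is selected by column index.

```python
from collections import Counter, defaultdict
from math import log2

ATTRIBUTES = ("age", "income", "student", "credit_rating")

# Table 6.1, RIDs 1-14: (age, income, student, credit_rating, buys_computer)
TABLE = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain(rows, attr_index):
    """Information gain of splitting rows on the attribute at attr_index."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_index]].append(row[-1])  # class label is the last column
    info_a = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([row[-1] for row in rows]) - info_a


gains = {name: gain(TABLE, i) for i, name in enumerate(ATTRIBUTES)}
for name, value in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"Gain({name}) = {value:.3f} bits")
```

Because the script does not round intermediate values, it prints about 0.247 for age. The slide's 0.246 comes from subtracting the rounded figures 0.940 and 0.694. The ranking of the attributes is the same either way.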

Page 5: Entropy-based &  ChiMerge  Data  Discretization

Entropy-based (Cont’d)

AllElectronics customer database

Age?

Senior | Middle_aged | Youth
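Counting the class labels in each age branch shows what this first split achieves. The snippet is a small sketch over the age and class columns of Table 6.1, and the variable names are illustrative.

```python
from collections import Counter, defaultdict

# (age, buys_computer) pairs for RIDs 1-14 of Table 6.1
ROWS = [
    ("youth", "no"), ("youth", "no"), ("middle_aged", "yes"), ("senior", "yes"),
    ("senior", "yes"), ("senior", "no"), ("middle_aged", "yes"), ("youth", "no"),
    ("youth", "yes"), ("senior", "yes"), ("youth", "yes"), ("middle_aged", "yes"),
    ("middle_aged", "yes"), ("senior", "no"),
]

# Class distribution in each branch of the split on age
branches = defaultdict(Counter)
for age, label in ROWS:
    branches[age][label] += 1

for age, counts in branches.items():
    status = "pure, becomes a leaf" if len(counts) == 1 else "mixed, needs another split"
    print(f"{age}: {dict(counts)} ({status})")
```

Only the Middle_aged branch is pure (all four tuples are "yes"). The Youth and Senior branches still mix both classes, so each needs a further test.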

Page 6: Entropy-based &  ChiMerge  Data  Discretization

Entropy-based (Cont’d)

AllElectronics customer database

Age?

• Senior: Credit?
    • Excellent: no
    • Fair: yes
• Middle_aged: yes
• Youth: Student?
    • Student: yes
    • Non-student: no