ID3 Algorithm

ID3 Algorithm CS 157B: Spring 2010 Meg Genoar


Transcript of ID3 Algorithm

Page 1: ID3 Algorithm

ID3 Algorithm

CS 157B: Spring 2010

Meg Genoar

Page 2: ID3 Algorithm

Iterative Dichotomiser 3

Ross Quinlan – 1986

C4.5 Precursor

Decision Tree Generation

Page 3: ID3 Algorithm

Ross Quinlan

Computer Scientist – UW 1968

Data Mining & Decision Theory

AI: Data Mining

ID3, C4.5, & C5.0

RuleQuest Research

Page 4: ID3 Algorithm

Max-Gain Split

Most Useful Attribute

Highest Information

Best Attribute

Measure of Uncertainty

Randomness

Efficient Separation of Decision Tree Elements

ID3 & Entropy

Page 5: ID3 Algorithm

Entropy

Entropy(S) = – Ppositive Log2(Ppositive) – Pnegative Log2(Pnegative)

Ppositive: proportion of positive data

Pnegative: proportion of negative data

Page 6: ID3 Algorithm

Example…

A collection S consists of 20 data examples:

13 Yes : 7 No

Entropy(S) = – (13/20) Log2(13/20) – (7/20) Log2(7/20)

Entropy(S) = 0.934
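The entropy formula and the 13 Yes : 7 No example can be checked with a few lines of Python. This is only a minimal sketch: the function name `entropy` and its count-based arguments are assumptions made for illustration, not part of the slides.

```python
import math

def entropy(positive: int, negative: int) -> float:
    """Binary entropy (in bits) of a set with the given class counts."""
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count:  # an empty class contributes 0 (0 * Log2(0) is taken as 0)
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(13, 7), 3))  # 13 Yes : 7 No
```

Treating an empty class as contributing zero avoids evaluating Log2(0), which matters for subsets that contain only one class.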

Page 7: ID3 Algorithm

Entropy Gain Value

Gain: Place to Split the Tree

High Gain > Low Gain

High Gain: Top of the Tree

Gain(A) = E(Current Set) – ∑ (|Child Set| / |Current Set|) * E(Child Set), summed over all child sets produced by splitting on A
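As a rough sketch of the gain computation, each child set's entropy can be weighted by its share of the current set before subtracting. The names `entropy` and `gain`, and the choice to pass class counts as tuples, are assumptions for illustration.

```python
import math

def entropy(counts):
    """Entropy in bits of a sequence of class counts, e.g. (yes, no)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the child sets."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - remainder

# A hypothetical set of 2 Yes / 2 No that an attribute separates perfectly.
print(gain((2, 2), [(2, 0), (0, 2)]))
```

An attribute that leaves every child set as mixed as the parent would score a gain of 0 under the same function.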

Page 8: ID3 Algorithm

Movie Example

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | No | Comedy | False
3 | United States | Yes | Comedy | True
4 | Europe | No | Comedy | True
5 | Europe | Yes | Science Fiction | False
6 | Europe | Yes | Romance | False
7 | Rest of World | Yes | Comedy | False
8 | Rest of World | No | Science Fiction | False
9 | Europe | Yes | Comedy | True
10 | United States | Yes | Comedy | True

Page 9: ID3 Algorithm

Entropy of Table

Is the Film a Success?

Entropy(5 Yes, 5 No) = – (5/10) Log2(5/10) – (5/10) Log2(5/10)

Entropy(Success) = 1

Page 10: ID3 Algorithm

Split – Country of Origin

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | No | Comedy | False
3 | United States | Yes | Comedy | True
4 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | Europe | No | Comedy | True
2 | Europe | Yes | Science Fiction | False
3 | Europe | Yes | Romance | False
4 | Europe | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | Rest of World | Yes | Comedy | False
2 | Rest of World | No | Science Fiction | False
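Splitting the table on an attribute amounts to grouping rows by that attribute's value. The sketch below assumes the ten films are stored as tuples in a list named `FILMS` and uses a hypothetical helper `partition`; neither name comes from the slides.

```python
from collections import defaultdict

# The ten films from the movie example: (origin, big_star, genre, success).
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def partition(rows, column):
    """Group rows into one sub-table per distinct value in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)

# Column 0 is Country of Origin; report the size and success count per branch.
for origin, subset in partition(FILMS, 0).items():
    successes = sum(row[3] for row in subset)
    print(origin, len(subset), "films,", successes, "successes")
```

The per-branch counts (3 of 4, 2 of 4, 0 of 2) are the inputs to the entropy of each child set.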

Page 11: ID3 Algorithm

Gain – Country of Origin

Where is the film from?

Entropy(USA) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)

Entropy(USA) = 0.811

Entropy(Europe) = – (2/4) Log2(2/4) – (2/4) Log2(2/4)

Entropy(Europe) = 1

Entropy(Rest of World) = – (0/2) Log2(0/2) – (2/2) Log2(2/2)

Entropy(Rest of World) = 0

Gain(Origin) = 1 – (4/10 *0.811 + 4/10*1 + 2/10*0) = 0.276

Page 12: ID3 Algorithm

Split – Big Star

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | Yes | Comedy | True
3 | Europe | Yes | Science Fiction | False
4 | Europe | Yes | Romance | False
5 | Rest of World | Yes | Comedy | False
6 | Europe | Yes | Comedy | True
7 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False
2 | Europe | No | Comedy | True
3 | Rest of World | No | Science Fiction | False

Page 13: ID3 Algorithm

Gain – Big Star

Is there a Big Star in the film?

Entropy(Yes) = – (4/7) Log2(4/7) – (3/7) Log2(3/7)

Entropy(Yes) = 0.985

Entropy(No) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)

Entropy(No) = 0.918

Gain(Star) = 1 – (7/10 *0.985 + 3/10*0.918) = 0.0351

Page 14: ID3 Algorithm

Split – Genre

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | Europe | Yes | Science Fiction | False
3 | Rest of World | No | Science Fiction | False

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False
2 | United States | Yes | Comedy | True
3 | Europe | No | Comedy | True
4 | Rest of World | Yes | Comedy | False
5 | Europe | Yes | Comedy | True
6 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | Europe | Yes | Romance | False

Page 15: ID3 Algorithm

Gain – Genre

What genre is the film?

Entropy(SciFi) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)

Entropy(SciFi) = 0.918

Entropy(Com) = – (4/6) Log2(4/6) – (2/6) Log2(2/6)

Entropy(Com) = 0.918

Entropy(Rom) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)

Entropy(Rom) = 0

Gain(Genre) = 1 – (3/10 *0.918 + 6/10*0.918+ 1/10*0) = 0.1738

Page 16: ID3 Algorithm

Compare Gains…

Gain(Origin) = 0.276

Gain(Star) = 0.0351

Gain(Genre) = 0.1738

Page 17: ID3 Algorithm

First Split: Origin
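The comparison of the three gains can be reproduced directly from the movie table. This is a sketch under assumptions: the list `FILMS`, the mapping `COLUMNS`, and the helpers `entropy` and `gain` are illustrative names, not taken from the slides.

```python
import math
from collections import Counter, defaultdict

COLUMNS = {"Origin": 0, "Star": 1, "Genre": 2}

# (origin, big_star, genre, success) for the ten films in the movie example.
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def entropy(rows):
    """Entropy in bits of the Success label (last field) over the rows."""
    counts = Counter(row[-1] for row in rows)
    return -sum(n / len(rows) * math.log2(n / len(rows)) for n in counts.values())

def gain(rows, column):
    """Entropy of rows minus the size-weighted entropy of each value's subset."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

gains = {name: gain(FILMS, col) for name, col in COLUMNS.items()}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"{name}: {value:.4f}")
print("First split:", best)
```

Without intermediate rounding the gains come out at about 0.2755, 0.0349, and 0.1735. The slide values of 0.0351 and 0.1738 differ slightly because they use entropies rounded to three decimals; the ranking is the same either way.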

Page 18: ID3 Algorithm

All Movies
  United States: New Table
  Europe: New Table
  Rest of World: New Table

Page 20: ID3 Algorithm

New Table – United States

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | No | Comedy | False
3 | United States | Yes | Comedy | True
4 | United States | Yes | Comedy | True

Entropy(3 Yes, 1 No) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)

Entropy(Success) = 0.811

Page 21: ID3 Algorithm

Split – Big Star

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True
2 | United States | Yes | Comedy | True
3 | United States | Yes | Comedy | True

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False

Page 22: ID3 Algorithm

Gain – Big Star

Is there a Big Star in the film?

Entropy(Yes) = – (3/3) Log2(3/3) – (0/3) Log2(0/3)

Entropy(Yes) = 0

Entropy(No) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)

Entropy(No) = 0

Gain(Star) = 0.811 – (3/4 *0 + 1/4*0) = 0.811

Page 23: ID3 Algorithm

Split – Genre

Film | Country of Origin | Big Star | Genre | Success
1 | United States | Yes | Science Fiction | True

Film | Country of Origin | Big Star | Genre | Success
1 | United States | No | Comedy | False
2 | United States | Yes | Comedy | True
3 | United States | Yes | Comedy | True

Page 24: ID3 Algorithm

Gain – Genre

What genre is the film?

Entropy(SciFi) = – (1/1) Log2(1/1) – (0/1) Log2(0/1)

Entropy(SciFi) = 0

Entropy(Com) = – (2/3) Log2(2/3) – (1/3) Log2(1/3)

Entropy(Com) = 0.918

Gain(Genre) = 0.811 – (1/4 *0 + 3/4*0.918) = 0.1225

Page 25: ID3 Algorithm

Compare Gains…

Gain(Star) = 0.811

Gain(Genre) = 0.1225

Page 26: ID3 Algorithm

Split: Star

Page 27: ID3 Algorithm

All Movies
  United States (split on Big Star)
    Star: New Table
    No Star: New Table
  Europe: New Table
  Rest of World: New Table

Page 28: ID3 Algorithm

All Movies
  United States
    Star
      Sci-Fi: Success
      Comedy: Success
    No Star: Failure
  Europe: New Table
  Rest of World: New Table

Page 29: ID3 Algorithm

[Tree diagram: All Movies branches into United States, Europe, and Rest of World; lower levels split on Star / No Star and Sci-Fi / Comedy and end in Success or Failure leaves, with unexpanded branches marked New Table.]

Page 30: ID3 Algorithm

Comedy from the US, with a big star…
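Classifying a new film means following the tree from the root down to a leaf. The sketch below builds a tree from the movie table with a compact recursive ID3 and then classifies a US comedy with a big star. All names (`FILMS`, `id3`, `classify`, and the tuple-based tree layout) are assumptions for illustration, and the sketch stops splitting as soon as a subset is pure, so its tree may have fewer levels than the diagram on the slides.

```python
import math
from collections import Counter, defaultdict

# (origin, big_star, genre, success) for the ten films in the movie example.
FILMS = [
    ("United States", "Yes", "Science Fiction", True),
    ("United States", "No", "Comedy", False),
    ("United States", "Yes", "Comedy", True),
    ("Europe", "No", "Comedy", True),
    ("Europe", "Yes", "Science Fiction", False),
    ("Europe", "Yes", "Romance", False),
    ("Rest of World", "Yes", "Comedy", False),
    ("Rest of World", "No", "Science Fiction", False),
    ("Europe", "Yes", "Comedy", True),
    ("United States", "Yes", "Comedy", True),
]

def entropy(rows):
    counts = Counter(row[-1] for row in rows)
    return -sum(n / len(rows) * math.log2(n / len(rows)) for n in counts.values())

def split(rows, col):
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return groups

def gain(rows, col):
    weighted = sum(len(g) / len(rows) * entropy(g) for g in split(rows, col).values())
    return entropy(rows) - weighted

def id3(rows, cols):
    labels = Counter(row[-1] for row in rows)
    if len(labels) == 1 or not cols:  # pure subset, or no attribute left to split on
        return labels.most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    return (best, {value: id3(sub, rest) for value, sub in split(rows, best).items()})

def classify(tree, film):
    while isinstance(tree, tuple):  # internal node: (column index, branches)
        col, branches = tree
        tree = branches[film[col]]
    return tree

tree = id3(FILMS, [0, 1, 2])
print(classify(tree, ("United States", "Yes", "Comedy")))  # True means Success
```

An attribute value that never appeared in the training rows would raise a `KeyError` in `classify`; a fuller implementation would need a fallback such as the majority label at that node.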
