Radosław Wesołowski, Tomasz Pękalski, Michal Borkowicz, Maciej Kopaczyński, 12-03-2008


Page 1:

Radosław Wesołowski, Tomasz Pękalski, Michal Borkowicz, Maciej Kopaczyński

12-03-2008

Page 2:

What is it anyway?

A decision tree T is a tree with a root (in the graph-theory sense) in which we assign the following meanings to its elements:
- inner nodes represent attributes,
- edges represent values of the attribute,
- leaves represent classification decisions.

Using a decision tree we can visualize a program made up of only 'if-then' instructions.
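To make that mapping concrete, here is a minimal Python sketch (the weather attributes, their values, and the decisions are illustrative assumptions, not taken from the slides): each `if` tests an attribute (inner node), each branch corresponds to one attribute value (edge), and each `return` is a classification decision (leaf).

```python
# A decision tree expressed as nested if-then instructions.
# Inner nodes = attributes, edges = attribute values, leaves = decisions.
def classify(outlook: str, humidity: str) -> str:
    if outlook == "sunny":            # inner node: attribute 'outlook'
        if humidity == "high":        # inner node: attribute 'humidity'
            return "stay home"        # leaf: classification decision
        return "play"                 # edge: humidity != 'high'
    if outlook == "overcast":
        return "play"
    return "stay home"                # edge: outlook == 'rain'

print(classify("sunny", "normal"))    # -> play
```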

Page 5:

Testing functions

Let us consider an attribute A (e.g. temperature). Let V_A denote the set of all possible values of A (0 K up to infinity). Let R_t denote the set of all possible test results (hot, mild, cold). By a testing function we mean a map

t: V_A → R_t

We distinguish two main types of testing functions, depending on the set V_A: discrete and continuous.
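For a continuous attribute like temperature, the testing function discretizes V_A into the finite result set R_t. A minimal sketch (the kelvin cut-points are assumptions for illustration):

```python
# A testing function t: V_A -> R_t for the continuous attribute
# 'temperature' in kelvins; R_t = {'hot', 'mild', 'cold'}.
# The cut-points 285 K and 300 K are illustrative assumptions.
def t(temperature_k: float) -> str:
    if temperature_k >= 300.0:
        return "hot"
    if temperature_k >= 285.0:
        return "mild"
    return "cold"

print(t(310.0))  # -> hot
print(t(290.0))  # -> mild
```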

Page 6:

Quality of a decision tree (Occam's razor):
- we prefer small, simple trees,
- we want to gain maximum accuracy of classification (on the training set and the test set).

For example:

Q(T) = α*size(T) + β*accuracy(T)
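A sketch of how such a criterion could be computed, assuming size(T) counts the tree's nodes and accuracy(T) is the fraction of correctly classified examples; the concrete weights (a negative α penalizing size, a positive β rewarding accuracy) are assumptions:

```python
# Q(T) = alpha*size(T) + beta*accuracy(T): Occam's razor as a score.
# alpha < 0 penalizes large trees, beta > 0 rewards accuracy;
# the weight values below are illustrative assumptions.
def quality(size: int, accuracy: float,
            alpha: float = -0.01, beta: float = 1.0) -> float:
    return alpha * size + beta * accuracy

print(quality(size=7, accuracy=0.92))   # -> 0.85
print(quality(size=40, accuracy=0.95))  # -> 0.55 (bigger tree scores lower)
```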

Page 7:

Optimal tree – we are given:
- a training set S,
- a set TEST of testing functions,
- a quality criterion Q.

Target: a tree T optimising Q(T).

Fact: usually this is an NP-hard problem.

Conclusion: we have to use heuristics.

Page 8:

Building a decision tree:
- top-down method:
  a. in the beginning the root includes all training examples,
  b. we divide them recursively, choosing one attribute at a time (a sketch follows below);
- bottom-up method: we remove subtrees or edges to gain precision when judging new cases.
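A compact sketch of the top-down method, with the attribute chooser injected as a parameter (e.g. choosing by information gain, as the later slides describe); the dict-based example representation is an assumption:

```python
from collections import Counter

# Top-down building: the root starts with all training examples,
# which are divided recursively on one attribute at a time.
# Each example is a dict of attribute values plus a 'decision' key
# (an assumed representation, not from the slides).
def build(examples, attributes, choose_attribute):
    decisions = [e["decision"] for e in examples]
    if len(set(decisions)) == 1 or not attributes:
        return Counter(decisions).most_common(1)[0][0]  # leaf decision
    a = choose_attribute(examples, attributes)  # e.g. by information gain
    children = {}
    for value in {e[a] for e in examples}:      # one edge per value of a
        subset = [e for e in examples if e[a] == value]
        children[value] = build(subset,
                                [x for x in attributes if x != a],
                                choose_attribute)
    return {"attribute": a, "children": children}
```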

Page 10:

Entropy – the average number of bits needed to represent a decision d for a randomly chosen object from a given set S. Why? Because an optimal binary representation assigns -log2(p) bits to a decision whose probability is p. We have the formula:

entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn)
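The formula translates directly into code; a minimal sketch (the probability vectors in the usage lines are assumed examples):

```python
import math

# entropy(p1, ..., pn) = -p1*log2(p1) - ... - pn*log2(pn)
# Terms with p = 0 contribute 0 bits and are skipped.
def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # -> 1.0 bit (fair coin, least predictable)
print(entropy([0.9, 0.1]))  # -> ~0.47 bits (skewed, more predictable)
```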

Page 13:

Information gain:

gain(.) = info before dividing - info after dividing
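In code (a sketch reusing the entropy function above; taking the info after a split as the size-weighted average entropy of the subsets, and using assumed example data):

```python
from collections import Counter

# gain = info before dividing - info after dividing.
# 'decisions' holds the decision labels before the split; 'subsets'
# holds the decision labels of each part after the split.
def information_gain(decisions, subsets):
    def info(labels):
        counts = Counter(labels)
        return entropy([c / len(labels) for c in counts.values()])
    after = sum(len(s) / len(decisions) * info(s) for s in subsets)
    return info(decisions) - after

# A split separating 4 'yes' / 4 'no' examples perfectly gains 1 bit:
print(information_gain(["y"] * 4 + ["n"] * 4,
                       [["y"] * 4, ["n"] * 4]))  # -> 1.0
```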

Page 15:

Overtraining: we say that a model H overfits if there is a model H' such that:
- training_error(H) < training_error(H'),
- testing_error(H) > testing_error(H').

Avoiding overtraining:
- adequate stop criteria,
- post-pruning,
- pre-pruning.
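The definition compares H against an alternative model H'; as a one-line sketch (the error values in the usage line are assumed):

```python
# H overfits when it beats H' on training error but loses on test error.
def overfits(train_err_h, test_err_h, train_err_h2, test_err_h2):
    return train_err_h < train_err_h2 and test_err_h > test_err_h2

print(overfits(0.02, 0.25, 0.10, 0.15))  # -> True
```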

Page 16:

Some decision tree algorithms:

- 1R,
- ID3 (Iterative Dichotomiser 3),
- C4.5 (ID3 + discretization + pruning),
- CART (Classification and Regression Trees),
- CHAID (CHi-squared Automatic Interaction Detection).