
International Graduate School of Dynamic Intelligent Systems

Machine Learning

RG Knowledge Based Systems

Hans Kleine Büning

Hans Kleine Büning, 9 January 2009
RG Knowledge Based Systems, University of Paderborn

Outline

Learning by Example Motivation Decision Trees ID3 Overfitting Pruning Exercise

Reinforcement Learning Motivation Markov Decision Processes Q-Learning Exercise



Motivation

Partly inspired by human learning.

Objectives:
- classify entities according to given examples
- find structure in large databases
- gain new knowledge from the samples

Input: learning examples with assigned attributes and assigned classes

Output: a general classifier for the given task


Classifying Training Examples

Training Example for EnjoySport

General Training Examples


Attributes & Classes

- Attribute: Ai; number of different values for Ai: |Ai|
- Class: Ci; number of different classes: |C|

Premises:
- n > 2
- consistent examples (no two objects with the same attributes but different classes)


Possible Solutions

- Decision Trees: ID3, C4.5, CART
- Rule-Based Systems
- Clustering
- Neural Networks: Backpropagation, Neuroevolution


Decision Trees

Idea: classify entities using if-then rules.

Example: classifying mushrooms.
Attributes: Colour, Size, Points. Classes: eatable, poisonous.

Resulting rules:
if (Colour = red) and (Size = small) then poisonous
if (Colour = green) then eatable

Colour  Size   Points  Class
red     small  yes     poisonous
brown   small  no      eatable
brown   big    yes     eatable
green   small  no      eatable
red     big    no      eatable

[Decision tree: Colour: red → Size (small → poisonous/1, big → eatable/1); green → eatable/1; brown → eatable/2]


Decision Trees

There exist different decision trees for the same task. On average, the left tree decides earlier.

Left tree:  Colour: red → Size (small → poisonous/1, big → eatable/1); green → eatable/1; brown → eatable/2
Right tree: Size: small → Points (yes → poisonous/1, no → eatable/2); big → eatable/2


How to measure tree quality?

- Number of leaves? = number of generated rules
- Tree height? = maximum rule length
- External path length? = sum of the lengths of all paths from root to leaf = amount of memory needed for all rules
- Weighted external path length: like external path length, but paths are weighted by the number of objects they represent


Back to the Example

Left tree:  Colour: red → Size (small → poisonous/1, big → eatable/1); green → eatable/1; brown → eatable/2
Right tree: Size: small → Points (yes → poisonous/1, no → eatable/2); big → eatable/2

Criterion                      Left Tree  Right Tree
number of leaves               4          3
height                         2          2
external path length           6          5
weighted external path length  7          8
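The four quality measures can be recomputed on the two example trees. A minimal sketch; the nested-tuple tree representation and all names are mine, not from the slides (note that, as drawn, the right tree has 3 leaves):

```python
# Leaf: ("leaf", class_label, object_count); inner node: ("split", attribute, {value: subtree})
left = ("split", "Colour", {
    "red":   ("split", "Size", {"small": ("leaf", "poisonous", 1),
                                "big":   ("leaf", "eatable",   1)}),
    "green": ("leaf", "eatable", 1),
    "brown": ("leaf", "eatable", 2)})
right = ("split", "Size", {
    "small": ("split", "Points", {"yes": ("leaf", "poisonous", 1),
                                  "no":  ("leaf", "eatable",   2)}),
    "big":   ("leaf", "eatable", 2)})

def leaves(t):
    return 1 if t[0] == "leaf" else sum(leaves(c) for c in t[2].values())

def height(t):
    return 0 if t[0] == "leaf" else 1 + max(height(c) for c in t[2].values())

def epl(t, d=0):  # external path length: sum of root-to-leaf depths
    return d if t[0] == "leaf" else sum(epl(c, d + 1) for c in t[2].values())

def wepl(t, d=0):  # weighted external path length: depths weighted by object counts
    return d * t[2] if t[0] == "leaf" else sum(wepl(c, d + 1) for c in t[2].values())

for t in (left, right):
    print(leaves(t), height(t), epl(t), wepl(t))
# left: 4 2 6 7; right: 3 2 5 8
```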


Weighted External Path Length

Idea from information theory. Given: a text which should be compressed and the probabilities of character occurrence. Result: a coding tree.

Example: text eeab with p(e) = 0.5, p(a) = 0.25, p(b) = 0.25.
[Coding tree: 1 → e, 00 → a, 01 → b.] Encoding: 110001

Build the tree according to the information content.
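The example can be checked in a few lines; the code table below is read off the coding tree (e → 1, a → 00, b → 01):

```python
from math import log2

text = "eeab"
p = {c: text.count(c) / len(text) for c in set(text)}   # p(e)=0.5, p(a)=p(b)=0.25
H = -sum(q * log2(q) for q in p.values())               # entropy: 1.5 bits per character

code = {"e": "1", "a": "00", "b": "01"}                 # prefix code from the tree
encoded = "".join(code[c] for c in text)

print(encoded, H)   # 110001 1.5
```

The encoded length (6 bits for 4 characters) matches the entropy of 1.5 bits per character, i.e. the code is optimal for this distribution.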


Entropy

Entropy = measure of the mean information content.

In general, for probabilities p1, …, pn:

H = − Σi pi log2 pi

Mean number of bits to encode each element under an optimal encoding (= mean height of the theoretically optimal encoding tree).

[Plot: the contribution −p log2 p of a single symbol as a function of its probability p]
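As a sketch, entropy of a class distribution given as absolute counts (function name and interface are mine):

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a distribution given as absolute counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(entropy([1, 1]))  # fair coin: 1.0 bit
print(entropy([1, 4]))  # e.g. 1 poisonous vs 4 eatable mushrooms: ~0.722 bits
```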


Information Gain

Information gain = expected reduction of entropy due to sorting on an attribute A.

Conditional entropy: H(C|A) = Σa p(a) · H(C|A = a)

Information gain: Gain(C, A) = H(C) − H(C|A)
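Both quantities can be sketched directly from the formulas; the example numbers use the mushroom data split by Colour (all names are mine):

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def conditional_entropy(partition):
    """partition: one list of class counts per attribute value a, so H(C|A) = sum_a p(a) H(C|A=a)."""
    total = sum(sum(g) for g in partition)
    return sum(sum(g) / total * entropy(g) for g in partition)

def information_gain(class_counts, partition):
    return entropy(class_counts) - conditional_entropy(partition)

# Mushrooms split by Colour: red -> 1 poisonous + 1 eatable, brown -> 2 eatable, green -> 1 eatable
gain = information_gain([1, 4], [[1, 1], [0, 2], [0, 1]])
print(round(gain, 4))  # 0.3219
```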


Entropy & Decision Trees

Use conditional entropy and information gain for selecting split attributes.

Chosen split attribute Ak with possible values a1, …, am:
- xi – number of objects with value ai for Ak
- xi,j – number of objects with value ai for Ak and class Cj
- p(ai) = xi / Σl xl – probability that one of the objects has attribute value ai
- p(Cj | ai) = xi,j / xi – probability that an object with attribute value ai has class Cj


Decision Tree Construction

Choose the split attribute Ak which gives the highest information gain or, equivalently, the smallest conditional entropy H(C|Ak).

Example: colour

Colour  Size   Points  Class
red     small  yes     poisonous
brown   small  no      eatable
brown   big    yes     eatable
green   small  no      eatable
red     big    no      eatable


Decision Tree Construction (2)

Analogously:
H(C|Acolour) = 0.4
H(C|Asize) ≈ 0.551
H(C|Apoints) = 0.4

Choose colour or points as the first split criterion. Recursively repeat this procedure.

[Split on Points:
yes → {(red, small): poisonous, (brown, big): eatable}
no → {(red, big), (brown, small), (green, small): all eatable}]
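The three conditional entropies can be recomputed from the table. A sketch (attribute encoding and names are mine); note that on this data the value for Size comes out as ≈ 0.551:

```python
from math import log2
from collections import Counter, defaultdict

data = [  # (Colour, Size, Points, Class)
    ("red",   "small", "yes", "poisonous"),
    ("brown", "small", "no",  "eatable"),
    ("brown", "big",   "yes", "eatable"),
    ("green", "small", "no",  "eatable"),
    ("red",   "big",   "no",  "eatable"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def cond_entropy(rows, attr):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

for name, i in (("colour", 0), ("size", 1), ("points", 2)):
    print(name, round(cond_entropy(data, i), 3))
# colour 0.4, size 0.551, points 0.4
```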


Decision Tree Construction (3)

Right side is trivial: all three objects are eatable → leaf eatable/3.

Left side: both remaining attributes have the same information gain, so choose either.

[Resulting tree: Points: yes → Colour (red → poisonous/1, brown → eatable/1); no → eatable/3]
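The whole procedure fits into one short recursive function. A sketch of ID3 under the assumptions above (tree representation and names are mine; ties are broken by attribute order, so on the mushroom data it first splits on Colour):

```python
from math import log2
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def id3(rows, attrs):
    """rows: tuples with the class label last; attrs: list of usable attribute indices."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attrs:                             # no attribute left -> majority class
        return Counter(labels).most_common(1)[0][0]

    def cond_entropy(a):                      # H(C|A_a); smaller = higher information gain
        groups = defaultdict(list)
        for r in rows:
            groups[r[a]].append(r[-1])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(attrs, key=cond_entropy)
    return (best, {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best])
                   for v in {r[best] for r in rows}})

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

data = [("red", "small", "yes", "poisonous"), ("brown", "small", "no", "eatable"),
        ("brown", "big", "yes", "eatable"),   ("green", "small", "no", "eatable"),
        ("red", "big", "no", "eatable")]
tree = id3(data, [0, 1, 2])
print(tree[0])   # 0, i.e. the root splits on Colour
```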


Generalisation

The classifier should also be able to handle unknown data. The classifying model is often called a hypothesis.

Testing generality: divide the samples into
- a training set
- a validation (test) set

Learn from the training set; test generality on the validation set.

Error computation: for a test set X and hypothesis h, error(X, h) is a function which is monotonically increasing in the number of examples in X wrongly classified by h.


Overfitting

The learnt hypothesis performs well on the training set but poorly on the validation set.

Formally: h is overfitted if there exists a hypothesis h' with error(D, h) < error(D, h') and error(X, h) > error(X, h'), where D is the training set and X the validation set.


Avoiding Overfitting

Stopping: don't split further if some criterion is met. Examples:
- Size of node n: don't split if n contains fewer than a threshold number of examples.
- Purity of node n: don't split if the purity gain is not big enough.

Pruning: reduce the decision tree after training. Examples:
- Reduced Error Pruning
- Minimal Cost-Complexity Pruning
- Rule Post-Pruning


Pruning

Pruning notation:
- Tn – the branch of T rooted at inner node n
- T/Tn – the tree obtained from T by replacing the branch Tn with a leaf

If T' was produced by (repeated) pruning on T, we write T' ≤ T.


Maximum Tree Creation

Before pruning we need a maximum tree Tmax.

What is a maximum tree? A tree in which
- all leaf nodes are smaller than some threshold, or
- all leaf nodes represent only one class, or
- all leaf nodes contain only objects with the same attribute values.

Tmax is then pruned starting from the leaves.


Reduced Error Pruning

1. Consider a branch Tn of T.
2. Replace Tn by a leaf with the class most frequently associated with Tn.
3. If error(X, h(T)) < error(X, h(T/Tn)), take back the decision.
4. Go back to 1 until all non-leaf nodes have been considered.
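Steps 1-4 can be sketched on a toy tree representation: an inner node is (attribute_index, {value: subtree}), a leaf is a class label; all names and the tiny data set are mine. The sketch prunes bottom-up and keeps the replacement leaf only when it does not hurt the validation error:

```python
from collections import Counter

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

def prune(tree, train_rows, val_rows):
    if not isinstance(tree, tuple) or not train_rows:
        return tree
    attr, children = tree
    # steps 1 and 4: consider every branch, bottom-up
    pruned = (attr, {v: prune(c,
                              [r for r in train_rows if r[attr] == v],
                              [r for r in val_rows if r[attr] == v])
                     for v, c in children.items()})
    # step 2: candidate leaf with the class most frequently associated with the branch
    leaf = Counter(r[-1] for r in train_rows).most_common(1)[0][0]
    # step 3: take the replacement back if the unpruned tree is strictly better on X
    err_tree = sum(classify(pruned, r) != r[-1] for r in val_rows)
    err_leaf = sum(leaf != r[-1] for r in val_rows)
    return leaf if err_leaf <= err_tree else pruned

train = [("a", "x"), ("a", "x"), ("b", "y")]
tree = (0, {"a": "x", "b": "y"})
print(prune(tree, train, [("a", "x"), ("b", "x")]))  # noisy validation set: pruned to leaf 'x'
print(prune(tree, train, [("a", "x"), ("b", "y")]))  # clean validation set: tree kept
```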


Exercise

Fred wants to buy a VW Beetle and classifies all offerings into the classes interesting and uninteresting. Help Fred by creating a decision tree using the ID3 algorithm.

Colour  Year of Construction  Mileage       Class
red     1975                  > 200 000 km  interesting
blue    1980                  > 200 000 km  uninteresting
green   1975                  < 200 000 km  interesting
red     1975                  > 200 000 km  interesting
green   1970                  < 200 000 km  uninteresting
blue    1975                  > 200 000 km  uninteresting
yellow  1970                  < 200 000 km  interesting


Outline

Learning by Example Motivation Decision Trees ID3 Overfitting Pruning Exercise

Reinforcement Learning Motivation Markov Decision Processes Q-Learning Exercise


Reinforcement Learning: The Idea

A way of programming agents by reward and punishment without specifying how the task is to be achieved


Learning to Balance on a Bicycle

States:
- angle of handlebars
- angular velocity of handlebars
- angle of bicycle to vertical
- angular velocity of bicycle to vertical
- acceleration of angle of bicycle to vertical


Learning to Balance on a Bicycle

Actions:
- torque to be applied to the handlebars
- displacement of the centre of mass from the bicycle's plane (in cm)


Reward: if the angle of the bicycle to vertical is greater than 12°, reward = −1; otherwise reward = 0.


Reinforcement Learning: Applications

- Board games: TD-Gammon, a program based on reinforcement learning, has become a world-class backgammon player.
- Controlling a mobile robot: learning to drive a bicycle, navigation, pole balancing, Acrobot, robot soccer.
- Learning to control sequential processes: elevator dispatching.


Deterministic Markov Decision Process
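The slide's formal content was not preserved in this transcript; for reference, the standard definition of a deterministic MDP is:

```latex
\begin{align*}
&\text{states } S, \quad \text{actions } A,\\
&\text{transition function } \delta : S \times A \to S \quad (s_{t+1} = \delta(s_t, a_t)),\\
&\text{reward function } r : S \times A \to \mathbb{R} \quad (r_t = r(s_t, a_t)).
\end{align*}
```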


Value of Policy and Agent’s Task
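The slide's formulas were not preserved; the usual statement is that the value of a policy π is the discounted sum of rewards obtained by following it, and the agent's task is to find an optimal policy π*:

```latex
V^{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{\,i}\, r_{t+i}, \qquad 0 \le \gamma < 1,
\qquad
\pi^{*} = \operatorname*{argmax}_{\pi} V^{\pi}(s) \quad \text{for all } s \in S.
```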


Nondeterministic Markov Decision Process

[Figure: one action may lead to different successor states, e.g. with probabilities P = 0.8, P = 0.1, P = 0.1]


Methods

Model (reward function and transition probabilities) known:
- discrete states: Dynamic Programming
- continuous states: Value Function Approximation + Dynamic Programming

Model (reward function or transition probabilities) unknown:
- discrete states: Reinforcement Learning
- continuous states: Value Function Approximation + Reinforcement Learning


Q-learning Algorithm
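The algorithm itself was shown as a figure that is not preserved here. A minimal sketch of the deterministic Q-learning update Q(s, a) ← r + γ · max_a' Q(s', a') on a toy corridor world (the environment is mine, chosen only to illustrate the update):

```python
# Corridor world: states 0..3, the goal is state 3; actions move left (-1) or right (+1).
GOAL, GAMMA = 3, 0.9
ACTIONS = (-1, +1)
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (100 if s2 == GOAL else 0)   # reward 100 for reaching the goal

# Exploration that visits every state-action pair; repeat until the Q-table is stable.
for _ in range(20):
    for s in range(GOAL):
        for a in ACTIONS:
            s2, r = step(s, a)
            Q[(s, a)] = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)

print([round(Q[(s, +1)], 1) for s in range(GOAL)])  # [81.0, 90.0, 100.0]
```

In the nondeterministic case the assignment is replaced by a learning-rate update, Q(s, a) ← (1 − α)·Q(s, a) + α·(r + γ · max_a' Q(s', a')).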


Example

[Worked Q-learning example: Q-table initialization, step-by-step updates during Episodes 1 and 2, and the resulting Q-tables; the figures are not preserved in this transcript.]

Example: Value Function after Convergence

Example: Optimal Policy

Q-learning


Convergence of Q-learning


Blackjack

Standard rules of blackjack hold.

State space:
- element[0] – current value of the player's hand (4-21)
- element[1] – value of the dealer's face-up card (2-11)
- element[2] – whether the player has a usable ace (0/1)

Starting states: the player has any 2 cards (uniformly distributed), the dealer has any 1 card (uniformly distributed).

Actions: HIT, STICK

Rewards: −1 for a loss, 0 for a draw, 1 for a win


Blackjack: Optimal Policy


Exercise:


Problems

- Multiagent systems: cooperative agents, competitive agents
- Continuous domains
- Partially observable MDPs (POMDPs)