Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr

Advanced Computing Seminar: Data Mining and Its Industrial Applications — Chapter 4 — Inductive Learning
Knowledge and Software Engineering Lab, Advanced Computing Research Centre
School of Computer and Information Science, University of South Australia

Page 1

Advanced Computing Seminar: Data Mining and Its Industrial Applications
— Chapter 4 — Inductive Learning

Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr
Knowledge and Software Engineering Lab, Advanced Computing Research Centre
School of Computer and Information Science, University of South Australia

Page 2

Outline

Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary

Page 3

Basic Concepts

Data: stored on some medium in a certain format

Information: meaning assigned to concrete data

Knowledge: refined and generalized from information

Page 4

Why Data Mining?

Rich data, poor knowledge: data must be turned into knowledge before it can support decision making (Data → Knowledge → Decision Making).

Typical forms of mined knowledge: patterns, trends, concepts, relations, models, associations, rules, sequences.

Application areas: e-commerce, resource distribution, trade, business intelligence, e-science, finance, economics, government, postal services, population, life cycle.

Page 5

Data Mining vs Knowledge Discovery

Data mining: extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.

Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Page 6

Data Mining: A KDD Process

Data mining is the core of the knowledge discovery process:

Databases → Data Cleaning → Data Integration → Data Warehouse → Selection (task-relevant data) → Data Mining → Pattern Evaluation

Page 7

Data Warehouse Process

Phases (from the diagram): Organization Readiness Assessment; Business Strategy Definition; Data Warehouse Architecture Definition; Data Warehouse Infrastructure Design; Design and Build; Data Exploitation; Implementation.

• Metadata management • Data access • Systems integration

Page 8

Macro Picture

Data Mining Approach to Data Warehouse Design

Desired star schema: metadata gathered per field.
• Attribute: width, type, NULL allowed, name, key
• Numeric fields: maximum, minimum, average, standard deviation
• Text fields: number of spaces, numerals used, average length

Output: designed star schema and mapping rules.

Page 9

Detailed picture (architecture diagram). Components: several information sources (InfoSource 1 … n), an Extractor, a Similarity Calculator, an Attribute Classifier, an Integrator, and a Translator; the desired star schema is the input, and the designed star schema together with mapping rules is the output.

Page 10

Knowledge Representation

Production system
Frame
Semantic networks
First-order logic
Ontology

Page 11

Production System

Rules have the form: IF (conditions) THEN (conclusions)

Example: IF (animal has wings) AND (animal can fly) THEN (animal is a bird)
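To make the rule format concrete, here is a minimal forward-chaining sketch in Python (my own illustration, not part of the original slides; the facts and the single rule are hypothetical):

# Minimal forward-chaining sketch for IF-THEN production rules.
# Facts are strings; a rule fires when all of its conditions are present.

rules = [
    {"if": {"has wings", "can fly"}, "then": "is a bird"},   # illustrative rule
]

def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions hold, adding their conclusions."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule["if"] <= facts and rule["then"] not in facts:
                facts.add(rule["then"])
                changed = True
    return facts

print(forward_chain({"has wings", "can fly"}, rules))
# -> {'has wings', 'can fly', 'is a bird'}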

Page 12

Production System

MYCIN

<rule>       = IF <antecedent> THEN <action> (ELSE <action>)
<antecedent> = AND <condition>
<condition>  = OR <condition> | <predicate> <associative-triple>
<associative-triple> = <attribute> <object> <value>
<action>     = <consequent> | <procedure>
<consequent> = <associative-triple> <certainty-factor>

Page 13

Frame Structure

FRAME: FRAME-NAME
  SLOT-NAME-1: ASPECT-11 ASPECT-VALUE-11
               ASPECT-12 ASPECT-VALUE-12
               ...
               ASPECT-1m ASPECT-VALUE-1m
  ......
  SLOT-NAME-n: ASPECT-n1 ASPECT-VALUE-n1
               ASPECT-n2 ASPECT-VALUE-n2
               ...
               ASPECT-nm ASPECT-VALUE-nm

Page 14

Semantic Networks

Nodes represent objects; arcs represent relationships.

Page 15

First Order Logic

Student(John)   Teacher(Markus)
Grandfather(x,z) :- Father(x,y), Father(y,z)
IF (animal has wings) AND (animal can fly) THEN (animal is a bird)

Page 16

Ontology

Semantic Web: ontology, OWL, ontology schema, description logic

Page 17

Outline

Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary

Page 18

The Essence of Learning

Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time. [Simon 1983]

Machine learning is the study of how to make machines acquire new knowledge, new skills, and reorganize existing knowledge.

Page 19

The Essence of Learning

The environment supplies the source information to the learning system. The level and quality of the information will significantly affect the learning strategy.

(Diagram: the Environment supplies information to the Learning Element, which updates the Knowledge Base used by the Performance Element; feedback from performance returns to the learning element.)

Page 20

The Essence of Learning

The environment is the information source: databases, text, Web pages, images, video, spatial data.

Page 21

The Essence of Learning

The learning element uses this information to make improvements in an explicit knowledge base, and the performance element uses the knowledge base to perform its task.

Learning strategies: inductive learning, analogical learning, explanation-based learning, genetic algorithms, neural networks.

Page 22

Paradigms for Machine Learning

The inductive paradigm. The most widely studied method for symbolic learning is inducing a general concept description from a sequence of instances of the concept and known counterexamples. The task is to build a concept description from which all the previous positive instances can be rederived by universal instantiation, but none of the previous negative instances can be rederived by the same process.

The analogical paradigm. Analogical reasoning is a strategy of inference that allows the transfer of knowledge from a known area into another area with similar properties.

Page 23

Paradigms for Machine Learning

The analytic paradigm. These methods attempt to formulate a generalization after analyzing a few instances in terms of the system's knowledge. Mainly deductive rather than inductive mechanisms are used for such learning.

The genetic paradigm. Genetic algorithms have been inspired by a direct analogy to mutations in biological reproduction and Darwinian natural selection. In principle, genetic algorithms encode a parallel search through concept space, with each process attempting coarse-grain hill climbing.

The connectionist paradigm. Connectionist learning systems are also called "neural networks". Connectionist learning consists of readjusting weights in a fixed-topology network via specific learning algorithms.

Page 24

The Essence of Learning

The knowledge base contains predefined concepts, domain constraints, heuristic rules, and so on.

Issues: knowledge representation, knowledge consistency, knowledge redundancy.

Page 25

The Essence of Learning

The performance element. The learning element is trying to improve the action of the performance element. The performance element applies knowledge to solve problems and evaluate the learning effects.

Page 26

On Concept. The term "concept" is a universal notion that reflects general, abstract, and essential features. For example, "triangle", "animal", and "computer" are all concepts. Horse, tiger, bird, and so on are called examples (instances) of the concept "animal".

A concept has two aspects, intension and extension. Intension: the set of attributes that reflect the essential features of the concept. Extension: the set of examples that satisfy the definition of the concept.

Further examples: fruit, student.

Page 27

Concept Description. In general, a concept can be described by the concept name and a list of attribute-value pairs, that is:

(Concept-name (Attribute1 Value1) (Attribute2 Value2) … (AttributeN ValueN))

In addition, a concept description can be represented in first-order logic: each attribute is a predicate, and the concept name and attribute values can be viewed as arguments, so the description is a formula of predicate calculus.

Page 28

Attribute Types

Nominal attribute: takes on a finite, unordered set of mutually exclusive values.
Linear attribute
Structured attribute

Page 29

Attribute Types: Nominal attribute — takes on a finite, unordered set of mutually exclusive values.

Examples: • Color: red, green, blue • Traffic: airline, railway, ship

Page 30

Attribute Types: Linear attribute — values form an ordered set.

Examples: • Age: 1, 2, …, 100 • Temperature: 20, 21, … • Distance: 1 km, 2 km, …

Page 31

Attribute Types: Structured attribute — values are organized in a hierarchy (e.g. a tree).

Example:
  computer
    hardware
      CPU (computing, control)
      memory
    software

Page 32

Inductive Learning: from particular examples to a general conclusion, principle, or rule.

apple: eat; tomato: eat; banana: eat; … ⇒ fruit: eat

Page 33

Inductive Learning. Given: • Premise statements: facts, specific observations, and intermediate generalizations that provide information about some objects, phenomena, processes, and so on. • Tentative inductive assertion: an a priori hypothesis held about the objects in the premise statements. • Background knowledge: general and domain-specific concepts for interpreting the premises, and inference rules relevant to the task of inference.

Find: an inductive assertion (hypothesis) that strongly or weakly implies the premise statements in the context of the background knowledge and satisfies the preference criterion.

Page 34

Inductive Learning — simplest form: learn a function from examples.

f is the target function; an example is a pair (x, f(x)).

Problem: find a hypothesis h such that h ≈ f, given a training set of examples.

(This is a highly simplified model of real learning: it ignores prior knowledge and assumes the examples are given.)

Page 35

Inductive Learning Method: construct/adjust h to agree with f on the training set (h is consistent if it agrees with f on all examples), e.g. by curve fitting (see figure).

Page 36 (curve fitting, continued; figure only)

Page 37 (curve fitting, continued; figure only)

Page 38 (curve fitting, continued; figure only)

Page 39

Best-Hypothesis Search: on a positive example, generalize the hypothesis; on a negative example, specialize it. Drawbacks: must re-check previous examples and may need to backtrack.

Page 40

Outline

Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary

Page 41

Hypothesis Space: the space of candidate concept descriptions.

Extension: the set of examples predicted to be satisfied by a hypothesis.

Bias: any preference for one hypothesis over another.

Page 42

Training Examples for EnjoySport

Sky    Temp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal    Strong  Warm   Same      Yes
Sunny  Warm  High      Strong  Warm   Same      Yes
Rainy  Cold  High      Strong  Warm   Change    No
Sunny  Warm  High      Strong  Cool   Change    Yes

What is the general concept?

Page 43

The more_general_than_or_equal_to relation

Definition: let hj and hk be boolean-valued functions defined over X. Then hj is more_general_than_or_equal_to hk (written hj ≥g hk) iff

(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]

In our case the most general hypothesis - that every day is a positive example - is represented by ⟨?, ?, ?, ?, ?, ?⟩, and the most specific possible hypothesis - that no day is a positive example - is represented by ⟨∅, ∅, ∅, ∅, ∅, ∅⟩.
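For conjunctive attribute hypotheses like the EnjoySport ones, the relation can be checked attribute by attribute. A small illustrative sketch (my own, with "?" standing for "any value" and None for the empty constraint ∅):

# '?' matches any value; None (the empty constraint) matches nothing.
def matches(h, x):
    """True if hypothesis h covers instance x (both are tuples of attribute values)."""
    return None not in h and all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    """hj >=_g hk: every attribute constraint of hj is at least as permissive as hk's."""
    return all(a == "?" or a == b or b is None for a, b in zip(hj, hk))

h_general  = ("?", "?", "?", "?", "?", "?")
h_specific = ("Sunny", "Warm", "?", "Strong", "?", "?")
print(more_general_or_equal(h_general, h_specific))   # True
print(more_general_or_equal(h_specific, h_general))   # False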

Page 44

Example of the Ordering of Hypotheses

Page 45

Version Space Search

Page 46

Version Space Example

Page 47

Representing Version Space

The general boundary G of version space VS(H,E) is the set of its maximally general members.

The specific boundary S of version space VS(H,E) is the set of its maximally specific members.

Every member of the version space lies between these boundaries:

VS(H,E) = { h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s) }

where x ≥g y means x is more general than or equal to y.

Page 48

Candidate-elimination algorithm

1. Initialize H to be the whole space. Thus the G set contains only the null (maximally general) description, and the S set contains only the description consistent with the first observed positive training instance.

2. For each subsequent instance i:
   IF i is a positive instance THEN
     Retain in G only those generalizations that match i.
     Generalize the elements of S as little as possible, so that they match i.

Page 49

Candidate-elimination algorithm

   ELSE IF i is a negative instance THEN
     Retain in S only those generalizations that do not match i.
     Specialize the elements of G as little as possible, so that they do not match i.

3. Repeat step 2 until G = S and this is a singleton set. When this occurs, H has collapsed to a single concept.

4. Output H.
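A compact, runnable sketch of the same loop for conjunctive hypotheses over nominal attributes, using the EnjoySport data from the earlier slide (an illustration only; it omits the removal of redundant members within S and G):

# Candidate elimination for conjunctive hypotheses over nominal attributes.
# A hypothesis is a tuple of constraints: an attribute value, '?' (any), or None (none).

DOMAINS = [  # possible values per attribute, taken from the EnjoySport table
    ("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong",), ("Warm", "Cool"), ("Same", "Change"),
]

def matches(h, x):
    return None not in h and all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(a == "?" or a == b or b is None for a, b in zip(hj, hk))

def min_generalize(s, x):
    """Minimally generalize specific hypothesis s so that it matches x."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x):
    """Minimally specialize general hypothesis g so that it no longer matches x."""
    out = []
    for i, c in enumerate(g):
        if c == "?":
            for v in DOMAINS[i]:
                if v != x[i]:
                    out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = [(None,) * n]
    G = [("?",) * n]
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [min_generalize(s, x) if not matches(s, x) else s for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if matches(g, x):
                    new_G.extend(h for h in min_specializations(g, x)
                                 if any(more_general_or_equal(h, s) for s in S))
                else:
                    new_G.append(g)
            G = new_G
    return S, G

data = [  # (instance, EnjoySport?)
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(data)
print("S =", S)   # expected: [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print("G =", G)   # expected: [('Sunny', '?', ...), ('?', 'Warm', ...)]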

Page 50

Converging Boundaries of the G and S sets

Page 51

Example Trace (1)

Page 52

Example Trace (2)

Page 53

Example Trace (3)

Page 54

Example Trace (4)

Page 55

How to Classify New Instances?

A new instance i is classified as positive if every hypothesis in the current version space classifies it as positive. Efficient test: i satisfies every member of S.

A new instance i is classified as negative if every hypothesis in the current version space classifies it as negative. Efficient test: i satisfies none of the members of G.
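Continuing the version-space sketch above (again an illustration, reusing matches, S, and G from the candidate-elimination code):

def classify(x, S, G):
    """Classify instance x using the boundary sets S and G of a version space."""
    if all(matches(s, x) for s in S):          # covered by every maximally specific hypothesis
        return "positive"
    if not any(matches(g, x) for g in G):      # covered by no maximally general hypothesis
        return "negative"
    return "ambiguous"                         # hypotheses disagree; a vote could be reported instead

print(classify(("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"), S, G))  # positive
print(classify(("Rainy", "Cold", "Normal", "Light", "Warm", "Same"), S, G))     # negative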

Page 56

New Instances to be Classified

A: ⟨Sunny, Warm, Normal, Strong, Cool, Change⟩ — YES (satisfies all of S)
B: ⟨Rainy, Cold, Normal, Light, Warm, Same⟩ — NO (satisfies none of G)
C: ⟨Sunny, Warm, Normal, Light, Warm, Same⟩ — ambiguous (P_pos = 3/6)
D: ⟨Sunny, Cold, Normal, Strong, Warm, Same⟩ — ambiguous (P_pos = 2/6)

Page 57

Remarks on Version Space and Candidate-Elimination

The algorithm outputs the set of all hypotheses consistent with the training examples, provided that (i) there are no errors in the training data and (ii) there is some hypothesis in H that correctly describes the target concept. The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.

Applications: learning regularities in chemical mass spectroscopy; learning control rules for heuristic search.

Page 58

Drawbacks of Version Space

Assumes consistent training data; it is noise-sensitive.

Comment: though not practical for most real-world learning problems, version spaces provide a good deal of insight into the logical structure of hypothesis space.

Page 59

Version-Space Merging

(Diagram: two version spaces VS1 and VS2, with boundaries S1/G1 and S2/G2, are intersected to give the merged version space VS1∩2 with boundaries S1∩2 and G1∩2.)

Page 60

Version-Space Merging

Conceptually: each new piece of information defines a new version space, and the result is their intersection.

Practically: supports parallel processing and copes with ambiguous or inconsistent data and background domain theories (merge VS1 … VSn into VSM).

Page 61

IVSM Examples

Attribute hierarchies used in the examples:
  shape: any-shape → { polyhedron → { cube, pyramid, octoploid }, spheroid }
  size:  any-size → { large, small }

Page 62

IVSM Examples (table): for each training example, the table lists the instance-specific S and G boundaries and the resulting merged S and G, written as abbreviated [size, shape] hypotheses such as [S,C] (small cube), [L,?] (large, any shape), and [?,Po] (any size, polyhedron).

Page 63

Bias

Definition: any basis for choosing one generalization over another; any factor that influences the definition or selection of inductive hypotheses.

Representational bias: the language, its implementation, and the primitive terms.

Procedural (algorithmic) bias: the order of traversal of the states in the space defined by a representational bias.

Page 64

Bias

(Diagram components: Program, Training set, Search knowledge, Bias, Training examples, Hypothesis.)

Page 65

Bias Selection & Evaluation

Real-world domains have potentially hundreds of features and sources of data

Why is bias selection important? improve the predictive accuracy of the learner improve performance goals

Selection: static vs. dynamic Evaluation: basis for bias selection

online and empirical vs. offline and analytical

Page 66

Multi-Tiered Bias System

Bias shifting: bias selection occurs again after learning has begun; useful when the knowledge for bias selection is not available prior to learning but can be gathered during learning.

Multi-tiered bias: make embedded biases explicit; reduce the cost of system and knowledge engineering; flexible system design and conceptual simplicity.

Characterize learning as search within multiple tiers!

Page 67

Multi-Tiered Bias Search Space

(Diagram: the hypothesis space H is embedded in a representational bias space L(H) and a procedural bias space P(L(H)); these are in turn embedded in representational and procedural meta-bias spaces.)

Page 68

Outline

Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary

Page 69

Decision Tree Learning: a brief history

1966  Hunt, Marin, Stone: CLS
1983  Quinlan: ID3
1986  Schlimmer, Fisher: ID4 (incremental learning)
1988  Utgoff: ID5
1993  Quinlan: C4.5, C5

Page 70

Play Tennis: training examples

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
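For the entropy, information-gain, and ID3 sketches that follow, the same table can be written as a small Python data set (an illustration; the names play_tennis and ATTRIBUTES are my own):

# PlayTennis training data (Quinlan / Mitchell), one dict per example.
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

play_tennis = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "PlayTennis": "No"},
    {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "PlayTennis": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "PlayTennis": "No"},
]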

Page 71

CLS learning algorithm

Decision tree: each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node assigns a classification.

Decision trees are inherently disjunctive, since each branch leaving a decision node corresponds to a separate disjunctive case; they can therefore represent disjunctive concepts.

Page 72

CLS learning algorithm

The CLS algorithm starts with an empty decision tree and gradually refines it, by adding decision nodes, until the tree correctly classifies all the training instances. The algorithm operates over a set of training instances, C, as follows:

1. If all instances in C are positive, create a YES node and halt. If all instances in C are negative, create a NO node and halt.
2. Otherwise, select (using some heuristic criterion) an attribute A with values v1, …, vn and create a decision node for it.
3. Partition the training instances in C into subsets C1, …, Cn according to the values of A.
4. Apply the algorithm recursively to each of the sets Ci.

Page 73

ID3 Approach

ID3 builds a decision tree from training objects with known class labels and uses it to classify test objects; it ranks attributes with an information-gain measure.

The aim is a tree of minimal height, i.e. the least number of tests needed to classify an object.

Page 74

Decision Tree Representation

Internal node: test on some property (attribute); branch: an attribute value; leaf node: a classification.

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances, e.g. for PlayTennis = Yes:

(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)

Page 75

Decision Tree Example

Page 76

Appropriate problems for decision Trees

Instances are represented by attribute-value pairs

Target function has discrete output values Disjunctive hypothesis may be required Possibly noisy training data

data may contain errors data may contain missing attribute

values

Page 77

Learning of Decision Trees: Top-Down Induction of Decision Trees

Algorithm: the ID3 learning algorithm (Quinlan, 1986).
If all examples in E belong to the same class Cj, then label the leaf with Cj; else
  select the "best" decision attribute A with values v1, v2, …, vn for the next node,
  divide the training set E into E1, …, En according to the values v1, …, vn,
  recursively build subtrees T1, …, Tn for E1, …, En,
  and generate the decision tree T from them.

Page 78

Entropy

S: a sample of training examples; p+ (p-) is the proportion of positive (negative) examples in S.

Entropy(S) is the expected number of bits needed to encode the classification of an arbitrary member of S. Information theory: an optimal-length code assigns -log2 p bits to a message having probability p, so the expected number of bits to encode "+" or "-" for a random member of S is

Entropy(S) = - p+ log2 p+ - p- log2 p-

Generally, for c different classes:

Entropy(S) = - Σ (i = 1..c) pi log2 pi
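A short sketch of the computation (my own illustration, reusing the play_tennis list defined after the training table):

from collections import Counter
from math import log2

def entropy(examples, target="PlayTennis"):
    """Entropy of the class distribution in a list of example dicts."""
    counts = Counter(ex[target] for ex in examples)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(round(entropy(play_tennis), 3))   # 9+/5- examples -> about 0.940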

Page 79

Entropy

The entropy function relative to a boolean classification, as the proportion of positive examples varies between 0 and 1

entropy as a measure of impurity in a collection of examples

Page 80

Information Gain Search Heuristic

Gain(S, A): the expected reduction in entropy caused by partitioning the examples of S according to attribute A; a measure of the effectiveness of an attribute in classifying the training data.

Values(A): the possible values of attribute A. Sv: the subset of S for which attribute A has value v.

Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)

The best attribute has maximal Gain(S, A); the aim is to minimise the number of tests needed for classification.
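The gain itself, as a sketch built on the entropy function above (illustrative code, not c4.5's implementation):

def information_gain(examples, attribute, target="PlayTennis"):
    """Gain(S, A) = Entropy(S) - sum over values v of |Sv|/|S| * Entropy(Sv)."""
    total = len(examples)
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

for a in ATTRIBUTES:
    print(a, round(information_gain(play_tennis, a), 3))
# expected: Outlook 0.246, Temperature 0.029, Humidity 0.151, Wind 0.048 (as on the next slide)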

Page 81

Play Tennis: Information Gain

Values(Wind) = {Weak, Strong}
S = [9+, 5-], Entropy(S) = 0.940
S_Weak = [6+, 2-], Entropy(S_Weak) = 0.811
S_Strong = [3+, 3-], Entropy(S_Strong) = 1.0

Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_Weak) - (6/14) Entropy(S_Strong)
             = 0.940 - (8/14) 0.811 - (6/14) 1.0 = 0.048

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Temperature) = 0.029

Page 82

Entropy and Information Gain

S contains si tuples of class Ci, for i = 1, …, m.

Information (entropy) required to classify an arbitrary tuple:

I(s1, s2, …, sm) = - Σ (i = 1..m) (si / s) log2 (si / s)

Entropy of attribute A with values {a1, a2, …, av}:

E(A) = Σ (j = 1..v) ((s1j + … + smj) / s) · I(s1j, …, smj)

Information gained by branching on attribute A:

Gain(A) = I(s1, s2, …, sm) - E(A)

Page 83

The ID3 Algorithm

function ID3 (R: a set of non-categorical attributes,
              C: the categorical attribute,
              S: a training set) returns a decision tree;
begin
  If S is empty, return a single node with value Failure;
  If S consists of records all with the same value for the categorical
    attribute, return a single node with that value;
  If R is empty, then return a single node with as value the most frequent
    of the values of the categorical attribute found in records of S
    [note that then there will be errors, that is, records that will be
    improperly classified];

Page 84

The ID3 Algorithm (continued)

  Let D be the attribute with largest Gain(D, S) among the attributes in R;
  Let {dj | j = 1, 2, …, m} be the values of attribute D;
  Let {Sj | j = 1, 2, …, m} be the subsets of S consisting respectively of
    records with value dj for attribute D;
  Return a tree with root labeled D and arcs labeled d1, d2, …, dm going
    respectively to the trees ID3(R-{D}, C, S1), ID3(R-{D}, C, S2), …,
    ID3(R-{D}, C, Sm);
end ID3;
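The same recursion as a runnable Python sketch (an illustration, not Quinlan's code; it reuses play_tennis, ATTRIBUTES, and information_gain from the earlier sketches and represents a tree as nested dicts):

from collections import Counter

def id3(examples, attributes, target="PlayTennis"):
    """Return a decision tree: either a class label (leaf) or {attribute: {value: subtree}}."""
    if not examples:
        return "Failure"
    classes = [ex[target] for ex in examples]
    if len(set(classes)) == 1:                      # all records share one class
        return classes[0]
    if not attributes:                              # no attributes left: majority class
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target)
    return tree

print(id3(play_tennis, ATTRIBUTES))
# Expected root: Outlook, with Overcast -> Yes, Sunny -> split on Humidity, Rain -> split on Wind.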

Page 85

C4.5

C4.5 is a program that creates a decision tree based on a set of labeled input data. This decision tree can then be tested against unseen labeled test data to quantify how well it generalizes.

The software for C4.5 can be obtained with Quinlan's book; a wide variety of training and test data is available, some provided by Quinlan.

Quinlan, J. R. now works at the RuleQuest Research company; See5/C5.0 has been designed to operate on large databases and incorporates innovations such as boosting.

Page 86

C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address the following issues not dealt with by ID3:

• Avoiding overfitting the data: determining how deeply to grow a decision tree; reduced-error pruning; rule post-pruning
• Handling continuous attributes (e.g., temperature)
• Choosing an appropriate attribute-selection measure
• Handling training data with missing attribute values
• Handling attributes with differing costs
• Improving computational efficiency

Page 87

Running c4.5 (on cunix.columbia.edu):

~amr2104/c4.5/bin/c4.5 -u -f filestem

c4.5 expects to find three files: filestem.names, filestem.data, filestem.test

Page 88

File Format: .names

The file begins with a comma-separated list of classes ending with a period, followed by a blank line. E.g.: >50K, <=50K.

The remaining lines have the following format (note the end-of-line period):
attribute: {ignore | discrete n | continuous | a comma-separated list of values}.

Page 89

Example: census.names

>50K, <=50K.

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, etc.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, etc.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, etc.
occupation: Tech-support, Craft-repair, Other-service, Sales, etc.
relationship: Wife, Own-child, Husband, Not-in-family, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, etc.

Page 90

File Format: .data, .test

Each line in these data files is a comma-separated list of attribute values ending with a class label followed by a period. The attributes must be in the same order as described in the .names file. Unavailable values can be entered as '?'.

When creating test sets, make sure that you remove those data points from the training data.

Page 91

Example: adult.test

25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States, <=50K.
38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband, White, Male, 0, 0, 50, United-States, <=50K.
28, Local-gov, 336951, Assoc-acdm, 12, Married-civ-spouse, Protective-serv, Husband, White, Male, 0, 0, 40, United-States, >50K.
44, Private, 160323, Some-college, 10, Married-civ-spouse, Machine-op-inspct, Husband, Black, Male, 7688, 0, 40, United-States, >50K.
18, ?, 103497, Some-college, 10, Never-married, ?, Own-child, White, Female, 0, 0, 30, United-States, <=50K.
34, Private, 198693, 10th, 6, Never-married, Other-service, Not-in-family, White, Male, 0, 0, 30, United-States, <=50K.
29, ?, 227026, HS-grad, 9, Never-married, ?, Unmarried, Black, Male, 0, 0, 40, United-States, <=50K.
63, Self-emp-not-inc, 104626, Prof-school, 15, Married-civ-spouse, Prof-specialty, Husband, White, Male, 3103, 0, 32, United-States, >50K.
24, Private, 369667, Some-college, 10, Never-married, Other-service, Unmarried, White, Female, 0, 0, 40, United-States, <=50K.
55, Private, 104996, 7th-8th, 4, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 0, 10, United-States, <=50K.
65, Private, 184454, HS-grad, 9, Married-civ-spouse, Machine-op-inspct, Husband, White, Male, 6418, 0, 40, United-States, >50K.
36, Federal-gov, 212465, Bachelors, 13, Married-civ-spouse, Adm-clerical, Husband, White, Male, 0, 0, 40, United-States, <=50K.

Page 92

c4.5 Output The decision tree proper.

(weighted training examples/weighted training error)

Tables of training error and testing error Confusion matrix

You'll want to pipe the output of c4.5 to a text file for later viewing, e.g.: c4.5 -u -f filestem > filestem.results

Page 93

Example output

capital-gain > 6849 : >50K (203.0/6.2)
|   capital-gain <= 6849 :
|   |   capital-gain > 6514 : <=50K (7.0/1.3)
|   |   capital-gain <= 6514 :
|   |   |   marital-status = Married-civ-spouse: >50K (18.0/1.3)
|   |   |   marital-status = Divorced: <=50K (2.0/1.0)
|   |   |   marital-status = Never-married: >50K (0.0)
|   |   |   marital-status = Separated: >50K (0.0)
|   |   |   marital-status = Widowed: >50K (0.0)
|   |   |   marital-status = Married-spouse-absent: >50K (0.0)
|   |   |   marital-status = Married-AF-spouse: >50K (0.0)

Tree saved

Page 94

Example output

Evaluation on training data (4660 items):

        Before Pruning        After Pruning
        Size   Errors         Size   Errors       Estimate
        1692   366 ( 7.9%)      92   659 (14.1%)  (16.0%)   <<

Evaluation on test data (2376 items):

        Before Pruning        After Pruning
        Size   Errors         Size   Errors       Estimate
        1692   421 (17.7%)      92   354 (14.9%)  (16.0%)   <<

Confusion matrix:
  (a)    (b)   <- classified as
 ----   ----
  328    251   (a): class >50K
  103   1694   (b): class <=50K

Page 95

k-fold Cross Validation

Start with one large data set and, using a script, randomly divide it into k sets. At each iteration, use k-1 sets to train the decision tree and the remaining set to test the model. Repeat this k times and take the average testing error.

The average error describes how well the learning algorithm can be expected to perform on this data set.
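A minimal sketch of the procedure (my own illustration; train_and_test stands in for whatever learner and error measure are actually used):

import random

def k_fold_error(data, k, train_and_test):
    """Average test error over k folds; train_and_test(train, test) returns an error rate."""
    data = data[:]                                    # copy so the caller's order is untouched
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]            # k roughly equal parts
    errors = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        errors.append(train_and_test(train, test))
    return sum(errors) / k

# Example (hypothetical learner): 10-fold CV over the play_tennis data.
# avg_err = k_fold_error(play_tennis, 10, my_train_and_test)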

Page 96

Outline

Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary

Page 97

Inductive Learning

Inductive learning is "learning from examples": an inductive learning unit maps training examples to decision rules.

Training examples:  data-case 1 : decision i1,  data-case 2 : decision i2,  …,  data-case n : decision in
Decision rules:     pattern 1 → decision j1,  pattern 2 → decision j2,  …,  pattern n → decision jn

Page 98

Ripper (Repeated Incremental Pruning to Produce Error Reduction) is a rule-learning algorithm proposed by Cohen in 1995. Ripper consists of two phases: the first determines an initial rule set, and the second performs post-processing rule optimization.

Page 99

Ripper is a separate-and-conquer rule-learning algorithm. First the training data are divided into a growing set and a pruning set. The algorithm then generates a rule set in a greedy fashion, one rule at a time. While generating a rule, Ripper searches for the most valuable rule for the current growing set in a rule space that can be defined in BNF. Immediately after a rule is extracted on the growing set, it is pruned on the pruning set. After pruning, the examples covered by that rule in the training set (growing and pruning sets) are deleted. The remaining training data are re-partitioned after each rule is learned, in order to help stabilize any problems caused by a "bad split". This process is repeated until the termination conditions are satisfied.

Page 100

Ripper

procedure Rule_Generating(Pos, Neg)
begin
  Ruleset := {}
  while Pos ≠ {} do
    /* grow and prune a new rule */
    split (Pos, Neg) into (GrowPos, GrowNeg) and (PrunePos, PruneNeg)
    Rule := GrowRule(GrowPos, GrowNeg)
    Rule := PruneRule(Rule, PrunePos, PruneNeg)
    if the termination conditions are satisfied then
      return Ruleset
    else
      add Rule to Ruleset
      remove examples covered by Rule from (Pos, Neg)
    endif
  endwhile
  return Ruleset
end
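The same loop as a Python skeleton (an illustration only; grow_rule, prune_rule, covers, and should_stop are stand-ins for Ripper's actual growing, pruning, coverage, and description-length tests described on the surrounding slides):

import random

def rule_generating(pos, neg, grow_rule, prune_rule, covers, should_stop, grow_frac=2/3):
    """Separate-and-conquer loop: grow a rule on the growing set, prune it on the
    pruning set, then remove the examples it covers and repeat while positives remain."""
    ruleset = []
    while pos:
        random.shuffle(pos); random.shuffle(neg)          # re-partition each iteration
        cut_p, cut_n = int(len(pos) * grow_frac), int(len(neg) * grow_frac)
        rule = grow_rule(pos[:cut_p], neg[:cut_n])
        rule = prune_rule(rule, pos[cut_p:], neg[cut_n:])
        if should_stop(ruleset, rule, pos, neg):          # e.g. description-length check
            break
        ruleset.append(rule)
        pos = [x for x in pos if not covers(rule, x)]
        neg = [x for x in neg if not covers(rule, x)]
    return ruleset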

Page 101

Ripper

After each rule is added to the rule set, the total description length (an integer) of the rule set is computed; the description length measures the complexity and accuracy of a rule set. The termination conditions are satisfied when no positive examples are left, or when the description length of the current rule set exceeds the user-specified threshold.

Page 102

Ripper: post-processing rule optimization

Ripper uses post-pruning techniques to optimize the rule set; this optimization is performed on any remaining positive examples. Re-optimizing the resulting rule set once more is called RIPPER2, and the general case of re-optimizing k times is called RIPPERk.

Page 103

Outline

Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary

Page 104

Summary

Inductive learning is an important approach for data mining.
Version spaces can be used to explain generalization and specialization.
ID3 and C4.5 learn decision trees.
Ripper algorithms generate efficient rules.

Page 105

References

Zhongzhi Shi. Principles of Machine Learning. International Academic Publishers, 1992.
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
Zhongzhi Shi. Knowledge Discovery. Tsinghua University Press, 2002.
H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, 1998.
R. S. Michalski. A theory and methodology of inductive learning. In Michalski et al., editors, Machine Learning: An Artificial Intelligence Approach, Vol. 1, Morgan Kaufmann, 1983.
T. M. Mitchell. Version spaces: A candidate elimination approach to rule learning. IJCAI'77, Cambridge, MA.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.