Michael M. Richter - University of Calgarypages.cpsc.ucalgary.ca/~mrichter/ML/ML...

Michael M. Richter, Calgary, Fall 2010

Machine Learning

Michael M. Richter

Introduction 2010

Email: [email protected]

GENERAL


The Goal of the Lecture

• Today learning is a hot topic with applications in many

disciplines

• There are many different methods to learn something

• Many different systems have been developed

• The main goal is

– To understand the major methods

– To get a feeling for which methods and systems are suitable for which kinds of applications

– To be able to apply and implement learning methods with tools.


The General Situation

• There is a situation presented in which we have

something to do and we do not know what to do.

• This is due to the fact that the situation is relatively

complex and there are no precise rules given.

• However, there may be such rules or guidelines which we unfortunately do not know.

• What we have instead are observations.

• These observations can contain a lot of information, and the task of learning is to make this information explicit.


Observations

• Observations produce data in various forms:

– numerical

– symbolic

– images

– texts

– etc.

• In these data some information is contained that is not

directly visible.

• The task of machine learning is to make it visible.


The Learning Situation

• The data are:

– collected

– structured

– etc.

• These data contain some hidden (implicit) information

• The purpose of learning is to make this information

explicit for further use.

• This applies to many applications

– in business, e.g. commerce and e-commerce

– in engineering

– in medicine and biology

– and others


Learning in Different Disciplines

• Education, psychology, cognitive science,

neurophysiology

• The human as pupil

• Processing of experiences

• Higher intellectual abilities

• More than simple “memorizing”

• Constructing adequate knowledge structures

• Relation to a learning goal

• Possible later application of learned knowledge


The View of this Lecture

• Practical orientation:

– Task oriented methods

• Understand advanced methods in order to apply them

properly

• Make use of everything available

– Tools

– Knowledge representation

– Statistics

– Cognition


Positioning ML

[Figure: machine learning positioned among related fields: pattern recognition/vision; (multi-)agent learning, games; robotics; neuroinformatics; applied statistics; control; cognitive science, AI; KDD, data mining; biological learning; learning theory (PAC, RL); time series analysis; statistical language learning; bioinformatics; applications: medicine, …]


Model for a Learning Step

[Figure: model of a learning step. Labels: learner initially; environment; teacher; special information; learning; learner after learning (changed); feedback; compare; control; correct; criteria.]


A First Classification

• Learning

– Learning with teacher

– Learning with delayed feedback

– Unsupervised learning


Learning with Teacher

Memorizing

• Direct insertion of knowledge

• Programming

• Construction

• Storage

Learning by Instruction

Presentation of examples

Evaluation: in detail or at the end

Correction

Role of the teacher or the environment


Learning without Teacher

• Unsupervised learning: There is no specific task given that can be achieved; there is no “right” or “wrong”

• Two Forms:

– passive observation

• no feedback from the environment

– active experiments

• Independent actions with the environment to generate examples

– From such observations one may get some interesting insights.


Weakly Supervised Learning

• There is some function that tells you how good you are

(but this function is user defined and may be misleading!)

• Progress comes not only from a fixed plan but also from some random variation, following the principle of evolution: The fittest will survive.
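
The idea of a quality function combined with random variation and survival of the fittest can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy `fitness` function (with its best value at x = 3), the population size, and the mutation scale are hypothetical and not taken from the lecture.

```python
import random

def fitness(x):
    # Hypothetical user-defined quality function: higher is better, best at x = 3.
    return -(x - 3.0) ** 2

def evolve(generations=200, population_size=20, mutation_scale=0.5, seed=0):
    rng = random.Random(seed)
    # Start from random candidates.
    population = [rng.uniform(-10.0, 10.0) for _ in range(population_size)]
    for _ in range(generations):
        # Random variation: each candidate produces one mutated child.
        children = [x + rng.gauss(0.0, mutation_scale) for x in population]
        # Selection: of parents and children together, only the fittest half survives.
        combined = sorted(population + children, key=fitness, reverse=True)
        population = combined[:population_size]
    return population[0]

best = evolve()
print(round(best, 2))
```

Note that the learner only ever sees the value of `fitness`; if that function is badly chosen, the result will be misleading, as the slide warns.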


Learning by Example

• Source: learner, teacher, environment

• Types: positive, negative

• Presentation: once, incrementally


Evaluation (1)

• The learning goal is given by the user: It says what is

expected.

• This gives rise to an evaluation.

• A necessary condition is that the success can be measured.

• That means: We need an experimental setting for the

measurement, e.g. measure:

– the winning of games

– the correctness of classifications

– the exactness of predictions

– the cost reduction in economic situations

– etc.


Evaluation (2)

• Usually one distinguishes two kinds of data:

– training data: Used in the learning process

– evaluation data: Used in the evaluation process

• We distinguish two kinds of evaluations:

– Shallow evaluation: Measurements concerning the learning goals

directly, e.g. the number of games won in a tournament

– Deep evaluation: Investigates the learning process in more detail. If the shallow evaluation is not satisfactory, its purpose is to find out how to improve the learning process, e.g. which positions in a backgammon game should be avoided and which moves lead directly to a loss.

– Deep evaluation needs much more insight into the problem than

shallow evaluation.
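
The separation of training data from evaluation data, followed by a shallow evaluation, can be sketched as follows. This is only an illustration under stated assumptions: the data set, the toy threshold learner, and all function names are hypothetical.

```python
import random

def split_data(examples, eval_fraction=0.3, seed=0):
    """Shuffle the examples and split them into training and evaluation data."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]

def learn_threshold(training):
    """Toy learner: place a threshold halfway between the two class means."""
    pos = [x for x, label in training if label == 1]
    neg = [x for x, label in training if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def accuracy(threshold, data):
    """Shallow evaluation: fraction of correctly classified examples."""
    correct = sum(1 for x, label in data if (1 if x > threshold else 0) == label)
    return correct / len(data)

# Hypothetical data: label 1 if x > 5, else 0.
examples = [(float(x), 1 if x > 5 else 0) for x in range(11)]
training, evaluation = split_data(examples)
threshold = learn_threshold(training)
print(len(training), len(evaluation), accuracy(threshold, evaluation))
```

The accuracy is measured only on examples the learner has not seen, which is the point of keeping the two kinds of data apart.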


HINTS FOR STARTING AN

APPLICATION


Practical Project

• In order to get a feeling for the intended application areas

there will be practical work in the assignment as part of

the whole lecture.

• How to do it:

– Select an application, formulate it in a clear way, explain the difficulties.

– Select a learning method and a tool, motivate the choice.

– Implement learning techniques.

– Describe and/or perform an evaluation.


Getting an Application

• First you select a possible application. Preconditions:

– One has to understand the domain very well.

– There have to be data available

• Determine the characteristics of the application as:

– Static, dynamic

– Supervised, unsupervised

– Clear and identifiable goal, improvement

– Incremental, one step data presentation

– Etc.

• Examples are given in the previous section.

University of

Calgary


Goal Analysis

• A goal has to be stated:

– What are the functional and non-functional parts of the goal?

– E.g. which terminology does the user understand?

– Visualization?

– Real time?

• The goal has to be flexible in the sense that it can be weakened in case it turns out to be too ambitious.

• At least in principle there should be a method to evaluate

success.

• Analyze the goal:

– What are the possible risks?

– What to do in case the risks materialize?



General Steps (if applicable)

• Problem analysis: Define goal(s), name possible

algorithms and tools.

• Data collection: Find out and name data sources.

• Data visualization (e.g. from WEKA): Allows better

understanding of data distribution.

• Data preprocessing:

– Data cleaning:

• Incomplete data, missing values: Identify possible ways to handle them.

• Noisy data (use e.g. cluster analysis or regression to detect outliers).

– Data integration (identify inconsistencies and redundancies)

– Data reduction and compression: Used to reduce the data size while still yielding the same analysis results.
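
Two of the data cleaning steps can be illustrated with a small sketch. It makes several assumptions that are not from the lecture: missing values are encoded as `None` and filled with the median, and outliers are flagged with a simple median-based rule instead of the cluster analysis or regression mentioned above. The data and function names are hypothetical.

```python
import statistics

def impute_missing(values):
    """Replace missing entries (None) by the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def flag_outliers(values, max_deviations=3.0):
    """Mark values far from the median, measured in median absolute deviations."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(v - median) / mad > max_deviations for v in values]

# Hypothetical measurements with one missing value and one suspicious value.
raw = [4.0, 5.0, None, 6.0, 5.0, 50.0]
cleaned = impute_missing(raw)
flags = flag_outliers(cleaned)
print(cleaned)
print(flags)
```

Whether a flagged value is really noise or a rare but valid observation is a question for someone who understands the domain.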



Candidates of Techniques

• Compare the characteristics of the application with the strengths of the different techniques and select a list of possible candidates.

• Analyze this further and take into account with which

techniques you are familiar; restrict the number of

candidates to at most two.

• Select one or two possible tools and continue the analysis

on this level.

• Do the tools have possibilities for data preprocessing if

needed?



Data Properties

• Are there enough data?

• How large is the data set?

• Data quality: Are they

– Clean

– Noisy

– Corrupted

– Partially missing

• What are the data sources and how are the sources

connected with the quality? Are data

– From a standard repository?

– Randomly collected?

– Self produced by experiments?



Data and Tools

• How different are the data structures of

– Given data

– Required by the tool

– Required for the user after learning

• Think about the conversions needed:

– How difficult and time consuming is it?

– Is something of the semantics (meaning) lost?

• Can the tool deal with noisy, missing etc. data?

• Are additional tools needed for presenting the results (e.g. visualization)?



Discussion

• Applying two learning methods in sequence is often very useful.

• The first method is often some kind of unsupervised learning:

– One has no idea what is going on and studies the result.

• This output is often not very useful for further application, in particular for humans.

• Therefore some symbolic learning is then used in order to obtain a symbolic description which humans can use.


Reliability of Knowledge

[Figure: extension of knowledge, where darkness indicates reliability. Knowledge is obtained by direct retrieval, by logical deduction, by approximate reasoning, by CBR, and by learning and data mining. This assumes that the underlying data and information bases are reliable.]


Reliability of Knowledge (2)

• This schema is only a rough and general indication.

• Success in applications depends heavily on e.g.

– correctness, amount and typicality of data

– adequate choice of the specific method and the precision with which it is applied

– number of experiments carried out

– testing of the results

• Therefore the success depends on the invested effort.

• There is again the utility question: Costs of obtaining

knowledge versus gain of applying knowledge


Overview of Methods

These refer to the chapters of the course:

– Concept Learning

– Decision Trees (ID3, C4.5, C5)

– Inductive Logic Programming (ILP)

– Unsupervised Learning of Concepts

– Experience Based Methods

– Learning Informal Concepts

– Bayesian Learning

– Support Vector Machines

– Data Mining

– Evolutionary Algorithms

– Reinforcement Learning

– Organizational Learning

– Neural Nets

– Clustering

– Data Preprocessing and Visualization

– PAC-Learning

– Tools and Evaluation


EXAMPLES


A Basic Task: Classification (1)

• There is a set U of objects and a set Cl = {C1,...,Cn} of

subsets of U called classes.

• A classifier is a function F: U → Cl, i.e. a function that assigns to each object in U some class.

• Learning problem: Learn the classifier from a set of

classified examples.
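
As a minimal sketch of this learning problem, the following builds a classifier F from classified examples with the nearest-neighbor rule. The choice of nearest neighbor, the representation of objects as numeric feature tuples, and the example data are assumptions made for illustration; the lecture does not prescribe them here.

```python
def learn_classifier(examples):
    """Build a classifier F from (feature_vector, class_label) pairs (1-nearest neighbor)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def classify(obj):
        # Assign the class of the closest classified example.
        nearest = min(examples, key=lambda ex: squared_distance(ex[0], obj))
        return nearest[1]

    return classify

# Hypothetical classified examples for two classes C1 and C2.
examples = [
    ((1.0, 1.0), "C1"),
    ((1.5, 2.0), "C1"),
    ((8.0, 8.0), "C2"),
    ((9.0, 7.5), "C2"),
]
F = learn_classifier(examples)
print(F((2.0, 1.0)), F((7.0, 9.0)))
```

Here F is defined on every object that can be written as a feature tuple, even though it was learned from only four examples.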



Classification (2)

• Examples:

– Image based classification

– Classification of molecules

– Classification of diseases and therapies

– Classification of texts

– Classification of customers

– Classification of sales products

• Possible methods: neural networks, clustering, hidden Markov models, decision trees, concept learning, support vector machines.



Example: Classify Music

[Figure: diagram of music features used for classification. Node labels: music feature, voice, vibration, pitch, chorus, symphony, harmonic, partial, male, female, instrument, string, wind, keyboard, percussion, rhythm, other.]


Data for Classification

• One needs many classified examples.

• There is no general way to find data sources because this depends very much on the objects that have to be classified.



Medical Diagnosis (1)

• Goal: Learning from patient data.

• Difficulties:

– Incompleteness (missing parameter values)

– Incorrectness (systematic or random noise in the data)

– Sparseness (few and/or non-representative patient records available)

– Inexactness (inappropriate selection of parameters)

• Possible Methods:

– Neural networks (backpropagation)

– Support vector machines

– Decision trees



Medical Diagnosis (2)

• Data collection:

– Hospitals usually have many patient records, to which you need access.

– In addition, there are many data records of patients on the web.

– It is recommended to have a close connection to a hospital.



Multi-Agents

• Learn to understand other agents' behavior in order to make predictions:

– A) Competitive agents (e.g. financial business, banks, other

companies)

– B) Learn tactics of an opponent in a game

– C) Cooperating agents (e.g. teams in a distributed environment, software development, organizing logistics companies).

• Possible methods:

– Genetic algorithms

– Reinforcement learning



Cryptography

• There are different kinds of examples, e.g.

– Public key creation

– Comparing keys and synchronization

– Fighting against attacks


Customer Relationship Management

• Classify customers according to sales behavior

• Difficulties:

– Taste is personal and often difficult to define

– Customers often do not answer queries honestly

• Predictions:

– To understand when and why a company’s customers are likely to leave.

– Learn how customers will react to changes or special offers and to rewards.



Recommender Systems (1)

• Recommender systems are designed to permit near real-

time personalization before a customer makes a

purchase.

• Problem: The quantity of information is so overwhelming that customers have difficulty finding items of interest to them.

• Items are recommended to a customer based on the similarity of their purchases to their own previous purchases or to the purchases of other customers.
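
A recommendation based on the similarity of purchases to those of other customers can be sketched as below. The use of Jaccard similarity on purchase sets is one possible choice assumed for this illustration, and the customer names and items are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two purchase sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(customer, purchases):
    """Recommend items bought by the most similar other customer."""
    own = purchases[customer]
    others = [c for c in purchases if c != customer]
    most_similar = max(others, key=lambda c: jaccard(own, purchases[c]))
    # Only items the customer does not already own are recommended.
    return sorted(purchases[most_similar] - own)

# Hypothetical purchase records.
purchases = {
    "anna": {"tent", "stove", "map"},
    "ben": {"tent", "stove", "lantern"},
    "carl": {"novel", "lamp"},
}
print(recommend("anna", purchases))
```

A real system would combine several similar customers and would have to work in near real time, which this sketch ignores.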




Recommender Systems (2)

• Examples:

– TV programs

– Vacations

– Events (sports, culture)

– Music

– Luxury goods

– Recommend to students which curricula to choose

• Learning Goal: Learn customer preferences from recorded

examples.

• Candidates for methods:

– Clustering algorithms

– Neural nets: Adaptive Resonance Theory (ART), Kohonen nets




Recommender Systems (3)

• Data collection: You need access to

– Collection of items you want to recommend (usually easy to

download from the web, like vacations, cars, PC equipment etc)

– What people may like to buy or watch: There are statistical data,

also about recommendation systems.

– Results of your recommendation: Sometimes difficult. You can

either

• Compare your recommendations with what is actually bought

• Perform your own experiments



Playing Games (1)

• Playing games is an example of a dynamic goal.

• The goal is to learn a behavior that enables one to play better.

• Characteristics of non-trivial games:

– A vast domain of parameter values.

– One does not have a thorough insight.

– There is no obvious strategy to play.

– One does not exactly know the effect of each parameter value when playing.



Playing Games (2)

• The goal is usually clear: You want to win as often as

possible. You can play against other players but also

– Against yourself

– Against earlier versions of your system. This is useful to show progress.

• Data and Examples: Games and their outcomes

– Recorded examples from data collections

– Self produced by systematic experiments



Training and Test Data

• One can use previous moves in games and scores as

training data.

• Previous games can also be used as test data for the program.

• Problem: How representative are the training data?

• Is there experience available?

• Evaluation is mostly easy: Count wins/losses
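
Evaluation by counting wins can be sketched with a very small game. The game (a version of Nim where players take 1 to 3 sticks and whoever takes the last stick wins), the two players, and all function names are hypothetical choices for this illustration; the "improved" player here is hand-written rather than learned, and only the evaluation procedure is the point.

```python
import random

def random_player(sticks, rng):
    """Earlier version (hypothetical): take 1 to 3 sticks at random."""
    return rng.randint(1, min(3, sticks))

def improved_player(sticks, rng):
    """Later version (hypothetical): leave a multiple of 4 whenever possible."""
    move = sticks % 4
    return move if move != 0 else rng.randint(1, min(3, sticks))

def play(first, second, sticks, rng):
    """Play one game; whoever takes the last stick wins. Returns 0 or 1."""
    players = (first, second)
    turn = 0
    while True:
        sticks -= players[turn](sticks, rng)
        if sticks == 0:
            return turn
        turn = 1 - turn

def count_wins(games=1000, seed=0):
    """Count how often the improved player beats the random player."""
    rng = random.Random(seed)
    wins = 0
    for game in range(games):
        # Alternate who moves first so that neither version is favored.
        if game % 2 == 0:
            wins += int(play(improved_player, random_player, 21, rng) == 0)
        else:
            wins += int(play(random_player, improved_player, 21, rng) == 1)
    return wins

wins = count_wins()
print(wins, "wins out of 1000")
```

Playing the current system against an earlier version in this way gives a simple measure of progress.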



Web Search (1)

• The “keyword search” paradigm of current search engines is not sufficient to capture some of the fine-grained information that humans can understand.

• A common way to interface a web application with heterogeneous data sources on the web is through a wrapper.

• Due to the virtually infinite number of data schemas on the web, custom-fitting a wrapper for each data source on the web is impractical.



Web Search (2)

• The aim is to be able to extract data from documents.

• One task is to analyze an HTML document by assigning a match score to each node and identifying the repeating pattern by clustering the match scores with the k-means clustering algorithm.

• Learning goal: Cluster HTML documents by using any kind of repeating data structures (e.g. product listings, readers’ comments, sports scoreboards and forums).

• Possible methods and tools:

– Clustering, fuzzy clustering (WEKA)

– Kohonen nets
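
As a minimal sketch of the clustering step, assume each HTML node has already been given a numeric match score; how such scores are computed is not covered here. The scores, the initialization, and the function name `k_means_1d` are hypothetical.

```python
def k_means_1d(scores, k=2, iterations=20):
    """Cluster numeric scores into k groups (Lloyd's algorithm in one dimension)."""
    ordered = sorted(scores)
    # Spread the initial centers over the sorted scores.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in scores:
            # Assign each score to its nearest center.
            nearest = min(range(k), key=lambda i: abs(s - centers[i]))
            clusters[nearest].append(s)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical match scores of seven nodes in one document.
match_scores = [0.91, 0.88, 0.12, 0.95, 0.10, 0.15, 0.90]
centers, clusters = k_means_1d(match_scores)
print(centers)
print(clusters)
```

In this toy data the high-scoring nodes end up in one cluster, which would correspond to the nodes that belong to the repeating pattern.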



Web Search (3)

• Data collection is often not difficult because you can look at data collections on the web.

• An example would be: Choose a (restricted) topic like

– Air travel

– Information about a period in history, etc.

• Find wiki pages that are useful for this purpose. Then you can determine the progress and success yourself.



Software Engineering (1)

• TOPICS:

• Prediction and Estimation

• Property and Model Discovery

• Transformation

• Generation and Synthesis

• Reuse

• Requirement Acquisition

• Management of Development Knowledge



Software Engineering (2)

• Release Planning:

• This is an early step in the development of software projects.

• In this step the different features of the project are arranged in groups called releases. The features in the first release are done first, and so on.

• There are constraints to be observed, mainly:

– Technical constraints, e.g. certain features have to be done together; resource capacities are restricted.

– Customers have preferences.

• Details can be provided; this is an important research topic in Calgary.



Predictions

• Predictions have two aspects

– 1) The prediction proper: predicting the outcome of an event, decision or result

– 2) Making use of the prediction.

• For 1) most of the methods mentioned for classification

can be used.

• For 2) the problem is that the prediction may not be in a

form that can be used directly.

• Example:

• Suppose the result is a clustering. Then one does not

know for any new object to which cluster it belongs.

• It requires a second learning process to find out the

corresponding rules.
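
The second learning process can be sketched as follows: given a clustering result, a simple rule is learned that assigns any new object to a cluster. The sketch assumes two clusters that are separable on a single numeric feature; the data, the labels "low" and "high", and the function name `learn_rule` are hypothetical.

```python
def learn_rule(clusters):
    """Learn a threshold rule that reproduces a two-cluster result on one numeric feature."""
    # Order the two clusters by their mean value.
    low, high = sorted(clusters, key=lambda c: sum(c) / len(c))
    # Put the threshold halfway between the largest "low" and the smallest "high" value.
    threshold = (max(low) + min(high)) / 2.0

    def rule(value):
        return "high" if value > threshold else "low"

    return threshold, rule

# Hypothetical clustering result from a first, unsupervised learning step.
clusters = [[0.10, 0.12, 0.15], [0.88, 0.90, 0.91, 0.95]]
threshold, rule = learn_rule(clusters)
print(threshold, rule(0.2), rule(0.8))
```

The learned rule is explicit and can be applied to objects that were not part of the original clustering.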



Controlling Dynamic Behavior

• If you are working with systems that perform some task, like

– A mechanical or electronic system

– A network

– An operating system, etc.

then you want a certain performance, which determines the goal.

• The data are recorded collections of previous

performances.



How to Read these Notes

• In general, there is no specific ordering for the sections.

• In some sections we refer to other sections, so one has to

make a certain detour.

• A general distinction between the sections is as follows:

– Basic learning sections: Here some general concepts, algorithms and methods are presented, like concept learning, decision trees, PAC learning etc. Of course, these methods only make sense if they are instantiated properly in applications.

– “High level” sections: Here one wants to apply the basic methods.

– Finally, we have auxiliary sections for practical purposes like

visualization, preprocessing or tools and evaluations.


References

A general reference is:

Tom Mitchell (1997). Machine Learning. McGraw-Hill.

Additional references will be given at the end of each chapter.


Acknowledgements

• Some of these notes were developed in collaboration with Ralph Bergmann, and several slides have been taken over from his notes, see http://www.bergmann.uni-trier.de

• Several slides have also been taken over from the course

of Sandra Zilles, see http://www.Sandra.Zilles.dfki.de
