Michael M. Richter - University of Calgarypages.cpsc.ucalgary.ca/~mrichter/ML/ML...
Michael M. Richter, Calgary, Fall 2010
Machine Learning
Michael M. Richter
Introduction 2010
Email: [email protected]
GENERAL
The Goal of the Lecture
• Today learning is a hot topic with applications in many
disciplines
• There are many different methods to learn something
• Many different systems have been developed
• The main goal is
– To understand the major methods
– To get a feeling for which methods and systems are
suitable for which kinds of applications
– To be able to apply and implement learning methods
with tools.
The General Situation
• There is a situation presented in which we have
something to do and we do not know what to do.
• This is due to the fact that the situation is relatively
complex and there are no precise rules given.
• However, there may be such rules or guidelines which we
unfortunately do not know.
• What we have instead are observations.
• These observations can contain a lot of information,
and the task of learning is to make this information
explicit.
Observations
• Observations produce data in various forms:
– numerical
– symbolic
– images
– texts
– etc.
• These data contain some information that is not
directly visible.
• The task of machine learning is to make it visible.
The Learning Situation
• The data are:
– collected
– structured
– etc.
• These data contain some hidden (implicit) information
• The purpose of learning is to make this information
explicit for further use.
• This applies to many applications
– in business, e.g. commerce and e-commerce
– in engineering
– in medicine and biology
– and others
Learning in Different Disciplines
• Education, psychology, cognitive science,
neurophysiology
• The human as pupil
• Processing of experiences
• Higher intellectual abilities
• More than simple “memorizing”
• Constructing adequate knowledge structures
• Relation to a learning goal
• Possible later application of learned knowledge
The View of this Lecture
• Practical orientation:
– Task oriented methods
• Understand advanced methods in order to apply them
properly
• Make use of everything available
– Tools
– Knowledge representation
– Statistics
– Cognition
Positioning ML
[Figure: diagram relating Machine Learning to neighboring fields: Pattern Recognition/Vision; (Multi-)Agent Learning, games; Robotics; Neuroinformatics; Applied Statistics; Control; Cognitive Science, AI; KDD, Data mining; Biological Learning; Learning theory (PAC, RL); Time series analysis; Statistical Language Learning; Bioinformatics; Applications: Medicine, …]
Model for a Learning Step
[Figure: block diagram of a learning step with the components: Learner initially; Environment; Teacher; special information; Learning; Learner after learning (changed); Feedback; Compare; Control; Correct; criteria]
A First Classification
• Learning
– Learning with teacher
– Learning with delayed feedback
– Unsupervised learning
Learning with Teacher
Memorizing
• Direct insertion of knowledge
• Programming
• Construction
• Storage
Learning by Instruction
Presentation of examples
Evaluation: in detail / at the end
Correction
Role of the teacher
or the environment
Learning without Teacher
• Unsupervised learning: There is no specific task given
that can be achieved; there is no “right” or “wrong”
• Two forms:
– passive observation
• no feedback from the environment
– active experiments
• independent actions with the environment to generate examples
• From such observations one may get some interesting insights.
Weakly Supervised Learning
• There is some function that tells you how good you are
(but this function is user-defined and may be misleading!)
• Progress comes not only from a fixed plan but also from
random perturbation, following the idea of evolution: the
fittest will survive.
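The idea of a user-defined quality function combined with random variation and survival of the fittest can be sketched as follows. This is only an illustration under stated assumptions: the function `fitness`, its optimum at 3, the population size, and the mutation width are all hypothetical and not taken from the lecture.

```python
import random

def fitness(x):
    # Hypothetical user-defined quality function: best value at x = 3.
    return -(x - 3.0) ** 2

def evolve(generations=200, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(population_size)]
    for _ in range(generations):
        # Random variation: every individual produces one mutated child.
        children = [x + rng.gauss(0, 0.5) for x in population]
        # Selection: only the fittest of parents and children survive.
        candidates = population + children
        population = sorted(candidates, key=fitness, reverse=True)[:population_size]
    return population[0]

best = evolve()
```

Note that the result is only as good as the fitness function: a badly chosen function is optimized just as readily as a sensible one.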
Learning by Example
• Source: Learner, Teacher, Environment
• Types: positive, negative
• Presentation: once, incrementally
Evaluation (1)
• The learning goal is given by the user: It says what is
expected.
• This gives rise to an evaluation.
• A necessary condition is that the success can be measured.
• That means: We need an experimental setting for the
measurement, e.g. measure:
– the winning of games
– the correctness of classifications
– the exactness of predictions
– the cost reduction in economic situations
– etc.
Evaluation (2)
• Usually one distinguishes two kinds of data:
– training data: Used in the learning process
– evaluation data: Used in the evaluation process
• We distinguish two kinds of evaluations:
– Shallow evaluation: Measurements concerning the learning goals
directly, e.g. the number of games won in a tournament
– Deep evaluation: Investigates the learning process in more
detail. The purpose is to find out how to improve the learning
process in case the shallow evaluation is not satisfactory, e.g.
which positions in a backgammon game should be avoided and
which moves lead directly to a loss.
– Deep evaluation needs much more insight into the problem than
shallow evaluation.
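The distinction between training data and evaluation data, together with a shallow evaluation, can be sketched as follows. This is a minimal illustration and not part of the lecture: the names `split_data` and `accuracy`, the 30% evaluation fraction, and the toy data are assumptions.

```python
import random

def split_data(examples, eval_fraction=0.3, seed=0):
    """Shuffle the examples and split them into training and evaluation data."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = round(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]

def accuracy(classifier, labeled_examples):
    """Shallow evaluation: fraction of examples that are classified correctly."""
    correct = sum(1 for x, label in labeled_examples if classifier(x) == label)
    return correct / len(labeled_examples)

# Hypothetical toy data: integers labeled by their sign.
data = [(x, "pos" if x >= 0 else "neg") for x in range(-10, 10)]
training, evaluation = split_data(data)
```

Keeping the evaluation data out of the learning process is what makes the measured accuracy an honest estimate of success on new data.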
HINTS FOR STARTING AN
APPLICATION
Practical Project
• In order to get a feeling for the intended application areas
there will be practical work in the assignment as part of
the whole lecture.
• How to do it:
– Select an application, formulate it in a clear way, explain the
difficulties.
– Select a learning method and a tool, motivate the choice
– Implement learning techniques.
– Describe and/or perform an evaluation.
Getting an Application
• First you select a possible application. Preconditions:
– One has to understand the domain very well.
– Data have to be available.
• Determine the characteristics of the application, such as:
– Static, dynamic
– Supervised, unsupervised
– Clear and identifiable goal, improvement
– Incremental, one step data presentation
– Etc.
• Examples are given in the previous section.
Goal Analysis
• A goal has to be stated:
– What are the functional and non-functional parts of the goal?
– E.g. which terminology does the user understand?
– Visualization?
– Real time?
• The goal has to be flexible in the sense that it can be
weakened in case it turns out to be too ambitious.
• At least in principle there should be a method to evaluate
success.
• Analyze the goal:
– What are the possible risks?
– What to do in case the risks materialize?
General Steps (if applicable)
• Problem analysis: Define goal(s), name possible
algorithms and tools.
• Data collection: Find out and name data sources.
• Data visualization (e.g. from WEKA): Allows better
understanding of data distribution.
• Data preprocessing:
– Data cleaning:
• Incomplete data, missing values: Identify possibilities for handling them.
• Noisy data (use e.g. cluster analysis or regression to detect outliers).
– Data integration (identify inconsistencies and redundancies)
– Data reduction and compression: Used to reduce the data size
while still yielding the same analysis results.
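The data cleaning step above can be illustrated with a small sketch. It is not the method named on the slide: instead of cluster analysis or regression, it uses two simpler stand-ins, median imputation for missing values and a median-absolute-deviation rule for outliers. The function names, the threshold of 3, and the sample values are assumptions.

```python
from statistics import median

def fill_missing(values):
    """Replace missing entries (None) by the median of the observed values."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]

def flag_outliers(values, threshold=3.0):
    """Flag values more than `threshold` median absolute deviations from the median."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    if mad == 0:
        # All values (nearly) identical: nothing can be flagged by this rule.
        return [False] * len(values)
    return [abs(v - m) / mad > threshold for v in values]

raw = [4.0, 5.0, None, 6.0, 5.0, 50.0]   # hypothetical measurements
cleaned = fill_missing(raw)              # the None entry becomes 5.0
outliers = flag_outliers(cleaned)        # only the value 50.0 is flagged
```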
Candidates of Techniques
• Compare the characteristics of the application with the
strengths of the different techniques and select a list of
possible candidates.
• Analyze this further and take into account with which
techniques you are familiar; restrict the number of
candidates to at most two.
• Select one or two possible tools and continue the analysis
on this level.
• Do the tools have possibilities for data preprocessing if
needed?
Data Properties
• Are there enough data?
• How large is the data set?
• Data quality: Are they
– Clean
– Noisy
– Corrupted
– Partially missing
• What are the data sources and how are the sources
connected with the quality? Are data
– From a standard repository?
– Randomly collected?
– Self produced by experiments?
Data and Tools
• How different are the data structures
– of the given data,
– required by the tool,
– required by the user after learning?
• Think about the conversions needed:
– How difficult and time-consuming are they?
– Is some of the semantics (meaning) lost?
• Can the tool deal with noisy, missing, or otherwise imperfect data?
• Are additional tools needed for representing the results
(e.g. visualization)?
Discussion
• The application of two learning methods is often very
useful.
• The first method is often some kind of unsupervised
learning:
– One has no idea what is going on and studies the result.
• This output is then not very useful for further application,
in particular for humans.
• Therefore some symbolic learning is used in order to
obtain a symbolic description which humans can use.
Reliability of Knowledge
[Figure: extension of knowledge, where darkness indicates reliability. Knowledge is obtained by: direct retrieval; logical deduction; approximative reasoning; CBR; learning and data mining]
This assumes that the underlying data and information bases are reliable.
Reliability of Knowledge (2)
• This schema is only a rough and general indication.
• The success in applications depends heavily on e.g.
– correctness, amount and typicality of data
– adequate choice of the specific method and the precision with
which it is applied
– number of experiments carried out
– testing of the results
• Therefore the success depends on the invested
effort.
• There is again the utility question: Costs of obtaining
knowledge versus gain of applying knowledge
Overview of Methods
These refer to the chapters of the course:
– Concept Learning
– Decision Trees (ID3, C4.5, C5)
– Inductive Logic Programming (ILP)
– Unsupervised Learning of Concepts
– Experience Based Methods
– Learning Informal Concepts
– Bayesian Learning
– Support Vector Machines
– Data Mining
– Evolutionary Algorithms
– Reinforcement Learning
– Organizational Learning
– Neural Nets
– Clustering
– Data Preprocessing and Visualization
– PAC-Learning
– Tools and Evaluation
EXAMPLES
A Basic Task: Classification (1)
• There is a set U of objects and a set Cl = {C1,...,Cn} of
subsets of U called classes.
• A classifier is a function F: U → Cl, i.e. a function that
assigns to each object in U some class.
• Learning problem: Learn the classifier from a set of
classified examples.
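The learning problem above can be made concrete with a small sketch: a function that takes classified examples and returns a classifier F. The 1-nearest-neighbor rule used here is chosen only for brevity and is not claimed to be the method of the lecture; the representation of objects as numeric tuples and the names `learn_classifier`, `examples`, and the class labels are assumptions.

```python
def learn_classifier(examples):
    """Return a classifier F: U -> Cl built from (object, class) pairs.

    Assumption: objects are tuples of numbers of equal length.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def classify(obj):
        # Assign the class of the closest classified example.
        _, label = min(examples, key=lambda ex: squared_distance(ex[0], obj))
        return label

    return classify

# Hypothetical classified examples for two classes C1 and C2.
examples = [((1.0, 1.0), "C1"), ((1.5, 2.0), "C1"),
            ((8.0, 8.0), "C2"), ((9.0, 7.5), "C2")]
F = learn_classifier(examples)
```

Calling `F((2.0, 1.0))` then assigns a class to an object that was not among the examples.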
Classification (2)
• Examples:
– Image based classification
– Classification of molecules
– Classification of diseases and therapies
– Classification of texts
– Classification of customers
– Classification of sales products
• Possible methods: Neural networks, clustering, hidden Markov
models, decision trees, concept learning, support vector
machines.
Example: Classify Music
[Figure: feature tree for classifying music, with the nodes: Music; Feature; Voice; Vibration; Pitch; Chorus; Symphony; Harmonic; Partial; Male; Female; Instrument; String; Wind; Keyboard; Percussion; Rhythm; Other]
Data for Classification
• One needs many classified examples.
• There is no general way to find data sources because this
depends very much on the objects that have to be
classified.
Medical Diagnosis (1)
• Goal: Learning from patient data.
• Difficulties:
– Incompleteness (missing parameter values)
– Incorrectness (systematic or random noise in the data)
– Sparseness (few and/or non-representative patient records
available)
– Inexactness (inappropriate selection of parameters)
• Possible Methods:
– Neural networks (backpropagation)
– Support vector machines
– Decision trees
Medical Diagnosis (2)
• Data collection:
• Hospitals usually have many patient records, to which
you need access.
• In addition, there are many data records of patients on the
web.
• It is recommended to have a close connection to a
hospital.
Multi-Agents
• Learn to understand other agents' behavior in order to
make predictions:
– A) Competitive agents (e.g. financial business, banks, other
companies)
– B) Learn tactics of an opponent in a game
– C) Cooperating agents (e.g. teams in a distributed environment,
software development, organizing logistics companies).
• Possible methods:
– Genetic algorithms
– Reinforcement learning
Cryptography
• There are different kinds of examples, e.g.
– Public key creation
– Comparing keys and synchronization
– Fighting against attacks
Customer Relationship Management
• Classify customers according to sales behavior
• Difficulties:
– Taste is personal and often difficult to define
– Customers often do not answer queries honestly
• Predictions:
– Understand when and why a company’s customers are likely to leave.
– Learn how customers will react to
• changes or special offers
• rewards
Recommender Systems (1)
• Recommender systems are designed to permit near real-
time personalization before a customer makes a
purchase.
• Problem: The quantity of information is so
overwhelming that customers have difficulty
finding items of interest to them.
• Items are recommended to a customer based on the
similarity of their purchases to their own previous
purchases or to the purchases of other
customers.
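Recommending on the basis of purchase similarity can be sketched as follows. The sketch assumes that each customer is described by a set of purchased items and uses the Jaccard coefficient as similarity measure; this choice, the function names, and the sample purchases are illustrative assumptions, not the lecture's method.

```python
def jaccard(a, b):
    """Similarity of two purchase sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(customer, purchases):
    """Recommend the items bought by the most similar other customer."""
    own = purchases[customer]
    others = [c for c in purchases if c != customer]
    most_similar = max(others, key=lambda c: jaccard(own, purchases[c]))
    return sorted(purchases[most_similar] - own)

# Hypothetical purchase records.
purchases = {
    "anna": {"tent", "boots", "map"},
    "ben": {"tent", "boots", "stove"},
    "carl": {"novel", "lamp"},
}
suggestion = recommend("anna", purchases)
```

Here the most similar customer shares two of four distinct items, so the one item not yet bought is suggested.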
Recommender Systems (2)
• Examples:
– TV programs
– Vacations
– Events (sports, culture)
– Music
– Luxury goods
– Recommending to students which curricula to choose
• Learning Goal: Learn customer preferences from recorded
examples.
• Candidates for methods:
– Clustering algorithms
– Neural nets: Adaptive Resonance Theory (ART), Kohonen nets
Recommender Systems (3)
• Data collection:
• You need access to
– Collection of items you want to recommend (usually easy to
download from the web, like vacations, cars, PC equipment etc.)
– What people may like to buy or watch: There are statistical data,
also about recommendation systems.
– Results of your recommendation: Sometimes difficult. You can
either
• Compare your recommendations with what is actually bought
• Perform your own experiments
Playing Games (1)
• Playing games is an example of a dynamic goal.
• The goal is to learn a behavior that enables one to play
better.
• Characteristics of non-trivial games:
– A vast domain of parameter values.
– One does not have a thorough insight.
– There is no obvious strategy to play.
– One does not exactly know the effect of each parameter value when
playing.
Playing Games (2)
• The goal is usually clear: You want to win as often as
possible. You can play against other players but also
– Against yourself
– Against earlier versions of your system. This is useful to show
progress.
• Data and Examples: Games and their outcomes
– Recorded examples from data collections
– Self-produced by systematic experiments
Training and Test Data
• One can use previous moves in games and scores as
training data.
• Previous games can also be used as test data for
the program.
• Problem: How representative are the training data?
• Is there experience available?
• Evaluation is mostly easy: Count wins/losses
Web Search (1)
• The “keyword search” paradigm of current search engines is not
sufficient to capture some of the fine-grained
information that humans can understand.
• A common way to interface a web application with
heterogeneous data sources on the web is through a
wrapper.
• Due to the virtually infinite number of data schemas on the
web, custom fitting a wrapper for each data source on the
web is impractical.
Web Search (2)
• The aim is to be able to extract data from documents.
• One task is to analyze an HTML document by assigning a
match score to each node and to identify the repeating
pattern by clustering the match scores using the k-means
clustering algorithm.
• Learning goal: Cluster HTML documents by using any kind
of repeating data structures (e.g. product
listings, readers’ comments, sports scoreboards and
forums).
• Possible methods and tools:
– Clustering, fuzzy clustering (WEKA)
– Kohonen nets
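The clustering of match scores mentioned above can be sketched with a one-dimensional k-means. How the match score of an HTML node is computed is not specified here, so the scores below are invented; the function name `kmeans_1d`, the deterministic initialization, and the choice k = 2 are likewise assumptions.

```python
def kmeans_1d(scores, k=2, iterations=20):
    """Cluster numbers with k-means; returns (centers, assignments)."""
    ordered = sorted(scores)
    # Deterministic initialization: spread the centers over the sorted scores.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each score goes to its nearest center.
        assignments = [min(range(k), key=lambda j: abs(s - centers[j]))
                       for s in scores]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [s for s, a in zip(scores, assignments) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, assignments

# Hypothetical match scores, one per HTML node.
match_scores = [0.91, 0.12, 0.88, 0.15, 0.93, 0.10]
centers, assignments = kmeans_1d(match_scores)
```

Under these assumptions, the nodes in the high-score cluster would be the candidates for the repeating pattern.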
Web Search (3)
• Data collection is often not difficult because you can use
data collections on the web.
• An example would be: Choose a (restricted) topic like
– Air travel
– Information about a period in history, etc.
• Find wiki pages that are useful for this purpose. Then you can
determine the progress and success yourself.
Software Engineering (1)
• TOPICS:
• Prediction and Estimation
• Property and Model Discovery
• Transformation
• Generation and Synthesis
• Reuse
• Requirement Acquisition
• Management of Development Knowledge
Software Engineering (2)
• Release Planning:
• This is an early step in the development of software projects.
• In this step the different features of the project are arranged
in groups called releases. The features in the first release
are done first and so on.
• There are constraints to be observed, mainly:
– Technical constraints, e.g. certain features have to be done together;
resource capacities are restricted.
– Customers have preferences.
• Details can be provided; this is an important research topic in
Calgary.
Predictions
• Predictions have two aspects
– 1) Properly predicting the outcome of an event, decision or result
– 2) Making use of the prediction.
• For 1) most of the methods mentioned for classification
can be used.
• For 2) the problem is that the prediction may not be in a
form that can be used directly.
• Example:
• Suppose the result is a clustering. Then one does not
know for any new object to which cluster it belongs.
• It requires a second learning process to find out the
corresponding rules.
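The second learning process described above can be sketched for the simplest case: a clustering of one-dimensional values into two clusters, from which a threshold rule is learned that also covers new objects. The threshold rule (a one-level decision tree), the function name, and the sample data are assumptions made for illustration.

```python
def learn_threshold_rule(values, cluster_ids):
    """Learn a rule 'value <= t -> one cluster, else the other' from a clustering.

    Assumption: the values contain at least two distinct numbers.
    """
    pairs = sorted(zip(values, cluster_ids))
    best = None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        t = (v1 + v2) / 2
        low = [c for v, c in pairs if v <= t]
        high = [c for v, c in pairs if v > t]
        if not high:
            continue  # duplicate values: this threshold separates nothing
        low_label = max(set(low), key=low.count)
        high_label = max(set(high), key=high.count)
        errors = (sum(c != low_label for c in low)
                  + sum(c != high_label for c in high))
        if best is None or errors < best[0]:
            best = (errors, t, low_label, high_label)
    _, t, low_label, high_label = best
    return lambda v: low_label if v <= t else high_label

# Hypothetical clustering result: values with the cluster each was put in.
values = [1.0, 1.2, 0.8, 5.0, 5.5, 4.8]
cluster_ids = [0, 0, 0, 1, 1, 1]
rule = learn_threshold_rule(values, cluster_ids)
```

The returned `rule` can be applied to any new value, which the clustering alone does not allow.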
Controlling Dynamic Behavior
• If you are working with systems that perform something
like
– A mechanical or electronic system
– A network
– An operating system, etc.
then you want a certain performance, which determines
the goal.
• The data are recorded collections of previous
performances.
How to Read these Notes
• In general, there is no specific ordering for the sections.
• In some sections we refer to other sections, so one has to
make a certain detour.
• A general distinction between the sections is as follows:
– Basic learning sections: Here some general concepts, algorithms
and methods are presented, like concept learning, decision trees,
PAC-learning etc. Of course, these methods only make sense if
they are instantiated properly in applications.
– “High level” sections: Here one wants to apply the basic methods.
– Finally, we have auxiliary sections for practical purposes like
visualization, preprocessing or tools and evaluations.
References
A general reference is:
Tom Mitchell (1997). Machine Learning. McGraw-Hill.
Additional references will be given at the end of each chapter.
Acknowledgements
• Some of these notes were developed in collaboration with
Ralph Bergmann and several slides have been taken
over from his notes, see http://www.bergmann.uni-trier.de
• Several slides have also been taken over from the course
of Sandra Zilles, see http://www.Sandra.Zilles.dfki.de