Introduction to Machine Learning

Transcript of lecture slides

Page 1: Introduction to machine learning

Introduction to Machine Learning

Page 2: Introduction to machine learning

Reminders

• HW6 posted

• All HW solutions posted

Page 3: Introduction to machine learning

A Review

• You will be asked to learn the parameters of a probabilistic model of C given A,B, using some data

• The parameters model entries of a conditional probability distribution as coin flips

The four parameters, one per parent setting:

P(C|A,B):  θ1 (A, B)   θ2 (A, ¬B)   θ3 (¬A, B)   θ4 (¬A, ¬B)

Page 4: Introduction to machine learning

A Review

• Example data

• ML estimates

P(C|A,B):  θ1 = 3/8 (A, B)   θ2 = 1/11 (A, ¬B)   θ3 = 4/11 (¬A, B)   θ4 = 6/11 (¬A, ¬B)

A  B  C  # obs
1  1  1    3
1  1  0    5
1  0  1    1
1  0  0   10
0  1  1    4
0  1  0    7
0  0  1    6
0  0  0    5
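To make the arithmetic concrete, here is a minimal Python sketch (counts hard-coded from the table above): each ML estimate is just the count of C = 1 under a parent setting divided by the total count for that setting.

  # ML estimation of P(C=1 | A, B) from the observation counts above.
  # counts[(a, b, c)] = number of observations of that joint assignment
  counts = {
      (1, 1, 1): 3,  (1, 1, 0): 5,
      (1, 0, 1): 1,  (1, 0, 0): 10,
      (0, 1, 1): 4,  (0, 1, 0): 7,
      (0, 0, 1): 6,  (0, 0, 0): 5,
  }

  for a in (1, 0):
      for b in (1, 0):
          heads = counts[(a, b, 1)]            # observations with C = 1
          total = heads + counts[(a, b, 0)]    # all observations of (A=a, B=b)
          print(f"P(C=1 | A={a}, B={b}) = {heads}/{total} = {heads/total:.3f}")

Running this prints 3/8, 1/11, 4/11, and 6/11, matching θ1 through θ4 above.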

Page 5: Introduction to machine learning

Maximum Likelihood for BN

• For any BN, the ML parameters of any CPT can be derived as fractions of observed counts in the data

Network: Earthquake → Alarm ← Burglar

Observed counts (N = 1000): E: 500, B: 200

P(E) = 500/1000 = 0.5   P(B) = 200/1000 = 0.2

Alarm counts per parent setting:
A|E,B: 19/20   A|¬E,B: 188/200   A|E,¬B: 170/500   A|¬E,¬B: 1/380

E  B  P(A|E,B)
T  T  0.95   (= 19/20)
F  T  0.94   (= 188/200)
T  F  0.34   (= 170/500)
F  F  0.003  (≈ 1/380)
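A sketch of the same fraction-of-counts rule for the alarm network, with the count pairs read off the slide:

  # ML parameters as fractions of observed counts (numbers from the slide)
  N = 1000
  print("P(E) =", 500 / N)   # 0.5
  print("P(B) =", 200 / N)   # 0.2

  # (alarm observed, parent-setting observed) counts per (E, B) setting
  cpt_counts = {
      (True, True): (19, 20),    (False, True): (188, 200),
      (True, False): (170, 500), (False, False): (1, 380),
  }
  for (e, b), (a, n) in cpt_counts.items():
      print(f"P(A | E={e}, B={b}) = {a}/{n} = {a/n:.3f}")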

Page 6: Introduction to machine learning

Maximum A Posteriori

• Example data

• MAP estimates assuming a Beta prior with a = b = 3

• virtual counts: 2 heads, 2 tails

P(C|A,B):  θ1 = 5/12 (A, B)   θ2 = 3/15 (A, ¬B)   θ3 = 6/15 (¬A, B)   θ4 = 8/15 (¬A, ¬B)

A  B  C  # obs
1  1  1    3
1  1  0    5
1  0  1    1
1  0  0   10
0  1  1    4
0  1  0    7
0  0  1    6
0  0  0    5
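A matching sketch for the MAP case: a Beta(a, b) prior with a = b = 3 contributes a − 1 = 2 virtual heads and b − 1 = 2 virtual tails to each parameter's counts.

  # MAP estimation of P(C=1 | A, B) with a Beta(3, 3) prior on each parameter
  counts = {
      (1, 1, 1): 3,  (1, 1, 0): 5,
      (1, 0, 1): 1,  (1, 0, 0): 10,
      (0, 1, 1): 4,  (0, 1, 0): 7,
      (0, 0, 1): 6,  (0, 0, 0): 5,
  }
  a_prior = b_prior = 3
  vh, vt = a_prior - 1, b_prior - 1      # virtual counts: 2 heads, 2 tails

  for a in (1, 0):
      for b in (1, 0):
          heads = counts[(a, b, 1)] + vh
          total = counts[(a, b, 1)] + counts[(a, b, 0)] + vh + vt
          print(f"MAP P(C=1 | A={a}, B={b}) = {heads}/{total}")

This prints 5/12, 3/15, 6/15, and 8/15, the MAP estimates above.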

Page 7: Introduction to machine learning

Discussion

• Where are we now?

• How does it relate to the future of AI?

• How will you be tested on it?

Page 8: Introduction to machine learning

Motivation

• Agent has made observations (data)

• Now must make sense of it (hypotheses)
  – Hypotheses alone may be important (e.g., in basic science)
  – For inference (e.g., forecasting)
  – To take sensible actions (decision making)

• A basic component of economics, social and hard sciences, engineering, …

Page 9: Introduction to machine learning

Topics in Machine Learning

Applications: document retrieval, document classification, data mining, computer vision, scientific discovery, robotics, …

Tasks & settings: classification, ranking, clustering, regression, decision-making; supervised, unsupervised, semi-supervised, active, reinforcement learning

Techniques: Bayesian learning, decision trees, neural networks, support vector machines, boosting, case-based reasoning, dimensionality reduction, …

Page 10: Introduction to machine learning

What is Learning?

Mostly generalization from experience: "Our experience of the world is specific, yet we are able to formulate general theories that account for the past and predict the future." (M.R. Genesereth and N.J. Nilsson, in Logical Foundations of AI, 1987)

Concepts, heuristics, policies
Supervised vs. unsupervised learning

Page 11: Introduction to machine learning

Inductive Learning

• Basic form: learn a function from examples
• f is the unknown target function
• An example is a pair (x, f(x))
• Problem: find a hypothesis h
  – such that h ≈ f
  – given a training set of examples D
• Instance of supervised learning
  – Classification task: f takes values in {0, 1, …, C}
  – Regression task: f takes values in the reals

Page 12: Introduction to machine learning

Inductive learning method

• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:

Page 13: Introduction to machine learning

Inductive learning method

• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:

Page 14: Introduction to machine learning

Inductive learning method

• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:

Page 15: Introduction to machine learning

Inductive learning method

• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:

Page 16: Introduction to machine learning

Inductive learning method

• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:

Page 17: Introduction to machine learning

Inductive learning method

• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:

h = D is a trivial, but perhaps uninteresting, solution (caching)
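A quick numpy sketch of this progression on made-up data (the data and target here are illustrative, not from the slides): a low-degree h underfits, a moderate degree generalizes, and degree |D| − 1 interpolates every point, i.e. the h = D caching solution.

  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(0, 1, 10)
  y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)  # noisy target f

  for degree in (1, 3, 9):            # degree 9 = len(x) - 1: fits D exactly
      h = np.polynomial.Polynomial.fit(x, y, degree)
      train_mse = np.mean((h(x) - y) ** 2)
      print(f"degree {degree}: training MSE = {train_mse:.5f}")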

Page 18: Introduction to machine learning

Classification Task

• The concept f(x) takes on values True and False
• An example is positive if f is True, else it is negative
• The set X of all examples is the example set
• The training set is a subset of X (a small one!)

Page 19: Introduction to machine learning

Logic-Based Inductive Learning

• Here, examples (x, f(x)) take on discrete values

Page 20: Introduction to machine learning

Logic-Based Inductive Learning

• Here, examples (x,f(x)) take discrete values


Note that the training set does not say whether an observable predicate is pertinent or not

Page 21: Introduction to machine learning

Rewarded Card Example

Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”

Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
((s=S) ∨ (s=C)) ⇔ BLACK(s)
((s=D) ∨ (s=H)) ⇔ RED(s)

Training set D:
REWARD([4,C])   REWARD([7,C])   REWARD([2,S])
¬REWARD([5,H])   ¬REWARD([J,S])

Page 22: Introduction to machine learning

Rewarded Card Example

Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”

Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
((s=S) ∨ (s=C)) ⇔ BLACK(s)
((s=D) ∨ (s=H)) ⇔ RED(s)

Training set D:
REWARD([4,C])   REWARD([7,C])   REWARD([2,S])
¬REWARD([5,H])   ¬REWARD([J,S])

Possible inductive hypothesis:
h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]))

There are several possible inductive hypotheses
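A small Python sketch of checking one such hypothesis against D (the predicate helpers simply encode KB):

  # Observable predicates from KB
  def NUM(r):   return r in range(1, 11)
  def BLACK(s): return s in ("S", "C")

  # Candidate hypothesis: NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
  def h(r, s):
      return NUM(r) and BLACK(s)

  # Training set D: (card, observed REWARD value)
  D = [((4, "C"), True), ((7, "C"), True), ((2, "S"), True),
       ((5, "H"), False), (("J", "S"), False)]

  assert all(h(r, s) == reward for (r, s), reward in D)
  print("h agrees with every example in D")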

Page 23: Introduction to machine learning

Learning a Logical Predicate (Concept Classifier)

• Set E of objects (e.g., cards)
• Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)

Example: CONCEPT describes the precondition of an action, e.g., Unstack(C,A)
• E is the set of states
• CONCEPT(x) ⇔ HANDEMPTY ∈ x ∧ BLOCK(C) ∈ x ∧ BLOCK(A) ∈ x ∧ CLEAR(C) ∈ x ∧ ON(C,A) ∈ x

Learning CONCEPT is a step toward learning an action description, as sketched below.
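A sketch of that precondition test, representing a state x as a set of ground atoms (the state contents are illustrative):

  # CONCEPT(x) holds iff every precondition atom of Unstack(C,A) is in state x
  PRECONDITION = {"HANDEMPTY", "BLOCK(C)", "BLOCK(A)", "CLEAR(C)", "ON(C,A)"}

  def concept(x):
      return PRECONDITION <= x       # subset test

  state = {"HANDEMPTY", "BLOCK(A)", "BLOCK(C)", "CLEAR(C)", "ON(C,A)",
           "ONTABLE(A)"}
  print(concept(state))              # True: Unstack(C,A) is applicable here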

Page 24: Introduction to machine learning

Learning a Logical Predicate (Concept Classifier)

• Set E of objects (e.g., cards)
• Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
• Observable predicates A(x), B(x), … (e.g., NUM, RED)
• Training set: values of CONCEPT for some combinations of values of the observable predicates

Page 25: Introduction to machine learning

Learning a Logical Predicate (Concept Classifier)

• Set E of objects (e.g., cards)
• Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
• Observable predicates A(x), B(x), … (e.g., NUM, RED)
• Training set: values of CONCEPT for some combinations of values of the observable predicates

• Find a representation of CONCEPT in the form:
  CONCEPT(x) ⇔ S(A,B,…)
  where S(A,B,…) is a sentence built with the observable predicates, e.g.:
  CONCEPT(x) ⇔ A(x) ∧ (B(x) ∨ C(x))

Page 26: Introduction to machine learning

Hypothesis Space

• A hypothesis is any sentence of the form:
  CONCEPT(x) ⇔ S(A,B,…)
  where S(A,B,…) is a sentence built using the observable predicates
• The set of all hypotheses is called the hypothesis space H
• A hypothesis h agrees with an example if it gives the correct value of CONCEPT

Page 27: Introduction to machine learning

Inductive Learning Scheme

[figure: a set of positive (+) and negative (−) examples feeding into a learner]

Example set X: {[A, B, …, CONCEPT]}
Hypothesis space H: {[CONCEPT(x) ⇔ S(A,B,…)]}
A training set D drawn from X yields the inductive hypothesis h

Page 28: Introduction to machine learning

Size of Hypothesis Space

• n observable predicates → 2^n entries in the truth table defining CONCEPT, and each entry can be filled with True or False
• In the absence of any restriction (bias), there are 2^(2^n) hypotheses to choose from
• n = 6 → 2^64 ≈ 2×10^19 hypotheses!
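The arithmetic, as a one-line sketch:

  n = 6
  rows = 2 ** n                  # truth-table rows: one per assignment
  hypotheses = 2 ** rows         # independent True/False choice per row
  print(f"{hypotheses:.1e}")     # ~1.8e+19, i.e. about 2 x 10^19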

Page 29: Introduction to machine learning

Multiple Inductive Hypotheses

Deck of cards, with each card designated by [r,s], its rank and suit, and some cards "rewarded"

Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
((s=S) ∨ (s=C)) ⇔ BLACK(s)
((s=D) ∨ (s=H)) ⇔ RED(s)

Training set D:
REWARD([4,C])   REWARD([7,C])   REWARD([2,S])
¬REWARD([5,H])   ¬REWARD([J,S])

All of the following hypotheses agree with all the examples in the training set:

h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])

Page 30: Introduction to machine learning

Multiple Inductive Hypotheses

Deck of cards, with each card designated by [r,s], its rank and suit, and some cards "rewarded"

Background knowledge KB:
((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
((s=S) ∨ (s=C)) ⇔ BLACK(s)
((s=D) ∨ (s=H)) ⇔ RED(s)

Training set D:
REWARD([4,C])   REWARD([7,C])   REWARD([2,S])
¬REWARD([5,H])   ¬REWARD([J,S])

All of the following hypotheses agree with all the examples in the training set:

h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s])
h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s])
h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])

Need for a system of preferences – called an inductive bias – to compare possible hypotheses

Page 31: Introduction to machine learning

Notion of Capacity

• It refers to the ability of a machine to learn any training set without error
• A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything he has seen before
• A machine with too little capacity is like the botanist's lazy brother, who declares that if it's green, it's a tree
• Good generalization can only be achieved when the right balance is struck between the accuracy attained on the training set and the capacity of the machine

Page 32: Introduction to machine learning

Keep-It-Simple (KIS) Bias

Examples:
• Use far fewer observable predicates than the training set
• Constrain the learnt predicate, e.g., to use only "high-level" observable predicates such as NUM, FACE, BLACK, and RED, and/or to have a simple syntax

Motivation:
• If a hypothesis is too complex, it is not worth learning (data caching does the job as well)
• There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller

Page 33: Introduction to machine learning

Keep-It-Simple (KIS) Bias

Examples:
• Use far fewer observable predicates than the training set
• Constrain the learnt predicate, e.g., to use only "high-level" observable predicates such as NUM, FACE, BLACK, and RED, and/or to have a simple syntax

Motivation:
• If a hypothesis is too complex, it is not worth learning (data caching does the job as well)
• There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller

Einstein: “A theory must be as simple as possible, but not simpler than this”

Page 34: Introduction to machine learning

Keep-It-Simple (KIS) Bias

Examples:
• Use far fewer observable predicates than the training set
• Constrain the learnt predicate, e.g., to use only "high-level" observable predicates such as NUM, FACE, BLACK, and RED, and/or to have a simple syntax

Motivation:
• If a hypothesis is too complex, it is not worth learning (data caching does the job as well)
• There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller

If the bias allows only sentences S that are conjunctions of k ≪ n predicates picked from the n observable predicates, then the size of H is O(n^k)
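A quick count under that bias (a sketch; whether negated predicates are allowed is an assumption here, and only changes the constant factor):

  from math import comb

  n, k = 10, 3
  conjunctions = comb(n, k) * 2 ** k   # choose k predicates, each possibly negated
  print(conjunctions)                  # 960: polynomial in n for fixed k,
                                       # versus 2**(2**n) without any bias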

Page 35: Introduction to machine learning

Relation to Bayesian Learning

• Recall MAP learning:
  θ_MAP = argmax_θ P(D|θ) P(θ)
  – P(D|θ) is the likelihood ~ accuracy on the training set
  – P(θ) is the prior ~ inductive bias

• Equivalently:
  θ_MAP = argmin_θ [ −log P(D|θ) − log P(θ) ]
  – −log P(D|θ) measures accuracy on the training set
  – −log P(θ) is the minimum-length encoding of θ

• MAP learning is a form of KIS bias
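A numeric sketch of this equivalence for a single coin parameter θ, reusing the first CPT entry from earlier (3 heads, 5 tails, Beta(3,3) prior); maximizing P(D|θ)P(θ) and minimizing the summed code lengths select the same θ:

  import numpy as np

  h, t = 3, 5                     # observed heads / tails (first CPT entry)
  a = b = 3                       # Beta(3, 3) prior
  theta = np.linspace(0.001, 0.999, 999)

  neg_log_lik   = -(h * np.log(theta) + t * np.log(1 - theta))        # -log P(D|theta)
  neg_log_prior = -((a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta))

  theta_map = theta[np.argmin(neg_log_lik + neg_log_prior)]
  print(f"{theta_map:.3f}")       # ~0.417 = 5/12, the MAP estimate from earlier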

Page 36: Introduction to machine learning

Supervised Learning Flow Chart

[flow chart]
• Target concept: the unknown concept we want to approximate
• Training set: data points, the observations we have seen
• Learner: searches a hypothesis space (the choice of learning algorithm)
• Inductive hypothesis: the learner's output, used to make predictions
• Test set: observations we will see in the future; performance here gives better quantities to assess performance

Page 37: Introduction to machine learning

Capacity is Not the Only Criterion

• Accuracy on training set isn’t the best measure of performance

[figure: example set X of + and − examples; a hypothesis from hypothesis space H is fit on the training set D (Learn) and evaluated on the remaining examples (Test)]

Page 38: Introduction to machine learning

Generalization Error

• A hypothesis h is said to generalize well if it achieves low error on all examples in X

[figure: example set X of + and − examples; the hypothesis from hypothesis space H is learned on part of X (Learn) and its error assessed across all of X (Test)]

Page 39: Introduction to machine learning

Assessing Performance of a Learning Algorithm

• Samples from X are typically unavailable

• Take out some of the training set
  – Train on the remaining training set
  – Test on the excluded instances
  – Cross-validation

Page 40: Introduction to machine learning

Cross-Validation

• Split the original set of examples; train on one part

[figure: examples D split into two groups of + and − examples; a hypothesis from hypothesis space H is fit to the first group (Train)]

Page 41: Introduction to machine learning

Cross-Validation

• Evaluate hypothesis on testing set

[figure: the hypothesis from hypothesis space H, shown against the held-out testing set of + and − examples]

Page 42: Introduction to machine learning

Cross-Validation

• Evaluate hypothesis on testing set

[figure: the hypothesis from hypothesis space H applied to the testing set (Test)]

Page 43: Introduction to machine learning

Cross-Validation

• Compare true concept against prediction

[figure: predictions compared against the true labels on the testing set, hypothesis space H: 9/13 correct]

Page 44: Introduction to machine learning

Common Splitting Strategies

• k-fold cross-validation

• Leave-one-out

[figure: dataset split into folds; each fold serves once as Test while the rest is Train]
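A minimal pure-Python sketch of k-fold splitting (the round-robin assignment of examples to folds is one arbitrary choice):

  def k_fold_splits(examples, k):
      """Yield (train, test) pairs; each example is tested exactly once."""
      folds = [examples[i::k] for i in range(k)]
      for i in range(k):
          test = folds[i]
          train = [x for j, fold in enumerate(folds) if j != i for x in fold]
          yield train, test

  data = list(range(12))
  for train, test in k_fold_splits(data, k=4):
      print(f"{len(train)} train / {len(test)} test")
  # leave-one-out is the special case k = len(data)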

Page 45: Introduction to machine learning

Cardinal sins of machine learning

• Thou shalt not:
  – Train on examples in the testing set
  – Form assumptions by "peeking" at the testing set, then formulating inductive bias

Page 46: Introduction to machine learning

Tennis Example

• Evaluate the learning algorithm
  PlayTennis = S(Temperature, Wind)

Page 47: Introduction to machine learning

Tennis Example

• Evaluate the learning algorithm
  PlayTennis = S(Temperature, Wind)

Trained hypothesis:
PlayTennis ⇔ (T = Mild ∨ T = Cool) ∧ (W = Weak)

Training errors = 3/10

Testing errors = 4/4

Page 48: Introduction to machine learning

Tennis Example

• Evaluate the learning algorithm
  PlayTennis = S(Temperature, Wind)

Trained hypothesis:
PlayTennis ⇔ (T = Mild ∨ T = Cool)

Training errors = 3/10

Testing errors = 1/4

Page 49: Introduction to machine learning

Tennis Example

• Evaluate the learning algorithm
  PlayTennis = S(Temperature, Wind)

Trained hypothesis:
PlayTennis ⇔ (T = Mild ∨ T = Cool)

Training errors = 3/10

Testing errors = 2/4

Page 50: Introduction to machine learning

Tennis Example

• Evaluate the learning algorithm
  PlayTennis = S(Temperature, Wind)

Sum of all testing errors: 7/12
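A sketch of how these error counts are tallied (the tennis data itself is not in the transcript, so only the bookkeeping is shown; h1 encodes the trained hypothesis from page 47):

  def errors(h, examples):
      """Count examples (x, label) that hypothesis h misclassifies."""
      wrong = sum(h(x) != label for x, label in examples)
      return wrong, len(examples)

  # Trained hypothesis from page 47: (T = Mild or T = Cool) and (W = Weak)
  def h1(x):
      T, W = x
      return T in ("Mild", "Cool") and W == "Weak"

  # errors(h1, training_set) -> (3, 10) and errors(h1, testing_set) -> (4, 4)
  # would reproduce the counts reported on that slide.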

Page 51: Introduction to machine learning

How to construct a better learner?

• Ideas?

Page 52: Introduction to machine learning

Next Time

• Decision tree learning