
Page 1: Machine Learning Part I: Classification and Bayesian Learning

Machine Learning

Part I: Classification and Bayesian Learning

Ref: E. Alpaydin, Intro to Machine Learning, MIT 2004

Page 2: Machine Learning

• Machine learning is programming computers to optimize a performance criterion using example data or past experience; that is, inference from samples.

• There is a process that explains the data we observe, but we do not know the details of how the data are generated, e.g., Internet requests, failure events, etc.

• Even if it is hard to identify (model) the process completely, we can construct a good and useful approximation that detects certain patterns. Such patterns help us understand the process and make predictions about the future.

Page 3: Types of Machine Learning

• Supervised learning creates a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs.
  – Classification: given an input, the output is a class label (e.g., Boolean yes/no) predicted for the input object.
  – Regression: if the label is a numerical value, learn the function f(x) that best explains the input instances.

• Unsupervised learning: manual labels of inputs are not used.
  – Clustering: partition a data set into subsets (clusters) so that the data in each subset share some common trait.

• Semi-supervised learning: make use of both labeled and unlabeled data for training.

• Reinforcement learning: learn a policy, i.e., a sequence of outputs; there is no supervised output, only a delayed reward.
  – Examples: game playing, robot navigation.

Page 4: Supervised Learning

Supervised Learning

• Use of Supervised Learning
• Classification
• Regression
• Evaluation Methodology
• Bayesian Learning for Classification

Page 5: Why Supervised Learning?

Why Supervised Learning?

• Prediction of future cases: Use the rule to predict the output for future inputs

• Knowledge extraction: The rule is easy to understand

• Compression: The rule is simpler than the data it explains

• Outlier detection: Exceptions that are not covered by the rule, e.g., fraud


Page 6: Classification

Classification


• Example: credit scoring
  – Differentiate between low-risk and high-risk customers based on their income and savings

• Rule-based prediction

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
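As a concrete illustration, the discriminant can be written directly as code. This is a minimal sketch: the threshold values for θ1 and θ2 below are made-up placeholders, since in practice they would be learned from labeled data.

```python
# Minimal sketch of the rule-based discriminant above.
# theta1 and theta2 are placeholder thresholds chosen only for illustration;
# a learning algorithm would estimate them from labeled examples.
def credit_risk(income, savings, theta1=30_000, theta2=10_000):
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(credit_risk(income=45_000, savings=12_000))  # low-risk
print(credit_risk(income=45_000, savings=5_000))   # high-risk
```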

Page 7: Learning a Class from Examples

• Given a set of example cars, each labeled "family car" or not according to a survey, class learning is to find a description that is shared by all positive examples (and by none of the negative ones).

• Use of the class information:
  – Prediction: is car x a family car?
  – Knowledge extraction: what do people expect from a family car?

Page 8: Training Set X

Training set X

$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$$

Label of each instance:

$$r^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \text{ is positive} \\ 0 & \text{if } \mathbf{x}^t \text{ is negative} \end{cases}$$

Input representation (attributes: price and engine power):

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
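To make the notation concrete, here is a minimal sketch of this training-set representation as arrays; the price and engine-power values are invented purely for illustration.

```python
import numpy as np

# Each row of X is one instance x^t = [price, engine power]; r holds the labels r^t.
# All numbers are made up for illustration only.
X = np.array([
    [27_000, 120],   # labeled positive (family car)
    [35_000, 150],   # labeled positive
    [15_000,  60],   # labeled negative
    [80_000, 300],   # labeled negative
])
r = np.array([1, 1, 0, 0])   # r^t = 1 if x^t is positive, 0 if negative
N = len(r)                   # number of training instances
```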

Page 9: Hypothesis Class C

Hypothesis Class: C


Hypotheses are rectangles in the (price, engine power) space:

$$(p_1 \le \text{price} \le p_2) \;\text{AND}\; (e_1 \le \text{engine power} \le e_2)$$

• Most specific hypothesis, S
• Most general hypothesis, G
• Learning is to find a particular hypothesis h to approximate C

Page 10: Hypothesis h and Empirical Error

Hypothesis h and Empirical Error


$$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ classifies } \mathbf{x} \text{ as positive} \\ 0 & \text{if } h \text{ classifies } \mathbf{x} \text{ as negative} \end{cases}$$

Error of h on the training set $\mathcal{X}$:

$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} \mathbf{1}\big(h(\mathbf{x}^t) \neq r^t\big)$$
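A minimal sketch of computing this empirical error for a rectangle hypothesis over (price, engine power); the data and rectangle bounds are made up for illustration.

```python
import numpy as np

# Toy training set: rows are [price, engine power]; r holds the labels r^t.
X = np.array([[27_000, 120], [35_000, 150], [15_000, 60], [80_000, 300]])
r = np.array([1, 1, 0, 0])

def h(x, p1=20_000, p2=40_000, e1=100, e2=200):
    # Rectangle hypothesis: predict 1 iff price and engine power fall inside the bounds.
    return int(p1 <= x[0] <= p2 and e1 <= x[1] <= e2)

# E(h | X) = sum over t of 1(h(x^t) != r^t)
E = sum(h(x) != label for x, label in zip(X, r))
print(E)  # 0 misclassifications on this toy set with these bounds
```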

Page 11: Model Selection & Generalization

• Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution.
  – Limited number of sample data
  – Some data might be noise, due to imprecision in recording or labeling, or due to hidden (latent, unobservable) attributes that affect the label of instances

• The need for inductive bias: assumptions about the class structure, i.e., the hypothesis class H
  – Why a rectangle, not a circle or an irregular shape?
  – What degree of tightness of fit?

• Generalization: how well a model performs on new data

Page 12: Noise and Model Complexity

A simple model is preferred:
• Easy to use (check): lower time complexity
• Easy to train: lower space complexity
• Easy to explain: more interpretable
• Easy to generalize: lower variance

Noise: any anomaly in the data which makes it infeasible to reach a zero-error classification with a simple hypothesis class.

Page 13: Probably Approximately Correct (PAC) Learning

Probably Approximately Correct (PAC) Learning

• How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε?

• The region where h can err is covered by four strips, one along each edge of the actual class rectangle C; choose each strip to have probability mass at most ε/4.
• Probability that one random instance misses a given strip: 1 − ε/4
• Probability that all N instances miss that strip: (1 − ε/4)^N
• Probability that the N instances miss any of the 4 strips: at most 4(1 − ε/4)^N
• Require 4(1 − ε/4)^N ≤ δ; since (1 − x) ≤ exp(−x), it suffices that 4 exp(−εN/4) ≤ δ, i.e.

$$N \ge \frac{4}{\varepsilon} \ln\frac{4}{\delta}$$
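To get a feel for the bound, a quick sketch that evaluates the required N for a couple of illustrative (ε, δ) choices (the values are arbitrary examples, not from the slides):

```python
import math

def pac_sample_size(eps, delta):
    # Smallest integer N with N >= (4 / eps) * ln(4 / delta).
    return math.ceil((4 / eps) * math.log(4 / delta))

print(pac_sample_size(0.1, 0.05))   # 176: error <= 0.1 with probability >= 0.95
print(pac_sample_size(0.01, 0.05))  # 1753: ten times smaller error needs ~10x more data
```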


Page 14: 2-Class vs K-Class

$$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}, \qquad r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$$

A K-class problem can be viewed as K 2-class problems. Train hypotheses $h_i(\mathbf{x})$, $i = 1, \dots, K$:

$$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$$
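A minimal sketch of turning K-class labels into the K binary label vectors r_i^t used above; the class indices are made up, and `train_binary_classifier` is only a placeholder name for whatever 2-class learner is used.

```python
import numpy as np

# Toy class indices for N = 6 instances and K = 3 classes (made-up values).
y = np.array([0, 2, 1, 1, 0, 2])
K = 3

# r[t, i] = 1 if x^t belongs to C_i, 0 otherwise -- exactly the r_i^t defined above.
r = np.zeros((len(y), K), dtype=int)
r[np.arange(len(y)), y] = 1
print(r)

# One-vs-rest: for each class i, train a binary hypothesis h_i on (X, r[:, i]).
# hypotheses = [train_binary_classifier(X, r[:, i]) for i in range(K)]
```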

Page 15: Regression

Regression

• Examples
  – Price of a used car
  – Speed of a Top500 supercomputer

• x: car attributes, y: price

$$y = g(\mathbf{x} \mid \theta)$$

where g(·) is the model and θ its parameters.

• Linear regression: $y = w x + w_0$

Page 16: Basic Concepts

• Interpolation
  – Find a function that best fits a training set with no noise present: r = f(x)

• Extrapolation
  – Predict the output for an x that is NOT in the training set

• Regression
  – The noise factor must be considered: r = f(x) + ε, or there are hidden variables we cannot observe: r = f(x, z)

Page 17: Regression

Regression

Training set:

$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}, \qquad r^t = f(x^t) + \varepsilon$$

Candidate models:

$$g(x) = w_1 x + w_0 \qquad \text{(linear)}$$

$$g(x) = w_2 x^2 + w_1 x + w_0 \qquad \text{(quadratic)}$$

Empirical error on the training set:

$$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[ r^t - g(x^t) \big]^2$$

$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[ r^t - (w_1 x^t + w_0) \big]^2$$

For a given training set, find the g() that minimizes the empirical error.
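A minimal sketch of minimizing this empirical error for the linear model g(x) = w1·x + w0, using the ordinary least-squares solution; the data are synthetic and only for illustration.

```python
import numpy as np

# Synthetic 1-D training data: r^t = f(x^t) + noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
r = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.shape)

# Least squares minimizes (1/N) * sum_t (r^t - (w1 * x^t + w0))^2.
A = np.column_stack([x, np.ones_like(x)])      # design matrix with columns [x^t, 1]
(w1, w0), *_ = np.linalg.lstsq(A, r, rcond=None)

E = np.mean((r - (w1 * x + w0)) ** 2)          # empirical error E(w1, w0 | X)
print(w1, w0, E)                               # w1 ~ 3, w0 ~ 2 up to noise
```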

Page 18: Underfitting vs Overfitting

Underfitting vs Overfitting

• Underfitting: the hypothesis class H is less complex than the actual model C
  – E.g., using a line to fit data sampled from a 3rd-order polynomial
  – Accuracy increases with more sample data, but the data may not be enough if the hypothesis is too complex

• Overfitting: H is more complex than C
  – Having more training data helps, but only up to a certain point
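A small sketch of both failure modes: noisy samples from a 3rd-order polynomial are fitted with polynomials of degree 1, 3, and 9, and each fit is scored on fresh noise-free points. Everything here is synthetic and illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 15)
r = x**3 - 0.5 * x + rng.normal(scale=0.05, size=x.shape)   # 3rd-order polynomial + noise

x_new = np.linspace(-1, 1, 200)      # "new" inputs to probe generalization
r_new = x_new**3 - 0.5 * x_new       # noise-free ground truth at the new inputs

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, r, degree)            # least-squares polynomial fit
    mse_new = np.mean((np.polyval(coeffs, x_new) - r_new) ** 2)
    print(degree, mse_new)
# Typically: degree 1 underfits (H simpler than C), degree 9 chases the noise,
# and degree 3 matches the complexity of the true model.
```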

Page 19: Triple Trade-Off

Trade-off between three factors:

1. Complexity of the hypothesis class H, c(H): the capacity of the hypothesis class
2. Training set size, N
3. Generalization error, E, on new examples

• As N increases, E decreases.
• As c(H) increases, E first decreases and then increases. (The error of an over-complex hypothesis can be kept in check by increasing the amount of training data, but only up to a point.)

Page 20: Cross-Validation

Cross-Validation

• To estimate generalization error, we need data unseen during training.

• Three types of data in cross-validation (see the sketch below):
  – Training set (50%)
  – Validation set (25%)
  – Test (publication) set (25%)

• Resampling when there is little data
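A minimal sketch of such a 50%/25%/25% split; the dataset size and seed are arbitrary, and the returned index arrays would be used to slice X and r.

```python
import numpy as np

def split_indices(N, seed=0):
    # Shuffle 0..N-1, then cut into 50% training, 25% validation, 25% test indices.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(N)
    n_train, n_val = N // 2, N // 4
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 50 25 25
```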

Page 21: Dimensions of a Supervised Learner: Summary

Dimensions of a Supervised Learner: Summary

1. Model and its parameters θ:

$$g(\mathbf{x} \mid \theta)$$

2. Loss function L(·), measuring the difference between the desired output and the approximation:

$$E(\theta \mid \mathcal{X}) = \sum_{t} L\big(r^t, g(\mathbf{x}^t \mid \theta)\big)$$

3. Optimization procedure (arg min returns the argument that minimizes the error):

$$\theta^* = \arg\min_{\theta} E(\theta \mid \mathcal{X})$$
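As a closing sketch, the three dimensions side by side for a tiny linear model with squared loss, minimized here by plain gradient descent; the data, step size, and iteration count are illustrative choices, not prescribed by the slides.

```python
import numpy as np

# Synthetic training data (for illustration only).
x = np.linspace(0, 1, 30)
r = 2.0 * x + 1.0 + np.random.default_rng(2).normal(scale=0.05, size=x.shape)

def g(x, theta):                  # 1. model g(x | theta) = theta[1]*x + theta[0]
    return theta[1] * x + theta[0]

def E(theta):                     # 2. loss: sum of squared differences
    return np.sum((r - g(x, theta)) ** 2)

theta = np.zeros(2)               # 3. optimization: gradient descent on E(theta | X)
lr = 0.01
for _ in range(2000):
    err = r - g(x, theta)
    grad = np.array([-2.0 * np.sum(err), -2.0 * np.sum(err * x)])
    theta = theta - lr * grad
print(theta, E(theta))            # theta* should approach roughly [1.0, 2.0]
```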