of 71

liang-zeng-han
• Category

Documents

• view

227

1

description

semi slide for Christopher M. Bishop 'Pattern Recognition and Machine Learning' chapter.1

Transcript of PRML Chap.1

• .......

PRML Chapter 1 : Introduction

Liang Zeng Han

2013/3/19(Tue)

2013/3/19(Tue)

• Introduction

Pattern Recognition and Machine Learning, Christopher. M. Bishop,2006, Springer-Verlag New York

Support Page :http://research.microsoft.com/en-us/um/people/cmbishop/PRML/

Wikihttp://ibisforest.org/index.php?PRML

: https://github.com/herumi/prml

2013/3/19(Tue)

• Introduction

What is expected to Machine Learning

(ex. face detection) (ex. collaborative ltering)

2013/3/19(Tue)

• Example

Handwritten Digit Recognition

• Introduction

Preprocessing

(feature extraction) : (,PCA) :

2013/3/19(Tue)

• Introduction

Classication of Learning

...1 Supervised learning

classication : regression : ()

...2 Unsupervised learning

clustering density estimation visualiation

...3 Reinforcement learning

2013/3/19(Tue)

• Polynomial Curve Fitting

Polynomial Curve Fitting

2013/3/19(Tue)

• Polynomial Curve Fitting

• Sum-of-Squares Error Function

• 0th Order Polynomial

• 1st Order Polynomial

• 3rd Order Polynomial

• 9th Order Polynomial

• Over-fitting

Root-Mean-Square (RMS) Error:

• Polynomial Curve Fitting

sin(2x) Taylor :

sin(2x) =1Xn=1

(1)n+1 (2)2n1

(2n 1)!x2n1

= 2x (2)3

3!x3 +

(2)5

5!x5 (2)

7

7!x7

' 6:28x 41:34x3 + 81:61x5 76:71x7

M !1

w2n1 ! (1)n+1 (2)2n1

(2n 1)! ; w2n ! 0

?

2013/3/19(Tue)

• Polynomial Coefficients

• Data Set Size:

9th Order Polynomial

• Data Set Size:

9th Order Polynomial

• Regularization

Penalize large coefficient values

• Regularization:

• Regularization:

• Regularization: vs.

• Polynomial Coefficients

• Probability Theory

Probability Theory

2013/3/19(Tue)

• Probability Theory

Apples and Oranges

• Probability Theory

Marginal Probability

Conditional ProbabilityJoint Probability

• Probability Theory

Sum Rule

Product Rule

• The Rules of Probability

Sum Rule

Product Rule

Sum Rule

Product Rule

• Bayes Theorem

posterior v likelihood prior

• Probability Densities

• Transformed Densities

• Expectations

Conditional Expectation(discrete)

Approximate Expectation(discrete and continuous)

• Variances and Covariances

• Probability Theory

Rethink : Bayes Theorem

prior probability : p(w)D likelihood function : p(Djw)w D posterior probability : p(wj D)D

p(wj D) = p(Djw)p(w)p(D)

p(D) =Z

p(Djw)p(w)dw

2013/3/19(Tue)

• The Gaussian Distribution

• Gaussian Mean and Variance

• The Multivariate Gaussian

• Gaussian Parameter Estimation

Likelihood function

• Maximum (Log) Likelihood

• Properties of and

• Curve Fitting Re-visited

• Maximum Likelihood

Determine by minimizing sum-of-squares error, .

• Predictive Distribution

• MAP: A Step towards Bayes

Determine by minimizing regularized sum-of-squares error, .

• Bayesian Curve Fitting

• Bayesian Predictive Distribution

• Model Selection

Model Selection

2013/3/19(Tue)

• Model Selection

Cross-Validation

• Curse of Dimensionality

Curse of Dimensionality

2013/3/19(Tue)

• Curse of Dimensionality

• Curse of Dimensionality

Polynomial curve fitting, M = 3

Gaussian Densities in higher dimensions

• Decision Theory

Decision Theory

2013/3/19(Tue)

• Decision Theory

Inference step

Determine either or .

Decision stepFor given x, determine optimal t.

• Minimum Misclassification Rate

• Minimum Expected Loss

Example: classify medical images as cancer or normal

DecisionT

r

u

t

h

• Minimum Expected Loss

Regions are chosen to minimize

• Reject Option

• Why Separate Inference and Decision?

Minimizing risk (loss matrix may change over time) Reject option Unbalanced class priors Combining models

• Decision Theory for Regression

Inference step

Determine .

Decision stepFor given x, make optimal prediction, y(x), for t.

Loss function:

• The Squared Loss Function

• Generative vs Discriminative

Generative approach:

Model

Use Bayes theorem

Discriminative approach:

Model directly

• Information Theory

Information Theory

2013/3/19(Tue)

• Entropy

Important quantity in coding theory statistical physicsmachine learning

• Entropy

Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?

All states equally likely

• Entropy

• Entropy

In how many ways can N identical objects be allocated Mbins?

Entropy maximized when

• Entropy

• Differential Entropy

Put bins of width along the real line

Differential entropy maximized (for fixed ) when

in which case

• Conditional Entropy

• The Kullback-Leibler Divergence

• Mutual Information

prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_2prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_3-9prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_10-17prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_18-26prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_27-38prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_39prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_40-41prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_42-50prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

prml-slides-1_51-59