Pattern Recognition and Machine Learning: Introduction
Libao Jin
November 17, 2016

Example: Handwritten Digit Recognition
Training Set: x, used to tune the parameters of an adaptive model.
Target Vector: t, expressing the category of a digit. Note that there is one such target vector t for each digit image x.

The Result of Running the Machine Learning Algorithm
y = y(x), encoded in the same way as the target vectors.
Once the model is trained, it can determine the identity of new digit images, which are said to comprise a test set.
In practical applications, the training data can comprise only a tiny fraction of all possible input vectors, so generalization is a central goal in pattern recognition.
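
Not part of the slides, but a minimal sketch of this train/test setup in Python. The use of scikit-learn's bundled digits data and a logistic-regression classifier is an assumption; the slides refer only to a generic adaptive model.

```python
# A rough stand-in for the digit-recognition setup (data set and model are assumptions).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # input images x and their target categories t
x_train, x_test, t_train, t_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Training: tune the parameters of an adaptive model on the training set.
model = LogisticRegression(max_iter=1000).fit(x_train, t_train)

# Generalization: y = y(x) evaluated on a test set the model has never seen.
print(model.score(x_test, t_test))
```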

Polynomial Curve Fitting
Training Set (blue circles): $\mathbf{x} \equiv (x_1, \ldots, x_N)^T$
Target Vector (green line): $\mathbf{t} \equiv (t_1, \ldots, t_N)^T$
$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$
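
As a concrete reading of this formula, a minimal sketch that evaluates $y(x, \mathbf{w})$; the function name `poly` is purely illustrative.

```python
def poly(x, w):
    """y(x, w) = w_0 + w_1*x + ... + w_M*x**M, with w ordered (w_0, ..., w_M)."""
    return sum(w_j * x**j for j, w_j in enumerate(w))
```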

Sum-of-Squares Error Function
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2$$
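
A minimal sketch of this error function, assuming the coefficient vector is ordered $(w_0, \ldots, w_M)$:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n { y(x_n, w) - t_n }**2, with w ordered (w_0, ..., w_M)."""
    y = P.polyval(x, w)               # evaluate the polynomial at every x_n
    return 0.5 * np.sum((y - t) ** 2)
```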

Minimize Sum-of-Squares Error Function
$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 = \frac{1}{2} \sum_{n=1}^{N} \left( \sum_{j=0}^{M} w_j x_n^j - t_n \right)^2$$
$$\frac{\partial E(\mathbf{w})}{\partial w_j} = \sum_{n=1}^{N} \left( \sum_{k=0}^{M} w_k x_n^k - t_n \right) x_n^j
= \begin{bmatrix} x_1^j & \cdots & x_N^j \end{bmatrix}
\left(
\begin{bmatrix}
x_1^0 & x_1 & \cdots & x_1^M \\
x_2^0 & x_2 & \cdots & x_2^M \\
\vdots & \vdots & \ddots & \vdots \\
x_N^0 & x_N & \cdots & x_N^M
\end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}
-
\begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{bmatrix}
\right)
= \begin{bmatrix} x_1^j & \cdots & x_N^j \end{bmatrix} (X\mathbf{w} - \mathbf{t})$$
Setting $\partial E(\mathbf{w})/\partial w_j = 0$ for all $j = 0, \ldots, M$ gives $X^T(X\mathbf{w} - \mathbf{t}) = 0$, hence $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{t}$.
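
A sketch of this closed-form solution. It uses `numpy.linalg.lstsq`, which solves the same least-squares problem as the normal equations but more stably than inverting $X^T X$ explicitly.

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Least-squares solution w = (X^T X)^{-1} X^T t, with X[n, j] = x_n**j."""
    X = np.vander(x, M + 1, increasing=True)   # the design matrix from the derivation
    # lstsq minimizes ||X w - t||^2, i.e. the sum-of-squares error above.
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w
```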

0th Order Polynomial

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-fitting
Root-Mean-Square (RMS) Error: $E_{\mathrm{RMS}} = \sqrt{2E(\mathbf{w}^*)/N}$
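
A sketch of the RMS error, reusing `sum_of_squares_error` from the earlier sketch:

```python
import numpy as np

def rms_error(w, x, t):
    """E_RMS = sqrt(2 * E(w*) / N), using sum_of_squares_error defined above."""
    return np.sqrt(2.0 * sum_of_squares_error(w, x, t) / len(x))
```

Dividing by N lets us compare data sets of different sizes on an equal footing, and the square root puts $E_{\mathrm{RMS}}$ on the same scale as the target variable t.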

Polynomial Coefficients

Data Set Size: N = 15
9th Order Polynomial

Data Set Size: N = 100
9th Order Polynomial
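
The slides' figures are not reproduced here, but the effect can be recreated with the earlier sketches. The underlying function $\sin(2\pi x)$ follows PRML's running example; the noise level is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(N, noise=0.3):
    """Data in the spirit of PRML: sin(2*pi*x) plus Gaussian noise (noise level assumed)."""
    x = np.linspace(0.0, 1.0, N)
    return x, np.sin(2.0 * np.pi * x) + rng.normal(scale=noise, size=N)

for N in (15, 100):
    x, t = make_data(N)
    w = fit_polynomial(x, t, M=9)   # from the earlier sketch
    print(N, rms_error(w, x, t))    # training error for a 9th-order fit
```

For fixed M = 9, the larger data set constrains the fit, and the severe over-fitting seen at N = 15 diminishes.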

Probability Theory
Suppose that out of N total trials, $n_{ij}$ is the number in which $X = x_i$ and $Y = y_j$, and $c_i = \sum_j n_{ij}$ is the number in which $X = x_i$.
Marginal Probability: $p(X = x_i) = \dfrac{c_i}{N}$.
Joint Probability: $p(X = x_i, Y = y_j) = \dfrac{n_{ij}}{N}$.
Conditional Probability: $p(Y = y_j \mid X = x_i) = \dfrac{n_{ij}}{c_i}$.
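
These definitions translate directly into code; the counts below are hypothetical.

```python
import numpy as np

# Hypothetical counts: n[i, j] = n_ij, rows indexing X = x_i, columns Y = y_j.
n = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 1.0]])
N = n.sum()              # total number of trials
c = n.sum(axis=1)        # c_i: number of trials with X = x_i

p_joint = n / N           # p(X = x_i, Y = y_j)
p_marginal = c / N        # p(X = x_i)
p_cond = n / c[:, None]   # p(Y = y_j | X = x_i)
```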

Probability Theory
Sum Rule: $p(X = x_i) = \dfrac{c_i}{N} = \dfrac{1}{N} \sum_{j=1}^{L} n_{ij} = \sum_{j=1}^{L} p(X = x_i, Y = y_j)$.
Product Rule: $p(X = x_i, Y = y_j) = \dfrac{n_{ij}}{N} = \dfrac{n_{ij}}{c_i} \cdot \dfrac{c_i}{N} = p(Y = y_j \mid X = x_i)\, p(X = x_i)$.
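
Both rules can be checked numerically on the same hypothetical counts:

```python
import numpy as np

n = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 1.0]])
p_joint = n / n.sum()
p_marginal = n.sum(axis=1) / n.sum()
p_cond = n / n.sum(axis=1)[:, None]

assert np.allclose(p_marginal, p_joint.sum(axis=1))        # sum rule
assert np.allclose(p_joint, p_cond * p_marginal[:, None])  # product rule
```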

The Rules of Probability
Sum Rule: $p(X) = \sum_Y p(X, Y)$
Product Rule: $p(X, Y) = p(Y \mid X)\, p(X)$

Bayes’ Theorem
By the product rule, we have
$$p(X, Y) = p(Y, X) \Rightarrow p(Y \mid X)\, p(X) = p(X \mid Y)\, p(Y)$$
$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}, \qquad p(X) = \sum_Y p(X \mid Y)\, p(Y)$$
posterior ∝ likelihood × prior
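
A numerical check of Bayes' theorem on the same hypothetical counts, recovering the posterior $p(Y \mid X)$ from the likelihood and prior:

```python
import numpy as np

n = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 1.0]])
N = n.sum()
p_Y = n.sum(axis=0) / N                # prior p(Y = y_j)
p_X_given_Y = n / n.sum(axis=0)        # likelihood p(X = x_i | Y = y_j)
p_X = (p_X_given_Y * p_Y).sum(axis=1)  # evidence, via the sum rule

posterior = p_X_given_Y * p_Y / p_X[:, None]      # p(Y = y_j | X = x_i)
assert np.allclose(posterior, n / n.sum(axis=1)[:, None])
```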

Probability Density
$$P(z) = \int_{-\infty}^{z} p(x)\, dx \quad \text{(the cumulative distribution function)}$$
$$p(x) \geq 0, \qquad \int_{-\infty}^{\infty} p(x)\, dx = 1$$
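
A numerical sanity check, taking a standard Gaussian as a (hypothetical) choice of density $p(x)$:

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 10001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)   # standard Gaussian density

total = p.sum() * dx    # integral of p over the line; close to 1
P = np.cumsum(p) * dx   # P(z) on the grid; rises from 0 to 1
```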

Expectations
$E[f] = \sum_x p(x) f(x)$ (discrete) $\qquad E[f] = \int p(x) f(x)\, dx$ (continuous)
$E[f \mid y] = \sum_x p(x \mid y) f(x)$ (Conditional Expectation)
$E[f] \approx \dfrac{1}{N} \sum_{n=1}^{N} f(x_n)$ (Approximate Expectation)
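
A sketch of the approximate (Monte Carlo) expectation; the choice $f(x) = x^2$ under a standard Gaussian, where $E[f] = 1$ exactly, is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo approximation of E[f]: average f over samples drawn from p(x).
samples = rng.normal(size=100_000)
approx = np.mean(samples**2)   # tends to the exact value 1 as N grows
```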

Variances and Covariances
$$\mathrm{var}[f] = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - E[f(x)]^2$$
$$\mathrm{cov}[x, y] = E_{x,y}[\{x - E[x]\}\{y - E[y]\}] = E_{x,y}[xy] - E[x]E[y]$$
$$\mathrm{cov}[\mathbf{x}, \mathbf{y}] = E_{\mathbf{x},\mathbf{y}}[\{\mathbf{x} - E[\mathbf{x}]\}\{\mathbf{y}^T - E[\mathbf{y}^T]\}] = E_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^T] - E[\mathbf{x}]E[\mathbf{y}^T]$$
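
These identities can be checked by Monte Carlo; the joint distribution below is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # correlated with x by construction

var_x = np.mean(x**2) - np.mean(x)**2              # var[x] = E[x^2] - E[x]^2
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)  # cov[x, y] = E[xy] - E[x]E[y]; about 0.5 here
```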