Machine vision


Page 1: Machine vision

1

Pattern Recognition

Dr P. N. Suganthan, Room No.: S2-B2a-21

Tel: 67905404, E-mail: [email protected]

Page 2: Machine vision

2

Outline
• Pattern Recognition (weeks 4 - 7)
  - Model-based approach: template matching (covered in week #3)
  - Statistical approaches
      Introduction
      Supervised: minimum distance, minimum risk, k-NN, Fisher's discriminant
      Unsupervised: clustering algorithms
  - Applications
      Biometrics

Page 3: Machine vision

3

Recommended Reference Books

1. R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Addison-Wesley, 2010.

2. R. O. Duda, P. E. Hart and D. G. Stork, "Pattern Classification", 2nd ed., John Wiley & Sons, 2001. [Duda01]

3. S.-T. Bow, "Pattern Recognition and Image Preprocessing", M. Dekker, 2002. (TK7882.P3B784P)

Page 4: Machine vision

4

Overview of image analysis (block diagram): image acquisition → image processing (e.g. segmentation) → feature extraction → representation & description → classification, recognition & interpretation, all supported by a knowledge base. The stages span low-level, intermediate-level and high-level processing.

Page 5: Machine vision

5

Design Cycle

[Duda01]

• Iterative process; no universal approach.

• Training – use the data to determine the optimum classifier, i.e. one providing an optimal decision surface.

• Evaluation of the error, which feeds back into further improvement.

Page 6: Machine vision

6

What is a feature?

A feature is a characteristic primitive or attribute of image patterns (e.g. the pixel value at an image point, a distribution of pixel values within an image region, shape characteristics of patterns, etc.).

A pattern may be characterized by one or more features.

Examples of features used to represent image regions (objects):

area (A): count of the pixels within a region. This gives a measure of the size of the region.

perimeter (P): count of the pixels on the boundary of an image region.

roundness (R): a measure of circularity (R = 1 for a circle):

$$R = \frac{4\pi A}{P^2}$$

A lot more on features in lecture set 1.
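As a concrete illustration, here is a minimal sketch of these region features, assuming a binary mask with foreground pixels equal to 1; the function name and the 4-neighbour boundary test are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def region_features(mask):
    """Area, perimeter and roundness of a binary region (foreground = 1)."""
    mask = mask.astype(bool)
    area = int(mask.sum())                      # A: number of foreground pixels

    # Perimeter: foreground pixels with at least one 4-neighbour outside the region.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())   # P: boundary pixel count

    roundness = 4.0 * np.pi * area / perimeter ** 2   # R = 4*pi*A / P^2
    return area, perimeter, roundness

# A filled disc gives R near 1 (not exactly 1, since P is a pixel count);
# an elongated region gives a much smaller R.
yy, xx = np.mgrid[:64, :64]
disc = (xx - 32) ** 2 + (yy - 32) ** 2 <= 20 ** 2
print(region_features(disc))
```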

Page 7: Machine vision

7

Statistical features:

• Statistical measures commonly used to represent features and the variations of feature values.

• Two basic types of statistical measures are:

First-order statistics – e.g. mean, variance and standard deviation of the feature distribution (a normal distribution is often assumed).

Second-order statistics – features obtained from the co-occurrence matrix.

• By analyzing feature values, shapes/patterns can be classified and recognized.

• Statistical features can be any extractable measurements.
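A small sketch of both kinds of measures on a grey-level image region: first-order statistics of the pixel values, and a basic horizontal grey-level co-occurrence matrix. The number of grey levels, the offset and the normalisation are illustrative choices; co-occurrence variants differ.

```python
import numpy as np

def first_order_stats(region):
    """Mean, variance and standard deviation of the pixel values."""
    vals = region.ravel().astype(float)
    return vals.mean(), vals.var(), vals.std()

def cooccurrence(region, levels=8, dx=1, dy=0):
    """Grey-level co-occurrence matrix for pixel pairs at offset (dy, dx)."""
    q = (region.astype(float) / 256.0 * levels).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[q[y, x], q[y + dy, x + dx]] += 1
    return glcm / glcm.sum()        # normalise to joint probabilities

region = np.random.randint(0, 256, (32, 32))      # stand-in for an image region
print(first_order_stats(region))
print(cooccurrence(region).round(3))
```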

Pages 8 - 20: Machine vision (no text content was extracted for these slides).

Page 21: Machine vision

21

Minimum Distance Classifier (MDC)

Example (beans – red/green; big/small), with four classes ω1, ω2, ω3, ω4 represented by prototypes:

ω1: m1 = (x1(1), x2(1))T
ω2: m2 = (x1(2), x2(2))T
ω3: m3 = (x1(3), x2(3))T
ω4: m4 = (x1(4), x2(4))T

For an unknown pattern x,

||x - m3|| < ||x - m1||,  ||x - m3|| < ||x - m2||,  ||x - m3|| < ||x - m4||

so x is closest to m3; therefore, x belongs to ω3.

[Figure: the (x1, x2) feature space showing the prototypes m1, m2, m3, m4 and the unknown pattern x.]

Types of decision function: (developed on the following slides).

Page 22: Machine vision

22

• Each pattern class is represented by a prototype (the class mean vector):

$$\mathbf{m}_j = \frac{1}{N_j}\sum_{\mathbf{x}\in\omega_j}\mathbf{x}, \qquad j = 1, 2, \ldots, M$$

where mj is the mean vector for class ωj and Nj is the no. of pattern vectors from class ωj.

• Given an unknown pattern vector x, find the closest prototype by finding the minimum Euclidean distance:

$$D_j(\mathbf{x}) = \|\mathbf{x}-\mathbf{m}_j\|, \qquad j = 1, 2, \ldots, M$$

where $\|\mathbf{a}\| = (\mathbf{a}^T\mathbf{a})^{1/2}$ is the Euclidean norm of vector a.

• Assign x ∈ ωi if

$$D_i(\mathbf{x}) < D_j(\mathbf{x}) \quad \text{for all } j = 1, 2, \ldots, M,\; j \neq i.$$

[Figure: prototypes m1, m2, m3, m4 in the feature space.]

Page 23: Machine vision

23

• Recognition is achieved by finding

$$D_i(\mathbf{x}) = \min_j\{D_j(\mathbf{x})\}, \qquad j = 1, 2, \ldots, M$$

• The decision function for the MDC can be defined as

$$d_j(\mathbf{x}) = -D_j(\mathbf{x})$$

• Recognition is then achieved by finding

$$d_i(\mathbf{x}) = \max_j\{d_j(\mathbf{x})\}, \qquad j = 1, 2, \ldots, M$$

• By expanding Dj(x) (in the previous slide) and removing the common terms, dj(x) is simplified to

$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \tfrac{1}{2}\mathbf{m}_j^T\mathbf{m}_j$$

Page 24: Machine vision

24

How to simplify the decision function of the MDC?

$$D_j(\mathbf{x}) = \|\mathbf{x}-\mathbf{m}_j\| = \left[(\mathbf{x}-\mathbf{m}_j)^T(\mathbf{x}-\mathbf{m}_j)\right]^{1/2}$$

Equivalently, define

$$d_j(\mathbf{x}) = -(\mathbf{x}-\mathbf{m}_j)^T(\mathbf{x}-\mathbf{m}_j), \qquad j = 1, \ldots, M$$

Expanding,

$$(\mathbf{x}-\mathbf{m}_j)^T(\mathbf{x}-\mathbf{m}_j) = \mathbf{x}^T\mathbf{x} - \mathbf{m}_j^T\mathbf{x} - \mathbf{x}^T\mathbf{m}_j + \mathbf{m}_j^T\mathbf{m}_j$$

and since $\mathbf{m}_j^T\mathbf{x} = \mathbf{x}^T\mathbf{m}_j$,

$$d_j(\mathbf{x}) = -\left(\mathbf{x}^T\mathbf{x} - 2\mathbf{x}^T\mathbf{m}_j + \mathbf{m}_j^T\mathbf{m}_j\right)$$

$\mathbf{x}^T\mathbf{x}$ is independent of j, i.e. a common term. When $\max_j\{d_j(\mathbf{x})\}$ is sought, $\mathbf{x}^T\mathbf{x}$ can be dropped. Therefore, dividing by 2, we have

$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \tfrac{1}{2}\mathbf{m}_j^T\mathbf{m}_j$$

Page 25: Machine vision

25

Iris versicolor, ω1: m1 = (4.3, 1.3)T
Iris setosa, ω2: m2 = (1.5, 0.3)T

$$d_1(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_1 - \tfrac{1}{2}\mathbf{m}_1^T\mathbf{m}_1 = 4.3x_1 + 1.3x_2 - 10.1$$

$$d_2(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_2 - \tfrac{1}{2}\mathbf{m}_2^T\mathbf{m}_2 = 1.5x_1 + 0.3x_2 - 1.17$$

Rewrite in matrix form:

$$\begin{bmatrix} d_1 \\ d_2 \end{bmatrix} = \begin{bmatrix} 4.3 & 1.3 & -10.1 \\ 1.5 & 0.3 & -1.17 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix}$$

Page 26: Machine vision

26

If a petal has length 2 cm and width 0.5 cm, i.e. x = (2, 0.5)T:

$$\begin{bmatrix} 4.3 & 1.3 & -10.1 \\ 1.5 & 0.3 & -1.17 \end{bmatrix}\begin{bmatrix} 2 \\ 0.5 \\ 1 \end{bmatrix} = \begin{bmatrix} -0.85 \\ 1.98 \end{bmatrix}, \qquad d_2 > d_1$$

Therefore, it is Iris setosa.

If a petal has length 5 cm and width 1.5 cm, i.e. x = (5, 1.5)T:

$$\begin{bmatrix} 4.3 & 1.3 & -10.1 \\ 1.5 & 0.3 & -1.17 \end{bmatrix}\begin{bmatrix} 5 \\ 1.5 \\ 1 \end{bmatrix} = \begin{bmatrix} 13.35 \\ 6.78 \end{bmatrix}, \qquad d_1 > d_2$$

Therefore, it is Iris versicolor.
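To make the computation concrete, a minimal MDC sketch in Python/numpy (array layout and function name are illustrative). It reproduces the decisions above; the values differ from the slide's by at most 0.01 because the slide rounds ½ m1ᵀm1 = 10.09 to 10.1.

```python
import numpy as np

# Class prototypes (mean vectors): omega_1 = versicolor, omega_2 = setosa.
M = np.array([[4.3, 1.3],
              [1.5, 0.3]])
classes = ["Iris versicolor", "Iris setosa"]

def mdc_decision_values(x, M):
    """d_j(x) = x^T m_j - 0.5 * m_j^T m_j for every prototype m_j."""
    return M @ x - 0.5 * np.sum(M * M, axis=1)

for x in [np.array([2.0, 0.5]), np.array([5.0, 1.5])]:
    d = mdc_decision_values(x, M)
    print(x, d.round(2), "->", classes[int(np.argmax(d))])
```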

Page 27: Machine vision

27

Decision boundary

- it is the line that separates two classes.

Property of the decision boundary in the MDC:

the decision boundary is the perpendicular bisector of the line joining the two class prototypes.

At the decision boundary (surface), d1(x) = d2(x)

d12(x) = d1(x) - d2(x) = 2.8x1 + 1.0x2 - 8.9 = 0

For the Iris classification:

x ∈ ω1 (versicolor) if d12(x) > 0;  x ∈ ω2 (setosa) if d12(x) < 0.

Page 28: Machine vision

28

Example:

x = (2, 0.5)T:  d12((2, 0.5)T) = -2.8 < 0,  so x ∈ ω2 (Iris setosa)
x = (5, 1.5)T:  d12((5, 1.5)T) = 6.6 > 0,  so x ∈ ω1 (Iris versicolor)

The decision boundary can therefore be used for recognition.

Definition of decision function:

Let the decision function be a line or surface in the pattern space separating the two classes.

Recognition is achieved by testing the sign of the decision function when presented with a pattern vector.

Page 29: Machine vision

29

Example:

Classify Iris versicolor and Iris setosa by petal length, x1, and petal width, x2.

Page 30: Machine vision

30

Use of Probability Theory in Recognition

• Probability theory is widely used in statistical pattern recognition.

• In particular, Bayes probability theory is used in formulating the optimal classifier (recognizer).

• Feature distributions are used to form a priori probability distributions; they are represented as

$$p(\mathbf{x}\mid\omega_j) \qquad \text{for } j = 1, 2, \ldots, M.$$

• The above is also known as the likelihood function of a class-ωj object (a quantity known a priori, before the test pattern is observed).

Page 31: Machine vision

31

Optimal Statistical Classifier - The Bayes Classifier

Recognition problem: given x, find ωi.

Let the probability of deciding that x belongs to ωi be p(ωi | x) (given x, the probability that it belongs to ωi).

If x should belong to ωk but is wrongly classified to ωj, a loss Lkj is incurred; the conditional average risk (or conditional average loss) of deciding ωj is defined as

$$r_j(\mathbf{x}) = \sum_{k=1}^{M} L_{kj}\, p(\omega_k\mid\mathbf{x})$$

where Lkj is the loss incurred due to misclassification and rj(x) is the conditional average risk.

Page 32: Machine vision

32

• p(ωk | x) is a posterior probability and is generally unknown. From Bayes' rule,

$$p(\omega_k\mid\mathbf{x}) = \frac{p(\mathbf{x}\mid\omega_k)\,p(\omega_k)}{p(\mathbf{x})}$$

so that

$$r_j(\mathbf{x}) = \frac{1}{p(\mathbf{x})}\sum_{k=1}^{M} L_{kj}\, p(\mathbf{x}\mid\omega_k)\,p(\omega_k)$$

Since p(x)^-1 is independent of j and is a common term, rj can be reduced to

$$r_j(\mathbf{x}) = \sum_{k=1}^{M} L_{kj}\, p(\mathbf{x}\mid\omega_k)\,p(\omega_k)$$

p(x | ωk): probability density function of the patterns in each class;
p(ωk): probability of occurrence of each class.

Page 33: Machine vision

33

• p(x | ωk) and p(ωk) are both a priori quantities that can be estimated from the training set.

• Recognition is achieved by

x ∈ ωi if  $r_i(\mathbf{x}) = \min_j\{r_j(\mathbf{x})\}$  for j = 1, 2, ..., M.

• The decision function for the Bayes classifier is

$$d_j(\mathbf{x}) = -\sum_{k=1}^{M} L_{kj}\, p(\mathbf{x}\mid\omega_k)\,p(\omega_k)$$

so that x ∈ ωi if  $d_i(\mathbf{x}) = \max_j\{d_j(\mathbf{x})\}$  for j = 1, 2, ..., M.
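A minimal sketch of this minimum-risk rule, assuming numpy; the loss matrix, the 1-D Gaussian likelihoods, the priors and the test points are all made-up illustrative values, not part of the slides.

```python
import numpy as np

def bayes_min_risk(x, likelihoods, priors, L):
    """Return the class index minimising r_j(x) = sum_k L[k, j] p(x|w_k) p(w_k)."""
    evidence = np.array([p(x) for p in likelihoods]) * priors   # p(x|w_k) p(w_k)
    r = L.T @ evidence                                          # r_j for every j
    return int(np.argmin(r)), r

# Assumed 2-class example: 1-D Gaussian likelihoods, equal priors, 0-1 loss.
def gauss(m, s):
    return lambda x: np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

likelihoods = [gauss(0.0, 1.0), gauss(3.0, 1.0)]   # p(x|w_1), p(x|w_2)
priors = np.array([0.5, 0.5])                      # p(w_1), p(w_2)
L = np.array([[0.0, 1.0],                          # L_kj: row k = true class,
              [1.0, 0.0]])                         #       column j = decision

for x in (0.2, 2.6):
    j, r = bayes_min_risk(x, likelihoods, priors, L)
    print(f"x = {x}: risks = {r.round(4)}, decide w_{j + 1}")
```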

Page 34: Machine vision

34

Example illustrating the concept of Bayes classifier

For a 2-class problem, show how a feature x coming from class ω2 can be correctly classified using the Bayes classifier. Assume that patterns from both classes are equally likely to be seen. For correct classification there is no penalty; for incorrect classification the penalty is Lij = 1, i ≠ j.

From the above, we know: M = 2, p(ω1) = p(ω2), and x ∈ ω2. Also, L11 = L22 = 0, L12 = L21 = 1.

r1(x) = L11 p(x | ω1) p(ω1) + L21 p(x | ω2) p(ω2)
r2(x) = L12 p(x | ω1) p(ω1) + L22 p(x | ω2) p(ω2)

Page 35: Machine vision

35

Since p(ω1) = p(ω2) and L11 = L22 = 0, L12 = L21 = 1 (the common factor p(ω1) = p(ω2) can be dropped):

r1(x) = p(x | ω2),   r2(x) = p(x | ω1)
d1(x) = -p(x | ω2),  d2(x) = -p(x | ω1)

If x ∈ ω2:  p(x | ω2) > p(x | ω1),  so r1 > r2.
If x ∈ ω2:  -p(x | ω2) < -p(x | ω1),  so d1 < d2, and x is (correctly) assigned to ω2.

[Figure: the class-conditional densities p(x | ω1) and p(x | ω2).]

Page 36: Machine vision

36

Bayes classifier with 0-1 loss function.

• In the penalty approach,
  Lii = 0 for a correct decision, i.e. i = j;  Lij = 1 for an incorrect decision, i.e. i ≠ j.

• In the reward approach,
  δii = 1 for a correct decision, i.e. i = j;  δij = 0 for an incorrect decision, i.e. i ≠ j.

Let Lij = 1 - δij, with δij = 1 when i = j and δij = 0 when i ≠ j.

Substituting into the decision function from three slides back (penalty approach):

$$d_j(\mathbf{x}) = -\sum_{k=1}^{M}\left(1-\delta_{kj}\right) p(\mathbf{x}\mid\omega_k)\,p(\omega_k)$$

Page 37: Machine vision

37

The summation can be separated into two parts:

$$d_j(\mathbf{x}) = -\sum_{k=1}^{M} p(\mathbf{x}\mid\omega_k)\,p(\omega_k) + \sum_{k=1}^{M}\delta_{kj}\, p(\mathbf{x}\mid\omega_k)\,p(\omega_k)$$

But

$$p(\mathbf{x}) = \sum_{k=1}^{M} p(\mathbf{x}\mid\omega_k)\,p(\omega_k)$$

so

$$d_j(\mathbf{x}) = -p(\mathbf{x}) + \sum_{k=1}^{M}\delta_{kj}\, p(\mathbf{x}\mid\omega_k)\,p(\omega_k)$$

Since -p(x) is constant (independent of j) and δkj = 0 except when k = j,

$$d_j(\mathbf{x}) = p(\mathbf{x}\mid\omega_j)\,p(\omega_j), \qquad j = 1, 2, \ldots, M$$

and x ∈ ωi if  $d_i(\mathbf{x}) = \max_j\{d_j(\mathbf{x})\}$,  j = 1, 2, ..., M.

Page 38: Machine vision

38

A Bayes Classifier with 0-1 loss

[Block diagram: the input x feeds M blocks that evaluate p(x | ωj), j = 1, 2, ..., M; each output p(x | ωj) is multiplied by the corresponding prior p(ωj); a maximum selector over the M products produces the decision.]

Page 39: Machine vision

39

Bayes classifier with 0-1 loss for Gaussian patterns

A pattern class with a Gaussian distribution of its feature values can be represented by the Gaussian probability density function

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^2\right]$$

where m = E{x}, σ² = E{(x - m)²}, and E{·} denotes the expected value.

Page 40: Machine vision

40

A 2-class example of Gaussian patterns

Suppose only one feature is used to classify two pattern classes. Let x be the feature; the Bayes decision function is

$$d_j(x) = p(x\mid\omega_j)\,p(\omega_j), \qquad j = 1, 2.$$

Therefore,

$$d_j(x) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left[-\frac{1}{2}\left(\frac{x-m_j}{\sigma_j}\right)^2\right] p(\omega_j), \qquad j = 1, 2.$$

Page 41: Machine vision

41

At the decision boundary x0, d1(x0) = d2(x0).

If p(ω2) = p(ω1), then p(x0 | ω1) = p(x0 | ω2).

If p(ω1) > p(ω2), x0 will be shifted to the left.

If p(ω1) < p(ω2), x0 will be shifted to the right.
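A small numerical sketch of locating x0 by solving d1(x0) = d2(x0). The means, the common variance and the priors are made-up values; the closed-form expression holds only for equal variances. Changing the priors moves x0 towards the side of the less probable class, consistent with the shifts described above.

```python
import numpy as np

def gaussian_d(x, m, s, prior):
    """Bayes decision function for a 1-D Gaussian class: d(x) = p(x|w) p(w)."""
    return prior * np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

# Assumed example values: two classes with equal variance.
m1, m2, s = 0.0, 3.0, 1.0

for p1 in (0.5, 0.8, 0.2):                 # p(w1); p(w2) = 1 - p1
    p2 = 1.0 - p1
    # For equal variances, d1(x0) = d2(x0) has the closed-form solution:
    x0 = 0.5 * (m1 + m2) + s**2 * np.log(p1 / p2) / (m2 - m1)
    # Sanity check: the two decision-function values coincide at x0.
    assert np.isclose(gaussian_d(x0, m1, s, p1), gaussian_d(x0, m2, s, p2))
    print(f"p(w1) = {p1:.1f}  ->  decision boundary x0 = {x0:.3f}")
```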

Page 42: Machine vision

42

For n-dimensional feature vectors, the Gaussian distribution is represented by

$$p(\mathbf{x}\mid\omega_j) = \frac{1}{(2\pi)^{n/2}\,|\mathbf{C}_j|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{m}_j)^T\mathbf{C}_j^{-1}(\mathbf{x}-\mathbf{m}_j)\right]$$

where mj is the mean vector of class ωj, Cj is the covariance matrix of class ωj, |Cj| is the determinant of Cj, and n is the dimension of the feature vector.

$$\mathbf{m}_j = E_j\{\mathbf{x}\}, \qquad \mathbf{C}_j = E_j\{(\mathbf{x}-\mathbf{m}_j)(\mathbf{x}-\mathbf{m}_j)^T\}$$

Approximating the expected value E{·} with the sample average:

$$\mathbf{m}_j = \frac{1}{N_j}\sum_{\mathbf{x}\in\omega_j}\mathbf{x}, \qquad \mathbf{C}_j = \frac{1}{N_j}\sum_{\mathbf{x}\in\omega_j}\mathbf{x}\mathbf{x}^T - \mathbf{m}_j\mathbf{m}_j^T$$

Page 43: Machine vision

43

The Bayes decision function for Gaussian patterns is:

$$d_j(\mathbf{x}) = p(\mathbf{x}\mid\omega_j)\,p(\omega_j) = \frac{1}{(2\pi)^{n/2}\,|\mathbf{C}_j|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{m}_j)^T\mathbf{C}_j^{-1}(\mathbf{x}-\mathbf{m}_j)\right] p(\omega_j)$$

Taking the (monotonic) natural logarithm, d'j(x) = ln dj(x):

$$d_j'(\mathbf{x}) = -\frac{n}{2}\ln 2\pi - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_j)^T\mathbf{C}_j^{-1}(\mathbf{x}-\mathbf{m}_j) + \ln p(\omega_j)$$

Removing the constant term, dj(x) can be redefined as

$$d_j'(\mathbf{x}) = \ln p(\omega_j) - \frac{1}{2}\ln|\mathbf{C}_j| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_j)^T\mathbf{C}_j^{-1}(\mathbf{x}-\mathbf{m}_j)$$

This is the general form of the Bayes classifier for Gaussian pattern classes with a 0-1 loss function.
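A compact sketch of this classifier: estimate mj and Cj from labelled samples (using the averages on the previous slide) and evaluate d'j(x) for each class. The class name, the toy data and the small ridge term added to Cj (for numerical safety only) are my additions, not part of the slide's formulation.

```python
import numpy as np

class GaussianBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.params = []
        for c in self.classes:
            Xc = X[y == c]
            m = Xc.mean(axis=0)                              # m_j
            C = (Xc.T @ Xc) / len(Xc) - np.outer(m, m)       # C_j = E{x x^T} - m m^T
            C += 1e-6 * np.eye(X.shape[1])                   # numerical safeguard only
            prior = len(Xc) / len(X)                         # p(w_j)
            self.params.append((m, np.linalg.inv(C),
                                np.log(prior) - 0.5 * np.log(np.linalg.det(C))))
        return self

    def decision_values(self, x):
        """d'_j(x) = ln p(w_j) - 0.5 ln|C_j| - 0.5 (x-m_j)^T C_j^{-1} (x-m_j)."""
        return np.array([const - 0.5 * (x - m) @ Cinv @ (x - m)
                         for m, Cinv, const in self.params])

    def predict(self, x):
        return self.classes[int(np.argmax(self.decision_values(x)))]

# Assumed toy data: two 2-D Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)), rng.normal([3, 3], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
clf = GaussianBayes().fit(X, y)
print(clf.predict(np.array([0.5, 0.2])), clf.predict(np.array([2.8, 3.1])))
```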

Page 44: Machine vision

44

The covariance matrix Cj

Cj is a symmetric square matrix of size n × n:

$$\mathbf{C}_j = E_j\{(\mathbf{x}-\mathbf{m}_j)(\mathbf{x}-\mathbf{m}_j)^T\} = E_j\left\{\begin{bmatrix} x_1-m_1 \\ x_2-m_2 \\ \vdots \\ x_n-m_n \end{bmatrix}\begin{bmatrix} x_1-m_1 & x_2-m_2 & \cdots & x_n-m_n \end{bmatrix}\right\}$$

Page 45: Machine vision

45

$$\mathbf{C}_j = \begin{bmatrix}
E\{(x_1-m_1)(x_1-m_1)\} & E\{(x_1-m_1)(x_2-m_2)\} & \cdots & E\{(x_1-m_1)(x_n-m_n)\} \\
E\{(x_2-m_2)(x_1-m_1)\} & E\{(x_2-m_2)(x_2-m_2)\} & \cdots & \vdots \\
\vdots & \vdots & \ddots & \vdots \\
E\{(x_n-m_n)(x_1-m_1)\} & \cdots & \cdots & E\{(x_n-m_n)(x_n-m_n)\}
\end{bmatrix}_j
= \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & & \vdots \\
\vdots & & \ddots & \vdots \\
c_{n1} & \cdots & \cdots & c_{nn}
\end{bmatrix}_j$$

• cii is the variance of feature xi.
• cij is the covariance of features xi and xj, i ≠ j.

Page 46: Machine vision

46

• The covariance matrix can be estimated by expanding the original definition and simplifying, using averages:

$$\mathbf{C}_j = E_j\{(\mathbf{x}-\mathbf{m}_j)(\mathbf{x}-\mathbf{m}_j)^T\} \approx \frac{1}{N_j}\sum_{\mathbf{x}\in\omega_j}\mathbf{x}\mathbf{x}^T - \mathbf{m}_j\mathbf{m}_j^T$$

• For features that are statistically independent (uncorrelated), Cj is a diagonal matrix (cij = 0 for i ≠ j).

• If all covariance matrices are equal to the identity matrix and p(ωj) = 1/M for all j = 1, 2, ..., M, then the Bayes decision function reduces to

$$d_j(\mathbf{x}) = \mathbf{x}^T\mathbf{m}_j - \tfrac{1}{2}\mathbf{m}_j^T\mathbf{m}_j$$

i.e. the MDC (Minimum Distance Classifier).

Thus, the MDC is optimal in the Bayes sense if:
1. the pattern classes are Gaussian,
2. all covariance matrices are equal to the identity matrix, and
3. all classes are equally likely to occur.

Page 47: Machine vision

47

Special cases of the covariance matrix (2D examples):

(a) assume m = 0, C = I: the equal-probability contours are circles centred at (0,0) in the (x1, x2) plane.

(b) assume m = [1, 0]T with a diagonal covariance matrix, c22 > c11: the contours are axis-aligned ellipses centred at (1, 0), elongated along x2.

$$\mathbf{C} = \begin{bmatrix} c_{11} & 0 \\ 0 & c_{22} \end{bmatrix}$$

Page 48: Machine vision

48

(c) assume m = [0, 1]T with a full covariance matrix

$$\mathbf{C} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

the equal-probability contours are ellipses centred at (0, 1) whose axes are rotated relative to the x1, x2 axes.

Page 49: Machine vision

Random Vector Functional Link Network (RVFL)

EE6222

Page 50: Machine vision

Outline

Introduction
  SLFN: Single Layer Feedforward Network
  From error back-propagation (back-prop) to randomized learning
Notations of RVFL
Learning process in RVFL
Theoretical part of RVFL
Classification ability of RVFL
Property of RVFL variants


Page 52: Machine vision

Structure of an SLFN (Single Layer Feedforward Network)

[Figure: structure of the SLFN – input layer, one hidden layer with non-linear units, output layer.]

Details

• Bias: x × β + b = [x, 1] × β, i.e. extending the input features by adding an additional feature which equals 1 (so the bias is absorbed into the weights).

• For C classes, we can have C 0-1 output neurons. When we present an input observation belonging to the c-th class, we expect the c-th neuron to give an output near one while all the others give outputs near zero. Circles in the hidden and output layers have non-linear activation functions; the inputs to the activation functions are weighted sums.

Page 53: Machine vision

Learning with neural network

Learning the mapping ti = f(xi), where i is the index of the data. The pipeline can be formalized as follows:

• Input: the input consists of a set of N data pairs [xi, yi], where xi is the feature vector and yi is the actual target given with the dataset. We refer to this data as the training set, used to learn the network parameters, etc.

• Learning: use the training set to learn the network structure and parameters. We refer to this step as training a classifier.

• Classification: in the end, we evaluate the quality of the classifier by asking it to predict the target t for an input that it has never seen before. We then compare the true target y of the data to the t predicted by the model. The prediction t is selected by a max operator among the C outputs.

Page 54: Machine vision

Approximation ability of neural network

A theoretical proof of the approximation ability of the standard multilayer feedforward network can be found in¹. Under the following conditions:

• arbitrary squashing functions;

• sufficiently many hidden units are available;

a standard multilayer feedforward network can approximate virtually any function to any desired degree of accuracy.

¹ Kurt Hornik, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators". In: Neural Networks 2.5 (1989), pp. 359–366.

Page 55: Machine vision

Why randomized neural network

Some weaknesses of back-prop

• The error surface can have multiple local minima.

• Slow convergence due to iteratively adjusting the weights on every link. These weights are first randomly generated.

• Sensitivity to the learning rate setting.

Alternatives to back-prop

• In², the authors show that the parameters in the hidden layer can be randomly and properly generated without learning.

• In³, the parameters in the enhancement nodes can be randomly generated within a proper range.

² Wouter F. Schmidt, Martin Kraaijveld, Robert P. W. Duin, et al. "Feedforward neural networks with random weights". In: 1992 11th IAPR International Conference on Pattern Recognition. IEEE, 1992, pp. 1–4.

³ Yoh-Han Pao, Gwang-Hoon Park, and Dejan J. Sobajic. "Learning and generalization characteristics of the random vector functional-link net". In: Neurocomputing 6.2 (1994), pp. 163–180.


Page 57: Machine vision

Structure of RVFL⁴

[Illustration: the RVFL network, with direct links from the input layer to the output layer and randomly parameterized enhancement (hidden) nodes.]

Details

• The parameters β (indicated in blue) in the enhancement nodes are randomly generated in a proper range and kept fixed.

• The original features (from the input layer) are concatenated with the enhanced features (from the hidden layer) and referred to as d.

• Learning aims at solving ti = di ∗ W, i = 1, 2, ..., N, where N is the number of training samples, W (indicated in red and gray) are the weights in the output layer, and ti, di represent the output values and the combined features, respectively.

⁴ Yoh-Han Pao, Gwang-Hoon Park, and Dejan J. Sobajic. "Learning and generalization characteristics of the random vector functional-link net". In: Neurocomputing 6.2 (1994), pp. 163–180.


Page 59: Machine vision

Details of RVFL

Notations

• X = [x1, x2, ..., xN]′: input data (N samples and m features).

• β = [β1, β2, ..., βm]′: the weights for the enhancement nodes (m × k, where k is the number of enhancement nodes).

• b = [b1, b2, ..., bk]: the biases for the enhancement nodes.

• H = h(X ∗ β + ones(N, 1) ∗ b): feature matrix after the enhancement nodes, where h is the activation function. There is no activation in the output nodes.

Commonly used activation functions in the hidden layer (s is the input, y is the output):

  sigmoid   y = 1 / (1 + e^(-s))
  sine      y = sin(s)
  hardlim   y = (sign(s) + 1) / 2
  tribas    y = max(1 - |s|, 0)
  radbas    y = exp(-s²)
  sign      y = sign(s)
  relu      y = max(0, s)
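The same activation functions written out in Python (a direct transcription of the table; numpy assumed):

```python
import numpy as np

activations = {
    "sigmoid": lambda s: 1.0 / (1.0 + np.exp(-s)),
    "sine":    lambda s: np.sin(s),
    "hardlim": lambda s: (np.sign(s) + 1.0) / 2.0,
    "tribas":  lambda s: np.maximum(1.0 - np.abs(s), 0.0),
    "radbas":  lambda s: np.exp(-s ** 2),
    "sign":    lambda s: np.sign(s),
    "relu":    lambda s: np.maximum(0.0, s),
}

s = np.linspace(-2.0, 2.0, 5)
for name, h in activations.items():
    print(f"{name:8s}", h(s).round(3))
```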

Page 60: Machine vision

Details of RVFL

Notations

• The enhanced-feature matrix is

$$\mathbf{H} = \begin{bmatrix} h(\mathbf{x}_1) \\ h(\mathbf{x}_2) \\ \vdots \\ h(\mathbf{x}_N) \end{bmatrix} = \begin{bmatrix} h_1(\mathbf{x}_1) & \cdots & h_k(\mathbf{x}_1) \\ h_1(\mathbf{x}_2) & \cdots & h_k(\mathbf{x}_2) \\ \vdots & \ddots & \vdots \\ h_1(\mathbf{x}_N) & \cdots & h_k(\mathbf{x}_N) \end{bmatrix}$$

• The combined feature matrix concatenates the enhanced and the original features:

$$\mathbf{D} = [\mathbf{H}, \mathbf{X}] = \begin{bmatrix} d(\mathbf{x}_1) \\ d(\mathbf{x}_2) \\ \vdots \\ d(\mathbf{x}_N) \end{bmatrix} = \begin{bmatrix} h_1(\mathbf{x}_1) & \cdots & h_k(\mathbf{x}_1) & x_{11} & \cdots & x_{1m} \\ h_1(\mathbf{x}_2) & \cdots & h_k(\mathbf{x}_2) & x_{21} & \cdots & x_{2m} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ h_1(\mathbf{x}_N) & \cdots & h_k(\mathbf{x}_N) & x_{N1} & \cdots & x_{Nm} \end{bmatrix}$$

• The target y can be the class labels for a C-class classification problem (1, 2, ..., C).

• For classification, y is usually transformed into a 0-1 coding of the class label. For example, if we have 3 classes,

$$\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

i.e. we have 3 output neurons; the 1st neuron goes to 1 for the 1st class while the others stay at zero, and so on.


Page 62: Machine vision

Learning

Define the problem

Once the random parameters β and b are generated in a proper range, the learning of the RVFL aims at solving the following problem: ti = di ∗ W, i = 1, 2, ..., N, where N is the number of training samples, W are the weights in the output layer, and ti, di represent the output and the combined features.

Solutions

• Closed form:

  - In primal space:  W = (λI + DᵀD)⁻¹DᵀY    (1)

  - In dual space:    W = Dᵀ(λI + DDᵀ)⁻¹Y    (2)

  where λ is the regularization parameter; as λ → 0, the method converges to the Moore–Penrose pseudoinverse solution. Y = [y1, y2, ..., yN]′.

• Iterative methods: tune the weights based on the gradient of the error function at each step. In the assignment, you will use the equations above.
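A minimal RVFL sketch following the notation above: β and b are drawn once and kept fixed, D = [H, X] concatenates the enhanced and original features, and the output weights W come from Eq. (1) (primal form). The uniform range [-1, 1] for the random parameters, the sigmoid activation, the class/function names and the toy data are assumptions for illustration, not prescribed by the slides.

```python
import numpy as np

class RVFL:
    def __init__(self, n_enhancement=100, lam=0.1, seed=0):
        self.k, self.lam = n_enhancement, lam
        self.rng = np.random.default_rng(seed)

    def _enhanced(self, X):
        # H = h(X beta + b), with h chosen here as the sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.beta + self.b)))

    def fit(self, X, y):
        n, m = X.shape
        C = int(y.max()) + 1
        Y = np.eye(C)[y]                                   # 0-1 coding of the labels
        self.beta = self.rng.uniform(-1, 1, (m, self.k))   # random, then kept fixed
        self.b = self.rng.uniform(-1, 1, self.k)
        D = np.hstack([self._enhanced(X), X])              # D = [H, X] (direct links)
        # Primal-form ridge solution: W = (lam*I + D^T D)^{-1} D^T Y   (Eq. 1)
        self.W = np.linalg.solve(self.lam * np.eye(D.shape[1]) + D.T @ D, D.T @ Y)
        return self

    def predict(self, X):
        D = np.hstack([self._enhanced(X), X])
        return np.argmax(D @ self.W, axis=1)               # max selector over C outputs

# Assumed toy problem: two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = RVFL(n_enhancement=50, lam=0.1).fit(X, y)
print("training accuracy:", (model.predict(X) == y).mean())
```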


Page 64: Machine vision

Approximation ability of RVFL

Theoretical justification for the RVFL is presented in⁵.

Method

• It formulates a limit-integral representation of the function to be approximated.

• It evaluates the integral with the Monte Carlo method.

Conclusion

• The RVFL is a universal approximator for continuous functions on bounded finite-dimensional sets.

• The RVFL is an efficient universal approximator, with the approximation error converging to zero at rate O(C/√n), where n is the number of basis functions and C is independent of n.

⁵ Boris Igelnik and Yoh-Han Pao. "Stochastic choice of basis functions in adaptive function approximation and the functional-link net". In: IEEE Transactions on Neural Networks 6.6 (1995), pp. 1320–1329.


Page 66: Machine vision

Classification ability of RVFL

Evaluation

The following issues can be investigated by using the 121 UCI datasets, as done by⁶:

• Effect of the direct links from the input layer to the output layer.

• Effect of the bias in the output neuron.

• Effect of scaling the random features before feeding them into the activation function.

• Performance of 6 commonly used activation functions: sigmoid, radbas, sine, sign, hardlim, tribas.

• Performance of the Moore–Penrose pseudoinverse and of ridge regression (or regularized least-squares solutions) for the computation of the output weights.

⁶ Manuel Fernandez-Delgado et al. "Do we need hundreds of classifiers to solve real world classification problems?" In: The Journal of Machine Learning Research 15.1 (2014), pp. 3133–3181.

Page 67: Machine vision

67

k-Nearest Neighbour

• We are concerned with the following problem: we wish to label some observed pattern x with some class category ω.

• Two possible situations with respect to x and ω may occur:

  - We may have complete statistical knowledge of the distribution of the observation x and category ω. In this case, a standard Bayes analysis yields an optimal decision procedure.

  - We may have no knowledge of the distribution of the observation x and category ω aside from that provided by pre-classified samples. In this case, a decision to classify x into a category ω will depend only on a collection of correctly classified samples.

• The nearest neighbour rule is concerned with the latter case. Such problems are classified in the domain of non-parametric statistics. No optimal classification procedure exists with respect to all underlying statistics under such conditions.

Page 68: Machine vision

68

The NN Rule

• Recall that the only knowledge we have is a set of n points xi correctly classified into categories: (xi, ωi). By intuition, it is reasonable to assume that observations that are close together -- for some appropriate metric -- will have the same classification.

• Thus, when classifying an unknown sample x, it seems appropriate to weight the evidence of the nearby xi's heavily.

• One simple non-parametric decision procedure of this form is the nearest neighbour rule or NN-rule. This rule classifies x in the category of its nearest neighbour. More precisely, xm ∈ {x1, x2, x3, ..., xn} is a nearest neighbour to x if

  min_i d(xi, x) = d(xm, x),  where i = 1, 2, ..., n.
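A minimal sketch of the NN / k-NN rule with the Euclidean metric; k = 1 gives the nearest neighbour rule above, and majority voting over the k nearest samples is the usual extension. The function name and the toy data are illustrative.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=1):
    """Assign x to the majority class among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)       # d(x_i, x) for all i
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Assumed toy data: two labelled point clouds.
X_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.8, 0.4]), X_train, y_train, k=1))   # -> 0
print(knn_classify(np.array([5.2, 5.5]), X_train, y_train, k=3))   # -> 1
```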

Pages 69 - 72: Machine vision (no text content was extracted for these slides).

Page 73: Machine vision

73

Fisher’s Linear Discriminant Analysis

Pages 74 - 75: Machine vision (no text content was extracted for these slides).

Page 76: Machine vision

76

The mean of class i (with ni samples in the set Hi):

$$\mathbf{m}_i = \frac{1}{n_i}\sum_{\mathbf{x}\in H_i}\mathbf{x}$$

Pages 77 - 86: Machine vision (no text content was extracted for these slides).

Page 87: Machine vision

87

Also refer to the book by Robert Schalkoff, "Pattern Recognition ...", page 89 onwards, for more details on LDA. (Q327.S297)
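As a compact companion to this section, a minimal two-class Fisher discriminant sketch, assuming the standard formulation w ∝ S_W⁻¹(m1 - m2) with S_W the within-class scatter matrix; the midpoint threshold on the projected means and the toy data are simple illustrative choices, not taken from the slides.

```python
import numpy as np

def fisher_lda_2class(X1, X2):
    """Fisher direction w = S_W^{-1} (m1 - m2) and a midpoint threshold."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)                 # class means
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter
    w = np.linalg.solve(S_W, m1 - m2)                         # discriminant direction
    threshold = 0.5 * (w @ m1 + w @ m2)                       # simple midpoint threshold
    return w, threshold

def classify(x, w, threshold):
    return 1 if w @ x > threshold else 2                      # class 1 projects higher

# Assumed toy data: two 2-D Gaussian classes.
rng = np.random.default_rng(2)
X1 = rng.normal([0, 0], 1.0, (50, 2))
X2 = rng.normal([3, 2], 1.0, (50, 2))
w, t = fisher_lda_2class(X1, X2)
print(w, classify(np.array([0.2, -0.1]), w, t), classify(np.array([2.9, 2.2]), w, t))
```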

Page 88: Machine vision

88

Supervised Training vs Unsupervised Training

Supervised training
• A supervisor teaches the system how to classify a known set of patterns with class labels before using it.
• Needs a priori class label information.

Unsupervised training
• Does not depend on a priori class label information.
• Used when a priori class label information is not available or proper training sets are not available.

Page 89: Machine vision

89

Unsupervised training by clustering

To seek regions such that every pattern falls into one of these regions and no pattern falls in two or more regions, i.e.

$$S = S_1 \cup S_2 \cup S_3 \cup \ldots \cup S_K; \qquad S_i \cap S_j = \varnothing,\; i \neq j$$

* Algorithms classify objects into clusters by natural association according to some similarity measure, e.g. Euclidean distance.
* Clustering algorithms are based either on heuristics or on optimization principles.

Page 90: Machine vision

90

Clustering with a known number of classes

Requires knowledge of the number of classes, e.g. K classes (or clusters). The K-means algorithm is based on the minimization of the sum of squared distances

$$J_j = \sum_{\mathbf{x}\in S_j(k)} \|\mathbf{x}-\mathbf{z}_j\|^2, \qquad j = 1, 2, \ldots, K$$

where Sj(k) is the cluster domain for cluster centre zj at the k-th iteration.

Page 91: Machine vision

91

K-Means Algorithm

1. Arbitrarily choose K samples as the initial cluster centres z1(k), z2(k), z3(k), ..., zK(k); iteration k = 1.

2. At the k-th iterative step, distribute the pattern samples x among the K clusters according to the following rule:

$$\mathbf{x}\in S_j(k) \quad \text{if} \quad \|\mathbf{x}-\mathbf{z}_j(k)\| < \|\mathbf{x}-\mathbf{z}_i(k)\| \qquad \text{for } i = 1, 2, \ldots, K;\; i \neq j.$$

Page 92: Machine vision

92

3. Compute zj(k+1) and Jj for j = 1, 2, ..., K:

$$\mathbf{z}_j(k+1) = \frac{1}{N_j}\sum_{\mathbf{x}\in S_j(k)}\mathbf{x}; \qquad J_j = \sum_{\mathbf{x}\in S_j(k)} \|\mathbf{x}-\mathbf{z}_j(k+1)\|^2$$

4. If zj(k+1) = zj(k) for j = 1, 2, ..., K, stop. Otherwise go to step 2.
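A direct sketch of steps 1-4 in Python/numpy. Taking the first K samples as the initial centres and assuming no cluster becomes empty are my simplifications; the data in the usage example is taken from the worked example a few slides below, and the final centres agree with the ones computed there.

```python
import numpy as np

def k_means(X, K, max_iter=100):
    """K-means as in steps 1-4: assign to nearest centre, recompute, repeat."""
    z = X[:K].copy()                                   # step 1: initial centres
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest cluster centre.
        d = np.linalg.norm(X[:, None, :] - z[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)
        # Step 3: recompute the centres and the per-cluster squared error J_j
        # (assumes every cluster keeps at least one sample).
        new_z = np.array([X[labels == j].mean(axis=0) for j in range(K)])
        J = np.array([np.sum((X[labels == j] - new_z[j]) ** 2) for j in range(K)])
        # Step 4: stop when the centres no longer change.
        if np.allclose(new_z, z):
            break
        z = new_z
    return z, labels, J

# The 20-point example from the following slides, with z1(1) = x1, z2(1) = x2.
X = np.array([[0,0],[0,1],[1,0],[1,1],[2,1],[1,2],[2,2],[3,2],
              [6,6],[7,6],[8,6],[6,7],[7,7],[8,7],[9,7],[7,8],
              [8,8],[9,8],[8,9],[9,9]], dtype=float)
z, labels, J = k_means(X, K=2)
print(z)   # approx. [[1.25, 1.13], [7.67, 7.33]], matching the worked example
```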

Page 93: Machine vision

93

Advantages :

• simple and efficient.

• minimum computations are required

Drawbacks:
• K must be decided in advance.
• The individual clusters themselves should be tight, and the clusters should be widely separated from one another.
• Results usually depend on the order of presentation of the training samples and on the initial cluster centres.

Page 94: Machine vision

94

Example

Apply the K-means algorithm to the following data for K = 2:

x1 = (0,0), x2 = (0,1), x3 = (1,0), x4 = (1,1), x5 = (2,1), x6 = (1,2), x7 = (2,2), x8 = (3,2), x9 = (6,6), x10 = (7,6), x11 = (8,6), x12 = (6,7), x13 = (7,7), x14 = (8,7), x15 = (9,7), x16 = (7,8), x17 = (8,8), x18 = (9,8), x19 = (8,9), x20 = (9,9)

Page 95: Machine vision

95

[Scatter plot of the 20 samples in the (x1, x2) plane, showing two well-separated groups.]

Homework – make use of this dataset with class labels to try out Fisher's discriminant method.

Page 96: Machine vision

96

Example

Iteration 1:

Step 1. Let K = 2, z1(1) = x1 = (0,0)T, z2(1) = x2 = (0,1)T. (These starting points are selected randomly.)

Step 2. Calculate the distances of all patterns from z1 and z2. The patterns are assigned to S1(1) and S2(1) as
S1(1) = {x1, x3};  S2(1) = {x2, x4, x5, ..., x20}

Page 97: Machine vision

97

Step 3. Update the cluster centres:
z1(2) = (0.5, 0.0)T;  z2(2) = (5.67, 5.33)T

Step 4. zj(1) and zj(2) are different, so go back to step 2.

Iteration 2:

Step 2. Calculate the distances of all patterns from z1 and z2. The patterns are assigned to S1(2) and S2(2) as
S1(2) = {x1, x2, ..., x8};  S2(2) = {x9, x10, ..., x20}

Page 98: Machine vision

98

Step 3. Update the cluster centres:
z1(3) = (1.25, 1.13)T;  z2(3) = (7.67, 7.33)T

Step 4. zj(3) and zj(2) are different, so go to step 2.

Iteration 3:

Step 2. Calculate the distances of all patterns from z1 and z2. No change in S1 and S2.

Page 99: Machine vision

99

Step 3. Update the cluster centres:
z1(4) = (1.25, 1.13)T;  z2(4) = (7.67, 7.33)T

Step 4. zj(3) and zj(4) are the same, so stop.

Final centres:
z1 = (1.25, 1.13)T;  z2 = (7.67, 7.33)T

Page 100: Machine vision

100

Batchelor and Wilkins' algorithm

This algorithm forms clusters without requiring the specification of the number of clusters, K. Instead, it needs one parameter to be specified, namely the proportion p.

The algorithm:

1. Arbitrarily take the first sample as the first cluster centre, i.e. z1 = x1.

2. Choose the pattern sample farthest from z1 to be z2.

Page 101: Machine vision

101

3. Compute distance from each remaining pattern sample to z1 and z2.

4. Save the minimum distance for each pair of these computations.

5. Select the maximum of these minimum distances.

6. If this maximum distance is greater than a fraction p of d(z1, z2), assign the corresponding sample to a new cluster centre z3. Otherwise, stop.

Page 102: Machine vision

102

7. Compute distance from each remaining pattern sample to each of the 3 clusters and save the minimum distance of every group of 3 distances.

8. Then select the maximum of these minimum distances.

9. If the maximum distance is greater than a fraction p of the typical previous maximum distance, assign the corresponding sample to a new cluster centre z4. Otherwise, stop.

Page 103: Machine vision

103

10. Repeat steps 7 to 9 until no new cluster is formed.

11. Assign each sample to its nearest cluster centre.

*********************
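A sketch of steps 1-11 in Python/numpy. Step 9's "typical previous maximum distance" is interpreted here as the average of the maximum distances that created the existing centres; that reading reproduces the example on the next slide with p = 0.5, but other interpretations are possible, and the function name and comments are my own.

```python
import numpy as np

def batchelor_wilkins(X, p=0.5):
    """Heuristic clustering: grow centres until no sample is far enough away."""
    z = [X[0]]                                            # step 1: z1 = x1
    d0 = np.linalg.norm(X - z[0], axis=1)
    z.append(X[np.argmax(d0)])                            # step 2: farthest sample
    max_dists = [d0.max()]                                # distances that created centres
    while True:
        # Steps 3-5 / 7-8: minimum distance of every sample to the current centres,
        # then the maximum of these minima.
        dmin = np.min(np.linalg.norm(X[:, None, :] - np.array(z)[None], axis=2), axis=1)
        cand = int(np.argmax(dmin))
        # Steps 6 / 9: new centre only if the candidate is far enough away;
        # "typical previous maximum distance" read as the mean of max_dists.
        if dmin[cand] > p * np.mean(max_dists):
            max_dists.append(dmin[cand])
            z.append(X[cand])
        else:
            break
    # Step 11: assign every sample to its nearest centre.
    labels = np.argmin(np.linalg.norm(X[:, None, :] - np.array(z)[None], axis=2), axis=1)
    return np.array(z), labels

X = np.array([[0,0],[3,8],[2,2],[1,1],[5,3],[4,8],[6,3],[5,4],[6,4],[7,5]], dtype=float)
centres, labels = batchelor_wilkins(X, p=0.5)
print(centres)   # (0,0), (4,8), (6,3) for this data
print(labels)    # reproduces S1 = {x1,x3,x4}, S2 = {x2,x6}, S3 = {x5,x7,...,x10}
```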

Page 104: Machine vision

104

Example

Apply the Batchelor and Wilkins algorithm to find the cluster groups for the following samples:

x1 = (0,0), x2 = (3,8), x3 = (2,2), x4 = (1,1),

x5 = (5,3), x6 = (4,8), x7 = (6,3), x8 = (5,4),

x9 = (6,4), x10 = (7,5)

By using the algorithm, the result will be

S1 = {x1, x3, x4};  S2 = {x2, x6};  S3 = {x5, x7, x8, x9, x10}

Page 105: Machine vision

105

[Scatter plot of the 10 samples in the (x1, x2) plane.]

Q: Apply the Batchelor and Wilkins algorithm to the above training data. Select the proportion parameter as 0.5. Start with x1 = (0,0).

Q : What effect do you think the proportion parameter has on the final clustering ?

Page 106: Machine vision

106

The advantages of this heuristic algorithm are as follows:

i) It determines K,

ii) it self-organises, and

iii) it is efficient for small-to-medium numbers of classes.

Page 107: Machine vision

107

The disadvantages are :

i) the proportion p must be given,

ii) the ordering of the samples influences the clustering,

iii) the value of p influences the clustering,

iv) the separation is linear,

v) the cluster centers are not recomputed to better represent the classes,

vi) there is no measure of how good the clustering is,

vii) an outlier can upset the whole process.

Page 108: Machine vision

108

In practice, it is suggested that the process be run for several values of p. For each value, compute the mean-squared error of the distances of the samples from their centre for each cluster k (k = 1, 2, ..., K) and sum over all K clusters to obtain the total mean-squared error. The value of p that provides the lowest total mean-squared error is an optimal one. The final clusters should then be averaged to obtain new centres, and all sample vectors reassigned via minimum distance to the new centres.