Digital Image Processing: Lectures 27 & 28
Transcript of Digital Image Processing Lectures 27 & 28
Topics: Segmentation & Feature Extraction, Feature Selection, Pattern Classification, Unsupervised Cluster Discovery
M.R. Azimi, Professor
Department of Electrical and Computer Engineering, Colorado State University
M.R. Azimi Digital Image Processing
Area 5: Segmentation & Feature Extraction
Segmentation: Detect and isolate objects of interest (targets) from the background.
Feature Extraction: Extract salient features of the detected objects for the purpose of classification and recognition.
Figure 1: Block Diagram of a Pattern Classification System.
Segmentation can be done using one of the following classes of approaches:
Histogram-Based
Template Matching
Region Growing
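As a concrete sketch of the histogram-based class, the following NumPy-only function picks a global threshold by maximizing the between-class variance (Otsu's method, a standard histogram-based technique; the function name and implementation are ours, not from the lecture):

```python
import numpy as np

def otsu_threshold(img):
    """Histogram-based segmentation: choose the threshold that maximizes
    the between-class variance of the gray-level histogram, then binarize."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()                    # gray-level probabilities
    omega = np.cumsum(p)                     # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))       # cumulative first moment
    mu_t = mu[-1]                            # global mean gray level
    # between-class variance for every candidate threshold t
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    t = int(np.nanargmax(sigma_b))
    return t, (img > t).astype(np.uint8)     # threshold and binary mask
```

On a bimodal image (object on background) the threshold lands between the two histogram modes, separating target from background.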
Feature Extraction & Selection
The most crucial step in any pattern classification system. Goals are:
1. Extract a salient and representative set of features with high discriminatory ability.
2. Dimensionality reduction.
3. Decorrelation, if possible.
Category of Methods:
Energy-based: KL transform, statistical-based
Contour-based: Fourier Descriptor, Hough transform
Shape-dependent: Moment invariants, Zernike moments
Texture-based: WT, Gabor filter, Gray-level Co-occurrence Matrix (GLCM), statistical-based
Fourier Descriptor (FD)
FD extracts contour shape-dependent features. Steps are:
1. Define a closed contour with M points (see Fig. 2).
2. Map the contour points to the real and imaginary parts of a complex-valued function, p(n) = x_n + j y_n, n ∈ [0, M−1]. Clearly, p(n) = p(n + lM), i.e. periodic with period M.
3. Take the DFT of the complex-valued function p(n) to generate the FD coefficients, i.e. P(k) = DFT{p(n)}, k ∈ [0, M−1]. Thus,

P(k) = Σ_{n=0}^{M−1} p(n) e^{−j2πnk/M}, k ∈ [0, M−1]

Figure 2: Contour with M points.
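The three steps above can be sketched directly with NumPy's FFT. (The normalization by |P(1)| and the dropping of P(0) are common conventions we add for illustration; they are not part of the steps listed.)

```python
import numpy as np

def fourier_descriptors(xs, ys):
    """FD of an M-point closed contour, following the steps in the text:
    form p(n) = x_n + j*y_n, then take its DFT to get P(k)."""
    p = np.asarray(xs, dtype=float) + 1j * np.asarray(ys, dtype=float)
    P = np.fft.fft(p)                     # P(k), k in [0, M-1]
    # |P(0)| depends only on contour position; dropping it and dividing
    # by |P(1)| makes the features insensitive to translation and scale.
    return np.abs(P[1:]) / np.abs(P[1])
```

For a perfectly circular contour the energy sits entirely in the lowest-order coefficient, matching Property 1 below: irregular contours spread energy into the high-order coefficients.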
M.R. Azimi Digital Image Processing
Segmentation & Feature Extraction Feature Selection Pattern Classification Unsupervised Cluster Discovery
Properties:
1. FDs of regular circular contours are primarily concentrated on the low-order coefficients, whereas irregular contours have FDs that span the high-order coefficients.
2. FD is NOT good for small objects; a large number of points is necessary in order to get good FDs.
3. Due to the periodicity of the closed contours and FDs, the features are translation and rotation independent.

The next three figures show some examples of binary silhouettes and their corresponding FD plots for three targets, as well as two data sets with and without noise. As can be seen, one can discriminate different objects using their FD coefficients based upon the Euclidean distances (see tables).
Figure 3: Target silhouettes and corresponding FD coefficients.
Moment Invariants

Moment invariants (MIs) provide a set of nonlinear features dependent on normalized 2nd- and 3rd-order central moments. MIs provide seven features invariant to rotation, scaling and translation, ideal for pattern recognition.
Let x(m, n) be the detected object. The (p+q)th-order regular and central moments are:

µ_{p,q} = Σ_m Σ_n m^p n^q x(m, n)

ξ_{p,q} = Σ_m Σ_n (m − m̄)^p (n − n̄)^q x(m, n)

where m̄ ≜ µ10/µ00 and n̄ ≜ µ01/µ00. The central moments of order p + q ≤ 3 are:

ξ00 = µ00    ξ11 = µ11 − n̄µ10
ξ10 = 0      ξ20 = µ20 − m̄µ10
ξ01 = 0      ξ02 = µ02 − n̄µ01
ξ30 = µ30 − 3m̄µ20 + 2µ10m̄²
ξ03 = µ03 − 3n̄µ02 + 2µ01n̄²
ξ12 = µ12 − 2n̄µ11 − m̄µ02 + 2n̄²µ10
ξ21 = µ21 − 2m̄µ11 − n̄µ20 + 2m̄²µ01

The normalized central moments are:

η_{p,q} = ξ_{p,q} / ξ00^γ,  γ = (p + q)/2 + 1,  p + q = 2, 3, ...

Then the seven invariant features are computed using

φ1 = η20 + η02
φ2 = (η20 − η02)² + 4η11²
φ3 = (η30 − 3η12)² + (3η21 − η03)²
φ4 = (η30 + η12)² + (η21 + η03)²
φ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]
     + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
φ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²]
     + 4η11(η30 + η12)(η21 + η03)
φ7 = (3η21 − η30)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]
     + (3η12 − η30)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]
Remarks:
1. The symmetric form of these features makes them independent of rotation and translation.
2. Everything about the shape of the object is represented by these seven features.
3. Two measures known as "Spread" and "Slenderness" can be defined in terms of the 2nd-order moments as SP = µ20 + µ02 = φ1 and SL = √((µ20 − µ02)² + 4µ11²) = √φ2, respectively. These may be used as features in some simple shape discrimination problems.
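The seven invariants can be computed exactly from the equations above; here is a NumPy sketch (the helper names `mu` and `eta` are ours):

```python
import numpy as np

def hu_invariants(x):
    """Seven moment invariants of a 2-D object x(m, n), built from the
    normalized central moments eta_{p,q} as defined in the text."""
    m_idx, n_idx = np.indices(x.shape)

    def mu(p, q):                       # regular moment mu_{p,q}
        return np.sum(m_idx**p * n_idx**q * x)

    mbar, nbar = mu(1, 0) / mu(0, 0), mu(0, 1) / mu(0, 0)

    def eta(p, q):                      # normalized central moment eta_{p,q}
        xi = np.sum((m_idx - mbar)**p * (n_idx - nbar)**q * x)
        return xi / mu(0, 0) ** ((p + q) / 2 + 1)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    phi1 = e20 + e02
    phi2 = (e20 - e02)**2 + 4 * e11**2
    phi3 = (e30 - 3*e12)**2 + (3*e21 - e03)**2
    phi4 = (e30 + e12)**2 + (e21 + e03)**2
    phi5 = ((e30 - 3*e12)*(e30 + e12)*((e30 + e12)**2 - 3*(e21 + e03)**2)
            + (3*e21 - e03)*(e21 + e03)*(3*(e30 + e12)**2 - (e21 + e03)**2))
    phi6 = ((e20 - e02)*((e30 + e12)**2 - (e21 + e03)**2)
            + 4*e11*(e30 + e12)*(e21 + e03))
    phi7 = ((3*e21 - e30)*(e30 + e12)*((e30 + e12)**2 - 3*(e21 + e03)**2)
            + (3*e12 - e30)*(e21 + e03)*(3*(e30 + e12)**2 - (e21 + e03)**2))
    return np.array([phi1, phi2, phi3, phi4, phi5, phi6, phi7])
```

Translating the object inside the image leaves all seven values unchanged, since the central moments subtract out the centroid.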
Feature Selection
Bellman’s Curse of Dimensionality
Classification performance will NOT necessarily improve as more features are added. More features ⇒ more parameters to be estimated ⇒ increased estimation error when using finite training samples. The trained classifier will be so fine-tuned to the training data that it will lack generalization ability on novel data.
Fisher Discriminant Function (DF)
Goal: Extract a lower-dimensional feature subspace containing the most discriminatory features and remove the ones that may have detrimental effects.
Assume a two-class problem. For every feature x_i in the feature vector x, the Fisher DF is computed over all the training samples using

D_{x_i}(C1, C2) = |µ_{C1}(x_i) − µ_{C2}(x_i)|² / (σ²_{C1}(x_i) + σ²_{C2}(x_i))

where µ_{Cj}(x_i) and σ²_{Cj}(x_i) are the mean and variance of class Cj for the ith feature x_i.
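The Fisher DF for a single feature is a few lines of code (the function name `fisher_score` is ours): a well-separated feature scores high, an overlapping one scores near zero.

```python
import numpy as np

def fisher_score(x1, x2):
    """Fisher discriminant for one feature: squared separation of the
    class means divided by the sum of the class variances."""
    return (x1.mean() - x2.mean())**2 / (x1.var() + x2.var())
```

In practice this score is computed for every candidate feature over the training set and only the top-scoring features are retained.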
The process is repeated for every feature and only those that have high discriminatory ability are selected. The next figure depicts the feature space distribution for a 2-D feature vector case.
Figure 4: Feature Selection using Fisher DF.
Area 6: Pattern Classification
Goal:
Assign a pattern (picture, fingerprint, characters, speech, EKG, etc.) into one of the M prescribed (known) classes.
A classifier maps the feature (pattern) space into the classification decision (label) space, i.e. it performs a mapping x ∈ R^N → i_o, i_o ∈ [1, M], or i_o = f(x), where x is the N-dimensional feature vector and i_o = k represents the kth class. The function f(·) specifies the relation between the classifier inputs and outputs, or the “decision rule”.
Decision Regions & Surfaces: The decision rule divides the feature (pattern) space into M disjoint regions, R_k, k ∈ [1, M], known as “Decision Regions”, that are separated by “Decision Surfaces”, S_{i,j}.
Discriminant Functions: Assuming that the classifier is already designed,the classification decision for an unknown pattern x is made bycomparing M scalar functions g1(x), g2(x), . . . gM (x), known as“discriminant functions” (DF’s).
Figure 5: Decision regions and surfaces.
A pattern x belongs to class j (Cj) iff

g_j(x) > g_k(x), ∀k ∈ [1, M], k ≠ j ⟺ x ∈ Cj

i.e. selecting the class with the largest DF. The decision surface S_{k,l} separating two contiguous decision regions R_k, R_l satisfies

g_k(x) − g_l(x) = 0 ⇒ S_{k,l}

There are several types of classifiers that can be built depending on the type of the DFs generated.
Once the type of the DF is selected, the learning algorithm results in a solution for the unknown parameters of the DFs. Among typical DFs are:
Linear Classifier
Bayes Classifier
K-means clustering
Vector Quantization
Neural Network (supervised vs. unsupervised)
1. Linear Classifier: A linear classifier (linear DF) constitutes a hyperplane in N-dimensional feature space. Thus, the DF is

g_i(x) = w_i^T x + w_{i,N+1} = Σ_{j=1}^{N} w_{i,j} x_j + w_{i,N+1},  ∀i ∈ [1, M]

Important Remark:
A minimum distance classifier is a linear classifier. Assume that each class is represented by its “prototype” pattern (mean or centroid of each group of patterns) c_i, i ∈ [1, M].
2. Minimum Distance Classifier: Makes its decision on pattern x based upon the smallest Euclidean distance to a particular prototype pattern, i.e.

d(x, c_j) < d(x, c_k), ∀k ∈ [1, M], k ≠ j ⟺ x ∈ Cj

where d(x, c_j) = ‖x − c_j‖². Rewrite

‖x − c_j‖² = (x − c_j)^T (x − c_j) = x^T x − 2 c_j^T x + c_j^T c_j

Since the term x^T x is common to all M expressions, it suffices to examine c_j^T x − (1/2) c_j^T c_j. That is,

x ∈ Cj ⟺ c_j^T x − (1/2) c_j^T c_j > c_k^T x − (1/2) c_k^T c_k, ∀k ∈ [1, M], k ≠ j

This implies that g_j(x) = c_j^T x − (1/2) c_j^T c_j.
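The derivation above can be sketched as follows: evaluate g_j(x) = c_j^T x − (1/2) c_j^T c_j for every prototype and pick the largest (the function name is ours):

```python
import numpy as np

def min_distance_classify(x, prototypes):
    """Minimum-distance classifier in its linear form:
    g_j(x) = c_j^T x - (1/2) c_j^T c_j; return the class with the largest DF."""
    C = np.asarray(prototypes)                 # one prototype c_j per row
    g = C @ x - 0.5 * np.sum(C * C, axis=1)    # all M discriminant functions
    return int(np.argmax(g))
```

Note that no distances are actually computed: only inner products with the prototypes, which is exactly what makes the classifier linear.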
Now, comparing with the linear DF g_j(x) = w_j^T x + w_{j,N+1}, we have w_j = c_j and w_{j,N+1} = −(1/2) c_j^T c_j. Thus, a linear (minimum distance) classifier can easily be built using the prototype patterns.
Note: The linear classifier uses a deterministic DF.
3. Bayes Classification:
The Bayes decision is based upon minimizing the loss in making wrong decisions. The decision rule follows

x ∈ Cj or Rj ⟺ p(Rj|x) > p(Rk|x), ∀k ∈ [1, M], k ≠ j

where p(Rj|x) is the a posteriori class-conditional PDF. Using Bayes rule,

p(Rj|x) = p(x|Rj) P(Rj) / p(x)

Since the denominator is common to all classes, the decision rule can be modified to

x ∈ Cj ⟺ p(x|Rj) P(Rj) > p(x|Rk) P(Rk), ∀k ∈ [1, M], k ≠ j
Thus, in Bayes classification the DF is g_i(x) = p(x|Ri) P(Ri), i.e. not deterministic. Alternatively, we can use g_i(x) = ln[p(x|Ri) P(Ri)]. Note that p(x|Ri) and P(Ri) can be computed from the “training data”.

Bayes Classifier for Gaussian Cases
Suppose the distribution of patterns in each decision region Ri can be represented by a multi-variate Gaussian, i.e.

p(x|Ri) = (1 / ((2π)^{N/2} Det(Ri)^{1/2})) e^{−(1/2)(x − µ_i)^T Ri^{−1} (x − µ_i)}

where µ_i and Ri represent the mean vector and covariance matrix for the ith class, computed from the training data in each class. Then, assuming that Ri = σ_x² I (i.e. features are uncorrelated), we have

g_i(x) = ln[p(x|Ri)] + ln[P(Ri)]
       = −(N/2) ln(2πσ_x²) − (1/(2σ_x²)) (x − µ_i)^T (x − µ_i) + ln[P(Ri)]
       = −(1/(2σ_x²)) (x^T x − 2µ_i^T x + µ_i^T µ_i) + ln[P(Ri)] + const.

Since the term x^T x is common to all the expressions, it can be ignored.
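A sketch of the resulting DF for the uncorrelated Gaussian case (we keep the common ‖x − µ_i‖² form for readability, since the x^T x term cancels in the argmax anyway; names are ours):

```python
import numpy as np

def bayes_gaussian_classify(x, means, priors, sigma2):
    """Bayes DF for the uncorrelated Gaussian case (R_i = sigma^2 I):
    g_i(x) = -(1/(2 sigma^2)) ||x - mu_i||^2 + ln P(R_i),
    after dropping the terms common to all classes."""
    g = [-np.sum((x - mu)**2) / (2 * sigma2) + np.log(P)
         for mu, P in zip(means, priors)]
    return int(np.argmax(g))               # class with the largest DF
```

With equal priors this reduces to the minimum-distance classifier, which is the content of Remark 1 below; unequal priors shift the decision surface toward the less likely class.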
Remarks:
1. In the Gaussian case, the Bayes classifier becomes a linear one with w_i = µ_i/σ_x².
2. Bayes minimizes the loss in misclassification independent of the overlap between the distributions.
Figure 6: Bayes classification in 1-D.
Figure 7: Bayes classification in 2-D.
Cluster Discovery Networks

These systems use:

No a priori knowledge about the distribution of the unlabeled data.
The network learns the underlying distribution (statistical properties) of the data and forms clusters of the data accordingly.
The number of clusters, K, must be determined based upon some prior knowledge or expectations.
They perform some type of vector quantization (VQ).
Training and testing involve a winner-take-all scheme.

Let S = {x_1, x_2, · · · , x_Q}, where x_k is N-D, be the training set of patterns with unknown labels. The goal is to cluster them into K clusters depending on their underlying distribution.
1. Initialization: The code-book vectors (weights) are first initialized using, e.g., a uniform distribution on the unit hypersphere or the “convex combination” method w_i(0) = (1/√N)[1 1 · · · 1]^T, i ∈ [1, K].

2. Winner-Take-All Learning: During the unsupervised training, the winner selection involves finding the kth cluster for which k = argmin_{i∈[1,K]} ‖w_i − x‖ or k = argmax_{i∈[1,K]} w_i^T x.
The winner then updates (promoting the winner) its code-book vector using

w_k(l + 1) = w_k(l) + α(l)(x_l − w_k(l))

while the losers do not update their code-book vectors, i.e. w_j(l + 1) = w_j(l), ∀j ∈ [1, K], j ≠ k.

3. Cluster Selection (testing): After training is completed (i.e. the code-book vectors w_i are established), if for an unknown sample x the mth cluster is the winner, i.e. y_m = f(w_m^T x) = max_i y_i, then x ∈ cluster m.
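The steps above can be sketched as follows. As illustrative deviations from the text, we initialize the code-book vectors from random training samples rather than the convex-combination method, and use a simple linearly decaying α(l):

```python
import numpy as np

def winner_take_all(X, K, epochs=50, alpha0=0.9, seed=0):
    """Winner-take-all VQ training: only the winning code-book vector
    moves toward each presented sample; the losers stay put."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), K, replace=False)].astype(float)  # init from data
    for ep in range(epochs):
        alpha = alpha0 * (1 - ep / epochs)       # linearly decaying step size
        for x in X[rng.permutation(len(X))]:     # present samples in random order
            k = np.argmin(np.linalg.norm(W - x, axis=1))  # winner selection
            W[k] += alpha * (x - W[k])           # promote the winner only
    return W
```

At convergence the rows of W sit near the cluster centroids, consistent with Remark 1 below.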
Important Remarks:

1. At convergence, the weight vectors represent the centroids of the clusters, i.e. w_i = µ_i.
2. 0 < α(l) < 1 is the step size in learning. For the first 1000 steps, we choose α(l) ≈ 0.99; thereafter use a monotonically decreasing function, e.g.,
   Linear: α(l) = α_0 (1 − (l − l_p)/l_q), α_0 = 0.9, where l_p is the epoch at which decay starts and l_q is the time constant.
   Exponential: α(l) = α_0 e^{−(l − l_p)/l_q}
3. If the trained system is to be used for classification, the clusters must be calibrated (labeling is required after cluster formation, using some known patterns).
4. Works well for linearly separable clusters. It does not work well when the clusters overlap (see the figure shown). One remedy is to use a large number of clusters to partition the space further.
Figure 8: Unsupervised clustering fails.