Autoregressive and Random Field
Wei-Ta Chu
Autoregressive and Random Field Texture Models
2009/11/5
Multimedia Content Analysis, CSIE, CCU
Announcement of Homework #2–Content-Based Image Retrieval
Goal: develop a basic CBIR system, or use an open-source library to build a CBIR system.
Requirements (case 1):
1. Write programs that use at least one color-based feature and one texture-based feature to perform CBIR automatically.
2. Write a report that describes:
2.1. How to run your program
2.2. What kinds of features, distance metrics, and algorithms you used or compared
2.3. Retrieval performance in precision and recall, or even ROC/PR curves
Requirements (case 2):
1. Set up a CBIR system based on an open-source library.
2. Write a report that describes:
2.1. How to set up this system, including environment settings, parameter settings, etc.
2.2. How to write a CBIR program based on this library
2.3. What kinds of features, distance metrics, and algorithms the library uses
2.4. Retrieval performance in precision and recall, or even ROC/PR curves
Evaluation data: http://www.cs.ccu.edu.tw/~wtchu/courses/2009f_MCA/assignments.html
Homework submission: pack your programs and report into one zip file and upload it to eCourse.
Deadline: 12:00, Nov. 22, 2009
Grades will be based on retrieval performance and the descriptions in your report.
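The report asks for retrieval performance in precision and recall. As a minimal sketch, assuming each query returns a ranked list of image IDs and the ground truth for that query is a set of relevant IDs (the function names and IDs below are hypothetical), precision and recall over the top-k results can be computed as follows:

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall of one query, counting only the top-k ranked results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    top_k = list(ranked_ids)[:k]
    hits = sum(1 for image_id in top_k if image_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def pr_curve(ranked_ids, relevant_ids):
    """(precision, recall) pairs for k = 1 .. len(ranked_ids); plotting them gives a PR curve."""
    return [precision_recall_at_k(ranked_ids, relevant_ids, k)
            for k in range(1, len(ranked_ids) + 1)]


if __name__ == "__main__":
    ranked = ["img07", "img02", "img11", "img05"]   # hypothetical ranked result list
    relevant = {"img07", "img11", "img30"}           # hypothetical ground truth
    print(precision_recall_at_k(ranked, relevant, 2))  # (0.5, 0.3333333333333333)
```

Averaging these values over all queries gives the numbers to report; the pairs returned by `pr_curve` are the points of one query's PR curve.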
Random Field
Think of a textured image as a 2D array of random numbers. The pixel intensity at each location is a random variable.
One can model the image as a function f(r, w), where r is the position vector representing the pixel location and w is a random parameter.
Once we select a specific texture w, f(r, w) is an image. f(r, w) is called a random field.
Random Field Model
A typical random field model is characterized by a set of neighbors.
Given an array of observed pixel-intensity values {y(s)}, it is natural to expect that the pixel values are locally correlated.
This leads to the Markov model, in which the value at a pixel depends only on the values of its neighbors.
Simultaneous Autoregressive Model (SAR)
A special case of the Markov random field
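In a commonly used form of the SAR model, each pixel is a linear combination of its neighbors plus noise:

$$y(s) = \mu + \sum_{r \in D} \theta(r)\, y(s + r) + \epsilon(s)$$

where D is the neighbor set, θ(r) are the model parameters, and ε(s) is zero-mean Gaussian noise with variance σ². The estimated θ(r) and σ serve as texture features. The sketch below fits them by least squares, assuming a symmetric neighborhood with θ(r) = θ(−r) and four hypothetical neighbor offsets; `sar_features` is an illustrative name, and least squares is one simple estimator, not necessarily the one used in the original MRSAR work.

```python
import numpy as np

# Hypothetical neighbor set: each offset r also stands for -r (symmetric SAR).
DEFAULT_NEIGHBORS = ((0, 1), (1, 0), (1, 1), (1, -1))


def sar_features(img, neighbors=DEFAULT_NEIGHBORS):
    """Least-squares estimate of (theta, sigma) for a symmetric SAR model."""
    y = np.asarray(img, dtype=float)
    y = y - y.mean()                      # removing the mean lets us drop mu
    pad = max(max(abs(dr), abs(dc)) for dr, dc in neighbors)
    h, w = y.shape
    center = y[pad:h - pad, pad:w - pad].ravel()
    columns = []
    for dr, dc in neighbors:
        plus = y[pad + dr:h - pad + dr, pad + dc:w - pad + dc]   # y(s + r)
        minus = y[pad - dr:h - pad - dr, pad - dc:w - pad - dc]  # y(s - r)
        columns.append((plus + minus).ravel())                   # shared theta(r)
    X = np.stack(columns, axis=1)
    theta, *_ = np.linalg.lstsq(X, center, rcond=None)
    sigma = (center - X @ theta).std()    # residual spread estimates the noise sigma
    return theta, sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((128, 128))
    smooth = noise[:, :-1] + noise[:, 1:]   # texture correlated along rows only
    theta, sigma = sar_features(smooth)
    print(np.round(theta, 2), round(float(sigma), 2))
```

For the row-correlated texture, the horizontal coefficient θ for offset (0, 1) comes out clearly positive while the others stay near zero, which is the kind of directional information these features carry.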
Multiresolution SAR (MRSAR)
It’s not trivial to determine the appropriate size of the neighborhood.
The MRSAR model tries to account for the variability of texture primitives by defining the SAR model at different resolutions.
[Figure: original image and its image pyramid, with a SAR model defined at each level]
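A minimal MRSAR sketch under stated assumptions: the pyramid is built by averaging 2x2 blocks (a simple stand-in for whatever smoothing filter a real implementation uses), a symmetric SAR model with four hypothetical neighbor offsets is fitted by least squares at every level, and the per-level (θ, σ) values are concatenated. All names are illustrative.

```python
import numpy as np

# Hypothetical symmetric neighbor set: offset r also stands for -r.
NEIGHBORS = ((0, 1), (1, 0), (1, 1), (1, -1))


def fit_sar(y):
    """Least-squares (theta_1..theta_4, sigma) of a symmetric SAR model on one image."""
    y = y - y.mean()
    h, w = y.shape
    center = y[1:h - 1, 1:w - 1].ravel()
    columns = [
        (y[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]            # y(s + r)
         + y[1 - dr:h - 1 - dr, 1 - dc:w - 1 - dc]).ravel()  # y(s - r)
        for dr, dc in NEIGHBORS
    ]
    X = np.stack(columns, axis=1)
    theta, *_ = np.linalg.lstsq(X, center, rcond=None)
    sigma = (center - X @ theta).std()
    return np.append(theta, sigma)


def downsample(y):
    """Next pyramid level: average non-overlapping 2x2 blocks (a simple low-pass choice)."""
    h, w = (y.shape[0] // 2) * 2, (y.shape[1] // 2) * 2
    y = y[:h, :w]
    return 0.25 * (y[0::2, 0::2] + y[1::2, 0::2] + y[0::2, 1::2] + y[1::2, 1::2])


def mrsar_features(img, levels=3):
    """Concatenate SAR parameters fitted at each pyramid level (5 values per level)."""
    y = np.asarray(img, dtype=float)
    features = []
    for _ in range(levels):
        features.append(fit_sar(y))
        y = downsample(y)
    return np.concatenate(features)


if __name__ == "__main__":
    img = np.random.default_rng(1).standard_normal((128, 128))
    print(mrsar_features(img).shape)  # (15,)
```

Fitting a fixed small neighborhood at coarser levels covers larger spatial extents in the original image, which is how the multiresolution scheme sidesteps choosing one neighborhood size.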
Spectral Texture Features
Introduction
Any function that repeats periodically can be expressed as the sum of sines and/or cosines of different frequencies, each multiplied by a different coefficient: the Fourier series.
Even functions that are not periodic can be expressed as the integral of sines and/or cosines multiplied by a weighting function. This formulation is the Fourier transform.
Definition of the Fourier Transform
Forward continuous-time Fourier transform:
$$X(j\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$$
Inverse continuous-time Fourier transform:
$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(j\omega)\, e^{j\omega t}\, d\omega$$
The forward transform is an analysis integral because it extracts spectrum information from the signal.
The inverse transform is a synthesis integral because it is used to create the time-domain signal from its spectral information.
Definition of the Fourier Transform
Time domain and frequency domain
It is common to say that we take the Fourier transform of x(t), meaning that we determine X(jω) so that we can use the frequency-domain representation of the signal.
We often say that we take the inverse Fourier transform to go from the frequency domain to the time domain.
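Example: Forward Fourier Transform
Consider the one-sided exponential signal $x(t) = e^{-at}u(t)$ with $a > 0$.
Take the Fourier transform of x(t):
$$X(j\omega) = \int_{0}^{\infty} e^{-at}\, e^{-j\omega t}\, dt = \frac{1}{a + j\omega}$$
[Figure: the signal in the time domain and its transform in the frequency domain]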
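The closed-form result can be checked numerically. The sketch below approximates the Fourier integral with the trapezoidal rule on a uniform grid for a = 2 (an arbitrary choice) and compares it with 1/(a + jω); `ft_numeric` is a hypothetical helper, not a library function.

```python
import numpy as np


def ft_numeric(x, t, omega):
    """Trapezoidal approximation of X(jw) = integral of x(t) exp(-j w t) dt on a uniform grid t."""
    f = x * np.exp(-1j * omega * t)
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))


a = 2.0
t = np.linspace(0.0, 20.0, 200_001)   # x(t) = 0 for t < 0, and e^{-40} is negligible at t = 20
x = np.exp(-a * t)                    # one-sided exponential e^{-at} u(t)
for omega in (0.0, 1.0, 5.0):
    numeric = ft_numeric(x, t, omega)
    closed = 1.0 / (a + 1j * omega)
    print(omega, abs(numeric - closed))
```

The magnitude of the transform, 1/√(a² + ω²), is largest at ω = 0 and decays as |ω| grows.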
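Rectangular Pulse Signals
Consider the rectangular pulse $x(t) = u(t + T/2) - u(t - T/2)$, which equals 1 for $|t| < T/2$ and 0 otherwise.
The Fourier transform is
$$X(j\omega) = \int_{-T/2}^{T/2} e^{-j\omega t}\, dt = \frac{\sin(\omega T/2)}{\omega/2}$$
[Figure: the pulse in the time domain and its transform in the frequency domain]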
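Rectangular Pulse Signals
The Fourier transform of the rectangular pulse signal is called a sinc function.
The formal definition of a sinc function is $\mathrm{sinc}(\theta) = \frac{\sin\theta}{\theta}$ (some texts use the normalized form $\sin(\pi\theta)/(\pi\theta)$), so the pulse transform $\sin(\omega T/2)/(\omega/2)$ equals $T\,\mathrm{sinc}(\omega T/2)$.
[Figure: the pulse in the time domain and its sinc-shaped transform in the frequency domain]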
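NumPy's `np.sinc` uses the normalized convention sin(πx)/(πx), so the pulse transform sin(ωT/2)/(ω/2) can be written as `T * np.sinc(omega * T / (2 * np.pi))`. The sketch below checks this against a trapezoidal approximation of the Fourier integral for an assumed pulse width T = 2; the helper names are hypothetical.

```python
import numpy as np


def rect_ft_closed_form(omega, T):
    """sin(omega*T/2) / (omega/2), written with NumPy's normalized sinc."""
    # np.sinc(x) = sin(pi*x) / (pi*x) and returns 1 at x = 0 without dividing by zero.
    return T * np.sinc(omega * T / (2 * np.pi))


def rect_ft_numeric(omega, T, n=20_001):
    """Trapezoidal approximation of the Fourier integral of a unit pulse on [-T/2, T/2]."""
    t = np.linspace(-T / 2, T / 2, n)       # outside this interval the pulse is zero
    f = np.exp(-1j * omega * t)
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))


T = 2.0
for omega in (0.0, 1.0, np.pi, 7.3):        # omega = 2*pi/T = pi is the first zero crossing
    print(omega, rect_ft_closed_form(omega, T), abs(rect_ft_numeric(omega, T)))
```

The peak value at ω = 0 equals the pulse area T, and the zeros fall at nonzero multiples of 2π/T, so a wider pulse gives a narrower spectrum.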
Discrete Fourier Transform
One-dimensional DFT:
$$F(u) = \frac{1}{M}\sum_{x=0}^{M-1} f(x)\, e^{-j2\pi ux/M} \quad \text{for } u = 0, 1, 2, \ldots, M-1$$
Inverse DFT:
$$f(x) = \sum_{u=0}^{M-1} F(u)\, e^{j2\pi ux/M} \quad \text{for } x = 0, 1, 2, \ldots, M-1$$
To compute F(u), we start by substituting u = 0 in the exponential term and then summing over all values of x. We then substitute u = 1, and so on. Like f(x), the transform is a discrete quantity, and it has the same number of components as f(x).
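The substitute-and-sum procedure translates directly into a double loop. This sketch assumes the 1/M factor sits in the forward transform (the convention of the Gonzalez and Woods text cited in these slides); NumPy's `np.fft.fft` omits that factor, so the check divides its output by M. The O(M²) loop is for illustration only, and the function names are hypothetical.

```python
import numpy as np


def dft_1d(f):
    """F(u) = (1/M) * sum_x f(x) exp(-j 2 pi u x / M) for u = 0, ..., M-1."""
    f = np.asarray(f, dtype=complex)
    M = f.shape[0]
    F = np.zeros(M, dtype=complex)
    for u in range(M):            # substitute u = 0, then u = 1, ...
        for x in range(M):        # ... and sum over all values of x
            F[u] += f[x] * np.exp(-2j * np.pi * u * x / M)
    return F / M


def idft_1d(F):
    """f(x) = sum_u F(u) exp(j 2 pi u x / M) for x = 0, ..., M-1."""
    F = np.asarray(F, dtype=complex)
    M = F.shape[0]
    u = np.arange(M)
    return np.array([np.sum(F * np.exp(2j * np.pi * u * x / M)) for x in range(M)])


f = np.array([1.0, 2.0, 3.0, 4.0])
F = dft_1d(f)
print(F[0])                                     # with this convention F(0) is the mean of f
print(np.allclose(F, np.fft.fft(f) / len(f)))   # True: NumPy omits the 1/M
print(np.allclose(idft_1d(F), f))               # True: the inverse recovers f
```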
Discrete Fourier Transform
Euler's formula: $e^{j\theta} = \cos\theta + j\sin\theta$, so
$$F(u) = \frac{1}{M}\sum_{x=0}^{M-1} f(x)\left[\cos(2\pi ux/M) - j\sin(2\pi ux/M)\right]$$
Each term of the Fourier transform (the value of F(u)) is composed of the sum of all values of the function f(x). The domain (values of u) over which the values of F(u) range is called the frequency domain, because u determines the frequency of the components of the transform. Each of the M terms of F(u) is called a frequency component of the transform.
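Discrete Fourier Transform
Express F(u) = R(u) + jI(u) in polar coordinates: $F(u) = |F(u)|\, e^{j\phi(u)}$
Magnitude or spectrum: $|F(u)| = \left[R^2(u) + I^2(u)\right]^{1/2}$
Phase angle or phase spectrum: $\phi(u) = \tan^{-1}\left[\frac{I(u)}{R(u)}\right]$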
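Discrete Fourier Transform
Two-dimensional DFT:
$$F(u,v) = \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\, e^{-j2\pi(ux/M + vy/N)}$$
for $u = 0, 1, \ldots, M-1$ and $v = 0, 1, \ldots, N-1$.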
Images in Frequency Domain
Gonzalez and Woods, Chapter 4 of Digital Image Processing, Prentice-Hall, 2001.
Images and Their FT
Frequency Domain Features
Fourier domain energy distribution:
Angular features (directionality)
Radial features (coarseness)
Uniform division may not be the best.
[Figure: angular (wedge) and radial (ring) partitions of the (u, v) frequency plane]
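A minimal sketch of these energy features, assuming uniform rings and wedges (exactly the division that the slide notes may not be the best): the power spectrum is centered with `np.fft.fftshift`, ring sums describe coarseness, and wedge sums describe directionality. Angles are folded into [0, π) because the magnitude spectrum of a real image is symmetric. The function name and bin counts are illustrative.

```python
import numpy as np


def ring_wedge_features(img, n_rings=4, n_wedges=6):
    """Fractions of Fourier power in radial rings (coarseness) and angular wedges (directionality)."""
    f = np.asarray(img, dtype=float)
    P = np.abs(np.fft.fftshift(np.fft.fft2(f - f.mean()))) ** 2   # power spectrum, DC removed
    h, w = P.shape
    v, u = np.mgrid[0:h, 0:w]
    u = u - w // 2                       # fftshift puts zero frequency at (h // 2, w // 2)
    v = v - h // 2
    r = np.hypot(u, v)
    theta = np.mod(np.arctan2(v, u), np.pi)   # fold angles into [0, pi)
    r_max = min(h, w) / 2
    ring_edges = np.linspace(0, r_max, n_rings + 1)          # uniform rings (an assumption)
    rings = np.array([P[(r >= lo) & (r < hi)].sum()
                      for lo, hi in zip(ring_edges[:-1], ring_edges[1:])])
    wedge_edges = np.linspace(0, np.pi, n_wedges + 1)       # uniform 30-degree wedges
    inside = (r > 0) & (r < r_max)
    wedges = np.array([P[inside & (theta >= lo) & (theta < hi)].sum()
                       for lo, hi in zip(wedge_edges[:-1], wedge_edges[1:])])
    total = P.sum()
    return rings / total, wedges / total


# Vertical stripes with 8 cycles across a 64-pixel row: all energy sits at (u, v) = (+-8, 0).
stripes = np.cos(2 * np.pi * 8 * np.arange(64) / 64)[None, :].repeat(64, axis=0)
rings, wedges = ring_wedge_features(stripes)
print(rings.argmax(), wedges.argmax())  # 1 0
```

A finer texture pushes energy into outer rings; a strongly oriented texture concentrates energy in one wedge.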
Gabor Texture
The Gabor representation has been shown to be optimal in the sense of minimizing the joint two-dimensional uncertainty in space and frequency.
These filters can be considered as orientation- and scale-tunable edge and line (bar) detectors.
The statistics of these microfeatures in a given region are often used to characterize the underlying texture information.
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
Gabor Texture
Fourier coefficients depend on the entire image (global), so we lose spatial information.
Objective: local spatial frequency analysis
Gabor kernels look like a Fourier basis function multiplied by a Gaussian.
Gabor filters come in pairs: symmetric and anti-symmetric.
We need to apply a number of Gabor filters at different scales, orientations, and spatial frequencies.
[Figure: symmetric kernel and anti-symmetric kernel]
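A symmetric/anti-symmetric pair can be built from one rotated Gaussian envelope with a cosine and a sine carrier. The sketch below assumes illustrative parameters (kernel size, envelope width `sigma`, carrier frequency `freq` in cycles per pixel, orientation `theta`, aspect ratio `gamma`); it is one common parameterization, not necessarily the one used in the cited paper.

```python
import numpy as np


def gabor_pair(size, sigma, freq, theta, gamma=1.0):
    """Symmetric (cosine) and anti-symmetric (sine) Gabor kernels; size is assumed odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # axis along the carrier direction
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    even = envelope * np.cos(2 * np.pi * freq * xr)  # symmetric kernel: responds to lines/bars
    odd = envelope * np.sin(2 * np.pi * freq * xr)   # anti-symmetric kernel: responds to edges
    even -= even.mean()   # remove the DC response so flat regions give zero output
    return even, odd


even, odd = gabor_pair(size=31, sigma=4.0, freq=0.1, theta=np.pi / 4)
print(np.allclose(even, even[::-1, ::-1]), np.allclose(odd, -odd[::-1, ::-1]))  # True True
```

A bank is obtained by looping over several `freq`/`sigma` pairs (scales) and several `theta` values (orientations).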
Gabor Texture
The image I(x, y) is convolved with Gabor filters h_mn (M x N filters in total).
The first and second moments (mean and standard deviation) of the filter outputs are used for each scale and orientation.
Features: e.g., 4 scales x 6 orientations x 2 moments = 48 dimensions.
[Figure: even and odd filter responses]
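A minimal sketch of the 48-dimensional feature, assuming complex Gabor kernels (the even and odd parts combined as real and imaginary parts), four hypothetical center frequencies spaced one octave apart, an envelope width of 0.5/freq pixels (also an assumption), and circular convolution through the FFT. For each of the 4 x 6 filters, the mean and standard deviation of the response magnitude are kept.

```python
import numpy as np


def gabor_features(img, scales=(0.05, 0.1, 0.2, 0.4), n_orient=6):
    """Mean and std of |Gabor response| for each scale and orientation (2 * S * K values)."""
    f = np.asarray(img, dtype=float)
    f = f - f.mean()
    h, w = f.shape
    F = np.fft.fft2(f)
    y, x = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2].astype(float)
    feats = []
    for freq in scales:
        sigma = 0.5 / freq                      # assumed: envelope shrinks as frequency grows
        for k in range(n_orient):
            theta = k * np.pi / n_orient        # orientations 0, 30, ..., 150 degrees
            xr = x * np.cos(theta) + y * np.sin(theta)
            yr = -x * np.sin(theta) + y * np.cos(theta)
            kernel = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.exp(2j * np.pi * freq * xr)
            kernel = np.fft.ifftshift(kernel)   # move the kernel centre to index (0, 0)
            response = np.abs(np.fft.ifft2(F * np.fft.fft2(kernel)))  # circular convolution
            feats.extend([response.mean(), response.std()])
    return np.array(feats)


stripes = np.cos(2 * np.pi * 0.2 * np.arange(64))[None, :].repeat(64, axis=0)  # vertical stripes
feats = gabor_features(stripes)
print(feats.shape)  # (48,)
```

With this layout the mean for scale s and orientation k sits at index 2 * (s * 6 + k); for the stripe image, the 0-degree filter at frequency 0.2 responds far more strongly than the 90-degree one.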
Gabor Texture
Arranging the mean energies in a 2D form (scale along one axis, orientation along the other):
structured: localized pattern
oriented (or directional): column pattern
granular: row pattern
random: random pattern
Homogeneous Texture Descriptor
The frequency plane partition is uniform along the angular direction (30°) and non-uniform along the radial direction (on an octave scale).
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
Gabor Function
A 2D Gabor function (a modulated Gaussian) is applied on top of each individual feature channel.
This is equivalent to weighting the Fourier transform coefficients of the image with a Gaussian centered at each frequency channel defined above.
Each channel filters a specific type of texture.
Homogeneous Texture Descriptor
Partition the frequency domain into 30 channels (each modeled by a 2D Gabor function).
Compute the energy and energy deviation for each channel.
Compute the mean and standard deviation of the frequency coefficients.
$\mathrm{HTD} = \{f_{DC}, f_{SD}, e_1, e_2, \ldots, e_{30}, d_1, d_2, \ldots, d_{30}\}$
$f_{DC}$ and $f_{SD}$ are the mean and standard deviation of the image; $e_i$ and $d_i$ are the mean energy and energy deviation of the corresponding i-th channel.
Distance Measure
Resources: http://vision.ece.ucsb.edu/texture/feature.html
On-line demo: http://vision.ece.ucsb.edu/texture/mpeg7/index.html
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
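Gabor means and standard deviations at different scales have very different numeric ranges, so a plain L1 distance would be dominated by a few dimensions. A sketch of the normalized L1 distance usually paired with these features divides each feature difference by that feature's standard deviation over the whole database. It assumes the database feature vectors are stacked as rows of `database`; the function name is illustrative.

```python
import numpy as np


def normalized_l1(query, database):
    """Distance from one feature vector to every row of `database`.

    Each dimension is divided by its standard deviation over the database, so
    features with large numeric ranges do not dominate the sum.
    """
    database = np.asarray(database, dtype=float)
    alpha = database.std(axis=0)
    alpha[alpha == 0] = 1.0   # guard against constant features
    return np.abs((database - np.asarray(query, dtype=float)) / alpha).sum(axis=1)


db = np.array([[0.0, 0.0], [1.0, 10.0], [2.0, 20.0]])
dist = normalized_l1([0.0, 0.0], db)
print(np.argsort(dist))  # [0 1 2]: the most similar image comes first
```

Without the normalization, the second dimension (range 0 to 20) would contribute ten times more than the first to every distance.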
Example: Browsing Satellite Images
Find a vegetation patch that looks like this region.
B.S. Manjunath and W.Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. on PAMI, vol. 18, no. 8, 1996, pp. 837-842.
Example: Browsing Satellite Images
(b) parts of a highway; (c) a region containing some buildings (center of the image, toward the left); (d) a number marked on the image (lower-left corner)
Wavelet Features
Wavelet transforms refer to the decomposition of a signal with a family of basis functions, obtained through recursive filtering and subsampling.
At each level, the transform decomposes a 2D signal into four subbands, often referred to as LL, LH, HL, and HH (L = low, H = high).
[Figure: two-level decomposition layout; the level-1 LL position holds LL2, HL2, LH2, HH2, next to HL1, LH1, HH1]
Wavelet Features
Features: the mean and standard deviation of the energy distribution in each subband at each level.
PWT (pyramid-structured wavelet transform): recursively decompose the LL band. This results in a 20-dimensional feature vector (3x3x2+2 = 20).
TWT (tree-structured wavelet transform): some information appears in the middle-frequency channels, so decomposition is not restricted to the LL band. This results in a 40x2 = 80-dimensional feature vector.
[Figure: original image, its PWT decomposition, and its TWT decomposition]
T. Chang and C.C.J. Kuo, “Texture analysis and classification with tree-structured wavelet transform,” IEEE Trans. on Image Processing, vol. 2, no. 4, 1993, pp. 429-441.
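A minimal PWT sketch, assuming the Haar wavelet (the simplest choice; texture studies often use other filters) and reading "energy distribution" as the absolute value of the coefficients. Each level splits the current LL band into four subbands; the mean and standard deviation of each high-frequency subband are kept, plus those of the final LL band, so this sketch returns 3 x 3 x 2 + 2 = 20 numbers for three levels. `haar_level` and `pwt_features` are hypothetical names.

```python
import numpy as np


def haar_level(x):
    """One level of the orthonormal 2D Haar transform: returns LL, LH, HL, HH."""
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]   # crop to even size
    a, b = x[0::2, 0::2], x[0::2, 1::2]                   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]                   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2   # labeling of LH vs. HL differs between texts
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh


def pwt_features(img, levels=3):
    """Mean and std of |coefficients| per high-pass subband per level, plus the final LL."""
    ll = np.asarray(img, dtype=float)
    features = []
    for _ in range(levels):
        ll, lh, hl, hh = haar_level(ll)    # recurse on the LL band only (pyramid structure)
        for band in (lh, hl, hh):
            energy = np.abs(band)
            features.extend([energy.mean(), energy.std()])
    energy = np.abs(ll)
    features.extend([energy.mean(), energy.std()])
    return np.array(features)


img = np.random.default_rng(0).standard_normal((64, 64))
print(pwt_features(img).shape)  # (20,)
```

A TWT variant would also call `haar_level` on an LH, HL, or HH band whenever that band carries significant energy, instead of recursing on LL only.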
Edge Histogram Descriptor
Park, et al. “Efficient use of local edge histogram descriptor,” Proc. of ACM International Workshop on Standards, Interoperability and Practices, pp. 51-54, 2000.
Introduction
Spatial distribution of edges: the edge histogram descriptor (EHD)
Divide the image into 4x4 subimages and generate an edge histogram from the edges in each subimage.
Edges are categorized into five types: vertical, horizontal, 45° diagonal, 135° diagonal, and nondirectional.
This gives a total of 5x16 = 80 histogram bins.
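A sketch of the 80-bin local histogram, following the procedure commonly described for the MPEG-7 EHD: each subimage is covered by small image blocks, each block is split into 2x2 sub-blocks, five 2x2 filters are applied to the sub-block mean intensities, and a block counts toward the edge type with the strongest response if that response exceeds a threshold. The 8-pixel block size and the threshold of 11 are assumptions for illustration; the function name is hypothetical.

```python
import numpy as np

# 2x2 filter weights for the (top-left, top-right, bottom-left, bottom-right) sub-block means,
# in the order vertical, horizontal, 45-degree, 135-degree, non-directional.
EDGE_FILTERS = np.array([
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [np.sqrt(2), 0.0, 0.0, -np.sqrt(2)],
    [0.0, np.sqrt(2), -np.sqrt(2), 0.0],
    [2.0, -2.0, -2.0, 2.0],
])


def local_edge_histogram(img, block=8, threshold=11.0):
    """80 bins: for each of the 4x4 subimages, the fraction of blocks of each edge type."""
    g = np.asarray(img, dtype=float)
    h, w = g.shape
    half = block // 2
    hist = np.zeros((4, 4, 5))
    counts = np.zeros((4, 4))
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            b = g[by:by + block, bx:bx + block]
            means = np.array([b[:half, :half].mean(), b[:half, half:].mean(),
                              b[half:, :half].mean(), b[half:, half:].mean()])
            strength = np.abs(EDGE_FILTERS @ means)
            si, sj = min(4 * by // h, 3), min(4 * bx // w, 3)  # subimage of the block's corner
            counts[si, sj] += 1
            if strength.max() > threshold:                     # otherwise: no edge in this block
                hist[si, sj, strength.argmax()] += 1
    hist /= np.maximum(counts, 1)[..., None]
    return hist.reshape(80)


img = np.zeros((64, 64))
img[:, 36:] = 255.0                                            # one vertical step edge
print(local_edge_histogram(img).reshape(4, 4, 5)[:, 2, 0])     # [0.5 0.5 0.5 0.5]
```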
Local Edge Histogram
Global, Semi-global, and Local Histograms
Global-edge histogram: accumulate the five types of edge distributions over all subimages.
Semiglobal-edge histogram: accumulate the edge distributions over 13 groups of subimages (the four rows, the four columns, and five 2x2 clusters).
Image Matching
Combine the local, semiglobal, and global histograms.
A total of 150 bins: 80 bins (local) + 5 bins (global) + 65 bins (13x5, semiglobal).
The distance measure D(A,B) between images A and B can be an L1 distance over these bins.
This feature is one of the MPEG-7 texture descriptors.
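A sketch of building the 150 bins from the 80 local ones and comparing two images. The 13 semi-global groups follow our reading of Park et al. (four rows of subimages, four columns, and five 2x2 clusters: the four quadrants plus the center); taking global and semi-global bins as averages of the local bins is an assumed normalization, and the distance here is a plain unweighted L1 sum. Function names are hypothetical.

```python
import numpy as np


def full_edge_histogram(local80):
    """Local (80) + global (5) + semi-global (65) = 150 bins from an 80-bin local EHD."""
    local = np.asarray(local80, dtype=float).reshape(4, 4, 5)
    global_hist = local.mean(axis=(0, 1))             # 5 bins over all 16 subimages
    groups = [local[i, :, :] for i in range(4)]       # 4 rows of subimages
    groups += [local[:, j, :] for j in range(4)]      # 4 columns of subimages
    for i, j in [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1)]:  # four quadrants plus the center
        groups.append(local[i:i + 2, j:j + 2, :].reshape(-1, 5))
    semi = np.concatenate([g.mean(axis=0) for g in groups])  # 13 x 5 = 65 bins
    return np.concatenate([local.reshape(80), global_hist, semi])


def edge_histogram_distance(a80, b80):
    """Unweighted L1 distance between the 150-bin histograms of two images."""
    return np.abs(full_edge_histogram(a80) - full_edge_histogram(b80)).sum()


a = np.zeros(80)
b = np.full(80, 0.1)
print(full_edge_histogram(a).shape, edge_histogram_distance(a, b))
```

Adding the global and semi-global bins lets two images match even when similar edges appear in slightly different subimages, which a purely local comparison would penalize.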
Performance Comparison
Retrieval performance of different texture features for the Corel photo databases.
L1 distance is used to compute the dissimilarity between images.
For the MRSAR, Mahalanobis distance is used.
[Figure: number of relevant images vs. number of top matches considered, for Gabor, TWT, PWT, MRSAR, MRSAR (M), Tamura (improved), coarseness histogram, directionality, edge histogram, and Tamura (traditional)]
Manjunath and Ma, Chapter 12 of Image Databases: Search and Retrieval of Digital Imagery, edited by V. Castelli and L.D. Bergman, John Wiley & Sons, 2002.
Performance Comparison
Retrieval performance of different texture features for the Brodatz texture image set.
[Figure: percentage of retrieving all correct patterns vs. number of top matches considered, for Gabor, TWT, PWT, MRSAR, MRSAR (M), Tamura (improved), coarseness histogram, directionality, edge histogram, and Tamura (traditional)]
Shape for CBIR
Shape Features
MPEG-7 provides contour-based shape and region-based shape tools.
[Figure: examples of region-based similarity and contour-based similarity]
Bober, “MPEG-7 visual shape descriptors,” IEEE Trans. on CSVT, vol. 11, no. 6, pp. 716-719, 2001.
Region-Based Shape Descriptor
The region-based SD expresses the pixel distribution within a 2D object or region.
It can describe complex objects consisting of multiple disconnected regions.
2D Angular Radial Transformation (ART):
Gives a compact and efficient way of describing multiple disjoint regions
Robust to segmentation noise
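Angular Radial Transform (ART)
For each image, a set of ART coefficients $F_{nm}$ is extracted:
$$F_{nm} = \langle V_{nm}(\rho,\theta), f(\rho,\theta) \rangle = \int_{0}^{2\pi}\int_{0}^{1} V_{nm}^{*}(\rho,\theta)\, f(\rho,\theta)\, \rho\, d\rho\, d\theta$$
where $f(\rho,\theta)$ is the image in polar coordinates and $V_{nm}(\rho,\theta) = A_m(\theta) R_n(\rho)$ is the ART basis function of order n and m, with $A_m(\theta) = \frac{1}{2\pi} e^{jm\theta}$, $R_0(\rho) = 1$, and $R_n(\rho) = 2\cos(\pi n \rho)$ for $n \neq 0$.
• The MPEG-7 Visual Part of the XM 4.0, ISO/IEC MPEG99/W3068, Dec. 1999.
• W.-Y. Kim and Y.-S. Kim, “A New Region-Based Shape Descriptor,” ISO/IEC MPEG99/M5472, Maui, Hawaii, Dec. 1999.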
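A sketch of ART coefficient extraction for a binary region mask. It uses the basis V_nm = A_m(θ) R_n(ρ) with n < 3 and m < 12 (the setting commonly described for the MPEG-7 region-shape descriptor), and approximates the integral by a sum over pixels, where one pixel's area dx dy stands in for ρ dρ dθ. Centering the unit disk on the region centroid and scaling it to the farthest region pixel are assumptions of this sketch, as is normalizing by |F_00|. The function name is hypothetical.

```python
import numpy as np


def art_descriptor(mask, n_max=3, m_max=12):
    """Magnitudes |F_nm| / |F_00| of ART coefficients for n < n_max, m < m_max, without F_00."""
    f = np.asarray(mask, dtype=float)
    rows, cols = np.nonzero(f)
    cy, cx = rows.mean(), cols.mean()                      # centre the unit disk on the centroid
    radius = np.hypot(rows - cy, cols - cx).max() + 1.0    # every region pixel falls inside it
    yy, xx = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    x, y = (xx - cx) / radius, (yy - cy) / radius
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0
    pixel_area = 1.0 / radius ** 2      # dx dy of one pixel in disk coordinates
    coeffs = np.zeros((n_max, m_max), dtype=complex)
    for n in range(n_max):
        radial = np.ones_like(rho) if n == 0 else 2 * np.cos(np.pi * n * rho)   # R_n(rho)
        for m in range(m_max):
            basis = radial * np.exp(1j * m * theta) / (2 * np.pi)               # V_nm = A_m R_n
            coeffs[n, m] = np.sum(np.conj(basis) * f * inside) * pixel_area
    magnitudes = np.abs(coeffs) / np.abs(coeffs[0, 0])     # magnitudes ignore rotation phase
    return magnitudes.ravel()[1:]                           # 3 x 12 - 1 = 35 values by default


mask = np.zeros((64, 64))
mask[20:44, 10:54] = 1.0                                    # a rectangular region
print(np.allclose(art_descriptor(mask), art_descriptor(np.rot90(mask))))  # True
```

Rotating the region only multiplies each F_nm by a phase factor e^{-jmα}, so the magnitudes, and hence the descriptor, stay the same.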
Contour-Based Shape Descriptor
The contour SD is based on the Curvature Scale-Space (CSS) representation of the contour.
It can distinguish between shapes that have similar region-based shape properties (b).
It supports search for shapes that are semantically similar, even with significant intra-class variability (c).
It is robust to significant nonrigid deformations (d) and to perspective transformations (e).
Curvature Scale-Space (CSS)
When comparing shapes, humans tend to decompose shape contours into concave and convex sections.
Features: how prominent the sections are, their length relative to the contour length, and their position and order on the contour.
The CSS representation decomposes the contour into convex and concave sections by determining the inflection points (points at which the curvature is zero).
Curvature Scale-Space (CSS)
The CSS image shows how the inflection points change when filtering is applied to the contour.
The x-axis corresponds to the position on the contour (clockwise, starting from an arbitrary point).
The y-axis corresponds to the value of a shape-smoothing parameter (as y increases, the amount of smoothing increases).
Any black point in the CSS image signifies that there is an inflection point at the corresponding position and scale.
Curvature Scale-Space (CSS)
The smoothing is performed iteratively, and at each level the zero crossings of the curvature function are computed.
The CSS image is obtained by plotting all zero-crossing points on a plane.
Mokhtarian and Mackworth, “A theory of multiscale, curvature-based shape representation for planar curves,” IEEE Trans. on PAMI, vol. 14, no. 8, pp. 789-805, 1992.
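The iterative smoothing can be sketched as follows, assuming the contour is given as N equally spaced (x, y) samples of a closed curve. Gaussian smoothing with standard deviation σ (in samples) is done as circular convolution via the FFT, curvature is estimated with central differences, and each sign change of curvature is recorded as an (arc-position index, σ) point of the CSS image. Function names are hypothetical.

```python
import numpy as np


def smooth_closed_curve(x, y, sigma):
    """Gaussian smoothing (std `sigma` samples) of a closed curve via circular convolution."""
    k = np.fft.fftfreq(len(x))                    # frequency in cycles per sample
    g = np.exp(-2 * (np.pi * k * sigma) ** 2)      # Fourier transform of the Gaussian kernel
    xs = np.real(np.fft.ifft(np.fft.fft(x) * g))
    ys = np.real(np.fft.ifft(np.fft.fft(y) * g))
    return xs, ys


def curvature(x, y):
    """Signed curvature of a closed, uniformly sampled curve (central differences)."""
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2
    ddx = np.roll(x, -1) - 2 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2 * y + np.roll(y, 1)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5


def zero_crossings(kappa):
    """Indices i where the curvature changes sign between sample i and i + 1 (wrapping)."""
    return np.nonzero(np.sign(kappa) != np.sign(np.roll(kappa, -1)))[0]


def css_image(x, y, sigmas):
    """(position index, sigma) points of the CSS image for increasing smoothing levels."""
    points = []
    for sigma in sigmas:
        xs, ys = smooth_closed_curve(x, y, sigma)
        points.extend((int(i), sigma) for i in zero_crossings(curvature(xs, ys)))
    return points


t = 2 * np.pi * np.arange(400) / 400
r = 1 + 0.3 * np.cos(5 * t)                      # five-lobed contour with five concave sections
x, y = r * np.cos(t), r * np.sin(t)
for sigma in (1, 20, 60):
    n = len(zero_crossings(curvature(*smooth_closed_curve(x, y, sigma))))
    print(sigma, n)
```

Each concave section contributes two inflection points at light smoothing; as σ grows the contour approaches a convex shape and the pairs merge and vanish, which produces the arch-shaped peaks of a CSS image.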
Shape Descriptor
Based on CSS images, the descriptor consists of:
the eccentricity and circularity values of the original and filtered contours
the number of peaks
the magnitude (height) of the largest peak
the x and y positions of the remaining peaks
Chapter 15 of Introduction to MPEG-7: Multimedia Content Description Interface, edited by Manjunath et al., John Wiley & Sons, 2002.
Example: The QBIC System
Color: color histogram
Texture: coarseness, contrast, directionality
Shape: area, circularity, eccentricity, major-axis direction
Fusion of multiple types of features often gives better performance.
References
Tamura et al., “Textural features corresponding to visual perception,” IEEE Trans. on Systems, Man, and Cybernetics, vol. SMC-8, no. 6, pp. 460-473, 1978.
Park et al., “Efficient use of local edge histogram descriptor,” Proc. of ACM International Workshop on Standards, Interoperability and Practices, pp. 51-54, 2000.
Manjunath and Ma, Chapter 12 of Image Databases: Search and Retrieval of Digital Imagery, edited by V. Castelli and L.D. Bergman, John Wiley & Sons, 2002.
Bober, “MPEG-7 visual shape descriptors,” IEEE Trans. on CSVT, vol. 11, no. 6, pp. 716-719, 2001.
Next Week
Multidimensional Indexing Techniques