Structural information
Uploaded by dieter-chang
• Structural information deals with the geometry of objects
We are able to deal with only very limited amounts of structural information
How to interpret structural information? We showed before that this is a difficult problem
We will introduce it through the SHAPE CONTEXT method
We now take a very difficult case
Handwriting is very difficult: we recognize numbers easily even if they are very distorted. What algorithms achieve this?
We think that first the contour of the object is detected, as illustrated below
Next we think that the locations of the points on the contour determine the geometry of the object
• We need thus to measure the location of EACH contour point RELATIVE to all other points. In other words we need vectors from a point to all other points.
For example, for point Z we need all 6 red vectors. Having all vectors for all points describes the object but is very complicated
So now we reduce the description by using an APPROXIMATE polar coordinate net. The center of the net is placed at each point in turn, and we only count HOW MANY other points are in each area of the net.
Shape histogram
• The shape histogram of a contour point a_i is denoted by H_i. It is a vector obtained from the polar net by counting the number of points in each area (bin):
$H_i = \{\, h_i(k) = \#\{\text{points in bin } k\},\ 1 \le k \le m \,\}$
where m is the number of bins in the net.
For a contour with M points we obtain a list of M histograms.
Two contours are similar if the sum of differences between the histograms is small.
Histogram differences
$H_i - H_j = \sum_{k=1}^{m} \left| H_i(k) - H_j(k) \right|$
where $H_i(k)$ is the count in bin k of the histogram of point i.
This is the difference for two points i, j. Taking the differences for all contour points gives the difference between contours. Two contours which are very similar will have a very small difference
Example: below we can see contours with marked points and examples of histograms for these points
Example: here we see handwritten numbers and the histograms of contour points, shown as grey levels
Here we can see contours with points and the polar net with its areas marked in different colours
What counts is the number of points in each area, and this forms the histogram
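The shape histogram and the histogram difference described above can be sketched in a few lines of Python. This is a minimal illustration, not the method's reference implementation: the uniform ring spacing, the default of 3 rings by 8 sectors, the function names, and the matching of contour points by index are all assumptions made for the example.

```python
import math

def shape_histogram(points, i, r_max, n_radial=3, n_angular=8):
    """Count how many other points fall in each bin of a polar net centred on points[i]."""
    cx, cy = points[i]
    hist = [0] * (n_radial * n_angular)            # m = n_radial * n_angular bins
    for j, (x, y) in enumerate(points):
        if j == i:
            continue                               # only the OTHER points are counted
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        ring = min(int(r / r_max * n_radial), n_radial - 1)   # uniform rings (assumption)
        sector = int(theta / (2 * math.pi) * n_angular) % n_angular
        hist[ring * n_angular + sector] += 1
    return hist

def histogram_difference(h_i, h_j):
    """Sum over all bins of the absolute difference between two histograms."""
    return sum(abs(a - b) for a, b in zip(h_i, h_j))

def contour_difference(contour_a, contour_b):
    """Sum of histogram differences; points are matched by index (assumption)."""
    def histograms(points):
        # The net radius is the largest distance between any two contour points.
        r_max = max(math.hypot(p[0] - q[0], p[1] - q[1]) for p in points for q in points)
        return [shape_histogram(points, i, r_max) for i in range(len(points))]
    pairs = zip(histograms(contour_a), histograms(contour_b))
    return sum(histogram_difference(a, b) for a, b in pairs)
```

Because each histogram only depends on the positions of the other points relative to the net center, a shifted copy of a contour gives a difference of zero, while a different shape gives a positive value.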
Other methods - examples
• There are hundreds of other methods for object retrieval and recognition
It is impossible to lecture about all of them, since they are based on different principles.
To illustrate this we can look at an example of one of the best methods currently known. This is the method of eigenfaces, which uses a completely different principle.
EIGENFACES – global method
1. Construction of Face Space
Suppose a face image consists of N pixels, so it can be represented by a vector $\Gamma$ of dimension N. Let $\{\Gamma_i \mid i = 1, \ldots, M\}$ be the training set of face images. The average face of these M images is given by
$\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i \quad (1)$
Then each face $\Gamma_i$ differs from the average face $\Psi$ by $\Phi_i$:
$\Phi_i = \Gamma_i - \Psi, \quad i = 1, \ldots, M \quad (2)$
Now the covariance matrix of the training images can be constructed:
$C = A A^T \quad (3)$
where $A = [\Phi_1, \ldots, \Phi_M]$. The basis vectors of the face space, i.e., the eigenfaces, are then the orthogonal eigenvectors of the covariance matrix $C$.
Since the number of training images is usually less than the number of pixels in an image, there will be only M-1, instead of N, meaningful eigenvectors.
Eigenvalues, eigenvectors
$A x = \lambda x$: x is an eigenvector of matrix A, and $\lambda$ is the corresponding eigenvalue
$B = S A S^{-1}$
If S is a nonsingular n x n matrix, then matrix B has the same eigenvalues as A
An n x n matrix has n eigenvalues (counted with multiplicity)
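A short derivation, added here as a check, shows why the transformed matrix keeps the eigenvalues. It assumes only that x is an eigenvector of A with eigenvalue $\lambda$ and that S is nonsingular.

```latex
A x = \lambda x
\;\Rightarrow\;
B\,(S x) = S A S^{-1} S x = S A x = \lambda\,(S x)
```

So every eigenvalue of A is also an eigenvalue of B, with eigenvector Sx, which is nonzero because S is nonsingular.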
EIGENFACES
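Therefore, the eigenfaces are computed by first finding the eigenvectors, $v_l$ (l = 1, ..., M), of the M by M matrix L:
$L = A^T A \quad (4)$
where $A = [\Phi_1, \ldots, \Phi_M]$ is the matrix of difference face images. The eigenvectors, $u_l$, of the covariance matrix $C = A A^T$ are then expressed by a linear combination of the difference face images, $\Phi_i$, weighted by the components of $v_l$:
$U = [u_1, \ldots, u_M] = [\Phi_1, \ldots, \Phi_M]\,[v_1, \ldots, v_M] = A V \quad (5)$
In practice, a smaller set of M' (M' < M) eigenfaces is sufficient for face identification. Hence, only the M' significant eigenvectors of L, corresponding to the largest M' eigenvalues, are selected for the eigenface computation.
Thus further data compression can be obtained. M' is determined by a threshold, $\theta_\lambda$, of the ratio of the eigenvalue summation:
$\frac{\sum_{i=1}^{M'} \lambda_i}{\sum_{i=1}^{M} \lambda_i} \ge \theta_\lambda \quad (6)$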
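The reason the small M by M matrix is enough: if $A^T A v = \lambda v$, then multiplying by A gives $A A^T (A v) = \lambda (A v)$, so $A v$ is an eigenvector of the large N by N covariance matrix with the same eigenvalue. The sketch below illustrates this for the leading eigenface only. It is a minimal example in plain Python; the power iteration, the start vector, the iteration count and the function name are assumptions chosen for brevity, and a real system would use a full eigen-decomposition routine.

```python
import math

def top_eigenface(faces, iterations=200):
    """Return (average face, leading eigenface, eigenvalue) for equal-length pixel vectors."""
    M, N = len(faces), len(faces[0])
    mean = [sum(f[p] for f in faces) / M for p in range(N)]        # average face
    phi = [[f[p] - mean[p] for p in range(N)] for f in faces]      # difference faces
    # Small M x M matrix L = A^T A, i.e. L[i][j] = phi_i . phi_j
    L = [[sum(a * b for a, b in zip(phi[i], phi[j])) for j in range(M)]
         for i in range(M)]
    v = [float(i + 1) for i in range(M)]      # arbitrary start vector (assumption)
    for _ in range(iterations):               # power iteration on L
        w = [sum(L[i][j] * v[j] for j in range(M)) for i in range(M)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(L[i][j] * v[j] for j in range(M)) for i in range(M))
    # Eigenface u = A v: the difference faces weighted by the components of v
    u = [sum(v[k] * phi[k][p] for k in range(M)) for p in range(N)]
    u_norm = math.sqrt(sum(x * x for x in u))
    return mean, [x / u_norm for x in u], eigenvalue
```

Only an M by M matrix is ever built, although the returned eigenface has N components. The sketch assumes the training faces are not all identical, since otherwise L is zero and the normalization fails.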
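In the training stage, the face of each known individual, $\Gamma_k$, is projected into the face space and an M'-dimensional vector, $\Omega_k$, is obtained:
$\Omega_k = U^T (\Gamma_k - \Psi), \quad k = 1, \ldots, N_c \quad (7)$
where $N_c$ is the number of face classes, $U = [u_1, \ldots, u_{M'}]$ holds the selected eigenfaces and $\Psi$ is the average face.
A distance threshold, $\theta_c$, that defines the maximum allowable distance from a face class as well as from the face space, is set up by computing half the largest distance between any two face classes:
$\theta_c = \frac{1}{2} \max_{j,k} \| \Omega_j - \Omega_k \|, \quad j, k = 1, \ldots, N_c \quad (8)$
In the recognition stage, a new image, $\Gamma$, is projected into the face space to obtain a vector, $\Omega$:
$\Omega = U^T (\Gamma - \Psi) \quad (9)$
The distance of $\Omega$ to each face class is defined by
$\epsilon_k^2 = \| \Omega - \Omega_k \|^2, \quad k = 1, \ldots, N_c \quad (10)$
For the purpose of discriminating between face images and non-face like images, the distance, $\epsilon$, between the original image, $\Gamma$, and its reconstructed image from the eigenface space, $\Gamma_f$, is also computed:
$\epsilon^2 = \| \Gamma - \Gamma_f \|^2 \quad (11)$
where
$\Gamma_f = U \Omega + \Psi \quad (12)$
These distances are compared with the threshold given in equation (8) and the input image is classified by the following rules:
• IF $\epsilon \ge \theta_c$ THEN input image is not a face image;
• IF $\epsilon < \theta_c$ AND $\epsilon_k \ge \theta_c$ for all k THEN input image contains an unknown face;
• IF $\epsilon < \theta_c$ AND $\epsilon_{k^*} = \min_k \{\epsilon_k\} < \theta_c$ THEN input image contains the face of individual $k^*$.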
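The three classification rules can be sketched as follows. This is an illustrative example only: it assumes the eigenfaces are given as orthonormal vectors, the class names and data layout (a dictionary of class vectors) are hypothetical, and the function names are invented for the sketch.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(image, mean, eigenfaces):
    """Coordinates of (image - mean) along each eigenface."""
    diff = [x - m for x, m in zip(image, mean)]
    return [sum(u_p * d for u_p, d in zip(u, diff)) for u in eigenfaces]

def reconstruct(omega, mean, eigenfaces):
    """Image rebuilt from its face-space coordinates."""
    return [m + sum(w * u[p] for w, u in zip(omega, eigenfaces))
            for p, m in enumerate(mean)]

def class_threshold(classes):
    """Half the largest distance between any two face classes."""
    vectors = list(classes.values())
    return 0.5 * max(distance(a, b) for a in vectors for b in vectors)

def classify(image, mean, eigenfaces, classes, theta):
    omega = project(image, mean, eigenfaces)
    eps = distance(image, reconstruct(omega, mean, eigenfaces))  # distance from face space
    if eps >= theta:
        return "not a face"
    name, eps_k = min(((n, distance(omega, w)) for n, w in classes.items()),
                      key=lambda item: item[1])                  # nearest face class
    if eps_k >= theta:
        return "unknown face"
    return name
```

The order of the tests follows the rules above: the distance from the face space is checked first, and the distance to the nearest class only matters for images that pass it.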
Experimental results
The eigenface-based face recognition method was tested on the ORL face database. 150 images of 15 individuals were selected for the experiments.
In the training stage, three images of each individual were used as training samples, forming a training set of 45 images.
Figure: the average face of the training set
Figure: the first 15 eigenfaces, corresponding to the 15 largest eigenvalues
The recognition rate depends on the training images: when only single-view images are used for training, recognition is much worse.
Figure: recognition rate
A further test used faces with calm expressions in the training stage and faces of the same individuals with various expressions in the testing stage.
Figure: training images and test images; the lower images are projections in the face space
CONCLUSIONS
The eigenfaces method treats images globally; no local information is used. Compression is done at the global level. The method requires a lot of computation, but the results are good.
Explanation of the good results:
images are represented as combinations of "simple" images, and the system is trained on them.
• THERE ARE MANY OTHER METHODS FOR OBJECT RECOGNITION AND REPRESENTATION. THEY CAN BE CLASSIFIED AS
- STRUCTURAL DESCRIPTIONS (WE ALREADY MENTIONED CHAIN CODES)
- TRANSFORM METHODS
- TRAINING/LEARNING METHODS
BUT THERE ARE ALSO METHODS BASED ON CLEVER TRICKS WHICH WORK VERY WELL… NEXT
• A TRANSFORM METHOD
HERE WE TRY TO TRANSFORM THE
PICTURE (OR OBJECT INFORMATION)
TO SOME OTHER DOMAIN TO GET
INFORMATION IN MORE CONVENIENT
FORM.
• THE METHOD OF MOMENTS
MOMENTS OF ORDER p,q ARE DEFINED AS
$m_{pq} = \int\int x^p y^q f(x,y)\,dx\,dy, \quad p, q = 0, 1, 2, \ldots$
THE MOMENTS OF ORDER 1, DIVIDED BY $m_{00}$, GIVE FOR PHYSICAL OBJECTS THE CENTER OF GRAVITY. MOMENTS MEASURED RELATIVE TO THIS CENTER DO NOT DEPEND ON WHERE THE OBJECT IS LOCATED - THEY ARE THUS INVARIANT TO LOCATION
• CENTRAL MOMENTS
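$\mu_{pq} = \int\int (x - \bar{x})^p (y - \bar{y})^q f(x,y)\,dx\,dy, \quad p, q = 0, 1, 2, \ldots$
WHERE
$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}$
FOR DIGITAL IMAGES
$\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x,y)$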
• HIGHER ORDER CENTRAL MOMENTS
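$\mu_{20} = m_{20} - \bar{x}\, m_{10}$
$\mu_{02} = m_{02} - \bar{y}\, m_{01}$
$\mu_{11} = m_{11} - \bar{y}\, m_{10}$
$\mu_{30} = m_{30} - 3\bar{x}\, m_{20} + 2\bar{x}^2 m_{10}$
$\mu_{12} = m_{12} - 2\bar{y}\, m_{11} - \bar{x}\, m_{02} + 2\bar{y}^2 m_{10}$
$\mu_{21} = m_{21} - 2\bar{x}\, m_{11} - \bar{y}\, m_{20} + 2\bar{x}^2 m_{01}$
... AND SO ON...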
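• NEXT, NORMALIZED CENTRAL MOMENTS ARE CREATED:
$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p+q}{2} + 1$
AND INVARIANT MOMENTS:
$\phi_1 = \eta_{20} + \eta_{02}$
$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$
OTHER MOMENTS $\phi_3, \phi_4, \phi_5, \ldots$ CAN BE DEFINED TOO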
• THESE MOMENTS ARE INVARIANT TO TRANSLATION, ROTATION, AND SCALE CHANGE
THUS WHEN MOMENTS ARE CALCULATED, THEY WILL NOT CHANGE WHEN THE OBJECT ROTATES OR CHANGES SIZE. THIS IS A VERY DESIRABLE FEATURE.
HOWEVER, MOMENTS ARE SENSITIVE TO NOISE AND ILLUMINATION CHANGE
• EXAMPLE: ROTATED AND SCALED OBJECT
HERE THE MOMENTS CALCULATION IS SHOWN. PLEASE NOTE THAT FOR THE TRANSFORMED PICTURE THE MOMENTS ARE CONSTANT
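The invariance can be checked on a tiny digital image. The sketch below is a minimal plain-Python illustration with hypothetical function names; the image is a list of rows, x is the column index and y the row index. It computes the raw moments, the central moments, and the first two invariant moments.

```python
def raw_moment(img, p, q):
    """m_pq = sum over x, y of x^p * y^q * f(x, y)."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def centroid(img):
    """Center of gravity (x_bar, y_bar) from the first-order moments."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00

def central_moment(img, p, q):
    """mu_pq: the same sum taken relative to the center of gravity."""
    xb, yb = centroid(img)
    return sum(((x - xb) ** p) * ((y - yb) ** q) * v
               for y, row in enumerate(img) for x, v in enumerate(row))

def invariants(img):
    """First two invariant moments built from normalized central moments."""
    mu00 = central_moment(img, 0, 0)
    def eta(p, q):
        return central_moment(img, p, q) / mu00 ** ((p + q) / 2 + 1)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2
```

For an L-shaped test image, shifting it or rotating it by 90 degrees leaves both values unchanged up to rounding. Scaling on a pixel grid preserves them only approximately, because each pixel has a finite size.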
• PRACTICAL METHODS FOR DEALING WITH VISUAL OBJECTS:
- THEY ARE BASED ON TRICKS THAT MAKE THEM WORK VERY WELL FOR A SPECIFIC PROBLEM, BUT THEY ARE NOT GENERAL
WE ILLUSTRATE THIS WITH THE EXAMPLE OF A PRACTICAL FACE TRACKING SYSTEM
• WHAT IS FACE TRACKING?
THERE IS A CAMERA IN FRONT OF THE PC AND SOFTWARE WHICH MARKS THE FACE LOCATION AND POSITION OF THE USER SITTING AT THE DISPLAY
HERE WE DESCRIBE A METHOD AND SYSTEM FOR FACE TRACKING WHICH IS QUITE SIMPLE, ROBUST AND RUNS IN REAL TIME ON A PC!
THE METHOD IS BASED ON FACE COLOR HISTOGRAM STATISTICS AND MOMENTS
HERE IS THE BLOCK DIAGRAM OF THE FACE TRACKING ALGORITHM.
FIRST THE COLOR IMAGE IS CONVERTED TO HUE, SATURATION, INTENSITY. NEXT THE SKIN COLOR HISTOGRAM IS CALCULATED. FINALLY MOMENTS ARE CALCULATED AND THE WINDOW SIZE IS ADJUSTED ITERATIVELY
• SKIN COLOR HISTOGRAM
COLOR = HUE IN THE HSI REPRESENTATION
PEOPLE HAVE THE SAME SKIN COLOR (HUE), ONLY SATURATION IS DIFFERENT
SATURATION LEVELS CHANGE
HERE IS THE DISTRIBUTION OF PLACES CORRESPONDING TO FACE "COLOR"
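One way to obtain such a distribution is to build a hue histogram from sample skin pixels and then look up, for every image pixel, how common its hue is in that histogram. The sketch below is only an illustration under stated assumptions: the standard-library `colorsys.rgb_to_hsv` stands in for the HSI conversion (its hue is close to, but not identical with, the HSI hue), the 16 bins are an arbitrary choice, and the function names are hypothetical.

```python
import colorsys

def hue_bin(pixel, bins):
    """Index of the hue bin for an (R, G, B) pixel with 0..255 channels."""
    r, g, b = pixel
    hue = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0]   # hue in [0, 1)
    return min(int(hue * bins), bins - 1)

def skin_hue_histogram(skin_pixels, bins=16):
    """Hue histogram of sample skin pixels, scaled so the largest bin equals 1."""
    hist = [0] * bins
    for pixel in skin_pixels:
        hist[hue_bin(pixel, bins)] += 1
    peak = max(hist)
    return [count / peak for count in hist] if peak else [0.0] * bins

def back_project(image, hist):
    """Replace every pixel by the histogram value of its hue: the function I(x, y)."""
    return [[hist[hue_bin(pixel, len(hist))] for pixel in row] for row in image]
```

Pixels whose hue matches the skin samples get values near 1 and everything else gets values near 0, which gives a two-dimensional function that moments can be computed on.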
COLOR IS A GOOD FEATURE IF WE HAVE A COLOR CAMERA. HAVING THE FACE COLOR DISTRIBUTION WE CAN TREAT IT AS A TWO-DIMENSIONAL FUNCTION I(x,y).
FIRST WE SELECT A WINDOW OF CERTAIN SIZE. NEXT WE CALCULATE THE ZEROTH AND FIRST MOMENTS IN THIS WINDOW:
$m_{00} = \sum_x \sum_y I(x,y)$
$m_{10} = \sum_x \sum_y x\,I(x,y), \quad m_{01} = \sum_x \sum_y y\,I(x,y)$
NEXT THE NEW CENTER OF THE WINDOW IS CALCULATED:
$x_c = \frac{m_{10}}{m_{00}}, \quad y_c = \frac{m_{01}}{m_{00}}$
AFTER ITERATING THIS CALCULATION THE ALGORITHM WILL CONVERGE TO A SPECIFIC POSITION
HOW IS THE WINDOW SIZE SELECTED?
IT DEPENDS ON THE SIZE OF THE FACE. THUS IT IS ADJUSTED ITERATIVELY, STARTING WITH SIZE 3
WE THEN SELECT THE WINDOW SIZE TO BE
$2 m_{00} / (\text{max pixel value})$
BY THIS, THE WINDOW POSITION AND SIZE ARE CONTINUOUSLY ADAPTED UNTIL THEY STABILIZE
THIS CAN THUS BE USED FOR FACE TRACKING
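The iteration can be sketched as below, on a small skin-probability image given as a list of rows. This is an illustration under loudly stated assumptions, not the system's actual code: the window is a square described by its center and half-width, and the half-width is set to sqrt(m00 / max pixel value), so the side is roughly 2*sqrt(m00 / max pixel value). The square root is an assumption made here so that the side scales with the square root of the area that m00 measures. The function names are hypothetical.

```python
import math

def window_moments(prob, cx, cy, half):
    """Zeroth and first moments of prob inside the square window centred at (cx, cy)."""
    m00 = m10 = m01 = 0.0
    for y in range(max(0, cy - half), min(len(prob), cy + half + 1)):
        for x in range(max(0, cx - half), min(len(prob[0]), cx + half + 1)):
            v = prob[y][x]
            m00 += v
            m10 += x * v
            m01 += y * v
    return m00, m10, m01

def track_window(prob, cx, cy, half=1, max_value=1.0, max_iter=50):
    """Move and resize the window until its center and size stop changing."""
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(prob, cx, cy, half)
        if m00 == 0:
            break                                  # nothing skin-coloured in the window
        new_cx = int(m10 / m00 + 0.5)              # new center = center of gravity
        new_cy = int(m01 / m00 + 0.5)
        # Assumed size rule: half-width = sqrt(m00 / max pixel value), at least 1
        new_half = max(1, int(math.sqrt(m00 / max_value) + 0.5))
        if (new_cx, new_cy, new_half) == (cx, cy, half):
            break                                  # stabilized
        cx, cy, half = new_cx, new_cy, new_half
    return cx, cy, half
```

Starting with `half=1` corresponds to the initial window size of 3. On a single bright blob, the window first drifts toward the blob and grows, then stops at the blob's center once the whole blob is inside it.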
• THIS PROCESS IS ILLUSTRATED HERE. THE START IS FROM A SMALL WINDOW SIZE; THE SIZE IS ADJUSTED AND THE CENTER OF THE WINDOW IS MOVED UNTIL IT STABILIZES
HERE THE FACE HAS MOVED. IN THE NEXT PICTURE THE WINDOW WILL ALSO MOVE TO THE NEW POSITION
• THIS ALGORITHM IS SURPRISINGLY ROBUST
NOISE DOES NOT HARM IT
AND AS WE CAN SEE IT IS ROBUST AGAINST DISTRACTORS: ANOTHER FACE ON THE LEFT, A HAND ON THE RIGHT
• THE METHOD CAN ALSO BE USED FOR EVALUATION OF HEAD ROLL, WIDTH AND LENGTH
• PARAMETERS FOR HEAD POSITION CAN BE CALCULATED BASED ON THE SYMMETRY OF LENGTH L AND WIDTH W
THIS SYSTEM CAN BE USED FOR FACE TRACKING, E.G. FOR AN INTERFACE TO COMPUTER GAMES
ANOTHER EXAMPLE: AMBULATORY VIDEO
A COMPUTER WITH A CAMERA, WEARABLE BY THE USER
THE GOAL IS TO BUILD A COMPUTER WHICH WILL KNOW WHERE THE USER IS
The user is wearing a small camera attached e.g. to the head. The camera produces circular pictures which are not very good but good enough
HOW TO RECOGNIZE WHERE THE USER IS? (E.G. ROOM, STREET)
FIRST, SPLIT THE VIDEO INTO LIGHT INTENSITY I AND CHROMINANCES IN A VERY APPROXIMATE WAY:
I = R+G+B, Cr = R/I, Cg = G/I
SECOND, SEGMENT THE PICTURE INTO REGIONS AND CALCULATE PARAMETERS FOR EACH: MEAN AND COVARIANCE
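The two steps can be sketched for a single region as follows. This is a minimal illustration with hypothetical function names: the region is simply a list of (R, G, B) pixels, black pixels get chrominance 0 by convention, and the covariance is the population covariance (divided by the number of pixels); all three choices are assumptions, and the segmentation itself is not shown.

```python
def chrominance(pixel):
    """Approximate split of an (R, G, B) pixel into intensity I and chrominances Cr, Cg."""
    r, g, b = pixel
    i = r + g + b
    if i == 0:
        return 0, 0.0, 0.0        # black pixel: chrominance undefined, set to 0 (assumption)
    return i, r / i, g / i

def region_statistics(pixels):
    """Mean vector and 3 x 3 covariance matrix of (I, Cr, Cg) over one region."""
    feats = [chrominance(p) for p in pixels]
    n = len(feats)
    mean = [sum(f[d] for f in feats) / n for d in range(3)]
    cov = [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in feats) / n
            for b in range(3)]
           for a in range(3)]
    return mean, cov
```

Two pixels of the same color at different brightness share the same Cr and Cg, so only the intensity contributes to the variance of such a region.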
• FOR EACH ENVIRONMENT THERE WILL BE DIFFERENT STATISTICAL DISTRIBUTIONS OF SIGNALS. WE CAN USE THEM TO FIND TO WHICH CLASS THE RECORDED VIDEO BELONGS
FOR 2 HOURS OF RECORDING THE RESULTS ARE VERY GOOD
Label     Correlation Coeff.
Office    0.9124
Lobby     0.7914
Bedroom   0.8620
Cashier   0.8325
OVERALL CONCLUSION
• WE ARE LACKING A GENERAL SOLUTION TO OBJECT REPRESENTATION AND RECOGNITION PROBLEMS WHICH WOULD BE AS EFFECTIVE AS THE BIOLOGICAL SYSTEMS
• THERE ARE MANY APPROACHES TO A SOLUTION; WE PRESENTED AN APPROACH BASED ON STATISTICS OF QUANTIZED BLOCK TRANSFORM FEATURES
• THERE ARE APPROACHES BASED ON CLEVER TRICKS WHICH WORK WELL FOR SPECIFIC PROBLEMS