Structural information

• Structural information deals with the geometry of objects

We can handle only very limited amounts of structural information

How do we interpret structural information? We showed before that this is a difficult problem

We will introduce it using the SHAPE CONTEXT method

We now take a very difficult case

Handwriting is very difficult: we recognize numbers easily even if they are very distorted. Which algorithms achieve this?

We think that first the contour of the object is detected, as illustrated below

Next, we think that the locations of the points on the contour determine the geometry of the object

• We need thus to measure the location of EACH contour point RELATIVE to all other points. In other words we need vectors from a point to all other points.

For example, for point Z we need all 6 red vectors. Having all vectors for all points describes the object, but the description is very complicated

Z

So now we reduce the description by using an APPROXIMATE polar coordinate net. The center of the net is placed at each point in turn, and we only count HOW MANY other points fall in each area of the net.

Shape histogram

• The shape histogram of a contour point a_i is denoted by H_i. It is a vector obtained from the polar net by counting the number of points in each area (bin):

$$H_i = \{\, h_i(k) = \#(\text{points in bin } k),\quad k = 1, \dots, m \,\}$$

For a contour with M points we obtain a list of M histograms, each with m bins.

Two contours are similar if the sum of differences between their histograms is small.
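The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `shape_histograms`, the 5 x 12 log-polar net, the ring radii, and the normalisation of distances by their mean are all assumptions chosen for the example.

```python
import numpy as np

def shape_histograms(points, n_r=5, n_theta=12):
    """Return one polar-net histogram per contour point (shape: M x n_r*n_theta)."""
    pts = np.asarray(points, dtype=float)
    M = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]          # diff[i, j] = vector from point i to point j
    dist = np.hypot(diff[..., 0], diff[..., 1])
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    # Assumption: distances are divided by their mean so the net scales with the object.
    off_diag = ~np.eye(M, dtype=bool)
    dist = dist / dist[off_diag].mean()

    # Assumption: ring edges are spaced logarithmically between 1/8 and 2 mean distances.
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.searchsorted(r_edges, dist, side="right") - 1   # -1 or n_r means outside the net
    t_bin = np.minimum((angle / (2 * np.pi) * n_theta).astype(int), n_theta - 1)

    hists = np.zeros((M, n_r * n_theta), dtype=int)
    for i in range(M):
        inside = off_diag[i] & (r_bin[i] >= 0) & (r_bin[i] < n_r)
        bins = r_bin[i][inside] * n_theta + t_bin[i][inside]
        hists[i] = np.bincount(bins, minlength=n_r * n_theta)   # count points per area
    return hists

square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
H = shape_histograms(square)
print(H.shape)          # one histogram per contour point
```

Because only vectors between points are used, moving the whole contour leaves every histogram unchanged.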

Histogram differences

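$$|H_i - H_j| = \sum_{k=1}^{m} \left| H_i(k) - H_j(k) \right|$$

where H_i(k) is the count in bin k of histogram H_i and m is the number of bins.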

These are the differences for two points i and j. Taking the differences for all contour points gives the difference between contours. Two contours which are very similar will have a very small difference.
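As a small worked example, the difference can be computed as below. The sketch assumes that absolute differences are summed and that row i of each histogram array describes corresponding points on the two contours; finding that correspondence is a separate problem not covered here. The function names are hypothetical.

```python
import numpy as np

def histogram_difference(h_i, h_j):
    """Sum over bins of |H_i(k) - H_j(k)| for two contour points."""
    h_i = np.asarray(h_i, dtype=np.int64)
    h_j = np.asarray(h_j, dtype=np.int64)
    return int(np.abs(h_i - h_j).sum())

def contour_difference(hists_a, hists_b):
    """Sum of per-point differences (assumes rows are already matched point to point)."""
    a = np.asarray(hists_a, dtype=np.int64)
    b = np.asarray(hists_b, dtype=np.int64)
    if a.shape != b.shape:
        raise ValueError("contours must have the same number of points and bins")
    return int(np.abs(a - b).sum())

contour_a = [[2, 0, 1], [0, 3, 0]]      # two points, three bins each
contour_b = [[1, 0, 1], [0, 1, 2]]
print(histogram_difference(contour_a[0], contour_b[0]))   # 1
print(contour_difference(contour_a, contour_b))           # 5
```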

Example: Below we can see contours with marked points and examples of histograms for those points

Example: Here we see handwritten numbers and histograms of contour points marked in grey levels

Here we can see contours with points and the polar net with areas marked in different colours

What counts is the number of points in each area; these counts form the histogram

Other methods - examples

• There are hundreds of other methods for object retrieval and recognition

It is impossible to lecture about all of them since they are based on different principles.

To illustrate this, we can look at an example of one of the best methods currently known. This is the method of eigenfaces, which uses a completely different principle.

EIGENFACES – global method

1. Construction of Face Space

Suppose a face image consists of N pixels, so it can be represented by a vector $\Gamma$ of dimension N. Let $\{\Gamma_i \mid i = 1, \dots, M\}$ be the training set of face images. The average face of these M images is given by

$$\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i$$

Then each face $\Gamma_i$ differs from the average face $\Psi$ by $\Phi_i$:

$$\Phi_i = \Gamma_i - \Psi, \quad i = 1, \dots, M$$

Now the covariance matrix of the training images can be constructed:

$$C = A A^T$$

where $A = [\Phi_1, \Phi_2, \dots, \Phi_M]$ is the N by M matrix of difference faces. The basis vectors of the face space, i.e., the eigenfaces, are then the orthogonal eigenvectors of the covariance matrix $C$.

Because the number of training images is usually less than the number of pixels in an image, there will be only M-1, instead of N, meaningful eigenvectors.

Eigenvalues, eigenvectors

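x is an eigenvector of a square matrix A, and $\lambda$ is the corresponding eigenvalue, if

$$A x = \lambda x, \quad x \neq 0$$

(here A denotes any square matrix).

$$B = S A S^{-1}$$

If S is a nonsingular n x n matrix, then the matrix B has the same eigenvalues as A.

An n x n matrix has n eigenvalues (counted with multiplicity).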
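A quick numerical check of both statements, using NumPy. The matrices are random and serve only as an illustration; A is made symmetric so that its eigenvalues are real, and S is kept close to the identity so that it is safely nonsingular.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = A + A.T                                   # symmetric, so eigenvalues are real
S = np.eye(4) + 0.1 * rng.normal(size=(4, 4)) # nonsingular, well conditioned
B = S @ A @ np.linalg.inv(S)

eig_A, vec_A = np.linalg.eigh(A)              # eigenvalues in ascending order
eig_B = np.sort(np.linalg.eigvals(B).real)

# A x = lambda x holds for every eigenpair of A.
pairs_ok = np.allclose(A @ vec_A, vec_A * eig_A)
# B = S A S^-1 has the same eigenvalues as A.
same_eigenvalues = np.allclose(eig_A, eig_B)
print(pairs_ok, same_eigenvalues)
```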

EIGENFACES

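Therefore, the eigenfaces are computed by first finding the eigenvectors, $v_l$ ($l = 1, \dots, M$), of the M by M matrix L:

$$L = A^T A, \quad L_{mn} = \Phi_m^T \Phi_n$$

where $A = [\Phi_1, \dots, \Phi_M]$ holds the difference faces $\Phi_k$ as columns. This works because if $L v_l = \lambda_l v_l$, then $A A^T (A v_l) = \lambda_l (A v_l)$, so $A v_l$ is an eigenvector of $C = A A^T$.

The eigenvectors, $u_l$, of the matrix $C$ are then expressed by a linear combination of the difference face images, $\Phi_k$, weighted by $v_{lk}$:

$$u_l = \sum_{k=1}^{M} v_{lk} \Phi_k, \quad l = 1, \dots, M$$

In practice, a smaller set of M' (M' < M) eigenfaces is sufficient for face identification. Hence, only M' significant eigenvectors of L, corresponding to the largest M' eigenvalues, are selected for the eigenface computation.

Thus further data compression can be obtained. M' is determined by a threshold, $\theta_\lambda$, of the ratio of the eigenvalue summation:

$$\frac{\sum_{l=1}^{M'} \lambda_l}{\sum_{l=1}^{M} \lambda_l} \ge \theta_\lambda$$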
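The construction of the face space can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the code behind the experiments: the function name `build_face_space`, the default threshold 0.9, the cut-off for numerically zero eigenvalues, and the unit-length normalisation of each eigenface are choices made for the example.

```python
import numpy as np

def build_face_space(faces, theta=0.9):
    """faces: array of shape (M, N), one flattened image per row.
    Returns the average face (N,), the eigenfaces U (N, M') and their eigenvalues."""
    gammas = np.asarray(faces, dtype=float)
    psi = gammas.mean(axis=0)                 # average face
    A = (gammas - psi).T                      # N x M matrix of difference faces
    L = A.T @ A                               # small M x M matrix instead of N x N
    lam, V = np.linalg.eigh(L)                # ascending eigenvalues, orthonormal columns
    order = np.argsort(lam)[::-1]             # largest eigenvalue first
    lam, V = lam[order], V[:, order]

    # At most M-1 eigenvalues are meaningful; drop the numerically zero ones.
    keep = lam > 1e-10 * lam[0]
    lam, V = lam[keep], V[:, keep]

    # Smallest M' whose share of the eigenvalue sum reaches the threshold.
    ratio = np.cumsum(lam) / lam.sum()
    m_prime = min(int(np.searchsorted(ratio, theta)) + 1, len(lam))

    U = A @ V[:, :m_prime]                    # eigenfaces as combinations of difference faces
    U /= np.linalg.norm(U, axis=0)            # assumption: scale each eigenface to unit length
    return psi, U, lam[:m_prime]

rng = np.random.default_rng(0)
faces = rng.normal(size=(6, 50))              # stand-in data: 6 "images" of 50 pixels
psi, U, lam = build_face_space(faces, theta=1.0)
print(U.shape)                                # at most M-1 eigenfaces survive
```

Working with the M by M matrix keeps the eigenvalue problem small even when each image has many thousands of pixels.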

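In the training stage, the face of each known individual, $\Gamma_k$, is projected into the face space and an M'-dimensional vector, $\Omega_k$, is obtained:

$$\Omega_k = U^T (\Gamma_k - \Psi), \quad k = 1, \dots, N_c$$

where $U = [u_1, \dots, u_{M'}]$ is the matrix of selected eigenfaces, $\Psi$ is the average face, and $N_c$ is the number of face classes.

A distance threshold, $\theta_c$, that defines the maximum allowable distance from a face class as well as from the face space, is set up by computing half the largest distance between any two face classes:

$$\theta_c = \frac{1}{2} \max_{j,k} \| \Omega_j - \Omega_k \|, \quad j, k = 1, \dots, N_c \qquad (8)$$

In the recognition stage, a new image, $\Gamma$, is projected into the face space to obtain a vector, $\Omega$:

$$\Omega = U^T (\Gamma - \Psi)$$

The distance of $\Omega$ to each face class is defined by

$$\varepsilon_k^2 = \| \Omega - \Omega_k \|^2, \quad k = 1, \dots, N_c$$

For the purpose of discriminating between face images and non-face-like images, the distance, $\varepsilon$, between the original image, $\Gamma$, and its reconstructed image from the eigenface space, $\Gamma_f$, is also computed:

$$\varepsilon^2 = \| \Gamma - \Gamma_f \|^2$$

where $\Gamma_f = U \Omega + \Psi$.

These distances are compared with the threshold given in equation (8) and the input image is classified by the following rules:
• IF $\varepsilon \ge \theta_c$ THEN the input image is not a face image;
• IF $\varepsilon < \theta_c$ AND $\varepsilon_k \ge \theta_c$ for all k THEN the input image contains an unknown face;
• IF $\varepsilon < \theta_c$ AND $\varepsilon_{k^*} = \min_k \varepsilon_k < \theta_c$ THEN the input image contains the face of individual $k^*$.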
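The three rules can be written down directly. The sketch below assumes the eigenfaces are stored as orthonormal columns of a matrix `U` and the class vectors as rows of an array; the function names and the toy data are hypothetical and only illustrate the decision logic.

```python
import numpy as np

def class_threshold(class_vectors):
    """Half the largest distance between any two face classes."""
    omegas = np.asarray(class_vectors, dtype=float)
    d = np.linalg.norm(omegas[:, None, :] - omegas[None, :, :], axis=2)
    return 0.5 * d.max()

def classify(image, psi, U, class_vectors, theta_c):
    """Return 'not a face', 'unknown face', or the index of the matching face class."""
    phi = np.asarray(image, dtype=float) - psi
    omega = U.T @ phi                              # projection into the face space
    eps = np.linalg.norm(phi - U @ omega)          # distance from the face space
    eps_k = np.linalg.norm(np.asarray(class_vectors, dtype=float) - omega, axis=1)
    if eps >= theta_c:
        return "not a face"
    k = int(np.argmin(eps_k))
    return k if eps_k[k] < theta_c else "unknown face"

# Toy face space: 4-pixel "images", two eigenfaces, two known individuals.
psi = np.zeros(4)
U = np.eye(4)[:, :2]
classes = np.array([[0.0, 0.0], [10.0, 0.0]])
theta_c = class_threshold(classes)                 # half of 10

print(classify([0.5, 0, 0, 0], psi, U, classes, theta_c))   # close to class 0
print(classify([5, 20, 0, 0], psi, U, classes, theta_c))    # in face space, far from all classes
print(classify([0, 0, 9, 0], psi, U, classes, theta_c))     # far from the face space
```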

Experimental results

The eigenface-based face recognition method was tested on the ORL face database. 150 images of 15 individuals were selected for the experiments.

In the training stage, three images of each individual were used as the training samples, forming a training set totalling 45 images.

The average face of the training set


The first 15 eigenfaces corresponding to the 15 largest eigenvalues.


The recognition rate depends on the training images: when single-view images are used for training, recognition is much worse

Recognition rate


Faces with calm expressions in the training stage and faces of the same individual but with various expressions in the testing stage

Training images

Test images

The lower images are projections in the face space


CONCLUSIONS

The eigenfaces method treats images globally; no local information is used. Compression is done at the global level. The method requires a lot of computation, but the results are good.

Explanation of good results:

images are represented as combinations of "simple" images, and the system is trained on them.

• THERE ARE MANY OTHER METHODS FOR OBJECT RECOGNITION AND REPRESENTATION. THEY CAN BE CLASSIFIED AS

- STRUCTURAL DESCRIPTIONS (WE ALREADY MENTIONED CHAIN CODES)

- TRANSFORM METHODS

- TRAINING/LEARNING METHODS

BUT THERE ARE ALSO METHODS BASED ON CLEVER TRICKS WHICH WORK VERY WELL… NEXT

• A TRANSFORM METHOD

HERE WE TRY TO TRANSFORM THE PICTURE (OR OBJECT INFORMATION) TO SOME OTHER DOMAIN TO GET THE INFORMATION IN A MORE CONVENIENT FORM.

• THE METHOD OF MOMENTS

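MOMENTS OF ORDER p, q ARE DEFINED AS

$$m_{pq} = \int\!\!\int x^p y^q f(x, y)\, dx\, dy, \quad p, q = 0, 1, 2, \dots$$

FOR PHYSICAL OBJECTS, THE MOMENTS OF ORDER 1 DIVIDED BY $m_{00}$ GIVE THE CENTER OF GRAVITY. ITS POSITION WITHIN THE OBJECT DOES NOT DEPEND ON WHERE THE OBJECT IS LOCATED, SO MOMENTS MEASURED RELATIVE TO IT ARE INVARIANT TO LOCATION.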
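For a digital image the integrals become sums over pixels. The following sketch computes raw moments and the center of gravity; the function names and the convention that x is the column index and y the row index are assumptions of the example.

```python
import numpy as np

def raw_moment(f, p, q):
    """m_pq = sum_x sum_y x^p y^q f(x, y); x is the column index, y the row index."""
    f = np.asarray(f, dtype=float)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    return float((x**p * y**q * f).sum())

def centroid(f):
    """Center of gravity (x, y) from the zeroth and first moments."""
    m00 = raw_moment(f, 0, 0)
    return raw_moment(f, 1, 0) / m00, raw_moment(f, 0, 1) / m00

image = np.zeros((8, 8))
image[2:4, 3:6] = 1                       # a 3 x 2 rectangle of ones
moved = np.roll(image, (3, 2), axis=(0, 1))

print(centroid(image))                    # (4.0, 2.5)
print(centroid(moved))                    # (6.0, 5.5)
```

The centroid moves by exactly the same amount as the object, which is why moments taken about it do not change under translation.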

• CENTRAL MOMENTS

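$$\mu_{pq} = \int\!\!\int (x - \bar{x})^p (y - \bar{y})^q f(x, y)\, dx\, dy, \quad p, q = 0, 1, 2, \dots$$

WHERE

$$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}$$

FOR DIGITAL IMAGES

$$\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q f(x, y)$$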

• HIGHER ORDER CENTRAL MOMENTS

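$$\mu_{11} = m_{11} - \bar{y}\, m_{10}$$

$$\mu_{20} = m_{20} - \bar{x}\, m_{10}$$

$$\mu_{02} = m_{02} - \bar{y}\, m_{01}$$

$$\mu_{30} = m_{30} - 3\bar{x}\, m_{20} + 2\bar{x}^2 m_{10}$$

$$\mu_{12} = m_{12} - 2\bar{y}\, m_{11} - \bar{x}\, m_{02} + 2\bar{y}^2 m_{10}$$

$$\mu_{21} = m_{21} - 2\bar{x}\, m_{11} - \bar{y}\, m_{20} + 2\bar{x}^2 m_{01}$$

... AND SO ON...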
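The closed-form expressions can be checked numerically against the definition of the central moments. The sketch below uses the standard textbook forms of these expressions (for example, mu_20 = m_20 - x_bar * m_10) on a random test image; the function names are hypothetical.

```python
import numpy as np

def raw_moment(f, p, q):
    """m_pq for a digital image; x is the column index, y the row index."""
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    return float((x**p * y**q * f).sum())

def central_moment(f, p, q):
    """mu_pq computed directly from its definition."""
    m00 = raw_moment(f, 0, 0)
    xb, yb = raw_moment(f, 1, 0) / m00, raw_moment(f, 0, 1) / m00
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    return float(((x - xb)**p * (y - yb)**q * f).sum())

def central_from_raw(f):
    """The same central moments via closed-form expressions in the raw moments."""
    m = {(p, q): raw_moment(f, p, q) for p in range(4) for q in range(4) if p + q <= 3}
    xb, yb = m[1, 0] / m[0, 0], m[0, 1] / m[0, 0]
    return {
        (1, 1): m[1, 1] - yb * m[1, 0],
        (2, 0): m[2, 0] - xb * m[1, 0],
        (0, 2): m[0, 2] - yb * m[0, 1],
        (3, 0): m[3, 0] - 3 * xb * m[2, 0] + 2 * xb**2 * m[1, 0],
        (1, 2): m[1, 2] - 2 * yb * m[1, 1] - xb * m[0, 2] + 2 * yb**2 * m[1, 0],
        (2, 1): m[2, 1] - 2 * xb * m[1, 1] - yb * m[2, 0] + 2 * xb**2 * m[0, 1],
    }

rng = np.random.default_rng(1)
image = rng.random((6, 8))                 # arbitrary grey-level test image
closed = central_from_raw(image)
agree = all(np.isclose(central_moment(image, p, q), v) for (p, q), v in closed.items())
print(agree)
```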

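• NEXT, NORMALIZED CENTRAL MOMENTS ARE CREATED:

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p + q}{2} + 1$$

AND INVARIANT MOMENTS:

$$\phi_1 = \eta_{20} + \eta_{02}$$

$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$

OTHER MOMENTS $\phi_3, \phi_4, \phi_5, \dots$ CAN BE DEFINED TOO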

• THESE MOMENTS ARE INVARIANT TO TRANSLATION, ROTATION, AND SCALE CHANGE

THUS ONCE THE MOMENTS ARE CALCULATED, THEY WILL NOT CHANGE WHEN THE OBJECT MOVES, ROTATES OR CHANGES SIZE. THIS IS A VERY DESIRABLE FEATURE.

HOWEVER, MOMENTS ARE SENSITIVE TO NOISE AND ILLUMINATION CHANGE

• EXAMPLE: ROTATED AND SCALED OBJECT

HERE THE MOMENT CALCULATION IS SHOWN. PLEASE NOTE THAT FOR THE TRANSFORMED PICTURE THE MOMENTS STAY CONSTANT
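The invariance can be tried out on a small synthetic object. This sketch computes the first two invariant moments with the standard normalisation exponent (p + q)/2 + 1 and compares them for a shifted, a rotated (by 90 degrees) and an enlarged (pixel-doubled) copy; the L-shaped test object and the function name are assumptions. On a pixel grid the scale invariance holds only approximately, so that comparison uses a 1% tolerance.

```python
import numpy as np

def invariant_moments(f):
    """Return (phi_1, phi_2) of a grey-level image f."""
    f = np.asarray(f, dtype=float)
    y, x = np.mgrid[:f.shape[0], :f.shape[1]]
    m00 = f.sum()
    xb, yb = (x * f).sum() / m00, (y * f).sum() / m00

    def eta(p, q):
        # Normalised central moment: mu_pq / mu_00^((p+q)/2 + 1)
        mu = ((x - xb)**p * (y - yb)**q * f).sum()
        return mu / m00 ** ((p + q) / 2 + 1)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2
    return phi1, phi2

shape = np.zeros((40, 40))
shape[5:25, 8:14] = 1
shape[19:25, 8:30] = 1                     # an L-shaped object

original = invariant_moments(shape)
shifted = invariant_moments(np.roll(shape, (7, 5), axis=(0, 1)))
rotated = invariant_moments(np.rot90(shape))
scaled = invariant_moments(np.kron(shape, np.ones((2, 2))))   # every pixel becomes 2 x 2

print(np.allclose(shifted, original), np.allclose(rotated, original))
print(np.allclose(scaled, original, rtol=1e-2))
```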

• PRACTICAL METHODS FOR DEALING WITH VISUAL OBJECTS:

- THEY ARE BASED ON TRICKS THAT MAKE THEM WORK VERY WELL FOR A SPECIFIC PROBLEM, BUT THEY ARE NOT GENERAL

WE ILLUSTRATE THIS WITH THE EXAMPLE OF A PRACTICAL FACE TRACKING SYSTEM

• WHAT IS FACE TRACKING?

THERE IS A CAMERA IN FRONT OF THE PC AND SOFTWARE THAT MARKS THE LOCATION AND POSITION OF THE FACE OF THE USER SITTING AT THE DISPLAY

HERE WE DESCRIBE A METHOD AND SYSTEM FOR FACE TRACKING WHICH IS QUITE SIMPLE, ROBUST AND RUNS IN REAL TIME ON A PC!

THE METHOD IS BASED ON FACE COLOR HISTOGRAM STATISTICS AND MOMENTS

HERE IS THE BLOCK DIAGRAM OF THE FACE TRACKING ALGORITHM.

FIRST THE COLOR IMAGE IS CONVERTED TO HUE, SATURATION, INTENSITY. NEXT THE SKIN COLOR HISTOGRAM IS CALCULATED. FINALLY MOMENTS ARE CALCULATED AND THE WINDOW SIZE IS ADJUSTED ITERATIVELY

• SKIN COLOR HISTOGRAM

COLOR = HUE IN THE HSI REPRESENTATION

PEOPLE HAVE THE SAME SKIN COLOR (HUE); ONLY THE SATURATION IS DIFFERENT

SATURATION LEVELS CHANGE

HERE IS THE DISTRIBUTION OF PLACES CORRESPONDING TO FACE "COLOR"

COLOR IS A GOOD FEATURE IF WE HAVE A COLOR CAMERA. HAVING THE FACE COLOR DISTRIBUTION, WE CAN TREAT IT AS A TWO-DIMENSIONAL FUNCTION I(x,y).

FIRST WE SELECT A WINDOW OF A CERTAIN SIZE. NEXT WE CALCULATE THE ZEROTH AND FIRST MOMENTS IN THIS WINDOW:

$$m_{00} = \sum_x \sum_y I(x, y)$$

$$m_{10} = \sum_x \sum_y x\, I(x, y), \quad m_{01} = \sum_x \sum_y y\, I(x, y)$$

NEXT THE NEW CENTER OF THE WINDOW IS CALCULATED:

$$x_c = \frac{m_{10}}{m_{00}}, \quad y_c = \frac{m_{01}}{m_{00}}$$

AFTER ITERATING THIS CALCULATION THE ALGORITHM WILL CONVERGE TO A SPECIFIC POSITION

HOW IS THE WINDOW SIZE SELECTED?

IT DEPENDS ON THE SIZE OF THE FACE. THUS IT IS ADJUSTED ITERATIVELY, STARTING WITH SIZE 3

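WE THEN SELECT THE WINDOW SIZE TO BE

$$2\sqrt{m_{00} / \text{max pixel value}}$$

(THE RATIO $m_{00}$ / MAX PIXEL VALUE APPROXIMATES THE FACE AREA IN PIXELS, SO ITS SQUARE ROOT GIVES A SIDE LENGTH)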

IN THIS WAY THE WINDOW POSITION AND SIZE ARE CONTINUOUSLY ADAPTED UNTIL THEY STABILIZE

THIS CAN THUS BE USED FOR FACE TRACKING
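The iteration can be sketched as below. This is a simplified illustration under several assumptions: the input is a ready-made skin-probability image with values from 0 to a maximum value, the window is square, its side is taken as twice the square root of m00 divided by the maximum pixel value, and the stopping rule (centre moves less than half a pixel and the size is unchanged) is a choice made for the example. The function name `track_window` is hypothetical.

```python
import numpy as np

def track_window(prob, center, size=3, max_value=255.0, max_iter=50):
    """prob: 2-D skin-probability image I(x, y). Returns the converged (x_c, y_c, size)."""
    prob = np.asarray(prob, dtype=float)
    h, w = prob.shape
    xc, yc = center
    for _ in range(max_iter):
        half = size / 2.0
        x0, x1 = max(int(round(xc - half)), 0), min(int(round(xc + half)) + 1, w)
        y0, y1 = max(int(round(yc - half)), 0), min(int(round(yc + half)) + 1, h)
        window = prob[y0:y1, x0:x1]
        m00 = window.sum()                         # zeroth moment in the window
        if m00 == 0:
            break                                  # nothing skin-coloured in the window
        ys, xs = np.mgrid[y0:y1, x0:x1]
        new_xc = (xs * window).sum() / m00         # m10 / m00
        new_yc = (ys * window).sum() / m00         # m01 / m00
        new_size = max(3, int(round(2 * np.sqrt(m00 / max_value))))
        moved = abs(new_xc - xc) + abs(new_yc - yc)
        unchanged = new_size == size
        xc, yc, size = new_xc, new_yc, new_size
        if moved < 0.5 and unchanged:
            break                                  # position and size have stabilised
    return float(xc), float(yc), size

# Synthetic "face": a 20 x 20 block of maximal skin probability.
prob = np.zeros((100, 100))
prob[40:60, 50:70] = 255
print(track_window(prob, center=(55, 45)))         # (59.5, 49.5, 40)
```

Starting from a 3-pixel window inside the block, the window grows and drifts until it is centred on the block. For video, the converged window of one frame would serve as the starting window for the next.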

• THIS PROCESS IS ILLUSTRATED HERE. WE START FROM A SMALL WINDOW SIZE; THE SIZE IS ADJUSTED AND THE CENTER OF THE WINDOW IS MOVED UNTIL IT STABILIZES

HERE THE FACE HAS MOVED. IN THE NEXT PICTURE THE WINDOW WILL ALSO MOVE TO THE NEW POSITION

• THIS ALGORITHM IS SURPRISINGLY ROBUST

NOISE DOES NOT HARM IT

AND AS WE CAN SEE, IT IS ROBUST AGAINST DISTRACTORS: ANOTHER FACE ON THE LEFT, A HAND ON THE RIGHT

• THE METHOD CAN ALSO BE USED FOR EVALUATION OF HEAD ROLL, WIDTH AND LENGTH

ROLL

• PARAMETERS FOR HEAD POSITION CAN BE CALCULATED BASED ON THE SYMMETRY OF LENGTH L AND WIDTH W

THIS SYSTEM CAN BE USED FOR FACE TRACKING, E.G. AS AN INTERFACE TO COMPUTER GAMES

ANOTHER EXAMPLE: AMBULATORY VIDEO

A COMPUTER WITH A CAMERA WEARABLE BY THE USER

THE GOAL IS TO BUILD A COMPUTER WHICH WILL KNOW WHERE THE USER IS

The user is wearing a small camera attached e.g. to the head. The camera produces circular pictures which are not very good but good enough

HOW TO RECOGNIZE WHERE THE USER IS? (E.G. ROOM, STREET)

FIRST, SPLIT THE VIDEO INTO LIGHT INTENSITY I AND CHROMINANCES IN A VERY APPROXIMATE WAY:

I = R + G + B
Cr = R / I
Cg = G / I

SECOND, SEGMENT THE PICTURE INTO REGIONS AND CALCULATE PARAMETERS FOR EACH: MEAN AND COVARIANCE
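Both steps are easy to express with NumPy. In this sketch the segmentation itself is not shown: a region is simply passed in as a boolean mask, which is an assumption of the example, as are the function names and the handling of black pixels (where I = 0).

```python
import numpy as np

def intensity_chrominance(rgb):
    """rgb: (H, W, 3) array. Returns I = R+G+B, Cr = R/I, Cg = G/I."""
    rgb = np.asarray(rgb, dtype=float)
    intensity = rgb.sum(axis=2)
    safe = np.where(intensity > 0, intensity, 1.0)   # avoid division by zero for black pixels
    return intensity, rgb[..., 0] / safe, rgb[..., 1] / safe

def region_statistics(rgb, mask):
    """Mean vector and covariance matrix of (I, Cr, Cg) over the pixels selected by mask."""
    intensity, cr, cg = intensity_chrominance(rgb)
    features = np.stack([intensity[mask], cr[mask], cg[mask]], axis=1)
    return features.mean(axis=0), np.cov(features, rowvar=False)

frame = np.zeros((4, 4, 3))
frame[...] = (100, 50, 50)                 # a uniformly coloured test frame
region = np.ones((4, 4), dtype=bool)       # hypothetical region: the whole frame
mean, cov = region_statistics(frame, region)
print(mean)                                # intensity 200, Cr 0.5, Cg 0.25
```

The mean vectors and covariance matrices collected this way are the per-region parameters whose statistics differ from one environment to another.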

• FOR EACH ENVIRONMENT THERE WILL BE DIFFERENT STATISTICAL DISTRIBUTIONS OF SIGNALS. WE CAN USE THEM TO FIND THE CLASS TO WHICH THE RECORDED VIDEO BELONGS

FOR 2 HOURS OF RECORDING THE RESULTS ARE VERY GOOD

Label      Correlation Coeff.
Office     0.9124
Lobby      0.7914
Bedroom    0.8620
Cashier    0.8325

OVERALL CONCLUSION

• WE LACK A GENERAL SOLUTION TO OBJECT REPRESENTATION AND RECOGNITION PROBLEMS WHICH WOULD BE AS EFFECTIVE AS BIOLOGICAL SYSTEMS

• THERE ARE MANY APPROACHES TO A SOLUTION; WE PRESENTED AN APPROACH BASED ON STATISTICS OF QUANTIZED BLOCK TRANSFORM FEATURES

• THERE ARE APPROACHES BASED ON CLEVER TRICKS WHICH WORK WELL FOR SPECIFIC PROBLEMS
