
    CHAPTER 1

    INTRODUCTION

    Pattern recognition approaches have achieved measurable success in the domain

    of visual detection. Examples include face, automobile, and pedestrian detection [10],

[11], [13], [1], [9]. Each of these approaches uses machine learning to construct a

    detector from a large number of training examples. The detector is then scanned over

the entire input image in order to find a pattern of intensities which is consistent with the target object. Experiments show that these systems work very well for the detection

    of faces, but less well for pedestrians, perhaps because the images of pedestrians are

    more varied (due to changes in body pose and clothing). Detection of pedestrians is

    made even more difficult in surveillance applications, where the resolution of the

    images is very low (e.g. there may only be 300-500 pixels on the target). Though

    improvement of pedestrian detection using better functions of image intensity is a

    valuable pursuit, I take a different approach.

    This paper describes a pedestrian detection system that integrates intensity

    information with motion information. The pattern of human motion is well known to be

    readily distinguishable from other sorts of motion. Many recent papers have used

    motion to recognize people and in some cases to detect them [8], [10], [7], [3]. These

    approaches have a much different flavor from the face/pedestrian detection approaches

    mentioned above. They typically try to track moving objects over many frames and then

    analyze the motion to look for periodicity or other cues. Detection style algorithms are

    fast, perform exhaustive search over the entire image at every scale, and are trained

    using large datasets to achieve high detection rates and very low false positive rates.


    In this paper we apply a detection style approach using information about

motion as well as intensity information. The implementation described is very efficient,

    detects pedestrians at very small scales (as small as 18x36 pixels), and has a very low

    false positive rate. The system is trained on full human figures and does not currently

    detect occluded or partial human figures.

    My approach builds on the detection work of Viola and Jones [10]. Novel

    contributions of this paper include the development of a representation of image motion

which is extremely efficient, and the implementation of a state-of-the-art pedestrian

detection system which operates using AdaBoost and Haar-like features.

    1.1 OBJECTIVE

The main goal of this project is to develop a system capable of detecting pedestrians. The project comprises the study and implementation of AdaBoost pedestrian detection with Haar-like features, together with an analysis of the performance of the AdaBoost system for different dataset sizes. The project also includes the development of an extremely efficient representation of image motion and the implementation of a state-of-the-art pedestrian detection system which operates using AdaBoost and Haar-like features.

    1.2 PROJECT SCOPE

    In this work, pictorial structures that represent a simple model of a person are used

    for this purpose, as was done in previous works [9] [10] [3]. However, rather than using

    a simple rectangle segment template, constant colored rectangle, or learned appearance

    models specific to individuals for part detection, I train a discriminative classifier for

    each body segment to detect candidate segments in images.


    An overview of the algorithm is shown in figure 1.2.1. A large number of training

    examples, both positive and negative, are used to train binary classifiers for each body

    segment using the Adaboost algorithm. After training, the classifiers are scanned over

    new input images. The detections from the input image for each segment are passed to

    an algorithm that determines the best configurations consisting of one of each segment.

    The configuration cost is computed efficiently as a function of the segment cost from

    the classifier and the kinematic cost of combining pairs of segments as specified by

    some deformable model.

Figure 1.2.1: AdaBoost Algorithm

1) Input: training examples $(x_i, y_i)$, $i = 1, \ldots, N$, with positive ($y_i = 1$) and negative ($y_i = 0$) examples.

2) Initialization: weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for the $m$ negative and $l$ positive examples respectively.

3) For $t = 1, \ldots, T$:

   a) Normalize all weights.

   b) For each feature $j$, train a classifier $h_j$ with error $\epsilon_j = \sum_i w_i \left| h_j(x_i) - y_i \right|$.

   c) Choose the $h_t$ with the lowest error $\epsilon_t$.

   d) Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$, where $e_i = 0$ if $x_i$ is correctly classified, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.

4) Final strong classifier:

   $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge 0.5 \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise, with $\alpha_t = \log\frac{1}{\beta_t}$.


    Pictorial structure configurations are considered valid if their cost is below a

    predetermined threshold. I examine the ability of an AdaBoost detector to find body

    segments as well as the utility of enforcing kinematic constraints on pedestrian

    detections. The following sections describe the details of each component of the

    detection framework. Many of the ideas used in this work have been presented

    previously. I cite the original authors of each algorithm, but reproduce many equations

    and algorithms for completeness.

    1.3 APPLICATION OF MOTION DETECTION AND TRACKING SYSTEM

    Detecting and tracking a moving object in a dynamic video sequence has been a

vital aspect of motion analysis. This detecting and tracking system has become increasingly important due to its application in various areas, including communication

    (video conferencing), transportation (traffic monitoring and autonomous driving

    vehicle), security (premise surveillance) and industries (dynamic robot vision and

    navigation).

    Specifically, the main application targeted by the proposed detecting and

    tracking algorithm is for implementing an autonomous driving system. The system can

be used to automatically detect and track any moving object present in the traffic scene, such as cars moving on a highway, within the view of the moving camera.


    CHAPTER 2

    LITERATURE REVIEW

In general, the existing approaches and algorithms formulated to deal with object detection and tracking from images taken by a non-stationary, moving camera can be subjectively classified into a few categories: AdaBoost classification, Haar-like features, correlation-based matching techniques, and gradient-based techniques. The basic idea behind all these techniques is that the motion of the moving object differs from the motion of the background.

The choice of an algorithm that will perform well depends upon many considerations. Of primary concern is the selection of an efficient model well suited to the requirements of the particular application.

    2.1 Adaboost Classification

    AdaBoost is used both to select the features and to train the classifiers. The

    AdaBoost learning algorithm is used to boost the classification performance of a simple

learning algorithm (e.g., it might be used to boost the performance of a simple

    perceptron). It does this by combining a collection of weak classification functions to

    form stronger classifiers.

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm: it can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that classifiers built in subsequent rounds are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers; in other respects, however, it is less susceptible to the overfitting problem than most learning algorithms.


AdaBoost calls a weak classifier repeatedly in a series of rounds. For each call, a distribution of weights $D_t$ is updated to indicate the importance of each example in the data set for the classification. On each round, the weights of each incorrectly classified example are increased (or, alternatively, the weights of each correctly classified example are decreased), so that the new classifier focuses more on those examples. Below is the boosting algorithm used for learning. $T$ hypotheses are constructed, each using a single feature. The final hypothesis is a weighted linear combination of the $T$ hypotheses, where the weights are inversely proportional to the training errors.

- Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples respectively.

- Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the number of negatives and positives respectively.

- For $t = 1, \ldots, T$:

  1. Normalize the weights, $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$, so that $w_t$ is a probability distribution.

  2. For each feature $j$, train a classifier $h_j$ which is restricted to using a single feature. The error is evaluated with respect to $w_t$: $\epsilon_j = \sum_i w_i \left| h_j(x_i) - y_i \right|$.

  3. Choose the classifier $h_t$ with the lowest error $\epsilon_t$.

  4. Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.

- The final strong classifier is: $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise, where $\alpha_t = \log\frac{1}{\beta_t}$.
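As a concrete illustration of one round of this procedure, here is a minimal C sketch, assuming the best single-feature weak classifier for the round has already been selected; the function and array names are illustrative, not from any particular library.

#include <math.h>

/* One AdaBoost round, following the listing above. h[i] is the chosen
   weak classifier's output (0 or 1) on example i, y[i] is the true
   label, and w[] holds the current example weights. Returns alpha_t,
   the classifier's vote in the final strong classifier. */
double adaboost_round(double w[], const int h[], const int y[], int n)
{
    int i;
    double sum = 0.0, eps = 0.0, beta;

    /* 1. Normalize the weights into a probability distribution. */
    for (i = 0; i < n; i++) sum += w[i];
    for (i = 0; i < n; i++) w[i] /= sum;

    /* 2./3. Weighted error eps_t of the chosen classifier. */
    for (i = 0; i < n; i++)
        if (h[i] != y[i]) eps += w[i];

    /* 4. Down-weight the correctly classified examples by
       beta_t = eps_t / (1 - eps_t); misclassified ones keep weight. */
    beta = eps / (1.0 - eps);
    for (i = 0; i < n; i++)
        if (h[i] == y[i]) w[i] *= beta;

    return log(1.0 / beta);   /* alpha_t = log(1 / beta_t) */
}

Repeating this for $t = 1, \ldots, T$ and thresholding the alpha-weighted votes, as in the final step above, yields the strong classifier.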

2.2 Haar-like Features

Historically, for the task of object recognition, working with only image intensities (i.e. the RGB pixel values at each and every pixel of the image) made the task computationally expensive. A publication by Papageorgiou et al. discussed working with an alternate feature set instead of the usual image intensities. This feature set considers rectangular regions of the image and sums up the pixels in each region. The value thus obtained is used to categorize images. For example, let us say we have an

    image database with human faces and buildings. It is possible that if the eye and the hair

    region of the faces are considered, the sum of the pixels in this region would be quite

    high for the human faces and arbitrarily high or low for the buildings.


The value for the latter would depend on the structure of the building and its environment, while the values for the former will be more or less the same. We could thus categorize all images whose Haar-like feature in this rectangular region falls within a certain range of values as one category, and those falling out of this range as another. This might roughly divide the set of images into one group having a lot of faces and few buildings, and another having a lot of buildings and few faces. This procedure could be iteratively carried out to further divide the image clusters [8][11][12].

A slightly more complicated feature breaks the rectangular image region into either a left half and a right half, or a top half and a bottom half, and traces the difference in the sums of pixels across these halves. This modified feature set is called the 2-rectangle feature set; a small sketch of its computation follows.
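As a minimal sketch of such a 2-rectangle feature (the image layout and function name here are hypothetical), its value is simply the difference between the pixel sums of the two halves:

/* Value of a left/right 2-rectangle Haar-like feature for the w x h
   region at (x, y) of an 8-bit grayscale image with the given row
   stride, computed by direct summation. */
long two_rect_feature(const unsigned char *img, int stride,
                      int x, int y, int w, int h)
{
    long left = 0, right = 0;
    int i, j, half = w / 2;

    for (j = 0; j < h; j++) {
        for (i = 0; i < half; i++)
            left += img[(y + j) * stride + (x + i)];
        for (i = half; i < w; i++)
            right += img[(y + j) * stride + (x + i)];
    }
    /* Section 3.7 shows how the integral image reduces this double
       loop to a handful of array references. */
    return left - right;
}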

    2.3 Correlation-Based Matching Technique

The idea of this approach is to partition an image I_t into segmented regions and to search, for each of the segmented regions, for the corresponding region in the successive image I_t+1 at time t+1. A sketch of this matching step is given below.
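As a rough sketch of the matching step (the names and the sum-of-squared-differences criterion are illustrative; normalized cross-correlation is a common alternative), each region of I_t is compared against candidate displacements in I_t+1 and the best-scoring one is kept:

#include <limits.h>

/* Sum of squared differences between the w x h region at (tx, ty) in
   frame t and the candidate position (cx, cy) in frame t+1. Both
   frames are 8-bit grayscale with the same row stride. */
long ssd(const unsigned char *ft, const unsigned char *ft1, int stride,
         int tx, int ty, int cx, int cy, int w, int h)
{
    long score = 0;
    int i, j;
    for (j = 0; j < h; j++)
        for (i = 0; i < w; i++) {
            int d = ft[(ty + j) * stride + (tx + i)]
                  - ft1[(cy + j) * stride + (cx + i)];
            score += (long)d * d;
        }
    return score;
}

/* Search a +/- r pixel neighborhood in frame t+1 for the displacement
   (dx, dy) minimizing the SSD; the caller must keep the search window
   inside the image bounds. */
void best_match(const unsigned char *ft, const unsigned char *ft1,
                int stride, int tx, int ty, int w, int h, int r,
                int *dx, int *dy)
{
    long best = LONG_MAX;
    int u, v;
    for (v = -r; v <= r; v++)
        for (u = -r; u <= r; u++) {
            long s = ssd(ft, ft1, stride, tx, ty, tx + u, ty + v, w, h);
            if (s < best) { best = s; *dx = u; *dy = v; }
        }
}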

The detection and tracking algorithm proposed by Volker Rermann in [8][9] utilizes both correlation and features as matching techniques to detect and track moving objects in the scene, intended for robotic and vehicle guidance applications. In order to provide a more robust approach, color regions that could not be matched on the feature level are matched on the pixel level through the integration of a correlation-based mechanism.

From his findings, with the feature-based matching technique nearly 90% of the image area can be matched in an efficient and accurate way. The correlation-based matching technique, which requires more elaborate processing, is thus only applied to a small fraction of the image area. It yields very accurate displacement vectors and most of the time finds the corresponding color regions. [7]


    2.4 Gradient-Based Technique

The gradient-based method, especially the optical-flow method, has been shown to be a powerful tool for scene analysis in the case of a static camera [1][2]. The main principle of the optical-flow method is to solve the so-called optical flow constraint (OFC) equation. The equation states that the temporal gradient of the moving object is equal to the spatial gradient multiplied by the speed of the moving object in the x and y directions. Both gradients can be computed easily, leaving the speeds in the two directions as the unknowns.
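In standard notation, with spatial gradients $I_x$, $I_y$, temporal gradient $I_t$, and unknown velocity components $u$, $v$, the constraint described above reads:

$$I_x\,u + I_y\,v + I_t = 0 \quad\Longleftrightarrow\quad I_t = -\left( I_x\,u + I_y\,v \right)$$

Since this is a single equation in the two unknowns $u$ and $v$, additional assumptions (such as smoothness of the flow field) are needed to recover the motion; and because the constraint assumes brightness constancy, it is sensitive to illumination changes.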

    2.5 Selection of Technique

The techniques chosen for this research are AdaBoost classification and Haar-like features. Basically, this combination needs less computation time and is easier to implement than the other techniques. Although the correlation-based technique is less affected by noise and illumination changes, it suffers from high computational complexity caused by summations over the entire template. Some researchers use the gradient-based technique for detecting a moving object, but this technique is relatively sensitive to local or global illumination changes.


    CHAPTER 3

    METHODOLOGY

This chapter gives an overview of the equipment and development tools used, the total cost planning, and the project planning for developing the entire pedestrian detection system. In addition, this chapter presents in detail the methodology of the proposed vision-based pedestrian detection system. Subsequently, the pedestrian detection modules are elaborated. The explanation of the detection module focuses on the Haar-like feature prototypes, the integral image, and the classification function with the cascade of classifiers.

3.1 Equipment Used for Project Development

The proposed moving object detection system has been implemented on an Intel Pentium M 725 2.1 GHz PC with 2 GB RAM running Windows XP. The off-line images were acquired through a Sony color CCD camera. Each frame of the acquired images is converted to 8-bit grayscale format and finally to 1-bit binary format before undergoing further processing. Each frame is fixed to a size of 256 x 256 pixels.

    The EuroCard Picolo frame grabber is utilized in the project setup. The task of the

    frame grabber is to convert the electrical signal from the camera into a digital image

    that can be processed by a computer. The frame grabber can be programmed to send the

    image data without intervention of the central processing unit (CPU) to the memory

    (RAM) of the PC where the images are processed.

The entire motion detection and tracking program, including the GUI (Graphical User Interface), has been developed using Microsoft Visual C++ version 6.0 and OpenCV. Visual C++ is a fully integrated editor, compiler and debugger; hence, it can be used to create a complex software system.

    3.2 Project Planning

Table 3.2.1 gives an overview of the main time schedule for developing the detection system:

NO | WEEK | TASK | TASK FOR
1 | Week 4: 4-10 Feb 2008 | Research about the image processing process | Muzammil
2 | Week 5: 11-17 Feb 2008 | Research about image processing features: 1) feature differencing technique; 2) feature-based matching technique | Muzammil
3 | Week 6: 18-24 Feb 2008 | Research about image processing features: 1) AdaBoost technique; 2) Haar-like features | Muzammil
4 | Week 7: 25-29 Feb 2008 | Technique selection | Muzammil
5 | Week 8: 10-16 March 2008 | Submit summary | Muzammil
6 | Week 9: 17-23 March 2008 | Matching operation: first stage (research) | Muzammil
7 | Week 10: 24-30 March 2008 | Research: matching operation of motion detection and tracking | Muzammil
8 | Week 11: 1-6 April 2008 | Paperwork preparation | Muzammil
9 | Week 12: 7-13 April 2008 | Submit paperwork and presentation | Muzammil
10 | Week 13: 14-20 April 2008 | Find and identify hardware: first stage (camera) | Muzammil
11 | Week 14: 21-27 April 2008 | Find and identify hardware: second stage (software) | Muzammil
12 | Break | Semester break | -
13 | Week 1: 14-20 July 2008 | Matching operation: second stage (hardware) | Muzammil
14 | Week 2: 21-27 July 2008 | Software setup: first stage (OpenCV and C++) | Muzammil
15 | Week 3: 28-31 July 2008 | Software setup: second stage (OpenCV and C++) | Muzammil
16 | Week 4: 4-10 August 2008 | Matching operation: third stage (software and hardware) | Muzammil
17 | Week 5: 11-17 August 2008 | Project testing: first stage (warm-up) | Muzammil
18 | Week 6: 18-24 August 2008 | Matching operation: fourth stage (upgrade) | Muzammil
19 | Week 7: 25-29 August 2008 | Project testing: second stage (in situation) | Muzammil
20 | Week 8: 8-14 Sept 2008 | Matching operation: fifth stage (upgrade) | Muzammil
21 | Week 9: 15-21 Sept 2008 | Project testing: third stage (final); result analysis | Muzammil
22 | Week 10: 22-28 Sept 2008 | Matching operation: final stage; result analysis | Muzammil
23 | Week 11: 1-5 Oct 2008 | Report preparation | Muzammil
24 | Week 12: 6-12 Oct 2008 | Presentation preparation | Muzammil
25 | Week 13: 13-19 Oct 2008 | Presentation | Muzammil
26 | Week 14: 20-26 Oct 2008 | Submit report | Muzammil

Table 3.2.1


3.3 Gantt Chart

Figures 3.3.1 through 3.3.7 give an overview of the Gantt chart for the project:

    Figure 3.3.1

    Figure 3.3.2


    Figure 3.3.3

    Figure 3.3.4


    Figure 3.3.5

    Figure 3.3.6


    Figure 3.3.7


    3.5 System Overview

    The block diagram in Figure 3.5.1 gives an overview of the main stages

in the developed detection system. This overview is an example of a detection system for pedestrians.

    Figure 3.5.1


    3.6 Haar-like Prototypes

A Haar-like feature is represented by a template (the shape of the feature), its coordinate relative to the search window origin, and the size of the feature (its scale). A subset of the feature prototypes used is shown in Figure 3.6.1.

Each feature is composed of two or three black and white rectangles joined together; these rectangles can be upright or rotated by 45 degrees. The Haar-like feature value is calculated as a weighted sum of two components: the pixel gray-level sum over the black rectangle and the sum over the whole feature area (all black and white areas). The weights of these two components are of opposite sign and, for normalization purposes, their absolute values are inversely proportional to the areas.
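Written out, with $r_{\mathrm{all}}$ denoting the whole feature area and $r_{\mathrm{black}}$ the black rectangle, one consistent choice of the two opposite-sign, area-normalized weights described above is:

$$\mathrm{feature} = \omega_0 \sum_{(x,y) \in r_{\mathrm{all}}} I(x,y) + \omega_1 \sum_{(x,y) \in r_{\mathrm{black}}} I(x,y), \qquad \omega_0 = \frac{-1}{\mathrm{Area}(r_{\mathrm{all}})}, \quad \omega_1 = \frac{1}{\mathrm{Area}(r_{\mathrm{black}})}$$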

    Figure 3.6.1


    3.7 Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image. The integral image at location $(x, y)$ contains the sum of the pixels above and to the left of $(x, y)$, inclusive:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

where $ii(x, y)$ is the integral image and $i(x, y)$ is the original image. Using the following pair of recurrences:

$$s(x, y) = s(x, y - 1) + i(x, y)$$
$$ii(x, y) = ii(x - 1, y) + s(x, y)$$

(where $s(x, y)$ is the cumulative row sum, $s(x, -1) = 0$, and $ii(-1, y) = 0$), the integral image can be computed in one pass over the original image. Using the integral image, any rectangular sum can be computed in four array references, and the difference between two rectangular sums in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums, they can be computed in six array references, eight in the case of the three-rectangle features, and nine for four-rectangle features.
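As a minimal sketch of both operations (the fixed 256 x 256 frame size matches the setup in Section 3.1; the names are illustrative), the recurrences above build the integral image in one pass, after which any rectangle sum costs four array references:

#define W 256
#define H 256

/* ii has one extra row and column of zeros, so ii[y][x] holds the sum
   of all image pixels above and to the left of pixel (x-1, y-1),
   inclusive. */
static long ii[H + 1][W + 1];

void build_integral_image(const unsigned char img[H][W])
{
    int x, y;
    for (y = 0; y <= H; y++) ii[y][0] = 0;     /* ii(-1, y) = 0 */
    for (x = 1; x <= W; x++) {
        long s = 0;                            /* s(x, -1) = 0 */
        ii[0][x] = 0;
        for (y = 1; y <= H; y++) {
            s += img[y - 1][x - 1];            /* s(x,y) = s(x,y-1) + i(x,y)   */
            ii[y][x] = ii[y][x - 1] + s;       /* ii(x,y) = ii(x-1,y) + s(x,y) */
        }
    }
}

/* Sum over the rectangle spanning pixels [x0, x1) x [y0, y1),
   in exactly four array references. */
long rect_sum(int x0, int y0, int x1, int y1)
{
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0];
}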

One alternative motivation for the integral image comes from the "boxlets" work of Simard et al. [6][7][8]. The authors point out that, in the case of linear operations (e.g. $f \cdot g$), any invertible linear operation can be applied to $f$ or $g$ if its inverse is applied to the result. For example, in the case of convolution, if the derivative operator is applied both to the image and the kernel, the result must then be double integrated:

$$f * g = \int\!\!\int (f' * g')$$

The authors go on to show that convolution can be significantly accelerated if the derivatives of $f$ and $g$ are sparse (or can be made so). A similar insight is that an invertible linear operation can be applied to $f$ if its inverse is applied to $g$:

$$(f'') * \left(\int\!\!\int g\right) = f * g$$

Viewed in this framework, computation of the rectangle sum can be expressed as a dot product, $i \cdot r$, where $i$ is the image and $r$ is the box-car image (with value 1 within the rectangle of interest and 0 outside). This operation can be rewritten:

$$i \cdot r = \left(\int\!\!\int i\right) \cdot r''$$

The integral image is in fact the double integral of the image (first along rows and then along columns). The second derivative of the rectangle (first in the row direction and then in the column direction) yields four delta functions at the corners of the rectangle. Evaluation of the second dot product is accomplished with four array accesses.

    3.8 Cascade of Classifiers.

This section describes an algorithm for constructing a cascade of classifiers [3] which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances. Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon, in order to achieve low false positive rates.


A cascade of classifiers is a degenerate decision tree where, at each stage, a classifier is trained to detect almost all objects of interest while rejecting a certain fraction of the non-object patterns.

Each stage was trained using the AdaBoost algorithm. AdaBoost is a powerful machine learning algorithm that can learn a strong classifier based on a (large) set of weak classifiers by re-weighting the training samples. The feature-based classifier that best classifies the weighted training samples is added at each round of boosting. As the stage number increases, the number of weak classifiers needed to achieve the desired false alarm rate at the given hit rate also increases.
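A sketch of how such a cascade processes one sub-window (the types and the weak_eval hook are hypothetical placeholders for the feature evaluation machinery):

/* One stage of the cascade: a boosted classifier whose alpha-weighted
   weak classifier votes are compared against the stage threshold. */
typedef struct {
    int     n_weak;     /* number of weak classifiers in this stage */
    double *alpha;      /* vote weights alpha_t                     */
    double  threshold;  /* stage acceptance threshold               */
} Stage;

/* Returns 0 or 1: the vote of weak classifier t of a stage on a window. */
extern int weak_eval(const Stage *s, int t, const double *window);

int cascade_classify(const Stage *stages, int n_stages, const double *window)
{
    int k, t;
    for (k = 0; k < n_stages; k++) {
        double score = 0.0;
        for (t = 0; t < stages[k].n_weak; t++)
            score += stages[k].alpha[t] * weak_eval(&stages[k], t, window);
        if (score < stages[k].threshold)
            return 0;   /* rejected early: most sub-windows stop here */
    }
    return 1;           /* accepted by every stage */
}

Because the vast majority of sub-windows are rejected by the first, smallest stages, the average number of features evaluated per window stays small even though the later stages are large.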

    3.9 Training a Cascade

The cascade training process involves two types of tradeoffs. In most cases, classifiers with more features achieve higher detection rates and lower false positive rates; at the same time, classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage are traded off in order to minimize the expected number of evaluated features. Unfortunately, finding this optimum is a tremendously difficult problem [6][7][8][9].

In practice, as shown in Figure 3.9.1, a very simple framework is used to produce an effective classifier which is highly efficient. Each stage in the cascade reduces the false positive rate as well as the detection rate.


A target is selected for the minimum reduction in false positives and the maximum allowable decrease in detection rate. Each stage is trained by adding features until the target detection and false positive rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall targets for the false positive and detection rates are met.
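Under the usual independence assumption, the overall false positive rate $F$ and detection rate $D$ of a $K$-stage cascade are the products of the per-stage rates $f_i$ and $d_i$, which is why modest per-stage targets compound into strong overall behavior:

$$F = \prod_{i=1}^{K} f_i, \qquad D = \prod_{i=1}^{K} d_i$$

For example, $K = 10$ stages with $d_i = 0.99$ and $f_i = 0.30$ give $D \approx 0.99^{10} \approx 0.90$ and $F \approx 0.30^{10} \approx 6 \times 10^{-6}$.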

    3.10 Training Process

The training process uses AdaBoost to select a subset of features and construct the classifier. In each round the learning algorithm chooses from a heterogeneous set of filters, including the appearance filters, the motion direction filters, the motion shear filters, and the motion magnitude filters.

The AdaBoost algorithm also picks the optimal threshold for each feature, as well as the $\alpha$ and $\beta$ votes of each feature. The output of the AdaBoost learning algorithm is a classifier that consists of a linear combination of the selected features. For details on AdaBoost, the reader is referred to [12, 5].

    The important aspect of the resulting classifier to note is that it mixes motion

    and appearance features. Each round of AdaBoost chooses from the total set of the

    various motion and appearance features, the feature with lowest weighted error on the

    training examples. The resulting classifier balances intensity and motion information in

    order to maximize detection rates.


Figure 3.10.1: Cascade architecture. Input is passed to the first classifier, which decides true or false (pedestrian or not pedestrian). A false determination halts further computation and causes the detector to return false. A true determination passes the input along to the next classifier in the cascade. If all classifiers vote true then the input is classified as a true example. If any classifier votes false then computation halts and the input is classified as false. The cascade architecture is very efficient because the classifiers with the fewest features are placed at the beginning of the cascade, minimizing the total required computation.

Viola and Jones [14] showed that a single classifier for face detection would require too many features and thus be too slow for real-time operation. They proposed a cascade architecture to make the detector extremely efficient (see Figure 3.10.1). We use the same cascade idea for pedestrian detection. Each classifier in the cascade is trained to achieve very high detection rates and modest false positive rates. Simpler detectors (with a small number of features) are placed earlier in the cascade, while complex detectors (with a large number of features) are placed later in the cascade.

    Detection in the cascade proceeds from simple to complex. Each stage of the

    cascade consists of a classifier trained by the AdaBoost algorithm on the true and false

    positives of the previous stage. Given the structure of the cascade, each stage acts to

    reduce both the false positive rate and the detection rate of the previous stage. The key

    is to reduce the false positive rate more rapidly than the detection rate.


    A target is selected for the minimum reduction in false positive rate and the

    maximum allowable decrease in detection rate. Each stage is trained by adding features

until the target detection and false positive rates are met on a validation set. Stages of

    the cascade are added until the overall target for false positive and detection rate is met.

Figure 3.10.2: A small sample of positive training examples. A pair of image patterns comprises a single example for training.


Another part is a set of additional non-pedestrian images: a collection of 1200 video images containing no pedestrians, intended for the extraction of additional negative training examples.

    Figure 4.1.1: Sample of Training Dataset of Pedestrian

    4.2 Performance Dataset of Pedestrian

I generated a performance database of grayscale pedestrian images by running our pedestrian detector and tracker on several hours of video data and cropping a fixed window of size 500 x 300 pixels around the pedestrian in the middle frame of each tracked sequence. This method yielded 50 images in which the people were of roughly the same size. The crop window was positioned such that the pedestrian was centered in the bottom third of the image. I chose this position as a reference point to ensure matching was done with only pedestrian features and not background features.

    Figure 4.1.2: Sample of Performance Dataset of Pedestrian


    4.3 Training Performance

    I used six of the sequences to create a training set from which I learned both a

    dynamic pedestrian detector and a static pedestrian detector. The other two sequences

    were used to test the detectors. The dynamic detector was trained on consecutive frame

    pairs and the static detector was trained on static patterns and so only uses the

    appearance filters described above. The static pedestrian detector uses the same basic

    architecture as the face detector described in [14].

    Figure 4.3.1: Sample frames from each of the 6 sequences we used for training. The manually

    marked boxes over pedestrians are also shown.


    Each stage of the cascade is a boosted classifier trained using a set of 5000

    positive examples and 5000 negative examples. Each positive training example is a pair

    of 18 x 36 pedestrian images taken from two consecutive frames of a video sequence.

    Negative examples are similar image pairs which do not contain pedestrians. Positive

examples are shown in Figure 3.10.2. During training, all example images are variance

    normalized to reduce contrast variations. The same variance normalization computation

    is performed during testing.

    Each classifier in the cascade is trained using the original 5000 positive

    examples plus 5000 false positives from the previous stages of the cascade. The

    resulting classifier is added to the current cascade to construct a new cascade with a

    lower false positive rate. The detection threshold of the newly added classifier is

    adjusted so that the false negative rate is very low.

    The cascade training algorithm also requires a large set of image pairs to scan

    for false positives. These false positives form the negative training examples for the

    subsequent stages of the cascade. We use a set of 5000 full image pairs which do not

    contain pedestrians for this purpose. Since each full image contains about 50,000

patches of 20 x 15 pixels, the effective pool of negative patches is larger than 20 million.

    The static pedestrian detector is trained in the same way on the same set of images. The

    only difference in the training process is the absence of motion information. Instead of

    image pairs, training examples consist of static image patches.


    CHAPTER 5

    RESULT

This section demonstrates some of the tested image sequences that highlight the effectiveness of the proposed detection system. These experimental results are obtained using the proposed detection algorithm discussed in Chapters 3 and 4. The proposed algorithm includes Haar-like features and AdaBoost classification. The system can detect moving objects in a 300 x 500 pixel image at 17 frames/s [7][8][9] on a 2.0 GHz PC with 2 GB RAM running Windows XP. The output of the system is represented by a yellow rectangle, meaning that in the corresponding search sub-window the output of the cascade of classifiers was true; in other words, it has detected the object. Figures 5-A and 5-B are examples of the output.

    Figure 5-A: Output of the pedestrian

    detection system.

Figure 5-B: Detecting pedestrians in an indoor environment.


    5.1 Stage

The final classifier cascade consists of 11 boosted classifiers. The classifiers are sorted by their number of weak classifiers, so each stage has at least as many weak classifiers as the previous one. Since the computational complexity of each classifier rises with its number of features, images are first classified by the classifiers of small size. This ordering demonstrates the efficiency of the AdaBoost cascade: because the boosted classifiers are arranged by size, only a reduced number of samples reach the higher classifiers in the cascade, which are more expensive to compute.

    5.2 Number of Classifiers

The results in Figure 5.2.1 show that the number of weak classifiers varies with, and grows with, the size of the dataset. The reason for using datasets of different sizes is to show that an AdaBoost system trained on more data produces more weak classifiers, and that with more weak classifiers the AdaBoost system achieves better performance.

    Figure 5.2.1: Number of weak classifiers


    5.3 Percentage Hit and Missed

The training and test datasets were obtained by running our pedestrian detector on several hours of video and extracting sequences of images for each tracked pedestrian; the extracted features depend on the dataset used.

Figure 5.3.1 shows the accuracy for each dataset. The 100-sample collection produces the lowest performance, with a hit percentage of only 11.89% and the highest missed percentage, 85.11%. At this level, the result obtained by the AdaBoost system is not superior to the cascade classifier with respect to accuracy, and represents the lowest machine learning performance.

In comparison, the system based on the 4500-sample collection achieves 89.36% hits and 10.64% missed in detection and tracking respectively, preserving the higher performance of the AdaBoost system.

    Figure 5.3.1: Percentage of Hits and Missed


    5.4 Percentage False Alarm

We can see in Figure 5.4.1 that all the data collections produced a high false alarm percentage. The difference in false alarm percentage between the 100-sample and the 4000-sample collections is only 5.03%. This value shows that the classifier has a problem.

    One problem with the classifier returned by AdaBoost is that the threshold is much too

    low. This is because the prior probability for a pedestrian is much lower than the prior

    probability for a non-pedestrian, but the AdaBoost algorithm assumes they have equal

    priors.

Attempts were made to adjust the threshold automatically based on a holdout set, but not enough negative examples were present to do this accurately. Instead, the threshold was manually adjusted until the false positive rate was qualitatively low enough. While the holdout set had a dismal detection rate, the performance on actual images was much better, because many windows overlap each pedestrian, giving the classifier multiple chances to detect a given pedestrian. The thresholds for the cascade were manually chosen so that as many negative examples as possible were rejected while still allowing almost all positive examples through.


    Figure 5.4.1: Percentage of False Alarm

    5.5 Result in Real Application

In a real application, the detections fall into three cases: correct detection, correct detection with false detection, and entirely false detection. These three outcomes depend on the dataset performance discussed in subchapters 5.2, 5.3 and 5.4. Figures 5.5.1, 5.5.2, and 5.5.3 give an overview of the real-time operation of the pedestrian detection system.

    Figure 5.5.1: Correct detection


    Figure 5.5.2: Correct detection with false alarm

    Figure 5.5.3: Entirely False Detection


    5.6 Future Work

    The boosted sum-of-pixel feature technique introduced by Viola and Jones has many

    potential uses. One such use would be to introduce it into a particle filter context where

the sum-of-pixel classifier could be used to estimate a likelihood. This would enable an extremely simple parameterization of pixel coordinates, scale, and velocity. This would

    also increase the robustness of the Viola and Jones algorithm and make it faster since

    full image searches would no longer be necessary. The simplicity of the

    parameterization also would have a huge benefit over more complex contour based

    parameterizations. Another possible application of this algorithm would be for behavior

    classification. For this to work, however, longer-term motion analysis might be

    necessary. Perhaps looking at N successive frames instead of 2 would improve

    classification performance.


    CHAPTER 6

    CONCLUSION AND RECOMMENDATION

    I have presented an approach for object detection which minimizes computation time

    while achieving high detection accuracy. Preliminary experiments, which will be

    described elsewhere, show that highly efficient detectors for other objects, such as

    pedestrians, can also be constructed in this way.

This paper brings together new algorithms, representations, and insights which are quite generic and may well have broader application in computer vision and image processing. The first contribution is a new technique for computing a rich set of image features using the integral image. In order to achieve true scale invariance, almost all object detection systems must operate on multiple image scales.

    The integral image, by eliminating the need to compute a multi-scale image pyramid,

    reduces the initial image processing required for object detection significantly. In the

    domain of pedestrian detection the advantage is quite dramatic. Using the integral

    image, pedestrian detection is completed before an image pyramid can be computed.

While the integral image should also have immediate use for other systems which have used Haar-like features (such as Papageorgiou et al. [12]), it can foreseeably have an impact on any task where Haar-like features may be of value. Initial experiments have shown that a similar feature set is also effective for the task of parameter estimation, where the expression of a pedestrian, the position of a pedestrian, or the pose of a pedestrian is determined.


    The second contribution of this paper is a technique for feature selection based on

    AdaBoost. An aggressive and effective technique for feature selection will have impact

    on a wide variety of learning tasks. Given an effective tool for feature selection, the

    system designer is free to define a very large and very complex set of features as input

    for the learning process. The resulting classifier is nevertheless computationally

    efficient, since only a small number of features need to be evaluated during run time.

    Frequently the resulting classifier is also quite simple; within a large set of complex

    features it is more likely that a few critical features can be found which capture the

    structure of the classification problem in a straightforward fashion.

The third contribution of this paper is a technique for constructing a cascade of classifiers which radically reduces computation time while improving detection accuracy. Early stages of the cascade are designed to reject a majority of the image in order to focus subsequent processing on promising regions. One key point is that the cascade presented is quite simple and homogeneous in structure. Previous approaches for attentive filtering, such as Itti et al. [13], propose a more complex and heterogeneous mechanism for filtering. Similarly, Amit and Geman propose a hierarchical structure

    for detection in which the stages are quite different in structure and processing [14]. A

homogeneous system, besides being easy to implement and understand, has the

    advantage that simple tradeoffs can be made between processing time and detection

    performance.

    Finally this paper presents a set of detailed experiments on a difficult pedestrian

    detection dataset which has been widely studied. This dataset includes pedestrians under

    a very wide range of conditions including: illumination, scale, pose, and camera

    variation. Experiments on such a large and complex dataset are difficult and time

    consuming. Nevertheless systems which work under these conditions are unlikely to be

brittle or limited to a single set of conditions. More importantly, conclusions drawn from this dataset are unlikely to be experimental artifacts.


    REFERENCES

[1] Syed Abdul Rahman, PhD Thesis: Moving Object Feature Detector & Extractor Using a Novel Hybrid Technique, University of Bradford, Bradford, UK, 1997.

[2] Wan Ayub Bin Wan Ahmad, Master's Thesis: Menjejaki Objek Yang Bergerak Dalam Satu Jujukan Imej (Tracking a Moving Object in an Image Sequence), Universiti Teknologi Malaysia, Skudai, Malaysia, 2002.

[3] Yeoh Phaik Yong, Master's Draft Thesis: Integration of Projection Histograms for Real-Time Tracking of Moving Objects, Universiti Teknologi Malaysia, Skudai, Malaysia, 2003.

[4] Bernd Heisele, W. Ritter and U. Kressel, Tracking Non-Rigid, Moving Objects Based on Color Cluster Flow. Proc. Computer Vision and Pattern Recognition, pages 253-257, San Juan, 1997.

[5] Bernd Heisele, Motion-Based Object Detection and Tracking in Color Image Sequences. Fourth Asian Conference on Computer Vision, pages 1028-1033, Taipei, 2000.

[6] B. Heisele, T. Serre, S. Mukherjee and T. Poggio, Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, Hawaii, Vol. 2, pp. 18-24, December 2001.

[7] Yoav Rosenberg and Michael Werman, Real-Time Object Tracking from a Moving Video Camera: A Software Approach on a PC. 4th IEEE Workshop on Applications of Computer Vision (WACV '98), New Jersey, 1998.

[8] Paul Viola and Michael Jones, Rapid Object Detection Using a Boosted Cascade of Simple Features. Conference on Computer Vision and Pattern Recognition, 2001.

[9] Goncalo Monteiro, Paulo Peixoto, and Urbano Nunes, Vision-Based Pedestrian Detection Using Haar-like Features. Institute of Systems and Robotics, University of Coimbra, Polo II, Portugal, 2002.

[10] Paul Viola and Michael Jones, Robust Real-Time Object Detection. Second International Workshop on Statistical and Computational Theories of Vision - Modeling, Learning, Computing and Sampling, Canada, July 2001.

[11] A. Haar, Zur Theorie der orthogonalen Funktionensysteme (On the Theory of Orthogonal Function Systems). Mathematische Annalen, 69, pp. 331-371, 1910.

[12] C. Papageorgiou, M. Oren and T. Poggio, "A General Framework for Object Detection". International Conference on Computer Vision, 1998.

[13] L. Itti, C. Koch, and E. Niebur, A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259, November 1998.

[14] Y. Amit, D. Geman, and K. Wilder, Joint Induction of Shape Features and Tree Classifiers, 1997.

[15] Yoav Freund, Boosting a Weak Learning Algorithm by Majority. Information and Computation, 121(2):256-285, 1995.

[16] Yoav Freund and Robert E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Extended abstract in Computational Learning Theory: Second European Conference, EuroCOLT '95, pages 23-37, Springer-Verlag, 1995.

[17] Johannes Fürnkranz and Gerhard Widmer, Incremental Reduced Error Pruning. In Machine Learning: Proceedings of the Eleventh International Conference, pages 70-77, 1994.

[18] Geoffrey W. Gates, The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory, pages 431-433, 1972.

[19] Peter E. Hart, The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory, IT-14:515-516, May 1968.

[20] Robert C. Holte, Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 11(1):63-91, 1993.

[21] Jeffrey C. Jackson and Mark W. Craven, Learning Sparse Perceptrons. In Advances in Neural Information Processing Systems 8, 1996.

[22] Michael Kearns and Yishay Mansour, On the Boosting Ability of Top-Down Decision Tree Learning Algorithms. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, 1996.

[23] Eun Bae Kong and Thomas G. Dietterich, Error-Correcting Output Coding Corrects Bias and Variance. In Proceedings of the Twelfth International Conference on Machine Learning, pages 313-321, 1995.

[24] J. Ross Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

[25] J. Ross Quinlan, Bagging, Boosting, and C4.5. In Proceedings, Fourteenth National Conference on Artificial Intelligence, 1996.

[26] Robert E. Schapire, The Strength of Weak Learnability. Machine Learning, 5(2):197-227, 1990.

[27] Patrice Simard, Yann Le Cun, and John Denker, Efficient Pattern Recognition Using a New Transformation Distance. In Advances in Neural Information Processing Systems, volume 5, pages 50-58, 1993.


    APPENDIX

A-1: Coding

#ifndef _EiC
#include "cv.h"
#include "highgui.h"
/* Standard headers this listing needs (the original include names were
   lost in extraction): */
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#endif

#ifdef _EiC
#define WIN32
#endif

static CvMemStorage* storage = 0;
static CvHaarClassifierCascade* cascade = 0;

void detect_and_draw( IplImage* image );

const char* cascade_name = "haarcascade_frontalface_alt.xml";
/* "haarcascade_profileface.xml"; */

int main( int argc, char** argv )
{
    CvCapture* capture = 0;
    IplImage *frame, *frame_copy = 0;
    int optlen = strlen("--cascade=");
    const char* input_name;

    if( argc > 1 && strncmp( argv[1], "--cascade=", optlen ) == 0 )
    {
        cascade_name = argv[1] + optlen;
        input_name = argc > 2 ? argv[2] : 0;
    }
    else
    {
        fprintf( stderr,
            "Usage: facedetect --cascade=\"<cascade_path>\" [filename|camera_index]\n" );
        return -1;
        /* input_name = argc > 1 ? argv[1] : 0; */
    }

    /* Only for an XML file:
       cascade = (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 ); */
    /* The 18 x 36 window matches the pedestrian training size. */
    cascade = cvLoadHaarClassifierCascade( cascade_name, cvSize(18, 36) );

    if( !cascade )
    {
        fprintf( stderr, "ERROR: Could not load classifier cascade\n" );
        return -1;
    }
    storage = cvCreateMemStorage(0);

    /* A single digit selects a camera; anything else is a video file. */
    if( !input_name || (isdigit(input_name[0]) && input_name[1] == '\0') )
        capture = cvCaptureFromCAM( !input_name ? 0 : input_name[0] - '0' );
    else
        capture = cvCaptureFromAVI( input_name );

    cvNamedWindow( "result", 1 );

    if( capture )
    {
        for(;;)
        {
            if( !cvGrabFrame( capture ))
                break;
            frame = cvRetrieveFrame( capture );
            if( !frame )
                break;
            if( !frame_copy )
                frame_copy = cvCreateImage( cvSize(frame->width, frame->height),
                                            IPL_DEPTH_8U, frame->nChannels );
            if( frame->origin == IPL_ORIGIN_TL )
                cvCopy( frame, frame_copy, 0 );
            else
                cvFlip( frame, frame_copy, 0 );

            detect_and_draw( frame_copy );

            if( cvWaitKey( 10 ) >= 0 )
                break;
        }

        cvReleaseImage( &frame_copy );
        cvReleaseCapture( &capture );
    }
    else
    {
        const char* filename = input_name ? input_name : (char*)"lena.jpg";
        IplImage* image = cvLoadImage( filename, 1 );

        if( image )
        {
            detect_and_draw( image );
            cvWaitKey(0);
            cvReleaseImage( &image );
        }
        else
        {
            /* Assume it is a text file containing the list of the image
               filenames to be processed, one per line. */
            FILE* f = fopen( filename, "rt" );
            if( f )
            {
                char buf[1000+1];
                while( fgets( buf, 1000, f ) )
                {
                    int len = (int)strlen(buf);
                    while( len > 0 && isspace(buf[len-1]) )
                        len--;
                    buf[len] = '\0';
                    image = cvLoadImage( buf, 1 );
                    if( image )
                    {
                        detect_and_draw( image );
                        cvWaitKey(0);
                        cvReleaseImage( &image );
                    }
                }
                fclose(f);
            }
        }
    }

    cvDestroyWindow("result");
    return 0;
}

void detect_and_draw( IplImage* img )
{
    int scale = 1;
    IplImage* temp = cvCreateImage( cvSize(img->width/scale, img->height/scale), 8, 3 );
    CvPoint pt1, pt2;
    int i;

    /* cvPyrDown( img, temp, CV_GAUSSIAN_5x5 ); */
    cvClearMemStorage( storage );

    if( cascade )
    {
        /* Scan the cascade over the image at multiple scales. */
        CvSeq* faces = cvHaarDetectObjects( img, cascade, storage,
                                            1.1, 2, CV_HAAR_DO_CANNY_PRUNING,
                                            cvSize(40, 40) );

        /* Draw a yellow rectangle around each detection. */
        for( i = 0; i < (faces ? faces->total : 0); i++ )
        {
            CvRect* r = (CvRect*)cvGetSeqElem( faces, i );
            pt1.x = r->x*scale;
            pt2.x = (r->x + r->width)*scale;
            pt1.y = r->y*scale;
            pt2.y = (r->y + r->height)*scale;
            cvRectangle( img, pt1, pt2, CV_RGB(255,255,0), 3, 8, 0 );
        }
    }

    cvShowImage( "result", img );
    cvReleaseImage( &temp );
}

#ifdef _EiC
main(1, "facedetect.c");
#endif
