
    CHAPTER 1

    INTRODUCTION

    Pattern recognition approaches have achieved measurable success in the domain

    of visual detection. Examples include face, automobile, and pedestrian detection [10],

[11], [13], [1], [9]. Each of these approaches uses machine learning to construct a

    detector from a large number of training examples. The detector is then scanned over

the entire input image in order to find a pattern of intensities which is consistent with the target object. Experiments show that these systems work very well for the detection

    of faces, but less well for pedestrians, perhaps because the images of pedestrians are

    more varied (due to changes in body pose and clothing). Detection of pedestrians is

    made even more difficult in surveillance applications, where the resolution of the

    images is very low (e.g. there may only be 300-500 pixels on the target). Though

    improvement of pedestrian detection using better functions of image intensity is a

    valuable pursuit, I take a different approach.

    This paper describes a pedestrian detection system that integrates intensity

    information with motion information. The pattern of human motion is well known to be

    readily distinguishable from other sorts of motion. Many recent papers have used

    motion to recognize people and in some cases to detect them [8], [10], [7], [3]. These

    approaches have a much different flavor from the face/pedestrian detection approaches

    mentioned above. They typically try to track moving objects over many frames and then

    analyze the motion to look for periodicity or other cues. Detection style algorithms are

    fast, perform exhaustive search over the entire image at every scale, and are trained

    using large datasets to achieve high detection rates and very low false positive rates.


    In this paper we apply a detection style approach using information about

motion as well as intensity information. The implementation described is very efficient,

    detects pedestrians at very small scales (as small as 18x36 pixels), and has a very low

    false positive rate. The system is trained on full human figures and does not currently

    detect occluded or partial human figures.

    My approach builds on the detection work of Viola and Jones [10]. Novel

    contributions of this paper include the development of a representation of image motion

which is extremely efficient, and the implementation of a state-of-the-art pedestrian

detection system which operates using AdaBoost and Haar-like features.

    1.1 OBJECTIVE

The main goal of this project is to develop a system capable of detecting pedestrians. The project comprises the study and implementation of AdaBoost pedestrian detection with Haar-like features, together with an analysis of the performance of the AdaBoost system for different dataset sizes. The project also includes the development of an extremely efficient representation of image motion and the implementation of a state-of-the-art pedestrian detection system which operates using AdaBoost and Haar-like features.

    1.2 PROJECT SCOPE

    In this work, pictorial structures that represent a simple model of a person are used

    for this purpose, as was done in previous works [9] [10] [3]. However, rather than using

    a simple rectangle segment template, constant colored rectangle, or learned appearance

    models specific to individuals for part detection, I train a discriminative classifier for

    each body segment to detect candidate segments in images.


    An overview of the algorithm is shown in figure 1.2.1. A large number of training

    examples, both positive and negative, are used to train binary classifiers for each body

    segment using the Adaboost algorithm. After training, the classifiers are scanned over

    new input images. The detections from the input image for each segment are passed to

    an algorithm that determines the best configurations consisting of one of each segment.

    The configuration cost is computed efficiently as a function of the segment cost from

    the classifier and the kinematic cost of combining pairs of segments as specified by

    some deformable model.

Figure 1.2.1: AdaBoost Algorithm

1) Input: training examples $(x_i, y_i)$, $i = 1, \ldots, N$, with positive ($y_i = 1$) and negative ($y_i = 0$) examples.

2) Initialization: weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for the $m$ negative and $l$ positive examples respectively.

3) For $t = 1, \ldots, T$:

   a) Normalize all weights.

   b) For each feature $j$, train a classifier $h_j$ with error $\epsilon_j = \sum_i w_i \left| h_j(x_i) - y_i \right|$.

   c) Choose the $h_t$ with the lowest error $\epsilon_t$.

   d) Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$, where $e_i = 0$ if $x_i$ is correctly classified, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.

4) Final strong classifier:

   $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge 0.5 \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise, with $\alpha_t = \log\frac{1}{\beta_t}$.


    Pictorial structure configurations are considered valid if their cost is below a

    predetermined threshold. I examine the ability of an AdaBoost detector to find body

    segments as well as the utility of enforcing kinematic constraints on pedestrian

    detections. The following sections describe the details of each component of the

    detection framework. Many of the ideas used in this work have been presented

    previously. I cite the original authors of each algorithm, but reproduce many equations

    and algorithms for completeness.

    1.3 APPLICATION OF MOTION DETECTION AND TRACKING SYSTEM

    Detecting and tracking a moving object in a dynamic video sequence has been a

vital aspect of motion analysis. This detecting and tracking system has become increasingly important due to its application in various areas, including communication

    (video conferencing), transportation (traffic monitoring and autonomous driving

    vehicle), security (premise surveillance) and industries (dynamic robot vision and

    navigation).

    Specifically, the main application targeted by the proposed detecting and

    tracking algorithm is for implementing an autonomous driving system. The system can

be used to automatically detect and track any moving object present in the traffic scene, such as cars moving on a highway, within the view of the moving camera.


    CHAPTER 2

    LITERATURE REVIEW

In general, the existing approaches and algorithms formulated to deal with object detection and tracking from images taken by a non-stationary, moving camera can be subjectively classified into a few categories: AdaBoost classification, Haar-like features, correlation-based matching techniques, and gradient-based techniques. The basic idea behind all these techniques is that the motion of the moving object differs from the motion of the background.

The choice of an algorithm that will perform well depends upon many considerations. Of primary concern is the selection of an efficient model well suited to the requirements of the particular application.

    2.1 Adaboost Classification

    AdaBoost is used both to select the features and to train the classifiers. The

    AdaBoost learning algorithm is used to boost the classification performance of a simple

learning algorithm (e.g., it might be used to boost the performance of a simple

    perceptron). It does this by combining a collection of weak classification functions to

    form stronger classifiers.

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm: it can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that classifiers built in subsequent rounds are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers; in other respects, however, it is less susceptible to the overfitting problem than most learning algorithms.


AdaBoost calls a weak classifier repeatedly in a series of rounds. For each call, a distribution of weights $D_t$ is updated to indicate the importance of each example in the data set for the classification. On each round, the weights of each incorrectly classified example are increased (or, alternatively, the weights of each correctly classified example are decreased), so that the new classifier focuses more on those examples. Below is the boosting algorithm used for learning. $T$ hypotheses are constructed, each using a single feature. The final hypothesis is a weighted linear combination of the $T$ hypotheses, where the weights are inversely proportional to the training errors.

- Given example images $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i = 0, 1$ for negative and positive examples respectively.

- Initialize weights $w_{1,i} = \frac{1}{2m}, \frac{1}{2l}$ for $y_i = 0, 1$ respectively, where $m$ and $l$ are the number of negatives and positives respectively.

- For $t = 1, \ldots, T$:

  1. Normalize the weights, $w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$, so that $w_t$ is a probability distribution.

  2. For each feature $j$, train a classifier $h_j$ which is restricted to using a single feature. The error is evaluated with respect to $w_t$: $\epsilon_j = \sum_i w_i \left| h_j(x_i) - y_i \right|$.

  3. Choose the classifier $h_t$ with the lowest error $\epsilon_t$.

  4. Update the weights: $w_{t+1,i} = w_{t,i}\,\beta_t^{1-e_i}$, where $e_i = 0$ if example $x_i$ is classified correctly, $e_i = 1$ otherwise, and $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$.

- The final strong classifier is: $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t$, and $0$ otherwise, where $\alpha_t = \log\frac{1}{\beta_t}$.
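As a concrete illustration of one round of this procedure, here is a minimal C sketch, assuming the best single-feature weak classifier for the round has already been selected; the function and array names are illustrative, not from any particular library.

#include <math.h>

/* One AdaBoost round, following the listing above. h[i] is the chosen
   weak classifier's output (0 or 1) on example i, y[i] is the true
   label, and w[] holds the current example weights. Returns alpha_t,
   the classifier's vote in the final strong classifier. */
double adaboost_round(double w[], const int h[], const int y[], int n)
{
    int i;
    double sum = 0.0, eps = 0.0, beta;

    /* 1. Normalize the weights into a probability distribution. */
    for (i = 0; i < n; i++) sum += w[i];
    for (i = 0; i < n; i++) w[i] /= sum;

    /* 2./3. Weighted error eps_t of the chosen classifier. */
    for (i = 0; i < n; i++)
        if (h[i] != y[i]) eps += w[i];

    /* 4. Down-weight the correctly classified examples by
       beta_t = eps_t / (1 - eps_t); misclassified ones keep weight. */
    beta = eps / (1.0 - eps);
    for (i = 0; i < n; i++)
        if (h[i] == y[i]) w[i] *= beta;

    return log(1.0 / beta);   /* alpha_t = log(1 / beta_t) */
}

Repeating this for $t = 1, \ldots, T$ and thresholding the alpha-weighted votes, as in the final step above, yields the strong classifier.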

2.2 Haar-like Features

Historically, for the task of object recognition, working with only image intensities (i.e. the RGB pixel values at each and every pixel of the image) made the task computationally expensive. A publication by Papageorgiou et al. discussed working with an alternate feature set instead of the usual image intensities. This feature set considers rectangular regions of the image and sums up the pixels in each region. The value thus obtained is used to categorize images. For example, let us say we have an

    image database with human faces and buildings. It is possible that if the eye and the hair

    region of the faces are considered, the sum of the pixels in this region would be quite

    high for the human faces and arbitrarily high or low for the buildings.


The value for the latter would depend on the structure of the building and its environment, while the values for the former will be more or less the same. We could thus categorize all images whose Haar-like feature in this rectangular region falls within a certain range of values as one category, and those falling out of this range as another. This might roughly divide the set of images into one group having a lot of faces and few buildings, and another having a lot of buildings and few faces. This procedure could be iteratively carried out to further divide the image clusters [8][11][12].

A slightly more complicated feature breaks the rectangular image region into either a left half and a right half, or a top half and a bottom half, and traces the difference in the sums of pixels across these halves. This modified feature set is called the 2-rectangle feature set; a small sketch of its computation follows.
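As a minimal sketch of such a 2-rectangle feature (the image layout and function name here are hypothetical), its value is simply the difference between the pixel sums of the two halves:

/* Value of a left/right 2-rectangle Haar-like feature for the w x h
   region at (x, y) of an 8-bit grayscale image with the given row
   stride, computed by direct summation. */
long two_rect_feature(const unsigned char *img, int stride,
                      int x, int y, int w, int h)
{
    long left = 0, right = 0;
    int i, j, half = w / 2;

    for (j = 0; j < h; j++) {
        for (i = 0; i < half; i++)
            left += img[(y + j) * stride + (x + i)];
        for (i = half; i < w; i++)
            right += img[(y + j) * stride + (x + i)];
    }
    /* Section 3.7 shows how the integral image reduces this double
       loop to a handful of array references. */
    return left - right;
}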

    2.3 Correlation-Based Matching Technique

The idea of this approach is to partition an image I_t into segmented regions and to search, for each of the segmented regions, for the corresponding region in the successive image I_t+1 at time t+1. A sketch of this matching step is given below.
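As a rough sketch of the matching step (the names and the sum-of-squared-differences criterion are illustrative; normalized cross-correlation is a common alternative), each region of I_t is compared against candidate displacements in I_t+1 and the best-scoring one is kept:

#include <limits.h>

/* Sum of squared differences between the w x h region at (tx, ty) in
   frame t and the candidate position (cx, cy) in frame t+1. Both
   frames are 8-bit grayscale with the same row stride. */
long ssd(const unsigned char *ft, const unsigned char *ft1, int stride,
         int tx, int ty, int cx, int cy, int w, int h)
{
    long score = 0;
    int i, j;
    for (j = 0; j < h; j++)
        for (i = 0; i < w; i++) {
            int d = ft[(ty + j) * stride + (tx + i)]
                  - ft1[(cy + j) * stride + (cx + i)];
            score += (long)d * d;
        }
    return score;
}

/* Search a +/- r pixel neighborhood in frame t+1 for the displacement
   (dx, dy) minimizing the SSD; the caller must keep the search window
   inside the image bounds. */
void best_match(const unsigned char *ft, const unsigned char *ft1,
                int stride, int tx, int ty, int w, int h, int r,
                int *dx, int *dy)
{
    long best = LONG_MAX;
    int u, v;
    for (v = -r; v <= r; v++)
        for (u = -r; u <= r; u++) {
            long s = ssd(ft, ft1, stride, tx, ty, tx + u, ty + v, w, h);
            if (s < best) { best = s; *dx = u; *dy = v; }
        }
}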

The detection and tracking algorithm proposed by Volker Rermann in [8][9] utilizes both correlation and features as matching techniques to detect and track moving objects in the scene, intended for robotic and vehicle guidance applications. In order to provide a more robust approach, color regions that could not be matched on the feature level are matched on the pixel level through the integration of a correlation-based mechanism.

From his findings, with the feature-based matching technique nearly 90% of the image area can be matched in an efficient and accurate way. The correlation-based matching technique, which requires more elaborate processing, is thus only applied to a small fraction of the image area. It yields very accurate displacement vectors and most of the time finds the corresponding color regions. [7]


    2.4 Gradient-Based Technique

The gradient-based method, especially the optical-flow method, has been shown to be a powerful tool for scene analysis in the case of a static camera [1][2]. The main principle of the optical-flow method is to solve the so-called optical flow constraint (OFC) equation. The equation states that the temporal gradient of the moving object is equal to the spatial gradient multiplied by the speed of the moving object in the x and y directions. Both gradients can be computed easily, leaving the speeds in the two directions as the unknowns.
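In standard notation, with spatial gradients $I_x$, $I_y$, temporal gradient $I_t$, and unknown velocity components $u$, $v$, the constraint described above reads:

$$I_x\,u + I_y\,v + I_t = 0 \quad\Longleftrightarrow\quad I_t = -\left( I_x\,u + I_y\,v \right)$$

Since this is a single equation in the two unknowns $u$ and $v$, additional assumptions (such as smoothness of the flow field) are needed to recover the motion; and because the constraint assumes brightness constancy, it is sensitive to illumination changes.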

    2.5 Selection of Technique

The techniques chosen for this research are AdaBoost classification and Haar-like features. Basically, this combination needs less computation time and is easier to implement than the other techniques. Although the correlation-based technique is less affected by noise and illumination changes, it suffers from high computational complexity caused by summations over the entire template. Some researchers use the gradient-based technique for detecting a moving object, but this technique is relatively sensitive to local or global illumination changes.


    CHAPTER 3

    METHODOLOGY

This chapter gives an overview of the equipment and development tools used, the total cost planning, and the project planning for developing the entire pedestrian detection system. In addition, this chapter presents in detail the methodology of the proposed vision-based pedestrian detection system. Subsequently, the pedestrian detection modules are elaborated. The explanation of the detection module focuses on the Haar-like feature prototypes, the integral image, and the classification function with the cascade of classifiers.

3.1 Equipment Used for Project Development

The proposed moving object detection system has been implemented on an Intel Pentium M 725 2.1 GHz PC with 2 GB RAM running Windows XP. The off-line images were acquired through a Sony color CCD camera. Each frame of the acquired images is converted to 8-bit grayscale format and finally to 1-bit binary format before undergoing further processing. Each frame is fixed to a size of 256 x 256 pixels.

    The EuroCard Picolo frame grabber is utilized in the project setup. The task of the

    frame grabber is to convert the electrical signal from the camera into a digital image

    that can be processed by a computer. The frame grabber can be programmed to send the

    image data without intervention of the central processing unit (CPU) to the memory

    (RAM) of the PC where the images are processed.

The entire motion detection and tracking program, including the GUI (Graphical User Interface), has been developed using Microsoft Visual C++ version 6.0 and OpenCV. Visual C++ is a fully integrated editor, compiler and debugger; hence, it can be used to create a complex software system.

    3.2 Project Planning

Table 3.2.1 gives an overview of the main time schedule for developing the detection system:

NO | WEEK | TASK | TASK FOR
1 | Week 4: 4-10 Feb 2008 | Research about the image processing process | Muzammil
2 | Week 5: 11-17 Feb 2008 | Research about image processing features: 1) feature differencing technique; 2) feature-based matching technique | Muzammil
3 | Week 6: 18-24 Feb 2008 | Research about image processing features: 1) AdaBoost technique; 2) Haar-like features | Muzammil
4 | Week 7: 25-29 Feb 2008 | Technique selection | Muzammil
5 | Week 8: 10-16 March 2008 | Submit summary | Muzammil
6 | Week 9: 17-23 March 2008 | Matching operation: first stage (research) | Muzammil
7 | Week 10: 24-30 March 2008 | Research: matching operation of motion detection and tracking | Muzammil
8 | Week 11: 1-6 April 2008 | Paperwork preparation | Muzammil
9 | Week 12: 7-13 April 2008 | Submit paperwork and presentation | Muzammil
10 | Week 13: 14-20 April 2008 | Find and identify hardware: first stage (camera) | Muzammil
11 | Week 14: 21-27 April 2008 | Find and identify hardware: second stage (software) | Muzammil
12 | Break | Semester break | -
13 | Week 1: 14-20 July 2008 | Matching operation: second stage (hardware) | Muzammil
14 | Week 2: 21-27 July 2008 | Software setup: first stage (OpenCV and C++) | Muzammil
15 | Week 3: 28-31 July 2008 | Software setup: second stage (OpenCV and C++) | Muzammil
16 | Week 4: 4-10 August 2008 | Matching operation: third stage (software and hardware) | Muzammil
17 | Week 5: 11-17 August 2008 | Project testing: first stage (warm-up) | Muzammil
18 | Week 6: 18-24 August 2008 | Matching operation: fourth stage (upgrade) | Muzammil
19 | Week 7: 25-29 August 2008 | Project testing: second stage (in situation) | Muzammil
20 | Week 8: 8-14 Sept 2008 | Matching operation: fifth stage (upgrade) | Muzammil
21 | Week 9: 15-21 Sept 2008 | Project testing: third stage (final); result analysis | Muzammil
22 | Week 10: 22-28 Sept 2008 | Matching operation: final stage; result analysis | Muzammil
23 | Week 11: 1-5 Oct 2008 | Report preparation | Muzammil
24 | Week 12: 6-12 Oct 2008 | Presentation preparation | Muzammil
25 | Week 13: 13-19 Oct 2008 | Presentation | Muzammil
26 | Week 14: 20-26 Oct 2008 | Submit report | Muzammil

Table 3.2.1


3.3 Gantt Chart

Figures 3.3.1 through 3.3.7 give an overview of the Gantt chart for the project:

    Figure 3.3.1

    Figure 3.3.2


    Figure 3.3.3

    Figure 3.3.4


    Figure 3.3.5

    Figure 3.3.6


    Figure 3.3.7


    3.5 System Overview

    The block diagram in Figure 3.5.1 gives an overview of the main stages

in the developed detection system. This overview is an example of a detection system for pedestrians.

    Figure 3.5.1


    3.6 Haar-like Prototypes

A Haar-like feature is represented by a template (the shape of the feature), its coordinate relative to the search window origin, and the size of the feature (its scale). A subset of the feature prototypes used is shown in Figure 3.6.1.

Each feature is composed of two or three black and white rectangles joined together; these rectangles can be upright or rotated by 45 degrees. The Haar-like feature value is calculated as a weighted sum of two components: the pixel gray-level sum over the black rectangle and the sum over the whole feature area (all black and white areas). The weights of these two components are of opposite sign and, for normalization purposes, their absolute values are inversely proportional to the areas.
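Written out, with $r_{\mathrm{all}}$ denoting the whole feature area and $r_{\mathrm{black}}$ the black rectangle, one consistent choice of the two opposite-sign, area-normalized weights described above is:

$$\mathrm{feature} = \omega_0 \sum_{(x,y) \in r_{\mathrm{all}}} I(x,y) + \omega_1 \sum_{(x,y) \in r_{\mathrm{black}}} I(x,y), \qquad \omega_0 = \frac{-1}{\mathrm{Area}(r_{\mathrm{all}})}, \quad \omega_1 = \frac{1}{\mathrm{Area}(r_{\mathrm{black}})}$$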

    Figure 3.6.1


    3.7 Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image. The integral image at location $(x, y)$ contains the sum of the pixels above and to the left of $(x, y)$, inclusive:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

where $ii(x, y)$ is the integral image and $i(x, y)$ is the original image. Using the following pair of recurrences:

$$s(x, y) = s(x, y - 1) + i(x, y)$$
$$ii(x, y) = ii(x - 1, y) + s(x, y)$$

(where $s(x, y)$ is the cumulative row sum, $s(x, -1) = 0$, and $ii(-1, y) = 0$), the integral image can be computed in one pass over the original image. Using the integral image, any rectangular sum can be computed in four array references, and the difference between two rectangular sums in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums, they can be computed in six array references, eight in the case of the three-rectangle features, and nine for four-rectangle features.
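As a minimal sketch of both operations (the fixed 256 x 256 frame size matches the setup in Section 3.1; the names are illustrative), the recurrences above build the integral image in one pass, after which any rectangle sum costs four array references:

#define W 256
#define H 256

/* ii has one extra row and column of zeros, so ii[y][x] holds the sum
   of all image pixels above and to the left of pixel (x-1, y-1),
   inclusive. */
static long ii[H + 1][W + 1];

void build_integral_image(const unsigned char img[H][W])
{
    int x, y;
    for (y = 0; y <= H; y++) ii[y][0] = 0;     /* ii(-1, y) = 0 */
    for (x = 1; x <= W; x++) {
        long s = 0;                            /* s(x, -1) = 0 */
        ii[0][x] = 0;
        for (y = 1; y <= H; y++) {
            s += img[y - 1][x - 1];            /* s(x,y) = s(x,y-1) + i(x,y)   */
            ii[y][x] = ii[y][x - 1] + s;       /* ii(x,y) = ii(x-1,y) + s(x,y) */
        }
    }
}

/* Sum over the rectangle spanning pixels [x0, x1) x [y0, y1),
   in exactly four array references. */
long rect_sum(int x0, int y0, int x1, int y1)
{
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0];
}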

One alternative motivation for the integral image comes from the "boxlets" work of Simard et al. [6][7][8]. The authors point out that, in the case of linear operations (e.g. $f \cdot g$), any invertible linear operation can be applied to $f$ or $g$ if its inverse is applied to the result. For example, in the case of convolution, if the derivative operator is applied both to the image and the kernel, the result must then be double integrated:

$$f * g = \int\!\!\int (f' * g')$$

The authors go on to show that convolution can be significantly accelerated if the derivatives of $f$ and $g$ are sparse (or can be made so). A similar insight is that an invertible linear operation can be applied to $f$ if its inverse is applied to $g$:

$$(f'') * \left(\int\!\!\int g\right) = f * g$$

Viewed in this framework, computation of the rectangle sum can be expressed as a dot product, $i \cdot r$, where $i$ is the image and $r$ is the box-car image (with value 1 within the rectangle of interest and 0 outside). This operation can be rewritten:

$$i \cdot r = \left(\int\!\!\int i\right) \cdot r''$$

The integral image is in fact the double integral of the image (first along rows and then along columns). The second derivative of the rectangle (first in the row direction and then in the column direction) yields four delta functions at the corners of the rectangle. Evaluation of the second dot product is accomplished with four array accesses.

    3.8 Cascade of Classifiers.

This section describes an algorithm for constructing a cascade of classifiers [3] which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances. Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon, in order to achieve low false positive rates.


A cascade of classifiers is a degenerate decision tree where, at each stage, a classifier is trained to detect almost all objects of interest while rejecting a certain fraction of the non-object patterns.

Each stage was trained using the AdaBoost algorithm. AdaBoost is a powerful machine learning algorithm that can learn a strong classifier based on a (large) set of weak classifiers by re-weighting the training samples. The feature-based classifier that best classifies the weighted training samples is added at each round of boosting. As the stage number increases, the number of weak classifiers needed to achieve the desired false alarm rate at the given hit rate also increases.
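A sketch of how such a cascade processes one sub-window (the types and the weak_eval hook are hypothetical placeholders for the feature evaluation machinery):

/* One stage of the cascade: a boosted classifier whose alpha-weighted
   weak classifier votes are compared against the stage threshold. */
typedef struct {
    int     n_weak;     /* number of weak classifiers in this stage */
    double *alpha;      /* vote weights alpha_t                     */
    double  threshold;  /* stage acceptance threshold               */
} Stage;

/* Returns 0 or 1: the vote of weak classifier t of a stage on a window. */
extern int weak_eval(const Stage *s, int t, const double *window);

int cascade_classify(const Stage *stages, int n_stages, const double *window)
{
    int k, t;
    for (k = 0; k < n_stages; k++) {
        double score = 0.0;
        for (t = 0; t < stages[k].n_weak; t++)
            score += stages[k].alpha[t] * weak_eval(&stages[k], t, window);
        if (score < stages[k].threshold)
            return 0;   /* rejected early: most sub-windows stop here */
    }
    return 1;           /* accepted by every stage */
}

Because the vast majority of sub-windows are rejected by the first, smallest stages, the average number of features evaluated per window stays small even though the later stages are large.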

    3.9 Training a Cascade

The cascade training process involves two types of tradeoffs. In most cases, classifiers with more features achieve higher detection rates and lower false positive rates; at the same time, classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage are traded off in order to minimize the expected number of evaluated features. Unfortunately, finding this optimum is a tremendously difficult problem [6][7][8][9].

In practice, as shown in Figure 3.9.1, a very simple framework is used to produce an effective classifier which is highly efficient. Each stage in the cascade reduces the false positive rate as well as the detection rate.


A target is selected for the minimum reduction in false positives and the maximum allowable decrease in detection rate. Each stage is trained by adding features until the target detection and false positive rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall targets for the false positive and detection rates are met.
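Under the usual independence assumption, the overall false positive rate $F$ and detection rate $D$ of a $K$-stage cascade are the products of the per-stage rates $f_i$ and $d_i$, which is why modest per-stage targets compound into strong overall behavior:

$$F = \prod_{i=1}^{K} f_i, \qquad D = \prod_{i=1}^{K} d_i$$

For example, $K = 10$ stages with $d_i = 0.99$ and $f_i = 0.30$ give $D \approx 0.99^{10} \approx 0.90$ and $F \approx 0.30^{10} \approx 6 \times 10^{-6}$.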

    3.10 Training Process

The training process uses AdaBoost to select a subset of features and construct the classifier. In each round the learning algorithm chooses from a heterogeneous set of filters, including the appearance filters, the motion direction filters, the motion shear filters, and the motion magnitude filters.

The AdaBoost algorithm also picks the optimal threshold for each feature, as well as the $\alpha$ and $\beta$ votes of each feature. The output of the AdaBoost learning algorithm is a classifier that consists of a linear combination of the selected features. For details on AdaBoost, the reader is referred to [12, 5].

    The important aspect of the resulting classifier to note is that it mixes motion

    and appearance features. Each round of AdaBoost chooses from the total set of the

    various motion and appearance features, the feature with lowest weighted error on the

    training examples. The resulting classifier balances intensity and motion information in

    order to maximize detection rates.


Figure 3.10.1: Cascade architecture. Input is passed to the first classifier, which decides true or false (pedestrian or not pedestrian). A false determination halts further computation and causes the detector to return false. A true determination passes the input along to the next classifier in the cascade. If all classifiers vote true then the input is classified as a true example. If any classifier votes false then computation halts and the input is classified as false. The cascade architecture is very efficient because the classifiers with the fewest features are placed at the beginning of the cascade, minimizing the total required computation.

Viola and Jones [14] showed that a single classifier for face detection would require too many features and thus be too slow for real-time operation. They proposed a cascade architecture to make the detector extremely efficient (see Figure 3.10.1). We use the same cascade idea for pedestrian detection. Each classifier in the cascade is trained to achieve very high detection rates and modest false positive rates. Simpler detectors (with a small number of features) are placed earlier in the cascade, while complex detectors (with a large number of features) are placed later in the cascade.

    Detection in the cascade proceeds from simple to complex. Each stage of the

    cascade consists of a classifier trained by the AdaBoost algorithm on the true and false

    positives of the previous stage. Given the structure of the cascade, each stage acts to

    reduce both the false positive rate and the detection rate of the previous stage. The key

    is to reduce the false positive rate more rapidly than the detection rate.


    A target is selected for the minimum reduction in false positive rate and the

    maximum allowable decrease in detection rate. Each stage is trained by adding features

until the target detection and false positive rates are met on a validation set. Stages of

    the cascade are added until the overall target for false positive and detection rate is met.

Figure 3.10.2: A small sample of positive training examples. A pair of image patterns comprises a single example for training.


Another part is a set of additional non-pedestrian images: a collection of 1200 video images containing no pedestrians, intended for the extraction of additional negative training examples.

    Figure 4.1.1: Sample of Training Dataset of Pedestrian

    4.2 Performance Dataset of Pedestrian

I generated a performance database of grayscale pedestrian images by running our pedestrian detector and tracker on several hours of video data and cropping a fixed window of size 500 x 300 pixels around the pedestrian in the middle frame of each tracked sequence. This method yielded 50 images in which the people were of roughly the same size. The crop window was positioned such that the pedestrian was centered in the bottom third of the image. I chose this position as a reference point to ensure matching was done with only pedestrian features and not background features.

    Figure 4.1.2: Sample of Performance Dataset of Pedestrian


    4.3 Training Performance

    I used six of the sequences to create a training set from which I learned both a

    dynamic pedestrian detector and a static pedestrian detector. The other two sequences

    were used to test the detectors. The dynamic detector was trained on consecutive frame

    pairs and the static detector was trained on static patterns and so only uses the

    appearance filters described above. The static pedestrian detector uses the same basic

    architecture as the face detector described in [14].

    Figure 4.3.1: Sample frames from each of the 6 sequences we used for training. The manually

    marked boxes over pedestrians are also shown.


    Each stage of the cascade is a boosted classifier trained using a set of 5000

    positive examples and 5000 negative examples. Each positive training example is a pair

    of 18 x 36 pedestrian images taken from two consecutive frames of a video sequence.

    Negative examples are similar image pairs which do not contain pedestrians. Positive

examples are shown in Figure 3.10.2. During training, all example images are variance

    normalized to reduce contrast variations. The same variance normalization computation

    is performed during testing.

    Each classifier in the cascade is trained using the original 5000 positive

    examples plus 5000 false positives from the previous stages of the cascade. The

    resulting classifier is added to the current cascade to construct a new cascade with a

    lower false positive rate. The detection threshold of the newly added classifier is

    adjusted so that the false negative rate is very low.

    The cascade training algorithm also requires a large set of image pairs to scan

    for false positives. These false positives form the negative training examples for the

    subsequent stages of the cascade. We use a set of 5000 full image pairs which do not

    contain pedestrians for this purpose. Since each full image contains about 50,000

patches of 20 x 15 pixels, the effective pool of negative patches is larger than 20 million.

    The static pedestrian detector is trained in the same way on the same set of images. The

    only difference in the training process is the absence of motion information. Instead of

    image pairs, training examples consist of static image patches.


    CHAPTER 5

    RESULT

This section demonstrates some of the tested image sequences that highlight the effectiveness of the proposed detection system. These experimental results are obtained using the proposed detection algorithm discussed in Chapters 3 and 4. The proposed algorithm includes Haar-like features and AdaBoost classification. The system can detect moving objects in a 300 x 500 pixel image at 17 frames/s [7][8][9] on a 2.0 GHz PC with 2 GB RAM running Windows XP. The output of the system is represented by a yellow rectangle, meaning that in the corresponding search sub-window the output of the cascade of classifiers was true; in other words, it has detected the object. Figures 5-A and 5-B are examples of the output.

    Figure 5-A: Output of the pedestrian

    detection system.

Figure 5-B: Detecting pedestrians in an indoor environment.


    5.1 Stage

The final classifier cascade consists of 11 boosted classifiers. The classifiers are sorted by their number of weak classifiers, so each stage has at least as many weak classifiers as the previous one. Since the computational complexity of each classifier rises with its number of features, images are first classified by the classifiers of small size. This ordering demonstrates the efficiency of the AdaBoost cascade: because the boosted classifiers are arranged by size, only a reduced number of samples reach the higher classifiers in the cascade, which are more expensive to compute.

    5.2 Number of Classifiers

The results in Figure 5.2.1 show that the number of weak classifiers varies with, and grows with, the size of the dataset. The reason for using datasets of different sizes is to show that an AdaBoost system trained on more data produces more weak classifiers, and that with more weak classifiers the AdaBoost system achieves better performance.

    Figure 5.2.1: Number of weak classifiers


    5.3 Percentage Hit and Missed

The training and test datasets were obtained by running our pedestrian detector on several hours of video and extracting sequences of images for each tracked pedestrian; the extracted features depend on the dataset used.

Figure 5.3.1 shows the accuracy for each dataset. The 100-sample collection produces the lowest performance, with a hit percentage of only 11.89% and the highest missed percentage, 85.11%. At this level, the result obtained by the AdaBoost system is not superior to the cascade classifier with respect to accuracy, and represents the lowest machine learning performance.

In comparison, the system based on the 4500-sample collection achieves 89.36% hits and 10.64% missed in detection and tracking respectively, preserving the higher performance of the AdaBoost system.

    Figure 5.3.1: Percentage of Hits and Missed


    5.4 Percentage False Alarm

We can see in Figure 5.4.1 that all the data collections produced a high false alarm percentage. The difference in false alarm percentage between the 100-sample and the 4000-sample collections is only 5.03%. This value shows that the classifier has a problem.

    One problem with the classifier returned by AdaBoost is that the threshold is much too

    low. This is because the prior probability for a pedestrian is much lower than the prior

    probability for a non-pedestrian, but the AdaBoost algorithm assumes they have equal

    priors.

Attempts were made to adjust the threshold automatically based on a holdout set, but not enough negative examples were present to do this accurately. Instead, the threshold was manually adjusted until the false positive rate was qualitatively low enough. While the holdout set had a dismal detection rate, the performance on actual images was much better, because many windows overlap each pedestrian, giving the classifier multiple chances to detect a given pedestrian. The thresholds for the cascade were manually chosen so that as many negative examples as possible were rejected while still allowing almost all positive examples through.


    Figure 5.4.1: Percentage of False Alarm

    5.5 Result in Real Application

In a real application, the detections fall into three cases: correct detection, correct detection with false detection, and entirely false detection. These three outcomes depend on the dataset performance discussed in subchapters 5.2, 5.3 and 5.4. Figures 5.5.1, 5.5.2, and 5.5.3 give an overview of the real-time operation of the pedestrian detection system.

    Figure 5.5.1: Correct detection


    Figure 5.5.2: Correct detection with false alarm

    Figure 5.5.3: Entirely False Detection


    5.6 Future Work

    The boosted sum-of-pixel feature technique introduced by Viola and Jones has many

    potential uses. One such use would be to introduce it into a particle filter context where

the sum-of-pixel classifier could be used to estimate a likelihood. This would enable an extremely simple parameterization of pixel coordinates, scale, and velocity. This would

    also increase the robustness of the Viola and Jones algorithm and make it faster since

    full image searches would no longer be necessary. The simplicity of the

    parameterization also would have a huge benefit over more complex contour based

    parameterizations. Another possible application of this algorithm would be for behavior

    classification. For this to work, however, longer-term motion analysis might be

    necessary. Perhaps looking at N successive frames instead of 2 would improve

    classification performance.


    CHAPTER 6

    CONCLUSION AND RECOMMENDATION

    I have presented an approach for object detection which minimizes computation time

    while achieving high detection accuracy. Preliminary experiments, which will be

    described elsewhere, show that highly efficient detectors for other objects, such as

    pedestrians, can also be constructed in this way.

This paper brings together new algorithms, representations, and insights which are quite generic and may well have broader application in computer vision and image processing. The first contribution is a new technique for computing a rich set of image features using the integral image. In order to achieve true scale invariance, almost all object detection systems must operate on multiple image scales.

    The integral image, by eliminating the need to compute a multi-scale image pyramid,

    reduces the initial image processing required for object detection significantly. In the

    domain of pedestrian detection the advantage is quite dramatic. Using the integral

    image, pedestrian detection is completed before an image pyramid can be computed.

While the integral image should also have immediate use for other systems which have used Haar-like features (such as Papageorgiou et al. [12]), it can foreseeably have an impact on any task where Haar-like features may be of value. Initial experiments have shown that a similar feature set is also effective for the task of parameter estimation, where the expression of a pedestrian, the position of a pedestrian, or the pose of a pedestrian is determined.


    The second contribution of this paper is a technique for feature selection based on

    AdaBoost. An aggressive and effective technique for feature selection will have impact

    on a wide variety of learning tasks. Given an effective tool for feature selection, the

    system designer is free to define a very large and very complex set of features as input

    for the learning process. The resulting classifier is nevertheless computationally

    efficient, since only a small number of features need to be evaluated during run time.

    Frequently the resulting classifier is also quite simple; within a large set of complex

    features it is more likely that a few critical features can be found which capture the

    structure of the classification problem in a straightforward fashion.

The third contribution of this paper is a technique for constructing a cascade of classifiers which radically reduces computation time while improving detection accuracy. Early stages of the cascade are designed to reject a majority of the image in order to focus subsequent processing on promising regions. One key point is that the cascade presented is quite simple and homogeneous in structure. Previous approaches for attentive filtering, such as Itti et al. [13], propose a more complex and heterogeneous mechanism for filtering. Similarly, Amit and Geman propose a hierarchical structure

    for detection in which the stages are quite different in structure and processing [14]. A

homogeneous system, besides being easy to implement and understand, has the

    advantage that simple tradeoffs can be made between processing time and detection

    performance.

    Finally this paper presents a set of detailed experiments on a difficult pedestrian

    detection dataset which has been widely studied. This dataset includes pedestrians under

    a very wide range of conditions including: illumination, scale, pose, and camera

    variation. Experiments on such a large and complex dataset are difficult and time

    consuming. Nevertheless systems which work under these conditions are unlikely to be

brittle or limited to a single set of conditions. More importantly, conclusions drawn from this dataset are unlikely to be experimental artifacts.


    REFERENCES

[1] Syed Abdul Rahman, PhD Thesis: Moving Object Feature Detector & Extractor Using a Novel Hybrid Technique, University of Bradford, Bradford, UK, 1997.

[2] Wan Ayub Bin Wan Ahmad, Master's Thesis: Menjejaki Objek Yang Bergerak Dalam Satu Jujukan Imej (Tracking a Moving Object in an Image Sequence), Universiti Teknologi Malaysia, Skudai, Malaysia, 2002.

[3] Yeoh Phaik Yong, Master's Draft Thesis: Integration of Projection Histograms for Real-Time Tracking of Moving Objects, Universiti Teknologi Malaysia, Skudai, Malaysia, 2003.

[4] Bernd Heisele, W. Ritter and U. Kressel, Tracking Non-Rigid, Moving Objects Based on Color Cluster Flow. Proc. Computer Vision and Pattern Recognition, pages 253-257, San Juan, 1997.

[5] Bernd Heisele, Motion-Based Object Detection and Tracking in Color Image Sequences. Fourth Asian Conference on Computer Vision, pages 1028-1033, Taipei, 2000.

[6] B. Heisele, T. Serre, S. Mukherjee and T. Poggio, Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Kauai, Hawaii, Vol. 2, pp. 18-24, December 2001.

[7] Yoav Rosenberg and Michael Werman, Real-Time Object Tracking from a Moving Video Camera: A Software Approach on a PC. 4th IEEE Workshop on Applications of Computer Vision (WACV '98), New Jersey, 1998.

[8] Paul Viola and Michael Jones, Rapid Object Detection Using a Boosted Cascade of Simple Features. Conference on Computer Vision and Pattern Recognition, 2001.

[9] Goncalo Monteiro, Paulo Peixoto, and Urbano Nunes, Vision-Based Pedestrian Detection Using Haar-like Features. Institute of Systems and Robotics, University of Coimbra, Polo II, Portugal, 2002.

[10] Paul Viola and Michael Jones, Robust Real-Time Object Detection. Second International Workshop on Statistical and Computational Theories of Vision - Modeling, Learning, Computing and Sampling, Canada, July 2001.

[11] A. Haar, Zur Theorie der orthogonalen Funktionensysteme (On the Theory of Orthogonal Function Systems). Mathematische Annalen, 69, pp. 331-371, 1910.

[12] C. Papageorgiou, M. Oren and T. Poggio, "A General Framework for Object Detection". International Conference on Computer Vision, 1998.

[13] L. Itti, C. Koch, and E. Niebur, A Model of Saliency-Based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259, November 1998.

[14] Y. Amit, D. Geman, and K. Wilder, Joint Induction of Shape Features and Tree Classifiers, 1997.

[15] Yoav Freund, Boosting a Weak Learning Algorithm by Majority. Information and Computation, 121(2):256-285, 1995.

[16] Yoav Freund and Robert E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Extended abstract in Computational Learning Theory: Second European Conference, EuroCOLT '95, pages 23-37, Springer-Verlag, 1995.

[17] Johannes Fürnkranz and Gerhard Widmer, Incremental Reduced Error Pruning. In Machine Learning: Proceedings of the Eleventh International Conference, pages 70-77, 1994.

[18] Geoffrey W. Gates, The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory, pages 431-433, 1972.

[19] Peter E. Hart, The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory, IT-14:515-516, May 1968.

[20] Robert C. Holte, Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 11(1):63-91, 1993.

[21] Jeffrey C. Jackson and Mark W. Craven, Learning Sparse Perceptrons. In Advances in Neural Information Processing Systems 8, 1996.

[22] Michael Kearns and Yishay Mansour, On the Boosting Ability of Top-Down Decision Tree Learning Algorithms. In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, 1996.

[23] Eun Bae Kong and Thomas G. Dietterich, Error-Correcting Output Coding Corrects Bias and Variance. In Proceedings of the Twelfth International Conference on Machine Learning, pages 313-321, 1995.

[24] J. Ross Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

[25] J. Ross Quinlan, Bagging, Boosting, and C4.5. In Proceedings, Fourteenth National Conference on Artificial Intelligence, 1996.

[26] Robert E. Schapire, The Strength of Weak Learnability. Machine Learning, 5(2):197-227, 1990.

[27] Patrice Simard, Yann Le Cun, and John Denker, Efficient Pattern Recognition Using a New Transformation Distance. In Advances in Neural Information Processing Systems, volume 5, pages 50-58, 1993.


    APPENDIX

A-1: Coding

#ifndef _EiC
#include "cv.h"
#include "highgui.h"
/* Standard headers this listing needs (the original include names were
   lost in extraction): */
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#endif

#ifdef _EiC
#define WIN32
#endif

static CvMemStorage* storage = 0;
static CvHaarClassifierCascade* cascade = 0;

void detect_and_draw( IplImage* image );

const char* cascade_name = "haarcascade_frontalface_alt.xml";
/* "haarcascade_profileface.xml"; */

int main( int argc, char** argv )
{
    CvCapture* capture = 0;
    IplImage *frame, *frame_copy = 0;
    int optlen = strlen("--cascade=");
    const char* input_name;

    if( argc > 1 && strncmp( argv[1], "--cascade=", optlen ) == 0 )
    {
        cascade_name = argv[1] + optlen;
        input_name = argc > 2 ? argv[2] : 0;
    }
    else
    {
        fprintf( stderr,
            "Usage: facedetect --cascade=\"<cascade_path>\" [filename|camera_index]\n" );
        return -1;
        /* input_name = argc > 1 ? argv[1] : 0; */
    }

    /* Only for an XML file:
       cascade = (CvHaarClassifierCascade*)cvLoad( cascade_name, 0, 0, 0 ); */
    /* The 18 x 36 window matches the pedestrian training size. */
    cascade = cvLoadHaarClassifierCascade( cascade_name, cvSize(18, 36) );

    if( !cascade )
    {
        fprintf( stderr, "ERROR: Could not load classifier cascade\n" );
        return -1;
    }
    storage = cvCreateMemStorage(0);

    /* A single digit selects a camera; anything else is a video file. */
    if( !input_name || (isdigit(input_name[0]) && input_name[1] == '\0') )
        capture = cvCaptureFromCAM( !input_name ? 0 : input_name[0] - '0' );
    else
        capture = cvCaptureFromAVI( input_name );

    cvNamedWindow( "result", 1 );

    if( capture )
    {
        for(;;)
        {
            if( !cvGrabFrame( capture ))
                break;
            frame = cvRetrieveFrame( capture );
            if( !frame )
                break;
            if( !frame_copy )
                frame_copy = cvCreateImage( cvSize(frame->width, frame->height),
                                            IPL_DEPTH_8U, frame->nChannels );
            if( frame->origin == IPL_ORIGIN_TL )
                cvCopy( frame, frame_copy, 0 );
            else
                cvFlip( frame, frame_copy, 0 );

            detect_and_draw( frame_copy );

            if( cvWaitKey( 10 ) >= 0 )
                break;
        }

        cvReleaseImage( &frame_copy );
        cvReleaseCapture( &capture );
    }
    else
    {
        const char* filename = input_name ? input_name : (char*)"lena.jpg";
        IplImage* image = cvLoadImage( filename, 1 );

        if( image )
        {
            detect_and_draw( image );
            cvWaitKey(0);
            cvReleaseImage( &image );
        }
        else
        {
            /* Assume it is a text file containing the list of the image
               filenames to be processed, one per line. */
            FILE* f = fopen( filename, "rt" );
            if( f )
            {
                char buf[1000+1];
                while( fgets( buf, 1000, f ) )
                {
                    int len = (int)strlen(buf);
                    while( len > 0 && isspace(buf[len-1]) )
                        len--;
                    buf[len] = '\0';
                    image = cvLoadImage( buf, 1 );
                    if( image )
                    {
                        detect_and_draw( image );
                        cvWaitKey(0);
                        cvReleaseImage( &image );
                    }
                }
                fclose(f);
            }
        }
    }

    cvDestroyWindow("result");
    return 0;
}

void detect_and_draw( IplImage* img )
{
    int scale = 1;
    IplImage* temp = cvCreateImage( cvSize(img->width/scale, img->height/scale), 8, 3 );
    CvPoint pt1, pt2;
    int i;

    /* cvPyrDown( img, temp, CV_GAUSSIAN_5x5 ); */
    cvClearMemStorage( storage );

    if( cascade )
    {
        /* Scan the cascade over the image at multiple scales. */
        CvSeq* faces = cvHaarDetectObjects( img, cascade, storage,
                                            1.1, 2, CV_HAAR_DO_CANNY_PRUNING,
                                            cvSize(40, 40) );

        /* Draw a yellow rectangle around each detection. */
        for( i = 0; i < (faces ? faces->total : 0); i++ )
        {
            CvRect* r = (CvRect*)cvGetSeqElem( faces, i );
            pt1.x = r->x*scale;
            pt2.x = (r->x + r->width)*scale;
            pt1.y = r->y*scale;
            pt2.y = (r->y + r->height)*scale;
            cvRectangle( img, pt1, pt2, CV_RGB(255,255,0), 3, 8, 0 );
        }
    }

    cvShowImage( "result", img );
    cvReleaseImage( &temp );
}

#ifdef _EiC
main(1, "facedetect.c");
#endif
