
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 14, NO. 4, DECEMBER 2013 1597

Vehicle Detection by Independent Parts for Urban Driver Assistance

Sayanan Sivaraman, Member, IEEE, and Mohan Manubhai Trivedi, Fellow, IEEE

Abstract—In this paper, we introduce vehicle detection by independent parts (VDIP) for urban driver assistance. In urban environments, vehicles appear in a variety of orientations, i.e., oncoming, preceding, and sideview. Additionally, partial vehicle occlusions are common at intersections, during entry and exit from the camera's field of view, or due to scene clutter. VDIP provides a lightweight, robust framework for detecting oncoming, preceding, sideview, and partially occluded vehicles in urban driving. In this paper, we use active learning to train independent-part detectors. A semisupervised approach is used for training part-matching classification, which forms sideview vehicles from independently detected parts. The hierarchical learning process yields VDIP, featuring efficient evaluation and robust performance. Parts and vehicles are tracked using Kalman filtering. The fully implemented system is lightweight and runs in real time. Extensive quantitative analysis on real-world on-road data sets is provided.

Index Terms—Active learning, active safety, computer vision, detection by parts, machine learning, occlusions, vehicle detection.

I. INTRODUCTION

IN THE United States, urban automotive collisions account for some 43% of fatal crashes. Over the past decade, while the incidence of rural and highway accidents in the United States has slowly decreased, the incidence of urban accidents has increased by 9%. Tens of thousands of drivers and passengers die on the roads each year, with most fatal crashes involving more than one vehicle [1]. Research and development efforts in the areas of advanced sensing, environmental perception, and intelligent driver assistance systems present an opportunity to help save lives and reduce the number of on-road fatalities. Over the past decade, there has been significant research effort dedicated to the development of intelligent driver assistance systems, intended to enhance safety by monitoring the driver and the on-road environment [2].

In particular, on-road detection of vehicles has been a topic of great interest to researchers over the past decade [3]. A variety of sensing modalities has become available for on-road vehicle detection, including radar, lidar, and computer vision. As production vehicles begin to include on-board cameras for lane tracking and other purposes, it is advantageous and cost

Manuscript received July 12, 2012; revised December 14, 2012 and March 18, 2013; accepted May 10, 2013. Date of publication June 14, 2013; date of current version November 26, 2013. This work was supported in part by the University of California Discovery Program, by Audi research, and by the Volkswagen Electronics Research Laboratory, Belmont, CA. The Associate Editor for this paper was Prof. L. Li.

The authors are with the Laboratory for Intelligent and Safe Automobiles, University of California, San Diego, CA 92093-0434 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TITS.2013.2264314

Fig. 1. (Top) Urban driving environment features oncoming, preceding, and sideview vehicles. (Bottom) Vehicles appear partially occluded as they enter and exit the camera's field of view.

effective to pursue vision as a modality for detecting vehicles on the road. Vehicle detection using computer vision is a challenging problem [3]–[6]. Roads are dynamic environments, featuring effects of ego-motion and relative motion, and video scenes feature high variability in background and illumination conditions. Further, vehicles encountered on the road exhibit high variability in size, shape, color, make, and model.

The urban driving environment introduces further challenges [13]. In urban driving, frequent occlusions and a variety of vehicle orientations make vehicle detection difficult, whereas visual clutter tends to increase the false-positive rate [14]. Fully visible vehicles are viewed in a variety of orientations, including oncoming, preceding, and sideview. Cross traffic is subject to frequent partial occlusions, particularly upon entry and exit from the camera's field of view. Fig. 1 shows this concept. Many studies of on-road vehicle detection have detected fully visible vehicles. In this paper, we also detect and track partially occluded vehicles.

In this paper, we introduce vehicle detection by independent parts (VDIP). Vehicle-part detectors are trained using


Fig. 2. VDIP, the learning approach detailed in this study. An initial round of supervised learning is carried out to yield initial part detectors for the front and rear parts of vehicles. We use the initial detectors to query informative training examples from independent data, performing active learning to improve part detection performance. While we query informative training examples, we label sideview vehicles in a semisupervised manner using the active learning annotations to form fully visible vehicles.

active learning, wherein initial part detectors are used to query informative training examples from unlabeled on-road data. The queried examples are used for retraining to improve the performance of the part detectors. Training examples for the part-matching classifier are collected using semisupervised annotation, which is performed during the active learning sample query process. After retraining, the vehicle-part detectors are able to detect oncoming and preceding vehicles as well as front and rear parts of cross-traffic vehicles. The semisupervised labeled configurations are used to train a part-matching classifier for detecting full sideview vehicles. Vehicles and vehicle parts are tracked using Kalman filtering. The final system is able to detect and track oncoming, preceding, sideview, and partially occluded vehicles. The full system is lightweight, robust, and runs in real time. Fig. 2 shows the learning process for VDIP.

The main contributions of this study include the following. We introduce a novel approach to detecting and tracking vehicles by independent parts (VDIP), yielding a final system that is capable of detecting vehicles from multiple views, including partially occluded vehicles. Leveraging active learning for improved part detection and semisupervised labeling for training part-matching classification, we implement a full vehicle detection and tracking system tailored to the challenges of urban driving. Vehicle parts and vehicles are tracked in the image plane using Kalman filtering. The full VDIP system runs in real time. Extensive quantitative analysis is provided.

Many prior works in vehicle detection require a root filter as part of detection, which can limit applicability to partial occlusions [11], [15]. Prior works that detect independent parts are focused on surveillance [16] or static image applications [17]. Detection by independent parts has rarely been implemented from a moving platform, and no prior work has used detection and tracking by independent parts for on-road vehicle detection. Further, no prior works have utilized active learning for part detection or semisupervised learning for part matching. Few part-based object detection systems in the literature report real-time implementation.

The remainder of this paper consists of the following. Section II briefly describes related research. Section III describes the overall approach to vehicle detection by parts, including active learning for part detection, semisupervised learning of part classification, and on-road part and vehicle tracking. Section IV presents experimental evaluation. Section V offers discussion and concluding remarks.

II. RELATED RESEARCH

This study involves on-road vehicle detection, on-road vehicle tracking, and detection by parts. Here, we provide a brief review of recent studies in these research areas.

A. On-Road Vehicle Detection and Tracking

Robust detection of other vehicles on the road using vision is a challenging problem. Driving environments are visually dynamic and feature diverse background and illumination conditions. The ego vehicle and the other vehicles on the road are generally in motion. The sizes and locations of vehicles in the image plane are diverse, although they can be modeled [5]. Vehicles also exhibit high variability in their shape, size, color, and appearance [3]. Further, while processing power has significantly increased over the past decade, the requirements for real-time computation impose additional constraints on the development of vision-based vehicle detection systems.

Many works in vehicle detection over the past decade have focused on detection and tracking of the rear face of vehicles [3]–[5], [9]. Detectors of this sort are designed for preceding or, sometimes, oncoming traffic. In [5], feature points are tracked over a long period of time, and vehicles are detected based on the tracked feature points. The distribution of vehicles in the image plane is probabilistically modeled, and a hidden Markov model formulation is used to detect vehicles and separate them from the background.

In [10], a camera is mounted on the vehicle to monitor the blind spot area, which presents difficulty because of the field of view and high variability in the appearance of vehicles, depending on their relative positions. The study uses a combination of SURF features and edge segments. Classification is performed via probabilistic modeling, using a Gaussian-weighted voting procedure to find the best configuration.

Detecting side-profile vehicles using a combination of parts has been explored in [20]. Using a camera looking out the side passenger's window, vehicles in adjacent lanes are detected by first detecting the front wheel and then the rear wheel. The combined parts are tracked using Kalman filtering. In [13], the idea of detecting vehicles as a combination of independent parts is explored, with a comparison of geometric and appearance matching features. This study expands on [13] with an augmented feature set, a tracking formulation, and extensive experimental evaluation.

Vehicle detection using the deformable part-based model (DPM) introduced in [15] has been implemented in [11]. The study implemented particle filter tracking, including adaptive thresholds for the detectors, to deal with the challenging conditions presented in the on-road environment.


TABLE I
VISION-BASED VEHICLE DETECTION

TABLE II
PART-BASED OBJECT DETECTION

On-road vehicle tracking has mainly been implemented using Kalman filtering [21]–[23] or particle filtering [9], [11]. Tracking has been carried out in the image plane [5], [9], [11] or in 3-D coordinates [21]–[24] using stereo vision. In stereo-vision studies, optical flow is often used as the initial cue to track moving objects. Table I summarizes recent works in vision-based vehicle detection.

B. Part-Based Object Detection

Detecting objects by parts has been explored in the computer vision community in various incarnations, with many works focused on detection of people. In [16], individual parts are detected using strong classifiers, and pedestrians are constructed using a Bayesian combination of parts. The number of detected pedestrians and their locations are taken as the most likely part-based configuration. In [25], a more efficient feature representation is used, and parts are chosen to be semantically meaningful and overlapping in the image plane.

In [19], multiple-instance learning is used for part-based pedestrian detection. The study demonstrates how the ability of multiple-instance feature learning to deal with training set misalignment can enhance the performance of part-based object detectors. In [18], people are detected using covariance descriptors on Riemannian manifolds, classified using a Logitboost cascade. In [17], pedestrian parts are manually assigned to semantically meaningful parts, with their physical configuration manually constrained and overlapping. The combination of parts is a weighted sum, with higher weights going to more reliably detected parts. In [26], this work is extended, with further experiments on the feature set.

The work in [15] on object detection using a DPM, which is based on the latent support vector machine (SVM), has introduced new avenues for detection of vehicles by parts. An efficient cascade classifier version of the deformable part-based object detector is presented in [27]. While [15] demonstrates vehicle detection evaluated on static images, the DPM is used for video-based nighttime vehicle detection in [28]. Integrated with tracking, vehicle detection by parts using the DPM is presented in [11].

Specific classifiers trained for detecting occluded pedestrians have been pursued in [29]. Using training examples featuring occluded pedestrians, a classifier was trained using the monocular, optical flow, and stereo modalities. Table II contains a summary of recent works in part-based object detection. In [30], partially occluded rear faces of vehicles are detected using scale-invariant feature transform features and hidden random field detection.

III. VDIP

VDIP includes the following steps. An on-road video frame is grabbed, and front and rear detectors are applied. The front part detector has been trained to detect oncoming vehicles and the front parts of sideview vehicles. The rear part detector has been trained to detect preceding vehicles and the rear parts of sideview vehicles. A part-matching classifier is applied to detect full sideview vehicles. Vehicle parts and full vehicles are


Fig. 3. VDIP: Data information flow for real-time vehicle detection and tracking by parts.

tracked in the image plane using Kalman filtering. Fig. 3 shows the vehicle-detection-by-parts process. Table III defines the terminology we use to describe vehicles and vehicle parts. In the following sections, we describe active learning for detecting vehicle parts, semisupervised labeling for vehicle detection by parts, and vehicle tracking using Kalman filtering.
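To make the data flow of Fig. 3 concrete, the following Python sketch outlines one frame of processing. It is illustrative only: detect_front_parts, detect_rear_parts, match_parts, and the tracker object are hypothetical stand-ins for the components described in the remainder of this section.

```python
def vdip_process_frame(frame, tracker):
    """One iteration of the VDIP pipeline (illustrative sketch)."""
    # Independent part detection: oncoming vehicles and front parts of
    # sideview vehicles; preceding vehicles and rear parts of sideview vehicles.
    front_parts = detect_front_parts(frame)
    rear_parts = detect_rear_parts(frame)

    # Form full sideview vehicles from independently detected parts
    # (part-matching classifier, Section III-B).
    sideview_vehicles = match_parts(front_parts, rear_parts)

    # Track parts and vehicles in the image plane with Kalman filters
    # (Section III-C).
    tracker.update(front_parts, rear_parts, sideview_vehicles)
    return tracker.active_tracks()
```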

In this paper, we use a single detector for front parts and a single detector for rear parts. The parts can be facing left, facing right, or in any orientation in between. We use a general detector instead of several orientation-specific classifiers for a few reasons. First, training a general detector makes more efficient use of the available data. If N training examples are required to train a single classifier, then training k orientation-specific classifiers will require kN annotated data samples. Second, using a general detector is computationally more efficient when processing a frame than evaluating k orientation-specific classifiers. The third reason is robustness. The on-road environment is challenging, featuring ample visual clutter. Localizing the front or rear part of the vehicle in difficult visual clutter can be easier than detecting a narrow part orientation in the same clutter. The implications of object detection and object orientation in clutter are examined at length in [31], where a generic detector is used to detect objects regardless of orientation, and then orientation-specific classifiers are used to determine the object's orientation.

A. Active Learning for Detecting Independent Parts

The on-road environment presents myriad challenges for vision-based vehicle detection. The backgrounds, illumination conditions, occlusions, visual clutter, and target class variability all present difficulties to vehicle detection using cameras and computer vision. Detecting vehicle parts can be even more challenging. While recent studies in part-based object detection have used a root filter as the prior information for searching parts [11], [15], in this study, we pursue independent-part detection. The goal is to detect vehicle parts independently of a root model, to detect vehicles in the presence of partial occlusions. Fig. 4 shows this phenomenon. We want to detect the front part of the vehicle as it enters the camera's field of view while the vehicle remains partially occluded.

TABLE III
TAXONOMY OF ON-ROAD VEHICLE DETECTION

Fig. 4. VDIP illustrative example. As a vehicle enters the camera's field of view (blue), its front part is detected. As it becomes fully visible (red), its rear part is also detected. Parts are tracked, and a part-matching classifier is applied to detect (purple) fully visible sideview vehicles.

We employ active learning for training vehicle-part detectors. Active learning takes into account the fact that unlabeled training data are abundant, whereas labeled data come with some cost, namely human effort and data volume [32]. Active learning can enable more efficient learning for vision-based object detection. Active learning has been used for object detection to improve recall–precision performance, train with


Fig. 5. Active learning interface used for part detection sample query and for semisupervised labeling of sideview vehicles for detection by parts. (a) All training examples, i.e., positive and negative, collected from a single frame, for training part matching and part detection. (b) Active learning sample query for front parts, featuring true positives (blue) and false positives (red). (c) Active learning sample query for rear parts, featuring true positives (purple) and false positives (cyan). (d) Part-matching examples for side vehicle detection, positive (green) and negative (yellow). (e) All positive training examples collected from this frame, for front part, rear part, and part matching. (f) All negative training examples collected from this frame, for front part, rear part, and part matching.

fewer samples, and train with less human labeling effort than conventional supervised learning [33].

We choose the front and rear parts of vehicles for part-based detection. These parts are semantically meaningful and are often the first parts visible when a vehicle enters the camera's field of view. The front and rear parts of the vehicle also have strong feature responses, making them suitable for training a detector. Table III details our criteria for the parts. The front part detector should detect oncoming vehicles and the front parts of sideview vehicles. The rear part detector should detect preceding vehicles and the rear parts of sideview vehicles. We use Haar-like features and an Adaboost cascade for detecting the front and rear parts [34], [35]. The combination of Haar-like features and Adaboost cascade classification is chosen for its speed and utility for vehicle detection, as it has been widely used in the vehicle detection literature [9].
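As a concrete illustration, a boosted Haar cascade of this kind can be evaluated with OpenCV's CascadeClassifier. This is a minimal sketch; the cascade file names are placeholders for part detectors trained as described in this section.

```python
import cv2

# Placeholder file names: cascades trained offline on 24 x 24 part patches.
front_detector = cv2.CascadeClassifier("front_part_cascade.xml")
rear_detector = cv2.CascadeClassifier("rear_part_cascade.xml")

def detect_parts(frame):
    """Run both part detectors on one frame; boxes are (i, j, w, h) in pixels."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # skip if video is grayscale
    fronts = front_detector.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=3, minSize=(24, 24))
    rears = rear_detector.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=3, minSize=(24, 24))
    return fronts, rears
```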

We train an initial classifier using 5000 labeled positive and negative examples for each part detector. Negative training examples were randomly selected from nonvehicle regions in on-road video. As demonstrated in prior works [4], [9], active learning can contribute to improved on-road vehicle detection. We use the initial classifiers for part detection to query informative examples and retrain improved classifiers.

Each annotated frame presents informative samples for retraining the front and rear part detectors. We define informative training examples as those image patches resulting from misclassification, i.e., false positives and missed detections. We also archive true positives for retraining to avoid overfitting [36]. During an active learning sample query for front and rear part detectors, the interface allows the user to label sideview vehicles as a combination of labeled front and rear parts. Fig. 5 shows the interface used for active learning of parts and for semisupervised labeling for part matching. Fig. 5(e) shows the sideview vehicle in green, as a combination of front and rear parts, the blue and purple boxes contained within.

We retrain the front and rear part detectors using 5000 positive and 5000 negative training examples for each, with the training examples queried using active learning. The retrained part detectors exhibit improved performance, including improved detection rates and lower false-positive rates. During active learning, rear parts were included in the negative training set for front detection, and vice versa.
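A minimal sketch of the sample-query step follows, under the assumption that detections are compared with human annotations by an intersection-over-union overlap test; the iou helper and the 0.5 threshold are illustrative choices, not taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (i, j, w, h)."""
    xi, yi = max(a[0], b[0]), max(a[1], b[1])
    xf = min(a[0] + a[2], b[0] + b[2])
    yf = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, xf - xi) * max(0, yf - yi)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def query_informative_examples(detections, annotations, thresh=0.5):
    """Split detections into true/false positives and find missed annotations."""
    true_pos, false_pos, matched = [], [], set()
    for det in detections:
        hits = [k for k, ann in enumerate(annotations) if iou(det, ann) >= thresh]
        if hits:
            true_pos.append(det)   # archived to avoid overfitting [36]
            matched.update(hits)
        else:
            false_pos.append(det)  # informative negative example
    missed = [ann for k, ann in enumerate(annotations) if k not in matched]
    # Retraining set: misses as positives, false positives as negatives,
    # plus the archived true positives.
    return true_pos, false_pos, missed
```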

While the goal is to train independent-part detectors for the fronts and rears of vehicles, we note that even after active learning, part classification can be ambiguous, i.e., the front detector sometimes returns rear parts, or the rear detector sometimes returns front parts. Prior vehicle detection works have trained a single detector for oncoming and preceding vehicles [11], [37]. As such, discriminating between the two classes can be difficult. For detecting oncoming and preceding vehicles, we resolve this ambiguity over time via tracking, as detailed in Section III-C.

B. Semisupervised Labeling for Part-Matching Classification

Given the initial part detection on a given frame, we take a semisupervised approach to learning a part-matching classifier for detecting sideview vehicles by parts. Semisupervised learning methods exploit the learner's prior knowledge to deal with unlabeled data [38]. In this case, we obtain labels for parts from the initial part detectors during the active learning sample query. These labels are used to train a classifier to detect sideview vehicles by matching already-detected and labeled parts.

The part-matching classifier is based on geometric features that encode the spatial relationship between the parts that


comprise a sideview vehicle. A given part can either be matched to form a side-profile vehicle or be retained as either an oncoming or a preceding vehicle. We parametrize a rectangle p corresponding to a detected vehicle part, either front or rear, by the i, j position of the rectangle's top-left corner in the image plane, its width, and its height, as shown in the following equation:

$$p = [i\ j\ w\ h]^T. \tag{1}$$

We denote a detected front part by $p_f$ and a given rear part by $p_r$. The side vehicle formed by $p_f$ and $p_r$ is denoted by $p_s$ and is the minimal rectangle that contains the two parts. To match a pair of given parts, i.e., front to rear, we compute a set of geometric features as

$$x(p_f, p_r) = \left[\frac{|i_f - i_r|}{\sigma}\ \ \frac{|j_f - j_r|}{\sigma}\ \ \frac{h_f}{h_r}\ \ \frac{w_s}{h_s}\right]^T, \qquad \sigma = \frac{w_f}{K}. \tag{2}$$

The geometric feature vector encodes the relative displacements between a front part, i.e., $p_f$, and a rear part, i.e., $p_r$, in the image plane, as well as their relative sizes and the aspect ratio of the minimal spanning rectangle $p_s$. Parameter $\sigma$ normalizes the distances to the reference frame, scaled to the 24 × 24 size of image patches used in the training set. Parameters $w_s$ and $h_s$ are the width and the height of the minimal side rectangle that envelops detected part rectangles $p_f$ and $p_r$ in the image plane. Using the absolute value of horizontal and vertical distances in the image plane allows the computed features to serve for left-facing or right-facing side-profile vehicles, i.e.,

$$w_f(Z) = \frac{f}{Z} W_f, \qquad \sigma = \frac{w_f(Z)}{K}, \qquad K = \frac{w_f(Z)}{\sigma}. \tag{3}$$
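Under the parametrization of (1), the feature vector of (2) can be computed directly from a pair of part rectangles. The sketch below assumes boxes in (i, j, w, h) form and uses the 24-pixel training patch width as the scale constant K; this is our reading of (2) and (3), not code from the paper.

```python
import numpy as np

K = 24  # training patches are 24 x 24 pixels

def geometric_features(pf, pr):
    """Compute x(pf, pr) of eq. (2) for front/rear boxes (i, j, w, h)."""
    i_f, j_f, w_f, h_f = pf
    i_r, j_r, w_r, h_r = pr
    sigma = w_f / K  # distance normalization, eqs. (2)-(3)

    # Minimal rectangle ps enclosing both parts.
    i_s, j_s = min(i_f, i_r), min(j_f, j_r)
    w_s = max(i_f + w_f, i_r + w_r) - i_s
    h_s = max(j_f + h_f, j_r + h_r) - j_s

    return np.array([abs(i_f - i_r) / sigma,  # horizontal displacement
                     abs(j_f - j_r) / sigma,  # vertical displacement
                     h_f / h_r,               # relative part heights
                     w_s / h_s])              # aspect ratio of ps
```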

The model is based on the premise that the relative dimensions of passenger vehicles fit a general physical model, which is observed in the image plane under perspective projection. Consider the distance to a vehicle Z and a standard pinhole camera model with focal length f. Then, the width of the vehicle scales with f/Z, where $W_f$ in (3) denotes the physical width of the vehicle's front part. Parameter $\sigma$ encodes the ratio between the width of the vehicle under perspective projection and the training patch size K, and is designed to scale with distance from the camera.

The model used is intended for detection of passenger vehicles, including sedans, coupes, station wagons, minivans, SUVs, pickup trucks, and light trucks. While the part classifiers can detect the front and rear parts of other types of vehicles, such as buses and semi trucks, the geometric features used in this study are intended for use with passenger vehicles. According to the Bureau of Transportation Statistics, passenger vehicles and light trucks comprised over 95% of vehicles on the road in 2010 [39].

During the active learning sample query stage, we label true positives, false positives, and missed detections returned by the initial front and rear part detectors. We then label sideview vehicles and compute the geometric features between pairs of parts that comprise sideview vehicles. To collect negative training examples, we compute the geometric features between pairs of parts that do not comprise sideview vehicles. Fig. 5 shows a screenshot of the interface used for active learning and semisupervised labeling of part configurations. Thus

$$f(x) = \sum_i \alpha_i K(x, x_i) + b, \qquad f(x) = w^T x + b, \qquad y = \mathrm{sgn}\left(f(x)\right). \tag{4}$$

Part matching uses an SVM classifier [40]. Equation (4) lists the equations for SVM classification. We use a linear kernel for SVM classification. Using the primal form of the linear kernel, evaluation of the classifier becomes an inner product with weight vector w, which enables a speed advantage. We train the matching classifier using 600 positive and 600 negative training examples. Thus

$$p(y = 1 \mid x) = \frac{1}{1 + e^{A f(x) + B}}. \tag{5}$$

For matching evaluation, we evaluate the part-matching classifier over each front–rear part pair, computing the likelihood of a match using (5). We choose the best match for each pair of front and rear parts using a variation of the stable marriage problem [41]. The scores for matching are based on $p(y = 1 \mid x)$, whose parameters A and B have been learned using maximum likelihood [42], [43]. Only matches for which the score is above 0.5 are retained as full sideview vehicles.
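Putting (4) and (5) together, a sketch of the matching step follows. It assumes a trained linear SVM (w, b) and Platt parameters (A, B), reuses geometric_features from the earlier sketch, and substitutes a simple greedy best-score assignment for the stable-marriage variant used in the paper.

```python
import numpy as np

def match_probability(x, w, b, A, B):
    """Eqs. (4)-(5): linear SVM score mapped to a match likelihood."""
    f = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(A * f + B))

def match_sides(fronts, rears, w, b, A, B, thresh=0.5):
    """Assign each front part to at most one rear part (greedy sketch)."""
    scored = [(match_probability(geometric_features(pf, pr), w, b, A, B), fi, ri)
              for fi, pf in enumerate(fronts)
              for ri, pr in enumerate(rears)]
    matches, used_f, used_r = [], set(), set()
    for score, fi, ri in sorted(scored, reverse=True):
        if score <= thresh:
            break  # only matches scoring above 0.5 become sideview vehicles
        if fi not in used_f and ri not in used_r:
            matches.append((fi, ri))
            used_f.add(fi)
            used_r.add(ri)
    return matches
```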

Fig. 7 shows an example of the matching process. In the first frame, the vehicle is entering the camera's field of view at an intersection. The front part of the vehicle is detected, which is labeled with a blue bounding box. A few frames later, the vehicle is fully visible in the camera's field of view, and the rear part of the vehicle is detected and labeled with a red bounding box. Evaluating (5) yields a positive match. The third frame shows the full sideview vehicle, which is labeled with a purple bounding box.

C. Tracking Vehicle Parts and Vehicles

We integrate tracking of vehicle parts and vehicles using Kalman filtering in the image plane. For each frame, we perform nonmaximal suppression by merging detections that overlap. We then track the parts and vehicles between frames, estimating their positions and velocities. The state vector for a given tracked object is as follows, with each of the parameters in the state and observation vectors given in pixel units:

$$V_k = [i_k\ j_k\ w_k\ h_k\ \Delta i_k\ \Delta j_k\ \Delta w_k\ \Delta h_k]. \tag{6}$$

We track a given object's i, j position in the image plane, as well as its width and height, using a constant velocity model for


Fig. 6. Using the tracking velocities of the vehicle parts eliminates erroneous sideview vehicles. (a) A sideview vehicle is erroneously constructed from an oncoming and a preceding vehicle. (b) Using velocity information from tracking, the erroneous sideview vehicle is not constructed.

tracking. The linear dynamic system for each tracked vehicle or part is given as

$$V_{k+1} = AV_k + \eta_k, \qquad M_k = CV_k + \xi_k = [i_k\ j_k\ w_k\ h_k] + \xi_k. \tag{7}$$

Variables $\eta_k$ and $\xi_k$ are the plant and observation noise, respectively. The state transition matrix is A, and the observation matrix is C. The observation consists of the pixel location, bounding box width, and bounding box height of the vehicle or vehicle part.
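For the constant-velocity model of (6) and (7), A and C take a standard block form. A minimal numpy sketch follows; the noise covariances Q and R are illustrative values, not reported in the paper.

```python
import numpy as np

I4, Z4 = np.eye(4), np.zeros((4, 4))
# State V = [i, j, w, h, di, dj, dw, dh]; constant-velocity transition.
A = np.block([[I4, I4],
              [Z4, I4]])
# Observation M = [i, j, w, h]: box position and size only.
C = np.hstack([I4, Z4])

Q = np.eye(8) * 1e-2  # plant noise covariance (illustrative)
R = np.eye(4)         # observation noise covariance (illustrative)

def kalman_step(V, P, M):
    """One predict-update cycle for a tracked part or vehicle."""
    V_pred = A @ V
    P_pred = A @ P @ A.T + Q
    S = C @ P_pred @ C.T + R
    G = P_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
    V_new = V_pred + G @ (M - C @ V_pred)
    P_new = (np.eye(8) - G @ C) @ P_pred
    return V_new, P_new
```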

We initiate tracks for all newly detected vehicles and vehicle parts. When a sideview vehicle is first detected, we initialize its velocity with that of the front part that forms it. We discard the tracks of vehicles and parts that have not been redetected for over two frames. As parts are tracked and associated, we take a majority vote over the past three frames to determine whether a tracked part is a rear or a front vehicle part.
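The track-management rules above (dropping tracks missed for over two frames; a three-frame majority vote on part type) might be organized as in this sketch; the Track container and its fields are illustrative.

```python
from collections import deque

class Track:
    """Illustrative container for one tracked part."""
    def __init__(self, state, part_type):
        self.state = state    # Kalman state vector V
        self.misses = 0       # consecutive frames without redetection
        self.votes = deque([part_type], maxlen=3)

    def part_type(self):
        # Majority vote over the past three frames resolves the
        # front/rear ambiguity discussed in Section III-A.
        return max(set(self.votes), key=self.votes.count)

def prune(tracks, max_misses=2):
    """Discard tracks not redetected for over two frames."""
    return [t for t in tracks if t.misses <= max_misses]
```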

Vehicle-part velocities are used to disambiguate part configurations for side vehicle detection. For each pair of vehicle parts, we determine whether the parts are moving in the same direction in the image plane by checking the motion similarity using the horizontal and width velocities of the two given parts, as shown in (8). The horizontal component is used to measure the potential cross-traffic motion of a vehicle, and the width component is used to measure the changing distance from the ego-vehicle. Thus

$$v_k = [\Delta i_k\ \Delta w_k]^T, \qquad m(p_f, p_r) = \sqrt{(v_f - v_r)^T (v_f - v_r)}. \tag{8}$$

Fig. 6 shows this operation. On the left, we see that parts from oncoming and preceding vehicles can be matched to form erroneous side vehicles (shown in purple). On the right, we see that using velocity information before applying (5) eliminates the erroneous side vehicle.
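A minimal gate implementing (8): a front–rear pair whose tracked horizontal and width velocities diverge is rejected before the matching classifier is applied. The state indexing follows (6), and the divergence threshold is an illustrative value.

```python
import numpy as np

def motion_compatible(front_track, rear_track, max_divergence=5.0):
    """Eq. (8): compare v = [di, dw] of a front and a rear part track."""
    vf = np.array([front_track.state[4], front_track.state[6]])
    vr = np.array([rear_track.state[4], rear_track.state[6]])
    m = np.linalg.norm(vf - vr)
    return m < max_divergence  # parts moving differently cannot be one vehicle
```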

IV. EXPERIMENTAL EVALUATION

Here, we present quantitative analysis of the system presented in this work. We evaluate the system on three real-world on-road data sets, which are representative of urban driving, i.e., LISA-Q Urban, LISA-X Downtown, and LISA-X Intersection. We describe the validation sets in detail in the Appendix.

TABLE IV
NUMBER OF TRAINING EXAMPLES

A. Training Data

In this paper, we have trained the part detectors using active learning. Initial detectors for front and rear parts were trained using 5000 positive and 5000 negative training examples. Using the initial detectors, we performed a round of active learning, querying informative training examples for part detection. We retrained the part detection classifiers using these informative examples, again with 5000 positive and 5000 negative examples. Image patches for part detection are 24 × 24 pixels.

During the active learning query process, we labeled side-vehicle training examples in a semisupervised manner. In all, we collected 600 positive and 600 negative training examples for the part-matching classifier. Table IV summarizes the training data used in this study.

B. Part Detection

Detection of vehicles is a challenging vision problem, and detecting vehicle parts independently presents additional difficulties. In pursuit of real-time vehicle detection, we work with low-resolution image patches for part detection. Indeed, as shown in Table VIII, the video resolution is 500 × 312 for experimental validation.

In this paper, we detect and track oncoming, preceding, sideview, and partially occluded vehicles in urban environments, based on detection of independent vehicle parts. The performance of the overall VDIP system is dependent on the performance of part detection. We quantify the performance of part detection, comparing active-learning-based part detection with part detection using the initial part detectors (see Fig. 7).

Fig. 8 plots the True Positive Rate versus False Positives per Frame on LISA-X Downtown. The performance of the initial part detectors is shown in red. The performance of the active-learning-based part detectors is shown in blue. We note that detecting vehicle parts using active learning yields significantly improved performance.

Fig. 9 plots the True Positive Rate versus False Positives per Frame on LISA-X Intersection. The performance of the initial part detectors is shown in red. The performance of the active-learning-based part detectors is shown in blue. In both cases, active learning yields significantly stronger part detection.

Reliable part detection and tracking allows us to detect partially occluded, oncoming, and preceding vehicles, which are all commonly encountered in urban driving. Vehicles appear partially occluded to the camera when entering and exiting the camera's field of view. Vehicles are also frequently partially occluded by other vehicles or by pedestrians and other road


Fig. 7. Showing the full track of a vehicle in the camera's field of view. In the first frame, the front part of the vehicle is detected, but most of the vehicle is occluded. A couple of frames later, the rear part of the vehicle is detected as well. The full sideview vehicle is detected and identified with a purple bounding box. The vehicle and its parts are tracked while they remain in the camera's field of view. As the vehicle leaves the camera's field of view, its rear part is still detected.

Fig. 8. Part detection performance, LISA-X Downtown. True-Positive Rate versus False Positives per Frame, for the (red) initial part detectors and the (blue) active-learning-based part detectors.

users. Fig. 11(c) shows a pedestrian walking in front of the camera and the detected parts of the partially occluded vehicle.

C. VDIP System Performance and Comparative Evaluation

We perform an experimental evaluation using the validation sets described in Table VIII in the Appendix. We compare the performance of this system with the performance of vehicle detection using the DPM [15], [44], using the code that Felzenszwalb et al. [15] have made publicly available. We compare the performance of our system using detection only and using the full detection and tracking system. Using tracking generally reduces the false-positive rate and increases the detection rate. We evaluate detection of fully visible, sideview, and partially occluded vehicles, as defined in Table III, for both systems. A false positive is a detection that does not correspond to a vehicle or vehicle part.
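For reference, the reported quantities can be accumulated per frame as in this sketch, which reuses the iou helper from the active-learning sketch; the overlap threshold is an illustrative choice.

```python
def frame_metrics(detections, ground_truth, thresh=0.5):
    """Count true and false positives for one frame (illustrative sketch)."""
    matched, tp, fp = set(), 0, 0
    for det in detections:
        best_k, best_ov = -1, 0.0
        for k, gt in enumerate(ground_truth):
            ov = iou(det, gt)
            if k not in matched and ov > best_ov:
                best_k, best_ov = k, ov
        if best_ov >= thresh:
            tp += 1
            matched.add(best_k)
        else:
            fp += 1  # detection matching no vehicle or vehicle part
    # TPR = total tp / total ground truth; FP per frame = mean fp over frames.
    return tp, fp
```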

Table V summarizes the comparative performance between our VDIP system and the DPM vehicle model on the LISA-Q Urban data set. This data set only features preceding vehicles and no occluded vehicles. Both systems exhibit high recall; however, the VDIP system also returns more false positives than the DPM. Fig. 11(a) shows an example frame from the data set. While the false-positive rate returned by our system is higher than the comparison, we note that it is roughly equal to that of the system reported in [9] on the same data set.

System performance is evaluated on the LISA-X Downtown validation set, which features both oncoming and preceding

Fig. 9. Part detection performance, LISA-X Intersection. True-Positive Rate versus False Positives per Frame, for the (red) initial part detectors and the (blue) active-learning-based part detectors.

vehicles in a dynamic downtown scene. Table VI summarizes the comparative performance on this data set. We note that VDIP exhibits high recall over the data set, significantly higher than that of the DPM. We also note that VDIP returns a higher number of false positives over the data set than the DPM. Fig. 12 shows sample frames featuring false positives returned by our system on this data set. These false positives include poorly localized parts of vehicles and multiple detections of the same parked vehicle. Asymmetric weighting of the training set can help with false-positive rates in difficult settings and is an area of future research [45].

Table VII summarizes the comparative performance on LISA-X Intersection. This data set is the longest of the three and features oncoming, preceding, sideview, and partially occluded vehicles. For fully visible vehicles, the VDIP system performs quite well, with an 87.5% detection rate, which rates quite favorably against the DPM. In particular, the VDIP system detects preceding and oncoming vehicles at low resolutions, with higher recall than the DPM.

For sideview vehicles, both systems exhibit a high true-positive rate. The DPM does extremely well with fully visible and sideview vehicles. Fig. 11(b) shows a sample frame from LISA-X Intersection, showing the detection of sideview and oncoming vehicles.

In detection of partially occluded vehicles, there is a major difference in performance between the DPM and VDIP systems. For partially occluded vehicles, the VDIP system has a significantly higher detection rate. This is owed to the


TABLE V
LISA-Q URBAN, 300 FRAMES, PRECEDING VEHICLES ONLY

TABLE VI
LISA-X DOWNTOWN, 500 FRAMES, PRECEDING, ONCOMING, PARTIALLY OCCLUDED

TABLE VII
LISA-X INTERSECTION, 1500 FRAMES, PRECEDING, ONCOMING, SIDEVIEW, PARTIALLY OCCLUDED

Fig. 10. True-Positive Rate versus False Positives per Frame, for the part-matching classifier, compared with the system presented in [15].

VDIP system's training to detect and track independent parts. Detection of independent parts enables the VDIP system to detect partially occluded vehicles, for example, as they enter the sensor's field of view. The false positives per frame returned by the VDIP system are somewhat higher than those of the DPM on this data set but remain reasonable.

Fig. 11 shows sample video frames featuring successful VDIP system performance in a variety of urban driving scenarios. Among the vehicles featured are preceding, oncoming, sideview, and partially occluded vehicles. Fig. 12 shows sample frames in which the VDIP system had difficulties. These include false positives and missed detections, as well as poorly localized parts and vehicles.

We evaluate the performance of the part-matching classifier by varying the threshold over which we keep results from evaluating (5) between two given parts. Fig. 10 plots the True-Positive Rate versus False Positives per Frame for the part-matching classifier, with the performance of the detector in [15] plotted for reference. We note that the part-matching classifier features high recall, while introducing relatively few false positives (see Fig. 11).

The fully implemented detection-and-tracking-by-independent-parts system operates in real time, running at approximately 14.5 frames per second on an Intel Core i7 processor. This compares favorably with many vehicle detection systems in the literature. There are no specific optimizations implemented in this system (see Fig. 12).

V. CONCLUDING REMARKS

In this paper, we have introduced VDIP for urban driver assistance. Using active learning, independent front and rear part classifiers are trained for detecting vehicle parts. While querying examples for active-learning-based detector retraining, sideview vehicles are labeled using semisupervised labeling to train a part-matching classifier for vehicle detection by parts. Vehicles and vehicle parts are tracked using Kalman filtering. The system presented in this work detects vehicles in multiple views, i.e., oncoming, preceding, sideview, and partially occluded. The system has been extensively evaluated on real-world video data sets and performs favorably when compared with the state of the art in part-based object detection. The system is lightweight and runs in real time. Future work will explore the extension of this detection and tracking approach to augment driver assistance applications, such as integration with stereo vision [23], learning vehicle motion patterns [46], and maneuver-specific assistance [6].

APPENDIX

VALIDATION SETS

We evaluate the performance of the vehicle detection and tracking system using three data sets consisting of on-road


Fig. 11. Showcasing VDIP system performance in various scenarios. (a) Detecting a preceding vehicle. (b) Detecting oncoming, sideview, and partially occluded vehicles at an intersection. (c) Detection of occluded vehicle parts while a pedestrian walks in front of the camera. (d) Oncoming and sideview vehicles. (e) Oncoming and preceding vehicles. (f) Oncoming and sideview vehicles.

Fig. 12. Showcasing VDIP system difficulties. (a) False positives in urban traffic, due to multiple poorly localized detections of vehicles. (b) Ambiguous classification between oncoming and preceding vehicles. (c) and (d) False positives due to complex background trees. (g) False positives due to road texture. (h) Poorly localized (purple) sideview bounding box, due to poor localization of the (red) rear part.

TABLE VIII
VALIDATION SETS USED IN THIS STUDY

video. The first data set we use is LISA-Q Urban (LISA-Q Front FOV), taken in urban driving. It is a publicly available data set, published in 2010 in conjunction with [9]. Captured using color video, it consists of 300 frames and only features preceding vehicles. There are 300 vehicles to be detected. The data set consists of a drive behind a preceding vehicle and features strong camera motion due to a speed bump.

LISA-X Downtown consists of 500 frames. Captured in grayscale, it is a more challenging set, featuring 1995 vehicles to be detected. The data set features oncoming, preceding, and parked vehicles. Captured in a downtown area, the ego-vehicle drives toward an intersection and waits to turn.

LISA-X Intersection consists of 1500 frames, captured in grayscale. The data set features oncoming, preceding, sideview, and partially occluded vehicles. The ego-vehicle drives on surface streets for some time and approaches an intersection, where vehicles enter and turn. This data set is quite challenging.

Table VIII offers summaries of each of the three data sets used in this study.

ACKNOWLEDGMENT

The authors would like to thank P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan for their contributions and for making their source code available. They would also like to thank their colleagues for their valuable assistance and interactions, and the anonymous reviewers for their comments and feedback.


REFERENCES

[1] Traffic Safety Facts, Department of Transportation–National Highway Traffic Safety Administration, Washington, DC, USA, 2011.
[2] M. M. Trivedi and S. Y. Cheng, "Holistic sensing and active displays for intelligent driver support systems," IEEE Comput., vol. 40, no. 5, pp. 60–68, May 2007.
[3] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 694–711, May 2006.
[4] S. Sivaraman and M. M. Trivedi, "Active learning for on-road vehicle detection: A comparative study," Mach. Vis. Appl., Special Issue Car Navig. Veh. Syst., Dec. 2011.
[5] A. Jazayeri, H. Cai, J. Y. Zheng, and M. Tuceryan, "Vehicle detection and tracking in car video based on motion model," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 583–595, Jun. 2011.
[6] S. Sivaraman and M. Trivedi, "Integrated lane and vehicle detection, localization, and tracking: A synergistic approach," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 2, pp. 906–917, Jun. 2013.
[7] W.-C. Chang and C.-W. Cho, "Online boosting for vehicle detection," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 3, pp. 892–902, Jun. 2010.
[8] R. O'Malley, E. Jones, and M. Glavin, "Rear-lamp vehicle detection and tracking in low-exposure color video for night conditions," IEEE Trans. Intell. Transp. Syst., vol. 11, no. 2, pp. 453–462, Jun. 2010.
[9] S. Sivaraman and M. M. Trivedi, "A general active learning framework for on-road vehicle recognition and tracking," IEEE Trans. Intell. Transp. Syst., vol. 11, no. 2, pp. 267–276, Jun. 2010.
[10] B.-F. Lin, Y.-M. Chan, L.-C. Fu, P.-Y. Hsiao, L.-A. Chuang, S.-S. Huang, and M.-F. Lo, "Integrating appearance and edge features for sedan vehicle detection in the blind-spot area," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 2, pp. 737–747, Jun. 2012.
[11] H. Tehrani Niknejad, A. Takeuchi, S. Mita, and D. McAllester, "On-road multivehicle tracking using deformable object model and particle filter with improved likelihood estimation," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 2, pp. 748–758, Jun. 2012.
[12] J. C. Rubio, J. Serrat, A. M. Lopez, and D. Ponsa, "Multiple-target tracking for intelligent headlights control," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 2, pp. 594–605, Jun. 2012.
[13] S. Sivaraman and M. M. Trivedi, "Real-time vehicle detection using parts at intersections," in Proc. IEEE Intell. Transp. Syst. Conf., 2012, pp. 1519–1524.
[14] F. Garcia, P. Cerri, A. Broggi, A. de la Escalera, and J. M. Armingol, "Data fusion for overtaking vehicle detection based on radar and optical flow," in Proc. IEEE IV Symp., Jun. 2012, pp. 494–499.
[15] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part based models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sep. 2010.
[16] B. Wu and R. Nevatia, "Detection and segmentation of multiple, partially occluded objects by grouping, merging, assigning part detection responses," Int. J. Comput. Vis., vol. 82, no. 2, pp. 185–204, Apr. 2009.
[17] D. Tosato, M. Farenzena, M. Cristani, and V. Murino, "Part-based human detection on Riemannian manifolds," in Proc. IEEE Int. Conf. Image Process., 2010, pp. 3469–3472.
[18] O. Tuzel, F. Porikli, and P. Meer, "Pedestrian detection via classification on Riemannian manifolds," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1713–1727, Oct. 2008.
[19] Z. Lin, G. Hua, and L. Davis, "Multiple instance feature for robust part-based object detection," in Proc. IEEE Comput. Vis. Pattern Recognit., 2009, pp. 405–412.
[20] W.-C. Chang and C.-W. Cho, "Real-time side vehicle tracking using parts-based boosting," in Proc. IEEE Int. Conf. SMC, 2008, pp. 3370–3375.
[21] A. Barth and U. Franke, "Tracking oncoming and turning vehicles at intersections," in Proc. IEEE Intell. Transp. Syst. Conf., 2010, pp. 861–868.
[22] H. Badino, R. Mester, J. Wolfgang, and U. Franke, "Free space computation using stochastic occupancy grids and dynamic programming," in Proc. ICCV Workshop Dyn. Vis., 2007, pp. 1–12.
[23] S. Sivaraman and M. M. Trivedi, "Combining monocular and stereo-vision for real-time vehicle ranging and tracking on multilane highways," in Proc. IEEE Intell. Transp. Syst. Conf., 2011, pp. 1249–1254.
[24] R. Danescu, F. Oniga, and S. Nedevschi, "Modeling and tracking the driving environment with a particle-based occupancy grid," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1331–1342, Dec. 2011.
[25] C. Huang and R. Nevatia, "High performance object detection by collaborative learning of joint ranking of granules features," in Proc. Comput. Vis. Pattern Recognit., 2010, pp. 41–48.
[26] D. Tosato, M. Farenzena, M. Cristani, and V. Murino, "A re-evaluation of pedestrian detection on Riemannian manifolds," in Proc. Int. Conf. Pattern Recognit., 2010, pp. 3308–3311.
[27] P. Felzenszwalb, R. Girshick, and D. McAllester, "Cascade object detection with deformable part models," in Proc. IEEE Comput. Vis. Pattern Recognit., 2010, pp. 2241–2248.
[28] H. Niknejad, S. Mita, D. McAllester, and T. Naito, "Vision-based vehicle detection for nighttime with discriminately trained mixture of weighted deformable part models," in Proc. 14th IEEE ITSC, 2011, pp. 1560–1565.
[29] M. Enzweiler, A. Eigenstetter, B. Schiele, and D. M. Gavrila, "Multi-cue pedestrian classification with partial occlusion handling," in Proc. IEEE Comput. Vis. Pattern Recognit., 2010, pp. 990–997.
[30] X. Zhang, N. Zheng, Y. He, and F. Wang, "Vehicle detection using an extended hidden random field model," in Proc. 14th IEEE ITSC, Oct. 2011, pp. 1555–1559.
[31] Q. Yuan, A. Thangali, V. Ablavsky, and S. Sclaroff, "Learning a family of detectors via multiplicative kernels," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 3, pp. 514–530, Mar. 2011.
[32] S. Dasgupta and D. J. Hsu, "Hierarchical sampling for active learning," in Proc. Int. Conf. Mach. Learn., 2008, pp. 208–215.
[33] A. J. Joshi and F. Porikli, "Scene adaptive human detection with incremental active learning," in Proc. ICPR, 2010, pp. 2760–2763.
[34] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Comput. Vis. Pattern Recognit. Conf., 2001, pp. I-511–I-518.
[35] Y. Freund and R. Schapire, "A short introduction to boosting," J. Jpn. Soc. Artif. Intell., vol. 14, no. 5, pp. 771–780, Sep. 1999.
[36] D. Cohn, L. Atlas, and R. Ladner, "Improving generalization with active learning," Mach. Learn., vol. 15, no. 2, pp. 201–221, May 1994.
[37] Z. Sun, G. Bebis, and R. Miller, "Monocular precrash vehicle detection: Features and classifiers," IEEE Trans. Image Process., vol. 15, no. 7, pp. 2019–2034, Jul. 2006.
[38] B. Settles, "Active learning literature survey," Univ. of Wisconsin–Madison, Madison, WI, USA, Comput. Sci. Tech. Rep. 1648, 2009.
[39] Number of U.S. Aircraft, Vehicles, Vessels, and Other Conveyances, Bureau of Transportation Statistics–National Transportation Statistics, Washington, DC, USA, 2012.
[40] C. Cortes and V. Vapnik, "Support vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995.
[41] D. Gale and L. S. Shapley, "College admissions and the stability of marriage," Amer. Math. Mon., vol. 69, no. 1, pp. 9–14, Jan. 1962.
[42] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27:1–27:27, Apr. 2011. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[43] H.-T. Lin, C.-J. Lin, and R. C. Weng, "A note on Platt's probabilistic outputs for support vector machines," Mach. Learn., vol. 68, no. 3, pp. 267–276, Oct. 2007.
[44] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester, Discriminatively Trained Deformable Part Models, Release 4. [Online]. Available: http://people.cs.uchicago.edu/pff/latent-release4/
[45] P. Viola and M. Jones, "Fast and robust classification using asymmetric AdaBoost and a detector cascade," in Proc. Adv. Neural Inf. Process. Syst., 2002, vol. 14, pp. 1311–1318.
[46] S. Sivaraman, B. T. Morris, and M. M. Trivedi, "Learning multi-lane trajectories using vehicle-based vision," in Proc. IEEE Int. Conf. Comput. Vis. Workshop, 2011, pp. 2070–2076.

Sayanan Sivaraman (M'07) received the B.S. degree in electrical engineering from the University of Maryland, College Park, MD, USA, in 2007; the M.S. degree in electrical engineering from the University of California, San Diego, CA, USA, in 2009; and the Ph.D. degree in electrical engineering in 2013.

His Ph.D. work was in intelligent systems, robotics, and controls. His research interests include computer vision, machine learning, probabilistic modeling, intelligent vehicles, and transportation systems.

Dr. Sivaraman served on the Organizing Committee for the IEEE Intelligent Vehicles Symposium (IV 2010). He was awarded Second Place for the ITS America Best Student Essay in 2013.


Mohan Manubhai Trivedi (F'09) received the B.E. degree (with honors) from Birla Institute of Technology and Science, Pilani, India, and the Ph.D. degree from Utah State University, Logan, UT, USA.

He is a Professor of electrical and computer engineering and the Founding Director of the Computer Vision and Robotics Research Laboratory and the Laboratory for Intelligent and Safe Automobiles, University of California, San Diego, CA, USA. He serves as a Consultant to industry and government agencies in the U.S. and abroad, including various government agencies, major auto manufacturers, and research initiatives in Asia and Europe. He is a coauthor of a number of papers that have won "Best Paper" awards. He and his team are currently pursuing research in machine and human perception, multimodal interfaces and interactivity, machine learning, intelligent vehicles, driver assistance, and active safety systems.

Dr. Trivedi is serving on the Board of Governors of the IEEE Intelligent Transportation Systems Society and on the Editorial Board of the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS. Two of his students, namely, Dr. S. Cheng and Dr. B. Morris, were awarded the Best Dissertation Award by the IEEE Intelligent Transportation Systems Society in 2008 and 2010, respectively, and the dissertation of his advisee, Dr. A. Doshi, was selected among the top five finalists in the dissertation competition of the Western Association of Graduate Schools (USA and Canada) in 2011. He has received the Distinguished Alumnus Award from Utah State University, the Pioneer Award (Technical Activities), and the Meritorious Service Award from the IEEE Computer Society. He has given over 65 Keynote/Plenary talks at major conferences. He is a Fellow of the International Association for Pattern Recognition and the International Society for Optics and Photonics (SPIE).

Dr. Trivedi is serving on the Board of Governors of the IEEE IntelligentTransportation Systems Society and on the Editorial Board of the IEEETRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS. Two of hisstudents, namely, Dr. S. Cheng and Dr. B. Morris, were awarded the Best Dis-sertation Award by the IEEE Intelligent Transportation Systems Society in 2008and 2010, respectively, and the dissertation of his advisee, i.e., Dr. A. Doshi,was selected among the top five finalists in the dissertation competition of theWestern Association of Graduate Schools (USA and Canada) in 2011. He hasreceived the Distinguished Alumnus Award from Utah State University, thePioneer Award (Technical Activities), and the Meritorious Service Award fromthe IEEE Computer Society. He has given over 65 Keynote/Plenary talks atmajor conferences. He is a Fellow of the International Association for PatternRecognition and the International Society for Optics and Photonics (SPIE).