Vehicle Subtype, Make and Model Classification from Side Profile Video

Jonathan Boyle and James Ferryman
Computational Vision Group, School of Systems Engineering, University of Reading, UK

{j.n.boyle | j.m.ferryman}@reading.ac.uk

Abstract

This paper addresses the challenging domain of vehicle classification from pole-mounted roadway cameras, specifically from side-profile views. A new public vehicle dataset is made available, consisting of over 10000 side profile images (86 make/model and 9 sub-type classes). Five state-of-the-art classifiers are applied to the dataset, with the best achieving high classification rates of 98.7% for sub-type and 99.7-99.9% for make and model recognition, confirming the assertion that single vehicle side profile images can be used for robust classification.

1. Introduction

Vehicle classification has many important applications including traffic analysis, tolling, and law enforcement, such as border checkpoints where it is of interest to verify the identity of vehicles which may have spoofed number plates. This work focusses on classification from side profile views of vehicles. The paper first describes the approach to detect the passage of a vehicle and extract a suitable normalised image ready for classification. This process results in a new public dataset of over 10000 images made available to the wider research community. Second, a number of state-of-the-art classifiers are evaluated to validate the use of side profile images for vehicle sub-type, make and model recognition.

2. Related work

Prior work on video-based vehicle classification is generally limited to splitting observed vehicles into a small number of categories. The approaches of [10, 13] estimate the size of detected vehicles, with [10] exploiting camera calibration to classify vehicles as truck/non-truck. Instead of considering just size, [6] use more statistics of the foreground silhouettes to classify seven classes (cars, vans and various sizes of trucks), but notably not the different sub-types of cars, nor their makes and models. One approach to extending beyond simple size estimates is to fit simple wireframe 3D models to the observed vehicles [3]. The advantage of such model-based approaches is a reduction in view-point dependence; however, they remain limited to the basic classes (bus/lorry, van, car/taxi, motorbike/bicycle), and producing simple 3D models accurate enough to distinguish between the many makes and models or subtypes of private cars seems unlikely to have high success. Turning to exploiting lower-level image features, [7] apply a two-step kNN classifier with geometric and texture based features for 7 vehicle type classes exploiting both frontal and rear views, [5] applied various classifiers to three vehicle type classes based on SURF and Gabor features, and [12] developed an SVM classifier based on a structural edge signature extracted from rear vehicle views. However, [12] is limited to 3 classes with a total of 1664 images. Investigation of make and model classification is generally a more recent occurrence. [15] apply number plate detection to captured frontal images of vehicles and train a classifier ensemble based on the Gabor transform and PHoG for 600 images/15 brands/21 vehicle classes. A similar viewpoint is used by [8], which considers edge, gradient and corner features and reports 96% make/model accuracy based on a small set of 262 frontal vehicle images. [4] and [14] provide further review of various approaches.

Acknowledgement: This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 312784. We would also like to thank Murray Evans for his preliminary input to this work.

This work departs from these approaches to consider side profile views and a significantly larger number of classes. For vehicle subtypes, the side profile is considered to be the most discriminating viewpoint; it is also conjectured to provide a large quantity of information that can be used for make and model classification, in terms of the shapes of side windows, the presence and location of trim, as well as overall vehicle shape. The vehicle wheels also provide a more reliably detectable feature for normalisation than is possible using the licence plate and frontal views. Finally, for the installation location, it was found that with flowing and queueing traffic the frontal view of vehicles was often occluded, a problem the side view did not suffer from.

Figure 1: Detection and classification process. The vehicle enters the scene, and background subtraction occurs. Foreground pixels in the "trigger zone" trigger vehicle detection. Wheels are detected, then used for normalisation. The normalised image is classified.

3. Side Profile Dataset

A camera was installed to monitor the Pepper Lane entrance of the University of Reading with optical axis perpendicular to the flow of traffic, at a height of approximately 2.5m. At this height, the vehicle is seen primarily in profile, and the visibility of roof, bonnet (hood), and boot (trunk) surfaces is minimised. The camera was set to record two hours of the morning rush-hour every working day over the course of four weeks. Once vehicles had been detected and extracted, the resulting dataset contained over 10000 images of vehicles, most of which were manually classified into sub-types as well as over 86 make/model categories with a wide range of populations. This data, as well as the frames extracted when vehicles are present, and the full normalised and labelled training set, are made available at http://www.cvg.reading.ac.uk/rvd.

3.1. Vehicle detection

Detection of the vehicle's presence in the acquired images is performed using standard background subtraction [16]. A region towards the middle of the image (see Figure 1) is specified as the "trigger zone", and when the ratio of foreground to background pixels within this region exceeds 0.25, the image is considered to contain a vehicle. At this stage, wheel detection is performed. An overview of the detection and normalisation process is shown in Figure 1.
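As an illustration, the trigger logic is only a few lines. The sketch below assumes OpenCV's MOG2 implementation of the background subtraction of [16]; the trigger-zone coordinates are placeholders, since the real zone is scene-specific.

```python
import cv2
import numpy as np

# Background subtractor per Zivkovic [16] (OpenCV's MOG2 implementation).
subtractor = cv2.createBackgroundSubtractorMOG2()

TRIGGER_ZONE = (300, 200, 200, 150)  # (x, y, w, h): placeholder coordinates
FG_BG_RATIO = 0.25                   # foreground/background ratio from the paper

def vehicle_present(frame):
    """True when the foreground/background pixel ratio in the zone exceeds 0.25."""
    mask = subtractor.apply(frame)
    x, y, w, h = TRIGGER_ZONE
    zone = mask[y:y + h, x:x + w]
    fg = np.count_nonzero(zone == 255)  # MOG2 marks shadows as 127; ignore them
    bg = zone.size - fg
    return fg / max(bg, 1) > FG_BG_RATIO
```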

3.1.1 Wheel Detection

Wheel detection is performed using a trained wheel/not-wheel classifier and a stochastic, multi-dimensional search. First, a Random Forest classifier [2] was trained on 27674 wheel images (and 13643 negative (non-wheel) random images) manually extracted from the dataset, totalling 41319 images. Each wheel training image is a square image centred on the centre of the wheel, with a width to approximately match the diameter of the wheel plus some of the surround (the presence of wheel-arch and some body-work is considered beneficial context to discriminate a wheel from the background or other clutter). The forest is grown in a traditional manner with three decision node types (RGB colour difference, gradient magnitude, gradient orientation; see Section 4.1.1) that analyse the low-level features. One of the node types is selected at random, and then its parameters (pixel locations, thresholds, etc.) optimised to produce the best decision as measured by the Gini index. The final forest contained 798 trees. Because the image features are extremely simple, the forest can classify a large number of image locations in a reasonable period of time, allowing the wheel search to be conducted in near real-time.

The wheel search seeks the optimal location in the image for two wheels, however, not just one. As such, there are five dimensions, x, y, dx, dy, s, where (x, y) describes the location of the rear wheel, (x + dx, y + dy) the location of the front wheel, and s controls the size of the bounding box extracted from the image and used by the wheel classifier (although only a single lane of traffic is observed, the variation in wheel size between vehicle types (e.g. SUV, city car) can be large enough to warrant the search in scale). The search is initialised to an estimated location relative to the "trigger zone" which is based on initial observations, or to the previously detected location if the trigger remains active. Letting $v_w$ represent the votes of the classifier for the wheel class, and $v_{\bar{w}}$ the votes for the non-wheel class, the search seeks to maximise $v_w - v_{\bar{w}}$ for both wheels.
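A minimal sketch of such a search is given below, assuming a wheel_score(patch) function that returns $v_w - v_{\bar{w}}$ for an image patch from the trained forest; the perturbation sizes and iteration count are illustrative rather than the paper's values.

```python
import random

def extract_patch(image, cx, cy, s):
    """Square window of width s centred on (cx, cy); bounds checks omitted."""
    half = s // 2
    return image[cy - half:cy + half, cx - half:cx + half]

def search_wheels(image, wheel_score, init, iters=500, steps=(8, 8, 8, 4, 4)):
    """Stochastic hill-climb over the state (x, y, dx, dy, s)."""
    def objective(state):
        x, y, dx, dy, s = state
        # Score the rear wheel at (x, y) and the front wheel at (x+dx, y+dy).
        return (wheel_score(extract_patch(image, x, y, s)) +
                wheel_score(extract_patch(image, x + dx, y + dy, s)))

    best, best_val = list(init), objective(init)
    for _ in range(iters):
        # Randomly perturb the current best state and keep any improvement.
        cand = [v + random.randint(-st, st) for v, st in zip(best, steps)]
        val = objective(cand)
        if val > best_val:
            best, best_val = cand, val
    return tuple(best), best_val
```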

An example of a wheel detection can be seen in the central image of Figure 1, while Figure 2 shows some examples resulting from the normalisation process for both the RGB colour image and the foreground mask image.

3.1.2 Normalization

Classification is a far simpler task if the input data is first normalised to account for scale and rotation variations. Assuming that the most forward and most rearward wheels of the vehicle can be detected in the image, simple transformations are applied to scale and rotate the detected vehicle to produce a normalised image. Once the wheels have been detected, a 200 × 85 pixel image is created that places the front wheel at (150, 70) and the rear wheel at (50, 70) by translation, rotation, scaling and cropping of the original image. As the background subtraction process provides a useful shape descriptor in the form of the foreground mask, this too is normalised and made available for the classification process.
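The two wheel centres fully determine a similarity (rotate/scale/translate) transform onto this canonical frame. A minimal sketch, using OpenCV (the paper does not name a library):

```python
import numpy as np
import cv2

def normalise(image, rear, front):
    """Warp so the rear wheel lands at (50, 70) and the front wheel at
    (150, 70) in a 200 x 85 output image."""
    rear, front = np.float32(rear), np.float32(front)
    d = front - rear
    scale = 100.0 / np.hypot(d[0], d[1])   # target wheel spacing is 150 - 50 px
    angle = np.arctan2(d[1], d[0])         # rotate the wheel axis to horizontal
    c, s = scale * np.cos(-angle), scale * np.sin(-angle)
    M = np.array([[c, -s, 0.0], [s, c, 0.0]], dtype=np.float32)
    M[:, 2] = np.float32([50, 70]) - M[:, :2] @ rear  # pin rear wheel to (50, 70)
    return cv2.warpAffine(image, M, (200, 85))        # same warp works for the mask
```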


Figure 2: Examples of the normalised RGB and foreground mask images composing the vehicle dataset.

sub-type          image count
city                   882
hatchback             5821
large hatchback        694
saloon                1074
estate                 680
people carrier         215
sports                 189
SUV                    357
van                    589

Table 1: Vehicle sub-types and the total number of labelled images for each.


The database of manually annotated images available for classification consists, in total, of more than 10000 images divided in a highly unbalanced way into 9 vehicle sub-types (Table 1) and 86 make/model classes (Table 4), as typically understood in the UK.

4. Vehicle Classification

Classification is performed using five available classifiers:

• Img-Forest: A random forest operating on simple image features.

• Evo-Forest: An evolutionary forest that fuses the random forest with techniques from genetic algorithms to evolve a forest optimised for the classification task.

• HoG-Forest: A random forest operating on a HoG descriptor of the whole image. Decision nodes are of the more traditional kind, selecting one attribute of the descriptor and comparing it to a threshold.

• HoG-Lin-SVM: A linear SVM operating on the image's HoG descriptor.

• HoG-RBF-SVM: An SVM with an RBF kernel operating on the image's HoG descriptor.

4.1. Classifiers

4.1.1 Random Forest

The Random Forest, introduced by Breiman [2], is an ensemble of binary decision trees where each decision node of the tree is configured to exploit a randomly chosen classifier from a set of available classifiers, with randomly initialised parameters optimised to produce the best split of the training examples. It represents an appealing classifier because of its natural multi-class nature and accompanying fast computation. In this work the vehicle dataset consists of RGB images and black/white foreground masks. The colour images are simply processed to produce gradient magnitude and gradient orientation images (these are not thresholded to produce edge images). This enables a number of possible decision nodes that can be considered (a code sketch of these tests follows the list):

1. RGB colour difference: Given two pixels of the RGB image, threshold the Euclidean colour difference. This requires 5 parameters: the image coordinates of the two pixels (x1, y1), (x2, y2) and τ, the threshold. (CIELAB colour space was also tested for this node, but there was no appreciable difference in classifier performance.)

2. Gradient magnitude: Given a single pixel coordinate (x, y), determine if the image gradient has a magnitude larger than a threshold τ.

3. Gradient orientation: Given a single pixel coordinate (x, y), determine if the orientation of the image gradient is between two thresholds τ1 and τ2.

4. Mask pixels: Given a single pixel coordinate (x, y), determine if the pixel is foreground or background in the mask image.

5. Scale range: When normalising the image, the scale factor applied to resize the image can be recorded. Assuming all vehicles are a similar distance from the camera, this will provide an estimate of the vehicle size. The node checks if the scale is between two thresholds.
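The sketch referenced above illustrates the five node tests; the parameter names are ours, and the gradient and mask inputs are assumed to be pre-computed images.

```python
import numpy as np

def rgb_colour_difference(rgb, p1, p2, tau):
    """Node 1: threshold the Euclidean colour difference between two pixels."""
    d = rgb[p1[1], p1[0]].astype(float) - rgb[p2[1], p2[0]].astype(float)
    return np.linalg.norm(d) > tau

def gradient_magnitude(grad_mag, p, tau):
    """Node 2: is the gradient magnitude at (x, y) larger than tau?"""
    return grad_mag[p[1], p[0]] > tau

def gradient_orientation(grad_ori, p, tau1, tau2):
    """Node 3: is the gradient orientation at (x, y) between tau1 and tau2?"""
    return tau1 < grad_ori[p[1], p[0]] < tau2

def mask_pixel(mask, p):
    """Node 4: is the pixel (x, y) foreground in the normalised mask image?"""
    return mask[p[1], p[0]] > 0

def scale_range(scale, tau1, tau2):
    """Node 5: is the normalisation scale factor (a vehicle-size proxy) in range?"""
    return tau1 < scale < tau2
```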


The generated forests had a total of 200 grown trees each, with a maximum depth of 20. When a node is first being created, a type from the list above is selected at random and optimised using the Gini impurity.
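For concreteness, the impurity measure used to score a candidate parameterisation can be computed as follows (a standard formulation, not the authors' code); training re-samples a node's parameters and keeps the split with the lowest weighted impurity.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a label set: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_quality(labels, goes_left):
    """Size-weighted impurity of a candidate binary split (lower is better)."""
    labels, goes_left = np.asarray(labels), np.asarray(goes_left)
    left, right = labels[goes_left], labels[~goes_left]
    return (len(left) * gini_impurity(left) +
            len(right) * gini_impurity(right)) / len(labels)
```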

4.1.2 Evolutionary Forest

While each tree of the Random Forest is optimised to be the best classifier it can be, the trees are not optimised to work together to produce the best forest. Hence this work also considers an evolutionary approach to growing the forest, applying a fitness function that aims to ensure each new tree maximises its own strength while minimising correlation with existing trees. The decision nodes are the same as for the random forest. The process of training an evolutionary forest consisting of n trees is summarised as follows:

1. Start with an empty set F_f of trees.

2. Iterate until |F_f| = n:

(a) Create a sub-sample of the training set for training trees of this pool.

(b) Grow a pool of potential trees F_p.

(c) Iterate until ready:

i. Evaluate trees in pool.
ii. Replace poor trees with new trees by cross-breeding/mutating/growing.

(d) Take the best tree in F_p and add it to F_f.

A maximum number of 200 trees is imposed, with a terminating condition to stop adding additional trees if the accuracy over the training set reaches 100%. Further details of the evolutionary forest implementation are given in [1].
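A schematic rendering of this loop is sketched below; grow_tree, fitness and the variation operators are injected placeholders standing in for the implementation detailed in [1], and the pool size and generation count are illustrative.

```python
import random

def train_evolutionary_forest(train_set, grow_tree, fitness, crossbreed, mutate,
                              accuracy, n_trees=200, pool_size=20, generations=30):
    """Schematic of Section 4.1.2. Helpers are injected: grow_tree(samples)
    returns a new tree; fitness(tree, forest, samples) rewards individual
    strength and penalises correlation with the existing forest; crossbreed
    and mutate are the variation operators; accuracy(forest, data) is the
    forest's training-set accuracy."""
    forest = []
    while len(forest) < n_trees:
        # (a) sub-sample the training set for this pool
        samples = random.sample(train_set, k=max(1, len(train_set) // 2))
        # (b) grow a pool of potential trees
        pool = [grow_tree(samples) for _ in range(pool_size)]
        # (c) evaluate the pool and replace poor trees
        for _ in range(generations):
            pool.sort(key=lambda t: fitness(t, forest, samples), reverse=True)
            survivors = pool[:pool_size // 2]
            pool = survivors + [mutate(crossbreed(*random.sample(survivors, 2)))
                                for _ in range(pool_size - len(survivors))]
        # (d) take the best tree in the pool and add it to the forest
        forest.append(max(pool, key=lambda t: fitness(t, forest, samples)))
        # terminating condition: stop early once training accuracy reaches 100%
        if accuracy(forest, train_set) >= 1.0:
            break
    return forest
```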

4.1.3 HoG-based Random Forest

A further random forest was considered whereby each image is represented by a pre-processed Histogram of Oriented Gradients (HoG) vector. With the vehicle images normalised to a size of 200 × 85, a HoG descriptor of 7524 elements is generated with a block size of (20, 14) pixels and a block stride of (10, 7) pixels from a greyscale version of the image. Instead of using the multiple node types described in Section 4.1.1, only a single type of node was used, which compares a single value in the vector to a threshold.
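These parameters pin the descriptor length down under standard assumptions: a (10, 7) cell size and 9 orientation bins (both inferred here, not stated in the paper) give ((200-20)/10 + 1) x ((84-14)/7 + 1) = 209 blocks of 4 cells x 9 bins = 7524 elements, with the 85 px height truncated to 84 px to satisfy OpenCV's stride-divisibility constraint. A sketch:

```python
import cv2

def hog_descriptor(vehicle_image):
    """Compute the 7524-element HoG vector of Section 4.1.3 (assumptions above)."""
    # winSize, blockSize, blockStride, cellSize, nbins
    hog = cv2.HOGDescriptor((200, 84), (20, 14), (10, 7), (10, 7), 9)
    grey = cv2.cvtColor(vehicle_image, cv2.COLOR_BGR2GRAY)[:84, :200]
    return hog.compute(grey).reshape(-1)  # float32 vector of length 7524
```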

4.1.4 HoG-based SVM classifiers

The Support Vector Machine (SVM) is a favoured classifier in the literature. For this work, the one-vs-all multi-class approach is used (as defended by [9]), where a classifier is trained for each class against all other classes. The winning class for a given unknown input is the classifier with the largest positive result. The same HoG vector as described in Section 4.1.3 is used to train both an SVM with a linear kernel and one with an RBF kernel. For each final SVM trained, a cross-validation grid search is performed to determine the optimal parameters. The linear SVM's C parameter was optimised in the range {10, 100, 1000}. The RBF SVM, with the same C parameter range, had a search on γ in the range {0.0001, 0.001, 0.01}.
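A scikit-learn rendering of this setup might look as follows; the paper does not name a library, and the 5-fold cross-validation is an assumption.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_hog_svms(X, y):
    """Grid-search the linear and RBF one-vs-all SVMs over the stated ranges.

    X holds one HoG vector per row, y the class labels."""
    linear = GridSearchCV(OneVsRestClassifier(SVC(kernel="linear")),
                          {"estimator__C": [10, 100, 1000]}, cv=5)
    rbf = GridSearchCV(OneVsRestClassifier(SVC(kernel="rbf")),
                       {"estimator__C": [10, 100, 1000],
                        "estimator__gamma": [0.0001, 0.001, 0.01]}, cv=5)
    linear.fit(X, y)
    rbf.fit(X, y)
    return linear.best_estimator_, rbf.best_estimator_
```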

5. Experiments

5.1. Metrics

Each classifier was trained and tested 20 times. Multiple statistics (average accuracy, recall, precision) for multi-class classification, as described in [11], have been calculated to measure the performance, with the minimum, median and maximum for each statistic recorded (see Tables 3 and 2):

$$\text{Average Accuracy} = \frac{1}{c}\sum_{i=1}^{c} \frac{tp_i + tn_i}{tp_i + fn_i + fp_i + tn_i} \tag{1}$$

$$\text{Recall}_M = \frac{1}{c}\sum_{i=1}^{c} \frac{tp_i}{tp_i + fn_i} \tag{2}$$

$$\text{Precision}_M = \frac{1}{c}\sum_{i=1}^{c} \frac{tp_i}{tp_i + fp_i} \tag{3}$$

where i is the class number, c is the number of classes, $tp_i$, $tn_i$, $fp_i$ and $fn_i$ are respectively the number of true positives, true negatives, false positives and false negatives for class i, and M indicates the macro-average across all classes [11].
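These macro-averaged statistics follow directly from a confusion matrix; a small sketch (standard formulation, not the authors' code):

```python
import numpy as np

def macro_metrics(confusion):
    """Average accuracy, macro recall and macro precision (Eqs. 1-3) from a
    c x c confusion matrix (rows: true class, columns: predicted class)."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp
    fn = confusion.sum(axis=1) - tp
    tn = confusion.sum() - tp - fp - fn
    avg_accuracy = np.mean((tp + tn) / (tp + fn + fp + tn))
    recall_m = np.mean(tp / (tp + fn))
    precision_m = np.mean(tp / (tp + fp))
    return avg_accuracy, recall_m, precision_m
```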

5.2. Vehicle Subtype

The vehicle dataset was first divided into classes as shown in Table 1 based on expert judgement, which amounts to a common interpretation of the types of cars on UK roads. The dataset was divided into eight cross-validation training/testing sets, where labelled images were selected at random to be either training or testing. The training sets were set to be at most min(0.5n, 200) images (where n is the total number of available training images for the class), with the remainder being used for testing; for example, the hatchback class (5821 images) is capped at 200 training images, while the sports class (189 images) contributes about 94.
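The per-class split rule might be implemented as in this small sketch (illustrative only):

```python
import random

def split_class(images, cap=200):
    """Split one class: at most min(0.5 * n, cap) images for training,
    the remainder for testing."""
    shuffled = random.sample(images, k=len(images))  # random selection
    n_train = min(len(images) // 2, cap)
    return shuffled[:n_train], shuffled[n_train:]
```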

5.3. Vehicle Make/Model

To see what effect varying the number of classes might have, the make/model dataset was split into 4 subsets based on the number of vehicles available per class. A summary of these splits is shown in Table 4. Again, the available data was split such that 50% of the images, or at most 200 images, were used for training, with the remainder for testing. This did mean that the training data for these sets could be quite unbalanced, with some classes providing 200 training images and some a mere 10.



                        Set 20                 Set 40                 Set 100                Set 300
                  Min    Med    Max      Min    Med    Max      Min    Med    Max      Min    Med    Max
Img-Forest   Acc  0.993  0.994  0.994    0.992  0.992  0.993    0.986  0.987  0.988    0.960  0.965  0.971
             Rec  0.570  0.602  0.623    0.675  0.698  0.713    0.766  0.782  0.797    0.879  0.897  0.912
             Prec 0.737  0.782  0.817    0.794  0.812  0.829    0.825  0.845  0.861    0.872  0.891  0.912
Evo-Forest   Acc  0.997  0.998  0.998    0.995  0.996  0.997    0.989  0.990  0.992    0.963  0.971  0.976
             Rec  0.895  0.907  0.922    0.883  0.895  0.904    0.900  0.911  0.920    0.911  0.927  0.940
             Prec 0.843  0.858  0.883    0.829  0.849  0.883    0.866  0.887  0.908    0.903  0.924  0.938
HoG-Forest   Acc  0.996  0.996  0.997    0.994  0.995  0.995    0.994  0.994  0.995    0.985  0.986  0.989
             Rec  0.843  0.851  0.857    0.875  0.880  0.887    0.949  0.953  0.956    0.977  0.979  0.983
             Prec 0.830  0.843  0.847    0.833  0.850  0.860    0.823  0.854  0.860    0.613  0.616  0.625
HoG-Lin-SVM  Acc  0.999  0.999  0.999    0.999  0.999  0.999    0.998  0.998  0.998    0.991  0.995  0.997
             Rec  0.933  0.947  0.953    0.936  0.949  0.961    0.966  0.975  0.981    0.973  0.983  0.990
             Prec 0.943  0.953  0.961    0.942  0.950  0.956    0.967  0.970  0.975    0.972  0.982  0.989
HoG-RBF-SVM  Acc  0.999  0.999  0.999    0.999  0.999  0.999    0.998  0.998  0.998    0.966  0.969  0.976
             Rec  0.929  0.937  0.948    0.939  0.945  0.953    0.967  0.974  0.980    0.881  0.899  0.921
             Prec 0.937  0.949  0.956    0.942  0.948  0.956    0.959  0.969  0.973    0.911  0.924  0.940

Table 2: Comparison of classification results for five classifiers on the make/model data splits (see Table 4). Highest numerical values for each statistic are highlighted in bold.

                  Min    Med    Max
Img-Forest   Acc  0.944  0.948  0.954
             Rec  0.843  0.857  0.874
             Prec 0.661  0.697  0.727
Evo-Forest   Acc  0.965  0.973  0.979
             Rec  0.826  0.838  0.855
             Prec 0.697  0.743  0.774
HoG-Forest   Acc  0.970  0.975  0.979
             Rec  0.882  0.888  0.900
             Prec 0.776  0.809  0.819
HoG-Lin-SVM  Acc  0.978  0.982  0.985
             Rec  0.950  0.957  0.963
             Prec 0.837  0.855  0.886
HoG-RBF-SVM  Acc  0.979  0.982  0.987
             Rec  0.950  0.958  0.963
             Prec 0.837  0.866  0.889

Table 3: Comparison of classification results for five classifiers on the subtype data splits (see Table 1). Highest numerical values for each statistic are highlighted in bold.

# available    # classes    # images
    300             6          3032
    100            27          6360
     40            59          8305
     20            86          9141

Table 4: Make/model data splits, showing the number of classes and total number of images used for training for classes that have at least the specified number of images available.

5.4. Results

Analysing the results for both subtype and make and model (Tables 3 & 2, Figure 3), it is clear that the two SVM classifiers provide effectively equivalent results and superior classification performance to the three forest classifiers, with the RBF kernel outperforming the linear kernel for subtype and vice versa for make and model. It is also clear that using the HoG descriptor provides more robust classification than using a set of low-level image features (Img-Forest). This can be explained by the HoG descriptor's blocks providing some robustness against small misalignments in the normalisation process, and a lower susceptibility to corruption caused by other sources of image noise. It is also noteworthy that the make and model classification obtains higher accuracy than the sub-type classification task; however, this can mostly be explained by intra-class and inter-class variation. Most vehicle makes and models exhibit quite small intra-class variation, though there are some exceptions, such as long-running marques like the Ford Fiesta that have undergone several generations and ‘face lifts’. This means that the classifier can generally learn very specific details for discriminating between the different classes, and even though some make/model classes may be quite similar (see Figure 4, rows 3 & 4), the stability of each class's appearance permits the classifier to find the distinctive features to tell them apart. This is far more difficult with the subtype classification, where not only is the intra-class variation quite large (as each class covers a wide range of vehicle makes and models), but the inter-class variation can be quite small. Indeed, the exact point where one category ends and another begins can often be quite subjective. There will always be vehicle models that span multiple categories. These can be subtle issues, such as a number of vehicle models that retain a saloon shape but actually have a hatchback-style rear opening (classified as saloons in this work), or more troublesome ones like large 5-seat family cars designed to be very much the same shape and style as full-size 7-seat people carriers (1st row, Figure 4), or where exactly the distinction lies between a small city car and a more general purpose hatchback (2nd row, Figure 4). While the expert annotator employed best judgement, there remains subjectivity and overlap between the classes.

Figure 3: Confusion matrices. Top: subtype HoG-RBF-SVM; bottom: make/model Set 300 HoG-Lin-SVM. Best viewed in colour.

Figure 4: Misclassifications (left column: instance; right column: classification): hatchback → people carrier; large hatchback → city; Peugeot 20x → Peugeot 30x; Skoda Fabia Estate → Skoda Octavia Estate. Best viewed in colour.

6. Conclusions and Future Work

This work has presented a method for detecting and classifying vehicles into sub-type and make and model from side profile views, and has produced a public database of labelled data. Furthermore, high classification rates of 98%-99% are obtained from single images, extending the state of the art. Further work will establish if the obtained high classification rates are retained as the number of make/model classes increases and the inter-class variation between vehicles decreases. Additionally, colour classification will be considered, as well as the performance of the detection/classification system when it is run live.

References

[1] Vehicle classification using evolutionary forests. In Proc. of the International Conference on Pattern Recognition Applications and Methods, pages 387-393, 2012.

[2] L. Breiman. Random forests. Machine Learning, 45:5-32, 2001.

[3] N. Buch, J. Orwell, and S. Velastin. Urban road user detection and classification using 3D wire frame models. IET Computer Vision, 4(2):105-116, 2010.

[4] N. Buch, S. A. Velastin, and J. Orwell. A review of computer vision techniques for the analysis of urban traffic. IEEE Trans. on Intelligent Transportation Systems, 12:920-939, September 2011.

[5] P. Dalka and A. Czyzewski. Vehicle classification based on soft computing algorithms. In Lecture Notes in Computer Science, volume 6086, pages 70-79, 2010.

[6] C. Huang and W. Liao. A vision-based vehicle identification system. In Proc. of the 17th Int. Conf. on Pattern Recognition (ICPR 2004), volume 4, pages 364-367, 2004.

[7] N. C. Mithun, N. U. Rashid, and S. M. M. Rahman. Detection and classification of vehicles from video using multiple time-spatial images. IEEE Trans. on Intelligent Transportation Systems, 13:1215-1225, September 2012.

[8] G. Pearce and N. Pears. Automatic make and model recognition from frontal images of cars. In Proc. of the 8th IEEE Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS 2011), pages 373-378, 2011.

[9] R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101-141, 2004.

[10] S. Shi, Z. Qin, and J. Xu. Robust algorithm of vehicle classification. In Proc. of the Eighth ACIS Int. Conf. on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007), pages 269-272, 2007.

[11] M. Sokolova and G. Lapalme. A systematic analysis of performance measures for classification tasks. Information Processing and Management, 45(4):427-437, July 2009.

[12] N. S. Thakoor and B. Bhanu. Structural signatures for passenger vehicle classification in video. IEEE Trans. on Intelligent Transportation Systems, 14:1796-1805, 2013.

[13] R. Wang, L. Zhang, K. Xiao, R. Sun, and L. Cui. EasiSee: Real-time vehicle classification and counting via low-cost collaborative sensing. IEEE Trans. on Intelligent Transportation Systems, 15:1-11, 2014.

[14] K. Yousaf, A. Iftikhar, and A. Javed. Comparative analysis of automatic vehicle classification techniques: A survey. Int. Journal of Image, Graphics and Signal Processing, 9:52-59, 2012.

[15] B. Zhang. Reliable classification of vehicle types based on cascade classifier ensembles. IEEE Trans. on Intelligent Transportation Systems, 14(1):322-332, 2013.

[16] Z. Zivkovic. Improved adaptive Gaussian mixture model for background subtraction. In Proc. of the 17th Int. Conf. on Pattern Recognition (ICPR 2004), pages 28-31, 2004.