A Large-Scale Car Dataset for Fine ... - cv-foundation.org€¦ · h at cbk M sb F sedn Foton MP-X...

1
A Large-Scale Car Dataset for Fine-Grained Categorization and Verification Linjie Yang, Ping Luo, Chen Change Loy, Xiaoou Tang Department o Information Engineering, The Chinese University of Hong Kong. This paper aims to highlight vision related tasks centered around “car”, which has been largely neglected by vision community in comparison to other objects. Cars present several unique properties that other objects can- not offer, which provides more challenges and facilitates a range of novel re- search topics in object categorization. Specifically, cars own large quantity of models that most other categories do not have, enabling a more challeng- ing fine-grained task. In addition, cars yield large appearance differences in their unconstrained poses, which demands viewpoint-aware analyses and algorithms. Importantly, a unique hierarchy is presented for the car cate- gory, which is three levels from top to bottom: make, model, and released year. This structure indicates a direction to address the fine-grained task in a hierarchical way, which is only discussed by limited literature. Apart from the categorization task, cars reveal a number of interesting computer vi- sion problems. Firstly, different designing styles are applied by different car manufacturers and in different years, which opens the door to fine-grained style analysis and fine-grained part recognition (see Fig. 2). Secondly, the car is an attractive topic for attribute prediction. In particular, cars have distinctive attributes such as car class , seating capacity, number of axles, maximum speed and displacement, which can be inferred from the appear- ance of the cars. Lastly, in comparison to human face verification [3], car verification, which targets at verifying whether two cars belong to the same model, is an interesting and under-researched problem. The unconstrained viewpoints make car verification arguably more challenging than traditional face verification. We believe the lack of high quality datasets greatly limits the explo- ration of the community in this domain. To this end, we collect and or- ganize a large-scale and comprehensive image database “CompCars”. The latest dataset can be downloaded at http://mmlab.ie.cuhk.edu. hk/datasets/comp_cars/index.html. The “CompCars” dataset is much larger in scale and diversity compared with the current car image datasets [1]. In particular, the CompCars dataset contains data from two sce- narios, including images from web-nature and surveillance-nature. The im- ages of the web-nature are collected from car forums, public websites, and search engines (see Fig. 1). The images of the surveillance-nature are col- lected by surveillance cameras. The data of these two scenarios are widely used in the real-world applications. They open the door for cross-modality analysis of cars. Specifically, the web-nature data contains 161 car makes with 1, 687 car models, covering most of the commercial car models in the recent ten years. There are a total of 136, 727 images capturing the entire cars and 27, 618 images capturing the car parts, where most of them are la- beled with attributes and viewpoints. The surveillance-nature data contains 50, 000 car images captured in the front view. We highlight the key features of the dataset: 1) Car Hierarchy: The car models can be organized into a large tree structure, consisting of three layers , namely car make, car model, and year of manufacture. 2) Car Attributes: Each car model is labeled with five attributes, including maximum speed, displacement, number of doors, number of seats, and type of car. These at- tributes provide rich information while learning the relations or similarities between different car models. 3) Viewpoints: We also label five viewpoints for each car model, including front (F), rear (R), side (S), front-side (FS), and rear-side (RS). 4) Car Parts: We collect images capturing the eight car parts for each car model, including four exterior parts (i.e. headlight, tail- light, fog light, and air intake) and four interior parts (i.e. console, steering wheel, dashboard, and gear lever). These images are roughly aligned for the convenience of further analysis. We further demonstrate a few important applications exploiting the dataset, namely car model classification, car model verification, and attribute predic- tion. Here we highlight the results on fine-grained car model classification. Please refer to our full paper for the full experiments. Specifically, we fine- This is an extended abstract. The full paper is available at the Computer Vision Foundation webpage. Benz R class MPV Audi Q3 SUV Chevrolet Aveo hatchback Mitsubishi Fortis sedan Foton MP-X E minibus Skoda Superb fastback Volkswagen Golf Variant estate Dodge Ram pickup Nissan GT-R sports Volkswagen Cross Polo crossover BWM 1 Series convertible convertible Volve C70 hardtop convertible Figure 1: Each image displays a car from the 12 car types. The correspond- ing model names and car types are shown below the images. Figure 2: Each row displays 8 car parts from a car model. The corresponding car models are Buick GL8, Peugeot 207 hatchback, Volkswagen Jetta, and Hyundai Elantra from top to bottom, respectively. tune Overfeat [2], a CNN model which is pretrained on ImageNet, for the task of classifying 431 car models. Different models are respectively fine- tuned on images of specific viewpoints and all the viewpoints, denoted as “front (F)”, “rear (R)”, “side (S)”, “front-side (FS)”, and “rear-side (RS)”, and “All-View”. The performances of these six models are summarized in Table 1, where “FS” and “RS” achieve better performances than the perfor- mances of the other viewpoint models. CNN trained on all views performs surprisingly well. Detailed analyses are provided in the full paper. We believe that the new dataset can foster more sophisticated and ro- bust computer vision models and algorithms. There are many other poten- tial tasks that can exploit CompCars, such as part detection, image ranking, and 3D reconstruction. It is also helpful to consider the rich attributes and different depths of the semantic hierarchy for learning semantic relations between different fine-grained car categories. Table 1: Fine-grained classification results. Viewpoint F R S FS RS All-View Top-1 0.524 0.431 0.428 0.563 0.598 0.767 Top-5 0.748 0.647 0.602 0.769 0.777 0.917 [1] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D object representations for fine-grained categorization. In ICCV Workshops, 2013. [2] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. Overfeat: Integrated recognition, localiza- tion and detection using convolutional networks. arXiv preprint arX- iv:1312.6229, 2013. [3] Yi Sun, Xiaogang Wang, and Xiaoou Tang. Deep learning face repre- sentation from predicting 10,000 classes. In CVPR, 2014.

Transcript of A Large-Scale Car Dataset for Fine ... - cv-foundation.org€¦ · h at cbk M sb F sedn Foton MP-X...

Page 1: A Large-Scale Car Dataset for Fine ... - cv-foundation.org€¦ · h at cbk M sb F sedn Foton MP-X E minibus Skoda Superb fastback Volkswagen Golf Variant estate Dodge Ram pickup

A Large-Scale Car Dataset for Fine-Grained Categorization and Verification

Linjie Yang, Ping Luo, Chen Change Loy, Xiaoou TangDepartment o Information Engineering, The Chinese University of Hong Kong.

This paper aims to highlight vision related tasks centered around “car”,which has been largely neglected by vision community in comparison toother objects. Cars present several unique properties that other objects can-not offer, which provides more challenges and facilitates a range of novel re-search topics in object categorization. Specifically, cars own large quantityof models that most other categories do not have, enabling a more challeng-ing fine-grained task. In addition, cars yield large appearance differencesin their unconstrained poses, which demands viewpoint-aware analyses andalgorithms. Importantly, a unique hierarchy is presented for the car cate-gory, which is three levels from top to bottom: make, model, and releasedyear. This structure indicates a direction to address the fine-grained task in ahierarchical way, which is only discussed by limited literature. Apart fromthe categorization task, cars reveal a number of interesting computer vi-sion problems. Firstly, different designing styles are applied by different carmanufacturers and in different years, which opens the door to fine-grainedstyle analysis and fine-grained part recognition (see Fig. 2). Secondly, thecar is an attractive topic for attribute prediction. In particular, cars havedistinctive attributes such as car class , seating capacity, number of axles,maximum speed and displacement, which can be inferred from the appear-ance of the cars. Lastly, in comparison to human face verification [3], carverification, which targets at verifying whether two cars belong to the samemodel, is an interesting and under-researched problem. The unconstrainedviewpoints make car verification arguably more challenging than traditionalface verification.

We believe the lack of high quality datasets greatly limits the explo-ration of the community in this domain. To this end, we collect and or-ganize a large-scale and comprehensive image database “CompCars”. Thelatest dataset can be downloaded at http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/index.html. The “CompCars” datasetis much larger in scale and diversity compared with the current car imagedatasets [1]. In particular, the CompCars dataset contains data from two sce-narios, including images from web-nature and surveillance-nature. The im-ages of the web-nature are collected from car forums, public websites, andsearch engines (see Fig. 1). The images of the surveillance-nature are col-lected by surveillance cameras. The data of these two scenarios are widelyused in the real-world applications. They open the door for cross-modalityanalysis of cars. Specifically, the web-nature data contains 161 car makeswith 1,687 car models, covering most of the commercial car models in therecent ten years. There are a total of 136,727 images capturing the entirecars and 27,618 images capturing the car parts, where most of them are la-beled with attributes and viewpoints. The surveillance-nature data contains50,000 car images captured in the front view.

We highlight the key features of the dataset: 1) Car Hierarchy: The carmodels can be organized into a large tree structure, consisting of three layers, namely car make, car model, and year of manufacture. 2) Car Attributes:Each car model is labeled with five attributes, including maximum speed,displacement, number of doors, number of seats, and type of car. These at-tributes provide rich information while learning the relations or similaritiesbetween different car models. 3) Viewpoints: We also label five viewpointsfor each car model, including front (F), rear (R), side (S), front-side (FS),and rear-side (RS). 4) Car Parts: We collect images capturing the eight carparts for each car model, including four exterior parts (i.e. headlight, tail-light, fog light, and air intake) and four interior parts (i.e. console, steeringwheel, dashboard, and gear lever). These images are roughly aligned for theconvenience of further analysis.

We further demonstrate a few important applications exploiting the dataset,namely car model classification, car model verification, and attribute predic-tion. Here we highlight the results on fine-grained car model classification.Please refer to our full paper for the full experiments. Specifically, we fine-

This is an extended abstract. The full paper is available at the Computer Vision Foundationwebpage.

Benz R classMPV

Audi Q3SUV

Chevrolet Aveohatchback

Mitsubishi Fortissedan

Foton MP-X Eminibus

Skoda Superbfastback

Volkswagen Golf Variantestate

Dodge Rampickup

Nissan GT-Rsports

Volkswagen Cross Polo crossover

BWM 1 Series convertible convertible

Volve C70hardtop convertible

Figure 1: Each image displays a car from the 12 car types. The correspond-ing model names and car types are shown below the images.

Figure 2: Each row displays 8 car parts from a car model. The correspondingcar models are Buick GL8, Peugeot 207 hatchback, Volkswagen Jetta, andHyundai Elantra from top to bottom, respectively.

tune Overfeat [2], a CNN model which is pretrained on ImageNet, for thetask of classifying 431 car models. Different models are respectively fine-tuned on images of specific viewpoints and all the viewpoints, denoted as“front (F)”, “rear (R)”, “side (S)”, “front-side (FS)”, and “rear-side (RS)”,and “All-View”. The performances of these six models are summarized inTable 1, where “FS” and “RS” achieve better performances than the perfor-mances of the other viewpoint models. CNN trained on all views performssurprisingly well. Detailed analyses are provided in the full paper.

We believe that the new dataset can foster more sophisticated and ro-bust computer vision models and algorithms. There are many other poten-tial tasks that can exploit CompCars, such as part detection, image ranking,and 3D reconstruction. It is also helpful to consider the rich attributes anddifferent depths of the semantic hierarchy for learning semantic relationsbetween different fine-grained car categories.

Table 1: Fine-grained classification results.Viewpoint F R S FS RS All-View

Top-1 0.524 0.431 0.428 0.563 0.598 0.767Top-5 0.748 0.647 0.602 0.769 0.777 0.917

[1] Jonathan Krause, Michael Stark, Jia Deng, and Li Fei-Fei. 3D objectrepresentations for fine-grained categorization. In ICCV Workshops,2013.

[2] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, RobFergus, and Yann LeCun. Overfeat: Integrated recognition, localiza-tion and detection using convolutional networks. arXiv preprint arX-iv:1312.6229, 2013.

[3] Yi Sun, Xiaogang Wang, and Xiaoou Tang. Deep learning face repre-sentation from predicting 10,000 classes. In CVPR, 2014.