UNBIASED LOOK AT DATASET BIAS Antonio Torralba Alexei A. Efros
MIT CMU
Presented by: Vivek Dubey
Harika Sabbella
NAMING DATASETS
1) Caltech-101 2) UIUC 3) MSRC 4) Tiny Images 5) ImageNet 6) PASCAL VOC 7) LabelMe 8) SUN-09 9) 15 Scenes 10) Corel 11) Caltech-256 12) COIL-100
ARE CURRENT DATASETS BIASED?
! Randomly sampled 1,000 images from the training set of each of the 12 datasets and trained a 12-way linear SVM classifier
! Tested the classifier on 300 random images from the test set of each of the 12 datasets
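The experiment above can be sketched in a few lines. This is a minimal, hypothetical stand-in: real image features are replaced by synthetic Gaussian clusters (one "style" centroid per dataset), and the paper's linear SVM is replaced by a one-vs-rest linear classifier fit by least squares, which shares the linear decision rule but not the SVM training loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_datasets, n_train, n_test, dim = 12, 100, 30, 32

# One synthetic "style" centroid per dataset stands in for real image features.
centroids = rng.normal(size=(n_datasets, dim))

def sample(n_per):
    # Draw n_per feature vectors around each dataset's centroid.
    X = np.vstack([c + 0.5 * rng.normal(size=(n_per, dim)) for c in centroids])
    y = np.repeat(np.arange(n_datasets), n_per)
    return X, y

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

# One-vs-rest linear classifier fit by least squares (a stand-in for the
# paper's linear SVM; same linear decision rule, different training loss).
Y = np.eye(n_datasets)[ytr]                      # one-hot targets
Xb = np.hstack([Xtr, np.ones((len(Xtr), 1))])    # append bias term
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
pred = (np.hstack([Xte, np.ones((len(Xte), 1))]) @ W).argmax(axis=1)
accuracy = (pred == yte).mean()                  # well above 1/12 chance
```

The point of the real experiment is exactly this comparison: if datasets were unbiased samples of the same visual world, accuracy would hover near the 1/12 chance level; the paper's classifier instead does far better.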
RESULTS OF “NAME THAT DATASET” CLASSIFIER
CONFUSION MATRIX
TRAINING THE “CARS” CLASSIFIER
! Applied a similar classifier to object crops of cars from five datasets
! The classifier could still tell the datasets apart (61% accuracy), so the bias lives in the objects themselves, not just in scene context
OUTLINE
Step 1) Understanding how bias sneaks into our datasets
Step 2) Raise awareness in the visual recognition community
PROMISES AND PERILS OF VISUAL DATASETS
! Datasets have helped produce breakthroughs in: ! Face detection
! Object recognition
! Problems with the current way of looking at datasets ! “Creeping over-fitting”
! Visual recognition community has given too much importance to “winning” a particular dataset competition
MACHINE LEARNING VERSUS COMPUTER VISION
! Datasets in machine learning - the dataset is the world
! Datasets in computer vision - the dataset is only a representation of the visual world
RISE OF THE MODERN DATASET
MEASURING DATASET BIAS
! Dataset bias is difficult to measure directly; a useful proxy is cross-dataset generalization
! Training on one dataset and testing on another
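The train-on-one, test-on-another protocol can be sketched as filling an N x N performance matrix. Everything here is simulated and illustrative: each "dataset" is a Gaussian feature cloud whose class means drift by a dataset-specific shift (the bias), and a least-squares linear classifier stands in for an SVM.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sets, n, dim = 4, 200, 16
base_pos, base_neg = rng.normal(size=dim), rng.normal(size=dim)

def make_dataset():
    shift = 0.8 * rng.normal(size=dim)             # dataset-specific bias
    pos = base_pos + shift + rng.normal(size=(n, dim))
    neg = base_neg + shift + rng.normal(size=(n, dim))
    return np.vstack([pos, neg]), np.array([1] * n + [0] * n)

datasets = [make_dataset() for _ in range(n_sets)]

def fit_linear(X, y):
    # Least-squares linear classifier (stand-in for an SVM).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, 2.0 * y - 1.0, rcond=None)
    return w

def accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return ((Xb @ w > 0).astype(int) == y).mean()

# Row i, column j: train on dataset i, test on dataset j.
perf = np.array([[accuracy(fit_linear(*datasets[i]), *datasets[j])
                  for j in range(n_sets)] for i in range(n_sets)])
```

The paper's finding, reproduced qualitatively by this toy setup, is that the diagonal (train and test on the same dataset) tends to beat the off-diagonal entries: performance drops when a model meets another dataset's version of the same concept.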
CLASSIFICATION VERSUS DETECTION
! Two common objects in datasets are “car” and “person”
! Classification - finding all the images that contain a given object
! Bag of words followed by a non-linear SVM
! Detection - finding bounding boxes containing the object
! HOG descriptor followed by a linear SVM
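The detection side of this pipeline can be sketched as sliding-window scoring. This is a toy stand-in, not the HOG detector itself: the window is scored on raw pixels with a hand-made template, whereas a real detector would score HOG cells with a learned linear SVM weight vector.

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.normal(scale=0.1, size=(32, 32))   # noisy background
template = np.ones((8, 8))                   # hypothetical "car" template
img[12:20, 5:13] += 1.0                      # plant the object at row 12, col 5

# Slide the template over every window; keep the highest linear score.
best, best_box = -np.inf, None
for r in range(img.shape[0] - 8 + 1):
    for c in range(img.shape[1] - 8 + 1):
        s = float((img[r:r+8, c:c+8] * template).sum())  # linear score
        if s > best:
            best, best_box = s, (r, c, r + 8, c + 8)
# best_box recovers the planted window: (12, 5, 20, 13)
```

Swapping raw pixels for HOG features and the hand-made template for learned SVM weights turns this loop into the classic Dalal-Triggs style detector referenced on the slide.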
RESULTS OF CROSS-DATASET GENERALIZATION EXPERIMENT
RESULTS OF CROSS-DATASET GENERALIZATION EXPERIMENT (CONTINUED)
[Cross-dataset results shown for: SUN09, LabelMe, PASCAL 07, ImageNet, Caltech101, MSRC]
DIFFERENT KINDS OF BIAS
! Selection bias
! Capture bias
! Category or label bias
! Negative set bias
NEGATIVE SET BIAS
! Ideally, the negative set would contain all possible negatives in the visual world
! Experiment - how does “not car” differ across datasets?
MEASURING NEGATIVE SET BIAS
BASIC QUESTION: HOW CAN WE IMPROVE PERFORMANCE?
! Better features, better object representations, and better learning algorithms
! Change training data
CROSS-DATASET GENERALIZATION
COUNTER ARGUMENT
! What if the algorithms are simply weak and end up overfitting to each dataset’s idiosyncrasies?
WHAT DO WE DO NOW?
! Run any new dataset through the cross-dataset tests presented in this work
! Try to minimize bias during construction of the dataset itself
! For selection bias, we could sample images from several different search engines…
! …from different countries…
! … and maybe crowd-source labels for unannotated images
! For capture bias, flip images, introduce jitter, generate automatic crops
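The three capture-bias remedies on this slide (flips, jitter, crops) can be sketched as a small augmentation pass. This is a hypothetical, numpy-only sketch operating on a raw image array; the crop size and jitter range are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image

def augment(img, rng, crop=56, max_jitter=3):
    # Horizontal flip with probability 0.5 (mirror along the width axis).
    out = img[:, ::-1] if rng.random() < 0.5 else img
    # Small translation jitter via a circular shift of rows and columns.
    dy, dx = rng.integers(-max_jitter, max_jitter + 1, size=2)
    out = np.roll(out, (dy, dx), axis=(0, 1))
    # Random crop to a fixed size.
    r = rng.integers(0, out.shape[0] - crop + 1)
    c = rng.integers(0, out.shape[1] - crop + 1)
    return out[r:r + crop, c:c + crop]

batch = [augment(img, rng) for _ in range(8)]
# every augmented view is a 56x56x3 crop of a flipped/jittered original
```

Each trick counters a different capture habit: flips undo the photographer's preferred orientation, jitter and crops undo the tendency to center and frame objects the same way.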
! For negative set bias, add negatives from other datasets and mine hard negatives
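Hard-negative mining, the last remedy above, can be sketched as a retraining loop: fit a scorer, find the negatives it currently scores highest (its "hard" mistakes), add them to the training set, and repeat. Features and the least-squares linear scorer here are illustrative stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 8
pos = rng.normal(loc=1.0, size=(50, dim))        # positive examples
neg_pool = rng.normal(loc=0.0, size=(2000, dim)) # large pool of negatives

def fit(P, N):
    # Least-squares linear scorer with targets +1 / -1 (SVM stand-in).
    X = np.vstack([P, N])
    y = np.r_[np.ones(len(P)), -np.ones(len(N))]
    w, *_ = np.linalg.lstsq(np.hstack([X, np.ones((len(X), 1))]), y, rcond=None)
    return w

def scores(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

neg = neg_pool[:50]                       # start with easy random negatives
for _ in range(3):                        # mining rounds
    w = fit(pos, neg)
    hardest = neg_pool[np.argsort(scores(w, neg_pool))[-50:]]
    neg = np.vstack([neg, hardest])       # add the highest-scoring negatives
w = fit(pos, neg)
fp_rate = (scores(w, neg_pool) > 0).mean()  # false positives over the pool
```

The mined negatives play the same role as the slide's cross-dataset negatives: they confront the classifier with the "not car" examples it would otherwise never see, tightening its decision boundary.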
QUESTIONS?