UNBIASED LOOK AT DATASET BIAS Antonio Torralba Alexei A. Efros
MIT CMU
Presented by: Vivek Dubey
Harika Sabbella
NAMING DATASETS
1) Caltech-101 2) UIUC 3) MSRC 4) Tiny Images 5) ImageNet 6) PASCAL VOC 7) LabelMe 8) SUN-09 9) 15 Scenes 10) Corel 11) Caltech-256 12) COIL-100
ARE CURRENT DATASETS BIASED?
! Randomly sampled 1,000 images from the training set of each of the 12 datasets and trained a 12-way linear SVM classifier
! Tested the classifier on 300 random images from the test set of each of the 12 datasets
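The experiment above can be sketched in a few lines. This is a minimal, hypothetical stand-in: real image features are replaced by synthetic Gaussian clusters (one "style" centroid per dataset), and the paper's linear SVM is replaced by a one-vs-rest linear classifier fit by least squares, which shares the linear decision rule but not the SVM training loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_datasets, n_train, n_test, dim = 12, 100, 30, 32

# One synthetic "style" centroid per dataset stands in for real image features.
centroids = rng.normal(size=(n_datasets, dim))

def sample(n_per):
    # Draw n_per feature vectors around each dataset's centroid.
    X = np.vstack([c + 0.5 * rng.normal(size=(n_per, dim)) for c in centroids])
    y = np.repeat(np.arange(n_datasets), n_per)
    return X, y

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

# One-vs-rest linear classifier fit by least squares (a stand-in for the
# paper's linear SVM; same linear decision rule, different training loss).
Y = np.eye(n_datasets)[ytr]                      # one-hot targets
Xb = np.hstack([Xtr, np.ones((len(Xtr), 1))])    # append bias term
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
pred = (np.hstack([Xte, np.ones((len(Xte), 1))]) @ W).argmax(axis=1)
accuracy = (pred == yte).mean()                  # well above 1/12 chance
```

The point of the real experiment is exactly this comparison: if datasets were unbiased samples of the same visual world, accuracy would hover near the 1/12 chance level; the paper's classifier instead does far better.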
RESULTS OF “NAME THAT DATASET” CLASSIFIER
CONFUSION MATRIX
TRAINING THE “CARS” CLASSIFIER
! Applied a similar classifier to object crops of cars from five datasets
! The classifier could still tell the datasets apart (61% accuracy), so the bias lives in the objects themselves, not just in scene context
OUTLINE
Step 1) Understanding how bias sneaks into our datasets
Step 2) Raise awareness in the visual recognition community
PROMISES AND PERILS OF VISUAL DATASETS
! Datasets have helped produce breakthroughs in: ! Face detection
! Object recognition
! Problems with the current way of looking at datasets ! “Creeping over-fitting”
! Visual recognition community has given too much importance to “winning” a particular dataset competition
MACHINE LEARNING VERSUS COMPUTER VISION
! Datasets in machine learning - the dataset is the world
! Datasets in computer vision - the dataset is only a representation of the visual world
RISE OF THE MODERN DATASET
MEASURING DATASET BIAS
! Dataset bias is difficult to measure directly; a useful proxy is cross-dataset generalization
! Training on one dataset and testing on another
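The train-on-one, test-on-another protocol can be sketched as filling an N x N performance matrix. Everything here is simulated and illustrative: each "dataset" is a Gaussian feature cloud whose class means drift by a dataset-specific shift (the bias), and a least-squares linear classifier stands in for an SVM.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sets, n, dim = 4, 200, 16
base_pos, base_neg = rng.normal(size=dim), rng.normal(size=dim)

def make_dataset():
    shift = 0.8 * rng.normal(size=dim)             # dataset-specific bias
    pos = base_pos + shift + rng.normal(size=(n, dim))
    neg = base_neg + shift + rng.normal(size=(n, dim))
    return np.vstack([pos, neg]), np.array([1] * n + [0] * n)

datasets = [make_dataset() for _ in range(n_sets)]

def fit_linear(X, y):
    # Least-squares linear classifier (stand-in for an SVM).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, 2.0 * y - 1.0, rcond=None)
    return w

def accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return ((Xb @ w > 0).astype(int) == y).mean()

# Row i, column j: train on dataset i, test on dataset j.
perf = np.array([[accuracy(fit_linear(*datasets[i]), *datasets[j])
                  for j in range(n_sets)] for i in range(n_sets)])
```

The paper's finding, reproduced qualitatively by this toy setup, is that the diagonal (train and test on the same dataset) tends to beat the off-diagonal entries: performance drops when a model meets another dataset's version of the same concept.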
CLASSIFICATION VERSUS DETECTION
! Two common objects in datasets are “car” and “person”
! Classification - finding all the images that contain a given object
! Bag of words followed by a non-linear SVM
! Detection - finding bounding boxes containing the object
! HOG descriptor followed by a linear SVM
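The detection side of this pipeline can be sketched as sliding-window scoring. This is a toy stand-in, not the HOG detector itself: the window is scored on raw pixels with a hand-made template, whereas a real detector would score HOG cells with a learned linear SVM weight vector.

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.normal(scale=0.1, size=(32, 32))   # noisy background
template = np.ones((8, 8))                   # hypothetical "car" template
img[12:20, 5:13] += 1.0                      # plant the object at row 12, col 5

# Slide the template over every window; keep the highest linear score.
best, best_box = -np.inf, None
for r in range(img.shape[0] - 8 + 1):
    for c in range(img.shape[1] - 8 + 1):
        s = float((img[r:r+8, c:c+8] * template).sum())  # linear score
        if s > best:
            best, best_box = s, (r, c, r + 8, c + 8)
# best_box recovers the planted window: (12, 5, 20, 13)
```

Swapping raw pixels for HOG features and the hand-made template for learned SVM weights turns this loop into the classic Dalal-Triggs style detector referenced on the slide.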
RESULTS OF CROSS-DATASET GENERALIZATION EXPERIMENT
RESULTS OF CROSS-DATASET GENERALIZATION EXPERIMENT (CONTINUED)
[Cross-dataset results shown for: SUN09, LabelMe, PASCAL 07, ImageNet, Caltech101, MSRC]
DIFFERENT KINDS OF BIAS
! Selection bias
! Capture bias
! Category or label bias
! Negative set bias
NEGATIVE SET BIAS
! Ideally, the negative set would contain all possible negatives in the visual world
! Experiment - how does “not car” differ across datasets?
MEASURING NEGATIVE SET BIAS
BASIC QUESTION: HOW CAN WE IMPROVE PERFORMANCE?
! Better features, better object representations, and better learning algorithms
! Change training data
CROSS-DATASET GENERALIZATION
COUNTER ARGUMENT
! What if the algorithms are simply weak and end up overfitting to each dataset’s idiosyncrasies?
WHAT DO WE DO NOW?
! Run any new dataset through the cross-dataset tests presented in this work
! Try to minimize bias during construction of the dataset itself
! For selection bias, we could sample images from several different search engines…
! …from different countries…
! … and maybe crowd-source labels for unannotated images
! For capture bias, flip images, introduce jitter, generate automatic crops
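The three capture-bias remedies on this slide (flips, jitter, crops) can be sketched as a small augmentation pass. This is a hypothetical, numpy-only sketch operating on a raw image array; the crop size and jitter range are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image

def augment(img, rng, crop=56, max_jitter=3):
    # Horizontal flip with probability 0.5 (mirror along the width axis).
    out = img[:, ::-1] if rng.random() < 0.5 else img
    # Small translation jitter via a circular shift of rows and columns.
    dy, dx = rng.integers(-max_jitter, max_jitter + 1, size=2)
    out = np.roll(out, (dy, dx), axis=(0, 1))
    # Random crop to a fixed size.
    r = rng.integers(0, out.shape[0] - crop + 1)
    c = rng.integers(0, out.shape[1] - crop + 1)
    return out[r:r + crop, c:c + crop]

batch = [augment(img, rng) for _ in range(8)]
# every augmented view is a 56x56x3 crop of a flipped/jittered original
```

Each trick counters a different capture habit: flips undo the photographer's preferred orientation, jitter and crops undo the tendency to center and frame objects the same way.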
! For negative set bias, add negatives from other datasets and mine hard negatives
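Hard-negative mining, the last remedy above, can be sketched as a retraining loop: fit a scorer, find the negatives it currently scores highest (its "hard" mistakes), add them to the training set, and repeat. Features and the least-squares linear scorer here are illustrative stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 8
pos = rng.normal(loc=1.0, size=(50, dim))        # positive examples
neg_pool = rng.normal(loc=0.0, size=(2000, dim)) # large pool of negatives

def fit(P, N):
    # Least-squares linear scorer with targets +1 / -1 (SVM stand-in).
    X = np.vstack([P, N])
    y = np.r_[np.ones(len(P)), -np.ones(len(N))]
    w, *_ = np.linalg.lstsq(np.hstack([X, np.ones((len(X), 1))]), y, rcond=None)
    return w

def scores(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

neg = neg_pool[:50]                       # start with easy random negatives
for _ in range(3):                        # mining rounds
    w = fit(pos, neg)
    hardest = neg_pool[np.argsort(scores(w, neg_pool))[-50:]]
    neg = np.vstack([neg, hardest])       # add the highest-scoring negatives
w = fit(pos, neg)
fp_rate = (scores(w, neg_pool) > 0).mean()  # false positives over the pool
```

The mined negatives play the same role as the slide's cross-dataset negatives: they confront the classifier with the "not car" examples it would otherwise never see, tightening its decision boundary.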
QUESTIONS?