

TOWARDS LARGE-SCALE FINE-GRAINED CATEGORY LEARNING:

POSE NORMALIZATION, HIERARCHICAL GENERALIZATION, AND TIMELY DETECTION

TREVOR DARRELL UC BERKELEY EECS & ICSI


RYAN FARRELL, NING ZHANG, YANGQING JIA, JOSHUA ABBOTT, JOSEPH AUSTERWEIL, TOM GRIFFITHS, SERGEY KARAYEV, TOBI BAUMGARTNER, MARIO FRITZ

CATEGORIZATION SPECTRUM

Caltech 101 (2004): Fei-Fei, Fergus, Perona
Caltech 256 (2007): Griffin, Holub, Perona
Animals with Attributes (2009): Lampert, Nickisch, Harmeling, Weidmann
Exploration of Methods for Automatic Whale Identification (2001): Heusel, Gunn
Identifying Individual Salamanders (2007): Gamble, Ravela, McGarigal
Labeled Faces in the Wild (2007): Huang, Ramesh, Berg, Learned-Miller
STONEFLY9 (2009): Martínez-Muñoz, Zhang, Payet, Todorovic, Larios, Yamamuro, Lytle, Moldenke, Mortensen, Paasch, Shapiro, Dietterich
Caltech/UCSD Birds 200 (2010): Welinder, Branson, Mita, Wah, Schroff, Belongie, Perona
Visual Identification of Plant Species (2008): Belhumeur, Chen, Feiner, Jacobs, Kress, Ling, Lopez, Ramamoorthi, Sheorey, White, Zhang
Oxford Flowers (2006-09): Nilsback, Zisserman

Fine-grained recognition bridges traditional instance- and category-level tasks…

TOWARDS LARGE-SCALE FINE-GRAINED RECOGNITION

Tasks are relatively difficult even for humans: bird subspecies, vehicle brands, etc.

People learn them from small numbers of positive examples.

Both local feature geometry and statistical appearance are salient.

Key differences vs. traditional recognition:
– Distinctive fine-grained features are often relative to object configuration
– Degree of generalization varies across category hierarchies
– Finest-grained categories may have few training examples
– Can’t detect everything all the time in a large-scale setting

TOWARDS LARGE-SCALE FINE-GRAINED RECOGNITION

Today:
• Pose Pooling Kernels for Sub-category Recognition
• Generalization in Large-Scale Concept Hierarchies
• Timely Recognition

POSE POOLING KERNELS FOR SUB-CATEGORY RECOGNITION

NING ZHANG, RYAN FARRELL, TREVOR DARRELL CVPR 2012

FINE-GRAINED VISUAL CATEGORIZATION

DISCRIMINATIVE DETAILS MAY BE HIGHLY LOCALIZED AND POSE RELATIVE

Scarlet Tanager Photo by Paul O’Toole

Summer Tanager Photo by Liam Wolff

Summer Tanager Photo by Patti Shoupe

Snowy Egret Photo by Shelley Rutkin

Great Egret Photo by James Hawkins

“BIRDLETS” [ICCV11] APPROACH

“Blue-Headed Vireo”!

DETECTION OF VOLUMETRIC PRIMITIVES

POSE-NORMALIZED APPEARANCE SPACE

2D VS. 3D

3D Representation
Pros:
• Most accurate representation for volumetric objects
• Facilitates a pose-normalized appearance model
Cons:
• Accurate detection is challenging
• Annotations are costly
• A volumetric part model doesn’t capture flat parts such as wings

2D Representation
Pros:
• Far easier to train (less complex and less costly annotations)
• Far more tolerant of localization errors
Cons:
• Less precise part localization can attenuate key discriminative features

A classic story: recognition via detection (e.g., faces):
detection / part localization → alignment / normalization → identification / attribute learning

Traditional Spatial Pooling

“POSE DEPENDENT” POOLING

OVERVIEW

Annotated images → poselet training → poselet activations → pose-normalized representation → SVM classification → “Red-Eyed Vireo”

POSELET TRAINING

Parts: back, beak, belly, breast, crown, forehead, left eye, left leg, left wing, nape, right eye, right leg, right wing, throat

Collect positive examples from other training images.

POSELET TRAINING

Bourdev and Malik, in ICCV 2009.

POSELET ACTIVATION (PREDICTION)

Poselet #90

Poselet #66

Poselet #65

COMPARING POSELETS

• Local image features from poselets which overlap in 3D are pooled together

POSE NORMALIZED REPRESENTATION

• Cluster based on warping distance or keypoint distance
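The keypoint-distance option can be sketched as follows. This is an illustrative stand-in, not the paper's exact measure: a hypothetical `keypoint_distance` that compares two sets of corresponding 2D keypoints after removing position and scale, so only the part configuration drives the clustering.

```python
import numpy as np

def keypoint_distance(kp_a, kp_b):
    """Mean Euclidean distance between corresponding 2D keypoints after
    normalizing each set to zero mean and unit average radius, so only
    the part configuration (pose) matters, not position or scale."""
    def normalize(kp):
        kp = np.asarray(kp, dtype=float)
        kp = kp - kp.mean(axis=0)                 # remove translation
        scale = np.sqrt((kp ** 2).sum(axis=1)).mean()
        return kp / max(scale, 1e-12)             # remove scale
    a, b = normalize(kp_a), normalize(kp_b)
    return float(np.sqrt(((a - b) ** 2).sum(axis=1)).mean())

# Same part configuration at a different position and scale -> distance ~ 0
kp1 = [[0, 0], [4, 0], [2, 3]]
kp2 = [[10, 10], [18, 10], [14, 16]]              # kp1 scaled by 2, shifted by 10
```

Activations whose pairwise distances fall below a threshold would then be grouped into one poselet cluster.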

POSELET CLUSTERS

Pose pooling vs. spatial pyramid

Pose pooling

EXPERIMENTAL RESULTS

P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, P. Perona. CalTech/UCSD, 2010. http://www.vision.caltech.edu/visipedia/CUB-200.html

EXPERIMENTAL RESULTS

• Implementation Details
– Poselet Activations: generated by comparing each poselet’s keypoint distribution with the annotated keypoint locations on each image.
– Appearance Descriptor: bag of words on SIFT features / spatial pyramid match on top of SIFT.
– SVM classifier: linear SVM / fast implementations of the χ² and intersection kernels.
– Baseline method: VLFeat toolbox SIFT on the object bounding box, without pose-normalized information.
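The two histogram kernels named above can be sketched in a few lines. This is a minimal illustration of the kernel functions on L1-normalized bag-of-words histograms, not the fast implementation used in the paper:

```python
import numpy as np

def intersection_kernel(x, y):
    """Histogram intersection: sum of element-wise minima of two histograms."""
    return float(np.minimum(x, y).sum())

def chi2_kernel(x, y, eps=1e-10):
    """Chi-squared similarity, 1 - 0.5 * sum((x - y)^2 / (x + y)),
    a common choice for comparing SIFT bag-of-words histograms."""
    return float(1.0 - 0.5 * np.sum((x - y) ** 2 / (x + y + eps)))

# Two L1-normalized 3-bin histograms (toy stand-ins for SIFT bag-of-words features)
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
k_int = intersection_kernel(a, b)   # 0.4 + 0.3 + 0.2 = 0.9
k_chi = chi2_kernel(a, b)
```

Either function can be used as the kernel in a standard kernel SVM in place of the linear dot product.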

CATEGORIZATION RESULTS

Baseline - VLFeat 29.73% (MAP)

Warped Feature 36.33% (MAP)

Pose Pooling Kernel 40.60% (MAP)

Confusion matrices on 14 categories across two bird families {vireos, woodpeckers}.

MORE RESULTS

FUTURE WORK

• Detection
• Better descriptors
– Physically inspired color representations
– Salience cues to reduce false-positive poselet responses
• Experiments on larger datasets…

Caltech UCSD Berkeley

For more information contact: Ryan Farrell - farrell@eecs.berkeley.edu

“Sweet” Taster-25 (released June 2012)

“Bitter” Sparrows-33 (will be released this fall with the full 700+ category dataset)

Bounding Box

Segmentation Mask

Part Keypoints

Part Regions

Attributes

Among the Key Innovations
• Photos collected via enthusiast photographer submissions and annotated by citizen scientists
• Expert-curated collection (the category for each image is verified by a domain expert)
• Deep taxonomy and tree-based distance measures

The dataset is being collected with tools designed for re-use in other fine-grained domains.


Today:
• Pose Pooling Kernels for Sub-category Recognition
• Generalization in Large-Scale Concept Hierarchies
• Timely Recognition

BAYESIAN CONCEPT GENERALIZATION

Yangqing Jia, Joshua Abbott, Joe Austerweil, Tom Griffiths, Trevor Darrell

TOWARDS LARGE-SCALE FINE-GRAINED CATEGORY RECOGNITION

Current ML algorithms:
– hundreds of positive examples
– thousands of negative examples
– the same learning method throughout the hierarchy

Humans:
– can learn from one, two, or three examples
– often have no negative examples
– generalize more for broader concepts (the “size principle”)

LEARNING A NOUN

[Three example objects, each labeled “blicket”.]

BAYESIAN INFERENCE

Bayes’ rule combines the likelihood with the prior probability and normalizes by a sum over the space of hypotheses:

p(h | d) = p(d | h) p(h) / Σ_h′ p(d | h′) p(h′)

where h is a hypothesis, d is the observed data, and p(h | d) is the posterior probability.

LEARNING NOUNS

• Data: object-word pairs
• Hypotheses: sets of objects
• Likelihood: weak sampling vs. strong sampling

(Tenenbaum, 1999; Xu & Tenenbaum, 2007)

[Figure: graphical models over hypothesis h, object x, and word w, contrasting weak and strong sampling.]

“blicket” examples under strong sampling: for a hypothesis containing 3 objects, one example gives p(d|h) = 1/3 and three examples give p(d|h) = (1/3)³; for a hypothesis containing 12 objects, one example gives p(d|h) = 1/12 and three give (1/12)³. An example outside the hypothesis gives p(d|h) = 0, so smaller consistent hypotheses are increasingly favored as examples accumulate.
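The size-principle computation can be sketched end to end. The nested toy hierarchy below is hypothetical (names and members invented), but its hypothesis sizes of 3, 6, and 12 mirror the 1/3 and 1/12 likelihoods on the slides:

```python
from fractions import Fraction

def posterior(hypotheses, examples):
    """Posterior over hypotheses under strong sampling with a uniform
    prior: p(d|h) = (1/|h|)^n if every example is in h, else 0."""
    n = len(examples)
    scores = {}
    for name, members in hypotheses.items():
        if all(x in members for x in examples):
            scores[name] = Fraction(1, len(members)) ** n
        else:
            scores[name] = Fraction(0)
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def p_generalize(hypotheses, examples, query):
    """Probability the query object also bears the label: sum the
    posterior over all hypotheses that contain it."""
    post = posterior(hypotheses, examples)
    return sum(p for name, p in post.items() if query in hypotheses[name])

# Nested toy hierarchy (hypothetical): 3 dalmatians in 6 dogs in 12 animals
hyp = {
    "dalmatians": {"dal1", "dal2", "dal3"},
    "dogs": {"dal1", "dal2", "dal3", "lab1", "pug1", "pug2"},
    "animals": {"dal1", "dal2", "dal3", "lab1", "pug1", "pug2",
                "cat1", "cat2", "cow1", "cow2", "cow3", "cow4"},
}
```

With three dalmatian examples the posterior concentrates on the smallest consistent hypothesis (64/73 on "dalmatians"), so generalization to a labrador is weak (9/73); a single example spreads more mass to the broader hypotheses (3/7 on the labrador), which is the size principle in action.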

[Diagram: Principles → Hypotheses → Data]

Candidate principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias.

PREVIOUS WORK (XU & TENENBAUM, 2007)

PREVIOUS RESULTS (SMALL-SCALE)

Challenges:
• Small, hand-constructed domains
• Toy stimuli
• Constructing a hypothesis space from pairwise similarity judgments requires O(n²) judgments

LARGE-SCALE WORD LEARNING

Challenges:
• Small, hand-constructed domains
• Toy stimuli
• Constructing a hypothesis space from pairwise similarity judgments requires O(n²) judgments

Solutions:
• The hypothesis space is automatically derived from WordNet structure

LARGE-SCALE WORD LEARNING

Here are three BLICKETS. Here are five FEPS. Here are four ZIVS.
Is this a BLICKET? Is this a ZIV? Is this a FEP?

Here are three FEPS. Can you help Mr. Frog find the other FEPS?

LARGE SCALE MODEL RESULTS

[Abbott, Austerweil, and Griffiths, Constructing a hypothesis space from the web for large scale Bayesian word learning, COGSCI 2012]

GENERALIZING TO NEW IMAGES

• Bayesian word learning offers advantages over standard machine learning approaches.

• The first challenge, scaling, has been addressed (going from 45 to 14 million objects), but only using ImageNet images and their location in the hierarchy.

• A recent extension of the model incorporates direct pixel observations, models noisy recognition, generalizes from arbitrary objects, and better predicts human data…

Task

Renuzit air freshener

coke-can

pringles

pasta-box

book-robotics

tea-box

leaf-node classes from cropped PR2 camera images…

Image Classification Pipeline

• A convolutional neural network (CNN) pipeline is well suited to this type of data ([Jia et al CVPR12]):

local feature coding → spatial pooling → SVM classification → “pasta box”

Densely-coded Local Features

• We extract overlapping local image patches

Encoding Local Features

local features

dictionary (learned in an unsupervised fashion)

Encoded activation maps

Spatial Pooling

Take the max over a regular grid (or over learned spatial bins [cvpr12]), then convert the pooled responses into a feature vector.
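That pooling step can be sketched as follows, assuming an H×W×K map of encoded activations and a fixed regular grid (learned bins would replace the even splits). The function name is illustrative:

```python
import numpy as np

def spatial_max_pool(activation_map, grid=(2, 2)):
    """Max-pool an H x W x K encoded activation map over a regular grid of
    spatial bins, then concatenate the per-bin maxima into one vector."""
    H, W, K = activation_map.shape
    rows = np.array_split(np.arange(H), grid[0])
    cols = np.array_split(np.arange(W), grid[1])
    feats = []
    for r in rows:
        for c in cols:
            cell = activation_map[np.ix_(r, c)]       # one spatial bin
            feats.append(cell.max(axis=(0, 1)))       # per-codeword max
    return np.concatenate(feats)

# Toy 4x4 map of 8-dim codes: a 2x2 grid gives a 4 * 8 = 32-dim vector
m = np.random.rand(4, 4, 8)
v = spatial_max_pool(m, grid=(2, 2))
```

The resulting vector is what gets fed to the SVM classifier at the end of the pipeline.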

The whole pipeline

“pasta box”

Learning from Examples

• “Dear Robot, here are some feps”

• “Now get me more feps / are these also feps?”

(pasta-box) (peanut-butter) (spam)

Learning from Examples

• We assume a hierarchy of hypotheses from WordNet, or provided in a robot’s environment.

• Each hypothesis defines a subset of the leaf nodes (instances) that belong to it:

These are feps

This is probably not a fep

These are not feps

Visually Grounded Word Learning at a Large Scale

• We tested this model at large scale with ImageNet

[Jia, Abbott, Austerweil, Griffiths, Darrell. 2012]

Human Behavior

[Plots: generalization from more specific to more general concepts, and from more specific to more general queries.]

Result Comparisons: human behavior vs. our model vs. classical concept learning (without vision) vs. classical vision (without concept learning).

Video

Today:
• Pose Pooling Kernels for Sub-category Recognition
• Generalization in Large-Scale Concept Hierarchies
• Timely Recognition

Sergey Karayev, Tobi Baumgartner, Mario Fritz, Trevor Darrell NIPS 2012

Timely Object Recognition

In a large-scale setting there are many classes and images, potentially different values per class, and not enough time to run all the object detectors.

Our solution: a dynamic policy for selecting detectors.

Vijayanarasimhan and Kapoor. Visual Recognition and Detection Under Bounded Computational Resources. CVPR 2010. Gao and Koller. Active Classification based on Value of Classifier. NIPS 2011

Recent Related Work

New Metric: Performance (AP) vs. Time

Area under the AP vs. T curve between start and deadline times
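A minimal sketch of that metric, assuming AP is measured at a few time points and linearly interpolated in between (the function name and the interpolation choice are illustrative, not the paper's exact evaluation code):

```python
import bisect

def ap_vs_time_area(times, aps, t_start, t_end):
    """Normalized area under a piecewise-linear AP-vs-time curve over
    [t_start, t_end]: a policy holding AP = 1 for the whole window scores 1.0."""
    def ap_at(t):
        # Linear interpolation, constant extrapolation outside the samples.
        if t <= times[0]:
            return aps[0]
        if t >= times[-1]:
            return aps[-1]
        i = bisect.bisect_right(times, t)
        w = (t - times[i - 1]) / (times[i] - times[i - 1])
        return aps[i - 1] + w * (aps[i] - aps[i - 1])
    # Trapezoidal rule over the window ends plus any samples inside the window.
    pts = [t_start] + [t for t in times if t_start < t < t_end] + [t_end]
    area = sum(0.5 * (ap_at(u) + ap_at(v)) * (v - u) for u, v in zip(pts, pts[1:]))
    return area / (t_end - t_start)
```

Because the area is accumulated over the whole window, a policy that reaches high AP early scores better than one that reaches the same final AP just before the deadline.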

Sequential Detection

[Diagram: an image, the current belief state, a chosen action, and the resulting observations, unrolled over time.]

Action selection: choose the next action to maximize expected value.

Actions:
• scene-context or object-detector actions, run on the whole image
• each generates observations: a list of detections, a GIST feature, etc.

Execute an action (a “black box”), receive its observations, and select the next action to maximize expected value.

Belief state update: incorporate the observations and leverage context; class presence probabilities are inferred from the observations.

Sequential Detection



Action selection: maximize expected value.

policy: π(s) = argmax_a Q(s, a)
action-value function: Q(s, a) = E[R | s, a], the expected discounted future reward of taking action a in belief state s
assuming linear structure: Q(s, a) ≈ θᵀ φ(s, a)

Reward definition: derived from the AP vs. Time metric.


Learning the policy

• Sample the expectation using Q-iteration:
– collect (state, action, reward) tuples by running episodes
– solve for the weights with L1 regression
• Cross-validate the discount γ:
– when γ is 0, we learn a greedy policy
– when γ is 1, we look ahead to the end of the episode
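Those steps can be sketched as follows. The regularization strength is illustrative, and the L1 solver is a plain coordinate-descent Lasso, not necessarily the solver used in the paper:

```python
import numpy as np

def discounted_returns(rewards, gamma):
    """Monte-Carlo return G_t = r_t + gamma * G_{t+1} for one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

def fit_q_weights(phis, returns, l1=0.01, iters=200):
    """Fit Q(s, a) ~ w . phi(s, a) to sampled returns with L1-regularized
    least squares (coordinate descent with soft-thresholding)."""
    X = np.asarray(phis, dtype=float)
    y = np.asarray(returns, dtype=float)
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            resid = y - X @ w + X[:, j] * w[j]    # residual excluding feature j
            rho = X[:, j] @ resid
            w[j] = np.sign(rho) * max(abs(rho) - l1, 0.0) / max(col_sq[j], 1e-12)
    return w
```

With γ = 0 the regression targets are just the immediate rewards, giving a greedy policy; a larger γ credits an action for reward received later in the episode.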

Feature Representation & Learned Policy Weights

Features:
– prior probability of each detector’s class being present
– class presence probabilities and entropies given observations
– time features (time to deadline, etc.)

The learned weights form an actions × features matrix, shown for both the greedy policy (discount 0) and the “RL” policy (nonzero discount).

Policy Trajectories


Evaluation Results

PASCAL VOC 2007 DPM detectors Mean Detection AP


Today:
• Pose Pooling Kernels for Sub-category Recognition
• Generalization in Large-Scale Concept Hierarchies
• Timely Recognition

• Pose matters! Distinctive fine-grained features should be relative to object configuration…
• Size matters! The degree of generalization varies across category hierarchies and enables learning from few training examples…
• Plan ahead! Learn what to look for, and when, in a large-scale setting to maximize time-constrained value…

TOWARDS LARGE-SCALE FINE-GRAINED RECOGNITION

• Ryan Farrell, Ning Zhang, Trevor Darrell, “Pose Pooling Kernels for Sub-category Recognition”, CVPR 2012

• Yangqing Jia, Joshua Abbott, Joe Austerweil, Tom Griffiths, Trevor Darrell, “Bayesian Concept Generalization”, UCB EECS TR

• Sergey Karayev, Tobi Baumgartner, Mario Fritz, Trevor Darrell, “Timely Object Recognition”, NIPS 2012

FOR MORE INFORMATION…
