ICVSS2008: Randomized Decision Forests
Transcript of ICVSS2008: Randomized Decision Forests
Randomized Decision Forests for Segmentation and Recognition
Jamie Shotton
ICVSS 2008, Sicily, Italy
http://jamie.shotton.org/work/presentations/ICVSS2008.zip
Randomized Decision Forests
• Very fast
  – for classification
  – for clustering
• Generalization through random training
• Inherently multi-class
  – automatic feature sharing [Torralba et al. 07]
• Simple training / testing algorithms
“Randomized Decision Forests” = “Randomized Forests” = “Random Forests™”
(Among others... detailed references at end of slides)
Randomized Forests in Vision
[Lepetit et al., 06] keypoint recognition
[Amit & Geman, 97] digit recognition
[Moosmann et al., 06] visual word clustering
[Shotton et al., 08] object segmentation
[Figure: example segmentation labelled water, boat, chair, tree, road]
Live Demo [Shotton et al. 08]
• Real-time object segmentation using randomized decision forests
  – trained on MSRC 21-category database
• Segment image and label segments:
  airplane bicycle bird boat body book building car cat chair cow dog face flower grass road sheep sign sky tree water
Winner CVPR 2008 Best Demo Award!
Outline
• Tutorial on Randomized Decision Forests
• Applications to Vision
  – keypoint recognition [Lepetit et al. 06]
  – object segmentation [Shotton et al. 08]
please ask questions as we go!
The Basics: Is The Grass Wet?
[Figure: toy decision tree over the world state. Split questions “is it raining?” and “is the sprinkler on?” branch on yes/no; the leaves give P(wet) = 0.95, P(wet) = 0.9, and P(wet) = 0.1]
The Basics: Binary Decision Trees
[Figure: binary decision tree with numbered split nodes and leaf nodes; the input v is routed left (<) or right (≥) at each split node and ends at a leaf storing a distribution over category c]
• feature vector v
• split functions fn(v)
• thresholds tn
• classifications Pn(c)
Decision Tree Pseudo-Code
double[] ClassifyDT(node, v)
  if node.IsSplitNode then
    if node.f(v) >= node.t then
      return ClassifyDT(node.right, v)
    else
      return ClassifyDT(node.left, v)
    end
  else
    return node.P
  end
end
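For concreteness, here is a minimal Python rendering of ClassifyDT. The node classes and field names are illustrative choices, not from the slides (the pseudocode above is language-neutral):

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class LeafNode:
    P: List[float]  # class distribution Pn(c) stored at the leaf

@dataclass
class SplitNode:
    f: Callable[[List[float]], float]  # split function fn(v)
    t: float                           # threshold tn
    left: "Node"                       # taken when f(v) < t
    right: "Node"                      # taken when f(v) >= t

Node = Union[LeafNode, SplitNode]

def classify_dt(node: Node, v: List[float]) -> List[float]:
    """Route feature vector v down the tree; return the leaf's distribution."""
    while isinstance(node, SplitNode):
        node = node.right if node.f(v) >= node.t else node.left
    return node.P
```

(Iteration replaces the recursion of the pseudocode; the routing rule is identical.)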
Toy Learning Example
[Figure: 2-D point set in the (x, y) plane, progressively partitioned by random lines]
• feature vectors are x, y coordinates: v = [x, y]^T
• split functions are lines with parameters a, b: fn(v) = ax + by
• threshold determines intercepts: tn
• four classes: purple, blue, red, green
• Try several lines, chosen at random
• Keep the line that best separates the data – information gain
• Recurse
Randomized Learning
• Recursive algorithm – the set In of training examples that reach node n is split:

  In^left  = { i ∈ In : fn(vi) < tn }
  In^right = In \ In^left

  (fn(vi) is a function of example i's feature vector; tn is the threshold)
• Features f and thresholds t chosen at random
• At leaf node n, Pn(c) is the histogram of examples In

More Randomized Learning
• Features f(v) chosen from a feature pool, f ∈ F
• Thresholds t chosen in the range of the observed feature responses
• Choose f and t to maximize the gain in information:

  ΔE = E(In) − (|In^left| / |In|) E(In^left) − (|In^right| / |In|) E(In^right)

  where E(I) is the Shannon entropy of the class labels in I, and the left split and right split terms are weighted by the fraction of examples they receive
Implementation Details
• How many features and thresholds to try?
  – just one = “extremely randomized” [Geurts et al. 06]
  – few -> fast training, may under-fit
  – many -> slower training, may over-fit
• When to stop?
  – maximum depth
  – minimum entropy gain
  – delta class distribution
  – pruning?
• Unsupervised training
  – information gain -> most balanced split
Randomized Learning Pseudo Code

TreeNode LearnDT(I)
  repeat featureTests times
    let f = RndFeature()
    repeat threshTests times
      let t = RndThreshold(I, f)
      let (I_l, I_r) = Split(I, f, t)
      let gain = InfoGain(I_l, I_r)
      if gain is best then remember f, t, I_l, I_r
    end
  end
  if best gain is sufficient then
    return SplitNode(f, t, LearnDT(I_l), LearnDT(I_r))
  else
    return LeafNode(HistogramExamples(I))
  end
end
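A runnable Python sketch of the same learning loop, reusing the LeafNode/SplitNode classes from the earlier sketch. The entropy-based InfoGain is spelled out; rnd_feature and the stopping parameters are placeholders you would supply:

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(labels, labels_l, labels_r):
    """Entropy reduction achieved by splitting labels into left/right."""
    n = len(labels)
    return (entropy(labels)
            - (len(labels_l) / n) * entropy(labels_l)
            - (len(labels_r) / n) * entropy(labels_r))

def learn_dt(data, labels, num_classes, rnd_feature,
             feature_tests=100, thresh_tests=10, min_gain=1e-3,
             depth=0, max_depth=10):
    """Greedy recursive tree learning, mirroring LearnDT above.
    rnd_feature() must return a random split function f(v) -> float."""
    best = None
    for _ in range(feature_tests):
        f = rnd_feature()
        responses = [f(v) for v in data]
        for _ in range(thresh_tests):
            # threshold drawn from the range of observed feature responses
            t = random.uniform(min(responses), max(responses))
            left = [i for i, r in enumerate(responses) if r < t]
            right = [i for i, r in enumerate(responses) if r >= t]
            if not left or not right:
                continue  # degenerate split
            gain = info_gain(labels,
                             [labels[i] for i in left],
                             [labels[i] for i in right])
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    if best is None or best[0] < min_gain or depth >= max_depth:
        # leaf: normalised histogram of the examples that reached this node
        counts = Counter(labels)
        return LeafNode([counts.get(c, 0) / len(labels)
                         for c in range(num_classes)])
    _, f, t, left, right = best
    def grow(idx):
        return learn_dt([data[i] for i in idx], [labels[i] for i in idx],
                        num_classes, rnd_feature, feature_tests,
                        thresh_tests, min_gain, depth + 1, max_depth)
    return SplitNode(f, t, grow(left), grow(right))
```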
A Forest of Trees  [Amit & Geman 97] [Breiman 01] [Lepetit et al. 06]
[Figure: trees t1 … tT, each with split nodes and leaf nodes, all classifying the same feature vector v into a distribution over category c]
• Forest is an ensemble of several decision trees
• Classification is the average of the per-tree distributions: P(c | v) = (1/T) Σt Pt(c | v)
Decision Forests Pseudo-Code
double[] ClassifyDF(forest, v)
  // allocate memory
  let P = double[forest.CountClasses]
  // loop over trees in forest
  for t = 1 to forest.CountTrees
    let P' = ClassifyDT(forest.Tree[t], v)
    P = P + P'  // sum distributions
  end
  // normalise
  P = P / forest.CountTrees
  return P
end
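The Python counterpart, assuming classify_dt from the earlier sketch and representing a forest simply as a list of tree roots:

```python
def classify_df(forest, v, num_classes):
    """Average the per-tree class distributions for feature vector v."""
    P = [0.0] * num_classes
    for tree in forest:
        P_t = classify_dt(tree, v)           # distribution from one tree
        P = [p + q for p, q in zip(P, P_t)]  # sum distributions
    return [p / len(forest) for p in P]      # normalise
```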
Learning a Forest
• Divide training examples into T subsets It ⊆ I
  – improves generalization
  – reduces memory requirements & training time
• Train each decision tree t on subset It
  – same decision tree learning as before
• Multi-core friendly
• Subsets can be chosen at random or hand-picked
• Subsets can have overlap (and usually do)
• Could also divide the feature pool into subsets
Learning a Forest Pseudo Code
Forest LearnDF(countTrees, I)
  // allocate memory
  let forest = Forest(countTrees)
  // loop over trees in forest
  for t = 1 to countTrees
    let I_t = RandomSplit(I)
    forest[t] = LearnDT(I_t)
  end
  // return forest object
  return forest
end
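And the matching forest trainer in Python, assuming learn_dt from the earlier sketch. The 50% subset size is an illustrative choice, not a value from the slides; subsets drawn for different trees can overlap, as noted above:

```python
import random

def learn_df(count_trees, data, labels, num_classes, rnd_feature,
             subset_fraction=0.5):
    """Train count_trees trees, each on an independent random subset."""
    forest = []
    n = len(data)
    for _ in range(count_trees):
        idx = random.sample(range(n), max(1, int(subset_fraction * n)))
        forest.append(learn_dt([data[i] for i in idx],
                               [labels[i] for i in idx],
                               num_classes, rnd_feature))
    return forest
```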
Toy Forest Classification Demo
Randomized Forests for Clustering [Moosmann et al. 06]
• Visual words are good for e.g. matching and recognition [Sivic et al. 03] [Csurka et al. 04] – but k-means clustering is very slow
• Randomized forests for clustering descriptors – e.g. SIFT, texton filter-banks, etc.
• Leaf nodes in the forest are the clusters – concatenate the histograms from the trees in the forest
[Figure: trees t1 … tT with numbered nodes; each descriptor lands in one leaf per tree]
Randomized Forests for Clustering [Moosmann et al. 06]
[Figure: descriptors fall to numbered leaf nodes of trees t1 … tT; the leaf indices are accumulated into per-tree histograms (frequency vs. node index) and concatenated into a “bag of words”]
We’ll see later how to use the whole tree hierarchy!
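A sketch of this clustering usage in Python, under the same Node classes as before: each descriptor falls to one leaf per tree, and the per-tree leaf histograms are concatenated into the bag-of-words vector:

```python
def enumerate_leaves(node):
    """Depth-first list of a tree's leaf nodes (gives each a stable index)."""
    if isinstance(node, LeafNode):
        return [node]
    return enumerate_leaves(node.left) + enumerate_leaves(node.right)

def bag_of_words(forest, descriptors):
    """Concatenated per-tree leaf histograms over a set of local descriptors."""
    hist = []
    for tree in forest:
        index = {id(leaf): i for i, leaf in enumerate(enumerate_leaves(tree))}
        counts = [0] * len(index)
        for v in descriptors:
            node = tree
            while isinstance(node, SplitNode):   # drop v to a leaf
                node = node.right if node.f(v) >= node.t else node.left
            counts[index[id(node)]] += 1         # leaf = cluster index
        hist.extend(counts)
    return hist
```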
Relation to Cascades [Viola & Jones 04]
• Cascades
  – very unbalanced tree
  – good for unbalanced binary problems, e.g. sliding window object detection
• Randomized forests
  – less deep, fairly balanced
  – ensemble of trees gives robustness
  – good for multi-class problems
Random Ferns
• Naïve Bayes classifier over random sets of features [Özuysal et al. 07] [Bosch et al. 07]
• Can be a good alternative to randomized forests
[Figure: Bayes’ rule factored over individual features (“naïve Bayes”) vs. over sets of features (“random ferns”)]
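A minimal Python sketch of the fern idea from [Özuysal et al. 07]: each fern is a small set of binary tests whose joint outcome indexes a per-class count table, and ferns combine naïve-Bayes style. The test functions, Dirichlet smoothing, and uniform class prior here are illustrative assumptions:

```python
import math

class Fern:
    """One fern: S binary tests; their joint outcome is an S-bit index."""
    def __init__(self, tests, num_classes, prior=1.0):
        self.tests = tests  # list of callables v -> bool
        # per-class outcome counts, Dirichlet-smoothed by `prior`
        self.counts = [[prior] * (1 << len(tests)) for _ in range(num_classes)]

    def outcome(self, v):
        idx = 0
        for test in self.tests:
            idx = (idx << 1) | int(test(v))
        return idx

    def train(self, v, c):
        self.counts[c][self.outcome(v)] += 1

    def log_likelihood(self, v, c):
        row = self.counts[c]
        return math.log(row[self.outcome(v)] / sum(row))

def classify_ferns(ferns, v, num_classes):
    """Naive-Bayes combination: the class maximising the summed per-fern
    log-likelihoods (uniform class prior assumed)."""
    return max(range(num_classes),
               key=lambda c: sum(f.log_likelihood(v, c) for f in ferns))
```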
Short Pause
Any Questions So Far?
Outline
• Tutorial on Randomized Decision Forests
• Applications to Vision Problems
  – keypoint recognition [Lepetit et al. 06]
  – object segmentation [Shotton et al. 08]
Fast Keypoint Recognition [Lepetit et al. 06]
• Wide-baseline matching as a classification problem
• Extract prominent keypoints in training images
• Forest classifies: patches -> keypoints
• Features – pixel comparisons
• Augmented training set
  – gives robustness to patch scaling, translation, rotation
Fast Keypoint Recognition [Lepetit et al. 06]
• Example videos – from http://cvlab.epfl.ch/research/augm/detect.php
Real-Time Object Segmentation [Shotton et al. 2008]
• Aim – a better visual vocabulary
  – image categorization: does this image contain cows, trees, etc.?
  – object segmentation: draw and label the outlines of the cow, grass, etc.
• Design goals
  – fast and accurate
  – use learned semantic information
Object Recognition Pipeline
[Pipeline: extract features (SIFT, filter bank – hand-crafted) -> clustering (k-means – unsupervised) -> assignment (nearest neighbour) -> classification algorithm (SVM, decision forest, boosting – supervised)]
Object Recognition Pipeline
[Pipeline: test image -> STF (clustering into ‘semantic textons’ + local classification) -> SF for object segmentation (dog, road, building) and SVM for image categorization (building, dog, road)]

Semantic Texton Forest (STF)
• decision forest for both clustering & classification
• tree nodes have learned object category associations

Support Vector Machine (SVM)
• pyramid match kernel in learned tree hierarchies

Segmentation Forest (SF)
• second decision forest
• features use layout & context
• semantic context
Object Recognition Pipeline (recap)
Textons & Visual Words
• Textons [Julesz 81]
  – computed densely
  – clustered filter-bank responses [Malik 01] [Varma 05]
  – used for object recognition [Winn 05] [Shotton 07]
• Visual words
  – usually computed sparsely [Mikolajczyk 04]
  – clustered local descriptors, e.g. [Lowe 04]
  – used for object recognition [Sivic 03] [Csurka 04]
[Pipeline: local descriptors / filter bank -> clustering (k-means) -> assignment (nearest neighbour) – Expensive!]
Semantic Texton Forests (STF)
• An STF is
  – a decision forest applied at each image pixel
  – simple pixel-based features
• How is this new?
  – no descriptors or filter-banks
  – one decision forest gives fast clustering & assignment, local classification, and learned semantic information – Very Fast
Image Patch Features
[Figure: pixel i gives patch p (21x21 pixels in experiments); the tree split function asks: f(p) > learned threshold?]
Example Semantic Texton Forest
[Figure: input image, ground truth, and example patches routed through split functions such as A[r] + B[r] > 363, A[b] > 98, A[g] - B[b] > 28, A[g] - B[b] > 13, A[b] + B[b] > 284, |A[r] - B[b]| > 21, |A[b] - B[g]| > 37]
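The split functions above read two pixels A and B inside the patch and compare a raw channel value, sum, difference, or absolute difference against a learned threshold. A sketch of a generator for such features, assuming an image indexable as image[y][x][channel]; border handling is left to the caller:

```python
import random

def make_patch_feature(half=10, channels=3):
    """Random STF-style split feature on a (2*half+1)^2 patch around (y, x).
    Returns f(image, y, x) -> float, comparing channels of two in-patch
    pixels A and B, as in the example split functions above."""
    dy1, dx1 = random.randint(-half, half), random.randint(-half, half)
    dy2, dx2 = random.randint(-half, half), random.randint(-half, half)
    c1, c2 = random.randrange(channels), random.randrange(channels)
    op = random.choice(["value", "add", "sub", "absdiff"])

    def f(image, y, x):
        a = image[y + dy1][x + dx1][c1]  # pixel A, channel c1
        b = image[y + dy2][x + dx2][c2]  # pixel B, channel c2
        if op == "value":
            return a
        if op == "add":
            return a + b
        if op == "sub":
            return a - b
        return abs(a - b)                # "absdiff"
    return f
```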
Leaf Node Visualization
• Average of all training patches at each leaf node
[Figure: average training patches at each leaf node, shown for trees 1-5]
STF Training Examples
• Supervised training
• Regular grid
• Random transformations – learn invariances [Lepetit et al. 06]
[Figure: training images and ground truth; GT colors indicate categories]
Different Levels of Supervision
• STF can be trained with:
  – no supervision (just the images): clustering only – no local classification
  – weak supervision (image labels): trained as if the image labels held at every pixel
  – full supervision (pixel labels)
[Figure labels: tree, bench, grass]
Balancing the Training Set
• Datasets are often unbalanced – poor average class accuracy
• Weight training examples by inverse class frequency (sketch below)
[Chart: proportion of pixels by class (MSRC dataset) – building, grass, tree, cow, sheep, sky, airplane, water, face, car, bike, flower, sign, bird, book, chair, road, cat, dog, body, boat]
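The weighting itself is simple; a minimal sketch, where the resulting weight multiplies each pixel's contribution to the class histograms and the information gain:

```python
from collections import Counter

def inverse_frequency_weights(pixel_labels):
    """Per-class weight proportional to inverse class frequency, so rare
    classes (e.g. boat) influence training as much as common ones (grass)."""
    counts = Counter(pixel_labels)
    total = sum(counts.values())
    return {c: total / n for c, n in counts.items()}
```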
Semantic Textons & Local Classification
[Figure: test image; ground truth (for reference); semantic textons (color = leaf node index); local classification (color = most likely category) – comparable to the ground truth]
Live Demo
MSRC Naïve Segmentation Baseline
• Use only the local classification P(c|l) from the STF

                    global accuracy   average accuracy
supervised              49.7%             34.5%
weakly supervised       14.8%             24.1%
Bags of Semantic Textons (BoSTs)
[Figure: from the image, the STF gives semantic textons (colors = leaf node indices) and a local classification (colors = categories). Over a region r, a BoST combines (1) the semantic texton histogram – frequency vs. node index, over split nodes and leaf nodes at depths 2-5, for each tree t1 … tT – and (2) the region prior – probability vs. object category, over all trees]
Choice of Regions for BoSTs
• Image categorization – region r = the whole image
• Object segmentation – many image regions r
[Figure: pixel i covered by multiple regions r1, r2, r3, …]
Other Clustering Methods
• Efficient codebooks [Jurie et al. 05]
• Hyper-grid clustering [Tuytelaars et al. 07]
• Hierarchical k-means [Nister & Stewénius 06]
• Discriminant embedding [Hua et al. 07]
• Randomized clustering forests [Moosmann et al. 06]
  – tree hierarchy not used
  – ignores the classification output of the forest
  – uses expensive local descriptors
Object Recognition Pipeline (recap)
Image Categorization
• SVM with a learned Pyramid Match Kernel (PMK)
  – descriptor space [Grauman et al. 05]
  – image location space [Lazebnik et al. 06]
• New PMK acts on the semantic texton histograms
  – matches P and Q in the learned hierarchical histogram space
  – deeper node matches are more important
[Equation: per-depth matches weighted by a normalized depth weight, so increased similarity at depth d contributes more for deeper d]
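To make the depth weighting concrete, here is a hedged sketch of a hierarchical match kernel: histogram intersections computed per depth, with matches already found at deeper (more specific) nodes credited at a higher weight. The halving weight scheme is an assumption in the spirit of [Grauman et al. 05], not necessarily the exact weights used in [Shotton et al. 08]:

```python
def hist_intersection(p, q):
    """Sum of min counts over shared bins (dicts node_id -> count)."""
    return sum(min(n, q[b]) for b, n in p.items() if b in q)

def depth_weighted_match(P, Q, max_depth):
    """P, Q: per-depth histograms, P[d] = {node_id: count}, where a node's
    count includes the counts of all its descendants. New matches first
    found at depth d get weight 1 / 2**(max_depth - d): deeper = better."""
    score, matched_deeper = 0.0, 0.0
    for d in range(max_depth, 0, -1):       # deepest level first
        matched = hist_intersection(P.get(d, {}), Q.get(d, {}))
        new = matched - matched_deeper       # matches not counted deeper
        score += new / (2 ** (max_depth - d))
        matched_deeper = matched
    return score
```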
Categorization Experiments on MSRC
• Learned PMK vs. radial basis function (RBF)

               Mean AP
RBF              49.9
learned PMK      76.3

[Chart: mean average precision vs. number of trees T]
NB: mean average precision is a tougher metric than EER or AuC.
Object Recognition Pipeline (recap)
Segmentation Forest
• Object segmentation
• Adapt TextonBoost [Shotton et al. 06]
  – boosted classifier → randomized decision forest
  – textons → semantic textons + region priors
  – no conditional random field
[Figure: segmentation labelled bicycle, road, building]
Features in Segmentation Forest
[Figure: at pixel i, an offset rectangle r selects a bin count from either the semantic texton histogram (frequency vs. node index) or the region prior (probability vs. object category); the tree split function asks: bin count > learned threshold?]
How the Features Work
• Pairs of rectangles and semantic textons can capture appearance, layout, and textural context [Shotton et al. 07]
[Figure: input image and semantic texton map; feature1 = (r1, t1) and feature2 = (r2, t2), with their per-pixel responses at pixels i1 … i4]
Features in Segmentation Forest
• Learning the randomized forest
  – regular grid (10x10 pixels)
  – discriminative pairs of region r and BoST bin
• Region prior allows semantic context – “sheep tend to stand on grass”
• Efficient calculation
  – compute bins only as required
  – use integral images [Viola & Jones 04]
  – sub-sample the integral images
Live Demo
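The integral-image trick [Viola & Jones 04] is what makes the rectangle bin counts cheap: one pass builds cumulative sums per bin, after which any rectangle sum is four lookups. A sketch for a single bin's count grid:

```python
def build_integral(grid):
    """Integral image with one extra row/column of zeros:
    ii[y][x] = sum of grid[0..y-1][0..x-1]."""
    h, w = len(grid), len(grid[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += grid[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of grid over rows y0..y1-1 and cols x0..x1-1, in O(1)."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

Sub-sampling the integral images, as the slide notes, trades a little accuracy for memory.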
Image-Level Prior (ILP)
• Combine
  – image categorization (SVM with learned PMK)
  – object segmentation (decision forest)
[Diagram: the image categorization posterior acts as a weighted prior on the segmentation: SF output × ILP weighting]
See also [Verbeek et al. 07]
MSRC Segmentation Results
[Figure: test images with SF and SF + ILP segmentations. Category legend: bicycle, flower, sign, bird, book, chair, road, cat, dog, body, building, grass, tree, cow, sheep, sky, airplane, water, face, car, boat]
More MSRC Results
[Figure: more segmentation results; category legend as above]
MSRC Quantitative Comparison
• Pixel-wise segmentation accuracy (%) – classes in order: Building, Grass, Tree, Cow, Sheep, Sky, Airplane, Water, Face, Car, Bicycle, Flower, Sign, Bird, Book, Chair, Road, Cat, Dog, Body, Boat | Global | Average

[Shotton 06]  62 98 86 58 50 83 60 53 74 63 75 63 35 19 92 15 86 54 19 62  7 | 71 | 58
[Verbeek 07]  52 87 68 73 84 94 88 73 70 68 74 89 33 19 78 34 89 46 49 54 31 |  - | 64
SF            41 84 75 89 93 79 86 47 87 65 72 61 36 26 91 50 70 72 31 61 14 | 68 | 63
SF + ILP      49 88 79 97 97 78 82 54 87 74 72 74 36 24 93 51 78 75 35 66 18 | 72 | 67

red = winner

• Computation time

              Training Time   Test Time
[Shotton 06]  2 days          30 sec / image
[Verbeek 07]  1 hr            2 sec / image
SF            2 hrs           < 0.125 sec / image
MSRC Quantitative Comparison (including [Tu 08])

• Pixel-wise segmentation accuracy (%) – as in the previous table, with one extra row:

[Tu 08]       69 96 87 78 80 95 83 67 84 70 79 47 61 30 80 45 78 68 52 67 27 | 78 | 69

red = winner

• Computation time – as above, plus:

[Tu 08]       a few days      30-70 sec / image
MSRC Influence of Design Decisions
• MSRC dataset
• Category average accuracy:

                          average accuracy
only leaf node bins            64.1%
all tree node bins             65.5%
only region prior bins         66.1%
full model                     66.9%
no transformations             64.4%
unsupervised STF               64.2%
weakly supervised STF          64.6%
VOC 2007 Segmentation Results
[Figure: segmentations labelled table, person, dog, chair]

                              Average Accuracy (%)
[Brookes]                            9
SF                                  20
SF + Image-Level Prior              24
[TKK]                               30
SF + Detection-Level Prior          42

• Detection-Level Prior – [TKK] detection bounding boxes used as a segmentation prior
Driving Video Database
• [Brostow, Shotton, Fauqueur, Cipolla, ECCV 2008]
  – new structure-from-motion cues can improve object segmentation
[Figure: test image, ground truth(!), STF + SF result]
Semantic Texton Forests Summary
• Semantic texton forests – an effective alternative to textons for recognition
• Image categorization improves segmentation
  – “image-level prior”
  – can use identical image features
• Memory – high memory requirements for training
• Efficiency – very fast on CPU
References (red = most relevant)
• Amit & Geman. Shape Quantization and Recognition with Randomized Trees. Neural Computation 1997.
• Bosch et al. Image Classification using Random Forests and Ferns. ICCV 2007.
• Breiman. Random Forests. Machine Learning Journal 2001.
• Brostow et al. To appear ECCV 2008.
• Csurka et al. Visual Categorization with Bags of Keypoints. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
• Geurts et al. Extremely Randomized Trees. Machine Learning 2006.
• Grauman & Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. ICCV 2005.
• Hua et al. Discriminant Embedding for Local Image Descriptors. ICCV 2007.
• Jurie & Triggs. Creating Efficient Codebooks for Visual Recognition. ICCV 2005.
• Lazebnik et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR 2006.
• Lepetit et al. Keypoint Recognition using Randomized Trees. PAMI 2006.
• Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV 2004.
• Malik et al. Contour and Texture Analysis for Image Segmentation. IJCV 2001.
• Mikolajczyk & Schmid. Scale and Affine Invariant Interest Point Detectors. IJCV 2004.
• Moosmann et al. Fast Discriminative Visual Codebooks using Randomized Clustering Forests. NIPS 2006.
• Nister & Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.
• Özuysal et al. Fast Keypoint Recognition in Ten Lines of Code. CVPR 2007.
• Sharp. To appear ECCV 2008.
• Shotton et al. Semantic Texton Forests for Image Categorization and Segmentation. CVPR 2008.
• Shotton et al. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context. IJCV 2007.
• Sivic & Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003.
• Torralba et al. Sharing Visual Features for Multiclass and Multiview Object Detection. PAMI 2007.
• Tu. Auto-context and Its Application to High-level Vision Tasks. CVPR 2008.
• Tuytelaars & Schmid. Vector Quantizing Feature Space with a Regular Lattice. ICCV 2007.
• Varma & Zisserman. A Statistical Approach to Texture Classification from Single Images. IJCV 2005.
• Verbeek & Triggs. Region Classification with Markov Field Aspect Models. CVPR 2007.
• Viola & Jones. Robust Real-time Object Detection. IJCV 2004.
• Winn et al. Object Categorization by Learned Universal Visual Dictionary. ICCV 2005.
Take Home Message
• Randomized decision forests are
  – very fast (GPU friendly [Sharp, ECCV 08])
  – simple to implement
  – flexible tools for computer vision
• Ideas for more research
  – biasing the randomness
  – optimal fusion of trees from different modalities, e.g. appearance, SfM, optical flow
Thank You!
[email protected]
http://jamie.shotton.org/work/presentations/ICVSS2008.zip
Internships at MSRC are available for next year. Talk to me or see:
http://research.microsoft.com/aboutmsr/jobs/internships/about_uk.aspx
Example Tree in Segmentation Forest
Maximum Depth 14
More Results
[Figure: test image, no ILP, with ILP. Object classes: sky, mountain, tree, road, sidewalk, building, grass, rock, sand, plant, car, snow, sign, water, person]
Effect of Color Space
MSRC Categorization Results
[Precision-recall curves, categories 1-5: building, grass, tree, cow, sheep]
[Precision-recall curves, categories 6-10: sky, aeroplane, water, face, car]
MSRC Categorization Results
[Precision-recall curves, categories 16-21: chair, road, cat, dog, body, boat]
[Precision-recall curves, categories 11-15: bicycle, flower, sign, bird, book]