Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford...
![Page 1: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/1.jpg)
Scalable Learning in Computer Vision
Adam Coates, Honglak Lee, Rajat Raina,
Andrew Y. Ng
Stanford University
![Page 2: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/2.jpg)
Adam Coates, Honglak Lee, Rajat Raina, Andrew Y. Ng – Stanford University
Computer Vision is Hard
![Page 3: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/3.jpg)
Introduction
• One reason for difficulty: small datasets.
Common Dataset Sizes (positives per class)

| Dataset | Positives per class |
|---|---|
| Caltech 101 | 800 |
| Caltech 256 | 827 |
| PASCAL 2008 (Car) | 840 |
| PASCAL 2008 (Person) | 4168 |
| LabelMe (Pedestrian) | 25330 |
| NORB (Synthetic) | 38880 |
![Page 4: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/4.jpg)
Introduction
• But the world is complex.
– Hard to get extremely high accuracy on real images if we haven’t seen enough examples.
[Figure: Test Error (Area Under Curve) – Claw Hammers. x-axis: Training Set Size (1E+03 to 1E+04); y-axis: AUC (0.75 to 1).]
![Page 5: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/5.jpg)
Introduction
• Small datasets:
– Clever features
  • Carefully designed to be robust to lighting, distortion, etc.
– Clever models
  • Try to use knowledge of object structure.
– Some machine learning on top.
• Large datasets:
– Simple features
  • Favor speed over invariance and expressive power.
– Simple model
  • Generic; little human knowledge.
– Rely on machine learning to solve everything else.
![Page 6: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/6.jpg)
SUPERVISED LEARNING FROM SYNTHETIC DATA
![Page 7: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/7.jpg)
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Need to scale up each part of the learning process to really large datasets.
![Page 8: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/8.jpg)
Synthetic Data
• Not enough labeled data for algorithms to learn all the knowledge they need.
– Lighting variation
– Object pose variation
– Intra-class variation
• Synthesize positive examples to include this knowledge.
– Much easier than building this knowledge into the algorithms.
![Page 9: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/9.jpg)
Synthetic Data
• Collect images of object on a green-screen turntable.
[Figure panels: Green Screen image → Segmented Object → Synthetic Background → Photometric/Geometric Distortion]
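The four stages named on this slide can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: the chroma-key test (`is_green`), the gain/bias lighting model, and the horizontal flip used as a stand-in for geometric distortion are all assumptions, and images are represented as plain nested lists of (r, g, b) tuples to keep the example dependency-free.

```python
import random

def is_green(pixel, margin=40):
    """Crude chroma-key test (assumed): green clearly dominates red and blue."""
    r, g, b = pixel
    return g - max(r, b) > margin

def synthesize(fg, bg, gain=1.0, bias=0, margin=40):
    """Replace green-screen pixels of `fg` with `bg`, then apply a simple
    photometric distortion (gain and bias) to the whole composite."""
    out = []
    for fg_row, bg_row in zip(fg, bg):
        row = []
        for fg_px, bg_px in zip(fg_row, bg_row):
            src = bg_px if is_green(fg_px, margin) else fg_px
            # Scale and shift each channel, clamped to the valid 0..255 range.
            row.append(tuple(min(255, max(0, round(c * gain + bias))) for c in src))
        out.append(row)
    return out

def random_example(fg, backgrounds, rng):
    """Draw one synthetic positive: random background, lighting, and flip."""
    bg = rng.choice(backgrounds)
    gain = rng.uniform(0.7, 1.3)   # hypothetical lighting range
    bias = rng.randint(-20, 20)
    img = synthesize(fg, bg, gain, bias)
    if rng.random() < 0.5:         # stand-in for geometric distortion
        img = [row[::-1] for row in img]
    return img
```

Calling `random_example` repeatedly with a seeded `random.Random` yields as many distinct positives as needed from a single green-screen capture, which is the point of the approach: the variation is put into the data rather than into the algorithm.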
![Page 10: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/10.jpg)
Synthetic Data: Example
• Claw hammers:
[Figure: Synthetic Examples (Training set) and Real Examples (Test set)]
![Page 11: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/11.jpg)
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Feature computations can be prohibitive for large numbers of images.
– E.g., 100 million examples × 1000 features = 100 billion feature values to compute.
![Page 12: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/12.jpg)
Features on CPUs vs. GPUs
• Difficult to keep scaling features on CPUs.
– CPUs are designed for general-purpose computing.
• GPUs outpacing CPUs dramatically.
(NVIDIA CUDA Programming Guide)
![Page 13: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/13.jpg)
Features on GPUs
• Features: Cross-correlation with image patches.
– High data locality; high arithmetic intensity.
• Implemented brute-force.
– Faster than FFT for small filter sizes.
– Orders of magnitude faster than FFT on CPU.
• 20x to 100x speedups (depending on filter size).
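For reference, brute-force cross-correlation is just a direct sum over the filter window at every output position. The sketch below is a plain CPU version in Python, not the GPU kernel described on the slide; the function name and the "valid" (no padding) output convention are assumptions made for illustration.

```python
def cross_correlate(image, kernel):
    """'Valid' 2-D cross-correlation by direct summation.

    `image` and `kernel` are lists of rows of numbers; the output has
    (ih - kh + 1) x (iw - kw + 1) entries.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            # Each output value depends only on a small local window,
            # so every (y, x) can be computed independently of the others.
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out
```

The cost is proportional to image area times filter area, so it grows with filter size while an FFT-based approach does not; that is consistent with the slide's note that brute force wins for small filters. Because the output positions are independent and reuse neighbouring pixels, the loop nest is a natural fit for one-thread-per-output parallelism.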
![Page 14: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/14.jpg)
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Large numbers of feature vectors on disk are too slow to access repeatedly.
– E.g., can run an online algorithm on one machine, but disk access is a difficult bottleneck.
![Page 15: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/15.jpg)
Distributed Training
• Solution: must store everything in RAM.
• No problem!
– RAM as low as $20/GB
• Our cluster with 120GB RAM:
– Capacity of >100 million examples.
  • For 1000 features, 1 byte each.
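The capacity figure follows directly from the numbers on the slide. A quick check, assuming decimal gigabytes and ignoring any per-example overhead (labels, indices, operating-system memory), which the slide does not discuss:

```python
FEATURES_PER_EXAMPLE = 1000
BYTES_PER_FEATURE = 1
CLUSTER_RAM_GB = 120
PRICE_PER_GB_USD = 20      # "as low as" price quoted on the slide
BYTES_PER_GB = 10**9       # assumption: decimal gigabytes

bytes_per_example = FEATURES_PER_EXAMPLE * BYTES_PER_FEATURE
capacity = CLUSTER_RAM_GB * BYTES_PER_GB // bytes_per_example
ram_cost_usd = CLUSTER_RAM_GB * PRICE_PER_GB_USD

print(capacity)      # 120000000 examples
print(ram_cost_usd)  # 2400 USD at the quoted lowest price
```

At 1000 bytes per example, 120 GB holds about 120 million examples, which matches the ">100 million" claim with some headroom.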
![Page 16: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/16.jpg)
Distributed Training
• Algorithms that can be trained from sufficient statistics are easy to distribute.
• Decision tree splits can be trained using histograms of each feature.
– Histograms can be computed for small chunks of data on separate machines, then combined.
[Figure: histograms computed on Slave 1 and Slave 2 are summed (+) on the Master, which uses the combined histogram (=) to choose the split.]
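The histogram scheme can be sketched as follows. Each worker builds per-bin class counts for one feature over its own chunk, the master adds the histograms element-wise, and the split is chosen from the combined counts alone. The function names, the use of Gini impurity as the split criterion, and the assumption that feature values are already quantized to integer bins are illustrative choices, not details given in the slides.

```python
def local_histogram(values, labels, n_bins):
    """Worker side: per-bin [negatives, positives] counts for one feature.
    Assumes values are integers in 0..n_bins-1 and labels are 0 or 1."""
    hist = [[0, 0] for _ in range(n_bins)]
    for v, y in zip(values, labels):
        hist[v][y] += 1
    return hist

def merge(histograms):
    """Master side: element-wise sum of the workers' histograms."""
    total = [[0, 0] for _ in range(len(histograms[0]))]
    for h in histograms:
        for b, (neg, pos) in enumerate(h):
            total[b][0] += neg
            total[b][1] += pos
    return total

def gini(neg, pos):
    """Gini impurity of a node with the given class counts."""
    n = neg + pos
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)

def best_split(hist):
    """Return (threshold, weighted impurity); bins <= threshold go left.
    Assumes the histogram contains at least one example."""
    total_neg = sum(b[0] for b in hist)
    total_pos = sum(b[1] for b in hist)
    n = total_neg + total_pos
    best = (None, float("inf"))
    left_neg = left_pos = 0
    for t in range(len(hist) - 1):
        left_neg += hist[t][0]
        left_pos += hist[t][1]
        n_left = left_neg + left_pos
        score = (n_left * gini(left_neg, left_pos)
                 + (n - n_left) * gini(total_neg - left_neg,
                                       total_pos - left_pos)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

The merged histogram is identical to the one a single machine would compute over all the data, so the chosen split is the same; only the small histograms, not the examples, cross the network.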
![Page 17: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/17.jpg)
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• We’ve scaled up each piece of the pipeline by a large factor over traditional approaches:
Image Data: > 1000x; Low-level features: 20x – 100x; Learning Algorithm: > 10x
![Page 18: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/18.jpg)
Size Matters
[Figure: Test Error (Area Under Curve) – Claw Hammers. x-axis: Training Set Size (1E+03 to 1E+08); y-axis: AUC (0.75 to 1).]
![Page 19: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/19.jpg)
UNSUPERVISED FEATURE LEARNING
![Page 20: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/20.jpg)
Traditional supervised learning
[Figure labels: Cars, Motorcycles]
Testing: What is this?
![Page 21: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/21.jpg)
Self-taught learning
[Figure labels: Natural scenes; Car, Motorcycle]
Testing: What is this?
![Page 22: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/22.jpg)
Learning representations
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Where do we get good low-level representations?
![Page 23: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/23.jpg)
Computer vision features
• SIFT
• Spin image
• HoG
• RIFT
• Textons
• GLOH
![Page 24: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/24.jpg)
Unsupervised feature learning
[Figure: Input image (pixels) → “Sparse coding” (edges; cf. V1) → Higher layer (combinations of edges; cf. V2)]
DBN (Hinton et al., 2006) with additional sparseness constraint.
[Related work: Hinton, Bengio, LeCun, and others.]
![Page 25: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/25.jpg)
Unsupervised feature learning
[Figure: Input image → Model V1 → Higher layer (Model V2?) → Higher layer (Model V3?)]
• Very expensive to train.
– > 1 million examples.
– > 1 million parameters.
![Page 26: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/26.jpg)
Learning Large RBMs on GPUs
[Figure: Learning time for 10 million examples (log scale) versus millions of parameters (1, 18, 36, 45), GPU vs. dual-core CPU. Time labels on the chart: ½ hour, 1 hour, 2 hours, 5 hours, 8 hours, 35 hours, 1 day, 1 week, 2 weeks. Annotation: 72x faster.]
(Rajat Raina, Anand Madhavan, Andrew Y. Ng)
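To make the cost concrete, here is a minimal sketch of one contrastive-divergence (CD-1) update for a binary RBM, the building block of a DBN. It is a textbook-style illustration in plain Python, not the GPU implementation benchmarked above, and it omits the sparseness constraint mentioned earlier; the class name, learning rate, and weight initialization are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # Small random weights (assumed initialization); W[i][j] links
        # visible unit i to hidden unit j.
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible   # visible biases
        self.c = [0.0] * n_hidden    # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step: data phase, one Gibbs step, then a gradient update."""
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(sample(h0, self.rng))  # mean-field reconstruction
        h1 = self.hidden_probs(v1)
        # The weight update touches every parameter once per example.
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(self.b)):
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (h0[j] - h1[j])
```

Every step is a dense matrix-vector product or an outer-product update over all weights, so the work per example scales with the number of parameters. With millions of parameters and millions of examples that is the regime where the chart's GPU speedup matters.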
![Page 27: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/27.jpg)
Learning features
[Figure: Pixels → Edges → Object parts (combination of edges) → Object models]
• Can now train very complex networks.
• Can learn increasingly complex features.
• Both more specific and more general-purpose than hand-engineered features.
![Page 28: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/28.jpg)
Conclusion
• Performance gains from large training sets are significant, even for very simple learning algorithms.
– Scalability of the system allows these algorithms to improve “for free” over time.
• Unsupervised algorithms promise high-quality features and representations without the need for hand-collected data.
• GPUs are a major enabling technology.
![Page 29: Scalable Learning in Computer Vision Adam Coates Honglak Lee Rajat Raina Andrew Y. Ng Stanford University.](https://reader035.fdocuments.in/reader035/viewer/2022062313/56649c6d5503460f9491f9e7/html5/thumbnails/29.jpg)
THANK YOU