Web-Scale Computer Vision Using MapReducefor Multimedia Data Mining
Brandyn White, Tom Yeh, Jimmy Lin, Larry Davis
University of Maryland, College [email protected]
July 25, 2010
Motivation
Massive amount of data available:
I Flickr: ∼4B images
I Youtube: ∼800M videos
I Google Maps: Panorama and map of most US streets
I Facebook: Social graph 500M users + ∼15B photos
Web-scale computer vision:
I New implementation techniques
I Data ingest and conditioning
I Past conclusions may not hold (Banko and Brill)
I Predictable requirements
I Focus on throughput over latency
Motivation
Massive amount of data available:
I Flickr: ∼4B images
I Youtube: ∼800M videos
I Google Maps: Panorama and map of most US streets
I Facebook: Social graph 500M users + ∼15B photos
Web-scale computer vision:
I New implementation techniques
I Data ingest and conditioning
I Past conclusions may not hold (Banko and Brill)
I Predictable requirements
I Focus on throughput over latency
Problem
I Common vision researchI ∼10K imagesI MATLAB/C++I Single ProcessorI Current Trade-off: data-size vs run-time
I More data than we can process
I Strong results showing data-size is critical (Torralba PAMI08)
I MapReduce: Successful abstraction for large-scale problems
MapReduce
I Programming Abstraction: User writes Map and Reduce
I Optional Optimization: Combiner (less network transfer)
I Framework: Distributed file system, job execution
I Fault tolerant (abstraction allows for reexecution)
I ‘Move code to data’
I Batch processingI Lots of data
I Used on > 1 PB of data at Yahoo, Google, and FacebookI 10K+ core cluster at Yahoo
X 5 Y 7 Z 8
A B C D E Fα β γ δ ε ζ
Shuffle and Sort: aggregate values by keys
ba 1 2 c c3 6 a c5 2 b c7 8
a 1 5 b 2 7 c 6 2 8
ba 1 2 c 6 a c5 2 b c7 8
combiner combiner combiner combiner
partitioner partitioner partitioner partitioner
mapper mapper mapper mapper
reducer reducer reducer
Paper Overview
Goal: MapReduce intro for web-scale vision processing
I Example driven
I Provides thought process and considerations
I Description, pseudo-code, and Python
Topics:
I Data representation
I Design patterns
I Algorithm design
I Experimentation
Algorithms Covered
Selection Criteria:
I Relevant large scale applications
I Diverse
I Non-trivial implementation
Algorithms:
I Clustering
I Bag-of-Features
I Background Subtraction
I Sliding Windows + NMS
I Classifier Training
I Image Registration
Clustering
I Goal: Group unlabeled points using a distance metricI Aid in data analysisI Improve computational efficiency
I Fundamental role in large scale data processing
I Our method evaluated at scale (> 200 GB)
I Implementation choices matter
K-Means
Input (id, point) Note: id is unused(x , p0) . . . (x , p1) . . . (x , pn)
Map (cluster num, point)(5, p0) . . . (2, p1) . . . (5, p2) . . . (1, pn) . . .
Shuffle(1, [pn, . . . ]) . . . (2, [p1, . . . ]) . . . (5, [p0, p2, . . . ])
Reduce (cluster num, centroid)(1, c1) . . . (2, c2) . . . (5, c5) . . .
K-Means w/ Combiner
Input (id, point) Note: id is unused(x , p0) . . . (x , p1) . . . (x , pn)
Map (cluster num, point)(5, p0) . . . (2, p1) . . . (5, p2) . . . (1, pn) . . .
Combine (cluster num, point)(5, p0,2) . . . (2, p1) . . . (1, pn) . . .
Shuffle(1, [pn, . . . ]) . . . (2, [p1, . . . ]) . . . (5, [p0,2, . . . ])
Reduce (cluster num, centroid)(1, c1) . . . (2, c2) . . . (5, c5) . . .
K-Means Experimental Setup
I Google/IBM Academic Cluster
I 410 NodesI Evaluated on 205 GB of points
I 100 ClustersI 1000 Dimensions
I Algorithm modificationsI UncombinedI CombinerI In-map combiner
105 106 107 108
Input Size
101
102
103
104
Tim
e (s
ec.)
K-Means Clustering: Overall Time vs # Points
Uncombined R=10Uncombined R=100Combiner R=10IMC R=10Single Machine (baseline)
Bag-of-Features
I Quantize image feature-points, treat as ‘words’
I Represents an image globally with local featuresI Lacks spatial information
I Can be encoded if desiredI Some invariance to object pose
I Sensitive to feature selectionI RandomI UniformI SIFT, SURF, Harris/Hessian-Affine, MSER
I At scale speed depends onI Number of clusters, features, and dimensionsI Distance metric
Bag-of-Features
Input (image num, feature)(0, f0) . . . (0, f1) . . . (M, fn)
Map (image num, cluster num)(0, 0) . . . (0, 3) . . . (0, 0) . . . (1, 0) . . .
Shuffle(0, [0, 3, 0, . . . ]) . . . (1, [0 . . . ]) . . .
Reduce (image num, bof)(0, 〈2, 0, 0, 1, . . . 〉) . . . (1, 〈1, . . . 〉) . . .
Background Subtraction
I Goal: Distinguish between FG and BG in video sequencesI Detection method in surveillance settingsI High-level action analysis
I Past work: real-time processing (latency)
I Web-scale: batch processing (throughput)I Initial experiment
I Single GaussianI Remove frame dependenceI Learn local neighboring blocksI Uses the order-inversion design pattern
I Processed the PETS’06 dataset in 156 secI 82K 720x576 frames @ 528 FPSI 8GB of video dataI Google/IBM Academic Cluster
Background Subtraction
Input (frame num, image)(0, ) . . . (100, ) . . . (200, )
Map ((block num, flag), (frame num, image))((0, 0), (0, ))((0, 1), (0, ))
Shuffle((0, 0), [(0, ), (1, ), (2, ) . . . ])((0, 1), [(0, ), (1, ), (2, ) . . . ])
Reduce (frame num, bgsub)(0, ) . . . (100, ) . . . (200, )
Applications
I Visual Search Indexing (CBIR)
I Offline Surveillance Processing
I Scene Recognition (e.g., indoor, bank, beach)
I Object Recognition (e.g., cars, faces)
I Auto-Tagging
I Image Registration and Stabilization
I 3D Reconstruction
Conclusion
I MapReduce is applicable to large scale vision tasks
I Diverse set of algorithms shown to fit this model
I Algorithm design guidelines and recommendations
I Web-scale: Emphasis on high throughput
I Design patterns by Lin et al. simplify algorithm design
References
[1] M. Banko and E. Brill. Scaling to very very large corpora fornatural language disambiguation. In ACL, pages 26-33, 2001.
[3] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray.Visual categorization with bags of keypoints. In ECCV Workshopon Statistical Learning in Computer Vision, 2004.
[5] J. Dean and S. Ghemawat. MapReduce: Simplified dataprocessing on large clusters. In OSDI, pages 137-150, 2004.
[13] J. Lin and C. Dyer. Data-Intensive Text Processing withMapReduce. Morgan & Claypool Publishers, 2010.
[20] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tinyimages: A large data set for nonparametric object and scenerecognition. PAMI, 30(11):1958-1970, 2008.
Questions?
Brandyn White
http://www.cs.umd.edu/~bwhite/
http://github.com/bwhite/hadoopy
http://github.com/bwhite/hadoop_vision
Top Related