Transcript of 3D Object Recognition Using Computer Vision VanGogh Imaging, Inc.
- Slide 1
- Slide 2
- 3D Object Recognition Using Computer Vision VanGogh Imaging,
Inc.
- Slide 3
- Kenneth Lee, CEO/Founder, klee@vangoghimaging.com
- Slide 4
- Corporate Overview
  - Founded in 2007, located in McLean, VA
  - Mission: Provide easy-to-use, real-time 3D computer vision (CV) technology for embedded and mobile applications
    - 2D to 3D for better visualization, higher reliability, and accuracy
    - Solve problems that require spatial measurements (e.g., parts inspection)
  - Target customer: Application and system developers
    - Enhance existing products or develop new products
  - Product: Starry Night 3D-CV Middleware (Unity Plugin)
    - Operating systems: Android and Linux
    - 3D sensors: Occipital Structure and Intel RealSense
    - Processors: ARM and Xilinx Zynq
  - Our focus
    - Object recognition
    - Feature detection
    - Analysis (e.g., measurements)
- Slide 5
- Potential Applications: 3D Printing, Parts Inspection, Robotics, Entertainment, Automotive Safety, Security, Medical Imaging
- Slide 6
- Challenges for Implementing Real-Time 3D Computer Vision
  - Busy, uncontrolled real-world environment
  - Limited processing power and memory
  - Noisy and uncalibrated low-cost scanners
  - Difficult-to-use libraries
  - Hard to find proficient computer vision engineers
  - Lack of standards
  - Large development investment
- Slide 7
- Starry Night Unity Plugin (patent pending) Starry Night Video:
https://www.youtube.com/watch?v=IZX-9PH7Erw&feature=youtu.be
- Slide 8
- The Starry Night Template-Based 3D Model Reconstruction
  - Reliable: The output is always a fully formed 3D model with known feature points despite noisy or partial scans
  - Easy to use: Fully automated process
  - Powerful: Known data structure for easy analysis and measurement
  - Fast: Real-time modeling
  - Input Scan (Partial) + Reference Model = Full 3D Model
- Slide 9
- 3D Object Recognition Algorithm for mobile and embedded
Devices
- Slide 10
- Challenges - Scene: Busy scene, object orientation, and occlusion
- Slide 11
- Challenges - Platform Mobile and Embedded Devices ARM A9 or
A15,
- Slide 12
- Previous Approaches
  - (2D) Texture-Based Methods
    - Color-based: depends heavily on lighting or color of the object
    - Machine learning: robust, but requires training for each object
    - Neither method provides the transform (i.e., orientation)
  - (3D) Methods
    - Hough transform and geometric hashing: slow
    - Geometric hashing: even slower
    - Tensor matching: not good for noisy and sparse scenes
    - Correspondence-based methods using rigid geometric descriptors: the models must have distinctive feature points, which is not true for most models (e.g., a cylinder)
  - Tried
- Slide 13
- General Concept for CV-Based Object Recognition (diagram labels): Reference Object; Descriptor; Scene; Compare; Distance & Normal; Distance & Normal of Random Sample Points; Match Criteria; Fine-Tune; Orientation, Location, Transpose
- Slide 14
- Block Diagram
- Slide 15
- Model Descriptor (Pre-Processed)
  - Sample all point pairs in the model that are separated by the same distance D
  - Use the surface normals of the pair to group them into the hash table
  - Hash table (key: point pairs)
    - (1,1,1): (P1, P2), (P3, P4)
    - (2,2,2): (P5, P6), (P7, P8), (P9, P10), (P11, P12)
    - (3,3,3): (P13, P14)
  - Note: In the bear example, D = 5 cm, which resulted in 1000 pairs
  - Note: The keys are angles derived from the normals of the points
    - alpha (α) = first normal to second point
    - beta (β) = second normal to first point
    - omega (ω) = angle of the plane between two points
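The descriptor described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Starry Night implementation: the function names (`pair_key`, `build_descriptor`), the 10-degree bin width, and the distance tolerance `tol` are all hypothetical, and omega is assumed here to be the angle between the two normals, which is only one plausible reading of "angle of the plane between two points".

```python
import math
from collections import defaultdict
from itertools import combinations

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def angle(u, v):
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))

def pair_key(p1, n1, p2, n2, bin_deg=10.0):
    """Quantize (alpha, beta, omega) into an integer hash key."""
    alpha = angle(n1, sub(p2, p1))  # first normal to second point
    beta = angle(n2, sub(p1, p2))   # second normal to first point
    omega = angle(n1, n2)           # ASSUMPTION: angle between the two normals
    step = math.radians(bin_deg)    # ASSUMPTION: hypothetical bin width
    return tuple(int(a // step) for a in (alpha, beta, omega))

def build_descriptor(points, normals, D, tol):
    """Group every point pair separated by D (within tol) under its key."""
    table = defaultdict(list)
    for i, j in combinations(range(len(points)), 2):
        if abs(math.dist(points[i], points[j]) - D) <= tol:
            # Store both orderings so a lookup works whichever point is sampled first.
            table[pair_key(points[i], normals[i], points[j], normals[j])].append((i, j))
            table[pair_key(points[j], normals[j], points[i], normals[i])].append((j, i))
    return table
```

Because the key depends only on angles between normals and the line joining the two points, it does not change when the model is rotated or translated, which is what lets the same function be reused on scene points. The brute-force loop over all pairs is quadratic in the number of points; that is acceptable for a pre-processed model but is only meant as an illustration here.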
- Slide 16
- Object Recognition Workflow
  - Grab scene
  - Sample point pair w/ distance D using RANSAC
  - Generate key using same hash function
  - Use key to retrieve similarly oriented points in the model & rough transform
  - Match criteria to find the best match
  - Use ICP to refine transform
  - Note: The example scene has around 16K points
  - Note: We iterated this sampling process 100 times
  - Note: Entire process can be easily parallelized
  - Very Important: Multiple models can be found using a single hash table, for example, sampled point pair in the scene
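The sampling and lookup steps of this workflow, including the point that several models can share one hash table, can be illustrated with the sketch below. It is a simplified illustration under assumptions: every name (`add_model`, `vote`, the `model_id` tag, the 10-degree bins, omega taken as the angle between normals) is hypothetical, and the rough-transform estimation, match criteria, and ICP refinement from the slide are deliberately left out. Instead of producing a pose, the sketch only counts how often each model's descriptor is hit by randomly sampled scene pairs.

```python
import math
import random
from collections import Counter, defaultdict

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _angle(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))

def pair_key(p1, n1, p2, n2, bin_deg=10.0):
    """Same hash function for models and scene (hypothetical 10-degree bins)."""
    step = math.radians(bin_deg)
    angles = (_angle(n1, _sub(p2, p1)), _angle(n2, _sub(p1, p2)), _angle(n1, n2))
    return tuple(int(a // step) for a in angles)

def add_model(table, model_id, points, normals, D, tol):
    """Insert every ordered model pair at distance ~D into the shared table."""
    for i in range(len(points)):
        for j in range(len(points)):
            if i != j and abs(math.dist(points[i], points[j]) - D) <= tol:
                key = pair_key(points[i], normals[i], points[j], normals[j])
                table[key].append((model_id, i, j))

def vote(table, points, normals, D, tol, iterations=100, rng=None):
    """Randomly sample scene pairs at distance ~D and tally the models they hit."""
    rng = rng or random.Random(0)
    votes = Counter()
    for _ in range(iterations):
        i = rng.randrange(len(points))
        partners = [j for j in range(len(points))
                    if j != i and abs(math.dist(points[i], points[j]) - D) <= tol]
        if not partners:
            continue  # no second point at distance D from this sample
        j = rng.choice(partners)
        key = pair_key(points[i], normals[i], points[j], normals[j])
        # One lookup returns candidate pairs from every model in the table.
        for model_id in {m for m, _, _ in table.get(key, ())}:
            votes[model_id] += 1
    return votes
```

In a fuller pipeline each retrieved `(model_id, i, j)` entry would be turned into a rough transform and scored against the scene before ICP. Each iteration is independent of the others, which is consistent with the slide's note that the process parallelizes easily.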
- Slide 17
- Implementation Result Object Recognition Video:
https://www.youtube.com/watch?v=h7whfei0fTw&feature=youtu.be
- Slide 18
- Object Recognition Examples
- Slide 19
- Adaptive 3D Object Recognition Algorithm: Resize and Reshape
- Slide 20
- Object Recognition for Different Sizes & Shapes
  - Objects in the real world are not always identical
  - Similarity Factor, S%, can be used to denote the % of shape difference
    - This allows recognition of an object that is similar to the reference model but does not have its exact shape
  - Size Factor, Z%, can be used to denote the % of size difference that can still be recognized
    - This allows recognition of objects that differ in size from the reference model
- Slide 21
- General Approach
  - Dynamically resizes the reference model
  - Dynamically reshapes the reference model
  - Uses our Shape-based Registration technique
  - Hence, the reference model is deformed to match the object in the scene
  - Results in very robust object recognition
  - The end reference model best represents the object in the scene both in size and shape
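The resizing half of this approach can be illustrated with a small sketch. It rests on assumptions that the slides do not confirm: that a size factor of Z% means a symmetric range of scale factors from 1 - Z/100 to 1 + Z/100, and that the reference model is scaled uniformly about its centroid. The function names and the `steps` parameter are hypothetical, and the reshaping step (the Shape-based Registration technique) is not attempted here.

```python
def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def resize(points, factor):
    """Scale a point set uniformly about its centroid."""
    c = centroid(points)
    return [tuple(c[k] + factor * (p[k] - c[k]) for k in range(3)) for p in points]

def size_candidates(points, z_percent, steps=5):
    """Return (factor, resized_points) for factors spanning 1 +/- Z% (assumed meaning of Z)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    lo = 1.0 - z_percent / 100.0
    hi = 1.0 + z_percent / 100.0
    factors = [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]
    return [(f, resize(points, f)) for f in factors]
```

Each candidate could then be matched against the scene and the best-scoring size kept, which is one simple way to read "dynamically resizes the reference model".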
- Slide 22
- Block Diagram: Adaptive Object Recognition with feedback
  - Reference model is iteratively modified with every new frame until it converges into the same object in the scene
  - Note: Currently in the process of being implemented and will be available in Version 1.2 later this year
- Slide 23
- Object Recognition Performance Numbers
- Slide 24
- Reliability (w/ bear model)
  - Reliability % false positives depends on the scene
    - Clean scene: 99%
    - Model facing sideways (narrower): 85%
- Slide 25
- Performance - Mobile
  - Performance on Cortex-A15 2 GHz ARM (on Android mobile)
  - Amount of time it takes to find one object
    - Single thread: 2 seconds
    - Multi-thread & NEON: 0.3 second
  - Amount of time it takes to find two objects
    - Single thread: 2.5 seconds
    - Multi-thread & NEON: 0.5 second
  - Note: Effective use of NEON led to significant performance gains of 2.5x for certain functions
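The timings on this slide imply two figures worth working out: the overall speedup from multi-threading plus NEON, and the extra time needed to find a second object. The short calculation below uses only the four numbers quoted above; the dictionary layout and function names are illustrative.

```python
# Seconds to find N objects on the Cortex-A15, as quoted on the slide.
timings = {
    1: {"single": 2.0, "mt_neon": 0.3},
    2: {"single": 2.5, "mt_neon": 0.5},
}

def speedup(n_objects):
    """Single-thread time divided by multi-thread + NEON time."""
    t = timings[n_objects]
    return t["single"] / t["mt_neon"]

def marginal_cost(mode):
    """Extra seconds needed to find a second object."""
    return timings[2][mode] - timings[1][mode]

for n in (1, 2):
    print(f"{n} object(s): {speedup(n):.1f}x faster with multi-thread & NEON")
for mode in ("single", "mt_neon"):
    print(f"second object adds {marginal_cost(mode):.1f} s ({mode})")
```

The combined speedup is roughly 6.7x for one object and 5x for two, and the second object adds about 0.5 s single-threaded or 0.2 s with multi-threading and NEON, well under the cost of the first object in both cases.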
- Slide 26
- Hardware Acceleration Using FPGA
  - Xilinx Zynq SoC provides 20 to 1,000 parallel voxel processors depending on the size of the FPGA
  - Diagram labels: Zynq, ARM, FPGA, Processor 1, Processor 20+, voxel scan
- Slide 27
- Hardware Acceleration: FPGA (Xilinx Zynq)
  - Select functions to be implemented in Zynq
  - FPGA: Matrix operations
  - Dual-core ARM: Data management + floating point
  - Entire implementation done in C++ (Xilinx Vivado-HLS)
- Slide 28
- Performance: Embedded Using FPGA
  - Note: Currently, only 30% of the computationally intensive functions are implemented on the FPGA, with the rest still running on the ARM A9. Speed will be much improved once the remaining high-intensity functions are transferred to the FPGA.
  - Performance on Xilinx Zynq (Cortex-A9 800 MHz + FPGA)
  - Amount of time it takes to find one object
    - Zynq 7020: 0.7 second
    - Zynq 7045 (est.): 0.1 second
  - No test results for two objects, but should scale the same way as for the ARM
- Slide 29
- Future
  - The chosen algorithm works well in most real-world conditions
  - The chosen algorithm is tolerant to size and shape differences with respect to the reference model
  - The chosen algorithm can find multiple objects at the same time with minimal additional processing power
  - Additional improvements in performance are needed
    - Algorithm: Application-specific parameters (e.g., size of the model descriptor)
    - ARM: NEON
    - Optimize the use of FPGA core
- Slide 30
- Summary
  - Key implementation issues
    - Model descriptor
    - Data structure
    - Sampling technique
    - Platform
  - IMPORTANT: Both ARM & FPGA provide the scalability
  - Therefore: Real-time 3D object recognition was very difficult but successfully implemented on both mobile and embedded platforms!
  - LIVE DEMO AT THE Xilinx BOOTH!
- Slide 31
- Resources
  - www.vangoghimaging.com
  - Android 3D printing: http://www.youtube.com/watch?v=7yCAVCGvvso
  - Challenges and Techniques in Using CPUs and GPUs for Embedded Vision, by Ken Lee, VanGogh Imaging: http://www.embedded-vision.com/platinum-members/vangogh-imaging/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
  - Using FPGAs to Accelerate Embedded Vision Applications, Kamalina Srikant, National Instruments: http://www.embedded-vision.com/platinum-members/national-instruments/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
  - Demonstration of Optical Flow algorithm on an FPGA: http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/demonstration-optical-flow-algorithm-fpg
  - * Reference: An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes, by Chavdar Papazov and Darius Burschka, Technische Universität München (TUM), Germany.
- Slide 32