CONCEPTS, ALGORITHMS & PRACTICAL APPLICATIONS IN 2D …€¦ · Senior Scientist Markus Hofstätter...
Transcript of CONCEPTS, ALGORITHMS & PRACTICAL APPLICATIONS IN 2D …€¦ · Senior Scientist Markus Hofstätter...
CONCEPTS, ALGORITHMS & PRACTICAL
APPLICATIONS IN 2D AND 3D COMPUTER
VISION
Michael Rauter, Christian Zinner, Andreas Zweng,
Andreas Zoufal, Julia Simon, Daniel Steininger,
Markus Hofstätter und Andreas KriechbaumSenior Scientist
Center for Vision, Automation & Control
Autonomous Systems
AIT Austrian Institute of Technology GmbH
Vienna, Austria
Csaba Beleznai
2
GRAND
CHALLENGES
RECOGNITION
SEGMENTATION
RECONSTRUCTION
TIME GRAND
CHALLENGESTIME
▪ Research is evolution → so is your learning process
▪ Balance: becoming a domain expert vs. being a „globalist“
▪ Researchers tend to favour certain paradigms - Learn to outline trends, look upstream
▪ Revisit old problems to see them under new light
▪ Specialize the general & Generalize the specific
▪ Factorize your know-how (code, topics, …) into components → sustainable, scalable
MOTIVATION
VISUAL OBJECT RECOGNITION TRENDS
Human-level performance
2012 time
Accuracy
2019
2012 time
Computational
costs
/for real-time/
2019
CPU
DEDICATED
COMP.HW. COMPUTATIONAL
BARRIER
2012 time
Amount of image
data (for training)
2019
IMAGE DATA
BARRIER
Nuclear
Engineering
Seibersdorf
GmbH
Seibersdorf
Labor GmbH
AIT AUSTRIAN INSTITUTE OF TECHNOLOGY
AIT Austrian Institute of Technology
EnergyHealth &
Bioresources
Digital Safety &
Security
Vision, Automation &
Control
AIT Austrian Institute of Technology
Mobility SystemsLow-Emission
Transport
Technology
Experience
Innovation Systems &
Policy
4
Federal Ministry for Transport,
Innovation and Technology
50,46%
Federation of Austrian
Industries
49,54%
1300+ employees
Budget: 140 Mio €
Business Model: 40:30:30
Robust and flexible
3D vision technology
VISION, AUTOMATION & CONTROL
High-Performance
Vision
3D Vision and
Modeling
Complex Dynamical
Systems
Worldwide fastest vision
sensor technology Advanced handling and
smart production
F r o m S e n s o r T o D e c i s i o n 5
AIT AUTONOMOUS & ASSISTIVE SYSTEMS
Driver Assistance
System for Trams
Assistance Systems for
Construction Machines
Driverless Missions in Crisis
& Disaster Management
Autonomous
Local Railway
Autonomous
Bus
7
ENABLING METHODOLOGIES FOR
ASSISTED OR SELF-DRIVING
Mobile platforms: sensory signals + local context (situation) → decisions
recognition
localization
motion analysis
vehicle control
objects (type, location, pose)
environment
sensor/data fusion
positioning
mapping
ego-motion computation
sensor fusion
object tracking
state prediction
probability
based behavior
elements
dynamic model
computation
vehicle model
vehicle control
safety
compliance
Deep learning based
detection & segmentationLocalization,
map building
Sparse motion
estimation, tracking
Vision algorithms testing
RELATED
KNOW-HOW
INTELLIGENT PERCEPTION FOR MOBILE MACHINES
AUTONOMOUS OFFROAD VEHICLE
1014.07.2019
A frequently asked question
Introduction
11
Example: Crop detection
▪ Radial symmetry
▪ Near regular structure
Example for robust vision
IDEA
branch & bound
research methodology
PRODUCTAPPLICATION
RESEARCH DEVELOPMENT
Alg. A
Alg. B
Alg. C
MATLAB C++
Motivation
Introduction
▪ Challenges when developing Vision Systems:
▪ Complexity Algorithmic, Systemic, Data
▪ Non-linear search for a solution
13
Real-time optical flow based particle advection
for object detection and tracking
2D
1414.07.2019
MOTIVATION – I.
OBJECT DETECTION PIPELINES
Spatial distribution of
posterior probability
Score map (DPM, R-CNN, …)
Vote map
Occupancy map
back-projected similarity map
…
R-CNN: Region-based Convolutional Neural Networks
DPM: Deformable Part Models
Delineated objects
Bounding boxes
Instance segmentation
More complex
parametric
representations
weakly constrained structural prior
Non-maximum suppressionNeubeck & Van Gool, 2006Rothe et al., 2014
Leibe et al. 2005
Comaniciu & Meer, 2002 Bradski 1998
Dollar & Zitnick 2013 Kontschieder et al. 2011
Leibe et al. 2005
Mean Shift, CAMShift
Center-surround filter
MeanShift and CamShift iterations
Implicit Shape Model
Structured random forests
RELATED STATE-OF-THE-ART
▪ Clustering detections
▪ Detection by voting/segmentation/learningimplicit or explicit structural prior
Kontschieder et al. 2011
Markov Point Processes for object configurationsVerdie, 2014
CNN‘s for Non-Max. SuppressionHosang et al. 2016, Wan et al. 2015
Hosang et al. 2016
Optical flow driven advection
16
ti ti+1
Dense optical flow field
Advection: transport mechanism induced by a force field
Vx,i
Vy,iA particle trajectory
induced by the OF field
Particle advection with FW-BW consistency
▪ A simple but powerful test
Forward:
Backward:
Successful
Failure
< x x : mean offsetConsistency check:
Pedestrian Flow Analysis
Public dataset: Grand Central Station, NYC: 720x480 pixels, 2000 particles, runs at 35 fps
SHAPE-GUIDED TRACKLET GROUPING
Optical flow driven particle tracklets
𝑇𝑖 = 𝑥𝑡, 𝑦𝑡 𝑡=1..𝑁, 𝒗, 𝑤The ith tracklet:
Clustering directly performed in the
discrete tracklet-domain
STEP 1: sampling STEP 2: weight generation
from orientation similarity
(w.r.t. center tracklet)
STEP 3: local shape, scale
and center estimation
Estimated cluster parameters
+ mode location
STEP 4: find nearest
tracklet to mode estimate
𝒗 – velocity vector
𝑤 – weight (scalar)
repeat from STEP 1 until
convergenceSingle parameter:
W – initial scale
W
FOR COMPACT OBJECTS
20
STEREO DEPTH INFORMATION
CHARACTERISTICS AND USAGE
3D
PASSIVE STEREO BASED DEPTH MEASUREMENT
• Depth ordering of people
• Robustness against illumination,
shadows,
• Enables scene analysis
Advantage:
▪ 3D stereo-camera system developed by AIT
▪ Area-based, local-optimizing, correlation-
based stereo matching algorithm
▪ Specialized variant of the Census Transform
▪ Resolution: typically ~1 Mpixel
▪ Run-time: ~ 14 fps (Core-i7, multithreaded, SSE-optimized)
▪ Excellent “depth-quality-vs.-computational-costs” ratio
▪ USB 2 interface
STEREO CAMERA CHARACTERISTICS
22
Trinocular setup:
▪ 3 baselines possible
▪ 3 stereo computations with
results fused into one disparity
image
large baseline
near-range
far-range
small medium
23
Planar surface in 3D space
(x,y) image coordinates, d disparity
d(x,y)
Data characteristics
Intensity image Disparity image
y
d
y
Height (world)
Ground plane (world)
correct measurementnoisy measurement
Stereo setup
Computed top view of the 3D point cloud
2.5D approach
3D approach
2.5D vs. 3D algorithmic approaches
Additional knowledge (compared to existing video analytics solutions):
• Stationary object (Geometry introduced to a scene)
• Object geometric properties (Volume, Size)
• Spatial location (on the ground)
LEFT ITEM DETECTION
METHODOLOGY
Background
model
Stereo disparity Combination
of proposals
+Validation
Final
candidatesInput images
Ground plane
estimation
Change detectionINTENSITY
DEPTH
Processing intensity and depth data
Ortho-map
generation
Object
detection and
validation in
the ortho-
map
Ortho-transform
2714.07.2019
Left Item Detection – Demos
28
Clustering in discrete two-dimensional distributions
2914.07.2019(a)
Object detection as clustering
A Frequently Occurring Task
Analysis of discrete two-dimensional distributions
…… LEARNED
CODEBOOK
3114.07.2019
EXAMPLES
▪ Description of the Binary-Shape-driven 2D clustering
• Shape learning
• Shape clustering, delineation
▪ Results
• Occupancy map clustering
• Text line delineation
• Object delineation by shape-guided
tracklet grouping
3232
Intermediate probabilistic representations 2D distributions
prior, structure-specific knowledge
Local grouping generate consistent
object hypotheses
Challenge:
▪ arbitrarily shaped distributions
▪ multiple nearby modes
▪ noise, clutter
TASK DEFINITION
Definitions:
mode = location of maximum density
computed using a kernel K
density estimation of variable x
𝑓 𝑥 =
𝑎
𝐾 𝑎 − 𝑥 𝑤(𝑎)
Shape learning – Case: Compact clusters
2. Sampling using an analysis window
discretized into a ni×ni grid
1. Binary mask from manual annotation or
from synthetic data
M
Off-the-mode
samples
Mode-centered
samples
Spatial resolution of local structure
3. Building a codebook of binary shapes
with a coarse-to-fine spatial resolution
Codebook:
Shape learning
3414.07.2019
Example Codebook – Case: Compact clusters
Shape learning – Case: Line structures
Spatial resolution of local structure
low mid highBinary mask from manually annotated
text lines
Codebook:
Shape learning
Shape delineation – I.
Shape delineation
Density measure for each resolution level for the binary structure
Step 1: Fast Mode Seeking
Step 2: Local density analysis
Three integral images: and
Mode location:
COMPACT CLUSTERS LINE STRUCTURES
Enumerating all binary shapes at each resolution level
→ Finding best matching entry:
Shape delineation – II.
Shape delineation
Recursive search for end points, starting from
mode locations:
Line-centered structuresOff-the-line
structures
… …
END POINT
CANDIDATES
VOTE MAP
Relative line end locations define:- Search direction
- Line end positions
Experimental results - Case: Compact clusters
Human detection by occupancy map clustering:
13 m
Passive stereo depth sensing → depth data projected orthogonal to the
ground plane
Occupancy map (1246×728 pix.) clustering: 56 fps, overall system (incl. stereo computation): 6 fps
Experimental results - Case: Compact clusters
Input image
Probability distribution for text
Binarization is very sensitive to employed threshold
Our scheme has no threshold, only local structural priors
Simple binarization
Proposed scheme
Experimental results - Case: Line structures (Text line segmentation)
Experimental results - Case: Text line segmentation
42
Queue length detection using depth and
intensity information
2D + 3D
Queue Length + Waiting Time estimation
What is waiting time in a queue?
Ch
eck
po
int
Waiting time
Time measurement relating to last
passenger in the queue
Example: Announcement of waiting times (App) → customer satisfaction
Why interesting?
Example: Infrastructure operator → load balancing
Queue analysis
▪ Challenging problem
Waiting time =
▪ Shape
▪ No predefined shape (context/situation-dependent and time-varying)
▪ Motion→ not a pure translational pattern
▪ Propagating stop-and-go behaviour with a noisy „background“
▪ Signal-to-noise ratio depends on the observation distance
Length
Velocity
1. What is the shape and extent of the queue?
2. What is the velocity of the propagation?
DEFINITION: Collective goal-oriented motion pattern of multiple humans
exhibiting spatial and temporal coherence
simple complex
▪ How can we detect (weak) correlation?
▪ Much data is necessary → Simulating crowding phenomena in Matlab
▪ Social force model (Helbing 1998)
Source: Parameswaran et al. Design and Validation of a
System for People Queue Statistics Estimation, Video
Analytics for Business Intelligence, 2012
t
x
y
Correlation in space and time
goal-driven kinematics – force field repulsion by walls repulsion by „preserving privacy “
Visual queue analysis - Overview
46
Simulation tool → Creating infinite number of possible queueing zones
Two simulated examples (time-accelerated view):
Queue analysis
Queue analysis (length, dynamics)
Staged scenarios, 1280x1024 pixels, computational speed: 6 fps
Straight line Meander style
▪ Pairwise spatio-temporal
correlation analysis
▪ Correlation → data weights
▪ Mode seeking and tracking
▪ Queue delineation posed as an
open Traveling Salesman
Problem (oTSP) with a fixed start
▪ Queue forward velocity
estimation using a deformable
elastically connected chain
Estimated configuration
(top-view) Detection results
Adaptive estimation of the spatial extent of the queueing zone
Left part of the image is intentionally blurred
for protecting the privacy of by-standers,
who were not part of the experimental setup.stereo sensor
Scene-aware heatmap
51
End-to-end video text recognition
2D
Overview
INPUT
Presence (y/n)
OUTPUT
TextDetection Localization Propagation SegmentationRecognition,Propagation
▪ The End-to-End Video Recognition Process
Location (single frames)
(x, y, w, h)
Location (frame span)
(x, y, w, h)
Binary image regions
Text (e.g. in ASCII)
Characterizing dynamic elements: running text
Evaluation: High accuracy at each stage is necessary
Very high recall throughout the chain
Increasing Precision toward the end of the chain
Algorithmic chain - Motivation
Main strategies for text detection:
What is text (when appearing in images)?:
An oriented sequence of characters in close proximity, obeying a certain regularity
(spatial offset, character type, color).Sample text region + complex background
(a)
(b)
Improved text detection – synthetic text generation (Classification using Aggregated Channel Features)
Convolutional Neural Network based OCR - TrainingGenerated single characters (0-9, A-Z, a-z): include spatial jitter, font variations
6000 „0“
6000 „A“
6000 „Z“
▪ role of jitter: characters can be recognized despite an offset at detection time
56
Convolutional Neural Network based OCR - ResultsAnalysis window is scanned along the textline, and likelihood ration (score1/score2) is
plotted in the row (below textline) belonging to the maximum classification score.
Optimum partitioning
5714.07.2019
Our development concept
Implementation details
MATLAB
&
PYTHON
C/C++
Method, Prototype,
Demonstrator
Real-time
prototype
▪ MATLAB / PYTHON:
▪ Broad spectrum of algorithmic libraries,
▪ Well-suited for image analysis,
▪ Visualisation, debugging,
▪ Rapid development →Method, Prototype, Demonstrator
▪ C/C++
▪ Real-time capability
Porting
mex
shared library
Computationally
intensive
methods
Verification
Matlab engine
LEARNED REPRESENTATIONS OF HIGH
DISCRIMINATIVE POWER
6014.07.2019
MODEL-BASED VISION - USE CASERelevance:
1. DeepLearning: coarse pose estimation (coarse initialization/re-initialization)
2. Edge based model tracking (continuous fine 6DoF pose estimation)
Pose regressor
trained from
synthetic data:
THANK YOU!
CSABA BELEZNAI
Senior Scientist
Center for Vision, Automation & Control
Autonomous Systems
AIT Austrian Institute of Technology GmbH
Giefinggasse 4 | 1210 Vienna | Austria
T +43(0) 664 825 1257 | F +43(0) 50550-4170
[email protected] | http://www.ait.ac.at