CONCEPTS, ALGORITHMS & PRACTICAL APPLICATIONS IN 2D …€¦ · Senior Scientist Markus Hofstätter...

CONCEPTS, ALGORITHMS & PRACTICAL

APPLICATIONS IN 2D AND 3D COMPUTER

VISION

Michael Rauter, Christian Zinner, Andreas Zweng,

Andreas Zoufal, Julia Simon, Daniel Steininger,

Markus Hofstätter und Andreas KriechbaumSenior Scientist

Center for Vision, Automation & Control

Autonomous Systems

AIT Austrian Institute of Technology GmbH

Vienna, Austria

Csaba Beleznai

2

GRAND

CHALLENGES

RECOGNITION

SEGMENTATION

RECONSTRUCTION

TIME GRAND

CHALLENGESTIME

▪ Research is evolution → so is your learning process

▪ Balance: becoming a domain expert vs. being a „globalist“

▪ Researchers tend to favour certain paradigms - Learn to outline trends, look upstream

▪ Revisit old problems to see them under new light

▪ Specialize the general & Generalize the specific

▪ Factorize your know-how (code, topics, …) into components → sustainable, scalable

MOTIVATION

VISUAL OBJECT RECOGNITION TRENDS

Human-level performance

2012 time

Accuracy

2019

2012 time

Computational

costs

/for real-time/

2019

CPU

DEDICATED

COMP.HW. COMPUTATIONAL

BARRIER

2012 time

Amount of image

data (for training)

2019

IMAGE DATA

BARRIER

Nuclear

Engineering

Seibersdorf

GmbH

Seibersdorf

Labor GmbH

AIT AUSTRIAN INSTITUTE OF TECHNOLOGY

AIT Austrian Institute of Technology

EnergyHealth &

Bioresources

Digital Safety &

Security

Vision, Automation &

Control

AIT Austrian Institute of Technology

Mobility SystemsLow-Emission

Transport

Technology

Experience

Innovation Systems &

Policy

4

Federal Ministry for Transport,

Innovation and Technology

50,46%

Federation of Austrian

Industries

49,54%

1300+ employees

Budget: 140 Mio €

Business Model: 40:30:30

Robust and flexible

3D vision technology

VISION, AUTOMATION & CONTROL

High-Performance

Vision

3D Vision and

Modeling

Complex Dynamical

Systems

Worldwide fastest vision

sensor technology Advanced handling and

smart production

F r o m S e n s o r T o D e c i s i o n 5

AIT AUTONOMOUS & ASSISTIVE SYSTEMS

Driver Assistance

System for Trams

Assistance Systems for

Construction Machines

Driverless Missions in Crisis

& Disaster Management

Autonomous

Local Railway

Autonomous

Bus

7

ENABLING METHODOLOGIES FOR

ASSISTED OR SELF-DRIVING

Mobile platforms: sensory signals + local context (situation) → decisions

recognition

localization

motion analysis

vehicle control

objects (type, location, pose)

environment

sensor/data fusion

positioning

mapping

ego-motion computation

sensor fusion

object tracking

state prediction

probability

based behavior

elements

dynamic model

computation

vehicle model

vehicle control

safety

compliance

Deep learning based

detection & segmentationLocalization,

map building

Sparse motion

estimation, tracking

Vision algorithms testing

RELATED

KNOW-HOW

INTELLIGENT PERCEPTION FOR MOBILE MACHINES

AUTONOMOUS OFFROAD VEHICLE

1014.07.2019

A frequently asked question

Introduction

11

Example: Crop detection

▪ Radial symmetry

▪ Near regular structure

Example for robust vision

IDEA

branch & bound

research methodology

PRODUCTAPPLICATION

RESEARCH DEVELOPMENT

Alg. A

Alg. B

Alg. C

MATLAB C++

Motivation

Introduction

▪ Challenges when developing Vision Systems:

▪ Complexity Algorithmic, Systemic, Data

▪ Non-linear search for a solution

13

Real-time optical flow based particle advection

for object detection and tracking

2D

1414.07.2019

MOTIVATION – I.

OBJECT DETECTION PIPELINES

Spatial distribution of

posterior probability

Score map (DPM, R-CNN, …)

Vote map

Occupancy map

back-projected similarity map

…

R-CNN: Region-based Convolutional Neural Networks

DPM: Deformable Part Models

Delineated objects

Bounding boxes

Instance segmentation

More complex

parametric

representations

weakly constrained structural prior

Non-maximum suppressionNeubeck & Van Gool, 2006Rothe et al., 2014

Leibe et al. 2005

Comaniciu & Meer, 2002 Bradski 1998

Dollar & Zitnick 2013 Kontschieder et al. 2011

Leibe et al. 2005

Mean Shift, CAMShift

Center-surround filter

MeanShift and CamShift iterations

Implicit Shape Model

Structured random forests

RELATED STATE-OF-THE-ART

▪ Clustering detections

▪ Detection by voting/segmentation/learningimplicit or explicit structural prior

Kontschieder et al. 2011

Markov Point Processes for object configurationsVerdie, 2014

CNN‘s for Non-Max. SuppressionHosang et al. 2016, Wan et al. 2015

Hosang et al. 2016

Optical flow driven advection

16

ti ti+1

Dense optical flow field

Advection: transport mechanism induced by a force field

Vx,i

Vy,iA particle trajectory

induced by the OF field

Particle advection with FW-BW consistency

▪ A simple but powerful test

Forward:

Backward:

Successful

Failure

< x x : mean offsetConsistency check:

Pedestrian Flow Analysis

Public dataset: Grand Central Station, NYC: 720x480 pixels, 2000 particles, runs at 35 fps

SHAPE-GUIDED TRACKLET GROUPING

Optical flow driven particle tracklets

𝑇𝑖 = 𝑥𝑡, 𝑦𝑡 𝑡=1..𝑁, 𝒗, 𝑤The ith tracklet:

Clustering directly performed in the

discrete tracklet-domain

STEP 1: sampling STEP 2: weight generation

from orientation similarity

(w.r.t. center tracklet)

STEP 3: local shape, scale

and center estimation

Estimated cluster parameters

+ mode location

STEP 4: find nearest

tracklet to mode estimate

𝒗 – velocity vector

𝑤 – weight (scalar)

repeat from STEP 1 until

convergenceSingle parameter:

W – initial scale

W

FOR COMPACT OBJECTS

20

STEREO DEPTH INFORMATION

CHARACTERISTICS AND USAGE

3D

PASSIVE STEREO BASED DEPTH MEASUREMENT

• Depth ordering of people

• Robustness against illumination,

shadows,

• Enables scene analysis

Advantage:

▪ 3D stereo-camera system developed by AIT

▪ Area-based, local-optimizing, correlation-

based stereo matching algorithm

▪ Specialized variant of the Census Transform

▪ Resolution: typically ~1 Mpixel

▪ Run-time: ~ 14 fps (Core-i7, multithreaded, SSE-optimized)

▪ Excellent “depth-quality-vs.-computational-costs” ratio

▪ USB 2 interface

STEREO CAMERA CHARACTERISTICS

22

Trinocular setup:

▪ 3 baselines possible

▪ 3 stereo computations with

results fused into one disparity

image

large baseline

near-range

far-range

small medium

23

Planar surface in 3D space

(x,y) image coordinates, d disparity

d(x,y)

Data characteristics

Intensity image Disparity image

y

d

y

Height (world)

Ground plane (world)

correct measurementnoisy measurement

Stereo setup

Computed top view of the 3D point cloud

2.5D approach

3D approach

2.5D vs. 3D algorithmic approaches

Additional knowledge (compared to existing video analytics solutions):

• Stationary object (Geometry introduced to a scene)

• Object geometric properties (Volume, Size)

• Spatial location (on the ground)

LEFT ITEM DETECTION

METHODOLOGY

Background

model

Stereo disparity Combination

of proposals

+Validation

Final

candidatesInput images

Ground plane

estimation

Change detectionINTENSITY

DEPTH

Processing intensity and depth data

Ortho-map

generation

Object

detection and

validation in

the ortho-

map

Ortho-transform

2714.07.2019

Left Item Detection – Demos

28

Clustering in discrete two-dimensional distributions

2914.07.2019(a)

Object detection as clustering

A Frequently Occurring Task

Analysis of discrete two-dimensional distributions

…… LEARNED

CODEBOOK

3114.07.2019

EXAMPLES

▪ Description of the Binary-Shape-driven 2D clustering

• Shape learning

• Shape clustering, delineation

▪ Results

• Occupancy map clustering

• Text line delineation

• Object delineation by shape-guided

tracklet grouping

3232

Intermediate probabilistic representations 2D distributions

prior, structure-specific knowledge

Local grouping generate consistent

object hypotheses

Challenge:

▪ arbitrarily shaped distributions

▪ multiple nearby modes

▪ noise, clutter

TASK DEFINITION

Definitions:

mode = location of maximum density

computed using a kernel K

density estimation of variable x

𝑓 𝑥 =

𝑎

𝐾 𝑎 − 𝑥 𝑤(𝑎)

Shape learning – Case: Compact clusters

2. Sampling using an analysis window

discretized into a ni×ni grid

1. Binary mask from manual annotation or

from synthetic data

M

Off-the-mode

samples

Mode-centered

samples

Spatial resolution of local structure

3. Building a codebook of binary shapes

with a coarse-to-fine spatial resolution

Codebook:

Shape learning

3414.07.2019

Example Codebook – Case: Compact clusters

Shape learning – Case: Line structures

Spatial resolution of local structure

low mid highBinary mask from manually annotated

text lines

Codebook:

Shape learning

Shape delineation – I.

Shape delineation

Density measure for each resolution level for the binary structure

Step 1: Fast Mode Seeking

Step 2: Local density analysis

Three integral images: and

Mode location:

COMPACT CLUSTERS LINE STRUCTURES

Enumerating all binary shapes at each resolution level

→ Finding best matching entry:

Shape delineation – II.

Shape delineation

Recursive search for end points, starting from

mode locations:

Line-centered structuresOff-the-line

structures

… …

END POINT

CANDIDATES

VOTE MAP

Relative line end locations define:- Search direction

- Line end positions

Experimental results - Case: Compact clusters

Human detection by occupancy map clustering:

13 m

Passive stereo depth sensing → depth data projected orthogonal to the

ground plane

Occupancy map (1246×728 pix.) clustering: 56 fps, overall system (incl. stereo computation): 6 fps

Experimental results - Case: Compact clusters

Input image

Probability distribution for text

Binarization is very sensitive to employed threshold

Our scheme has no threshold, only local structural priors

Simple binarization

Proposed scheme

Experimental results - Case: Line structures (Text line segmentation)

Experimental results - Case: Text line segmentation

42

Queue length detection using depth and

intensity information

2D + 3D

Queue Length + Waiting Time estimation

What is waiting time in a queue?

Ch

eck

po

int

Waiting time

Time measurement relating to last

passenger in the queue

Example: Announcement of waiting times (App) → customer satisfaction

Why interesting?

Example: Infrastructure operator → load balancing

Queue analysis

▪ Challenging problem

Waiting time =

▪ Shape

▪ No predefined shape (context/situation-dependent and time-varying)

▪ Motion→ not a pure translational pattern

▪ Propagating stop-and-go behaviour with a noisy „background“

▪ Signal-to-noise ratio depends on the observation distance

Length

Velocity

1. What is the shape and extent of the queue?

2. What is the velocity of the propagation?

DEFINITION: Collective goal-oriented motion pattern of multiple humans

exhibiting spatial and temporal coherence

simple complex

▪ How can we detect (weak) correlation?

▪ Much data is necessary → Simulating crowding phenomena in Matlab

▪ Social force model (Helbing 1998)

Source: Parameswaran et al. Design and Validation of a

System for People Queue Statistics Estimation, Video

Analytics for Business Intelligence, 2012

t

x

y

Correlation in space and time

goal-driven kinematics – force field repulsion by walls repulsion by „preserving privacy “

Visual queue analysis - Overview

46

Simulation tool → Creating infinite number of possible queueing zones

Two simulated examples (time-accelerated view):

Queue analysis

Queue analysis (length, dynamics)

Staged scenarios, 1280x1024 pixels, computational speed: 6 fps

Straight line Meander style

▪ Pairwise spatio-temporal

correlation analysis

▪ Correlation → data weights

▪ Mode seeking and tracking

▪ Queue delineation posed as an

open Traveling Salesman

Problem (oTSP) with a fixed start

▪ Queue forward velocity

estimation using a deformable

elastically connected chain

Estimated configuration

(top-view) Detection results

Adaptive estimation of the spatial extent of the queueing zone

Left part of the image is intentionally blurred

for protecting the privacy of by-standers,

who were not part of the experimental setup.stereo sensor

Scene-aware heatmap

51

End-to-end video text recognition

2D

Overview

INPUT

Presence (y/n)

OUTPUT

TextDetection Localization Propagation SegmentationRecognition,Propagation

▪ The End-to-End Video Recognition Process

Location (single frames)

(x, y, w, h)

Location (frame span)

(x, y, w, h)

Binary image regions

Text (e.g. in ASCII)

Characterizing dynamic elements: running text

Evaluation: High accuracy at each stage is necessary

Very high recall throughout the chain

Increasing Precision toward the end of the chain

Algorithmic chain - Motivation

Main strategies for text detection:

What is text (when appearing in images)?:

An oriented sequence of characters in close proximity, obeying a certain regularity

(spatial offset, character type, color).Sample text region + complex background

(a)

(b)

Improved text detection – synthetic text generation (Classification using Aggregated Channel Features)

Convolutional Neural Network based OCR - TrainingGenerated single characters (0-9, A-Z, a-z): include spatial jitter, font variations

6000 „0“

6000 „A“

6000 „Z“

▪ role of jitter: characters can be recognized despite an offset at detection time

56

Convolutional Neural Network based OCR - ResultsAnalysis window is scanned along the textline, and likelihood ration (score1/score2) is

plotted in the row (below textline) belonging to the maximum classification score.

Optimum partitioning

5714.07.2019

Our development concept

Implementation details

MATLAB

&

PYTHON

C/C++

Method, Prototype,

Demonstrator

Real-time

prototype

▪ MATLAB / PYTHON:

▪ Broad spectrum of algorithmic libraries,

▪ Well-suited for image analysis,

▪ Visualisation, debugging,

▪ Rapid development →Method, Prototype, Demonstrator

▪ C/C++

▪ Real-time capability

Porting

mex

shared library

Computationally

intensive

methods

Verification

Matlab engine

LEARNED REPRESENTATIONS OF HIGH

DISCRIMINATIVE POWER

6014.07.2019

MODEL-BASED VISION - USE CASERelevance:

1. DeepLearning: coarse pose estimation (coarse initialization/re-initialization)

2. Edge based model tracking (continuous fine 6DoF pose estimation)

Pose regressor

trained from

synthetic data:

THANK YOU!

CSABA BELEZNAI

Senior Scientist

Center for Vision, Automation & Control

Autonomous Systems

AIT Austrian Institute of Technology GmbH

Giefinggasse 4 | 1210 Vienna | Austria

T +43(0) 664 825 1257 | F +43(0) 50550-4170

[email protected] | http://www.ait.ac.at

mailto:[email protected]

http://www.ait.ac.at/

CONCEPTS, ALGORITHMS & PRACTICAL APPLICATIONS IN 2D …€¦ · Senior Scientist Markus Hofstätter...

Documents

Transcript of CONCEPTS, ALGORITHMS & PRACTICAL APPLICATIONS IN 2D …€¦ · Senior Scientist Markus Hofstätter...