
Computer Vision

From traditional approaches to deep neural networks

Stanislav Frolov München, 27.02.2018

● Computer vision
● Human vision
● Traditional approaches and methods
● Artificial neural networks
● Summary

Outline of this talk: What we are going to talk about

● trained deep neural networks for object detection during master thesis

● still fascinated and interested

Stanislav Frolov

Big Data Engineer @inovex

● Teach computers how to see
● Automatic extraction, analysis and understanding of images
● Infer useful information, interpret and make decisions
● Automate tasks that the human visual system can do
● One of the most exciting fields in AI and ML

What is computer vision: General

What is computer vision: Motivation

● Era of pixels
● Internet consists mostly of images
● Explosion of visual data
● Cannot be labeled by humans

What is computer vision: Drivers

● Two drivers for computer vision explosion
○ Compute (faster and cheaper)
○ Data (more data > algorithms)

What is computer vision: Interdisciplinary field

● Contributing disciplines: Computer Science, Mathematics, Engineering, Physics, Biology, Psychology
● Related areas: Information Retrieval, Machine Learning, Graphs, Algorithms, Systems Architecture, Robotics, Speech, NLP, Image Processing, Optics, Solid-State Physics, Neuroscience, Cognitive Sciences, Biological vision

Synonyms?

● Imaging for statistical pattern recognition
● Image transformations such as pixel-by-pixel operations
○ Contrast enhancement
○ Edge extraction
○ Noise reduction
○ Geometrical and spatial operations (e.g. rotations)

What is computer vision: Related fields - image processing

● Creates new images from scene descriptions
● Produces image data from 3D models
● “Inverse” of computer vision
● AR as a combination of both

What is computer vision: Related fields - computer graphics

● Mainly manufacturing applications
● Image-based automatic inspection, process control, robot guidance
● Usually employs strong assumptions (colour, shape, light, structure, orientation, ...) -> works very well
● Output often pass/fail or good/bad
● Additionally numerical/measurement data, counts

What is computer vision: Related fields - machine vision

● Create “intelligent” systems
● Studying computational aspects of intelligence
● Make computers do things at which, at the moment, people are better
● Many techniques play an important role (ML, ANNs)
● Currently does a few things better/faster at scale than humans can
● Whether it can do anything “human” remains unanswered

What is computer vision: Related fields - AI

● Related fields have a large intersection
● Basic techniques used, developed and studied are very similar

What is computer vision: Related fields - summary

Short trip to human vision

● Two-stage process
○ Eyes take in light reflected off the objects and retina converts 3D objects into 2D images
○ Brain’s visual system interprets 2D images and “rebuilds” a 3D model

What is human vision: General

● A pair of 2D images with slightly different views allows depth to be inferred

● The position of nearby objects varies more across the two images than the position of more distant objects

What is human vision: Stereoscopic vision
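The relation between image shift and distance can be made concrete with the standard pinhole stereo relation Z = f * B / d (depth equals focal length times baseline divided by disparity). This formula is not on the slide and is used here as an assumption; the function name and the numeric values are purely illustrative.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


# Illustrative values: 700 px focal length, 6.5 cm between the two "eyes".
near = depth_from_disparity(700.0, 0.065, 35.0)  # large shift between the views
far = depth_from_disparity(700.0, 0.065, 3.5)    # small shift between the views
print(near, far)
```

A large disparity maps to a small depth, which matches the observation that nearby objects move more between the two views.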

● Prior knowledge of relative sizes and depths is often key for understanding and interpretation

What is human vision: Prior knowledge

● Texture and changes in texture help with depth perception

What is human vision: Texture pattern

What is human vision: Biases and illusions in human perception

● Shadows make all the difference in interpretation
● Gradual changes in light are ignored so as not to be misled by shadows

What is human vision: A few more illusions

● Two arrows with different orientations have the same length
● Assumptions and familiarity (distorted room)
● Face recognition bias
● Up-down orientation bias

What is human vision: Biases and illusions in human perception

What is human vision: Summary

● Illusions are fun, but the puzzle of how human vision works is far from solved

Back to computer vision

● Recognition
● Localization
● Detection
● Segmentation

What is computer vision: Typical tasks

● Part-based detection
○ Deformable parts model
○ Pose estimation and poselets
● Image captioning (actions, attributes)
● Motion analysis
○ Egomotion (camera)
○ Optical flow (pixels)
● Scene understanding and reconstruction
● Image restoration
● Colouring black & white photos

Solving this is useful for many applications

What is computer vision: Typical applications

● Assistance systems for cars and people
● Surveillance
● Navigation (obstacle avoidance, road following, path planning)
● Photo interpretation
● Military (“smart” weapons)
● Manufacturing (inspection, identification)
● Robotics
● Autonomous vehicles (dangerous zones)

What is computer vision: Typical applications

● Recognition and tracking
● Event detection
● Interaction (man-machine interfaces)
● Modeling (medical, manufacturing, training, education)
● Organizing (database index, sorting/clustering)
● Fingerprint and biometrics
● …

Why so difficult?

What is computer vision: Why it is difficult

● Occlusion
● Deformation
● Scale
● Clutter
● Illumination
● Viewpoint
● Object pose
● Tons of classes and variants
● Often n:1 mapping
● Computationally expensive
● Full understanding of biological vision is missing

System overview

● Input: image(s) + labels
● Output: semantic data, labels
● Digital image pixels usually have three channels [R,G,B], each in the range [0...255], plus a location [x,y]
● Digital images are just vectors

What is computer vision: System overview
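To illustrate the statement that digital images are just vectors, here is a minimal sketch that stores a tiny RGB image as nested lists and flattens it. The row-major layout and the helper names `flatten` and `pixel_at` are assumptions made for this example, not something the slides prescribe.

```python
# A tiny 2x2 RGB image: image[y][x] is an [R, G, B] triple, each value in 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
height, width, channels = len(image), len(image[0]), len(image[0][0])


def flatten(img):
    """Flatten rows, then pixels, then channels into one long vector."""
    return [value for row in img for pixel in row for value in pixel]


def pixel_at(vector, x, y, width, channels=3):
    """Recover the channel values of pixel (x, y) from the flat vector."""
    start = (y * width + x) * channels
    return vector[start:start + channels]


vector = flatten(image)
print(len(vector), pixel_at(vector, 1, 0, width))
```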

1. Image acquisition (camera, sensors)
2. Pre-processing (sampling, noise reduction, augmentation)
3. Feature extraction (lines, edges, regions, points)
4. Detection and segmentation
5. Post-processing (verification, estimation, recognition)
6. Decision making
● -> Ability of a machine to step back and interpret the big picture of those pixels

What is computer vision: System overview

Some history

● 2D imaging for statistical pattern recognition
● Theory of optical flow based on a fixed point towards which one moves

What is computer vision: History

Image processing

● Histograms
● Filtering
● Stitching
● Thresholding
● ...
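Two of the listed operations, histograms and thresholding, fit in a few lines. This is a minimal sketch on a hand-made grayscale image; the function names and the fixed threshold of 128 are illustrative assumptions.

```python
def histogram(gray, levels=256):
    """Count how often each intensity value occurs in a grayscale image."""
    counts = [0] * levels
    for row in gray:
        for value in row:
            counts[value] += 1
    return counts


def threshold(gray, t):
    """Binary mask: 1 where the pixel is at least t, otherwise 0."""
    return [[1 if value >= t else 0 for value in row] for row in gray]


gray = [
    [10, 12, 200],
    [11, 210, 205],
    [9, 13, 220],
]
hist = histogram(gray)
mask = threshold(gray, 128)
print(mask)
```

In practice the threshold is often derived from the histogram rather than fixed by hand.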

What is computer vision: Traditional approaches

● Desire to extract 3D structure from 2D images for scene understanding
● Began at pioneering AI universities to mimic the human visual system as a stepping stone for intelligent robots
● Summer vision project at MIT: attach a camera to a computer and have it “describe what it saw”
● Given to 10 undergraduate students
● … an attempt to use our summer workers effectively …
● … construction of a significant part of a visual system …
● … task can be segmented into sub-problems …
● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” …

What is computer vision: History - summer vision project @MIT 1966

● Goal: analyse scenes and identify objects
● Structure of system:
○ Region proposal
○ Property lists for regions
○ Boundary construction
○ Match with properties
○ Segment
● Basic foreground/background segmentation with simple objects (cubes, cylinders, …)

● Unlike general intelligence, computer vision seemed tractable
● Amusing anecdote, but it never aimed to “solve” computer vision
● Computer vision today differs from what it was thought to be in 1966
● Formed many algorithms that exist today
● Edges, lines and objects as interconnected structures

What is computer vision: Traditional approaches

Edge detection based on

● Brightness
● Gradients
● Geometry
● Illumination
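Gradient-based edge detection can be sketched with simple central differences. This is a deliberately simplified stand-in for classical operators such as Sobel, not a specific detector from the talk; the function name and the test image are assumptions.

```python
def gradient_magnitude(gray):
    """Per-pixel gradient magnitude from central differences (border left at 0)."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal brightness change
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical brightness change
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


# 5x5 image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 100, 100, 100] for _ in range(5)]
edges = gradient_magnitude(step)
print(edges[2])
```

The response is high only around the brightness step and zero in the flat regions.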

What is computer vision: Traditional approaches - part based detector

● Objects composed of features of parts and their spatial relationship

● Challenge: how to define and combine

● More rigorous mathematical analysis and quantitative aspects

● Optical character recognition
● Sliding window approaches
● Usage of artificial neural networks

What is computer vision: Traditional approaches - HOG detection (histogram of oriented gradients)

● Concept in 80s but used only in 2005
● Create HOG descriptors (object generalizations)
● One feature vector per object
● Train with SVM
● Sliding window @multiple scales

What is computer vision: Traditional approaches - HOG detection (histogram of oriented gradients)
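The sliding-window idea at multiple scales can be sketched as a generator of window positions. The window size, stride and scale factors below are illustrative assumptions; a real detector would crop each window, compute its descriptor and score it with the trained classifier.

```python
def sliding_windows(image_w, image_h, window=64, stride=32, scales=(1.0, 0.5)):
    """Yield (scale, x, y) for every window position in each rescaled image."""
    for scale in scales:
        w, h = int(image_w * scale), int(image_h * scale)
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                yield scale, x, y


positions = list(sliding_windows(128, 128))
print(len(positions))
```

Shrinking the image while keeping the window size fixed lets the same detector find larger objects.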

● Computation of HOG descriptors:
1. Compute gradients
2. Compute histograms on cells
3. Normalize histograms
4. Concatenate histograms
● Requires a lot of engineering
● Must build ensembles of feature descriptors
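The four steps above can be sketched as follows. This is a simplified illustration under stated assumptions: it normalizes each cell on its own and uses hard bin assignment, whereas practical HOG implementations typically normalize over blocks of cells and interpolate votes. The cell size, bin count and function name are illustrative choices.

```python
import math


def hog_descriptor(gray, cell=4, bins=9):
    """Simplified HOG: one orientation histogram per cell, concatenated."""
    h, w = len(gray), len(gray[0])
    descriptor = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # 1. Gradients by central differences, clamped at the border.
                    gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
                    gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    # 2. Vote into the cell histogram, weighted by magnitude.
                    index = min(int(angle / (180.0 / bins)), bins - 1)
                    hist[index] += magnitude
            # 3. Normalize the histogram (L2 norm, small epsilon for flat cells).
            norm = math.sqrt(sum(v * v for v in hist)) + 1e-6
            # 4. Concatenate into the final feature vector.
            descriptor.extend(v / norm for v in hist)
    return descriptor


# 8x8 image with a vertical edge: all gradient energy lands in the first bin.
patch = [[0, 0, 0, 0, 100, 100, 100, 100] for _ in range(8)]
features = hog_descriptor(patch)
print(len(features))
```

The resulting vector is what would be passed to a classifier such as an SVM.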

● Significant interaction with computer graphics (rendering, morphing, stitching)
● Approaches using statistical learning
● Eigenface (Ghostfaces) through principal component analysis (PCA)

What is computer vision: Traditional approaches - deformable parts model (DPM)

● Objects are constructed from their parts
● First match whole object, then refine on the parts
● HOG + part-based + modern features
● Slow but good at difficult objects
● Involves many heuristics

What is computer vision: Features

● Feature points
○ Small area of pixels with certain properties
● Feature detection
○ Use features for identification
○ Activate if “object” present
● Examples:
○ Lines, edges, colours, blobs, …
○ Animals, faces, cars, ...

What is computer vision: Traditional approaches - classical recognition

● Init: extract features for objects in different scales, colours, orientations, rotations, occlusion levels

● Inference: extract features from query image and find closest match in database or train a classifier

● Computationally expensive (hundreds of features in image, millions in database) and complex due to errors and mismatches
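The inference step, finding the closest match in a database, can be sketched as a nearest-neighbour search over feature vectors. The three-dimensional vectors, the labels and the squared Euclidean distance are illustrative assumptions; real systems use far longer descriptors and approximate search.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_match(query, database):
    """Return the label whose stored feature vector is nearest to the query."""
    return min(database, key=lambda label: squared_distance(query, database[label]))


# Hypothetical database built during the init phase.
database = {
    "cat": [0.9, 0.1, 0.3],
    "car": [0.1, 0.8, 0.7],
    "face": [0.5, 0.5, 0.1],
}
best = closest_match([0.2, 0.7, 0.6], database)
print(best)
```

A linear scan like this is what makes the approach expensive once the database holds millions of features.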

Before the new era

● Bags of features
● Handcrafted ensembles

[Figure: Input -> Feature Extraction (Feat. 1, Feat. 2, ..., Feat. n) -> Final Decision]

The new era of computer vision

● Elementary building block

● Inspired by biological neurons

● Mathematical function y=f(wx+b)

● Learnable weights

Artificial neural networks: Fundamentals - artificial neuron
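The function y = f(wx + b) can be written out directly. In this minimal sketch the sigmoid is an assumed choice for the activation f, and the weights and inputs are illustrative.

```python
import math


def sigmoid(z):
    """A common choice for the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """y = f(w . x + b): weighted sum of the inputs followed by an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


y = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
print(y)
```

The weights and the bias are the learnable parameters; training adjusts them while f stays fixed.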

● Collection of neurons organized in layers

● Universal approximators

● Fully-connected network here

Artificial neural networks: Fundamentals - artificial neural networks

Artificial neural networks: Fundamentals - training

● Basically an optimization problem

● Find minimum of a loss function by an iterative process (training)

● Designing the loss function is sometimes tricky

Artificial neural networks: Fundamentals - training

Simple optimizer algorithm:

1. Forward pass with a batch of data
2. Calculate error between actual and wanted output
3. Nudge weights in proportion to error into the right direction (same data would result in smaller error)
4. Repeat until convergence
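The loop above can be sketched as plain gradient descent on a one-weight linear model with a mean squared error loss. The model, the learning rate and the fixed number of steps (used here in place of a convergence check) are assumptions made to keep the example small.

```python
def train(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):                             # 4. repeat
        preds = [w * x + b for x in xs]                # 1. forward pass
        errors = [p - y for p, y in zip(preds, ys)]    # 2. error vs. wanted output
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w                    # 3. nudge the weights
        b -= learning_rate * grad_b
    return w, b


# Data generated from y = 2x + 1, so training should recover w = 2 and b = 1.
w, b = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 3), round(b, 3))
```

Deep networks follow the same pattern, with backpropagation supplying the gradients for every layer.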

Artificial neural networks: Fundamentals - CNN

● Local neighborhood contributes to activation
● Exploit spatial information
● Hierarchical feature extractors
● Fewer parameters

[Figure labels: input, filters, receptive field, activation]

Artificial neural networks: Fundamentals - CNN

● Filter of size 3x3 applied to an input of 7x7
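Sliding a 3x3 filter over a 7x7 input without padding and with stride 1 yields a 5x5 output. This minimal sketch computes it as a cross-correlation, which is how deep learning frameworks usually implement "convolution"; the identity kernel and the test image are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            # Each output value only sees a kh x kw neighbourhood (receptive field).
            row.append(sum(image[y + i][x + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


image = [[x + 7 * y for x in range(7)] for y in range(7)]  # 7x7 input
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]               # 3x3 filter
activation = conv2d_valid(image, identity)
print(len(activation), len(activation[0]))
```

The same nine weights are reused at every position, which is why a convolutional layer needs far fewer parameters than a fully-connected one.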

Artificial neural networks: Fundamentals - pooling

● Max-pooling
● Dimension reduction/adaption
● Existence is more important than location
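Max-pooling keeps only the strongest activation in each window. A minimal sketch, assuming a 2x2 window with stride 2 and an illustrative 4x4 feature map:

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the maximum of every size x size window."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[y + i][x + j] for i in range(size) for j in range(size))
         for x in range(0, w - size + 1, stride)]
        for y in range(0, h - size + 1, stride)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool(feature_map)
print(pooled)
```

The 4x4 map shrinks to 2x2, and the exact position of each maximum inside its window is discarded.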

Artificial neural networks: Fundamentals - pooling

● Zero-padding
● Controlling dimensions
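How zero-padding controls dimensions follows from the usual output-size relation, (input - kernel + 2 * padding) / stride + 1, which is not on the slide and is stated here as an assumption. The helper names are illustrative.

```python
def zero_pad(image, pad):
    """Surround a 2D image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    border = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    return border + middle + [row[:] for row in border]


def output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial size after a convolution or pooling layer."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


padded = zero_pad([[1, 2], [3, 4]], 1)
print(padded)
print(output_size(7, 3), output_size(7, 3, padding=1))
```

With a padding of 1, a 3x3 filter keeps a 7x7 input at 7x7 instead of shrinking it to 5x5.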

Artificial neural networks: Fundamentals - general network architecture

[Figure: input image -> convolutional layers -> ... -> final decision]

Artificial neural networks: Fundamentals - hierarchical feature extractors

[Figure: activations for first layers (lines, edges, blobs, colours, ...) and deeper layers (parts of abstract objects, abstract objects)]

Modern history of object recognition

● Classification and detection
○ 27k images
○ 20 classes
■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor

Benchmark: Datasets - PASCAL VOC

● Challenges on a subset of ImageNet
○ 14 million labeled images
○ 20k object categories
● ILSVRC* usually uses 1,000 categories, including 90 out of 120 dog breeds

Benchmark: Datasets - ImageNet

*ImageNet Large Scale Visual Recognition Challenge

● ILSVRC 2012 winner by a large margin, cutting the top-5 error from 25% to 16%
● Proved effectiveness of CNNs and kicked off a new era
● 8 layers, 650k neurons, 60 million parameters

Artificial neural networks: Roadmap - AlexNet

● ILSVRC 2013 winner with a best top-5 error of 11.6%
● AlexNet but using smaller 7x7 kernels in the first layer (instead of 11x11) to keep more information in deeper layers

Artificial neural networks: Roadmap - ZFNet

● ILSVRC 2013 localization winner
● Uses AlexNet on multi-scale input images with sliding window approach
● Accumulates bounding boxes for final detection (instead of non-max suppression)

Artificial neural networks: Roadmap - OverFeat

● 2k proposals generated by selective search
● SVM trained for classification
● Multi-stage pipeline

Artificial neural networks: Roadmap - RCNN (region based CNN)

● Not a winner but famous due to simplicity and effectiveness

● Replace large-kernel convolutions by stacking several small-kernel convolutions

Artificial neural networks: Roadmap - VGGNet

● ILSVRC 2014 winner
● Stacks up “inception” modules
● 22 layers, 5 million parameters

Artificial neural networks: Roadmap - InceptionNet (GoogLeNet)

● Jointly learns classification and bounding-box regression (region proposals still come from a heuristic method)
● Employs region of interest (RoI) pooling, which allows the convolutional computations to be reused

Artificial neural networks: Roadmap - Fast RCNN

● Directly predicts all objects and classes in one shot
● Very fast
● Processes images at ~40 FPS on a Titan X GPU
● First real-time state-of-the-art detector
● Divides input images into multiple grid cells which are then classified

Artificial neural networks: Roadmap - YOLO (you only look once)

● ILSVRC 2015 winner with a 3.6% error rate (human performance is 5-10%)
● Employs residual blocks, which allow very deep networks (hundreds of layers) to be built
● Additional identity mapping

Artificial neural networks: Roadmap - ResNet (Microsoft)
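The identity mapping of a residual block amounts to adding the input back onto the output of the learned transformation, y = relu(F(x) + x). This minimal sketch works on plain lists; `transform` stands in for the block's convolutional layers and is a hypothetical placeholder.

```python
def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """y = relu(F(x) + x): the input skips past the learned transform F."""
    fx = transform(x)
    return relu([f + xi for f, xi in zip(fx, x)])


# If the transform outputs zeros, the block simply passes its input through.
passthrough = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))
adjusted = residual_block([1.0, 2.0], lambda v: [0.5, -3.0])
print(passthrough, adjusted)
```

Because a block can fall back to the identity, stacking many of them does not have to make the network worse, which is one common explanation for why very deep ResNets remain trainable.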

● Not a recognition network
● A region proposal network
● Popularized prior/anchor boxes (found through clustering) to predict offsets
● Much better strategy than starting the predictions with random coordinates
● Since then heuristic approaches have been gradually fading out and being replaced

Artificial neural networks: Roadmap - MultiBox
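Predicting offsets relative to a prior box can be sketched with one widely used parameterization (centre shifts scaled by the prior's size, log-scaled width and height), as found in Faster R-CNN style detectors. Whether MultiBox uses exactly this encoding is not stated in the slides, so treat it as an assumption; `decode_box` is a hypothetical helper.

```python
import math


def decode_box(prior, offsets):
    """Turn predicted offsets into a box, both given as (cx, cy, w, h) tuples."""
    pcx, pcy, pw, ph = prior
    tx, ty, tw, th = offsets
    return (
        pcx + tx * pw,      # shift the centre by a fraction of the prior width
        pcy + ty * ph,      # shift the centre by a fraction of the prior height
        pw * math.exp(tw),  # scale the width
        ph * math.exp(th),  # scale the height
    )


prior = (0.5, 0.5, 0.2, 0.4)
unchanged = decode_box(prior, (0.0, 0.0, 0.0, 0.0))
shifted = decode_box(prior, (0.5, 0.0, math.log(2.0), 0.0))
print(unchanged, shifted)
```

Zero offsets reproduce the prior itself, so the network only has to learn small corrections instead of absolute coordinates.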

● Fast RCNN with heuristic region proposal replaced by region proposal network (RPN) inspired by MultiBox
● RPN shares full-image convolutional features with the detection network (cost-free region proposal)
● RPN uses “attention” mechanism to tell where to look
● ~5 FPS on a Tesla K40 GPU
● End-to-end training

Artificial neural networks: Roadmap - Faster RCNN

● SSD leverages the Faster RCNN’s RPN to directly classify objects inside each prior box (similar to YOLO)
● Predicts category scores and box offsets for a fixed set of default bounding boxes
● Fixes the predefined grid cells used in YOLO by using multiple aspect ratios
● Produces predictions of different scales
● ~59 FPS

Artificial neural networks: Roadmap - SSD (single shot multibox detector)

● Open-source software library for machine learning applications
● TensorFlow Object Detection API
○ A collection of pretrained models
○ Construct, train and deploy object detection models

Artificial neural networks: TensorFlow object detection API

Summary

● Humans are good at understanding the big picture
● Neural networks are good at details
● But they can be fooled...

Summary: Human vs machine

● Need a large amount of data
● Lots of engineering
● Trial and error
● Long training time
● Still lots of hyperparameter tuning
● No general network (generalization not answered)
● Little mathematical foundation

Summary: Computer vision is still difficult

● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized

Summary: Computer vision is hard

Thank You

Stanislav Frolov

Big Data Engineer

sfrolov@inovex.de

0173 318 11 35

inovex GmbH

Lindberghstraße 3

80939 München
