Image Understanding


Transcript of Image Understanding

Page 1: Image Understanding

Image Understanding

A Crash Course for Robot Racers, 17 January 2013

Page 2: Image Understanding


Terminology

Image processing
- Input is image, output is image
- Goal: make image look better to human viewer

Machine vision
- Input is image, output is information about content
- Goal: determine what is in image
- AKA computer vision, image understanding

Our task is machine vision, not image processing

Page 3: Image Understanding


Typical vision system organization

[Block diagram: Raw data → Feature Measurement → feature vector → Pattern Classifier → class identity. Possible block contents: noise removal, features, segmentation, consistency analysis, matching, shape analysis.]
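To make the organization concrete, a toy C sketch of the two-stage pipeline follows. The single feature (mean brightness) and the two classes are illustrative inventions, not part of the original design.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy two-stage pipeline matching the diagram: raw data -> feature
   measurement -> feature vector -> pattern classifier -> class identity. */
typedef struct { double mean_brightness; } FeatureVector;

static FeatureVector measure_features(const uint8_t *raw, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += raw[i];
    FeatureVector fv = { sum / n };
    return fv;
}

static int classify(FeatureVector fv)        /* returns class identity */
{
    return fv.mean_brightness > 128.0;       /* 1 = "light", 0 = "dark" */
}

int main(void)
{
    uint8_t raw[4] = { 200, 210, 190, 205 }; /* stand-in for raw pixels */
    printf("class = %d\n", classify(measure_features(raw, 4)));
    return 0;
}
```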

Page 4: Image Understanding


Identifying/evaluating objects: critical in many applications
- Inspection in industrial setting
- Automatic target recognition

Designer knows a priori what to look for
- Feature set is application specific
- Environment often simplified for robotic applications
- Limited set of visually distinctive objects
- Example: vertical pylons in racers (2008-2009)

A “general recognizer” is far more difficult: consider Google’s self-driving cars

Page 5: Image Understanding


Typical building blocks

Common image operators can be found in MATLAB, OpenCV, similar libraries

Can they help us?
- Real-time operation critical for our application
- Not ported to our platform

Developing a vision system
- Find effective algorithm; use whatever is convenient
- Implement simple C version from scratch, verify
- Move to hardware if necessary

Page 6: Image Understanding

Follow the data: source

Sensor (MT9V024) captures Bayer RGB
- Global shutter
- Frame rate: 60 fps
- Active array: 752 x 480 (10 bits/pixel)

Camera gives you image a byte at a time in:
- Bayer, or
- YCbCr/YUV, or
- RGB (565, 555, 444)
Non-Bayer formats are interpolated (see the unpacking sketch below)
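For illustration, a minimal C sketch of unpacking one RGB565 pixel. The 5-6-5 bit layout shown is the common convention and should be confirmed against the camera's actual packing.

```c
#include <stdint.h>

/* Unpack one 16-bit RGB565 pixel (assumed layout: red in bits 15-11,
   green in bits 10-5, blue in bits 4-0; confirm against camera docs). */
static inline void rgb565_unpack(uint16_t px,
                                 uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = (uint8_t)((px >> 11) & 0x1F);  /* 5 bits of red   */
    *g = (uint8_t)((px >> 5)  & 0x3F);  /* 6 bits of green */
    *b = (uint8_t)( px        & 0x1F);  /* 5 bits of blue  */
}
```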

Camera offers unexplored capabilities
- We’re hoping to get data sheet – proprietary


Page 7: Image Understanding

Follow the data: camera interface (VHDL core)

- Performs required handshaking with camera
- Buffers pixels, performs desired processing
- Writes resulting pixel values to memory
- Informs CPU (via interrupt) that new image is available


Page 8: Image Understanding

Follow the data: format choice

Typical approach of previous teams:
- Let camera do Bayer-to-RGB conversion
- Get RGB from camera

Latest technique:
- Get full Bayer from camera
- Do conversion to RGB in VHDL (illustrated in C below)

Results:
- Greater color depth (8 bits/channel)
- Better color discrimination in vision algorithms
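As a software illustration of what the conversion does (the team's version runs in VHDL), here is a naive C demosaic of one Bayer quad. The RGGB layout is an assumption to be checked against the sensor documentation.

```c
#include <stdint.h>

/* Naive demosaic of one 2x2 Bayer quad into a single RGB pixel.
   Assumes an RGGB pattern with x and y even; output is half resolution.
   Real demosaicing (and the VHDL core) interpolates instead. */
static void demosaic_quad(const uint8_t *bayer, int stride, int x, int y,
                          uint8_t *r, uint8_t *g, uint8_t *b)
{
    const uint8_t *p = bayer + y * stride + x;
    *r = p[0];                               /* R G  <- this quad */
    *g = (uint8_t)((p[1] + p[stride]) / 2);  /* G B               */
    *b = p[stride + 1];
}
```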


Page 9: Image Understanding

Follow the data: software

- Execution triggered by interrupt
- Only real constraint on your processing is time
- Need to finish processing before next image appears (at 60 fps, about 16.7 ms per frame)
- Clever optimizations can help to speed processing


Page 10: Image Understanding


Noise: anticipate it

Actual images from camera

Probably more extreme than you will experience

Kalman filter, anyone?

Page 11: Image Understanding


Our (visual) simplifications

Only objects we must consider:
- Trucks
- Base stations
- Landmarks
- Obstacles

Both trucks and base stations have light towers

Will have distinctive appearance

Page 12: Image Understanding


A simple approach: segmentation

Definition: segmentation is partitioning an image into connected, homogeneous regions

Example: isolating dark objects on tan conveyor belt for inspection
- Easy to separate light and dark with consistent lighting

For us, segments might be
- Lights on towers
- Obstacles
- Navigation markers

Page 13: Image Understanding

Color segmentation: find red pixels


In RGB, requires 3 tests per pixel:
- Is red channel in range?
- Is green?
- Is blue?

Observation: 3D nature of RGB adds complexity; easier with gray-scale images. (A sketch of the three tests follows.)
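A minimal C sketch of those three per-pixel tests; the threshold values are illustrative placeholders, not tuned numbers.

```c
#include <stdint.h>

/* Per-pixel "is it red?" test in RGB: one range check per channel.
   The bounds are placeholders to be tuned experimentally. */
static int is_red(uint8_t r, uint8_t g, uint8_t b)
{
    return (r > 150) &&   /* red channel in range? */
           (g <  80) &&   /* green low enough?     */
           (b <  80);     /* blue low enough?      */
}
```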

Page 14: Image Understanding


It gets worse…

Teams used segmentation to find pylons, BUT brightness and color changed with ambient lighting, view angle, camera settings, etc.

For light towers, appearance is more consistent because of LEDs, BUT we’ll want to see things (landmarks, obstacles) that won’t have LEDs

We probably can’t rely on segmentation alone

Page 15: Image Understanding

Making segmentation fast

Method 1:
- Process band of pixels near center of image
- Process other rows only if candidate region identified
- Rationale: location of towers in images will be consistent

Method 2 (see the sketch after this list):
- For each target, process all pixels & produce binary image
- Sum each row of pixels, each column of pixels; find high values
- Rationale: tower lights will appear as rectangles in image
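A C sketch of Method 2's projection step, assuming the binary image has already been produced (1 = target-colored pixel):

```c
#include <stdint.h>

/* Project a binary image onto its rows and columns. A rectangular
   light shows up as a run of high values in both projections. */
static void project(const uint8_t *bin, int w, int h,
                    int *row_sum, int *col_sum)
{
    for (int y = 0; y < h; y++) row_sum[y] = 0;
    for (int x = 0; x < w; x++) col_sum[x] = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            row_sum[y] += bin[y * w + x];
            col_sum[x] += bin[y * w + x];
        }
}
```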


Page 16: Image Understanding


Reducing dimensionality

Segmentation in RGB is inherently 3D. What can we do to reduce the 3 tests per pixel?

Solution: use a different color space (sketched below):
- Consider HSI/HSV rather than RGB
- Advantage: 1D color discrimination
- VHDL cores exist to convert image to HSI/HSV
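A C sketch of the idea: compute hue once, and target detection collapses to a single range test. The conversion follows the standard RGB-to-HSV hue formula in integer math; the red band chosen is illustrative. In practice you would also require a minimum saturation so gray pixels do not match.

```c
#include <stdint.h>

/* Hue (0-359 degrees) from 8-bit RGB, standard RGB-to-HSV formula. */
static int rgb_to_hue(uint8_t r, uint8_t g, uint8_t b)
{
    int max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    int min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    int d = max - min;
    if (d == 0) return 0;                            /* gray: hue undefined */
    int h;
    if (max == r)      h = (60 * (g - b)) / d;       /* red sector   */
    else if (max == g) h = 120 + (60 * (b - r)) / d; /* green sector */
    else               h = 240 + (60 * (r - g)) / d; /* blue sector  */
    return h < 0 ? h + 360 : h;
}

/* "Is it red?" is now one 1D range test on hue. */
static int is_red_hue(uint8_t r, uint8_t g, uint8_t b)
{
    int h = rgb_to_hue(r, g, b);
    return h < 15 || h > 345;   /* illustrative red band */
}
```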

Page 17: Image Understanding


RGB vs. HSI: the gist

[Diagram: the HSI color solid – intensity (I) runs vertically from black to white, saturation (S) is the radial distance from the axis, and hue is the angle around it.]

Think about what happens to pixel values when lighting changes

Page 18: Image Understanding


Back to basics

What attracts our eye in an image? Contrast plays a big part. In image to right:
- High contrast: man and background
- Low contrast: features on coat

Page 19: Image Understanding


Measuring contrast

Assume gray scale: 0 (black) to 255 (white)

Proposed algorithm:
- Work through image array comparing intensity of adjacent pixels (effectively computing partial derivative or slope)
- If difference is high, pay attention

Experiment (see the sketch below): let’s construct new image where new pixel value is old pixel value minus pixel value to left (saturating to 0). High contrast in image1 should be white in image2.
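A minimal C version of that experiment, assuming an 8-bit gray-scale image stored row-major:

```c
#include <stdint.h>

/* For each pixel, new value = current - left neighbor, clamped at 0.
   Large positive jumps (dark-to-light edges) come out bright. */
static void left_diff(const uint8_t *in, uint8_t *out, int w, int h)
{
    for (int y = 0; y < h; y++) {
        out[y * w] = 0;               /* no left neighbor in column 0 */
        for (int x = 1; x < w; x++) {
            int d = in[y * w + x] - in[y * w + x - 1];
            out[y * w + x] = (uint8_t)(d > 0 ? d : 0);  /* saturate to 0 */
        }
    }
}
```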

Page 20: Image Understanding


Result

Page 21: Image Understanding


Discussion

Clearly we’re on to something: we can make out tripod, parts of head in result image.

But it is far from perfect. It completely missed left side of coat – why?
- Pixel difference was large but negative; saturated to 0 (black)
- In noisy picture (say white pixel surrounded by black), you’d get bogus result

Page 22: Image Understanding


Algorithm revisited

Let’s visualize the computation performed. Let array I(x,y) represent pixels in original picture. Computation is equivalent to dot product of each pair of adjacent pixels with the small vector shown:

[-1 1]

Page 23: Image Understanding


Generalizing: cross-correlation produces new image by
- Sliding “kernel” over image in all possible positions
- Computing sum of products of matching elements (dot product) at each position
- Using numerical result at each point as new pixel value

[Figure: a small kernel sliding across the image. A sketch in C follows.]
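A straightforward C sketch of cross-correlation for odd, square kernels. Border pixels are simply zeroed for brevity, and the divisor parameter anticipates the normalized smoothing kernels discussed later (use 1 for edge kernels).

```c
#include <stdint.h>

/* Slide a (2r+1)x(2r+1) integer kernel over the image, take the dot
   product at each position, divide by `divisor`, clamp to 0..255. */
static void cross_correlate(const uint8_t *in, uint8_t *out, int w, int h,
                            const int *kernel, int r, int divisor)
{
    for (int i = 0; i < w * h; i++) out[i] = 0;   /* borders left at 0 */
    for (int y = r; y < h - r; y++)
        for (int x = r; x < w - r; x++) {
            int acc = 0;
            for (int ky = -r; ky <= r; ky++)
                for (int kx = -r; kx <= r; kx++)
                    acc += kernel[(ky + r) * (2 * r + 1) + (kx + r)]
                         * in[(y + ky) * w + (x + kx)];
            acc /= divisor;
            out[y * w + x] = (uint8_t)(acc < 0 ? 0 : (acc > 255 ? 255 : acc));
        }
}
```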

Page 24: Image Understanding


Kernels

A wide variety of kernels can be used that vary in size and function computed.

Sometimes kernels are chosen to implement specific steps
- Example: blur image based on Gaussian distribution and differentiate

Kernels are often tweaked until they work
- Both size and values can be changed

Let’s explore a bit

Page 25: Image Understanding


Kernels

Limitation of [-1 1] kernel: estimate of change depends only on one adjacent pixel.

Idea: consider both left and right neighbors: [-1 0 1]

Improvement not striking

[Result images: with [-1 1] | with [-1 0 1]]

Page 26: Image Understanding


Kernels

Limitation of [-1 0 1] kernel:
- Sensitive to noise
- Considers just one row

Idea: improve by averaging vertically. New kernel:

-1 0 1
-1 0 1
-1 0 1

[Result images: with [-1 0 1] | with new kernel]

Page 27: Image Understanding


Kernels

Problem with kernel

-1 0 1
-1 0 1
-1 0 1

Why give equal weight to all rows?

New kernel (Sobel):

-1 0 1
-2 0 2
-1 0 1

[Result images: with old kernel | with Sobel kernel]

Page 28: Image Understanding


Kernels

Problems with Sobel kernel:
- Catches edges going from black to white, not white to black.
- Misses horizontal lines. (Could rotate kernel 90° and double the processing… see the sketch below.)

[Result images: with Sobel kernel]
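A sketch of that "rotate and combine" idea in C, approximating gradient magnitude as |Gx| + |Gy| so that both edge polarities and horizontal edges are caught, at roughly double the per-pixel cost:

```c
#include <stdint.h>
#include <stdlib.h>

/* Apply Sobel in x and y (kernels hard-coded as offsets around the
   center pixel p) and combine responses as |Gx| + |Gy|. */
static void sobel_mag(const uint8_t *in, uint8_t *out, int w, int h)
{
    for (int i = 0; i < w * h; i++) out[i] = 0;   /* borders left at 0 */
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++) {
            const uint8_t *p = in + y * w + x;
            int gx = -p[-w-1] + p[-w+1] - 2*p[-1] + 2*p[1] - p[w-1] + p[w+1];
            int gy = -p[-w-1] - 2*p[-w] - p[-w+1] + p[w-1] + 2*p[w] + p[w+1];
            int m = abs(gx) + abs(gy);
            out[y * w + x] = (uint8_t)(m > 255 ? 255 : m);
        }
}
```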

Page 29: Image Understanding


Other kernels

Suppose you just want to remove noise. Could use a kernel to smooth. Try:

1 1 1
1 1 1
1 1 1

Oops! What happened? Our kernel did not preserve intensity: kernel elements sum to 9.

Page 30: Image Understanding


Other kernels

Try again with

      1 1 1
1/9 × 1 1 1
      1 1 1

Note how image is blurred
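Expressed with the cross_correlate sketch from earlier, the fix is just the all-ones kernel plus a divisor of 9:

```c
/* 3x3 box blur: all-ones kernel; divisor 9 restores overall intensity. */
static const int box3[9] = { 1, 1, 1,
                             1, 1, 1,
                             1, 1, 1 };
/* usage: cross_correlate(in, out, w, h, box3, 1, 9); */
```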

Page 31: Image Understanding


Other kernels

Try again with

       1 1 1 1 1
       1 1 1 1 1
1/25 × 1 1 1 1 1
       1 1 1 1 1
       1 1 1 1 1

Note increased blurring

Page 32: Image Understanding


Other kernels

Example: an approximation of Laplacian of brightness (related to 2nd derivative)

-1  2 -1
 2 -4  2
-1  2 -1

Page 33: Image Understanding


Kernel limitations

Edge operators based on kernel operations have problems with noisy images. Edges will be:
- Too thick in places
- Missing in places
- Extraneous in places

More sophisticated techniques have been developed to solve these problems. Most likely too complex for our project, platform.

Page 34: Image Understanding

Impressive results (Renegades of Funk)

From 2012 team website. How useful might this edge detection be?

[Images: original image | with Sean Thomas’s Sobel kernel]

Page 35: Image Understanding


The Hough transform

Uses voting procedure to find lines (shapes)
- Finds edge points based on local pixel values
- Each edge pixel votes for lines in discretized parameter space
- Could use (intercept, slope), but vertical lines a problem
- Instead uses (r, θ): r = x cos θ + y sin θ, where r is distance from origin to line and θ is angle from origin to closest point on line
- After processing image, votes above some threshold in 2D array indicate most likely lines
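A minimal C sketch of the voting loop; the accumulator sizing and 1-degree quantization are illustrative choices, not prescribed values.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#define N_THETA 180   /* 1-degree steps */

/* Every edge pixel votes for each quantized (r, theta) line through
   it; r is binned over [-r_max, r_max]. Peaks in acc[] afterward
   mark the most likely lines. */
static void hough_vote(const uint8_t *edges, int w, int h,
                       int *acc /* n_r * N_THETA ints */, int n_r)
{
    const double PI = 3.14159265358979323846;
    double r_max = sqrt((double)(w * w + h * h));
    memset(acc, 0, (size_t)n_r * N_THETA * sizeof(int));
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (!edges[y * w + x]) continue;          /* only edge pixels vote */
            for (int t = 0; t < N_THETA; t++) {
                double th = t * PI / N_THETA;
                double r = x * cos(th) + y * sin(th); /* r = x cos θ + y sin θ */
                int ri = (int)((r + r_max) * (n_r - 1) / (2.0 * r_max));
                acc[ri * N_THETA + t]++;
            }
        }
}
```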

Page 36: Image Understanding


Example

See Wikipedia article

Page 37: Image Understanding


Moving forward: a suggestion

Prototype with MATLAB or OpenCV
- Take many images of light towers and landmarks, from varying distances in different lighting
- Code and develop edge/shape/color detection algorithms, test thoroughly
- Support for many image operators is built-in

Design, implement, and test simplified version that can run on the Helios board
- Critical you understand what functions do; must go beyond black-box understanding

Page 38: Image Understanding


We further recommend...

Assign one team member responsibility for vision algorithms.

Look for online tutorials, demos, examples. Don’t worry too much (initially) about the underlying mathematics: focus on (1) does it do what I want? and (2) can I build it?

Do lots of experiments in software
- Make sure your approach is robust, reliable
- Move an operation to hardware (VHDL) only if it is simple (e.g., color space conversion) or too slow in software (e.g., yielding just 1 fps)

Page 39: Image Understanding


Big-picture: things to consider
- At what frame rate must images be processed?
- How noisy are images, how will you handle noise?
- How will you recognize and distinguish objects?
- If we add obstacles and landmarks, how should they be marked?
- How will you estimate distance to objects?
- How can you adapt to dynamic changes in lighting?