


Figure 2 (chart): Time [ms] by image format. CPU: SD 16.48, HD 720p 35.57, HD 1080p 82.54. GPU: SD 1.92, HD 720p 3.44, HD 1080p 7.32.

Figure 3 (chart): Time [ms] for HD 1080p video (Full-HD) by number of features. CPU: 98.7 (5500), 120.1 (7000), 147.3 (8500), 172.3 (10000). GPU: 10.7 (5500), 11.9 (7000), 13.7 (8500), 15.3 (10000).

Figure 4 (chart): Overall runtime for Full-HD video, axis from 0 to 400. Bars: GPU and CPU for 5000, 7000 and 10000 features. Stacked components: Selecting, Enf. Min. Dist., Tracking.

JOANNEUM RESEARCH Forschungsgesellschaft mbH
Institute of Information Systems
Hannes Fassold
Steyrergasse 17, 8010 GRAZ, AUSTRIA
Phone +43 316 876-1119, Fax +43 316 876-1191
[email protected]
[email protected]
www.joanneum.at/iis

This work was partially supported by the European Commission under the contracts FP7-215475, “2020 3D Media – Spatial Sound and Vision” (www.20203dmedia.eu), FP7-216465, “SCOVIS – Self-Configurable Cognitive Video Supervision” (www.scovis.eu), and FP7-231161, “PrestoPRIME” (www.prestoprime.org).

Fast GPU-based KLT feature point tracking using CUDA

Introduction
■ Automatic selection and tracking of feature points is a basic task for many computer vision (CV) algorithms (e.g. structure from motion, object tracking)
■ Very popular due to good performance: KLT algorithm [Kanade, Lucas and Tomasi]
■ An optimized OpenCV implementation is available, but it is not realtime capable (i.e. it does not reach > 30 fps) for High Definition (HD) video
■ GPUs provide a lot of computation power and can be used for general-purpose computation (CUDA)
■ Goal: Port the KLT feature point tracker to CUDA to achieve realtime performance for HD video

Feature point selection
■ Algorithm
1. Calculate cornerness for each pixel
2. Enforce min. quality (5 – 10 % of max. cornerness) and do non-maxima suppression
3. Enforce minimum distance between all feature points
■ First two steps are done on the GPU
■ Step 3 is inherently serial, therefore done on CPU
■ Alternative method for step 3 was implemented (uses a mask image, much faster for many feature points)
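The two variants of step 3 can be sketched as follows. This is a minimal CPU reference in Python, not the poster's implementation: the function names, the strongest-first processing order, the disc-shaped blocked region, and the assumption of integer, in-bounds pixel coordinates are all illustrative choices. The first function compares each candidate against every point accepted so far, which is why each decision depends on the previous ones. The second replaces that comparison with a single lookup in a mask image.

```python
import math


def enforce_min_distance_serial(candidates, min_dist):
    """Greedy pairwise variant. candidates: list of (x, y, cornerness)."""
    kept = []
    min_dist_sq = min_dist * min_dist
    # Strongest corners first, so a weak corner never displaces a strong one.
    for x, y, score in sorted(candidates, key=lambda c: -c[2]):
        # Each decision depends on every earlier decision: the serial part.
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_dist_sq for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept


def enforce_min_distance_mask(candidates, min_dist, width, height):
    """Mask variant: every accepted point blocks a disc in a mask image."""
    blocked = [[False] * width for _ in range(height)]
    kept = []
    reach = math.ceil(min_dist)
    min_dist_sq = min_dist * min_dist
    for x, y, score in sorted(candidates, key=lambda c: -c[2]):
        if blocked[y][x]:  # one lookup instead of a scan over all kept points
            continue
        kept.append((x, y, score))
        # Mark all pixels closer than min_dist to the accepted point as blocked.
        for by in range(max(0, y - reach), min(height, y + reach + 1)):
            for bx in range(max(0, x - reach), min(width, x + reach + 1)):
                if (bx - x) ** 2 + (by - y) ** 2 < min_dist_sq:
                    blocked[by][bx] = True
    return kept
```

In the pairwise variant the cost per candidate grows with the number of points already accepted. In the mask variant it is one lookup plus a fixed-size fill per accepted point, which is consistent with the poster's remark that the mask method pays off for many feature points.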

Feature point tracking
■ Algorithm
1. Calculate image pyramids
2. For each feature on every pyramid level (from coarse to fine), calculate the optical flow iteratively, using Gauss-Newton method
■ All steps are done on the GPU (each feature point is one thread)
■ For a low number of feature points (< 1000) modern GPUs are under-utilized
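The per-feature work described above can be illustrated with a small CPU reference sketch in Python. It is not the poster's CUDA code: the 2x2 box-filter pyramid, bilinear sampling, central-difference gradients, window radius, and iteration limits are assumptions chosen for brevity, and all function names are hypothetical. In a GPU port along the lines the poster describes, the body of `track_point` would correspond to what one thread executes for its feature point.

```python
import math


def bilinear(img, x, y):
    """Sample img (list of rows) at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bottom = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def build_pyramid(img, levels):
    """Level 0 is the full image; each further level averages 2x2 blocks."""
    pyramid = [img]
    for _ in range(levels - 1):
        src = pyramid[-1]
        h, w = len(src) // 2, len(src[0]) // 2
        pyramid.append([[(src[2 * y][2 * x] + src[2 * y][2 * x + 1] +
                          src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4.0
                         for x in range(w)] for y in range(h)])
    return pyramid


def track_point(pyr_i, pyr_j, px, py, radius=3, iterations=10, eps=1e-3):
    """Estimate the flow (dx, dy) of the feature at (px, py) from image I to J."""
    dx = dy = 0.0
    for level in range(len(pyr_i) - 1, -1, -1):  # coarse to fine
        img_i, img_j = pyr_i[level], pyr_j[level]
        cx, cy = px / 2.0 ** level, py / 2.0 ** level
        samples = []
        gxx = gxy = gyy = 0.0
        # Template values, gradients, and the 2x2 Gauss-Newton matrix G.
        for v in range(-radius, radius + 1):
            for u in range(-radius, radius + 1):
                x, y = cx + u, cy + v
                ix = (bilinear(img_i, x + 1, y) - bilinear(img_i, x - 1, y)) / 2.0
                iy = (bilinear(img_i, x, y + 1) - bilinear(img_i, x, y - 1)) / 2.0
                samples.append((x, y, bilinear(img_i, x, y), ix, iy))
                gxx += ix * ix
                gxy += ix * iy
                gyy += iy * iy
        det = gxx * gyy - gxy * gxy
        if det > 1e-12:  # skip levels where the window has no 2D structure
            for _ in range(iterations):
                bx = by = 0.0
                for x, y, value, ix, iy in samples:
                    diff = value - bilinear(img_j, x + dx, y + dy)
                    bx += diff * ix
                    by += diff * iy
                # Gauss-Newton step: solve G * step = b for the 2x2 system.
                step_x = (gyy * bx - gxy * by) / det
                step_y = (gxx * by - gxy * bx) / det
                dx += step_x
                dy += step_y
                if step_x * step_x + step_y * step_y < eps * eps:
                    break
        if level > 0:  # propagate the estimate to the next finer level
            dx *= 2.0
            dy *= 2.0
    return dx, dy
```

Because each call touches only its own window and shares no state with other features, many such calls can run independently, which fits the one-thread-per-feature mapping. It also suggests why a small number of features leaves a GPU with many cores idle.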

Evaluation
■ Tracker quality (subjective assessment)
➜ Most of the feature points and their optical flow are the same for the CPU and GPU implementations
➜ Differences occur mainly for feature points which appear to be tracked incorrectly by both implementations
■ Tracker runtime
➜ CPU (OpenCV) routine uses IPP & OpenMP internally, runs on Intel Xeon QuadCore 2.4 GHz
➜ GPU routine runs on GeForce GTX 280
➜ Achieves overall 5 – 10 times speedup
➜ Higher speedup for larger images and more feature points
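As a worked check of the runtime claims, the per-stage speedups can be computed from the values printed in the charts of Figures 2 and 3. The sketch below assumes that the larger number of each pair is the CPU time and the smaller the GPU time; the chart text itself does not label them individually. The helper names and the 30 fps frame budget test are illustrative.

```python
# Runtimes in ms as (cpu, gpu), read from the Figure 2 and Figure 3 charts.
# Assumption: the larger value of each pair is the CPU time.
selection_ms = {"SD": (16.48, 1.92), "HD 720p": (35.57, 3.44), "HD 1080p": (82.54, 7.32)}
tracking_ms = {5500: (98.7, 10.7), 7000: (120.1, 11.9),
               8500: (147.3, 13.7), 10000: (172.3, 15.3)}


def speedups(table):
    """Return CPU time divided by GPU time for every entry of the table."""
    return {key: cpu / gpu for key, (cpu, gpu) in table.items()}


def fits_realtime(frame_ms, fps=30):
    """True if a frame processed in frame_ms stays within the per-frame budget."""
    return frame_ms <= 1000.0 / fps


for name, table in (("selection", selection_ms), ("tracking", tracking_ms)):
    for key, factor in speedups(table).items():
        print(f"{name} {key}: {factor:.1f}x")
```

Under that assumption both stages come out at roughly 8.6x to 11.3x, growing with image size and feature count. The overall 5 – 10 times figure is lower, which would be consistent with the minimum distance step staying on the CPU. For Full-HD with 10000 features, selection plus tracking on the GPU sums to 7.32 + 15.3 = 22.62 ms, inside the 33.3 ms budget of 30 fps, although this sum leaves out the minimum distance enforcement.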

Conclusion
■ Goal of HD realtime tracking achieved
■ Key for successfully porting an algorithm to the GPU
➜ Parallelization of the algorithm into a large number of loosely coupled threads possible
➜ Knowledge of the GPU architecture (e.g. the different memory types and their properties)

Hannes Fassold, Jakub Rosner, Peter Schallauer and Werner Bailer

Figure 1: KLT feature points (green: GPU implementation, red: CPU implementation, yellow: both)

Figure 2: Runtime for feature point selection for different image sizes (without minimum distance enforcement)

Figure 3: Runtime for feature point tracking for Full-HD images for different numbers of feature points

Figure 4: Overall runtime of the KLT feature point tracker for Full-HD (1920 x 1080) video