Damascene: Highly Parallel Image Contour Detection
Bryan Catanzaro, Narayanan Sundaram, Bor-Yiing Su, Yunsup Lee, Mark Murphy, Kurt Keutzer

Image Contour Detection
• Image contour detection is fundamental to image segmentation and many other computer vision problems

Overall Performance
• Image Size: 481 by 321, 154401 pixels in total
• CPU/GPU Runtime: 236.7s/2.081s
• Speedup: 114x

Program Flow
• Convert Color Space: CPU/GPU: 0.09/0.001 (s), 90x
• Textons: K-means: CPU/GPU: 8.58/0.169 (s), 51x
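
The speedup figures quoted above are the ratio of CPU time to GPU time. As a quick sanity check, a minimal Python sketch (the helper name `speedup` is ours, not from the poster) reproduces them from the listed runtimes:

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Return how many times faster the GPU run is than the CPU run."""
    if gpu_seconds <= 0:
        raise ValueError("GPU time must be positive")
    return cpu_seconds / gpu_seconds


# Runtimes in seconds, as listed on the poster.
timings = {
    "Overall": (236.7, 2.081),
    "Convert Color Space": (0.09, 0.001),
    "Textons: K-means": (8.58, 0.169),
}

for name, (cpu, gpu) in timings.items():
    print(f"{name}: {speedup(cpu, gpu):.0f}x")
```

Rounded to whole numbers, the three ratios come out as 114x, 90x and 51x, matching the poster.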
• Local Cues: CPU/GPU: 53.18/0.78 (s), 68x
  Channels and radii R:
  BG     CGA    CGB    TG
  R=3    R=5    R=5    R=5
  R=5    R=10   R=10   R=10
  R=10   R=20   R=20   R=20
• Combine: CPU/GPU: 277/0.833 (ms), 332x
• Non-max Suppression: CPU/GPU: 130/0.31 (ms), 419x
• Intervening Contour: CPU/GPU: 6.32/0.034 (s), 185x
• Generalized Eigensolver: CPU/GPU: 151.2/0.81 (s), 186x

[Figure: L, A, B color channels]
[Figure: Image, Human Generated Contours, Machine Generated Contours]
[Figure: CPU Runtime and GPU Runtime breakdown charts; segments: Convert, Kmeans, Local Cues, Combine, Nonmax, Intervening, Eigensolver, OE, Combine]

gPb Algorithm: Current Leader
• global Probability of boundary [Maire, Arbelaez, Fowlkes, Malik, CVPR 2008]
• Currently, the most accurate image contour detector
• 5.8 minutes per small image (481 by 321) limits its applicability
• Too slow for interactive photo editing
• Too slow even for Image Retrieval

Precision-Recall Graph
[Figure: Precision (vertical axis, 0 to 1) vs. Recall (horizontal axis, 0 to 1); curves: CVPR 2008, damascene]
• We achieve slightly better accuracy on the Berkeley Segmentation Dataset
• Compared against human-segmented "ground truth"
• F-measure 0.70 for both
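
The F-measure quoted for the precision-recall comparison is the harmonic mean of precision and recall, and boundary benchmarks typically report the best value found along the curve. A minimal sketch follows; the function names and the sample curve points are illustrative assumptions, not data from the poster:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def best_f_measure(curve):
    """Return the largest F-measure over (precision, recall) points."""
    return max(f_measure(p, r) for p, r in curve)


# Hypothetical points on a precision-recall curve (not measured data).
sample_curve = [(0.90, 0.40), (0.75, 0.65), (0.60, 0.80), (0.40, 0.95)]
print(f"best F-measure: {best_f_measure(sample_curve):.3f}")
```

Because the harmonic mean penalizes imbalance, a detector scores well only where precision and recall are both high, which is why a single number can summarize the whole curve.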

Program Flow (continued)
• Oriented Energy Combination: CPU/GPU: 2300/16.5 (ms), 140x
• Global Pb (Combine, Normalize): CPU/GPU: 437/0.241 (ms), 1813x

Platform: Nvidia GTX200 Series
Specifications         GTX280
Processors             30 @ 1.3 GHz
Physical SIMD Width    8
SP GFLOPS              933
Memory Bandwidth       141.7 GB/s
Register File          1.875 MB
Local Store            480 kB
Memory                 1 GB

Conclusion
• Damascene provides the highest-quality image contour detection at user-acceptable rates
• It demonstrates the transformational speedup potential of manycore architectures
• Damascene was enabled by the collaborative environment at the Berkeley UPCRC
• Future work will generalize Damascene into a case study for application and programming frameworks (stay tuned)
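
The per-stage timings listed on the poster can be tallied to see where the time goes on each processor. The sketch below converts every stage to seconds and prints each stage's share of the summed stage time. The variable and function names are ours, and the stage sums need not equal the reported end-to-end runtimes (236.7 s and 2.081 s), which presumably include work outside these stages:

```python
# (CPU seconds, GPU seconds) per stage, converted from the poster's figures.
STAGES = {
    "Convert Color Space": (0.09, 0.001),
    "Textons: K-means": (8.58, 0.169),
    "Local Cues": (53.18, 0.78),
    "Combine": (0.277, 0.000833),
    "Non-max Suppression": (0.130, 0.00031),
    "Intervening Contour": (6.32, 0.034),
    "Generalized Eigensolver": (151.2, 0.81),
    "Oriented Energy Combination": (2.3, 0.0165),
    "Global Pb (Combine, Normalize)": (0.437, 0.000241),
}


def shares(column: int) -> dict:
    """Fraction of summed stage time spent in each stage (0 = CPU, 1 = GPU)."""
    total = sum(times[column] for times in STAGES.values())
    return {name: times[column] / total for name, times in STAGES.items()}


cpu_share, gpu_share = shares(0), shares(1)
for name in STAGES:
    print(f"{name:32s} CPU {cpu_share[name]:6.1%}  GPU {gpu_share[name]:6.1%}")
```

On both processors the eigensolver and local cues stages account for most of the summed stage time, so they remain the natural targets for further optimization.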