Porting a 3D image registration application to multi-core environment

K. Sándor, M. Kozlovszky, V. Kamarás, L. Ficsór, S. V. Varga, B. Molnár. HPCS 2008, April 14, 2008, Ottawa, Canada

Budapest Tech, John von Neumann Faculty of Informatics (NIK).

Transcript of Porting a 3D image registration application to multi-core environment

Page 1: Porting a 3D image registration application to multi-core environment

K. Sándor, M. Kozlovszky, V. Kamarás, L. Ficsór, S. V. Varga, B. Molnár

HPCS 2008, April 14, 2008, Ottawa, Canada

Page 2: Porting a 3D image registration application to multi-core environment

1970 Kandó Polytechnic of Electrical Engineering

1970 Department of Computing

Budapest Tech (established 2000): integration of 3 polytechnics

John von Neumann Faculty of Informatics (NIK) (2000)

Total number of students in the faculty: ~1,000

April 14, 2008 Budapest Tech 2

Page 3: Porting a 3D image registration application to multi-core environment

The goal is to speed up a linear image registration code by using the Cell architecture.

Histological, cytological and fluorescent slides: 100-150 MB for each slide.

1 object consists of 100-300 slides.

“Registration”: the process of transforming input images into one coordinate system.

2D → 3D image reconstruction

Input slides (tissue slices): situated in the picture at different positions and different angles, significantly strained at random parts, and might be disordered during the digital acquisition.

Already implemented algorithm: Coarse Mutual Adjustment

Windows platform, sequential task
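The slide and object counts above imply a substantial raw data volume per object. A minimal arithmetic sketch, assuming decimal units (1 GB = 1000 MB); the helper name `object_size_gb` is hypothetical and only the input ranges come from the slide:

```python
def object_size_gb(slides, mb_per_slide):
    """Raw size of one object in GB, assuming 1 GB = 1000 MB."""
    return slides * mb_per_slide / 1000


# Ranges quoted on the slide: 100-300 slides, 100-150 MB each.
smallest = object_size_gb(100, 100)
largest = object_size_gb(300, 150)
print(f"{smallest:.0f} GB to {largest:.0f} GB per object")
```

Under these assumptions one object spans roughly 10 to 45 GB of input images.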

Page 4: Porting a 3D image registration application to multi-core environment

Input: bitmap images

Calculation of center-of-mass

Image pre-processing (mask creation)

First search – approximate slew

Second search – slew

Output: center-of-mass coordinates, slew

Threshold, Open, Median filter

Rotation, Comparison
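Two of the steps listed on this slide can be illustrated in a few lines: the center-of-mass calculation on a binary mask, and a two-pass search that first finds an approximate value on a coarse grid and then refines it on a fine grid. This is only a sketch under stated assumptions. The slides do not say how the searches score a candidate, so the `score` callback, the step sizes, and the function names `center_of_mass` and `coarse_to_fine` are hypothetical, and the sketch assumes a lower score means a better match.

```python
def center_of_mass(mask):
    """Return the (row, col) centroid of the nonzero pixels of a 2D binary mask."""
    total = row_sum = col_sum = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                total += 1
                row_sum += r
                col_sum += c
    if total == 0:
        raise ValueError("mask has no foreground pixels")
    return row_sum / total, col_sum / total


def coarse_to_fine(score, lo, hi, coarse_step, fine_step):
    """Minimize score(x): coarse pass over [lo, hi], then a fine pass around the coarse best."""
    def best_on_grid(start, stop, step):
        n = int(round((stop - start) / step))
        candidates = [start + i * step for i in range(n + 1)]
        return min(candidates, key=score)

    approx = best_on_grid(lo, hi, coarse_step)          # first search: approximate value
    return best_on_grid(approx - coarse_step,           # second search: refined value
                        approx + coarse_step, fine_step)


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
centroid = center_of_mass(mask)

# Toy mismatch function with its minimum at 12.5 (assumed, for illustration only).
best = coarse_to_fine(lambda a: (a - 12.5) ** 2, -90, 90, 10, 0.5)
print(centroid, best)
```

The two-pass structure evaluates far fewer candidates than a single fine pass over the whole range, which is presumably why the pipeline splits the search in two.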

Page 5: Porting a 3D image registration application to multi-core environment

Code adaptation to the Cell SDK 2.1 (following the original source code as much as possible)

Code parallelization to the dual-threaded PPE (identifying concurrent tasks)

Offloading concurrent tasks to SPEs (utilizing parallelization …)

Page 6: Porting a 3D image registration application to multi-core environment

Elapsed time of the porting phases (pie chart): offloading concurrent tasks to SPEs 50%, code adaptation to the Cell SDK 2.1 40%, parallelization to the dual-threaded PPE 10%

Code adaptation to the Cell SDK 2.1

40% of total time to adapt the code to the Cell SDK 2.1:
- analysis of original software code
- analysis and search for appropriate substantial libraries (IPL -> OpenCV)
- implementation of missing functions (image I/O, 1bpp image operations)
- re-design of class structures

Page 7: Porting a 3D image registration application to multi-core environment

Code parallelization to the dual-threaded PPE

10% of total time to parallelize the code to the dual-threaded PPE:
- strongly modular source code
- standard C++ functions supported
- almost no additional data transfer related implementation
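The idea of running two independent tasks on two threads can be sketched as follows. This is not the authors' code, which was written in C++ for the PPE; Python is used here purely for illustration. The per-mask task `foreground_count` and the wrapper `process_pair` are hypothetical stand-ins for whatever concurrent tasks were identified in the original source.

```python
from concurrent.futures import ThreadPoolExecutor


def foreground_count(mask):
    """Hypothetical per-mask task: count the nonzero pixels."""
    return sum(1 for row in mask for value in row if value)


def process_pair(mask_a, mask_b):
    """Run the same task on both masks of a pair, one thread each."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(foreground_count, mask_a)
        future_b = pool.submit(foreground_count, mask_b)
        return future_a.result(), future_b.result()


counts = process_pair([[1, 0], [1, 1]], [[0, 0], [0, 1]])
print(counts)
```

Because both threads share one address space, no explicit data transfer code is needed, which matches the slide's remark that this phase required almost none.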

Page 8: Porting a 3D image registration application to multi-core environment

Offloading concurrent tasks to SPEs

50% of total time to offload concurrent tasks to the SPEs:
- offload strategy, design
- SPE-specific instructions (‘intrinsics’)
- further substantial function development
- implementation of data transfer mechanism
- debugging

Page 9: Porting a 3D image registration application to multi-core environment

Page 10: Porting a 3D image registration application to multi-core environment

Overall runtime results per mask pair

Performance measurement results (bar chart: processing time per mask-pair in seconds, axis 0-8, bars ORIG, LIN, DT, FT)

ORIG – original code using IPL (~3.35 s) (sequential procedures, utilizing SIMD instructions)

LIN – sequential code ported to the SDK 2.1, Linux (~6.85 s)

DT – dual-threaded parallelized code (~4.1 s)

FT – fully threaded code on the Cell Broadband Engine (~1 s)

Chart annotations: ~2x, ~3x, >2x
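The ratios between the four variants follow directly from the approximate per-mask-pair times quoted on this slide. A small sketch of that arithmetic; the `times` dictionary and the `ratio` helper are illustrative names, and since the transcript does not say which pair each chart annotation refers to, no mapping is assumed here:

```python
# Approximate processing times per mask pair, in seconds, as quoted on the slide.
times = {"ORIG": 3.35, "LIN": 6.85, "DT": 4.1, "FT": 1.0}


def ratio(slower, faster):
    """How many times longer `slower` takes than `faster`."""
    return times[slower] / times[faster]


pairs = [("LIN", "ORIG"), ("LIN", "DT"), ("DT", "FT"), ("ORIG", "FT"), ("LIN", "FT")]
for slower, faster in pairs:
    print(f"{slower} / {faster}: {ratio(slower, faster):.2f}x")
```

With these figures the straight port to Linux is about twice as slow as the original, while the fully threaded version is a bit more than three times faster than the original.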

Page 11: Porting a 3D image registration application to multi-core environment

CELL Blade QS20 hosts the project's website (http://cell.nik.bmf.hu/)
◦ Off-line Demo illustrating the outputs of the ported application
◦ On-line Demo that gives results on the fly
◦ Animated Demo illustrating the infrastructure of the application being developed

Page 12: Porting a 3D image registration application to multi-core environment

IBM US ‘Development of a microscopy application for fast 3D image modeling and reconstruction’ Faculty Award

Dr. Dezső Sima DSc, John von Neumann Faculty of Informatics

László Kiss Kollár, IBM Global Engineering Solutions

Balázs Molnár, John von Neumann Faculty of Informatics, Biotech Group

Page 13: Porting a 3D image registration application to multi-core environment

Thank you!