An Algorithm to Compute Independent Sets of Voxels for Parallelization of ICD-based Statistical...

Post on 04-Jan-2016

An Algorithm to Compute Independent Sets of Voxels for Parallelization of ICD-based Statistical Iterative Reconstruction

Sungsoo Ha and Klaus Mueller

Department of Computer Science

Visual Analytics and Imaging (VAI) Lab

Stony Brook University and SUNY Korea

Motivation

• Statistical Iterative Reconstruction Algorithm

[Figure: FBP and SIR reconstructions]

Motivation

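• Statistical Iterative Reconstruction Algorithm
• Weighted Least Squares (WLS) cost function

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \ge 0} \left\{ \frac{1}{2} (\mathbf{y} - \mathbf{A}\mathbf{x})^T \mathbf{W} (\mathbf{y} - \mathbf{A}\mathbf{x}) + R(\mathbf{x}) \right\}$$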

y Measured projection data

x Attenuation coefficients of the object to be reconstructed

A System matrix with size of

W Diagonal matrix for statistical weighting

R(x) Regularization
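The cost function above can be evaluated directly on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a small dense system matrix stored as a list of rows, a diagonal W stored as a vector `w`, and a placeholder regularizer that defaults to zero. The function name `wls_cost` and the toy values are hypothetical.

```python
def wls_cost(A, w, y, x, regularizer=lambda x: 0.0):
    """Evaluate 0.5 * (y - Ax)^T W (y - Ax) + R(x) with W = diag(w).

    A is a list of rows (M x N), w and y have length M, x has length N.
    The regularizer is a placeholder (assumption); it defaults to R(x) = 0.
    """
    # Residual between the measured and the forward-projected data.
    residual = [y_i - sum(a * x_k for a, x_k in zip(row, x))
                for row, y_i in zip(A, y)]
    data_term = 0.5 * sum(w_i * r * r for w_i, r in zip(w, residual))
    return data_term + regularizer(x)


# Toy example: 2 line integrals, 2 voxels.
A_toy = [[1.0, 0.0],
         [1.0, 1.0]]
w_toy = [1.0, 2.0]
y_toy = [1.0, 3.0]

cost_off = wls_cost(A_toy, w_toy, y_toy, [1.0, 1.0])    # residual is (0, 1)
cost_exact = wls_cost(A_toy, w_toy, y_toy, [1.0, 2.0])  # residual is (0, 0)
```

In a real CT problem A is far too large to store densely, which is why the forward and back projections dominate the cost.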

Motivation

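• Statistical Iterative Reconstruction Algorithm
• Weighted Least Squares (WLS) cost function

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \ge 0} \left\{ \frac{1}{2} (\mathbf{y} - \mathbf{A}\mathbf{x})^T \mathbf{W} (\mathbf{y} - \mathbf{A}\mathbf{x}) + R(\mathbf{x}) \right\}$$

• High cost of forward & back projections
• The iterative nature of the algorithm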

Motivation: optimization

                   ICD-based   CG-based
Convergence rate   FAST        SLOW
Parallelization    HARD        EASY

[Figure: GCD (Fessler et al. 1997), B-ICD (Benson et al. 2010), and ABCD (Fessler et al. 2011), drawn on x, y, z axes]

Goal

• Devise an algorithm
– Find voxels that are “fully” independent of each other
– No additional algorithmic & computational complexity
– More accurate (also more complicated) pattern
– Applicable to all CT geometries

                   ICD-based   CG-based
Convergence rate   FAST        SLOW
Parallelization    HARD        EASY

Independence among voxels

• Single voxel update scheme
– Minimizing along one direction at a time

[Equation labels: correction, weighting, update]
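A single-voxel update can be sketched for the data term of the WLS cost. This is a minimal illustration under stated assumptions, not the update shown on the slide: the regularizer R(x) is omitted, A is a small dense list of rows, and W is stored as a vector `w`. The mapping of the slide's labels (correction, weighting, update) onto the terms in the comments is also an assumption, and `icd_update` is a hypothetical name.

```python
def icd_update(A, w, y, x, j):
    """One coordinate-descent step for voxel j on 0.5*(y-Ax)^T W (y-Ax).

    Assumptions: W = diag(w), no regularizer, dense A as a list of rows.
    x is modified in place and returned.
    """
    M = len(A)
    # Current residual e = y - A x.
    e = [y[i] - sum(a * x_k for a, x_k in zip(A[i], x)) for i in range(M)]
    # "Correction": weighted back projection of the residual along column j.
    correction = sum(A[i][j] * w[i] * e[i] for i in range(M))
    # "Weighting": curvature of the cost along coordinate j.
    weighting = sum(A[i][j] * w[i] * A[i][j] for i in range(M))
    if weighting > 0:
        # "Update": exact 1-D minimizer, clipped to enforce x >= 0.
        x[j] = max(0.0, x[j] + correction / weighting)
    return x


# Toy problem whose exact solution is x = (2, 3).
A_toy = [[1.0, 0.0],
         [1.0, 1.0],
         [0.0, 1.0]]
w_toy = [1.0, 1.0, 1.0]
y_toy = [2.0, 5.0, 3.0]

x_est = [0.0, 0.0]
for _ in range(50):              # 50 sweeps over both voxels
    for j in range(len(x_est)):
        icd_update(A_toy, w_toy, y_toy, x_est, j)
```

Each step reads only the rows where column j is non-zero, which is why two voxels whose columns share no row can be updated at the same time.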

Single voxel update

[Figure: x-ray source, object containing voxel A, flat detector, and the detector region related to voxel A]

[Figure: the same setup with a second voxel B and the detector region related to voxel B]

Independent voxel

CT system matrix view

System matrix:
– M: # of line-integrals
– N: # of voxels

[Figure: columns A, B, and C of the system matrix (M rows, N columns)]

• Independent
– A, B
• Dependent
– A, C (overlap between A & C)
– B, C (overlap between B & C)
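The independence test in the system matrix view amounts to checking whether two columns share a non-zero row. The sketch below is illustrative only: the 5 x 3 matrix is a made-up toy whose columns play the roles of voxels A, B, and C, and the helper names `support` and `independent` are hypothetical.

```python
def support(A, j):
    """Row indices (line integrals) where column j of A is non-zero."""
    return {i for i, row in enumerate(A) if row[j] != 0}


def independent(A, j, k):
    """Voxels j and k are independent if no line integral touches both."""
    return support(A, j).isdisjoint(support(A, k))


# Toy system matrix: columns 0, 1, 2 stand for voxels A, B, C.
A_toy = [
    [1, 0, 0],
    [1, 0, 1],   # row shared by A and C
    [0, 0, 1],
    [0, 1, 1],   # row shared by B and C
    [0, 1, 0],
]

a_b = independent(A_toy, 0, 1)   # A and B share no row
a_c = independent(A_toy, 0, 2)   # A and C overlap
b_c = independent(A_toy, 1, 2)   # B and C overlap
```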

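Finding set of independent voxels

• Knapsack problem:

$$\min \operatorname{ZERO}\Big\{ \sum_{g \in G} g \Big\} \quad \text{s.t.} \quad G = \{ a_k \mid 1 \le k \le N \},$$

$$a_m \cap a_n = \mathbf{0} \quad \forall\, a_m, a_n \in G,\ m \ne n$$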

Finding set of independent voxels

• Knapsack problem:
• Combinatorial NP-hard problem

$$\min \operatorname{ZERO}\Big\{ \sum_{g \in G} g \Big\} \quad \text{s.t.} \quad G = \{ a_k \mid 1 \le k \le N \},$$

$$a_m \cap a_n = \mathbf{0} \quad \forall\, a_m, a_n \in G,\ m \ne n$$

[Figure: columns A, B, C, D, E, F; G = A, B, C; X]

Finding set of independent voxels

• Knapsack problem:
• Combinatorial NP-hard problem
• First-Fit Decreasing algorithm

1. Sort voxels in descending order of the number of non-zero elements

2. Pick the voxel that contains the largest number of non-zero elements

3. Invalidate all voxels that depend on the selected voxel
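The three steps above can be sketched as a greedy grouping routine. This is a minimal sketch under assumptions rather than the authors' code: each voxel is represented by the set of row indices where its system-matrix column is non-zero, and the steps are assumed to repeat within a group until no valid voxel is left, then again on the leftover voxels to form the next group. The function name `independent_groups` and the toy columns are hypothetical.

```python
def independent_groups(columns):
    """Greedy first-fit-decreasing grouping of voxels.

    columns[j] is the set of row indices (line integrals) where voxel j
    has non-zero system-matrix entries. Returns a list of groups; voxels
    within a group have pairwise disjoint row sets.
    """
    # Step 1: sort voxels by number of non-zero elements, largest first.
    remaining = sorted(range(len(columns)),
                       key=lambda j: len(columns[j]), reverse=True)
    groups = []
    while remaining:
        used_rows = set()        # rows already claimed by this group
        group, leftover = [], []
        for j in remaining:
            if used_rows.isdisjoint(columns[j]):
                # Step 2: take the largest voxel that is still valid.
                group.append(j)
                used_rows |= columns[j]
            else:
                # Step 3: voxel depends on a selected one; defer it.
                leftover.append(j)
        groups.append(group)
        remaining = leftover     # assumed repetition for the next group
    return groups


# Toy columns for voxels A (0), B (1), C (2); C overlaps both A and B.
toy_columns = [{0, 1}, {3, 4}, {1, 2, 3}]
toy_groups = independent_groups(toy_columns)
```

With the toy columns, C is picked first because it has the most non-zero elements, which invalidates A and B for that group; they then form the second group together. Because the grouping depends only on the sparsity pattern of the system matrix, it can be computed once per geometry.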

Experiment settings

• Cone-beam CT geometry
• Volume: 128 x 128 x 128 (1 x 1 x 1 mm)
• Flat detector: 512 x 512 (1 x 1 mm)
• SAD: 600 mm
• SID: 1000 mm
• The number of projections
– Varying from 1 to 360
– Uniformly distributed over 360 degrees

Extreme case study

# views   # independent groups   Max. size of independent group   Avg. size of independent group
1         187                    16,186                           11,214
360       13,569                 449                              154

• ABCD (Axial Block Coordinate Descent) algorithm
• Along z-direction: 128

More parallelism
No additional complexity

Theoretical parallelism

# views   # independent groups   Max. size of independent group   Avg. size of independent group
1         187                    16,186                           11,214
360       13,569                 449                              154

• Expected speed-up (theoretical parallelism) with ideal GPU implementation

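Estimated gain of GPU-accelerated OS-SIR

$$\mathrm{gain}^{\mathrm{GPU}}_{\mathrm{OS\text{-}SIR}} = \mathrm{parallelism}\,\bigl(360 / \#\text{ of views per subset}\bigr)$$

$$\mathrm{parallelism} = \frac{128^3}{\#\,\text{updates}}$$

[Plot: $\mathrm{gain}^{\mathrm{GPU}}_{\mathrm{OS\text{-}SIR}}$ versus number of views / subset]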
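The parallelism formula can be checked against the table. The sketch below assumes that "# updates" equals the number of independent groups, which is an interpretation rather than something the slides state explicitly; under that assumption the rounded-down quotient reproduces the "Avg. size of independent group" column. The names are hypothetical.

```python
N_VOXELS = 128 ** 3          # 128 x 128 x 128 volume from the experiment settings
ABCD_PARALLELISM = 128       # ABCD updates one axial line (z-direction) at a time


def average_parallelism(num_updates):
    """Average number of voxels updated per sequential step (rounded down)."""
    return N_VOXELS // num_updates


one_view = average_parallelism(187)       # 1 view: 187 independent groups
full_scan = average_parallelism(13_569)   # 360 views: 13,569 independent groups
```

Under this reading, even the 360-view case averages 154 voxels per step, above the 128 available along the z-direction with ABCD.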

Independence visualization

[Figure: independence visualization for 1, 5, 10, 20, 45, 90, 180, and 360 views, at slices 32 (bottom), 64 (middle), and 96 (top)]

Independence visualization

• At 360 views

[Figure: slices 32 (bottom) and 96 (top)]

Independence visualization

• A clue for optimism

[Figure: slices 32 (bottom) and 96 (top), for 1 view and 360 views]

Conclusion & Future Work

• More parallelism than existing methods
– No additional complexity
– One-time computation
– Applicable to all CT geometries

• Hints for GPU implementation of SIR

• Apply to actual GPU-accelerated SIR framework
– Determine optimal computational performance
– Convergence rate

Thanks!

• Q&A

• This research was partially supported by NSF grant IIS-11732 and the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ‘IT Consilience Creative Program (ITCCP)’ (NIPA-2013-H0203-13-1001) supervised by NIPA (National IT Industry Promotion Agency).