
Signal Processing, Vol. 27, No. 1, April 1992, pp. 51-64. Elsevier

Image reconstruction on hypercube computers: Application to electron microscopy*

E.L. Zapata

Departamento Arquitectura de Computadores, Facultad Informática, Universidad Málaga, Málaga 29013, Spain

I. Benavides

Departamento Electrónica, E.U. Politécnica, Córdoba 14004, Spain

F.F. Rivera, J.D. Bruguera, T.F. Pena

Departamento Electrónica, Facultad Física, Universidad Santiago, Santiago 15706, Spain

J.M. Carazo¹

Centro de Biología Molecular, CSIC, Universidad Autónoma de Madrid, Madrid 28049, Spain

Received 25 April 1991 Revised 8 August 1991

Abstract. Filtered backprojection is a popular algorithm for the reconstruction of n-dimensional signals from their (n-1)-dimensional projections (in the sense of line integrals). Here we specifically treat the problem of the 3-dimensional (3D) reconstruction of an object from its 2-dimensional (2D) projection images. In this work we implement the filtered backprojection method on hypercube computers. The parallel algorithm is general in the sense that it does not impose any restriction on the problem space dimensions and is adaptable to any hypercube dimension. The flexibility of the algorithm is rooted in the methodology developed for embedding algorithms into hypercubes. Finally, we analyze the complexity of the parallel algorithm and apply it to the 3-dimensional reconstruction of the oligomer formed by the chaperonin GroEL from E. coli.


¹ Present address: Centro Nacional de Biotecnología, CSIC, Universidad Autónoma de Madrid, Madrid 28049, Spain.

* This work was supported by the Ministry of Education and Science (CICYT) of Spain under contracts TIC88-0094, MIC88-0549 and PB87-0365, the Xunta de Galicia XUGA80406488, and the Spanish institution Fundación Ramón Areces (Institutional Support to Centro de Biología Molecular).

Elsevier Science Publishers B.V.

Keywords. Filtered backprojection, image reconstruction, hypercube computers, parallel computation.

1. Introduction

The problem of the 3D reconstruction of an object from its 2D projection images (in the sense of line integrals) is very common in such diverse fields as astronomy, medical imaging and electron microscopy, among many others [2, 10]. In general, reconstruction algorithms can be grouped into two classes: series expansion methods, exemplified by the ART (Algebraic Reconstruction Technique) algorithm and its many variants, and transform methods, which are usually based on a discretized version of the Radon inversion formula [10, 11]. In this work we will concentrate on one of the best known and most popular transform methods, the so-called convolution (or filtered) backprojection method.

From the computational viewpoint, the main problem that reconstruction algorithms present is the long processing time that they require. This time is related more to the huge amount of data to be processed than to the number of operations to be performed on them. It is thus clear that reconstruction algorithms, like image processing algorithms in general, need parallel processing methods in order to achieve reasonable processing times. To date, many efforts have been directed to the design and implementation of dedicated architectures for image processing. These efforts have focused on the use of multiprocessor systems of most types (vector processors, array processors, more general multiprocessors) [4, 27, 30], as well as on the design of dedicated processors based on VLSI technology [13, 25].

In this work we approach the parallelization on SIMD hypercube computers of the filtered backprojection algorithm, which is one of the most common methods to reconstruct a three-dimensional object from its two-dimensional projection images. We evaluate the proposed parallel algorithm and present its practical application to 3D structure reconstruction.

The work is organized as follows. Section 2 is devoted to the study of the basic steps of the algorithm (filtering of the 2D projections, backprojection and 3D filtering of the reconstructed object). In Section 3 we describe the hypercube topology. Section 4 presents the parallel algorithm, analyzing its adaptation to hypercube computers. In Section 5 we evaluate the complete algorithm. Finally, in Section 6 we apply the algorithm to the three-dimensional reconstruction of the oligomer formed by the bacterial chaperonin GroEL from E. coli.

2. Filtered backprojection

One of the simplest approaches to the reconstruction of an object from its projections is direct backprojection. This operation consists in uniformly projecting back the pixel density of each of the projection images along the projecting direction. The summation of all these backprojections renders an approximation to the original volume [10]. However, the volume so obtained cannot be an exact reconstruction. This follows easily by inspecting the Radon inversion formula, since direct backprojection only implements part of the required operations (specifically, it neglects the calculation of some partial derivatives as well as a Hilbert transform).

The inaccuracies introduced by the use of the direct backprojection can also be modelled as the convolution of the exact 3D reconstruction with a 3D point spread function (PSF). In principle, a correction by this PSF can be implemented in Fourier space by multiplying (filtering) the Fourier transform of each projection image with the inverse of the Fourier transform of the PSF.

The actual implementation of the filter function is application dependent, since it varies with the data acquisition geometry. In the specific case of the 3D reconstruction of general biological macromolecules, there is no control over some parameters of either the electron microscope or the specimen, thus resulting in an angularly uneven distribution of projection images. Methods to calculate the exact filter function from an arbitrary set of projection images have been reported [18, 19]. They are 'exact' in the sense that the actual geometry is explicitly and optimally considered. The general filter function so derived is the inverse of a summation of sinc functions along the direction perpendicular to each one of the images that are used to generate the filter [18, 19].

Let PW be the projection image that we want to filter and PG the projection image generating the sinc functions. Let $\mathbf{r}_w$ be a vector in the coordinate system associated with PW (that is, any system with the x and y axes in the plane of PW), and $\mathbf{r}_g$ the same vector in the coordinate system associated with PG. Let $\mathbf{r}$ be a vector in a fixed coordinate system, and $\mathbf{DW}$ the matrix of rotations from $\mathbf{r}$ to $\mathbf{r}_w$ ($\mathbf{r}_w = \mathbf{DW}\,\mathbf{r}$). In the same way, let $\mathbf{DG}$ be the matrix of rotations from $\mathbf{r}$ to $\mathbf{r}_g$ ($\mathbf{r}_g = \mathbf{DG}\,\mathbf{r}$). Then, the argument of the sinc function is proportional to the z coordinate in the coordinate system associated with PG:

$$\mathbf{r}_g = \mathbf{DG}\,\mathbf{DW}^{-1}\,\mathbf{r}_w. \qquad (1)$$

A very interesting and useful simplification happens in those cases in which the set of projection images to be filtered is the same as the set of images generating the filters. In this case the matrix $\mathbf{DW}^{-1}$ is the transpose of $\mathbf{DG}$.

To facilitate the description of the operations, we define the following notation: small bold letters for vectors, capital bold letters for matrices and * to denote points in the transformed domain.

The value of the filter function corresponding to the point $(y^*, x^*)$ in the $i$th transformed image will be

$$\mathrm{weight}(y^*, x^*) = \frac{1}{2\pi d \sum_{g=1}^{M} \mathrm{sinc}\big(\arg(y^*, x^*)_g\big)}, \qquad (2)$$

where

$$\arg(y^*, x^*)_g = \frac{\pi d}{N}\,[0,\ y^*,\ x^*]\,[\mathbf{d}]^{T}, \qquad (3)$$

where $M$ is the number of generating projections, $d$ the diameter of the 3D volume to reconstruct, $N$ the dimension of the image that we want to filter ($0 < y^*, x^* \le N$) and $\mathbf{d}$ is the last row of the matrix $\mathbf{DG}\,\mathbf{DW}^{-1}$ ($[\ ]^{T}$ denotes the transpose). Without loss of generality, in the following we will consider the images to be square ($N^2$ pixels) and the final volume to be a cube of dimensions $N \times N \times N$ ($N^3$ voxels).

As the last step in the definition of the filtering function we introduce a threshold value $\alpha$ on the values calculated from (2) (if weight$(y^*, x^*) > 1/\alpha$, then weight$(y^*, x^*) = 1/\alpha$) as a way to prevent the boosting of those frequencies which were poorly determined (sampled) by the geometrical arrangement of the input projection images. In our application a value of $\alpha = 0.6$ has been found to be adequate.
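The weighting and its threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `filter_weight`, the argument layout (one tuple per generating projection holding the last row of DG DW^-1) and the 0-based handling of coordinates are hypothetical, and the sinc argument follows the scaling used in the program of Fig. 1 (arg = d·π·z/N).

```python
import math

def filter_weight(y_star, x_star, last_rows, d, n, alpha=0.6):
    """Weight of eq. (2) at the frequency-domain point (y*, x*).

    last_rows: one (d_z, d_y, d_x) tuple per generating projection, assumed
               to hold the last row of DG * DW^-1 (hypothetical layout).
    d: diameter of the volume; n: image side N; alpha: threshold value.
    """
    p = 0.0
    for (_d_z, d_y, d_x) in last_rows:
        # eq. (3): the leading 0 in [0, y*, x*] removes the d_z term
        arg = math.pi * d / n * (d_y * y_star + d_x * x_star)
        p += 1.0 if arg == 0.0 else math.sin(arg) / arg
    denom = 2.0 * math.pi * d * p
    # Threshold: never boost a frequency by more than 1/alpha
    if denom <= alpha:
        return 1.0 / alpha
    return 1.0 / denom
```

At the origin every sinc term equals 1, so the weight reduces to 1/(2πdM); where the sinc terms nearly cancel, the threshold caps the weight at 1/α.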

After calculating the filter function, the next step in the process of 3D reconstruction is the direct backprojection. At this step the input projection images (which have already been filtered) are uniformly backprojected in the reconstruction volume.

The geometrical relationship between the coordinates of the object $(z_0, y_0, x_0)$ and the corresponding ones in each one of the projections $(y_i, x_i)$ is described by the following equation:

$$[y_i,\ x_i]^{T} = \begin{bmatrix} -\sin\varphi & \cos\varphi & 0 \\ \cos\theta\cos\varphi & \cos\theta\sin\varphi & -\sin\theta \end{bmatrix} [z_0,\ y_0,\ x_0]^{T}, \qquad (4)$$


where $\theta$ and $\varphi$ are the azimuthal and tilt angles of each projection, respectively. Direct backprojection can be implemented by applying (4) to each one of the object voxel coordinates in order to obtain the coordinates corresponding to the projections that we are analyzing and, then, performing a summation over all the contributions to each voxel.
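As a minimal sketch of how (4) maps a voxel to projection coordinates, the following Python function uses the sign and ordering conventions of the backprojection program in Fig. 2 (sentences 7 and 8). The name `project_voxel` is hypothetical, and the centre offsets used in Fig. 2 are omitted here for brevity.

```python
import math

def project_voxel(z0, y0, x0, theta, phi):
    """Return (y_i, x_i) for the object coordinates (z0, y0, x0).

    theta: azimuthal angle, phi: tilt angle (radians).
    Assumption: conventions copied from the program of Fig. 2.
    """
    x_i = (-x0 * math.sin(theta)
           + y0 * math.cos(theta) * math.sin(phi)
           + z0 * math.cos(theta) * math.cos(phi))
    y_i = y0 * math.cos(phi) - z0 * math.sin(phi)
    return y_i, x_i
```

With both angles equal to zero the mapping reduces to y_i = y0 and x_i = z0, which is a quick sanity check on the signs.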

As a result of (4) we have two different situations. The first one happens when the generated coordinates $(y_i, x_i)$ are outside the image ($y_i, x_i \le 0$ or $y_i, x_i > N$) and, therefore, the $i$-th projection does not contribute to the $(z_0, y_0, x_0)$ voxel of the object. The second situation happens when the generated coordinates $(y_i, x_i)$ are within the image ($0 < y_i, x_i \le N$). In general, $y_i$ and $x_i$ will be real numbers, hence the intensity value at those coordinates has to be calculated by means of some interpolation algorithm. A bilinear interpolation scheme has been used here.

In short, the filtered backprojection reconstruction method, expressed in pseudocode, comprises the following functions:

Algorithm Filtered backprojection {
    Compute matrix DG for each projection;
    for (w = 1; w <= M; w++) {
        Read image(w);
        Compute 2D-FFT(image(w));
        Weight/Filter the transformed image(w): eqs. (2) and (3);
        Compute inverse 2D-FFT(image(w));
        Backprojection(image(w)): eqs. (4) and (5);
    }
    Compute 3D-FFT(object);
    Weight/Filter the transformed object;
    Compute inverse 3D-FFT(object);
}

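The bilinear interpolation assigns to the point $(y_i, x_i)$ the intensity

$$a + (b - a)\,\delta_x + (c - a)\,\delta_y + \{(d - c) - (b - a)\}\,\delta_x \delta_y, \qquad (5)$$

where $a$, $b$, $c$, $d$ are the intensities of the image at pixels $(y_i, x_i)$, $(y_i, x_i+1)$, $(y_i+1, x_i)$ and $(y_i+1, x_i+1)$, respectively, and $\delta_x$ and $\delta_y$ are the fractional parts resulting from (4).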
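Equation (5) translates almost literally into code. The sketch below is an illustration only: the name `bilinear`, the use of nested Python lists and the 0-based pixel indexing are assumptions, and neighbours that fall beyond the last row or column are taken as 0, as the backprojection program of Fig. 2 does.

```python
import math

def bilinear(image, y, x):
    """Interpolate image (list of rows) at real coordinates (y, x), eq. (5)."""
    n_rows, n_cols = len(image), len(image[0])
    iy, ix = math.floor(y), math.floor(x)
    if not (0 <= iy < n_rows and 0 <= ix < n_cols):
        return 0.0  # the point lies outside the image: no contribution

    def pixel(r, c):
        # Neighbours past the image border contribute zero intensity
        return image[r][c] if r < n_rows and c < n_cols else 0.0

    a, b = pixel(iy, ix), pixel(iy, ix + 1)
    c, d = pixel(iy + 1, ix), pixel(iy + 1, ix + 1)
    dy, dx = y - iy, x - ix  # fractional parts
    return a + (b - a) * dx + (c - a) * dy + ((d - c) - (b - a)) * dx * dy
```

For the 2 x 2 image [[0, 1], [2, 3]], the value at the centre (0.5, 0.5) is the mean of the four pixels, 1.5.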

The final step in the process of 3D reconstruction of an object is the attenuation of the non-significant high frequencies that are present in the calculated volume. This process has to be performed after the volume has been reconstructed since the actual interval of meaningful frequencies is only known a posteriori by statistically comparing different 3D reconstructions [18]. Obviously, in those applications in which the meaningful range of frequencies can be estimated a priori, this low-pass filtering should be performed at the time each projection is weighted before backprojection. In this application a very simple low-pass filter has been used, namely a radially symmetric step with a cosine attenuation (although more sophisticated filters have been discussed [7] and could be easily implemented). As in the case of the image filtering (weighting), we will also take into account the symmetry of the 3D FFT.
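One possible reading of "a radially symmetric step with a cosine attenuation" is a raised-cosine taper, sketched below. The text gives neither the exact profile nor its parameters, so the function name `lowpass_gain` and the parameters `cutoff` and `width` are purely hypothetical.

```python
import math

def lowpass_gain(radius, cutoff, width):
    """Gain applied to a frequency at distance `radius` from the origin.

    Assumed profile: 1 up to `cutoff`, then a half-cosine roll-off of
    length `width`, then 0. Both parameters are hypothetical.
    """
    if radius <= cutoff:
        return 1.0
    if radius >= cutoff + width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (radius - cutoff) / width))
```

In a 3D filter the radius would be the Euclidean norm of the frequency index, so the same gain applies in every direction.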

3. Hypercube architecture

A q-dimensional hypercube computer is a machine with $Q = 2^q$ processing elements (PEs) interconnected in the same way as the vertices of a binary q-dimensional cube are connected by its edges. In this way, each PE(r) ($r = 0, 1, \ldots, Q-1$) has q bidirectional and non-shared links with the q PEs PE($r^{(b)}$) ($b = 0, 1, \ldots, q-1$), where $r^{(b)}$ is the number whose binary representation differs from that of r only in bit b. Reference [1] shows examples of commercial hypercube concurrent computers. We have chosen a SIMD computational model with an unshared distributed memory for the implementation of the algorithms we have presented in the previous section.
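The neighbour relation described above amounts to flipping one bit of the PE index. A minimal sketch in Python (the function name `neighbours` is illustrative, not part of ACLAN):

```python
def neighbours(r, q):
    """Indices of the q PEs directly linked to PE(r) in a q-dimensional hypercube."""
    # Flipping bit b of r gives the neighbour across dimension b
    return [r ^ (1 << b) for b in range(q)]
```

For example, in a 3-dimensional hypercube PE(5) (binary 101) is linked to PE(4), PE(7) and PE(1).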

The basic operation in the conversion of sequential algorithms into parallel algorithms is the unrolling of the nested loops so that different iterations are processed in different PEs. This implies that the data processed and produced in different iterations have to be distributed among the PEs, but the distribution of the processing structure must determine the distribution of the data and not the other way around. Also, the processing must be distributed taking into account that a parallel algorithm must be flexible enough to adapt to problems whose sizes are independent of the number of PEs in the hypercube. There is no single method of assignment; therefore we must analyze each algorithm to determine the distribution mode which gives the best performance.

Recently, Zapata et al. [33] and Rivera [20] analyzed how to partition a sequential algorithm in order to be processed in parallel by a hypercube computer. The design procedure is as follows:

(1) Identification of the maximum nesting level of the independent loops of the sequential algorithm (this will define the number of dimensions of the algorithmic space).

(2) Partitioning the dimensions of the hypercube into subsets associated with the independent loops of the sequential algorithm.

(3) Distribution of the data arrays that are to be used among the PEs according to the indexing scheme of the PEs and the data distribution mode that have been chosen.

(4) Writing the parallel algorithm.

(5) Performance optimization by changing the partition in step (2).

This procedure has been successfully applied to the 'parallelization' of numerous sequential algorithms [20-22, 29, 31-34]. In all of them we have considered a pure binary indexing scheme for the PEs and the distributions of data in the local memories of the PEs to be regular, as corresponds to cyclic and consecutive storage schemes [12]. Finally, this procedure is general in the sense that the parallel algorithm encompasses the sequential algorithm as a particular case. In fact, it is transformed into the sequential algorithm when the dimension of the hypercube is zero (only one PE).

The programming language that we will use in the following to show our parallel algorithm is an extension of the C language, called ACLAN (Array C LANguage) [15-17], which is machine independent and allows the direct programming of the PEs' local memories. ACLAN includes operators and data types allowing the direct control of intra-PE operations, as well as inter-PE and host-PE communications. ACLAN allows the programmer to work with the storage elements contained in the PEs and in the host. There is a special register included in each PE which stores the logical index that identifies it. This register is represented by the predefined name #. All these names can be combined in expressions by means of operators. Apart from most of the C operators, ACLAN includes the bit operator (.:), which allows the extraction of a bit or a range of bits from an integer value.

In ACLAN we can distinguish two types of executable sentences, scalar and parallel. The former, written directly in C, are executed by the control unit of the SIMD computer. Their function is to process universal data and to control the program flow. The latter are in charge of processing the data distributed among the processing elements or of performing the distribution of these data. They are executed in the nodes, and for their specification ACLAN uses the syntax action {mask}. The action field of any parallel sentence represents the action itself to be executed by the PEs. This action can be masked by the mask field of the sentence, which is optional.

The parallel actions are either assignments or data movements from a certain set of memory locations to another one. Depending on the physical position of these memory locations, there are local actions (where the data transfer is restricted to each PE), remote actions (or routing transfers among different PEs) and central actions (or data transfers between the host and the PEs in either direction). Table 1 shows the symbols used to specify the three kinds of parallel actions.

4. Description of the parallel algorithm

Table 1
Basic parallel structures of ACLAN

Symbol        Meaning              Explanation
:=            Local assignment     Data transfer from one register to another in the same PE
:=:           Local exchange       Data swap between two registers of the same PE
(illegible)   Remote assignment    Data transfer from one register to another in a different PE
.--,          Remote exchange      Data swap between two registers in directly connected PEs
(illegible)   Central assignment   Data transfer from PE to CU or vice versa
./:/          Bit operator         Extract a bit or a range of bits (// means optional)
in :/:/       Set operator         Check if a value is in a certain range (// means optional)
#             Parallel register    Identify the index of the PE
neigh[ ]      Keyword              Identify the interconnection functions

The sequential algorithm can be divided into three parts: filtering of the projections (this computation involves the 2D-FFT, calculation of the weight function according to (2) and (3), and the inverse 2D-FFT), backprojection of each of the projections, and filtering of the three-dimensional object (this computation involves the 3D-FFT, calculation of the filter function, and the inverse 3D-FFT). The parallelization of the direct and inverse k-dimensional FFT on SIMD hypercube computers has recently been analyzed by Zapata et al. [32].

The calculation of the weight function associated with each projection is carried out by means of (2) and (3). From the analysis of these equations we can deduce that there are three main nested loops (rows and columns of the projection and the number of projections). The three loops are independent and therefore can be parallelized.

In order to minimize interprocessor communications we have chosen a 2-partition for all the functions included in the filtering of the projections. This solution reduces the potential parallelism of (2) and (3), as we could use a 3-partition; nevertheless, this reduction is more apparent than real due to the fact that a 2-partition permits the use of up to $N^2$ processors, which is quite enough if we take into account the hypercube systems that are currently available. Furthermore, we avoid the interprocessor routing associated with the change of partitions by keeping the same partition for the calculation of the 2D-FFT and for the weighting function.

The partition of the dimensions of the hypercube into two subsets, $q_1$ and $q_0$, where $q = q_1 + q_0$, permits the representation of the index r of each PE by means of a vector $(r_1, r_0)$, where $r = r_0 + r_1 2^{q_0}$, and where $r_i$ is associated with the subset $q_i$.

The second part of the algorithm is the backprojection of each one of the projections onto the volume (eqs. (4) and (5)). In this process there are three independent nested loops associated with the dimensions of the three-dimensional object. By means of a 3-partition each processor analyzes only a subvolume of the 3D specimen structure. In this way interprocessor communication is avoided, provided that each processor keeps a copy of the projection image we are working on.

We therefore partition the dimensions of the hypercube into three subsets, $q_2$, $q_1$, $q_0$, where $q = q_2 + q_1 + q_0$, which are associated with the dimensions of the object; this partition allows the representation of the index r of each processor by means of a vector $(r_2, r_1, r_0)$, where $r = r_0 + r_1 2^{q_0} + r_2 2^{q_0+q_1}$.
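Because the indexing scheme is pure binary, splitting r into (r2, r1, r0) is a matter of extracting bit fields, much as the ACLAN bit operator would do. The Python helpers below are a sketch with hypothetical names (`split_index`, `join_index`); a 2-partition is the special case q2 = 0.

```python
def split_index(r, q2, q1, q0):
    """Decompose the PE index r into (r2, r1, r0) for a (q2, q1, q0) partition."""
    r0 = r & ((1 << q0) - 1)                 # lowest q0 bits
    r1 = (r >> q0) & ((1 << q1) - 1)         # next q1 bits
    r2 = (r >> (q0 + q1)) & ((1 << q2) - 1)  # highest q2 bits
    return r2, r1, r0

def join_index(r2, r1, r0, q1, q0):
    """Inverse mapping: r = r0 + r1 * 2**q0 + r2 * 2**(q0 + q1)."""
    return r0 + (r1 << q0) + (r2 << (q0 + q1))
```

For instance, with (q2, q1, q0) = (2, 3, 2) the index r = 109 (binary 11 011 01) splits into (3, 3, 1).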

The third part of the algorithm is the filtering of the three-dimensional object. In this case there are also three independent nested loops. It is therefore natural to keep the 3-partition of the dimensions of the hypercube [32].

According to the partition chosen and a consecutive distribution scheme, the distribution of the matrices of the projections (of dimensions $N \times N$) and of the object (of dimensions $N \times N \times N$) is as follows:

(a) The element in row i and column j of the projection will be stored in position $(i \bmod w_1,\ j \bmod w_0)$ of the local submatrix LIMAG$(r_1, r_0)$ in all processors for which $r_1 = \lfloor i/w_1 \rfloor$ and $r_0 = \lfloor j/w_0 \rfloor$, where $w_1 = \lceil N/2^{q_1} \rceil$ and $w_0 = \lceil N/2^{q_0} \rceil$.

(b) The element $(i, j, k)$ of the object will be stored in position $(i \bmod v_2,\ j \bmod v_1,\ k \bmod v_0)$ of the local submatrix LOBJ$(r_2, r_1, r_0)$ in the PEs whose indices $r_2$, $r_1$ and $r_0$ are $r_2 = \lfloor i/v_2 \rfloor$, $r_1 = \lfloor j/v_1 \rfloor$ and $r_0 = \lfloor k/v_0 \rfloor$, where $v_2 = \lceil N/2^{q_2} \rceil$, $v_1 = \lceil N/2^{q_1} \rceil$ and $v_0 = \lceil N/2^{q_0} \rceil$.

Thus, each PE stores two local submatrices, LIMAG and LOBJ, with dimensions $w_1 \times w_0$ and $v_2 \times v_1 \times v_0$, respectively.
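The consecutive distribution can be illustrated with a small Python sketch that maps a global index to its owning PE coordinate and local position along one dimension. The names `block_size` and `locate` are hypothetical; 0-based indices, a ceiling for the block size and a floor for the owner coordinate are assumed.

```python
def block_size(n, q):
    """Local extent ceil(N / 2**q) along a dimension spread over 2**q PEs."""
    return -(-n // (1 << q))  # integer ceiling division

def locate(i, n, q):
    """Return (PE coordinate, local index) of global index i along one dimension."""
    w = block_size(n, q)
    return i // w, i % w
```

With N = 32 and q1 = 2 each PE holds 8 consecutive rows, so row 19 is stored as local row 3 in the PEs with r1 = 2. Applying `locate` independently to i, j and k gives the position of a voxel in LOBJ.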

void weight(k)
{
 1  for (x = 0; x < w1; x++)
 2    for (y = 0; y < w0; y++) {
 3      p := 0;
 4      ag := r1 * w1 + x;               /* global coordinates */
 5      bg := r0 * w0 + y;
 6      ag := N - ag                     {bg > N/2 && ag != 0};
        bg := N - bg                     {bg > N/2};
 7      ag := N - ag                     {ag > N/2};
 8      for (j = 0; j < M; j++) {
 9        z := d32[j] * ag;
          z := z + d31[j] * bg;
10        dx := π * z / N;
11        arg := d * dx;
12        p := p + sin(arg)/arg          {arg != 0};
          p := p + 1                     {arg == 0};
        }
13      pint := 1/(2πdp);
        pint := 1/α                      {2πdp <= α};
14      LIMAG[x][y] := pint * LIMAG[x][y];
      }
}

Fig. 1. Parallel program for filtering the 2D image.

Figure 1 shows the ACLAN program for filtering the 2D image (eqs. (2) and (3)). $d$ is the diameter of the three-dimensional volume that has to be reconstructed. We have assumed that the vector $\mathbf{d}$ has been previously distributed among the local memories of the PEs. The calculation of (2) implies the nesting of the three loops (sentences 1, 2 and 8). The loops of sentences 1 and 2 go through all pixels of the image and obtain their global coordinates from their local coordinates (sentences 3-7). The masks in sentences 6 and 7 restrict their execution to the upper half of the transformed image.

In the innermost loop, sentences 8-12, the value of the weight function is obtained for each point of the image. In sentences 9-11 the value of the function arg (eq. (3)) is obtained, and in sentence 12 the final value of the weighting function is calculated. d32 and d31 are the corresponding elements of the matrix $\mathbf{DG}\,\mathbf{DW}^{-1}$. The filtering operation is carried out in sentences 13-14. The mask in sentence 13 enforces a limit on those values of p that are smaller than $\alpha/(2\pi d)$.

The next step of the filtered backprojection is the simple backprojection of the images onto the reconstructed volume. The ACLAN program that performs this operation is shown in Fig. 2. ctr_i represents the i-th coordinate of the center of the 3D object. This process implies three parallelized nested loops (lines 1, 2 and 3) which go through all the voxels of the object. In sentences 4-6 we obtain the general coordinates of each voxel. In sentences 7-8 we determine whether or not they are within the image according to (4).

With sentences 9 and 10 we calculate the value of the mask that will be used throughout the algorithm. MASK takes the value 1 if the point belongs to the image. When the points are in the image, the bilinear interpolation expressed in (5) is carried out by sentences 11-21. With sentences 11-17 the intensities of the projection image at pixels $(y_i, x_i)$, $(y_i, x_i+1)$, $(y_i+1, x_i)$ and $(y_i+1, x_i+1)$ are stored in the matrix ph. The intensities associated with points outside the image (ix == N-1 or iy == N-1) are set to 0 (sentences 12-15). Finally, (5) is evaluated in sentences 18-21.

void backpr( )
{
 1  for (z = 0; z < v2; z++)
 2    for (y = 0; y < v1; y++)
 3      for (x = 0; x < v0; x++) {
 4        zg := r2 * v2 + z;
 5        yg := r1 * v1 + y;
 6        xg := r0 * v0 + x;
 7        rx := -(xg - ctr1) * sin(θ) + (yg - ctr2) * cos(θ) * sin(φ)
                + (zg - ctr3) * cos(θ) * cos(φ) + ctr4;
 8        ry := (yg - ctr2) * cos(φ) - (zg - ctr3) * sin(φ) + ctr4;
 9        (ix, iy) := (floor(rx), floor(ry));
10        MASK := ((ix >= 0) && (iy >= 0) && (ix < N) && (iy < N));
          /* interpolation */
11        ph00 := LIMAG[iy][ix]                  {MASK};
12        (ph01, ph11) := (0, 0)                 {MASK && (ix == N-1)};
13        ph01 := LIMAG[iy][ix+1]                {MASK && !(ix == N-1)};
14        yd := 0                                {MASK && (iy == N-1)};
15        (ph10, ph11) := (0, 0)                 {MASK && (iy == N-1)};
16        ph10 := LIMAG[iy+1][ix]                {MASK && !(iy == N-1)};
17        ph11 := LIMAG[iy+1][ix+1]              {MASK && ((ix < N-1) && (iy < N-1))};
18        (xd, yd) := (rx - ix, ry - iy)         {MASK};
19        dt0 := ph00 + (ph01 - ph00) * xd       {MASK};
20        dt1 := ph10 + (ph11 - ph10) * xd       {MASK};
21        LOBJ[z][y][x] := LOBJ[z][y][x] + dt0 + (dt1 - dt0) * yd   {MASK};
        }
}

Fig. 2. Parallel program for the backprojection.

Between the two steps of image filtering and simple backprojection it is necessary to change the way the projection images are stored in the PEs [33], since we will use different partitions in each step. The reason for this change is easy to understand by inspecting the simple backprojection algorithm defined by (4): it is clear that the coordinates corresponding to the projection images are not known a priori. In fact, it is easy to imagine a situation in which the image points (pixels) contributing to the intensity value of a given voxel could be stored in different PEs. If this were the case, a quite intensive asynchronous routing among the different PEs would be needed. The way we have devised to avoid this routing is to store a complete copy of the image we are dealing with on each PE; this approach avoids routing, but forces a change in partition.

The last step in the 3D reconstruction process is the filtering of the volume in order to attenuate the non-significant high frequencies that are present in the reconstruction. The process of filtering the three-dimensional object is similar to the filtering of the image and, therefore, the corresponding program can be obtained by modifying the program in Fig. 1.

5. Evaluation

The complexity of the parallel algorithm for filtered backprojection is a function of the number and the size of the projections, of the size of the object to be reconstructed and of the partition of the dimensions of the hypercube [3]:

$$O[\,1 + M \times (2B_0 + B_1 + R + B_2) + (2C_0 + C_1)\,],$$

where $M$ is the number of projections.

$B_0$ and $B_1$ are the algorithmic complexities of the basic projection filtering processes: 2D-FFT and calculation of the weight functions, respectively. The parallelization of the multidimensional FFT on SIMD hypercube computers has been analyzed by Zapata et al. [32], achieving a high performance. The factor of 2 multiplying $B_0$ is due to the fact that the filter function is calculated and applied in Fourier space, but the filtered image must then be transformed back to real space in order to perform the simple backprojection. The algorithmic complexity of the calculation of the filter function is

$$B_1 = O[\,1 + w_1 \times w_0 \times M\,].$$

Since there is no routing among the processors, the algorithmic complexity of the filter function only depends on the dimension of the local image that each PE stores ($w_1 \times w_0$) and on the total number of projections, $M$. The factor $M$ corresponds to the non-parallelized loop of (2).

$R$ is the complexity of the routing stage corresponding to the change from 2- to 3-partition [33]:

$$R = O[\,1 + Q \times w_1 \times w_0 + N^2\,].$$

The complexity of this routing stage is directly proportional to the global dimensions of the image, since each processor keeps a copy of this image.

The next term, $B_2$, is the complexity of the simple backprojection:

$$B_2 = O[\,1 + v_2 \times v_1 \times v_0\,].$$

As in the calculation of the filter function, there is no routing among the different PEs. The complexity of the simple backprojection is therefore a function of the local dimensions of the object only.

Finally, $C_0$ and $C_1$ are the complexities of the processes for filtering the three-dimensional object: 3D-FFT [32] and calculation of the filter function, respectively. The process of calculating the 3D filter for the reconstructed volume is similar to the calculation of the 2D filter for each individual projection. Therefore, the algorithmic complexity only depends on the local dimensions of the object:

$$C_1 = O[\,1 + v_2 \times v_1 \times v_0\,].$$

In Table 2 we show the complexities of these stages for different dimensions of the hypercube and different sizes of the problem, with $n = \log_2 N$. The partitions we have chosen allow for the reduction of the interprocessor communication times, thus obtaining a behavior which is close to the optimum. If the size of the hypercube is the same as the dimensions of the 3D volume, then the algorithmic complexity depends only on the number of projections that we are considering. On the other hand, if the hypercube is of dimension 0 (i.e., a single PE), the algorithmic complexity is the same as the one obtained on a single-processor system. In other words, with the methodology for mapping and partition that we have used, the sequential execution is just a particular case of the parallel one.

Table 2
Algorithmic complexity of the parallel filtered backprojection

q2   q1   q0   No. of PEs   2D weight   Backprojection   3D weight
0    0    0    1            N^2 M       N^3              N^3
0    0    n    N            N M         N^2              N^2
0    n    0    N            N M         N^2              N^2
0    n    n    N^2          M           N                N
n    n    n    N^3          M           1                1
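The entries of Table 2 follow directly from the local dimensions $w_1 \times w_0$ and $v_2 \times v_1 \times v_0$. The Python sketch below recomputes them for concrete values; the function name `stage_costs` is hypothetical, and it assumes, as the table rows suggest, that the 2D weighting uses the same $q_1$ and $q_0$ as the 3-partition. FFT and routing terms are not included.

```python
def stage_costs(n, m, q2, q1, q0):
    """Leading-order local operation counts for one row of Table 2.

    n = log2(N); m = number of projections M; (q2, q1, q0) = partition.
    """
    size = 1 << n  # N

    def local(q):
        # Local extent ceil(N / 2**q) along one dimension
        return -(-size // (1 << q))

    w1, w0 = local(q1), local(q0)
    v2, v1, v0 = local(q2), local(q1), local(q0)
    return {
        "pes": 1 << (q2 + q1 + q0),
        "weight_2d": w1 * w0 * m,        # B1 without the constant term
        "backprojection": v2 * v1 * v0,  # B2
        "weight_3d": v2 * v1 * v0,       # C1
    }
```

Calling `stage_costs(5, 258, 0, 0, 0)` gives the single-PE row for N = 32 and M = 258, while `stage_costs(5, 258, 5, 5, 5)` gives the last row, where only the factor M remains.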

We have executed our parallel algorithm on an NCUBE/10 multiprocessor system with 128 PEs. We started from 258 projection images. The dimensions of the projection images were 32 × 32, while the reconstructed volume was of dimensions 32 × 32 × 32. To analyze the influence of the problem size on the performance, we have simulated another data set with a larger value of N, N = 64. Recently, we have also implemented this reconstruction method on a transputer network working as a hypercube of dimension 3 [4, 26].

Before continuing the presentation of the results, it is necessary to clarify the SIMD programming model we have used in this work, as well as the general programming of the NCUBE/10. To be precise, the SIMD algorithms presented in Section 4 were ported to the C language of the NCUBE/10 hypercube, and the simulation was performed using the SCMD (Single Code Multiple Data) approach, which makes the synchronous concurrent execution of tasks easier [9].
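The SCMD style can be illustrated with a small sequential simulation in which every simulated node runs the same function on its own block of data. This is not the NCUBE/10 C code: the names are hypothetical, the blocks follow an assumed consecutive storage of a 1D array, and the exchange along each hypercube dimension is emulated with a plain Python list.

```python
def node_program(node_id, num_nodes, data):
    """Code executed by every node: sum the block of data it owns."""
    block = -(-len(data) // num_nodes)           # block length, rounded up
    start = node_id * block
    return sum(data[start:start + block])


def hypercube_all_sum(partials):
    """Combine one value per node by exchanging along each cube dimension."""
    num_nodes = len(partials)
    q = num_nodes.bit_length() - 1               # hypercube dimension
    assert 2 ** q == num_nodes, "number of nodes must be a power of two"
    values = list(partials)
    for d in range(q):
        # Each node adds the value held by its neighbor across dimension d,
        # i.e. the node whose binary index differs only in bit d.
        values = [values[p] + values[p ^ (1 << d)] for p in range(num_nodes)]
    return values


if __name__ == "__main__":
    data = list(range(16))
    nodes = 4
    partials = [node_program(p, nodes, data) for p in range(nodes)]
    print(partials)                      # per-node partial sums
    print(hypercube_all_sum(partials))   # every node ends with the total
```

The same `node_program` is applied to every node index, and only the data block differs; after q exchange steps each node holds the global result.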

Figure 3 shows execution times on a logarithmic scale for the different steps of the parallel algorithm performing the filtered backprojection, for different values of the hypercube dimension and of the problem size (N = 32 and N = 64). Note that the logarithm of the computing time decreases linearly as the dimension of the hypercube increases for all processes but the change of partition. For this latter process the execution time is practically constant, as the result of a balance between the dimension of the hypercube and the size of the messages that are exchanged.
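The linear trend has a simple reading. As a sketch, assume (this idealization is ours, not a statement of the measured behavior) that a stage parallelizes perfectly, so that its run time on a hypercube of dimension q is the single-PE time divided by the number of PEs:

```latex
T(q) = \frac{T(0)}{2^{q}}
\quad\Longrightarrow\quad
\log_2 T(q) = \log_2 T(0) - q .
```

Under this assumption each additional hypercube dimension lowers the logarithm of the run time by a fixed amount, which gives a straight line, whereas a stage whose time does not shrink with q, such as the change of partition, appears as a horizontal line.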

Vol. 27, No. 1, April 1992


Fig. 3. Runtime versus hypercube dimension corresponding to the different steps of the parallel algorithm. (a) N = 32. (b) N = 64. [Plots of log T against hypercube dimension q (ticks 1, 3, 5, 7); curves: Total, 2D Filter, Backprojection, Change of partition, 3D Filter.]

Note that the most important single process in the filtered backprojection algorithm, as far as total execution time is concerned, is the 2D filtering of the images, since it has to be performed on each one of them and M>>N.


The corresponding sequential run time is slightly lower than the one obtained when q = 0. This difference is due to overhead. Our parallel algorithm presents a small overhead because the structures of the parallel and sequential algorithms are similar. We have not included in Fig. 3 the computing times for q = 0, 1, 2 and N = 64, because the data area of the local node programs exceeds the capacity of the local memories (only 500 kbytes).

The linearity of Fig. 3 is reflected in the good efficiency of the parallel algorithm, as shown in Fig. 4 (N = 32). In the latter figure we show the efficiency of the complete algorithm (filtered backprojection) and of two of its main steps (2D filter and simple backprojection). The best efficiency corresponds to the simple backprojection, which does not require routing between PEs as a result of the data distribution scheme that we have used (although there is some data redundancy, since each PE holds a copy of the projection image under study). The small decrease in efficiency that occurs when the number of PEs increases is associated with the volume of data that each PE has to process (i.e., the size of the problem does not require such a large number of PEs). The algorithm corresponding to the 2D filter shows an efficiency similar to that of the previous algorithm, although its efficiency decreases more severely with the number of PEs. This stronger decrease is due to the fact that this algorithm does have some routing, associated with the calculation of the direct and inverse 2D FFTs of each image. As expected, the complete filtered backprojection algorithm shows the worst efficiency, since it requires the projection images to be redistributed (a routing-only process) and the 3D object to be filtered by means of a 3D FFT.

Fig. 4. Parallel algorithm efficiency. [Plot of efficiency (0.2 to 1) against hypercube dimension q (ticks 1, 3, 5, 7); curves: Backprojection, 2D-Filter, Filtered Backprojection.]

As a last remark, it is interesting to point out that the parallel algorithm remains efficient as long as the execution time corresponding to the change of partition is not dominant. That time would only become dominant for a very large hypercube dimension combined with a small problem size.
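The interplay between computation and the change of partition can be explored with a toy model. The sketch below assumes (these are illustrative assumptions, not measurements from the NCUBE/10) that the computational work divides evenly among the 2^q PEs while the change of partition costs a constant time, and it uses the usual definition of efficiency as the sequential time divided by the product of the number of PEs and the parallel time. All names and numbers are hypothetical.

```python
def parallel_time(t_compute, t_route, q):
    """Model run time on 2**q PEs: evenly divided work plus constant routing."""
    routing = t_route if q > 0 else 0.0   # a single PE exchanges no messages
    return t_compute / 2 ** q + routing


def efficiency(t_compute, t_route, q):
    """Sequential time divided by (number of PEs * parallel time)."""
    t_seq = parallel_time(t_compute, t_route, 0)
    return t_seq / (2 ** q * parallel_time(t_compute, t_route, q))


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, chosen only for illustration.
    for q in range(8):
        print(q, round(efficiency(1000.0, 1.0, q), 3))
```

In this model the efficiency falls as q grows because the constant routing term takes a larger share of the run time; raising the routing cost or shrinking the work makes that term dominate sooner.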

6. Application: macromolecular structures

The first practical application of the methodology described in this work has been the calculation of the three-dimensional structure of the oligomer formed by the bacterial chaperonin GroEL from E. coli under certain stress conditions [5]. The study of chaperonin proteins is becoming increasingly important in molecular biology and biotechnology, since they are implicated in such fundamental molecular processes as protein folding [8] and protein export [14].

The starting projection images were obtained in a transmission electron microscope. The image data collection strategy followed in this work is the one proposed by Radermacher et al. [18], which is usually referred to as 'conical tilt'. Within the framework of this strategy, a single pair of images showing the same field first tilted and then untilted (Fig. 5) is enough to obtain a useful estimate of the 3D structure of the specimen. The method is suitable for specimens that tend to give a 'preferential view' under the electron microscope, that is, specimens that interact with the surface of the grid in a very regular way, rendering distinct and reproducible untilted views. The key idea of the method is the realization that the tilted images form a conical tilt data set whose geometry is uniquely defined by the tilt angle and the orientation of the different views in the plane of the electron microscopy grid. Therefore, only the tilted images enter into the reconstruction process, but the untilted images are needed in order to identify the geometry.
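The geometric idea can be sketched numerically. In the code below, which is an illustration under assumed conventions rather than the procedure of [18], the direction of projection of each tilted image is written as a unit vector in the particle's own frame, with the tilt angle measured from the particle's z axis and the in-plane orientation used as azimuth. The function name and the sample orientations are hypothetical.

```python
import math


def projection_direction(tilt_deg, in_plane_deg):
    """Unit vector of the projection direction in the particle frame.

    Assumed convention: tilt is the polar angle from the z axis and the
    in-plane orientation of the particle is the azimuth about z.
    """
    tilt = math.radians(tilt_deg)
    azimuth = math.radians(in_plane_deg)
    return (
        math.sin(tilt) * math.cos(azimuth),
        math.sin(tilt) * math.sin(azimuth),
        math.cos(tilt),
    )


if __name__ == "__main__":
    tilt = 50.0  # tilt angle quoted in the caption of Fig. 5
    for in_plane in (0.0, 90.0, 180.0, 270.0):  # hypothetical orientations
        x, y, z = projection_direction(tilt, in_plane)
        print(f"{in_plane:6.1f}  ({x:+.3f}, {y:+.3f}, {z:+.3f})")
```

Every direction has the same z component, cos(tilt), so the directions lie on a cone around the z axis. Under these conventions a single tilt angle plus the in-plane orientations measured on the untilted image fix the geometry of the whole data set.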

Fig. 5. Images of the bacterial GroEL chaperonin. (a) Untilted. (b) Tilted 50°.



Fig. 6. Template used in the matching algorithm.

The first logical step in this methodology is the detection of the specimens in the untilted field and the identification of their in-plane orientation. We have accomplished these two goals by a template matching approach, using as template the average image of a reduced set of manually selected images (Fig. 6). Image template matching is a fundamental application in the field of image processing. Starting from a large image search field and a given template, the template matching process basically amounts to finding a subimage or, in general, a set of subimages that are similar to the template and that are localized within the search field. All the proposed methods to solve this problem are based on the definition of a measure of similarity between two images [23, 24].


Fig. 7. Different views of the 3D reconstruction of the GroEL chaperonin.


Image template matching algorithms are ideal candidates for parallel implementations, since the computing time that they require is very large and most of the operations involve neighboring elements within the image. Zapata et al. [28] presented a general parallel algorithm on SIMD hypercube computers that does not impose any restriction on the size of the search field, the template or the hypercube. This algorithm has been developed using the same mapping and partition methodology that has been previously described in this work. The cross-correlation coefficient was used as a measure of similarity, since the images were very noisy and this measure has proved to be robust with respect to noise [24]. This algorithm has a high efficiency, so it does not degrade the parallelization of the global 3D reconstruction process.
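For reference, the similarity measure can be sketched sequentially in a few lines. The code below is not the parallel algorithm of [28]; it is a plain Python illustration, with hypothetical function names, of the cross-correlation coefficient between a template and every subimage of a search field, using the standard definition (covariance divided by the product of the standard deviations).

```python
import math


def correlation_coefficient(a, b):
    """Cross-correlation coefficient of two equally sized lists of pixels."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat patch: the coefficient is undefined, report 0
    return sum(x * y for x, y in zip(da, db)) / denom


def match_template(field, template):
    """Return (best coefficient, row, column) over all template positions."""
    th, tw = len(template), len(template[0])
    flat_t = [p for row in template for p in row]
    best = (-2.0, -1, -1)
    for r in range(len(field) - th + 1):
        for c in range(len(field[0]) - tw + 1):
            patch = [field[r + i][c + j] for i in range(th) for j in range(tw)]
            score = correlation_coefficient(patch, flat_t)
            if score > best[0]:
                best = (score, r, c)
    return best


if __name__ == "__main__":
    template = [[1, 2], [3, 4]]
    field = [
        [0, 0, 0, 0],
        [0, 10, 20, 0],
        [0, 30, 40, 0],
        [0, 0, 0, 0],
    ]
    print(match_template(field, template))
```

Because both the patch and the template are mean-subtracted and normalized, the score does not change when a patch is uniformly brightened or scaled in contrast, so the scaled copy of the template in the example is still located at its true position.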

Figure 7 shows four different views of the 3D reconstruction of the GroEL chaperonin. This structure has been obtained applying the filtered backprojection method as described in this work.

The problem dimensions are as follows: the search field is 1024 × 1024, the template is 64 × 64 and the final reconstructed volume is 64 × 64 × 64.

7. Conclusions

Filtered backprojection is one of the most common reconstruction schemes currently in use. A number of different parallel implementations have been reported to date, among them our own implementation on a shared-memory multiprocessor of the BBN type [30]. However, no general implementation for hypercube computers has been proposed so far. In this work we present a parallel implementation of the filtered backprojection algorithm on hypercube computers of arbitrary size that is efficient enough to be of practical interest. The efficiency of the approach is based on the distribution of the image and object data using a consecutive storage model, on the pure binary indexing of the PEs, and on the use of different hypercube partitions for each process. We have also presented our results on the 3D reconstruction of an important biomolecule from 2D projection images obtained in a transmission electron microscope. The implementation is very flexible, being suitable for massively parallel implementations as well as for workstation-accelerator configurations.

References

[1] W.C. Athas and C.L. Seitz, "Multicomputers: Message-passing concurrent computers", IEEE Computer, Vol. 21, No. 8, 1988, pp. 9-24.

[2] R.H. Bates and M.J. McDonnell, Image Restoration and Reconstruction, Oxford Univ. Press, London, 1986.

[3] I. Benavides, Reconstruction of electronic microscopy images in hypercube computers, Ph.D Thesis, Univ. Santiago de Compostela, Spain, 1990 (in Spanish).

[4] J.M. Carazo, I. Benavides, S.M. Marco, J.L. Carrascosa and E.L. Zapata, "Detection, averaging and 3D reconstruction of biological specimens on hypercube (transputers based) computers", Proc. XII Internat. Conf. Electron Microscopy, Seattle, USA, Vol. 1, 1990, pp. 454-455.

[5] J.M. Carazo, S. Marco, G. Abella, J.L. Carrascosa, J.P. Secilla and M. Muyal, "Heat-shock induced structural changes occur among morphogenetic factors from different bacteria", submitted.

[6] G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon and D. Walker, Solving Problems on Concurrent Processors, Prentice Hall, Englewood Cliffs, NJ, 1988.

[7] J. Frank, A. Verschoor and T. Wagenknecht, "Computer processing of electron microscopy images of single molecules", in: T.T. Wu, ed., New Methodologies in Studies of Protein Configurations, Van Nostrand Reinhold, New York, 1985, pp. 36-89.

[8] P. Goloubinoff, J.T. Christeller, A.A. Gatenby and G.H. Lorimer, "Reconstitution of active dimeric ribulose bisphosphate carboxylase from an unfolded state depends on two chaperonin proteins and Mg-ATP", Nature, Vol. 342, 1989, pp. 884-889.

[9] J.P. Hayes and T. Mudge, "Hypercube supercomputers", Proc. IEEE, Vol. 77, No. 8, December 1989, pp. 1829-1841.

[10] G.T. Herman, Image Reconstruction from Projections: The Fundamentals of Computerized Tomography, Academic Press, New York, 1980.

[11] D.A. Heyner and W.K. Jenkins, "The missing cone problem in computer tomography", in: T.S. Huang, ed., Advances in Computer Vision and Image Processing, JAI Press, London, 1984, pp. 83-144.

[12] S.L. Johnsson, "Communication efficient basic linear algebra computations on hypercube architecture", J. Parallel and Distr. Comput., Vol. 4, No. 2, 1987, pp. 133-172.


[13] S.Y. Kung, "VLSI array processors for signal/image processing", in: K. Hwang and D. Degroot, eds., Parallel Processing for Supercomputers and Artificial Intelligence, McGraw-Hill, New York, 1989, pp. 561-608.

[14] N. Kusukawa, T. Yura, C. Ueguchi, Y. Akiyama and K. Ito, "Effects of mutations in heat shock genes GroES and GroEL on protein export in Escherichia coli", EMBO J., Vol. 8, No. 11, 1989, pp. 3517-3521.

[15] O.G. Plata, ACLAN, A parallel language for multiprocessor systems, Ph.D Thesis, Univ. Santiago de Compostela, Spain, 1989 (in Spanish).

[16] O.G. Plata, J.D. Bruguera, F.F. Rivera, R. Doallo and E.L. Zapata, "ACLE, a software package for SIMD computers simulations", Comput. J., Vol. 33, No. 3, 1990, pp. 194-203.

[17] O.G. Plata, E.L. Zapata, F.F. Rivera and R. Peskin, "An array processing language for message-passage hypercubes", in: D.J. Evans et al., eds., Advances on Parallel Computing, Elsevier, Amsterdam, 1990, pp. 455-460.

[18] M. Radermacher, A. Verschoor, T. Wagenknecht and J. Frank, "Three-dimensional reconstruction from a single-exposure, random conical tilt series applied to the 50S ribosomal subunit of Escherichia coli", J. Microscopy, Vol. 146, 1987, pp. 113-136.

[19] M. Radermacher, T. Wagenknecht, A. Verschoor and J. Frank, "A new reconstruction scheme applied to the 50S ribosomal subunit of E. coli", J. Microscopy, Vol. 141, 1986, pp. RP1-RP2.

[20] F.F. Rivera, Partition and mapping of algorithms in hypercube computers: Pattern recognition, Ph.D Thesis, Univ. Santiago de Compostela, Spain, 1990 (in Spanish).

[21] F.F. Rivera, R. Doallo, J.D. Bruguera, E.L. Zapata and R. Peskin, "Gaussian elimination with pivoting into hypercubes", J. Parallel Comput., Vol. 14, 1990, pp. 51-60.

[22] F.F. Rivera, M.A. Ismail and E.L. Zapata, "Parallel squared error clustering on hypercube arrays", J. Parallel Distr. Comput., Vol. 8, No. 3, 1990, pp. 292-299.

[23] A. Rosenfeld and A.C. Kak, Digital Picture Processing, Vols. 1 and 2, Academic Press, New York, 1982.

[24] J.P. Secilla, N. Garcia and J.L. Carrascosa, "Template location in noisy pictures", Signal Processing, Vol. 14, No. 4, June 1988, pp. 347-361.

[25] J. Vermeesch, P. Bulckaert, M. Defrise and O. Steenhaut, "A pipelined VLSI based structure for the reconstruction of the three dimensional images from projections", J. Microprocessing Microprogramming, Vol. 21, 1987, pp. 129-136.

[26] E.L. Zapata, I. Benavides, J.D. Bruguera and J.M. Carazo, "Image reconstruction on transputer networks", in: D.J. Pritchard and C.J. Scott, eds., Applications of Transputers, Vol. 2, IOS Press, 1990, pp. 164-171.

[27] E.L. Zapata, I. Benavides, J.D. Bruguera and J.M. Carazo, "Image reconstruction on hypercube computers", Proc. 3rd Internat. Symp. Frontiers of Massively Parallel Computation, Piscataway, IEEE Press, New York, 1990, pp. 127-133.

[28] E.L. Zapata, I. Benavides, O.G. Plata, F.F. Rivera and J.M. Carazo, "Image template matching on hypercube SIMD computers", Signal Processing, Vol. 21, No. 1, September 1990, pp. 49-60.

[29] E.L. Zapata, J.D. Bruguera, O.G. Plata and F.F. Rivera, "A parallel Markovian model reliability algorithm for hypercube computers", J. Microprocessing Microprogramming, Vol. 27, 1989, pp. 501-508.

[30] E.L. Zapata, J.M. Carazo, I. Benavides, S. Walther and R. Peskin, "Filtered back projection on shared memory multiprocessors", J. Ultramicroscopy, Vol. 34, 1990, pp. 271-282.

[31] E.L. Zapata, O.G. Plata, F.F. Rivera, J.D. Bruguera, R. Doallo, I. Benavides and F. Argüello, "Software tools for multiprocessor simulation and programming", J. Cybernetics Systems, Vol. 21, No. 23, 1990, pp. 157-176.

[32] E.L. Zapata, F.F. Rivera, J.I. Benavides, J.M. Carazo and R. Peskin, "Multidimensional fast Fourier transform into fixed size hypercubes", IEE Proc. Part E: Computers and Digital Techniques, Vol. 137, 1990, pp. 253-260.

[33] E.L. Zapata, F.F. Rivera and O.G. Plata, "On the partition of algorithms into hypercubes", in: D.J. Evans, ed., Advances on Parallel Computing, JAI Press, 1990, pp. 149-171.

[34] E.L. Zapata, F.F. Rivera, O.G. Plata and M.A. Ismail, "Parallel fuzzy clustering on fixed size hypercube SIMD computers", J. Parallel Comput., Vol. 13, No. 3, 1989, pp. 291-303.
