3D Haar Wavelet Transform With Dynamic Partial


3D Haar Wavelet Transform with Dynamic Partial Reconfiguration for 3D Medical Image Compression

Afandi Ahmad, Benjamin Krill, Abbes Amira
Electronic and Computer Engineering
School of Engineering and Design
Brunel University, West London, United Kingdom
Email: {Afandi.Ahmad,Benjamin.Krill,Abbes.Amira}@brunel.ac.uk

Hassan Rabah
Laboratoire d’Instrumentation Electronique de Nancy
University Henri Poincare, France

Email: [email protected]

Abstract: This paper describes the design and implementation of a 3D Haar wavelet transform (HWT) with transpose-based computation and dynamic partial reconfiguration (DPR). Owing to the separability of the multi-dimensional HWT, the proposed architecture is implemented as a cascade of three N-point 1D HWTs and two transpose memories for a 3D volume of N × N × N, suitable for 3D medical image compression. The proposed 3D HWT architectures were implemented on a Xilinx Virtex-5 field programmable gate array (FPGA) using VHDL. An in-depth performance analysis and comparison shows that the DPR-based implementation improves both speed and power consumption while reducing the hardware required for the system.

I. INTRODUCTION

Medical image processing applications involve performing complex tasks, mainly matrix transforms, repeatedly on large sets of volume data, often under real-time requirements. For example, the computational complexity of the fast Fourier transform (FFT) and the recently developed curvelet transform ranges from O(N × log N) to O(N² × J), where N is the transform size and J is the maximum transform resolution level; both are therefore extremely computationally intensive for large medical volume data [1].

To address this issue, efficient implementation of these operations is of great importance and leads to efficient solutions for three-dimensional (3D) medical image compression. Higher compression ratios can be achieved using multi-resolution analysis, where the 3D wavelet transform is widely applied because of its perfect reconstruction property and lack of blocking artifacts. In this research, the Haar wavelet transform (HWT), the simplest of all wavelets, has been chosen for the following features: it is conceptually simple, fast and memory efficient, and it is exactly reversible without the edge effects that are a problem with other wavelet transforms [2].

Reconfigurable hardware, especially field programmable gate arrays (FPGAs), offers significant potential for the efficient implementation of a wide range of computationally intensive signal and image processing algorithms and applications, from simple low-resolution and low-bandwidth (multimedia, picture phone) to very high-resolution and high-bandwidth (medical imaging, HDTV) applications [3].

Complexity in data addressing and access, the massive amount of data to be processed and the need for several building blocks for the computationally intensive matrix transformation operations have severely restricted hardware implementation of 3D medical image compression. FPGAs with dynamic partial reconfiguration (DPR) are a promising solution for reducing the hardware required for an efficient design implementation as well as improving the performance, speed and power consumption of the 3D medical image compression system.

Despite its complexity, there has recently been interest in 3D discrete wavelet transform (DWT) implementation on FPGAs. However, a survey of existing implementations and architectures indicates that the research is still in its infancy, as demonstrated by the limited contributions [4], [5].

The DPR mechanism has been widely studied in various fields [6]-[10]. A significant contribution is presented in [11], which proposes a novel FPGA-based scalable architecture for the discrete cosine transform (DCT) using DPR and reports significant results for the partial reconfiguration process: better power savings, fewer processing clock cycles and a lower reconfiguration overhead. These achievements provide strong justification to further explore 3D HWT implementation with DPR and to evaluate its performance in terms of area, power consumption and maximum speed.

This paper evaluates the proposed architectures for the 3D HWT with transpose-based computation and the DPR mechanism on FPGA, which are suitable for 3D medical image compression. Comparative studies of both architectures in terms of area, power consumption, maximum speed and the influence of the transform size on hardware performance are also presented. The paper is organised as follows. Section II presents the proposed architecture of the 3D HWT with the DPR mechanism. Experimental results, comparison and analysis are described in Section III. Section IV concludes the paper.

II. PROPOSED ARCHITECTURE FOR 3D HWT WITH DPR

In this section, the proposed system architecture depicted in Fig. 1(a) to (e) is briefly explained, including the implementation of the 3D wavelet compression and decompression system, the computation process of the 3D HWT with transpose-based computation, the top level architecture for the 3D HWT with DPR and the transpose module implementation without DPR.

Fig. 1. Proposed system architecture framework. (a) and (b) Block diagrams of the 3D wavelet compression/decompression. (c) Computation of 3D HWT coefficients using transpose based computation. (d) Proposed top level architecture for 3D HWT using DPR. (e) Transpose module implementation without DPR mechanism.

978-1-4244-4918-7/09/$25.00 ©2009 IEEE

A. 3D HWT and Transpose

Computation of the 3D HWT is performed as follows. The input to the first one-dimensional (1D) HWT is read row by row, and the 1D HWT is performed on each input vector as it is provided. The calculated values are sent to the transpose module T1, which calculates the memory addresses for the transposition and stores the data into memory. Because the 1D HWT delivers row vectors, T1 acts as a memory forwarder that performs the matrix transpose.
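The row-wise 1D HWT followed by a transposed write can be sketched in software as below. This is only an illustrative model, not the VHDL design: the paper does not state the Haar normalisation or the memory layout, so the averaging/differencing form of `haar_1d`, the flat N × N memory and the helper names `transpose_address` and `hwt_rows_then_transpose` are all assumptions.

```python
def haar_1d(row):
    """One-level 1D Haar step (assumed form): pairwise averages, then pairwise differences."""
    pairs = list(zip(row[0::2], row[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def transpose_address(row, col, n):
    """Address in a flat n*n memory where element (row, col) lands after transposition."""
    return col * n + row


def hwt_rows_then_transpose(image):
    """Transform each row as it arrives and write it column-wise (a stand-in for T1)."""
    n = len(image)
    memory = [0.0] * (n * n)
    for r, row in enumerate(image):
        for c, value in enumerate(haar_1d(row)):
            memory[transpose_address(r, c, n)] = value
    # Read the flat memory back as n rows of n values.
    return [memory[i * n:(i + 1) * n] for i in range(n)]


image = [[1, 3, 5, 7],
         [2, 2, 4, 4],
         [9, 1, 0, 8],
         [6, 6, 6, 6]]
stored = hwt_rows_then_transpose(image)
```

Because the transposed address is computed while writing, the next 1D HWT can read the memory row by row again and still operate on columns of the original image.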

After transposition of the resultant matrix, another 1D HWT is performed on the coefficients stored in memory to yield the two-dimensional (2D) HWT coefficients. This is the conventional row-column 2D HWT computation. The 2D HWT computation is performed on each sub-image S0 to S7 for N = 8, where S0 is the first sub-image and S7 is the eighth sub-image of the input volume. The output coefficients of the 2D HWT are sent to the second transpose, T2. As described before, all coefficients are stored in memory; the transpositions of T2 are likewise stored in memory after transformation.
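The separable structure described here (three 1D passes with a reordering of the data between them) can be illustrated with a small software model. It is a sketch under stated assumptions: a single-level averaging/differencing Haar step is assumed, and the two transposes are modelled as a generic axis rotation (`rotate_axes`, a hypothetical helper), which need not match the actual T1 and T2 memory layouts.

```python
def haar_1d(vec):
    """One-level 1D Haar step (assumed form): pairwise averages, then pairwise differences."""
    pairs = list(zip(vec[0::2], vec[1::2]))
    return [(a + b) / 2 for a, b in pairs] + [(a - b) / 2 for a, b in pairs]


def transform_last_axis(vol):
    """Apply the 1D step to every vector along the last axis of a nested-list volume."""
    return [[haar_1d(row) for row in plane] for plane in vol]


def rotate_axes(vol):
    """Move the last axis to the front: out[k][i][j] = vol[i][j][k]."""
    n0, n1, n2 = len(vol), len(vol[0]), len(vol[0][0])
    return [[[vol[i][j][k] for j in range(n1)] for i in range(n0)]
            for k in range(n2)]


def haar_3d(vol):
    """Three 1D passes; three rotations bring the axes back to their original order."""
    for _ in range(3):
        vol = rotate_axes(transform_last_axis(vol))
    return vol


volume = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]
coeffs = haar_3d(volume)
```

With this normalisation the coefficient at index [0][0][0] is the mean of the whole volume, and the remaining coefficients hold the differences along each axis.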

Instead of using logic and other embedded resources for the transpose implementation, this work optimises the use of block random access memory (BRAM). This approach significantly improves utilisation of the available storage resources, optimises system performance and meets the design goals.

B. DPR System Architecture and Implementation

There are two areas in the DPR framework: reconfigurable and static. The reconfigurable areas are used for the 1D HWT and the different transposition modules, while the static area consists of the data fetch unit and the memory controller (Wishbone compliant). Fig. 1(d) illustrates the details of the working system for the implementation of the 3D HWT with DPR.

The DPR modules are connected through simple bus interfaces. The data fetch unit and the HWT DPR area are connected by a bus of defined data bit width, a request line and a returned free signal. The fetch unit sends data and the request to the HWT core as long as the free signal is active. The HWT and transposition modules are connected by a bus of the defined data bit width and an enable signal. In each cycle in which the enable signal is active, data are transposed and written into memory.

The proposed system is implemented with the current partial reconfiguration suites, ISE 9.2PR and PlanAhead 10.1 from Xilinx [12]. It uses module-based DPR, in which configuration frames are reconfigured and bus macros are used to connect the DPR areas with the static area [13]. This methodology has the restriction that all design files and reconfigurable modules must be available to the build environment to build partial modules. The main advantage of DPR is that an implementation of a given design can be integrated into a smaller FPGA. This reduces cost, package size and power. Power consumption and logic size can also be reduced by cascading calculation modules.

In the 3D HWT case, the transposition module and the 1D HWT module can be changed. The transposition module is changed three times during image calculation for each sub-image. First, transposition T1 performs the row-to-column transposition and remains active until a sub-image is transposed. After the T1 sub-image transposition, the DPR area is reconfigured with the T2 transposition, which saves the sub-images; these operations are repeated for all sub-images. After all sub-images have been computed and transposed with T2, the transposition DPR area is reconfigured with the straight transposition and the last 1D HWT is performed on all T2 sub-images. The HWT DPR area can be reconfigured to switch between different transform sizes. The dependency on the transform size N is propagated from the HWT module to all connected modules, which offers the advantage that no other logic changes are necessary.
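One possible reading of this reconfiguration sequence is sketched below as a list of (data, module) steps. It is an interpretation, not the authors' controller: the function name `transposition_schedule` and the labels are hypothetical, and the text leaves open exactly how many module loads occur per sub-image.

```python
def transposition_schedule(num_sub_images):
    """List which transposition module occupies the DPR area at each step (assumed ordering)."""
    schedule = []
    for s in range(num_sub_images):
        # T1 stays loaded until sub-image s has been transposed ...
        schedule.append((f"S{s}", "T1"))
        # ... then the same area is reconfigured with T2 to save the sub-image.
        schedule.append((f"S{s}", "T2"))
    # After every sub-image has passed through T2, the straight transposition
    # is loaded for the final 1D HWT pass.
    schedule.append(("all sub-images", "straight"))
    return schedule


steps = transposition_schedule(8)  # N = 8 gives sub-images S0 to S7
```

Under this reading, a volume with N = 8 needs two module loads per sub-image plus one final load.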

In contrast, Fig. 1(e) illustrates the implementation of the transpose module without DPR, in which all modules have to be combined and connected through a multiplexer, leading to a higher demand for area resources.

III. RESULTS AND ANALYSIS

A. Medical Images Simulation

Fig. 2(a) to (i) illustrate the best quality and compression comparison between the first slices of the original medical volumes and the reconstructed slices for computerised tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET) images using the 3D HWT in a medical image compression system with context-based adaptive variable length coding (CAVLC).

B. FPGA Implementation

Both architectures were implemented using VHDL on the Xilinx University Program XUPV5-LX110T Development System. This development platform comes with on-board memory and industry standard connectivity interfaces, and is equipped with an XC5VLX110T FPGA.

Table I lists the overall performance results for both proposed architectures. The implementation of the 3D HWT with the DPR mechanism saves 1.27% of area and reduces power consumption by 13.96%. In terms of maximum frequency, the DPR mechanism yields a 17.216% better maximum frequency than the implementation without DPR.
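The percentages quoted above can be checked against the Table I values for N = 128 with a few lines of arithmetic. The sketch below is only a verification aid; the variable and function names are illustrative. It reproduces the area and power figures to within rounding, and shows that the 17.216% frequency figure corresponds to the difference taken relative to the with-DPR frequency (relative to the without-DPR frequency, the same difference is about 20.8%).

```python
# Values copied from Table I (Virtex-5 XC5VLX110T, N = 128).
without_dpr = {"area_slices": 21047, "power_mw": 1964.14, "fmax_mhz": 288.02}
with_dpr = {"area_slices": 20779, "power_mw": 1689.84, "fmax_mhz": 347.92}


def percent_saving(baseline, improved):
    """Reduction of `improved` relative to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


area_saving = percent_saving(without_dpr["area_slices"], with_dpr["area_slices"])
power_saving = percent_saving(without_dpr["power_mw"], with_dpr["power_mw"])

fmax_delta = with_dpr["fmax_mhz"] - without_dpr["fmax_mhz"]
# Difference relative to the with-DPR frequency (matches the quoted figure).
fmax_gain_vs_dpr = 100.0 * fmax_delta / with_dpr["fmax_mhz"]
# Difference relative to the without-DPR baseline, for comparison.
fmax_gain_vs_baseline = 100.0 * fmax_delta / without_dpr["fmax_mhz"]

print(f"area saving:  {area_saving:.2f}%")
print(f"power saving: {power_saving:.2f}%")
print(f"fmax gain (relative to with-DPR):    {fmax_gain_vs_dpr:.3f}%")
print(f"fmax gain (relative to without-DPR): {fmax_gain_vs_baseline:.3f}%")
```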

Fig. 2. Comparison of original and reconstructed CT, MRI and PET images for the first slices.

TABLE I
RESOURCES UTILISATION AND OVERALL PROPOSED ARCHITECTURES PERFORMANCE ON VIRTEX-5 (XC5VLX110T) FOR N = 128.

Parameters                   Proposed 3D HWT
                             Without DPR         With DPR
Area (Slices)                21,047 (30.45%)     20,779 (30.06%)
Power consumption (mW)       1964.14             1689.84
Maximum frequency (MHz)      288.02              347.92

Concerning the generated bitstream files and the configuration times required, a full bitstream of 3,889,941 bytes is required for 3D HWT configuration, and its configuration time, 4.8 ms at best, is also the worst. In contrast, the full partial bitstreams generated are significantly smaller, which reduces the storage space required to store the various bitstreams. The results show that, for transform size N = 64, the file size of the full partial bitstreams is about 86.95% smaller than that of a full bitstream, and the configuration time is also reduced by 86.88%. In summary, comparing the file sizes of the bitstreams shows that partial reconfiguration has a more efficient bitstream and, as demonstrated, a smaller bitstream decreases the configuration time.
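To give a feel for what these reductions mean in absolute terms, the stated percentages can be applied to the full-bitstream figures. The resulting partial bitstream size and configuration time are implied estimates derived here, not values reported in the paper, and the variable names are illustrative.

```python
# Figures stated in the text.
full_bitstream_bytes = 3_889_941
full_config_ms = 4.8
size_reduction_pct = 86.95   # partial bitstreams, N = 64
time_reduction_pct = 86.88

# Implied (not reported) values for the partial bitstreams.
partial_bytes = full_bitstream_bytes * (1 - size_reduction_pct / 100)
partial_ms = full_config_ms * (1 - time_reduction_pct / 100)

print(f"implied partial bitstream size: about {partial_bytes / 1000:.0f} kB")
print(f"implied partial configuration time: about {partial_ms:.2f} ms")
```

The two reductions are nearly identical, which is consistent with the statement that configuration time follows bitstream size.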

C. Discussions

To evaluate the relationship between the transform size and the area, power consumption and maximum speed, five different transform sizes (N = 8, 16, 32, 64 and 128) have been used for the FPGA implementation. The various transform sizes reflect the various sizes of volume data in 3D medical imaging.

The influence of transform size on area, power consumption and maximum frequency is depicted in Fig. 3. For ease of visualisation, the graphs are plotted on a base-10 log scale. The results indicate that the proposed 3D HWT with transpose-based computation requires more area, while an area saving of between 2.75% and 12.87% can be achieved by using the DPR mechanism. In terms of power consumption, non-partial reconfiguration consumes up to 1377.96 mW for N = 64, and partial reconfiguration saves 4.20% to 18.81%.

Fig. 3. Influence of transform size on area, power consumption and maximum frequency.

Moreover, to visualise the impact of non-partial and partial reconfiguration on the proposed architecture, chip layouts on different Virtex-5 FPGA devices are shown in Fig. 4. With the DPR mechanism, the static and reconfigurable areas can be specified, and they can be clearly seen in the generated layouts.

The comparative study of the non-partial and partial reconfiguration processes leads to an important conclusion concerning the advantages offered by DPR, especially in processing large medical volumes. Analysis of the performance achieved for parameters such as area utilised, power consumed and maximum frequency clearly reveals that, with DPR, complex designs can be implemented on limited hardware resources, leading to better performance.

IV. CONCLUSIONS

Two architectures for the 3D HWT have been proposed in this paper, based on transpose computation and partial reconfiguration. The comparative study of the non-partial and partial reconfiguration processes shows interesting conclusions concerning the advantages offered by DPR, which is a promising solution for implementing computationally intensive applications such as 3D medical image compression. Using DPR, several large systems can be mapped to small hardware resources, and the area, power and maximum frequency are improved.

Fig. 4. Comparison of chip layout for different Virtex-5 devices for N = 64.

REFERENCES

[1] I. S. Uzun, “Design and FPGA Implementation of Matrix Transforms for Image and Video Processing”, PhD Thesis, School of Computer Science, The Queen’s University of Belfast, 2006.

[2] A. Khashman and K. Dimililer, “Image Compression using Neural Networks and Haar Wavelet”, WSEAS Trans. Sig. Proc., vol. 4, pp. 330–339, 2008.

[3] A. Ahmad, K. K. Loo and J. Cosmas, “VLSI Architecture Design Approaches for Real-time Video Processing”, WSEAS Trans. Cir. and Sys., vol. 7, pp. 855–868, 2008.

[4] M. Jiang and D. Crookes, “Area-Efficient High-Speed 3D DWT Processor Architecture”, Electronics Letters, vol. 43, pp. 502–503, 2007.

[5] M. Jiang and D. Crookes, “FPGA Implementation of 3D Discrete Wavelet Transform for Real-time Medical Imaging”, in Proc. 18th European Conf. on Circuit Theory and Design (ECCTD 2007), Seville, Spain, pp. 519–522, 2007.

[6] M. Majer, J. Teich, A. Ahmadinia and C. Bobda, “The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-based Computer”, Journal of VLSI Signal Processing, vol. 47, pp. 15–31, 2007.

[7] C. Claus, J. Zeppenfeld, F. Muller and W. Stechele, “Using Partial-Run-Time Reconfigurable Hardware to Accelerate Video Processing in Driver Assistance System”, in Proc. Conference Design, Automation, Test and Exhibition in Europe (DATE ’07), Nice, France, pp. 1–6, 2007.

[8] L. Braun, K. Paulsson, H. Kromer, M. Hubner and J. Becker, “Data Path Driven Waveform-like Reconfiguration”, in Proc. International Conference on Field Programmable Logic and Applications (FPL 2008), Heidelberg, Germany, pp. 607–610, 2008.

[9] A. Shoa and S. Shirani, “Run-Time Reconfigurable Systems for Digital Signal Processing Applications: A Survey”, Journal of VLSI Signal Processing, vol. 39, pp. 213–235, 2005.

[10] P. Manet, D. Maufroid, L. Tosi, G. Gailliard, O. Mulertt, M. D. Ciano, J.-D. Legat, D. Aulagnier, C. Gamrat, R. Liberati, V. L. Barba, P. Cuvelier, B. Rousseau and P. Bertrand, “An Evaluation of Dynamic Partial Reconfiguration for Signal and Image Processing in Professional Electronics Applications”, EURASIP J. Embedded Syst., vol. 2008, pp. 1–11, 2008.

[11] J. Huang, M. Parris, J. Lee and R. F. DeMara, “Scalable FPGA-based Architecture for DCT Computation Using Dynamic Partial Reconfiguration”, ACM Trans. on Embedded Comput. Syst., vol. V, pp. 1–18, 2008.

[12] Xilinx INC v2.1, “Partial Reconfiguration Design with PlanAhead”, 2008.

[13] P. Lysaght, B. Blodget, J. Mason, J. Young and B. Bridgford, “Invited Paper: Enhanced Architectures, Design Methodologies and CAD Tools for Dynamic Reconfiguration of Xilinx FPGAs”, in Proc. International Conference on Field Programmable Logic and Applications (FPL ’06), Madrid, Spain, pp. 1–6, 2006.