Synopsis Fourier Based Image Translation, Scale, Local … · 2004. 9. 22. · This poster shows...



Synopsis

This poster shows results from our development of an extended MATLAB image processing toolbox, which implements some useful optical image processing and mosaicking algorithms found in the literature. We surveyed and selected algorithms from the field which showed promise in application to the underwater environment. We then extended these algorithms to explicitly deal with the unique constraints of underwater imagery in the building of our toolbox. As such, the algorithms implemented include:

1. Contrast limited adaptive histogram specification (CLAHS) to deal with the inherent nonuniform lighting in underwater imagery

2. Fourier based methods for scale, rotation, and translation recovery which provide robustness against dissimilar image regions

3. Local normalized correlation for image registration to handle the unstructured environment of the seafloor

4. Multiresolution pyramidal blending of images to form a composite seamless mosaic without blurring or loss of detail near image borders

Keeping in theme with the global view of CenSSIS, “Diverse Problems, Similar Solutions,” many of the algorithms are useful to the rest of the CenSSIS community. Take a look at the normalized correlation section of the poster to see some recent applications of our algorithm to medical imaging.

Contrast Limited Adaptive Histogram Specification

Figure 1: (top left) Original imagery collected by the AUV ABE acquired at a geological site of interest in the East Pacific Rise. (top right) Adaptive histogram equalized imagery. While this technique compensates very well for the nonuniform lighting pattern, it cannot (as seen in the lower right corner of the image) compensate for parts of the image where the light intensity levels are of the order of the sensitivity of the camera. (bottom left) Histogram of the original imagery. (bottom right) Histogram of the AHE imagery.

The propagation of light underwater suffers from rapid attenuation and extreme scattering. These, in combination with the limited camera-to-light separation available on most underwater imaging platforms, place severe limitations on underwater imagery. To deal with the lighting artifacts of nonuniform illumination and low contrast underwater imagery, we utilize the classical techniques associated with contrast limited adaptive histogram equalization (CLAHE) (Zuiderveld 1994). With this technique the image is broken up into sub-regions. The optimal gray scale distribution is calculated for each of these sub-regions, based upon its histogram and a previously determined transfer function, which is based upon the desired histogram of the sub-region. Then, each pixel of the image is adjusted based upon interpolation between the manipulated histograms of the neighboring sub-regions. Our extensive work upon underwater imagery has suggested that the model of a Rayleigh distribution is most suited for underwater imagery.
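The per-sub-region step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the toolbox's MATLAB code: the function name `rayleigh_transfer`, the clip fraction, and the Rayleigh scale `alpha` are all assumptions chosen for the example, and the interpolation between neighboring sub-regions is left out.

```python
import numpy as np

def rayleigh_transfer(tile, clip_limit=0.02, alpha=0.4, nbins=256):
    """Gray-level lookup table for one sub-region (hypothetical helper).

    tile       : 2-D uint8 array (one sub-region of the image)
    clip_limit : assumed cap on any histogram bin, as a fraction of tile pixels
    alpha      : assumed Rayleigh scale parameter on a [0, 1] output range
    """
    hist = np.bincount(tile.ravel(), minlength=nbins).astype(float)

    # Contrast limiting: clip tall bins and spread the excess evenly.
    # (Single pass; bins may end slightly above the cap after redistribution.)
    cap = clip_limit * tile.size
    excess = np.sum(np.maximum(hist - cap, 0.0))
    hist = np.minimum(hist, cap) + excess / nbins

    # Cumulative distribution of the clipped histogram.
    cdf = np.cumsum(hist) / np.sum(hist)

    # Histogram specification: map the CDF through the inverse Rayleigh CDF,
    # x = alpha * sqrt(-2 * ln(1 - p)), then clamp to the output range.
    p = np.minimum(cdf, 1.0 - 1e-6)
    return np.clip(alpha * np.sqrt(-2.0 * np.log(1.0 - p)), 0.0, 1.0)

# Usage: a dark, low-contrast tile is remapped through its own table.
rng = np.random.default_rng(0)
tile = rng.integers(0, 60, size=(32, 32)).astype(np.uint8)
lut = rayleigh_transfer(tile)
enhanced = lut[tile]  # float image in [0, 1]
```

In a full implementation each pixel would blend the tables of the surrounding sub-regions, as the paragraph above describes, so that no seams appear at sub-region borders.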

Contact Info

We’d be happy to supply you with a copy of our MATLAB image processing toolkit. We also have extended underwater imagery data sets which are available for research work. Please contact either of us:

Dr. Hanumant Singh, (508) 289-3270, [email protected]
Ryan Eustice, (508) 289-3269, [email protected]

References

Brown, L. G. (1992). “A Survey of Image Registration Techniques.” ACM Computing Surveys 24(4): 325-376.

Burt, P. J. and E. H. Adelson (1983). “A Multiresolution Spline with Application to Image Mosaics.” ACM Transactions on Graphics 2(4): 217-236.

Eustice, R., O. Pizarro, et al. (2002). UWIT: Underwater Imaging Toolbox for Optical Image Processing and Mosaicking in MATLAB. Proceedings of the Third Underwater Technology Symposium, 2002, Tokyo, Japan. (to be presented)

Irani, M. and P. Anandan (1996). Robust Multi-Sensor Image Alignment. Sixth International Conference on Computer Vision, 1998.

Mandelbaum, R., G. Salgian, et al. (1999). Correlation-Based Estimation of Ego-Motion and Structure from Motion and Stereo. Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, Kerkyra, Greece.

Reddy, B. S. and B. N. Chatterji (1996). “An FFT-Based Technique for Translation, Rotation, and Scale-Invariant Image Registration.” IEEE Transactions on Image Processing 5(8): 1266-1271.

Zuiderveld, K. (1994). Contrast Limited Adaptive Histogram Equalization. Graphics Gems IV. P. Heckbert. Boston, Academic Press. IV: 474-485.

Fourier Based Image Translation, Scale, and Rotation Recovery

Many image processing problems involve the fundamental task of registration of a pair of images. Methods range from: 1) correlation methods which use pixel values directly; 2) fast Fourier transform methods which use frequency domain information; and 3) feature based methods which use low-level features such as edges and corners. This particular algorithm is based upon Fourier domain methods for scale, rotation, and translation recovery by making use of the phase shift property of Fourier transforms (Reddy 1996).

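Let $f_2(x, y)$ be a shifted version of the image $f_1(x, y)$:

$$f_2(x, y) = f_1(x - x_0,\; y - y_0)$$

By the shifting property of the Fourier Transform:

$$F_2(\omega_1, \omega_2) = F_1(\omega_1, \omega_2)\, e^{-j(\omega_1 x_0 + \omega_2 y_0)}$$

The translational offsets $(x_0, y_0)$ can be recovered by locating the impulse associated with the inverse transform of the cross-power spectrum of the two images:

$$\frac{F_1(\omega_1, \omega_2)\, F_2^*(\omega_1, \omega_2)}{\left| F_1(\omega_1, \omega_2)\, F_2^*(\omega_1, \omega_2) \right|} = e^{j(\omega_1 x_0 + \omega_2 y_0)} \;\xrightarrow{\;\mathcal{F}^{-1}\;}\; \delta(x + x_0,\; y + y_0)$$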
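The translation recovery above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the toolbox code: the function name `phase_correlate` is hypothetical, the shift is assumed to be circular (the discrete analogue of the shift property), and a small `eps` is added to avoid division by zero.

```python
import numpy as np

def phase_correlate(f1, f2, eps=1e-12):
    """Estimate (y0, x0) such that f2 is f1 circularly shifted by (y0, x0)."""
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)

    # Normalized cross-power spectrum: only the phase difference remains.
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + eps

    # Its inverse transform is an impulse at (-y0, -x0), modulo the image size.
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)

    # Undo the sign and wrap large values to negative shifts.
    shift = []
    for p, n in zip(peak, corr.shape):
        s = int((-p) % n)
        shift.append(s - n if s > n // 2 else s)
    return tuple(shift)

# Usage: shift a random image by 5 rows and -7 columns, then recover it.
rng = np.random.default_rng(0)
f1 = rng.random((64, 64))
f2 = np.roll(f1, shift=(5, -7), axis=(0, 1))
estimate = phase_correlate(f1, f2)
```

Because the spectrum is normalized to unit magnitude, the peak location depends only on phase, which is what gives the method its robustness to brightness differences between the two images.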

This same property can be exploited for images which are rotated and scaled by representing them in a coordinate system where scale and rotation appear as shifts. For example, when $f_2(x, y)$ is a rotated version of $f_1(x, y)$:

$$f_2(x, y) = f_1(x\cos\theta_0 + y\sin\theta_0,\; -x\sin\theta_0 + y\cos\theta_0)$$

Their Fourier transforms are related by:

$$F_2(\omega_1, \omega_2) = F_1(\omega_1\cos\theta_0 + \omega_2\sin\theta_0,\; -\omega_1\sin\theta_0 + \omega_2\cos\theta_0)$$

Looking at the magnitudes of the Fourier transforms and converting to polar coordinates we see that the rotations can be written as shifts:

$$M_2(\rho, \theta) = M_1(\rho,\; \theta - \theta_0), \qquad \rho = \sqrt{\omega_1^2 + \omega_2^2}, \quad \theta = \arctan(\omega_2 / \omega_1)$$

Similarly, when two images are related by a scale factor, $a$, then their Fourier transforms are related by:

$$F_2(\omega_1, \omega_2) = \frac{1}{a^2}\, F_1\!\left(\frac{\omega_1}{a}, \frac{\omega_2}{a}\right)$$

Taking the logarithm of the frequency results in the scale appearing as a shift:

$$F_2(\log\omega_1, \log\omega_2) = \frac{1}{a^2}\, F_1(\log\omega_1 - \log a,\; \log\omega_2 - \log a)$$

Combining all of these properties we see that the magnitudes of two translated, scaled, and rotated images are related (up to the constant factor $1/a^2$) by:

$$M_2(\rho, \theta) = M_1(\rho / a,\; \theta - \theta_0)$$

After taking the log of the radius, rotation and scale are now both represented as shifts:

$$M_2(\log\rho, \theta) = M_1(\log\rho - \log a,\; \theta - \theta_0)$$
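To make the log-polar idea concrete, the sketch below resamples a magnitude spectrum onto a (theta, log rho) grid and shows that a 90 degree image rotation appears as a pure row shift. It is a rough Python/NumPy illustration under several assumptions: the helper names are hypothetical, sampling is nearest-neighbour, the image is square with even side length, and the filtering and windowing steps used on the poster are omitted.

```python
import numpy as np

def logpolar_magnitude(img, n_rho=64, n_theta=180):
    """Sample |FFT(img)| on a (theta, log rho) grid (hypothetical helper).

    Returns the resampled magnitude plus the grid spacings in log(rho)
    and theta, which convert a shift in samples back to scale and angle.
    """
    mag = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = img.shape[0] // 2, img.shape[1] // 2

    # Log-spaced radii from 1 pixel to just inside the spectrum border.
    log_rho = np.linspace(0.0, np.log(min(img.shape) / 2 - 1), n_rho)
    rho = np.exp(log_rho)

    # The magnitude spectrum of a real image is symmetric about the origin,
    # so angles in [0, pi) are sufficient.
    theta = np.arange(n_theta) * np.pi / n_theta

    rows = cy + np.rint(np.outer(np.sin(theta), rho)).astype(int)
    cols = cx + np.rint(np.outer(np.cos(theta), rho)).astype(int)
    return mag[rows, cols], log_rho[1] - log_rho[0], np.pi / n_theta

def theta_shift(lp_a, lp_b):
    """Circular row shift s that maximizes sum_m lp_a[m] * lp_b[m - s]."""
    fa = np.fft.fft(lp_a, axis=0)
    fb = np.fft.fft(lp_b, axis=0)
    corr = np.real(np.fft.ifft(fa * np.conj(fb), axis=0)).sum(axis=1)
    return int(np.argmax(corr))

# Usage: a 90 degree rotation shows up as a shift of 90 one-degree rows.
rng = np.random.default_rng(1)
img = rng.random((64, 64))
lp1, d_log_rho, d_theta = logpolar_magnitude(img)
lp2, _, _ = logpolar_magnitude(np.rot90(img))
rows_shifted = theta_shift(lp1, lp2)
angle_deg = np.degrees(rows_shifted * d_theta)
```

A shift of m samples along the log rho axis would correspond, in the same way, to a scale factor of exp(m * d_log_rho). Once rotation and scale are removed, the remaining translation can be recovered from the cross-power spectrum as described earlier.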

[Plots: filtered input image; magnitude of spectrum of the filtered input image; log-polar magnitude of spectrum of the input image (axes: log(rho), rho in pixels; theta in degrees).]

Figure 5: (top) Control image. (middle) Input image. (bottom) Registered image.

Figure 4: (left top) The underwater imagery is first adaptive histogram equalized, windowed, and then Laplacian of Gaussian filtered. (above) Fourier magnitude spectra of filtered image. (left bottom) Log polar of the annulus of frequencies of the filtered image.

Figure 6: (top) Control image. (middle) Input image. (bottom) Registered image.

Figure 2: (left) Original imagery of the whale remains imaged by the DSV Alvin off of San Diego at a depth of 1700 m. Limitations in power led to overall intensity levels that are quite low even though the image is uniformly illuminated. (right) Adaptive histogram equalized imagery.

Figure 3: (left) Original color imagery at the base of a hydrothermal vent in the Guaymas Basin at a depth of 2200 m. (right) Adaptive histogram equalized imagery as applied to the Y component of the imagery in YIQ color space.


Multiresolution Pyramidal Based Blending

Due to the rapid attenuation of light underwater, the only way to get a large scale view of the seafloor is to build up a mosaic from smaller local images, such as in Figure 7. The mosaic technique is used to construct an image with a far larger field of view and level of resolution than could be obtained with a single photograph.

Once the mosaic is generated, a technical problem in image representation is joining image borders so that the edge between them is not visible. The two images to be joined may be considered as two surfaces, where the image intensity I(x,y) is viewed as the elevation above the (x,y) plane. The problem, then, is how to gently distort the images near their common border so that the seam is smooth.

We implement a multiresolution pyramidal blending approach where the two images are decomposed into different band-pass frequency components, merged on those levels, and then reassembled into a single seamless composite image (Burt, 1983). The idea is that with this technique the transition zone between band-pass image components can be appropriately chosen to match the scale of features in that band-pass component.

First, a Gaussian pyramid is constructed for each image where the base level in the pyramid, $G_0$, is the original image. Each successive level is a low-pass filtered and down-sampled (by a factor of two) version of the previous level, i.e.

$$G_l(i, j) = \sum_{m}\sum_{n} w(m, n)\, G_{l-1}(2i + m,\; 2j + n)$$

for an appropriately chosen kernel $w(m, n)$. Next, the different band-pass components are formed by generating the Laplacian pyramid. The Laplacian pyramid is generated from the Gaussian pyramid by expanding the image at the next higher level in the pyramid to the resolution of the current level and then subtracting them, i.e.

$$L_l = G_l - \mathrm{EXPAND}(G_{l+1})$$

This results in each level of the Laplacian pyramid containing a separate band-pass component of the original image. The two Laplacian pyramids are then merged at each level of the pyramid and the resulting new seamless image is constructed from the different pyramid levels via the sum

$$\sum_{l=0}^{N} L_{l,l}$$

where $N$ is the number of pyramid levels and the notation $L_{l,l}$ implies expansion of the level $L_l$, $l$ times, to the resolution of $G_0$.
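The construction and merge steps can be sketched as below. This is a small Python/NumPy illustration rather than the toolbox implementation: the 5-tap binomial kernel, the reflected borders, the helper names, and the use of a Gaussian pyramid of a blend mask as the per-level merge rule are all assumptions made for the example (the last follows the general approach of Burt and Adelson).

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # assumed separable w

def blur(img):
    """Separable 5-tap low-pass filter with reflected borders."""
    rows, cols = img.shape
    p = np.pad(img, ((2, 2), (0, 0)), mode="reflect")
    img = sum(KERNEL[k] * p[k:k + rows, :] for k in range(5))
    p = np.pad(img, ((0, 0), (2, 2)), mode="reflect")
    return sum(KERNEL[k] * p[:, k:k + cols] for k in range(5))

def reduce_(img):
    """Low-pass filter, then keep every second sample."""
    return blur(img)[::2, ::2]

def expand(img, shape):
    """Insert zeros between samples, then interpolate with the same kernel."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)

def laplacian_pyramid(img, levels):
    """Band-pass levels L_0..L_{levels-1} plus the coarsest Gaussian level."""
    pyr, g = [], img.astype(float)
    for _ in range(levels):
        g_next = reduce_(g)
        pyr.append(g - expand(g_next, g.shape))
        g = g_next
    pyr.append(g)
    return pyr

def collapse(pyr):
    """Expand and add levels from coarse to fine."""
    img = pyr[-1]
    for lap in reversed(pyr[:-1]):
        img = lap + expand(img, lap.shape)
    return img

def blend(a, b, mask, levels=3):
    """Merge two images level by level; mask is 1 where `a` should show."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gm = [mask.astype(float)]
    for _ in range(levels):
        gm.append(reduce_(gm[-1]))
    return collapse([m * x + (1.0 - m) * y for m, x, y in zip(gm, la, lb)])

# Usage: left half from `a`, right half from `b`.
rng = np.random.default_rng(0)
a, b = rng.random((32, 32)), rng.random((32, 32))
mask = np.zeros((32, 32))
mask[:, :16] = 1.0
mosaic = blend(a, b, mask)
```

Because the mask is itself low-pass filtered at each level, coarse features are mixed over a wide transition zone and fine features over a narrow one, which is the behaviour the preceding paragraphs describe.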

Local Normalized Correlation

Normalized correlation is a practical measure of similarity (Brown, 1992). Normalized correlation of two signals is invariant to local changes in mean and contrast. When two signals are linearly related, their normalized correlation is 1. When the two signals are not linearly related, but do contain similar spatial variations, normalized correlation will still yield a value close to unity (Irani, 1996).

The lack of rich features in underwater imagery precludes indirect feature based methods, and experimental evidence suggests that direct correlation based methods yield good results. We employ a dense local normalized correlation to determine correspondence between images. The shape of the local normalized correlation surfaces will be concave and have a prominent peak at the correct displacement. We fit a quadratic surface near the surface peak and analytically check for concavity (Mandelbaum, 1999) as a method of outlier rejection.
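The matching and outlier test described above might look roughly like this in Python with NumPy. It is a sketch under assumptions, not the toolbox code: the function names are hypothetical, the search is a brute-force loop over a small window, and the quadratic is fitted by least squares to the 3x3 neighbourhood of the peak.

```python
import numpy as np

def ncc(a, b, eps=1e-12):
    """Normalized correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / (np.sqrt(np.sum(a * a) * np.sum(b * b)) + eps))

def correlation_surface(template, search):
    """NCC of the template against every placement inside the search window."""
    th, tw = template.shape
    out = np.empty((search.shape[0] - th + 1, search.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = ncc(template, search[i:i + th, j:j + tw])
    return out

def is_concave_peak(surface):
    """Fit z = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2 around the maximum
    and accept it only if the fitted Hessian is negative definite."""
    i, j = np.unravel_index(np.argmax(surface), surface.shape)
    if i in (0, surface.shape[0] - 1) or j in (0, surface.shape[1] - 1):
        return False  # peak on the border: no 3x3 neighbourhood to fit
    patch = surface[i - 1:i + 2, j - 1:j + 2].ravel()
    y, x = np.mgrid[-1:2, -1:2]
    x, y = x.ravel().astype(float), y.ravel().astype(float)
    A = np.column_stack([np.ones(9), x, y, x * x, x * y, y * y])
    c = np.linalg.lstsq(A, patch, rcond=None)[0]
    hxx, hxy, hyy = 2.0 * c[3], c[4], 2.0 * c[5]
    return bool(hxx < 0.0 and hxx * hyy - hxy * hxy > 0.0)

# Usage: the template differs from its source patch in gain and offset,
# yet the correlation peak still lands on the true position (row 4, col 6).
rng = np.random.default_rng(2)
search = rng.random((24, 24))
template = 3.0 * search[4:12, 6:14] + 5.0
surface = correlation_surface(template, search)
best = np.unravel_index(np.argmax(surface), surface.shape)
```

Subtracting each patch mean and dividing by the patch energies is what makes the score insensitive to the local brightness and contrast changes that are common in underwater imagery.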

Figure 7: A sequence of images which was mosaicked together in a globally consistent manner utilizing all available cross-linked image pair correspondences. To obtain good registration, especially along edges, we must compensate for radial distortion. The similarity metric used for point correspondence was local normalized correlation.

Figure 8: George Chen, from MGH, asked us to look into the registration of different frames of CT-Scan imagery to track patient organ motion. Above are the first stab results of applying our standard local normalized correlation algorithm to determine point correspondences. (top) The green dots represent the initial base points which were selected at soft tissue boundaries by applying an edge detection filter to the image. (bottom) The yellow dots represent the corresponding match for each base point. The red vector at each point represents the motion magnitude of that point between frames.

Figure 9: (top) A two image mosaic with seam. The top image is overlaid over the bottom image. (bottom) The final blended result.