JOURNAL OF DISPLAY TECHNOLOGY, VOL. 12, NO. 11, NOVEMBER 2016 1335

Efficient Image Warping in Parallel for Multiview Three-Dimensional Displays

Nan Guo, Xinzhu Sang, Songlin Xie, Peng Wang, and Chongxiu Yu

Abstract—Three-dimensional (3D) display technologies have made great progress in recent years. View synthesis for 3D content requires hole-filling, which is a challenging task. The increase in resolution and in the number of views for view synthesis brings new challenges in memory and processing speed. A predicted hole mapping (PHM) algorithm is presented, which requires no filling priority or smoothing operation, allowing parallel computation that facilitates a real-time 3D conversion system. In experiments, the proposed PHM is evaluated and compared with other methods in terms of peak signal-to-noise ratio and structural similarity index measurement, and the results show numerical advantages. The method can operate on a 32-view display with 4K × 2K resolution in real time on a GPU.

Index Terms—Image generation, stereo image processing, stereo vision, three-dimensional television.

I. INTRODUCTION

Recently, multi-view 3D display technology [1], [2] has developed rapidly and is gradually becoming accessible to a broader audience. New progress in super multiview (SMV) displays [3], [4] requires sufficiently large numbers of discrete images to present the appearance of continuous parallax. However, the lack of 3D video content limits popular applications of 3D displays.

One of the practical methods to generate 3D content is Depth Image Based Rendering (DIBR) [5], where the source data format of the 3D scene is expressed as "video + depth", which can be widely captured by different approaches including filming and modeling. In the DIBR method, hole-filling is a tough problem, and substantial efforts [6]–[18] have been made to address it. The increase in resolution and in the number of views confronts virtual view rendering with new challenges, including limited graphics memory and a growing amount of calculation, which make it difficult to meet the demands of real-time applications.

A general content generation system for a multi-view display includes rendering virtual views and interweaving these views into a format suitable for 3D displays, as shown in Fig. 1(a).

Manuscript received April 19, 2016; revised August 14, 2016; accepted August 22, 2016. Date of publication August 25, 2016; date of current version October 11, 2016. This work was supported in part by the "863" Program under Grant 2015AA015902, in part by the National Science Foundation of China under Grant 61575025, and in part by the fund of the State Key Laboratory of Information Photonics and Optical Communications.

The authors are with the State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

This paper contains supplemental material available at http://ieeexplore.ieee.org.

Color versions of one or more of the figures are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JDT.2016.2602386

Fig. 1. Flow charts of (a) the conventional multi-view stereo video synthesis and (b) the proposed direct synthesis architecture.

The system requires large amounts of calculation and memory during the rendering process. It produces information redundancy, as the image displayed on the screen contains only part of these views. Taking a 32-view display as an example, only 1/32 of the pixels in each view are displayed on the screen. The memory footprint grows rapidly with the increase in resolution and number of views. Real-time 3D video applications require a processing rate of at least 30 frames per second, which is a computationally costly task. Ordinary graphics cards can hardly fulfill the requirements of memory and speed.

Here, an efficient GPU-based 3D video conversion system is presented, in which the images in 3D format are synthesized directly from reference views and the corresponding depth information, as shown in Fig. 1(b).

Most hole-filling methods with favorable results in the general system rely on processing the synthesized virtual views, for example, the adaptive recursive interpolation algorithm (ARIA) [6], [7], the hierarchical hole-filling (HHF) algorithm [8], and the patch-match algorithm [9]. In these methods, a filling priority, local segmentation, and smoothing operations are always required, which are time consuming and difficult to compute in parallel.

There are quite a few methods that prevent the appearance of holes during the synthesis process. A reverse warping method [10] was adopted instead of forward warping, in which the color images were sampled from the viewpoint of the virtual camera. However, the sampling is also a time-consuming process. In the pre-processing algorithm [11], holes were avoided by smoothing the depth map, which inevitably changed the structure of the depth values, causing edge deformation. Moreover, holes could not be completely avoided for wide baselines between virtual views. Another pretreatment of the depth map to prevent holes is to build a quad-tree structure of the 2D image [12]–[15]. The deformation is weakened by a series of optimization processes, but the computational complexity is increased. The image-domain-warping (IDW) method [16], [17] relies on sparse disparities to synthesize novel views by using an image warping framework [18]. The method of

1551-319X © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.



deformation mapping is not adopted here, because we want to maintain existing correct regions as much as possible. Moreover, our approach is faster than IDW, with less computational complexity.

As no entire virtual views are cached in memory in our direct synthesis architecture, a predicted hole-mapping (PHM) algorithm without filling priority or smoothing is proposed to avoid holes and cracks. The mapping coordinate in the virtual view rendering is associated with the depth gradient and the direction of translation. Suitable pixels are determined in accordance with an energy function to map from the original view to the target view so as to avoid holes. Significant savings are provided during both the rendering and interweaving computations. The method works on most multiview autostereoscopic displays, including lenticular-array and integral-imaging displays, except for some based on the time-multiplexing method [1].

Experiments show that the method is fast, occupies less memory, and is easy to implement in engineering practice. As the number of viewpoints and the image resolution increase, the advantage of our algorithm becomes more prominent. The proposed predicted image warping method, together with the video input module and the display module, composes the 3D video conversion system, which provides high-quality 3D viewing experiences. Converting 32-view 3D video at 4K × 2K resolution takes 28 ms per frame in this system, achieving excellent real-time performance.

The rest of the paper is organized as follows. The analysis of hole prediction is presented in Section II. In Section III, the proposed method is described. Section IV presents the synthesis results and discussion. Finally, the conclusion is given in Section V.

II. ANALYSIS OF THE HOLE PREDICTION

For the depth-image-based rendering (DIBR) algorithm, the virtual view can be synthesized by transforming pixels from the original view to the corresponding positions in the target view according to the disparity, which here indicates pixel displacement. When the camera configuration is parallel, the value of each pixel i of the depth image z, denoted as zi, is inversely proportional to the corresponding disparity value di, i.e., di ∝ 1/zi.

The case of generating one view is used to explain hole prediction. We take the case in which the right-hand virtual view is generated and the left view is regarded as the original view, where the larger the disparity value, the closer the object is to the camera. Other cases of generating virtual images at arbitrary positions can use the same method by changing the direction of the disparity gradient.

Not only can the discontinuities in the disparity map indicate where the holes and overlaps appear in the output image, but they can also be used to calculate the hole size. That is, the positions of holes are associated with the disparity values. As shown in Fig. 2, holes appear where the pixel on the right shifts less than the pixel on the left. The width of a hole is equal to the difference between the shifts of the adjacent pixels.

Fig. 2. The coordinate relationship for DIBR. A negative disparity gradient indicates where holes occur, and its absolute value equals the number of occluded pixels on the right.

The disparity gradient gdisp in the horizontal direction is computed as follows,

gdisp (x) = d (x + 1) − d (x) (1)

where d denotes disparity and x is the horizontal coordinate. The vertical coordinate is omitted, as only horizontal parallax is considered. In the generated virtual view, overlaps and holes are judged by the sign of the disparity gradient. That is to say, occlusion occurs in areas where gdisp is not zero.
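As an illustration, the sign rule of equation (1) can be sketched in a few lines of Python (our own example, not the authors' code); the scanline below is made up, placing a foreground object of disparity 3 on a zero-disparity background:

```python
# Sketch of equation (1) and the sign rule: g_disp = 0 means one-to-one
# mapping, g_disp > 0 means overlap, g_disp < 0 means a hole of |g_disp|
# pixels. The scanline `d` is a made-up example, not data from the paper.
def disparity_gradient(d, x):
    """g_disp(x) = d(x + 1) - d(x); the vertical coordinate is omitted."""
    return d[x + 1] - d[x]

def classify(d, x):
    g = disparity_gradient(d, x)
    if g == 0:
        return "no occlusion"   # one-to-one mapping, equation (2)
    elif g > 0:
        return "overlap"        # several source pixels share one target
    else:
        return "hole"           # |g| target pixels receive no source pixel

# Foreground object (disparity 3) over a background (disparity 0).
d = [0, 0, 3, 3, 3, 0, 0]
labels = [classify(d, x) for x in range(len(d) - 1)]
```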

Let Isrc and Idst denote the color images of the original view and the virtual view, respectively. xsrc and xdst represent the corresponding horizontal coordinates.

If gdisp(xsrc) is equal to 0, no occlusion occurs, and the mapping relationship is computed as in equation (2).

xdst = xsrc − d (xsrc) . (2)

As this is a one-to-one mapping without overlaps or holes, the pixel can be mapped to the target image directly, as given in equation (3).

Idst (xdst) = Isrc (xsrc) . (3)

When gdisp(xsrc) is positive, overlaps occur. In this case, the pixel with the largest disparity is finally shown and mapped to the output image. The mapping relationship is as follows,

d (xsrc) = max (d (xsrc1) , d (xsrc2) , . . .) (4)

where xdst = xsrc1 − d(xsrc1) = xsrc2 − d(xsrc2) = · · ·.

When gdisp(xsrc) is negative, holes will appear in the new view, which can be regarded as the opposite case of overlaps. The coordinates of the holes are computed explicitly by,

xdst = xsrc + j − d (xsrc) , gdisp (xsrc) ≤ j < 0 (5)

where j is a negative integer between gdisp(xsrc) and zero.

To avoid these unmapped regions, a predicted hole-mapping algorithm is proposed, which assumes that the occlusion area is detected from the nearby depth discontinuity, and hole mappings are processed in those regions.
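To make the coordinate relationships concrete, here is a small Python sketch (ours, under the paper's convention xdst = xsrc − d(xsrc)) that enumerates the hole positions of equation (5):

```python
# Sketch of equations (2) and (5); variable names are ours.
def target_coordinate(d, x_src):
    """Equation (2): x_dst = x_src - d(x_src)."""
    return x_src - d[x_src]

def hole_coordinates(d, x_src):
    """Equation (5): x_dst = x_src + j - d(x_src), g_disp(x_src) <= j < 0.
    Returns the target columns left unmapped by the negative gradient."""
    g = d[x_src + 1] - d[x_src]
    if g >= 0:
        return []               # zero or positive gradient: no holes here
    return [x_src + j - d[x_src] for j in range(g, 0)]

# The gradient at x = 7 is 0 - 3 = -3, so three target columns are holes.
d = [0, 0, 0, 0, 0, 3, 3, 3, 0, 0]
holes = hole_coordinates(d, 7)
```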



III. PROPOSED METHOD

First, the use of the predicted hole mapping (PHM) algorithm in single-view synthesis is demonstrated. Then the PHM is applied in the 3D conversion system.

A. Predicted Hole-Mapping Algorithm

The proposed hole-mapping algorithm is a substitute for the traditional hole-filling algorithm, given that complete views are not available in our conversion architecture. An energy function E is defined for computing suitable mapping pixels to preserve the disparity continuity and enhance the sense of immersion. Pixels with the highest energy are mapped to holes as follows,

arg max E(q), q ∈ Ω. (6)

Pixel patches Ω are extracted from adjacent regions in the original view as candidate regions; the precise definition of Ω is given in Subsection 1). q denotes one of the candidate pixels in Ω.

The optimization energy function is composed of three energy terms: the depth term Edepth, the proximity term Eproximity, and the structure term Estructure. These energies denote three kinds of constraints and are multiplied together in the final function. Formally,

E = Edepth · Eproximity · Estructure . (7)

The visual similarity between the target view and the input view is maintained by sampling the source region for Ω and by these energy terms. The energy function is similar to that of ref. [19], except for their priority value, which determines the order in which holes are filled. As the filling order is critical for the propagation of textured regions and violates the rules of parallel computing, the source region is sampled and the structure term compensates for the absence of a filling order in parallel computing. The combination of the proximity term and the structure term produces the desired balance between spatial and texture proximity, where the inward growth of image structure is enforced with moderation.

1) Candidate Region Definition: As the complete virtual view does not exist during the synthesis process, a candidate region Ω for the hole is selected from the source image. The center p(xsrc + 1, ysrc) of Ω is defined as the first pixel on the right side of the marked pixel (xsrc, ysrc), where gdisp(xsrc, ysrc) is negative. p is relatively a background pixel in the adjacent region.

The radius of Ω is set to 2 as an example; the number of candidate pixels is then 25 in total. As shown in Fig. 3, the reference area Ω, labeled in blue, is chosen by down-sampling the adjacent region. The sampling rate is related to the hole position: the further the hole is from the background edge, the higher the sampling rate. If the total width of the holes is 4 pixels, for example, the holes can be divided into 4 parts, each sharing one kind of reference region. The sample-based guidance tends to propagate strong textures quickly. The method can

Fig. 3. Candidate regions in the reference view. Red squares denote the center pixel p. The grey level represents the disparity value: the lighter the color, the greater the disparity, and the closer the object is to the camera. Blue squares label the reference areas. The vector (x, y) in the last four pictures indicates that the sampling rate in the horizontal direction is x and in the vertical direction is y. Letters a, b, c, d denote 4 holes to be filled, whose distances from the background edge are 1, 2, 3, 4, respectively.

Fig. 4. Illustration of the self-occlusion.

keep the color distribution of filled areas similar to the background, meeting visual comfort to some degree.

2) Depth Term: The depth term in the proposed energy function is used to judge whether holes are produced by self-occlusion or by partial occlusion only. Fig. 4 demonstrates that self-occlusion occurs when the depth of the foreground changes continuously. Thus, for self-occlusion the candidate pixels come from part of the foreground.

The depth constraint is constructed as,

Edepth(q) = 1, if dq − dforegroundBoundary < 0
Edepth(q) = |dq − dforegroundBoundary| / |dforegroundBoundary − dbackground|, otherwise. (8)

The disparity on the foreground boundary is equal to d(xsrc, ysrc). The depth energy of a candidate pixel q in the foreground increases with its disparity difference from the boundary pixel (xsrc, ysrc). If there is no change in the foreground disparity, the covered area is regarded as part of the background, and the energy of a candidate pixel in the foreground is 0. For a candidate pixel from the background, the energy is set to 1.
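A minimal Python sketch of the depth term follows (our own naming: d_q for the candidate's disparity, d_fg for the foreground-boundary disparity d(xsrc, ysrc), d_bg for the background disparity; the division-by-zero guard is our addition):

```python
# Sketch of equation (8); names and the degenerate-case guard are ours.
def depth_term(d_q, d_fg, d_bg):
    if d_q - d_fg < 0:
        return 1.0              # candidate lies in the background
    if d_fg == d_bg:
        return 0.0              # no depth contrast: avoid division by zero
    return abs(d_q - d_fg) / abs(d_fg - d_bg)
```

A background candidate always scores 1, an unchanged foreground candidate scores 0, and a foreground candidate scores more as its disparity departs from the boundary disparity.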



3) Proximity Term: The proximity term Eproximity is defined according to the principle that only small spatial distances correlate strongly with human discrimination performance. The pixel proximity decreases as the spatial distance from q to the reference pixel p increases. Therefore, the proximity term, using the Laplacian kernel, is defined as follows,

Eproximity(q) = exp(−(euclidean(p, q) + i) / γp) (9)

where the spatial distance is expressed as the sum of the Euclidean distance euclidean(p, q) and the hole order i. γp is determined by the width of the holes, i.e., the absolute value of the disparity gradient. In fact, γp is also related to the field of view of the human visual system.
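The proximity term can be sketched as follows (our own example; passing γp as a parameter reflects the sentence above, but the exact parameterization is an assumption):

```python
# Sketch of equation (9): a Laplacian (exponential) falloff over the sum
# of the Euclidean distance from q to p and the hole order i.
import math

def proximity_term(p, q, hole_order, gamma_p):
    dist = math.hypot(p[0] - q[0], p[1] - q[1])
    return math.exp(-(dist + hole_order) / gamma_p)
```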

4) Structure Term: To compute the visually important texture, the texture structure image S is constructed as the first derivative of the color image I. The Sobel derivative operator is used to represent the differentiation as

S = |Gx| + |Gy|

Gx = ( −1 0 +1
       −2 0 +2
       −1 0 +1 ) ⊗ I (10)

Gy = ( −1 −2 −1
        0  0  0
       +1 +2 +1 ) ⊗ I.

The general purpose of the structure term is to extend image texture and structure by increasing the energy of pixels on textures of the original image, so that they are filled with higher priority. The structure constraint is computed by,

Estructure (q) = α · min (S (q) , β) . (11)

The structure term of equation (11) gives higher priority to pixels on the image structure and preserves the texture continuity for the incoming reconstructed texture. The truncation parameter β is used to label the texture effectively and is usually set to 100.
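Equations (10) and (11) can be sketched in pure Python on a grayscale image given as a list of rows (our own illustration; alpha is the weighting constant of equation (11) and beta its truncation parameter, usually 100):

```python
# Sketch of the structure image S = |Gx| + |Gy| (equation (10)) and the
# structure term (equation (11)); interior pixels only, no border padding.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def structure_image(img):
    h, w = len(img), len(img[0])
    s = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[dy][dx] * img[y - 1 + dy][x - 1 + dx]
                     for dy in range(3) for dx in range(3))
            gy = sum(SOBEL_Y[dy][dx] * img[y - 1 + dy][x - 1 + dx]
                     for dy in range(3) for dx in range(3))
            s[y][x] = abs(gx) + abs(gy)
    return s

def structure_term(s_q, alpha=0.01, beta=100.0):
    """Equation (11): E_structure(q) = alpha * min(S(q), beta)."""
    return alpha * min(s_q, beta)
```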

B. 3D Conversion Based on PHM

The display image is synthesized in conjunction with the parameters of the display screen. To synthesize hybrid images in 3D format, a suitable viewpoint is selected through the mask at each subpixel location. The above section shows that the hole mapping can propagate both texture and structure information to dis-occlusion areas. As complete data for one view can be obtained with the mapping method, arbitrary views can be generated. The diagram of the proposed method is shown in Fig. 5. A color image, a corresponding depth or disparity image, and a mask for the particular display screen are taken as the inputs.

In the mask for the 3D display screen, each colorized rectangle is one of the RGB subpixels. The number in each rectangle denotes the particular view that is finally visible on the screen [20].

The conversion includes two steps: pre-processing and image warping. First, in the pre-processing part, given the depth budget of the scene, such as the distances of the near and far planes in pixels relative to the zero plane, the depth map is transformed into a pixel-based disparity map. The texture structure image is computed by the Sobel derivative operator as in equation (10). Then, the display image is synthesized according to the color image, the texture structure image, the disparity image, and the synthesis parameter matrix of the screen. The mapping process, with which holes in the target texture are avoided while visual continuity is preserved, is described in detail below.

Fig. 5. Proposed 3D conversion system by the predicted image warping algorithm. (Cones from Middlebury Stereo Datasets 2003. http://vision.middlebury.edu/stereo/data/scenes2003/)

Fig. 6. Flow chart of each thread in the conversion system.
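The depth-to-disparity conversion of the pre-processing step can be sketched with a linear mapping (a common choice, not necessarily the authors' exact formula; d_near and d_far are the disparities in pixels of the near and far planes relative to the zero plane):

```python
# Sketch: map an 8-bit depth value to a pixel disparity given the depth
# budget. z = z_max (nearest) maps to d_near; z = 0 (farthest) to d_far.
def depth_to_disparity(z, d_near=10.0, d_far=-5.0, z_max=255.0):
    return d_far + (z / z_max) * (d_near - d_far)
```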

In the output 3D image corresponding to the 3D display screen, adjacent subpixels come from different viewpoints. m(xdst, ydst, c) indicates the viewpoint number on each subpixel c (c = 1, 2, 3 represents the three RGB channels) at coordinate (xdst, ydst), which is the input in the mask image.

With each pixel handled by one thread in parallel, the running flow chart is presented in Fig. 6. The disparity di associated with view i is computed by,

di (xsrc) = i · α · d (xsrc) (12)



Fig. 7. Results. (a1) One virtual view in 3D format, and (a2) photograph shown on the screen. (b1) Multi-view in 3D format, and (b2) photograph shown on the screen.

where α is a constant value.

The horizontal disparity gradient value gdisp(xsrc) is calculated by equation (1), which is used in judgment 1 of the flow chart to decide whether holes exist. The dis-occlusions occur in the virtual view where disparities decrease.

If there is no possibility of holes, the target coordinate is computed simply as in equation (2).

If there are holes, the hole coordinates are computed by equation (5).

In judgment box 2, for each coordinate, whether the point is displayed is determined from the mask by checking m(xdst, c) == i. Only when the mask at the computed coordinate, m(xdst, c), is equal to the view number i is the pixel mapped to the final image and displayed on the screen.

The ordinary forward mapping without the hole-mapping process is expressed as equation (3). For each hole coordinate, the hole mapping is processed by computing the priority energy function, equations (6)–(11). In the PHM part of the flow chart, judgment 3 makes sure all holes are mapped. j indicates the number of each hole. The holes are mapped from left to right, and all holes are filled once j reaches zero.

Moreover, there is another judgment inside each mapping process on the disparity between the mapped pixel and the pixel already in the target view, which is used to ensure an occlusion relationship similar to the z-buffer method [21]. Let ddst indicate the disparity value of the target image. Only when d(xsrc) > ddst(xdst) is the mapping implemented.
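The per-pixel thread of Fig. 6 can be sketched as follows (our own Python pseudocode; `mask`, `alpha`, and the one-scanline layout are assumptions, and real code would run one such thread per pixel on the GPU):

```python
# Sketch of one thread: scale the disparity per view (equation (12)),
# compute the target column (equation (2)), keep the pixel only if the
# screen mask selects this view there (judgment 2), and resolve overlaps
# with the z-buffer-style test d(x_src) > d_dst(x_dst).
def map_pixel(x_src, view_i, d, mask, out, d_dst, alpha=0.1):
    d_i = view_i * alpha * d[x_src]          # equation (12)
    x_dst = int(round(x_src - d_i))          # equation (2)
    if not (0 <= x_dst < len(out)):
        return                               # fell outside the image
    if mask[x_dst] != view_i:
        return                               # subpixel shows another view
    if d[x_src] > d_dst[x_dst]:              # occlusion test (z-buffer-like)
        d_dst[x_dst] = d[x_src]
        out[x_dst] = (view_i, x_src)

d = [10.0] * 5                               # source disparities
mask = [1] * 5                               # toy mask: view 1 everywhere
out = [None] * 5                             # synthesized scanline
d_dst = [float("-inf")] * 5                  # target disparity buffer
map_pixel(3, 1, d, mask, out, d_dst)         # writes to column 3 - 1.0 = 2
```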

3D display results on the stereoscopic device [20] are observed, and photographs are shown in Fig. 7. Fig. 7(a1) presents one virtual view in 3D format, and (a2) shows its photograph on the 3D display. Fig. 7(a1) looks black, as it contains only one view of a 32-view image. Details of the image are shown in the upper right corner. Fig. 7(b1) and (b2) show the multi-view image and the display photograph. The hybrid image of (b1) is blurry on a normal screen and becomes clear, with a stereo experience, in (b2) when it is shown on the specified 3D screen.

Crack elimination: During the projection process, cracks occur in the virtual view if a floating-point target position is rounded to the neighboring integer position. The double-mapping method is proposed to avoid these cracks. Pixels with non-integer target x-coordinates are mapped twice, by flooring and by ceiling the x-coordinate. Then no cracks appear in the synthesized output view, as shown in Fig. 8. With only one pixel of deviation, the double-mapping method has little effect on the accuracy of the generated view.

Fig. 8. One-view results. (a) Reference image and corresponding disparity image. Target images and corresponding disparity images (b) without hole mapping, and (c) with the predicted hole-mapping process. (© copyright 2008, Blender Foundation / www.bigbuckbunny.org)
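The double-mapping rule can be sketched as (our own helper; the pixel is written to every column it returns):

```python
# Sketch of the crack fix: a non-integer target x-coordinate is mapped
# to both its floor and its ceiling so that no target column between two
# warped neighbors is left empty.
import math

def double_map(x_dst_float):
    lo, hi = math.floor(x_dst_float), math.ceil(x_dst_float)
    return [lo] if lo == hi else [lo, hi]
```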

IV. EXPERIMENT

The performance of the proposed predicted hole-mapping (PHM) algorithm is evaluated in terms of visual quality and computational complexity in the 3D conversion system. Experiments are conducted on data from the Middlebury datasets, a two-view movie, 3D model rendering, and a stereo camera.

A. Synthesis Visual Quality

In the quantitative comparison, the search region of the PHM is set to include only 25 pixels to balance efficiency and synthesis quality. On the Middlebury datasets [22], the 2nd views are generated from the 1st views as virtual images. The quality of the synthesized images is compared with the low-rank matrix restoration (LMR) [23] method, which shows better performance than a series of other inpainting methods, including a partial-differential-equation-based iterative algorithm (PDE) [24], a local optimization method (Local) [25], and some others. The quality is measured by Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measurement (SSIM). As shown in Table I, the proposed method achieves both the highest PSNRs and SSIMs in most cases, which demonstrates its superior visual performance. For the dataset Midd2, we do not have the best SSIM; our method is only 0.006 below the best recovery method. For the other datasets, our method is clearly higher than the second-best methods. In Fig. 8, the filling of dis-occlusion regions is shown in the highlighted regions for Laundry and Moebius as examples. It shows that the proposed PHM method produces natural synthesis results.

The synthesis image for the movie Big Buck Bunny is shown in Fig. 9, where the disparity is computed by a dense stereo matching algorithm [26] from the two-view sequences. The rendering comparison without hole mapping and with the PHM algorithm is depicted in Fig. 9(b) and (c). It can be seen that the texture filled into the holes is similar to the adjacent region, and the



TABLE I
QUALITY COMPARISON ON DIS-OCCLUSION FILLING IN VIEW SYNTHESIS (FOR THE SECOND VIEW)

Method     Art           Books         Dolls         Laundry       Moebius       Aloe          Midd2
           PSNR  SSIM    PSNR  SSIM    PSNR  SSIM    PSNR  SSIM    PSNR  SSIM    PSNR  SSIM    PSNR  SSIM
proposed   30.56 0.923   28.53 0.92    30.53 0.938   30.22 0.934   31.51 0.923   28.93 0.918   31.04 0.951
LMR        26.94 0.881   27.87 0.906   29.80 0.905   27.31 0.887   30.50 0.915   27.34 0.857   30.25 0.957
Local      25.31 0.858   27.08 0.892   30.14 0.780   26.53 0.873   29.94 0.899   26.86 0.839   29.44 0.949
PDE        25.47 0.871   27.46 0.894   29.43 0.903   26.42 0.878   30.06 0.899   27.03 0.846   29.95 0.955

Fig. 9. Crack elimination. (a) Forward mapping with coordinates rounded to integers; (b) forward mapping using the double mapping method.

Fig. 10. Our predicted image warping result for one virtual view. (a) Images after view warping; (b) images inpainted by predicted hole-mapping (PHM). Arbitrary virtual views with visual continuity can be synthesized by rendering once in parallel. (Image source: http://vision.middlebury.edu/stereo/data/, 2005 datasets)

appearance of horizontal stripes is avoided except near part of the image border, where the relative disparity is large and there is relatively little information to compute the energy function. Fortunately, this border effect has little influence on the entire image, and the virtual view looks realistic subjectively.
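For context on how the holes being filled arise in the first place, the sketch below forward-maps a single scan line with integer rounding. It is a deliberately simplified stand-in: the shift direction, the z-test via disparity, and the toy values are assumptions, and the paper's double mapping step is not reproduced.

```python
import numpy as np

def forward_warp_row(color, disp):
    """Forward-map one scan line to a virtual view. Source pixel x lands
    at x - round(disp[x]); target positions that receive no source pixel
    remain holes (-1): cracks and dis-occlusions."""
    w = len(color)
    out = np.full(w, -1, dtype=np.int32)
    best = np.full(w, -np.inf)          # keep the nearest (largest-disparity) pixel
    for x in range(w):
        xt = x - int(round(disp[x]))
        if 0 <= xt < w and disp[x] > best[xt]:
            out[xt] = color[x]
            best[xt] = disp[x]
    return out

color = np.arange(10)                                           # toy gray values 0..9
disp = np.array([0, 0, 0, 0, 3, 3, 3, 3, 3, 3], dtype=float)    # depth step edge
warped = forward_warp_row(color, disp)
holes = np.where(warped < 0)[0]
print(warped, holes)   # the depth step exposes a dis-occlusion on the right
```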

For small holes, the PHM algorithm shows numerical advantages over some other methods in terms of PSNR and SSIM. For large holes, the differences from the real scene increase; the limited search region used to improve speed is the main reason. In the current method, the color distribution of the filled areas is similar to that of the background,

which only meets visual comfort to some degree. Failure cases include the inability to preserve straight lines. This problem can be alleviated with additional linear-structure detection and a heterogeneous search region. The requirement on the quality of the disparity map can be relaxed under the assumption that the background is uniform in most cases. Although parallel DIBR computing has these disadvantages, it will be widely applied for its fast calculation and acceptable quality.
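The bounded search region can be illustrated with a deliberately crude nearest-valid-pixel fill. This is not the PHM energy function from the paper, only a sketch of how a limited window (such as the 25-pixel region mentioned above) constrains the candidates considered for each hole:

```python
import numpy as np

def fill_holes_bounded(row, search_region=25):
    """Fill hole pixels (-1) by copying the nearest valid pixel found
    within a limited search region. A crude stand-in for an energy-based
    candidate choice: larger regions give more candidates but cost more."""
    out = row.copy()
    for x in np.where(out < 0)[0]:
        for off in range(1, search_region + 1):
            for xn in (x + off, x - off):        # try right, then left
                if 0 <= xn < len(out) and out[xn] >= 0:
                    out[x] = out[xn]
                    break
            if out[x] >= 0:
                break
    return out

row = np.array([0, 4, 5, 6, 7, 8, 9, -1, -1, -1])
print(fill_holes_bounded(row))   # holes inherit nearby valid values
```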

The synthesized hybrid 3D images created by the PHM algorithm also show superior subjective quality. A series of visual results is provided in Fig. 11, including the generated 3D images in Fig. 11(b) and photographs of the screen [20] in Fig. 11(c). The images Lotus, Birds, and Manhattan are computed through 3D model rendering. The images in the last row are captured by a stereo camera, and the disparity is computed by stereo calibration and a dense stereo matching algorithm [26]. It can be seen that in the generated images, the edges between foreground and background change naturally and seamlessly. The system realizes 3D display with continuous, large scene depth. The generated 3D images (b) appear blurry because they are in a format for the 3D display, and look sharp, with a sense of depth, only on that particular screen; photographs of the screen are shown in (c). The current solution already provides very acceptable results for 3D displays. The visualization of the 3D display at different viewing angles for Manhattan and Desktop is shown in Visualization 1.

B. Real-Time Multiview Generation System

Real-time implementation of stereo to multi-view conversion is always a hard constraint, which is difficult to meet while the generated material maintains high quality. Some methods realize real-time synthesis on the GPU for relatively low-resolution images. In the general architecture, interweaving involves little computation, and its time is almost negligible for standard- or high-definition videos. However, for 4K × 2K resolution and the upcoming super multi-view display applications, it takes more time and cannot be ignored. Therefore, an efficient architecture is necessary to meet the resolution and multi-view requirements.
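Interweaving itself is a per-subpixel gather: each RGB subpixel of the display frame is taken from exactly one of the N views, so only 1/N of each view survives in the output. The index rule below, with its slant and offset parameters, is a generic slanted-lenticular assumption, not the paper's actual screen mapping:

```python
import numpy as np

def interleave(views, slant=1, offset=0):
    """Assemble one display frame from N rendered views by picking, for
    every RGB subpixel, the view chosen by a slanted index map.
    `slant` and `offset` are illustrative screen parameters."""
    n, h, w, _ = views.shape
    out = np.empty((h, w, 3), dtype=views.dtype)
    for y in range(h):
        for x in range(w):
            for c in range(3):
                k = (3 * x + c + slant * y + offset) % n
                out[y, x, c] = views[k, y, x, c]
    return out

# 4 toy views, each a flat frame with its own gray level
views = np.stack([np.full((4, 6, 3), 10 * k, dtype=np.uint8) for k in range(4)])
frame = interleave(views)
print(frame.shape)   # one frame holds only 1/N of each view's subpixels
```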

The proposed 3D video conversion method, together with the video input module and the display module, composes the 3D video conversion system. In the video input module,


GUO et al.: EFFICIENT IMAGE WARPING IN PARALLEL FOR MULTIVIEW THREE-DIMENSIONAL DISPLAYS 1341

Fig. 11. Results for 2D and 3D views. (a) The original 2D views, (b) the generated stereo images when N = 32, and (c) their 3D display photographs shown on the screen.

the color image sequences and disparity image sequences are decoded from videos and transmitted to the GPU memory. Stereoscopic image sequences are synthesized by forward mapping together with the predicted hole-mapping algorithm in the conversion module. The display module then renders the output through OpenGL.

The system runs on one PC with an NVIDIA GTX 770 GPU under the 64-bit Windows 7 operating system. The processing time depends on the number of holes to be mapped; in other words, the disparity distribution and range affect the speed. At 4K × 2K resolution on the 32-view display screen, the average runtime over 1000 frames is 28 ms with the average disparity range limited to approximately 100. This processing rate surpasses the requirements for real-time rendering. The display provides a smooth free-viewpoint experience with 32 views. Real-time 3D scene display can also be achieved, as shown in Visualization 2.

Temporal continuity: The accuracy and temporal continuity of the output image sequences tightly depend on the input depth sequences. In general, the disparity map needs a dilation operation to reduce ghosting artifacts. As the main purpose of this paper is to realize multi-view stereoscopic video display in real time, the continuity of the final sequences can be improved by pre-processing the depth image sequences, for example, by smoothing the depth in the time domain.
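One concrete form of the time-domain smoothing suggested above is an exponential moving average over the depth sequence. The filter choice and the alpha value are illustrative; the paper does not specify a particular filter.

```python
import numpy as np

def smooth_depth_sequence(depths, alpha=0.7):
    """Exponential temporal smoothing of a depth/disparity sequence.
    alpha weights the current frame against the running average; smaller
    alpha gives smoother but laggier depth."""
    out = [depths[0].astype(np.float64)]
    for d in depths[1:]:
        out.append(alpha * d + (1 - alpha) * out[-1])
    return out

# a sudden depth jump in frame 3 is damped toward the history
seq = [np.full((2, 2), v, dtype=np.float64) for v in (10.0, 10.0, 40.0)]
sm = smooth_depth_sequence(seq)
print(sm[2][0, 0])   # 0.7*40 + 0.3*10, i.e. ~31 instead of 40
```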

C. Optical Field Application

By changing the direction of the disparity gradient, the method can also be used to generate virtual images at arbitrary positions. Full-parallax stereo images can accordingly be obtained to reconstruct the optical field for integral-imaging 3D display. Experimental results are shown in Fig. 12. A crosshair is set as a reference for disparity changes.
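Changing the direction of the disparity gradient amounts to warping along an arbitrary 2D direction instead of only horizontally. A small sketch under that interpretation follows; the shift formula, scale, and toy data are illustrative assumptions, not the paper's kernel:

```python
import numpy as np

def warp_2d(color, disp, scale, direction):
    """Forward-warp a small image along an arbitrary direction:
    target = source - scale * disp * (cos t, sin t). direction = 0 gives
    the usual horizontal views; other angles give the vertical/diagonal
    views needed for full parallax. Unmapped targets stay -1."""
    h, w = color.shape
    out = np.full((h, w), -1, dtype=np.int32)
    dx, dy = np.cos(direction), np.sin(direction)
    for y in range(h):
        for x in range(w):
            xt = x - int(round(scale * disp[y, x] * dx))
            yt = y - int(round(scale * disp[y, x] * dy))
            if 0 <= xt < w and 0 <= yt < h:
                out[yt, xt] = color[y, x]
    return out

color = np.arange(16).reshape(4, 4)
disp = np.ones((4, 4))
horiz = warp_2d(color, disp, 2, 0.0)          # shift left by 2 pixels
vert = warp_2d(color, disp, 2, np.pi / 2)     # shift up by 2 pixels
```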


Fig. 12. Synthesized full-parallax images. The image in the center is the original one. The crosshair is added as a reference to observe disparity conveniently.

V. CONCLUSION

As the number of viewpoints and the resolution of autostereoscopic 3D displays increase, the general stereo to multi-view conversion architecture cannot meet the requirements on processing memory and time. An efficient conversion system based on CUDA and OpenGL is presented, which leads to a considerable reduction of computing time and memory footprint and provides high-quality viewing experiences. As each subpixel in arbitrary virtual views can be synthesized in parallel with the predicted hole-mapping algorithm, the proposed architecture involves no virtual-view saving and interweaving processes. Experimental results show that the method has favorable hole-filling ability and can realize a 32-view 3D display with 4K × 2K resolution in real time. The method will promote wide application of super multi-view displays.

REFERENCES

[1] H. Urey, K. V. Chellappan, E. Erden, and P. Surman, “State of the art in stereoscopic and autostereoscopic displays,” Proc. IEEE, vol. 99, no. 4, pp. 540–555, Apr. 2011.

[2] J. Y. Son, W. H. Son, S. K. Kim, K. H. Lee, and B. Javidi, “Three-dimensional imaging for creating real-world-like environments,” Proc. IEEE, vol. 101, no. 1, pp. 190–205, Jan. 2013.

[3] A. Stern, Y. Yitzhaky, and B. Javidi, “Perceivable light fields: Matching the requirements between the human visual system and autostereoscopic 3-D displays,” Proc. IEEE, vol. 102, no. 10, pp. 1571–1587, Oct. 2014.

[4] Y. Takaki, M. Tokoro, and K. Hirabayashi, “Tiled large-screen three-dimensional display consisting of frameless multi-view display modules,” Opt. Express, vol. 22, no. 6, pp. 6210–6221, 2014.

[5] C. Fehn, “Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV,” in Proc. 10th SPIE Conf. Stereoscopic Displays Virtual Reality Syst., San Jose, CA, USA, Jan. 2004, pp. 93–104.

[6] D. T. Thai, T. A. Nguyen, B. S. Kim, and M. C. Hong, “Hole-filling method using depth based adaptive recursive interpolation algorithm for view synthesis,” in Proc. Int. Conf. Electron., Inf. Commun., 2013, pp. 398–399.

[7] H. N. Doan, T. A. Nguyen, and M. C. Hong, “Directional hole filling algorithm in new view synthesis for 3D video using local segmentation,” in Proc. Res. Adapt. Convergent Syst., 2014, pp. 100–104.

[8] M. Solh and G. AlRegib, “Hierarchical hole-filling for depth-based view synthesis in FTV and 3D video,” IEEE J. Sel. Topics Signal Process., vol. 6, no. 5, pp. 495–504, Sep. 2012.

[9] C. Barnes, E. Shechtman, A. Finkelstein, and D. Goldman, “PatchMatch: A randomized correspondence algorithm for structural image editing,” ACM Trans. Graph., vol. 28, no. 3, Aug. 2009, Art. no. 24.

[10] D. Min, D. Kim, S. Yun, and K. Sohn, “2D/3D freeview video generation for 3DTV system,” Signal Process., Image Commun., vol. 24, no. 1, pp. 31–48, 2009.

[11] L. Zhang and W. J. Tam, “Stereoscopic image generation based on depth images for 3D TV,” IEEE Trans. Broadcast., vol. 51, no. 2, pp. 191–199, Jun. 2005.

[12] A. Agarwala, “Efficient gradient-domain compositing using quadtrees,” ACM Trans. Graph., vol. 26, no. 3, Jul. 2007, Art. no. 94.

[13] O. Wang et al., “StereoBrush: Interactive 2D to 3D conversion using discontinuous warps,” in Proc. 8th Eurograph. Symp. Sketch-Based Interfaces Model., 2011, pp. 47–54.

[14] N. Plath, S. Knorr, L. Goldmann, and T. Sikora, “Adaptive image warping for hole prevention in 3D view synthesis,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3420–3432, Sep. 2013.


[15] N. Plath, L. Goldmann, A. Nitsch, S. Knorr, and T. Sikora, “Line-preserving hole-filling for 2D-to-3D conversion,” in Proc. 11th Eur. Conf. Vis. Media Prod., Nov. 2014, Art. no. 8.

[16] N. Stefanoski et al., “Automatic view synthesis by image-domain-warping,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3329–3341, Sep. 2013.

[17] M. Schaffner, F. K. Gurkaynak, P. Greisen, H. Kaeslin, L. Benini, and A. Smolic, “Hybrid ASIC/FPGA system for fully automatic stereo-to-multiview conversion using IDW,” IEEE Trans. Circuits Syst. Video Technol., DOI: 10.1109/TCSVT.2015.2501640.

[18] M. Lang et al., “Nonlinear disparity mapping for stereoscopic 3D,” ACM Trans. Graph., vol. 29, no. 4, Jul. 2010, Art. no. 75.

[19] A. Criminisi, P. Perez, and K. Toyama, “Region filling and object removal by exemplar-based image inpainting,” IEEE Trans. Image Process., vol. 13, no. 9, pp. 1200–1212, Sep. 2004.

[20] X. Yu et al., “Autostereoscopic three-dimensional display with high dense views and the narrow structure pitch,” Chin. Opt. Lett., vol. 12, no. 6, 2014, Art. no. 060008.

[21] Y. Mori, N. Fukushima, T. Yendo, T. Fujii, and M. Tanimoto, “View generation with 3D warping using depth information for FTV,” Signal Process., Image Commun., vol. 24, no. 1, pp. 65–72, 2009.

[22] Middlebury Stereo Datasets 2005 and 2006. [Online]. Available: http://vision.middlebury.edu/stereo/data/

[23] X. Ye, J. Yang, H. Huang, C. Hou, and Y. Wang, “Computational multi-view imaging with Kinect,” IEEE Trans. Broadcast., vol. 60, no. 3, pp. 540–554, Sep. 2014.

[24] Y. S. Ho and S. B. Lee, “Joint multilateral filtering for stereo image generation using depth camera,” in The Era of Interactive Media. Berlin, Germany: Springer-Verlag, 2013, pp. 373–383.

[25] I. Ahn and C. Kim, “Depth-based disocclusion filling for virtual view synthesis,” in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2012, pp. 109–114.

[26] N. Guo et al., “Automatic parameter estimation based on the degree of texture overlapping in accurate cost-aggregation stereo matching for three-dimensional video display,” Appl. Opt., vol. 54, no. 29, pp. 8678–8685, 2015.

Nan Guo received the Bachelor’s degree in electronic information engineering from Shandong University of Technology, Zibo, China, in 2011. She is currently working toward the Ph.D. degree at the State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing, China.

Her research interests include 3D display and computer vision.

Xinzhu Sang was born in Heze, Shandong Province, China, in 1977. He received dual Bachelor’s degrees in instrument science and management engineering from Tianjin University, Tianjin, China, the M.S. degree from Beijing Institute of Machinery, Beijing, China, in 1999 and 2002, respectively, and the Ph.D. degree in physical electronics from Beijing University of Posts and Telecommunications, Beijing, in 2005.

From December 2003 to March 2005, he was withthe Optoelectronics Research Centre, Department of

Electronic Engineering, City University of Hong Kong, as a Research Assistant. From July 2007 to July 2008, he was a Postdoctoral Research Scholar with the University of California, Irvine, CA, USA. He is currently with Beijing University of Posts and Telecommunications as a Full Professor. His research interests include three-dimensional display, holography, and novel photonic devices.

Dr. Sang is the Secretary-General of the Committee of Holography and Optical Information, Chinese Optical Society, a Senior Member of the Chinese Institute of Communication, and a Senior Member of the Chinese Institute of Electronics. In 2011, he was selected for the Program for New Century Excellent Talents in University and the Beijing Nova Program of Science and Technology.

Songlin Xie received the Bachelor’s degree in optic information science and technology from Hefei University of Technology, Hefei, China, in 2012. He is currently working toward the Ph.D. degree at the State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing, China.

His research interests include 3D video processing, 3D display, and computer vision.

Peng Wang received the Bachelor’s degree in communication engineering from the University of Science and Technology Beijing, Beijing, China, in 2012. He is currently working toward the Ph.D. degree at the State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing, China.

His research interests include computational imaging and computational display.
