Improving Visual Quality of View Transitions in Automultiscopic Displays

Song-Pei Du (1,2)   Piotr Didyk (1)   Frédo Durand (1)   Shi-Min Hu (2)   Wojciech Matusik (1)

(1) MIT CSAIL   (2) TNList, Tsinghua University, Beijing

Figure 1: To reduce artifacts caused by the limited angular view coverage of an automultiscopic display, our technique performs light field manipulation to improve its continuity. It takes as an input the original light field and performs a global shearing followed by a local shearing to align the scene around the screen plane and provide a good structure alignment between the first and the last views shown on the display, i.e., I1 and In. Next, replicas of the light field are overlapped, and an optimal stitching cut is found. As the last step, the method performs the stitching along the cut in the gradient domain and reconstructs the image. The part of the light field marked in orange can then be shown on the screen.

Abstract

Automultiscopic screens present different images depending on the viewing direction. This enables glasses-free 3D and provides a motion parallax effect. However, due to the limited angular resolution of such displays, they suffer from hot-spotting, i.e., image quality is highly affected by the viewing position. In this paper, we analyze light fields produced by lenticular and parallax-barrier displays, and show that, unlike in the real world, the light fields produced by such screens have a repetitive structure. This induces visual artifacts in the form of view discontinuities, depth reversals, and excessive disparities when the viewing position is not optimal. Although the problem has always been considered inherent to the technology, we demonstrate that light fields reproduced on automultiscopic displays have enough degrees of freedom to improve the visual quality. We propose a new technique that modifies light fields using global and local shears followed by stitching to improve their continuity when displayed on a screen. We show that this enhances visual quality significantly, which is demonstrated in a series of user experiments with an automultiscopic display as well as lenticular prints.

CR Categories: I.3.3 [Computer Graphics]: Picture/Image generation—display algorithms, viewing algorithms;

Keywords: multi-view display, light field, view transitions

Links: DL PDF WEB

1 Introduction

Multi-view autostereoscopic (or automultiscopic) displays provide an immersive, glasses-free 3D experience, which gives them the potential to become the future of television and cinema. By showing a different image depending on the viewer's position, they reproduce both binocular and motion parallax depth cues. This is typically achieved by adding a parallax barrier [Ives 1903] or a lenticular sheet [Lippmann 1908] atop a high-resolution display, which trades some of the spatial display resolution for angular resolution. In order to provide an adequate binocular parallax, a sequence of images is shown within the primary field of view (Figure 2). Beyond it, the same sequence repeats, forming additional viewing zones and extending the effective field of view. The main advantage of such solutions is that they have the potential for providing immersive glasses-free 3D for multiple users everywhere in front of the screen. We believe that such "stereo free-viewing" is a requirement for 3D displays to succeed – think of a family watching a 3DTV at home. However, a problem arises when the left and right eyes fall into different zones (Figure 2, viewpoint B). In this situation, depth reversals, discontinuities, and excessive disparities occur. Moreover, the reversed depth usually creates a conflict between the occlusion depth cue and binocular disparity. All these effects lead to a significant quality reduction for non-optimal viewing positions. We refer to this phenomenon as transitions.

The artifacts due to the limited extent of viewing zones in current displays are widely recognized as a significant shortcoming. This restricts the usage of such screens in home applications and large-scale visualizations. Existing solutions [Peterka et al. 2008; Yi et al. 2008; Ye et al. 2010] are based on hardware extensions, including head-tracking and dynamic parallax barriers. Although they can reduce the problem, they are suitable only for a small number of viewers (1–3). Furthermore, the additional hardware and the need for real-time processing, which depends on the current viewers' positions, make these approaches hard to implement in commercial devices such as 3DTVs.

In this paper, we demonstrate that multi-view content offers enough degrees of freedom to improve its quality by only modifying the displayed views. To this end, we first analyze light fields produced

Figure 2: Illustration of a 4-view automultiscopic display with a parallax barrier. 3D viewing is correct within the marked viewing angle. The discontinuities between viewing zones (views 1 and 4) lead to visual artifacts.

by traditional lenticular as well as parallax-barrier automultiscopic displays, and give a unified and comprehensive explanation of transitions in the light field space. Based on this analysis, we propose a new method which optimizes input images to improve the perceived quality at the transitions. In contrast to previous hardware solutions, our optimization does not require knowledge of the viewer's position, which makes the technique suitable for an arbitrary number of observers. It also does not require any hardware modifications and can be used as a pre-processing step. To the best of our knowledge, this is the first technique of this kind. We demonstrate the results for static images and video sequences using both parallax barriers and lenticular sheets. To further validate the quality improvement, we present a series of user experiments that analyze the advantages of the optimized content.

2 Related Work

Apart from the hardware solutions mentioned in the previous section, the problem of transitions has not been addressed before. Our work is mostly related to light field processing and manipulation techniques. It also takes advantage of a wealth of techniques for seamless image and video compositing.

Light Field Processing and Manipulation  A light field is a continuous function that represents radiance emitted from the scene [Levoy and Hanrahan 1996]. Due to the discrete nature of acquisition and display stages, light fields are usually aliased. Several techniques have been developed to correctly reconstruct light fields from recorded data [Isaksen et al. 2000; Stewart et al. 2003] and avoid both spatial and inter-view aliasing on automultiscopic displays [Zwicker et al. 2006; Konrad and Agniel 2006; Didyk et al. 2013]. To further adjust content to a particular device, a few techniques for multi-view content depth manipulation have been proposed [Zwicker et al. 2006; Didyk et al. 2012; Masia et al. 2013]. They focus on depth manipulations to achieve the best trade-off between the blur introduced by inter-view antialiasing and the presented depth. To better adjust light fields to different screens, Birklbauer et al. [2012] proposed a retargeting technique for changing the size of a displayed light field. The resolution limitation in light field reproduction has also been addressed recently: Tompkin et al. [2013] introduced a novel technique to increase the resolution of lenticular prints by optimizing lenslet arrays based on the input content. With an increasing interest in light field capture and display, several techniques for manipulating and editing such content have also been introduced. These include methods for light field morphing [Zhang et al. 2002], deformation [Chen et al. 2005], and compositing [Horn and Chen 2007]. Light fields also provide great flexibility in the context of stereoscopic content production.

Kim et al. [2011] demonstrated a technique for generating stereo image pairs with per-pixel disparity control, where each view is defined as a 2D cut through the 3D light field volume.

Seamless Image/Video Compositing  To avoid transitions, we want to ensure that the light field produced by an automultiscopic display is continuous. To achieve this goal, our work is inspired by recent advances in seamless image and video editing. In particular, we rely on image stitching techniques [Levin et al. 2004; Jia et al. 2006; Jia and Tang 2008; Eisemann et al. 2011], which can combine different images into one natural-looking composition. Creating continuous light fields is also related to work on video textures [Schödl et al. 2000; Agarwala et al. 2005], where the goal is to create sequences that can be played continuously and indefinitely. We also adapt work on video retargeting by Rubinstein et al. [2008]. All of these techniques heavily rely on gradient-based compositing [Pérez et al. 2003; Agarwala 2007] and graph cut methods [Kwatra et al. 2003], which are also used in our technique.

3 Autostereoscopic Transitions

A standard autostereoscopic screen allows for displaying different views which are visible only from the corresponding locations. If the views are displayed densely enough (i.e., with a high angular resolution), each eye can receive a different view, which leads to stereoscopic viewing. Due to the limited resolution of display panels, such screens can display only a limited number of views; for instance, the high-end automultiscopic Philips BDL5571VS/00 has 28 views. This allows for reproducing only a small part of the light field observed in the real world. In this section, we analyze the light field produced by such displays and show how this limitation impacts the perceived quality. We restrict our discussion to parallax-barrier displays; however, the same analysis holds for lenticular-based systems.

3.1 Scene vs. Display Light Field

A light field is a function that describes the light traversing a scene. In this work, our goal is to analyze the light field produced by automultiscopic displays. For this purpose, it is enough to consider a light field as a four-dimensional function L parametrized using two parallel planes (s, t) and (u, v). In such a parametrization, L(s, t, u, v) corresponds to the image value obtained by intersecting the scene with a ray originating from the first plane at the location (s, t) and passing through the second plane at the location (u, v). For visualization purposes, we limit our discussion to epipolar-plane images (EPIs), which are 2D slices through the 4D light field (i.e., parameters t and v are fixed). Such images correspond to a stack of 1D scanlines captured from different viewing locations along the horizontal direction. In particular, every point in the scene corresponds to a line in the image, and the slant of this line encodes the depth of the point. Figure 3 shows a simple scene with a corresponding light field. Although the light field is limited in the figure, it extends further along the s and u axes.
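For readers who want to experiment, extracting an EPI from a discretized light field is a simple array slice. The sketch below assumes a 4D NumPy array with axes ordered (s, t, u, v); the array layout and function name are our own illustration, not part of the paper:

```python
import numpy as np

def extract_epi(L, t, v):
    """Epipolar-plane image: slice the discretized light field L[s, t, u, v]
    at fixed t and v. Rows index the camera position s, columns index u."""
    return L[:, t, :, v]
```

Each row of the result is the same horizontal scanline seen from a different viewing position, so scene points trace lines whose slant encodes their depth.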

To visualize the light field produced by an automultiscopic display, we embedded the screen in the scene (Figure 3, top). For every slit of the automultiscopic display, only a small range of directions can be shown (green cones). The signal shown in these cones is also repeated at other locations along the s axis (red cones), although it does not correspond to these locations. This creates repetitions in the light field created by the screen. The colored boxes in Figure 3 demonstrate how a fragment of the original light field (middle, dashed line) is encoded in the panel (top), and how this fragment forms replicas in the screen light field (bottom). The repetitive structure of this light field creates discontinuities (bottom, dashed line),

Figure 3: A simple scene (top) and the corresponding light field (middle). Rays r1 and r2 show the relationship between the scene and the light field representation. To visualize the relation between the scene light field and the light field produced by the display, we embedded the screen in the scene (top). For simplicity, we assumed its alignment with the u axis. The bottom image shows the light field that is produced by the display. Due to the limited angular coverage of the display with views (green cone), the screen is able to reproduce only a part of the original light field (marked with the dashed line in (b)). Beyond this range, the screen creates replicas of the light field, which results in discontinuities (dashed line in (c)).

which can significantly affect the quality of perceived images.

3.2 Repetitive Light Field and Quality

The images observed on an automultiscopic screen correspond to a cut through the light field created by the screen. For example, for a viewing location on the s axis, the view is a horizontal cut (x1 and x2 in Figure 4). As the observer moves along the s axis, different scanlines of the EPI are observed. Whenever the viewer moves away from the s axis, the observed image no longer corresponds to a scanline, but to a slanted line (x3 and x4 in Figure 4).

The repetitive structure of the light field produced by an automultiscopic display may lead to visual artifacts. For example, when a view corresponds to a slanted line in the EPI, it may cross several replicas of the original light field. This creates a discontinuity in the perceived image at locations that correspond to the boundaries of the replicas. Furthermore, when the observer moves, these artifacts become more apparent as they change their location. Please see our interactive demo and supplementary video for a visualization.
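The cut geometry is easy to simulate. The toy sampler below (our own illustration, with hypothetical names) treats an EPI as a 2D array epi[s, u] and extracts the image seen along a horizontal or slanted line; the modulo models the repetition of viewing zones, which is exactly what lets a slanted cut cross several replicas:

```python
import numpy as np

def view_from_epi(epi, s0, slope):
    """Sample one observed image from an EPI epi[s, u]: for each screen
    column u, pick the ray at s = s0 + slope*u. slope == 0 is an on-axis
    view; the modulo wraps into the display's repeated viewing zones."""
    n_s, n_u = epi.shape
    u = np.arange(n_u)
    s = np.round(s0 + slope * u).astype(int) % n_s
    return epi[s, u]
```

With slope = 0 this returns one stored view; a nonzero slope mixes rays from several viewing zones within one image, producing the discontinuities described above.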

The discontinuities also have a significant influence on depth perception. In EPIs, the depth of the scene is encoded by the slopes of lines that correspond to the same points in the scene. In contrast, the perceived depth is related to the slope of the line that passes through the intersections of the line corresponding to a given point in the scene and the lines corresponding to the left- and right-eye views (Figure 5). In the case when both eyes see the same replica

Figure 4: The figure visualizes a scene together with its light field in the two-plane parametrization. Images observed on the screen correspond to linear cuts through the epipolar-plane images. While views located on the s axis correspond to horizontal lines, viewing positions which are away from the axis result in slanted lines. As slants encode the distance to the scene, two viewing positions located at the same distance result in parallel lines.

of the light field (Figure 5a), the perceived depth is correct. However, when the views correspond to different replicas, the estimation of the depth may be wrong (Figure 5b). In particular, the sign of the slope changes, creating a depth reversal or excessive disparities, which may lead to viewing discomfort [Shibata et al. 2011]. Depending on the viewing position, the depth reversal can be observed in the entire image or only in some parts (Figure 5c). In Figure 6, we provide an example of a stereoscopic image with and without a depth reversal. For an interactive demonstration, please see our demo.

Figure 5: A simple light field produced by an automultiscopic display showing three objects at different depths, with three different stereoscopic viewing locations indicated by pairs of dashed lines (each line corresponds to a view for one eye). Insets on the right present close-ups of the light field. The relative slopes of the black solid lines with respect to the dashed lines correspond to the perceived depth. Green and red colors indicate correct (green) and wrong (red) depth reproduction. When both eyes look at the same replica (a), the slopes (depth) estimated by the observer are correct. When both eyes see different replicas (b), the estimated slopes change sign, and reversed depth is produced for the whole image. When the observer moves away from the plane s, the lines corresponding to the views are slanted and depth may be estimated incorrectly for some parts of the image (c).

4 Our Method

We introduce the idea of modifying multi-view content to reduce the artifacts caused by the discontinuities between viewing zones of an automultiscopic display. Our key idea is that by applying subtle modifications to the input content, we can improve the continuity of the light field at those locations, thereby hiding the display imperfections.

4.1 Light Field Shearing

Our first observation is that in specific cases, discontinuities in a light field can be completely removed if the multi-view content is carefully designed. For example, we can utilize a repetitive structure in a scene.

Figure 6: The figure presents stereoscopic images (in anaglyph colors) with (left) and without (right) depth reversals. The image on the left has a depth reversal, as the views are composed of two different replicas of the original light field (EPI on top).

Figure 7 shows a light field produced by an automultiscopic display for a scene with a periodic pattern located at a certain depth (left). By applying a small horizontal shear to the original light field, a new light field without any transitions can be obtained (right). As the slope of each line corresponds to scene depth, such

Figure 7: For a simple planar pattern, the discontinuities in the original light field can be removed by applying a horizontal shear.

an operation corresponds to re-positioning the entire scene along the depth direction. Although this modifies the absolute depth, it does not significantly affect local depth changes, which dominate depth perception [Brookes and Stevens 1989]. Inspired by this observation, our first step towards reducing discontinuities in a light field is a global horizontal shear, followed by small local shears that further improve the results. The difference between local and global shears is visualized in Figure 8.
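As a concrete illustration (our sketch, not the paper's code), a global horizontal shear of a discretized EPI can be implemented by shifting each row in proportion to its viewing position; the wraparound shift matches the periodic-pattern example of Figure 7:

```python
import numpy as np

def shear_epi(epi, s):
    """Horizontally shear an EPI epi[k, u]: row k is shifted by
    s * k / (n-1) pixels (row 0 stays fixed). This corresponds to
    re-positioning the whole scene along the depth direction."""
    n = epi.shape[0]
    out = np.empty_like(epi)
    for k in range(n):
        out[k] = np.roll(epi[k], int(round(s * k / (n - 1))))
    return out
```

The wraparound of np.roll is only harmless for periodic content; for general scenes the shifted-in region would have to be cropped or filled.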

Figure 8: Visualization of global and local shear: input light field on the left, output on the right. The arrows indicate the magnitude of the shear applied at each location.

Global shear is defined by one value s, which encodes the amount of shear that needs to be applied to the last view of the light field shown on a screen to best match the first one. Instead of modifying individual EPIs separately, we apply the shear to the entire 3D light field, and compute the optimal shear on full 2D views using the following formula:

\arg\min_s \frac{1}{N_p} \sum_{(x,y)} Q(I_1, I_n, x, y, s) \qquad (1)

where I_1 and I_n are the first and last views presented on the screen, N_p is the total number of pixels, and Q is a matching error between the local neighborhood of a pixel (x, y) in I_1 and the neighborhood of (x+s, y) in I_n. Here, we adopt a matching function introduced by Mahajan et al. [2009], which was originally proposed for optical flow correspondence:

Q(I_1, I_n, x, y, s) = \frac{\sqrt{\|\nabla I_1(x,y) - \nabla I_n(x+s,y)\|^2 + 0.5\,\|I_1(x,y) - I_n(x+s,y)\|^2}}{\sigma(I_1, x, y)\,\sigma(I_n, x+s, y)} \qquad (2)

where ∇I is the gradient of image I and σ(I, x, y) represents the standard deviation in the 9×9 neighborhood of pixel (x, y) in view I. To find the best s we iterate over integer values in the range between s_min and s_max and choose the value that results in the smallest value of the matching function. We found that values s_min = −200 and s_max = 200 were sufficient for all results presented in this paper.
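A brute-force version of this search is straightforward. The sketch below is our own simplified reading of Equations 1–2: views are grayscale 2D arrays, and the 9×9 local standard deviation is replaced by a global one for brevity, so the numbers differ from the paper's implementation:

```python
import numpy as np

def matching_cost(I1, In, s, eps=1e-6):
    """Mean matching error between I1(x, y) and In(x+s, y) (Eq. 2, simplified:
    a global standard deviation stands in for the paper's 9x9 local one)."""
    # Shift by s and crop both views to the overlapping columns.
    if s >= 0:
        a, b = I1[:, : I1.shape[1] - s], In[:, s:]
    else:
        a, b = I1[:, -s:], In[:, : In.shape[1] + s]
    gay, gax = np.gradient(a)   # gradients along rows (y) and columns (x)
    gby, gbx = np.gradient(b)
    grad_diff = (gax - gbx) ** 2 + (gay - gby) ** 2
    num = np.sqrt(grad_diff + 0.5 * (a - b) ** 2)
    return num.mean() / (a.std() * b.std() + eps)

def global_shear(I1, In, s_min=-200, s_max=200):
    """Exhaustive search over integer shears for the minimizer of Eq. 1."""
    costs = {s: matching_cost(I1, In, s) for s in range(s_min, s_max + 1)}
    return min(costs, key=costs.get)
```

The exhaustive loop mirrors the paper's strategy of simply testing every integer shear in [s_min, s_max] and keeping the cheapest one.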

Local Shears  The optimization defined by Equation 1 finds a large global shear that minimizes the matching error between the first and the last view. To further improve the continuity of the light field, we can refine it using small local shears. The idea is similar to finding an optical flow between the two views and minimizing the difference between them using a warp guided by the flow field. Such a solution, however, would lead to flattening the entire scene. To prevent this, we restrict the local warps to be small, which results in matching similar structures instead of the same objects. Finding a dense correspondence between views may also introduce an additional problem of disocclusions. This may lead to significant compression and stretching artifacts during the warping. To avoid these problems, we define a regular grid (20×20), and find the optimal shears only for the grid points. This allows us to find good shear magnitudes that vary smoothly across different locations. Later, the coarse grid is warped to improve the continuity of the light field. During this process, the problematic regions are filled in using the neighboring signal.

The problem of finding the optimal local shears can be formulated as a minimum graph cut. To this end, for each grid point (i, j) we create multiple nodes (i, j, s), where s spans the whole range of integer values from [s′_min, s′_max], and corresponds to different magnitudes of shear considered at each location. In our implementation we use s′_min = −10 and s′_max = 10. The edges in the graph are between (i, j, s) and (i, j, s+1), and encode the cost of the shear s at the position (i, j). The cost is defined as E(i, j, s) = Q(I_1, I_n, i, j, s). In order to find a cut that defines the optimal shears, we add a source and a target node (S, T) to the graph, which are connected to all (i, j, s′_min) and (i, j, s′_max) respectively. Additionally, to guarantee that the cut is continuous and passes through every position (i, j) only once, we adapt the idea of forward edges introduced by Rubinstein et al. [2008], and add additional edges with an infinite cost (Figure 9, right). For more details please refer to the original method. After finding the optimal cut of the graph, the amount of shear at the position (i, j) is defined by the location of the cut, i.e., if the cut goes through edge (i, j, s) → (i, j, s+1), the optimal shear for position (i, j) is s. In order to apply the optimal shears to the light field, we first propagate them from I_n to all views using linear interpolation, assuming that view I_1 receives zero shear. Then, every view is separately sheared by warping the grid together with the underlying view.
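The continuous-cut constraint is easiest to see in 1D: for a single row of grid points, choosing one shear per point so that neighboring shears differ by at most one step, while minimizing the summed cost E, is a shortest-path problem. Below is a small dynamic-programming sketch, our simplified 1D analogue of the paper's 2D graph cut (names and the max_jump smoothness rule are ours):

```python
def optimal_shears_1d(cost, max_jump=1):
    """cost[i][s] = E(i, s): cost of shear index s at grid point i.
    Returns one shear index per point with |s[i+1] - s[i]| <= max_jump,
    minimizing the total cost (a 1D stand-in for the continuous cut)."""
    n, k = len(cost), len(cost[0])
    INF = float("inf")
    best = list(cost[0])          # best[s]: cheapest path ending at shear s
    back = []                     # backpointers for path recovery
    for i in range(1, n):
        prev, cur, choice = best, [INF] * k, [0] * k
        for s in range(k):
            for t in range(max(0, s - max_jump), min(k, s + max_jump + 1)):
                c = prev[t] + cost[i][s]
                if c < cur[s]:
                    cur[s], choice[s] = c, t
        best = cur
        back.append(choice)
    # Backtrack from the cheapest final shear.
    s = min(range(k), key=best.__getitem__)
    path = [s]
    for choice in reversed(back):
        s = choice[s]
        path.append(s)
    return path[::-1]
```

The forward edges in the paper's graph play the role of the max_jump constraint here: they forbid cuts that jump arbitrarily between shear values at neighboring positions.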

4.2 Light Field Stitching

Shearing techniques discussed in the previous section can already align the structure of the repeating light field fragments. However, sharp color differences can still remain visible. In order to reduce them, we apply an additional compositing of repeating light field structures in the gradient domain. Inspired by image/video stitching and retargeting techniques [Jia et al. 2006; Jia and Tang 2008; Rubinstein et al. 2008; Eisemann et al. 2011], we use a similar technique to further hide the transitions. To this end, we first create two copies of the original light field and overlap them by m

Figure 10: Examples of light fields produced by our technique. From the top: input light fields, only global shearing applied, our full method, and insets with close-ups of the discontinuities.

Figure 9: The local warping could be regarded as a continuous cut of the function E (left). The weights on edges are shown on the right. For simplicity, we show the forward edges (dashed lines) only in 2D. In general, forward edges are added in the s–x and s–y planes.

views along the s direction. Then, we find a cut through the overlapping part, which provides us with a surface where both replicas fit best.

Figure 11: Light field stitching. We overlap light field copies and find the optimal cut. The reconstructed light field is shown at the bottom.

This cut, similarly to finding good shears, can be found using the graph cut technique. We first transform the overlapping light field volume into a graph, where each voxel (s, u, v) corresponds to a node. The edges between (s, u, v) and (s+1, u, v) encode the cost of the cut between these two voxels. The goal of this cost is to penalize large differences in gradients between the overlapping replicas, and we express it as:

C(u, v, s) = \|\nabla_{su} L(s, u, v) - \nabla_{su} L(n-m+1+s, u, v)\| + \|\nabla_{su} L(s+1, u, v) - \nabla_{su} L(n-m+2+s, u, v)\| \qquad (3)

where ∇_{su} L is the (s, u) component of the light field gradient, n is the total number of views, and m is the number of views that are overlapped. (s, u, v) and (n−m+1+s, u, v), as well as (s+1, u, v) and (n−m+2+s, u, v), are positions that directly overlap. Similarly to the construction of the graph for the local shearing, we add forward edges with an infinite cost, as well as a source and a target node, to perform the minimal graph cut. After finding the optimal cut in the graph, we stitch the gradients of the overlapping light field replicas along the cut, and compute the full light field by reconstructing each EPI separately using Poisson reconstruction [Pérez et al. 2003]. The whole process is visualized in Figure 11. The width of the overlap m controls the number of views that are affected by the method. From our experience, around 4 views are necessary to create a good transition between different viewing zones. Therefore, for a display with 8 views we use m = n/2. For displays that offer higher numbers of views, a smaller m can be used. In our results, we use n/2 and n/4.
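Why compositing in the gradient domain hides color seams is easiest to see in 1D. In this toy version (our illustration, not the paper's implementation), two fully overlapping signals are combined by taking gradients from one side of the cut and the other, then integrating, which is the 1D analogue of the per-EPI Poisson reconstruction:

```python
import numpy as np

def stitch_1d(a, b, cut):
    """Toy 1D gradient-domain stitch: take gradients from signal `a`
    before the cut and from `b` after it, then integrate from a[0]."""
    ga, gb = np.diff(a), np.diff(b)
    g = np.concatenate([ga[:cut], gb[cut:]])
    return np.concatenate([[a[0]], a[0] + np.cumsum(g)])
```

When the two replicas share the same structure but differ by a constant color offset, their gradients agree, so the integrated result is seamless: the offset simply disappears.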

4.3 Extension to Videos

So far, we have described our method only for static light fields. The direct extension of the shearing and stitching to videos involves the computation of a minimal graph cut for a 4D volume and Poisson reconstruction in 3D. To avoid high computation costs, the full computation can be performed for every k-th frame. In our experiments we use k = 50. Later, the shears as well as the cuts can be linearly interpolated for the remaining frames. Besides better performance, such a solution provides temporally coherent results (see supplementary materials).
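The interpolation step can be sketched in a few lines (our illustration; the function name is hypothetical, and the same scheme would apply per grid point or per cut coordinate):

```python
import numpy as np

def interpolate_keyframes(key_values, k):
    """Linearly interpolate a per-frame scalar parameter (e.g., a shear
    magnitude) that was computed only at every k-th frame."""
    key_frames = np.arange(len(key_values)) * k
    all_frames = np.arange(key_frames[-1] + 1)
    return np.interp(all_frames, key_frames, np.asarray(key_values, dtype=float))
```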

5 Results

We have tested our technique on a variety of images and videos. Figure 10 shows a comparison of the individual EPIs for several scenes. Compared to the original light field and to one where only a global shear was applied, our full technique provides much smoother results. In many high-frequency regions, our method successfully finds the local repetitive structures and eliminates transitions. The stitching propagates transitions optimally into different views in different regions, making them less pronounced. Additionally, in Figure 12, we compare four views generated using our method to a version where only a global shear is applied to align both light fields in the same way around the screen. Our results present smoother transitions with less pronounced discontinuities and depth reversals, as well as fewer diagonal strips.

Figure 12: The comparison of the results obtained using the full technique and only the global shearing on an automultiscopic screen (simulation). For better perception of depth, we remove the colors and show a monochrome anaglyph; please refer to the supplementary materials for full results.

Performance Processing one multi-view image composed of 100 views (1200×800 pixels) using our Matlab implementation takes 1 minute. This includes 5 seconds for shearing and stitching, and 55 seconds for Poisson reconstruction, which is the main bottleneck as it needs to be performed for each epipolar plane image separately. Processing 80 frames of a multi-view video at a resolution of 800×540 takes almost 1 hour, which is also limited by the Poisson reconstruction. We believe that the performance of our technique can be greatly improved. For example, the method is highly parallelizable, i.e., every shot can be processed separately. Also, for slowly changing scenes, the full computation can be performed for fewer frames. These improvements, together with a GPU implementation, can reduce the computation time significantly.

6 Evaluation

In order to evaluate the quality improvement provided by our technique, we conducted user experiments in which we first compared the performance of our automatic global shear with manual adjustments made by users, and then evaluated the full technique. In all experiments, 16 participants took part. They had normal or corrected-to-normal vision and were tested for stereoblindness.

6.1 Manual Adjustment vs. Global Shear

The global shear adjusts the position of the entire scene with respect to the screen plane. A similar, manual correction is a common practice to reduce the need for inter-view antialiasing [Zwicker et al. 2006; Didyk et al. 2013] and visual discomfort [Shibata et al. 2011]. To validate our approach, we compared it to the same operation performed manually. To acquire the optimal correction, three video sequences (Figure 13) were presented to each participant. The sequences were displayed using a 42-inch 8-view automultiscopic Newsight Japan display. The participants were asked to sit 1.5 m away from the screen, which is the optimal viewing distance for this display, and to adjust the global depth of the scene until the best viewing quality was achieved. From the collected data, the average adjustment for each scene was computed. The same content was processed using our global shear, and the resulting adjustments were compared with the results from the user study. The small differences and the relatively high inter-subject variability (Table 1) suggest that the difference between the global shear and the manual adjustments is small. Although it is unclear whether this observation holds in general, in the further evaluation we decided to compare our full technique only to the global shear, as both provide an automatic solution.
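The per-scene statistics reported in Table 1 can be computed as sketched below. The function name and the input layout are our own illustration of the comparison described above:

```python
import numpy as np

def adjustment_stats(manual_adjustments, auto_shear):
    """Compare the automatic global shear against per-participant manual
    adjustments for one scene (cf. Table 1): the difference between the
    automatic correction and the mean manual adjustment, the inter-subject
    standard deviation, and the standard error of the mean. Units are
    pixels of disparity change between neighboring views."""
    a = np.asarray(manual_adjustments, dtype=float)
    delta = auto_shear - a.mean()          # ∆ adjustment
    sigma = a.std(ddof=1)                  # inter-subject std (σ)
    sem = sigma / np.sqrt(a.size)          # standard error of the mean
    return delta, sigma, sem
```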


Figure 13: Examples used in our experiments. COUCH [Kim et al. 2013], TRAIN and DOG were captured using a camera array. BUNNY, TEAPOT and LLAMA were rendered.

Scene   ∆ adjustment   σ     SEM
BUNNY   0.5 px         1.3   0.3
LLAMA   -0.2 px        0.7   0.17
DOG     -0.32 px       0.6   0.15

Table 1: Statistics for the manual adjustment of the content. ∆ adjustment represents the difference between the correction provided by our global shear and the manual adjustment provided by users. The difference is expressed as a change of the disparities between neighboring views, measured for Full HD resolution. Additionally, the standard deviation (σ) and the standard error of the mean (SEM) are provided.

6.2 Global Adjustment vs. Full Technique

In the second experiment, we compared the results obtained using our automatic global shear with sequences produced using the full technique. In order to evaluate how our method performs for different viewing locations, 8 viewing positions were tested. In each trial, participants were asked to stay in one location, which was marked on the floor. They were presented with both versions of the content, and could switch between them using a keyboard. The task was to judge which one provided a better viewing experience. No other specific instructions were provided to the observers. Each of them performed the task for every location and for all videos. The arrangement of the experiment and the results for all the viewpoints are shown in Figure 14. Although the participants were instructed to remain in the same locations, some of them adjusted their head position to find a better viewing spot. We consider this a natural scenario for watching such content. In 68% of cases, participants preferred sequences processed using the full technique. This demonstrates that, besides improving view transitions, our method does not introduce temporal artifacts that would reduce the quality of the content. To test the significance of the results, we performed a series of binomial tests. All results were significant (p < 0.05) except for two viewing positions located at the optimal distance from the screen. In this experiment, all views were affected by our technique. We believe that for displays with larger numbers of views, this will not be required, and better quality can be achieved.

Figure 14 (diagram: screen, optimal viewing distance, 1 m scale; per-position preferences for our technique vs. global shear only: 81%/19%, 73%/27%, 73%/27%, 64%/36%, 69%/31%, 71%/29%, 54%/46%, 58%/42%): Results of the experiment for automultiscopic displays. The numbers correspond to the percentage of people who chose our full technique (green) and the content processed using only the global shear (yellow). The places for which the results were not statistically significant are marked in red.

We also conducted a similar experiment with 4 different static light fields: TRAIN, COUCH, TEAPOT and BUNNY (Figure 13). As previously, each was prepared in two versions: one processed with the full technique, and the other with the global shear only. Both versions were downsampled to 18 views, printed at 720 DPI and glued to lenticular sheets with 40 lenses per inch and a viewing angle of 25°. For better comparison, both versions were printed on a single sheet. We also produced additional sheets where the versions were swapped for randomization. In every trial, one stimulus was shown to a participant, who could look at each lenticular sheet from different angles and distances. They were allowed to take as much time as they wanted to investigate the print. Afterwards, they were asked to decide which version offered a better viewing experience. Similarly to the previous experiments, we did not provide any further guidelines, so every participant was free to use any criteria to judge the quality. The results of the experiment are shown in Figure 15. In more than 80% of trials, participants preferred results produced using our technique. To verify the significance of our results, we computed a binomial test for each scene separately. For all examples the obtained p-value was below 0.05.
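The binomial tests used in both experiments can be sketched as an exact two-sided test under the null hypothesis of no preference. The function below is our own minimal illustration; the paper does not specify its statistics software:

```python
from math import comb

def binomial_p_two_sided(successes, trials, p=0.5):
    """Exact two-sided binomial test: the probability, under the null
    hypothesis of no preference (p = 0.5), of an outcome at least as
    extreme (no more likely) than the observed preference count."""
    pmf = [comb(trials, k) * p**k * (1 - p)**(trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # sum the probabilities of all outcomes no more likely than the observed
    return sum(pr for pr in pmf if pr <= observed * (1 + 1e-12))
```

With 16 participants, for instance, 13 preferences for one version are significant at p < 0.05, while 10 are not.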

7 Discussions

In our approach, we take two main steps to process the light field: shearing (global and local) and stitching. To better justify the role of each operation, we provide examples of our technique with certain steps omitted in Figure 16.

Our technique cannot remove transitions and depth reversals completely, as this would require complete flattening of the scene, i.e., falling back to 2D viewing. However, due to our stitching across many views (Section 4.2), the remaining artifacts are distributed across different views. In contrast to the original content, where transition areas are large and have an obvious structure, the artifacts that remain after our technique is applied are small and local. To demonstrate this, we followed the suggestion by Filippini and Banks [2009] that some stereoscopic effects can be explained by local cross-correlation. Consequently, we used this approach to compute depth for sequences before and after our technique was applied (Figure 17). Although this is an objective comparison, it cannot be treated as an accurate prediction of what the observer perceives. From our experience, small reversals are usually less objectionable. This can be explained by the fact that people are less sensitive to high-frequency depth variations.

Figure 15: Results of our experiment with lenticular prints. For each scene, the bars represent the percentage of subjects who judged the quality of our result (green) and the global shear (yellow) as higher.

Figure 16 (panels: raw light field, global shear, global & local shear, full technique): Results of our technique with certain steps omitted. The global shear of the original light field reduced excessive disparities and provided a good alignment of the blue lines. Local shearing significantly improved the discontinuities between the red lines. In this case, this is achieved by locally sacrificing the continuity of the blue lines, which does not significantly influence the overall quality across all views. The complete technique was able to match the red and yellow lines to further improve the continuity between the replicas. The Poisson reconstruction was applied in all examples.
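The local cross-correlation depth computation used for Figure 17 can be sketched as a brute-force matcher that, for each pixel, picks the disparity maximizing windowed normalized cross-correlation. This is our illustrative assumption of the approach of Filippini and Banks [2009], not the authors' implementation:

```python
import numpy as np

def local_ncc_disparity(left, right, max_disp=4, win=5):
    """Per-pixel disparity between two views by maximizing normalized
    cross-correlation (NCC) over a local window. Brute-force and
    illustrative only; border pixels keep disparity 0."""
    H, W = left.shape
    r = win // 2
    disp = np.zeros((H, W), dtype=int)
    best = np.full((H, W), -np.inf)
    for d in range(-max_disp, max_disp + 1):
        shifted = np.roll(right, d, axis=1)   # candidate alignment
        for y in range(r, H - r):
            for x in range(r, W - r):
                a = left[y - r:y + r + 1, x - r:x + r + 1].ravel()
                b = shifted[y - r:y + r + 1, x - r:x + r + 1].ravel()
                a = a - a.mean(); b = b - b.mean()
                denom = np.linalg.norm(a) * np.linalg.norm(b)
                score = (a @ b) / denom if denom > 1e-9 else -1.0
                if score > best[y, x]:
                    best[y, x] = score
                    disp[y, x] = d
    return disp
```

Running this on view pairs before and after processing visualizes where depth reversals occur and how structured they are.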

Although distributing transitions across different views may affect the sweet-spot viewing, this was not manifested in our experiments. We believe that for displays with more views (e.g., the Philips BDL2331VS/00 has 28 views), the stitching can be performed only on a small part of the light field near viewing zone boundaries. In our examples with m = n/4, the resulting light field contains 3/4 of all views, and our stitching step affects 1/4 of them. As a result, the content of 2/3 of all views shown on the screen remains unchanged. To avoid limiting the number of input views that are shown on the screen, view synthesis techniques can be used to create additional views for the purpose of stitching (Figure 18).

Depending on the display (e.g., the number of views) and the viewing conditions (e.g., the range of viewing distances), different sets of views might be needed. To avoid performing expensive computation on the display, several versions of the content can be prepared and delivered to the client. In the case of streaming systems, only the correct version needs to be sent.

Figure 17 (panels: ours, original): Perceived depth computed using cross-correlation for two original views, and for the same views processed using our technique. While the original content produces pronounced depth reversals, our technique provides views whose resulting depth contains less structured depth errors.

Figure 18 (panels: original, expanded, result): Here, the original light field (100 views) is expanded to 200 views using image-based warping. After stitching, the resulting light field has the same number of views as the input content. Please refer to the supplementary materials for the complete light field.

Our technique benefits from repetitive patterns, which are common in many natural scenes (e.g., trees, grass, ground, clouds). As can be seen in Figure 10, our local shearing finds good matches and improves the quality of the light field. In some cases our method has limited options, but it can still reduce the transition problem by placing these objects close to the screen plane, which is also desirable for eye strain reduction.

8 Conclusions

In this paper, we analyzed the light field created by standard automultiscopic screens. First, we showed that the limited angular coverage results in a periodic structure of the light field with discontinuities between repeating fragments. Then, we explained how this leads to a number of artifacts such as view discontinuities, depth reversals, and excessive disparities. To the best of our knowledge, we are the first to address these issues together with their analysis and explanation. To overcome these limitations, we presented the novel idea of modifying the 3D content shown on automultiscopic screens. Our method improves the continuity of the light field, which leads to significant visual quality improvements. The performance of the method was demonstrated on static images as well as videos, and validated in user experiments. An additional advantage of our technique is device- and view-independence, i.e., it does not utilize any information about the display type or the viewers' positions. This, together with the fact that it is a purely software solution, makes it attractive as a pre-processing step for a wide range of applications.

Extending our method to a full-parallax display is an exciting avenue for future work, and we consider this a non-trivial problem. First, the analysis of the problem needs to be extended from 2D EPI images to 3D. Then, the technique would need to enforce the repetitive structure in both horizontal and vertical directions. Our technique does not apply directly to multi-layer displays; however, we believe that some ideas from our work could be used to expand their small field of view. Another interesting possibility would be to combine such manipulations with depth remapping methods and inter-view antialiasing, as well as to improve performance for real-time applications. We believe that our technique will be beneficial not only for 3DTV applications and 3D visualizations, but also for large-scale projector-based cinema systems.

Acknowledgments

We would like to thank Katarina Struckmann and Krzysztof Templin for proofreading, and the anonymous subjects who took part in our experiments. This work was partially supported by NSF IIS-1111415, NSF IIS-1116296, Quanta Computer, the National Basic Research Project of China (Project Number 2011CB302205), the Natural Science Foundation of China (Project Number 61272226/61120106007), the National High Technology Research and Development Program of China (Project Number 2013AA013903), and the Research Grant of Beijing Higher Institution Engineering Research Center.

References

AGARWALA, A., ZHENG, K. C., PAL, C., AGRAWALA, M., COHEN, M., CURLESS, B., SALESIN, D., AND SZELISKI, R. 2005. Panoramic video textures. ACM Trans. Graph. 24, 3, 821–827.

AGARWALA, A. 2007. Efficient gradient-domain compositing using quadtrees. ACM Trans. Graph. 26, 3, 94.

BIRKLBAUER, C., AND BIMBER, O. 2012. Light-field retargeting. Computer Graphics Forum 31, 2pt1, 295–303.

BROOKES, A., AND STEVENS, K. A. 1989. The analogy between stereo depth and brightness. Perception 18, 5, 601–614.

CHEN, B., OFEK, E., SHUM, H.-Y., AND LEVOY, M. 2005. Interactive deformation of light fields. In Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, ACM, 139–146.

DIDYK, P., RITSCHEL, T., EISEMANN, E., MYSZKOWSKI, K., SEIDEL, H.-P., AND MATUSIK, W. 2012. A luminance-contrast-aware disparity model and applications. ACM Trans. Graph. 31, 6, 184.

DIDYK, P., SITTHI-AMORN, P., FREEMAN, W., DURAND, F., AND MATUSIK, W. 2013. Joint view expansion and filtering for automultiscopic 3D displays. ACM Trans. Graph. 32, 6, 221:1–221:8.

EISEMANN, M., GOHLKE, D., AND MAGNOR, M. 2011. Edge-constrained image compositing. In Proceedings of Graphics Interface 2011, Canadian Human-Computer Communications Society, 191–198.

FILIPPINI, H. R., AND BANKS, M. S. 2009. Limits of stereopsis explained by local cross-correlation. Journal of Vision 9, 8–8.

HIRSCH, M., WETZSTEIN, G., AND RASKAR, R. 2014. A compressive light field projection system. ACM Trans. Graph. 33, 4, 1–12. To appear.

HORN, D. R., AND CHEN, B. 2007. LightShop: interactive light field manipulation and rendering. In Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games, ACM, 121–128.

ISAKSEN, A., MCMILLAN, L., AND GORTLER, S. J. 2000. Dynamically reparameterized light fields. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, ACM Press/Addison-Wesley Publishing Co., 297–306.

IVES, F. E., 1903. Parallax stereogram and process of making same. U.S. Patent 725,567.

JIA, J., AND TANG, C.-K. 2008. Image stitching using structure deformation. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 4, 617–631.

JIA, J., SUN, J., TANG, C.-K., AND SHUM, H.-Y. 2006. Drag-and-drop pasting. ACM Trans. Graph. 25, 3, 631–637.

KIM, C., HORNUNG, A., HEINZLE, S., MATUSIK, W., AND GROSS, M. 2011. Multi-perspective stereoscopy from light fields. ACM Trans. Graph. 30, 6, 190.

KIM, C., ZIMMER, H., PRITCH, Y., SORKINE-HORNUNG, A., AND GROSS, M. 2013. Scene reconstruction from high spatio-angular resolution light fields. ACM Trans. Graph.

KONRAD, J., AND AGNIEL, P. 2006. Subsampling models and anti-alias filters for 3-D automultiscopic displays. IEEE Transactions on Image Processing 15, 1, 128–140.

KWATRA, V., SCHÖDL, A., ESSA, I., TURK, G., AND BOBICK, A. 2003. Graphcut textures: image and video synthesis using graph cuts. ACM Trans. Graph. 22, 3, 277–286.

LEVIN, A., ZOMET, A., PELEG, S., AND WEISS, Y. 2004. Seamless image stitching in the gradient domain. In Computer Vision – ECCV 2004. Springer, 377–389.

LEVOY, M., AND HANRAHAN, P. 1996. Light field rendering. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM, 31–42.

LIPPMANN, G. 1908. Épreuves réversibles donnant la sensation du relief. Journal of Physics 7, 4, 821–825.

MAHAJAN, D., HUANG, F.-C., MATUSIK, W., RAMAMOORTHI, R., AND BELHUMEUR, P. 2009. Moving gradients: a path-based method for plausible image interpolation. ACM Trans. Graph. 28, 3, 42.

MASIA, B., WETZSTEIN, G., ALIAGA, C., RASKAR, R., AND GUTIERREZ, D. 2013. Display adaptive 3D content remapping. Computers & Graphics 37, 8, 983–996.

PÉREZ, P., GANGNET, M., AND BLAKE, A. 2003. Poisson image editing. ACM Trans. Graph. 22, 3, 313–318.

PETERKA, T., KOOIMA, R. L., SANDIN, D. J., JOHNSON, A. E., LEIGH, J., AND DEFANTI, T. A. 2008. Advances in the Dynallax solid-state dynamic parallax barrier autostereoscopic visualization display system. IEEE Transactions on Visualization and Computer Graphics 14, 487–499.

RUBINSTEIN, M., SHAMIR, A., AND AVIDAN, S. 2008. Improved seam carving for video retargeting. ACM Trans. Graph. 27, 3, 16:1–16:9.

SCHÖDL, A., SZELISKI, R., SALESIN, D. H., AND ESSA, I. 2000. Video textures. In Annual Conference on Computer Graphics, SIGGRAPH '00, 489–498.

SHIBATA, T., KIM, J., HOFFMAN, D., AND BANKS, M. 2011. The zone of comfort: Predicting visual discomfort with stereo displays. Journal of Vision 11, 8, 11:1–11:29.

STEWART, J., YU, J., GORTLER, S. J., AND MCMILLAN, L. 2003. A new reconstruction filter for undersampled light fields. In Proceedings of the 14th Eurographics Workshop on Rendering, Eurographics Association, 150–156.

TOMPKIN, J., HEINZLE, S., KAUTZ, J., AND MATUSIK, W. 2013. Content-adaptive lenticular prints. ACM Trans. Graph. 32, 4, 133:1–133:10.

YE, G., STATE, A., AND FUCHS, H. 2010. A practical multi-viewer tabletop autostereoscopic display. In Mixed and Augmented Reality (ISMAR), 2010 9th IEEE International Symposium on, IEEE, 147–156.

YI, S.-Y., CHAE, H.-B., AND LEE, S.-H. 2008. Moving parallax barrier design for eye-tracking autostereoscopic displays. In 3DTV Conference.

ZHANG, Z., WANG, L., GUO, B., AND SHUM, H.-Y. 2002. Feature-based light field morphing. ACM Trans. Graph. 21, 3, 457–464.

ZWICKER, M., MATUSIK, W., DURAND, F., AND PFISTER, H. 2006. Antialiasing for automultiscopic 3D displays. In Proceedings of the 17th Eurographics Conference on Rendering Techniques, Eurographics Association, 73–82.