Luis E. Gurrieri and Eric Dubois, "Depth consistency and vertical disparities in stereoscopic panoramas," J. Electron. Imaging 23(1), 011004 (Jan 29, 2014). Copyright (2014) Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited. http://dx.doi.org/10.1117/1.JEI.23.1.011004

Depth consistency and vertical disparities in stereoscopic panoramas

Luis E. Gurrieri* and Eric Dubois
University of Ottawa, School of Electrical Engineering and Computer Science, Ottawa, Ontario K1N 6N5, Canada

Abstract. In recent years, the problem of acquiring omnidirectional stereoscopic imagery of dynamic scenes has gained commercial interest and, consequently, new techniques have been proposed to address this problem. The goal of many of these new panoramic methods is to provide practical solutions for acquiring real-time omnidirectional stereoscopic imagery for human viewing. However, there are problems related to mosaicking partially overlapped stereoscopic snapshots of the scene that need to be addressed. Among these issues are the conditions to provide a consistent depth illusion over the whole scene and the appearance of undesired vertical disparities. We develop an acquisition model capable of describing a variety of omnistereoscopic imaging systems and suitable for studying the design constraints of these systems. Based on this acquisition model, we compare different acquisition approaches based on mosaicking partial stereoscopic views of the scene in terms of their depth continuity constraints and the appearance of vertical disparities. This work complements and extends our previous work in omnistereoscopic imaging systems by proposing a mathematical framework to contrast different acquisition strategies to create stereoscopic panoramas using a small number of stereoscopic images. © 2014 SPIE and IS&T [DOI: 10.1117/1.JEI.23.1.011004]

Keywords: stereoscopic panoramas; omnistereo; omnistereoscopic acquisition; omnistereoscopic cameras; vertical disparities; panoramic depth perception.

Paper 13494SS received Aug. 31, 2013; revised manuscript received Dec. 20, 2013; accepted for publication Jan. 7, 2014; published online Jan. 29, 2014.

1 Introduction

The problem of acquiring stereoscopic panoramas of dynamic scenes has gained relevance in recent years and, consequently, new acquisition methods have been proposed.1

The goal of many of these novel panoramic methods is to acquire stereoscopic panoramas of real-world scenes and to render stereoscopic views suitable for human viewing.2–4

In particular, methods based on the acquisition of partially overlapped stereoscopic snapshots of the scene are the most attractive for real-time omnistereoscopic capture.5,6

However, there is a need to rigorously model these acquisition techniques in order to provide useful design constraints for the corresponding omnidirectional stereoscopic systems.

This paper is about the limitations of omnistereoscopic systems based on acquiring partially overlapped stereoscopic views of the scene from two distinct and coplanar viewpoints with horizontal parallax. In particular, we study the mosaicking problem, which has become relevant due to recent developments in stereoscopic panoramic video.6,7 However, this paper does not address problems specific to panoramic video.

An important problem often ignored in omnistereoscopic acquisition techniques is the continuity of the illusion of depth perceived over all gazing directions. This problem is relevant for omnistereoscopic systems based on acquiring partially overlapping stereoscopic snapshots of the scene to be mosaicked into a complete stereoscopic panorama.4,5,7,8

One important parameter used to contrast different acquisition techniques is the minimum distance to the scene that provides a continuous illusion of depth in any gazing direction.

Another important problem is to characterize the vertical disparities that cause ghosting and visual discomfort at the stitching boundaries between mosaics. In our simulations, we studied the effect of the field of view (FOV) of the lenses, and of the pixel size and dimensions of the sensor, on the design of the system.

In order to study these parameters, we propose a general acquisition model capable of describing a variety of multiple-camera systems and acquisition techniques.1 Our model is based on a pair of pin-hole cameras with horizontal parallax. By changing the spatial location of the respective projection centers and the panning direction of the stereoscopic pair, this model describes a large variety of omnistereoscopic imaging systems with a horizontal baseline.

First, we detail the general acquisition model. Then, we derive from this generic model four acquisition configurations that describe a variety of omnistereoscopic cameras suitable to produce horizontal stereo for human viewing.1

Finally, we apply this acquisition model to obtain expressions for the horizontal and vertical disparities observed when mosaicking stereoscopic snapshots of the scene. We obtain the parameters of interest for each configuration using a ray-tracing approach. From these simulations, we extract conclusions that can be used in the design of omnistereoscopic cameras for the acquisition of dynamic scenes.

One of the contributions of this paper is to provide a tractable method for analyzing multiple-camera configurations intended for omnistereoscopic imaging. Furthermore, we provide methods to study the acquisition constraints necessary to attain a continuous depth perception in all gazing directions in azimuth. Another relevant contribution is to provide a mathematical model for the vertical disparities that would affect the mosaicking process in each configuration. This work complements and extends our previous work in stereoscopic panorama acquisition2–4 by proposing a mathematical framework to contrast different omnistereoscopic acquisition strategies.

*Address all correspondence to: Luis E. Gurrieri, E-mail: [email protected]

2 Omnistereoscopic Acquisition Model

The general acquisition model is composed of two pin-hole cameras separated by a baseline distance b with respect to a global reference of coordinates in three-dimensional (3-D) space, as illustrated in Fig. 1. This model is used to derive four camera configurations, which are distinguished by the relative location of the stereoscopic pair of cameras with respect to the reference center. In this paper, we refer to these four spatial variations of the acquisition model as configurations, and we number them from one to four.

Although this model consists of one pair of cameras, it can represent complex multiple-camera configurations as well as a single stereoscopic camera rig rotated to different azimuth angles. In Ref. 1, we review a large variety of acquisition cameras and methods to produce omnistereoscopic imagery suitable for human viewing, each of which can be described by one of these four variations of the acquisition model.

The location and orientation of this stereoscopic camera pair in 3-D space is restricted by the need to capture two snapshots of the same scene from two viewpoints with horizontal parallax. One constraint is that all the possible locations for this pair of pin-hole cameras are restricted to the horizontal XZ-plane, which is used as the reference horizontal plane. Another constraint is that the optical axes of the stereoscopic pair of pin-hole cameras are parallel and lie on the XZ-plane. A consequence of these constraints is that the orientation (panning angle) of each virtual stereoscopic rig is described by a pitch rotation around the Y axis. The reference point O is used to describe the panning direction of the stereoscopic rig. Hence, the locations of the projection centers of each camera can be described by a Euclidean translation in the XZ-plane.

2.1 Acquisition Model: Configurations

The first configuration of the acquisition model we introduce is the central stereoscopic rig (configuration 1), illustrated in Fig. 2(a). This configuration models the sampling of the scene by means of a rotating stereoscopic camera, which captures partial snapshots at regular angular intervals in azimuth. This was the first technique used to create omnistereoscopic images with horizontal parallax. This camera configuration consists of two cameras with co-planar projection centers separated by a baseline b. The stereoscopic camera rotation is around the Y axis exclusively (pitch rotation). This configuration describes acquisition methods that have been widely used over the last decade, whether using planar9,10 or line sensors11,12 for sequential acquisition. It also models a widely used technique based on rotating an off-centered camera at regular angles θ_i, where i ∈ {0, …, N − 1} and N is usually large, producing a set of overlapped stereoscopic images of the scene.13,14 Although simple in its conception, configuration 1 cannot be used in a parallel acquisition configuration due to self-occlusion between cameras; hence it can only represent sequential omnistereoscopic acquisition.

The lateral stereoscopic rig (configuration 2) is shown in Fig. 2(b). This configuration models the sequential acquisition of partial images of the scene by rotating the stereoscopic camera at regular intervals θ_i, where i ∈ {0, …, N − 1}. The main difference with configuration 1 is that the pitch rotation is defined around the nodal point of one of the cameras, i.e., making the rotation axis coincide with the projection center of the left or right camera. In this approach, a single-viewpoint panorama is produced by mosaicking the images acquired by the central camera, while a second image with horizontal parallax for each θ_i is acquired by the lateral camera.15,16 The lateral camera (stereoscopic counterpart) describes a circle of radius equal to the stereo baseline b. Similar to configuration 1, this configuration cannot be used for the simultaneous acquisition of the whole scene due to self-occlusion of the central camera.

The lateral-radial stereoscopic rig (configuration 3) is shown in Fig. 2(c). This configuration models a stereoscopic camera rotated off-center, where one camera is radially displaced from the rotation axis in O by a distance ‖r_c‖, and the second camera is laterally displaced by b, perpendicularly with respect to the direction of r_c. This arrangement enables the capture of a second snapshot of the scene with horizontal parallax for each sampling angle θ_i.4

Fig. 1 The global reference frame versus the camera frames for the stereoscopic camera composed of cameras Ω_{L,i} and Ω_{R,i}, oriented at θ_i in azimuth.

Fig. 2 Variations of the acquisition model: (a) central stereoscopic rig (configuration 1), (b) lateral stereoscopic rig (configuration 2), (c) lateral-radial stereoscopic rig (configuration 3), and (d) off-centered stereoscopic rig (configuration 4).


This configuration can be derived from configuration 2 by radially displacing the stereoscopic camera a distance ‖r_c‖ away from the central camera nodal point. This configuration can be used to model a multiple-sensor arrangement, since the self-occlusion of the central camera can be avoided.

The off-centered stereoscopic rig (configuration 4) is a stereoscopic camera located at a radial distance ‖r_c‖ from the geometrical center O, as depicted in Fig. 2(d). This configuration can be derived from configuration 1 by locating the pitch axis O a distance ‖r_c‖ behind the stereoscopic camera midpoint. It models the case where multiple stereoscopic rigs, radially located with respect to the pitch center O, are used to acquire partially overlapped snapshots of the whole scene.2,17–19 In one approach, the successive images acquired by the left and right cameras can be mosaicked to create left- and right-eye panoramas (I_L, I_R).20

2.2 Practical Acquisition Approach

Acquiring the scene column-wise using line cameras21 or by extracting narrow image columns via back-projection from two distinct viewpoints13,22 can produce stereoscopic panoramas for human viewing. However, the main drawback of these methods is their sequential nature, which limits them to static scenes. The stereoscopic images rendered by directly mosaicking multiple columns with horizontal parallax are correct only in a limited region of interest,23 located at the center of the image, no matter the gazing direction. This is acceptable since peripheral vision is not used by the mechanisms of stereo fusion, but adaptation to different display technologies is necessary.

An attractive alternative is to capture a limited number of stereoscopic snapshots of the scene and mosaic them;4,6,8,24 e.g., five to eight stereoscopic images are enough to cover the whole scene in azimuth. Partially overlapped snapshots can be acquired sequentially or simultaneously. The latter opens the possibility of acquiring omnistereoscopic images and videos of dynamic scenes, bypassing the limitations of line-sequential techniques. Mosaicking can produce a continuous and consistent binocular illusion of depth in all gazing directions around the sampling point. However, to achieve this ideal illusion of depth, the camera system has to be carefully designed.

2.3 Model

A projective pin-hole camera is a simple yet powerful approach to model each configuration of the acquisition model presented in the previous section. The location of each projection center can be specified for a single stereoscopic camera rotating around a common vertical axis (configurations 1 and 2) and for multiple stereoscopic pairs with a common symmetry axis (configurations 3 and 4).

In this paper, we use the subindex j (j ∈ {L, R}) to refer to the left (L) or right (R) camera in a stereoscopic camera pair. The subindex i refers to one of N consecutive gazing directions in azimuth (i ∈ {0, …, N − 1}), defined as

\theta_i = \frac{i \cdot 360 \text{ deg}}{N}. \qquad (1)

It follows that θ_N = θ_0. In summary, the subindex (j, i) indicates the camera j (j ∈ {L, R}) of a stereoscopic pair of cameras oriented at θ_i in azimuth (i ∈ {0, …, N − 1}) with respect to the global reference of coordinates.
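To make the sampling concrete, the gazing directions of Eq. (1) can be tabulated directly; the following minimal Python sketch (illustrative only, not part of the original paper) enumerates them for an assumed N = 6:

```python
N = 6                                        # stereoscopic samples in azimuth
thetas = [i * 360.0 / N for i in range(N)]   # Eq. (1), in degrees
# -> [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]; theta_N wraps around to theta_0
```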

Note that each of the four configurations presented in Fig. 2 can be modeled by a set of single stereoscopic rigs with different gazing directions or by a single stereoscopic camera sequentially rotated to different angles θ_i. Using this abstraction, all configurations can be described as a single stereoscopic camera after defining the spatial locations of the left and right cameras and the set of all possible gazing directions for such a virtual stereoscopic camera.

2.3.1 World and camera frames of coordinates

The reference world frame of coordinates is referred to as XYZ, and its spatial location differs for each configuration. A point in the scene is written as P_W = (X_W, Y_W, Z_W)^T, where X_W, Y_W, and Z_W are scalars.

Each camera has its own local frame of coordinates, which we refer to as X_{j,i}Y_{j,i}Z_{j,i}. Similar to the world reference of coordinates, a point in the camera coordinate frame is denoted P_{j,i} = (X_{j,i}, Y_{j,i}, Z_{j,i})^T. The global (reference) and local (camera) coordinate frames are illustrated in Fig. 1 for configuration 1.

2.3.2 Cameras and stereoscopic rigs

In our acquisition model, we identify each camera with the notation Ω_{j,i}, where j ∈ {L, R} and i ∈ {0, …, N − 1}. A camera Ω_{j,i} has a projective center, which we refer to by its spatial location O_{j,i}. The individual cameras are grouped in stereoscopic camera pairs (Ω_{L,i}, Ω_{R,i}), where i ∈ {0, …, N − 1} defines the orientation angle in azimuth according to Eq. (1), as illustrated in Fig. 2 for each configuration.

The distinction between left and right does not necessarily correspond to the relative spatial location of each camera in world coordinates, but is used here to distinguish the cameras in a stereoscopic pair. In order to eliminate ambiguities, we label the cameras according to their positions in the reference of coordinates XYZ when θ_0 = 0 deg, i.e., when the baseline vector b is parallel to X. Using this convention, the camera Ω_{L,i} is the one whose projection center O_{L,i} lies at X ≤ 0 and, conversely, the right camera, referred to as Ω_{R,i}, is the one whose projection center O_{R,i} lies at X > 0. This labeling scheme is exemplified in Fig. 1. After being labeled for θ_0, each camera retains its left or right label for all θ_i in the acquisition sequence.

2.3.3 Location and orientation of each camera

Two planes X_{j,i}Z_{j,i} are defined by a stereoscopic camera with orientation θ_i. These two planes are coincident with the XZ-plane; hence, all projection centers O_{j,i} are located on the same reference plane.

Each camera frame and its projection center are co-located. The Y_{j,i} axes are parallel to the Y axis. The optical axis of camera Ω_{j,i} is aligned with its corresponding Z_{j,i} axis. Furthermore, each Z_{j,i} axis is perpendicular to the vector b = O_{L,i} − O_{R,i} defined by each stereoscopic camera. Therefore, the optical axes of each camera pair are parallel. Finally, the image plane is parallel to the XY-plane and is located at Z_{j,i} = f, as shown in Fig. 1.


2.3.4 Stereoscopic image pairs

A stereoscopic pair of images is denoted (im_{L,i}, im_{R,i}), where im_{L,i} and im_{R,i} are the images acquired by a stereoscopic camera (Ω_{L,i}, Ω_{R,i}) whose orientation is θ_i with respect to the global reference.

The two-dimensional (2-D) reference of coordinates in each image is located at the symmetry center of the image, as shown in Fig. 1. A coordinate point on each image plane is denoted p_{j,i} = (x_{j,i}, y_{j,i})^T, where the subindex (j, i) has the meaning given in Sec. 2.3. The image window is a rectangular subregion of the image plane whose center intersects the camera optical axis Z_{j,i} and whose area is defined by its horizontal width (W_h) and its aspect ratio (a_r).

2.3.5 Acquisition and rendering

The omnistereoscopic strategy is based on acquiring N partially overlapped stereoscopic images. The set of stereoscopic snapshots can be acquired by rotating a single stereoscopic camera in increments of Δθ degrees in azimuth (configurations 1 and 2), or by acquiring all the snapshots simultaneously using multiple stereoscopic cameras (configurations 3 and 4).

The subset of images im_{L,i} acquired by the cameras Ω_{L,i} is mosaicked to render a left-eye view of the scene, I_L. The same is done with the set of images im_{R,i} acquired by the cameras Ω_{R,i} to generate the right-eye view I_R. The pair of panoramas (I_L, I_R) defines an omnistereoscopic image I_S.

2.3.6 Stitching and blending

Each image im_{j,i} is aligned and stitched with the previous image im_{j,i−1} and the next image im_{j,i+1} in the sequence. The aligning and stitching are done at ±x_b from the center in the horizontal dimension, as shown in Fig. 3(a).

The camera FOV (Δ) is defined as the angle of view along the main diagonal of the image window. In our acquisition model, we use the camera FOV in the horizontal dimension (Δ_a) since it is better suited to our analysis.

The region of each image defined by x_{j,i} ∈ [−x_b, x_b] and used in the mosaicking is a fraction of the image width (αW_h), as illustrated in Fig. 3. The parameter α is given by

\alpha = \frac{2f}{W_h} \tan\left(\frac{\Delta\theta}{2}\right), \qquad (2)

where Δθ is the sampling angle in azimuth, which is defined as

\Delta\theta = \frac{360 \text{ deg}}{N}. \qquad (3)

Note that Δθ < Δ_a, given the partial overlapping requirement between stereoscopic samples i and i + 1.

In each image, the stitching point is symmetrically located a distance x_b from the image center in the horizontal dimension. The distance x_b is given by

x_b = f \tan\left(\frac{\Delta\theta}{2}\right), \qquad (4)

where f is the focal length of the camera, which is defined as

f = \frac{W_h}{2 \tan\left(\frac{\Delta_a}{2}\right)}. \qquad (5)

The stitching positions x_{j,i} = ±x_b lie in the horizontal middle of the overlapping regions for any Δ_a and Δθ. The geometric depiction of these magnitudes is presented in Fig. 3(b).
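As a quick numerical check of Eqs. (2) to (5), the sketch below computes f, x_b, and α for a representative setup (the Canon 400D sensor width from Table 2, a 100-deg lens, and N = 6 samples are assumed for illustration; this is not code from the paper):

```python
import math

def camera_geometry(W_h: float, delta_a_deg: float, N: int):
    """Stitching geometry of Sec. 2.3.6: focal length, stitching offset,
    and mosaicked fraction of the image width, per Eqs. (2) to (5)."""
    delta_theta = math.radians(360.0 / N)                         # Eq. (3)
    f = W_h / (2.0 * math.tan(math.radians(delta_a_deg) / 2.0))   # Eq. (5)
    x_b = f * math.tan(delta_theta / 2.0)                         # Eq. (4)
    alpha = 2.0 * x_b / W_h                                       # Eq. (2)
    return f, x_b, alpha

f, x_b, alpha = camera_geometry(W_h=22.2e-3, delta_a_deg=100.0, N=6)
print(f"f = {f * 1e3:.2f} mm, x_b = {x_b * 1e3:.2f} mm, alpha = {alpha:.2f}")
# -> f = 9.31 mm, x_b = 5.38 mm, alpha = 0.48
```

The resulting f ≈ 9.3 mm matches the value quoted for this sensor and lens in Sec. 6.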

2.3.7 Overlapping regions

The overlapping regions between any two neighboring images in a set are defined as the image regions where the same part of the scene is projected in both images. In our acquisition model, two overlapping regions can be defined in any image. These regions are located in

-\frac{W_h}{2} \le x_{j,i} \le -2 x_b + \frac{W_h}{2}, \qquad (6)

2 x_b - \frac{W_h}{2} \le x_{j,i} \le \frac{W_h}{2}. \qquad (7)

The overlapping regions spread W_h − 2x_b from each horizontal edge of the image toward its center. However, not all of this region is used for blending: only a strip a few pixels wide around ±x_b is used.
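Under the same assumptions as the previous sketch, the extents of the two overlapping regions follow directly from Eqs. (6) and (7); this small helper is again only an illustration:

```python
def overlap_regions(W_h: float, x_b: float):
    """Horizontal extents of the two overlapping regions, Eqs. (6) and (7);
    each region is W_h - 2*x_b wide, measured from a horizontal image edge."""
    left = (-W_h / 2.0, -2.0 * x_b + W_h / 2.0)   # Eq. (6)
    right = (2.0 * x_b - W_h / 2.0, W_h / 2.0)    # Eq. (7)
    return left, right

# With W_h = 22.2 mm and x_b = 5.38 mm, each overlap is about 11.4 mm wide,
# although only a blending strip a few pixels wide around +/- x_b is used.
```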

2.3.8 Blending regions

The horizontal stitching coordinate may differ from a fixed x_b when an optimal-cut algorithm is used.25,26 For instance, in a recently proposed method based on graph cuts to find the optimal stitching position,27 the authors claim to produce consistent mosaics of stereoscopic images. Their method minimizes an energy function specifically designed to account for depth continuity, followed by a warping transformation that smooths disparity transitions between mosaics. Consequently, depending on the stitching technique used, x_b provides a reference for the stitching-position search. The actual stitching position can be a point p_s = (x_s, y), where x_s takes a different value in each image row.

However, we use a constant stitching coordinate x_b and a blending region of size Δ_b around the stitching coordinate. In other words, the blending region is defined by x_{j,i} ∈ [−x_b − Δ_b/2, −x_b + Δ_b/2] and x_{j,i} ∈ [x_b − Δ_b/2, x_b + Δ_b/2], for all image rows y_{j,i}. This is sufficient to model the design parameters for the different configurations.

Fig. 3 The parameters for each image: (a) the location of the frame of coordinates on the image plane and (b) the field-of-view parameters and their relationship with f and W_h.


2.4 From Global to Camera Coordinates

The transformation that represents a point P_W in each camera frame of coordinates can be defined as

P_{j,i} = R_i \cdot (P_W - T_{j,i}), \qquad (8)

where R_i is the rotation matrix defined by θ_i and T_{j,i} defines the location of O_{j,i} in the global frame.

The rotation matrix is defined by

R_i = \begin{pmatrix} \cos\theta_i & 0 & \sin\theta_i \\ 0 & 1 & 0 \\ -\sin\theta_i & 0 & \cos\theta_i \end{pmatrix}. \qquad (9)

The generic notation for the translation vectors is

T_{L,i} = (t_{L,1}, t_{L,2}, t_{L,3})^T, \qquad (10)

T_{R,i} = (t_{R,1}, t_{R,2}, t_{R,3})^T, \qquad (11)

where t_{j,k}, for j ∈ {L, R} and k ∈ {1, 2, 3}, are the translation components in the global frame of coordinates for O_{j,i}. The translation vectors for each configuration are defined in Secs. 2.4.1 to 2.4.4.

2.4.1 Configuration 1

T_{L,i} = \left(-\frac{b}{2}\cos\theta_i,\; 0,\; -\frac{b}{2}\sin\theta_i\right)^T, \qquad (12)

T_{R,i} = \left(\frac{b}{2}\cos\theta_i,\; 0,\; \frac{b}{2}\sin\theta_i\right)^T, \qquad (13)

where b = ‖b‖. This is shown in Fig. 4(a).

2.4.2 Configuration 2

T_{L,i} = (0, 0, 0)^T, \qquad (14)

T_{R,i} = (b\cos\theta_i,\; 0,\; b\sin\theta_i)^T. \qquad (15)

This is shown in Fig. 4(b).

2.4.3 Configuration 3

T_{L,i} = (-\|r_c\|\sin\theta_i,\; 0,\; \|r_c\|\cos\theta_i)^T, \qquad (16)

T_{R,i} = (\|r_o\|\cos\beta_i,\; 0,\; \|r_o\|\sin\beta_i)^T, \qquad (17)

where \beta_i = \arctan\left(\frac{\|r_c\|}{b}\right) + \theta_i and \|r_o\| = \sqrt{\|r_c\|^2 + b^2}. This is shown in Fig. 4(c).

2.4.4 Configuration 4

T_{L,i} = (-\|r_o\|\sin\beta_i,\; 0,\; \|r_o\|\cos\beta_i)^T, \qquad (18)

where \beta_i = \theta_i + \arctan\left(\frac{b}{2\|r_c\|}\right) and \|r_o\| = \sqrt{\|r_c\|^2 + (b/2)^2}, and

T_{R,i} = (\|r_o\|\sin\alpha_i,\; 0,\; \|r_o\|\cos\alpha_i)^T, \qquad (19)

where \alpha_i = \arctan\left(\frac{b}{2\|r_c\|}\right) - \theta_i. This is shown in Fig. 4(d).

Alternatively, the transformation of coordinates in Eq. (8) can be written as

P_{j,i} = R_i \cdot P_W - T_j, \qquad (20)

where

T_j = R_i \cdot T_{j,i}. \qquad (21)

Here T_j is the translation of the left or right camera frame of coordinates with respect to O. In this formulation, T_R = T_L + (b, 0, 0)^T for all configurations.

2.5 Projecting the Scene on Each Camera

The perspective transformation that maps a scene point P_W into the frame of coordinates of each camera Ω_{j,i} was defined by Eq. (8). Furthermore, any point P_W in the FOV of both cameras of a stereoscopic pair can be projected into each image as a point with coordinates p_{j,i} = (x_{j,i}, y_{j,i})^T. The 2-D image coordinates in each image are given by the projective equations28

x_{j,i} = -f\, \frac{R_{1,i}(P_W - T_{j,i})}{R_{3,i}(P_W - T_{j,i})}, \qquad (22)

y_{j,i} = -f\, \frac{R_{2,i}(P_W - T_{j,i})}{R_{3,i}(P_W - T_{j,i})}, \qquad (23)

where R_{k,i}, for k ∈ {1, 2, 3}, is the row vector formed by the k'th row of R_i. Without loss of generality, we assume zero bias in the image center and a unit pixel size.

Fig. 4 The geometric relationships for each configuration.
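The projection model of Eqs. (8), (9), (22), and (23) is straightforward to exercise numerically. The sketch below (an illustration under assumed parameters, not the authors' code) instantiates configuration 1 [Eqs. (12) and (13)] and projects a world point into both cameras:

```python
import numpy as np

def rotation(theta: float) -> np.ndarray:
    """Rotation matrix R_i of Eq. (9) for a panning angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def translations_config1(b: float, theta: float):
    """Translation vectors T_{L,i} and T_{R,i} of Eqs. (12) and (13)."""
    T_L = np.array([-b / 2.0 * np.cos(theta), 0.0, -b / 2.0 * np.sin(theta)])
    return T_L, -T_L   # T_{R,i} = -T_{L,i} for the central stereoscopic rig

def project(P_w: np.ndarray, theta: float, T: np.ndarray, f: float):
    """Image coordinates of Eqs. (22) and (23) for a world point P_w."""
    P_cam = rotation(theta) @ (P_w - T)   # Eq. (8): world -> camera frame
    return -f * P_cam[0] / P_cam[2], -f * P_cam[1] / P_cam[2]

# A point 2 m in front of a configuration-1 rig (b = 65 mm, f = 9.3 mm):
T_L, T_R = translations_config1(0.065, 0.0)
x_L, _ = project(np.array([0.0, 0.0, 2.0]), 0.0, T_L, 9.3e-3)
x_R, _ = project(np.array([0.0, 0.0, 2.0]), 0.0, T_R, 9.3e-3)
# |x_R - x_L| = f * b / Z ~ 0.30 mm, the disparity magnitude of Eq. (30);
# its sign depends on the image-coordinate convention adopted.
```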

3 Acquisition Constraints

One important requirement arising in omnistereoscopic imaging is how to provide a continuous and consistent depth illusion in all directions. For instance, the acquisition of multiple stereoscopic snapshots of the scene for mosaicking will be correct only at the center of each mosaic, which coincides with the θ_i orientation of the stereoscopic camera at the moment of acquisition. The reproduced stereoscopic view in any intermediate gazing direction between θ_i and θ_{i+1} is subject to distortion.6,23

3.1 Omnistereoscopic FOV

The FOV of a single camera Ω_{j,i} is defined by all possible rays of light that pass through the projection center O_{j,i} and simultaneously intersect its image window (Sec. 2.3.4). The FOV defines the region of space in front of the camera that can be acquired. This definition is valid for the geometric approach used here; however, with real lenses, the minimum distance to the scene can be several times f in front of the camera. In Fig. 5, we present the different visibility scenarios for various point locations in the XZ-plane. The point P_1 in this example is in the FOV of camera Ω_{R,i} only.

Also in Fig. 5, the point P_2 is in the stereoscopic FOV of the stereoscopic camera (Ω_{L,i}, Ω_{R,i}). This stereoscopic FOV is defined by the intersection of the FOVs of the two cameras in the stereoscopic pair. The same point P_2 is in the FOV of camera Ω_{R,i+1}, but not in the stereoscopic FOV of (Ω_{L,i+1}, Ω_{R,i+1}). Conversely, a third point P_3, at the same distance from O as point P_2, is located in the stereoscopic FOV of (Ω_{L,i+1}, Ω_{R,i+1}), but not in the stereoscopic FOV of (Ω_{L,i}, Ω_{R,i}).

The distance from the camera that marks the intersection of the stereoscopic FOVs of two neighboring stereoscopic cameras (i and i + 1) defines the stereoscopic FOV of the panoramic camera. Any point in the scene at this distance from the omnistereoscopic camera can potentially be imaged in I_S. In this example, P_4 belongs to the intersection of two neighboring stereoscopic FOVs, which defines the overlapping region in space. A point in this overlapping region will be imaged simultaneously by cameras (Ω_{L,i}, Ω_{R,i}) and (Ω_{L,i+1}, Ω_{R,i+1}). Any point in the overlapping region can be used for stitching neighboring stereoscopic images.

The stereoscopic FOV of the panoramic camera does not define the minimum distance between the omnistereoscopic camera and the scene. In some configurations, a point in the overlap of stereoscopic FOVs may be registered with different horizontal disparities in neighboring stereoscopic samples. This is especially true in configurations with asymmetrical camera locations with respect to the virtual rotation axis, such as configurations 2 and 3. This affects the perceived depth illusion after mosaicking the partial images.

However, a continuous depth illusion can be maintained as long as the difference between horizontal disparities is below the human threshold for depth perception around the stitching position. The perceived depth is directly related to the horizontal disparities in each stereoscopic sample. Since the difference between the horizontal disparities of the same scene point in different samples decreases with the distance between the camera and the scene, there is a minimum tolerable distance for each configuration after which a difference in the registered depth is below the human perceptual threshold.

Hence, we define the omnistereoscopic FOV as the spherical surface centered at O, with radius r_min, that marks the minimum distance from the omnistereoscopic camera beyond which a perceptually continuous depth illusion around the acquisition point can be reproduced. Any point P_W located at a distance ‖P_W‖ ≥ r_min can be imaged by either two or four cameras while maintaining a consistent illusion of depth between stereoscopic samples. The stereoscopic FOV of the panoramic camera and the omnistereoscopic FOV are shown in Fig. 5.

3.2 Disparities

A point in the scene P_W is projected by the stereoscopic camera (Ω_{L,i}, Ω_{R,i}) into the image coordinates p_{L,i} = (x_{L,i}, y_{L,i})^T and p_{R,i} = (x_{R,i}, y_{R,i})^T in im_{L,i} and im_{R,i}, respectively. In this context, we define the horizontal (dh_i) and vertical (dv_i) disparities for the stereoscopic image pair i as

dh_i = x_{R,i} - x_{L,i}, \qquad (24)

dv_i = y_{R,i} - y_{L,i}. \qquad (25)

3.3 Horizontal Disparity Equations

The coordinates of a point in the scene P_W after applying the projective transformation defined in Eqs. (22) and (23) are

x_{L,i} = -f\, \frac{\cos\theta_i (X_W - t_{L,1}) + \sin\theta_i (Z_W - t_{L,3})}{-\sin\theta_i (X_W - t_{L,1}) + \cos\theta_i (Z_W - t_{L,3})} = -f\, \frac{X_{L,i}}{Z_{L,i}}, \qquad (26)

x_{R,i} = -f\, \frac{\cos\theta_i (X_W - t_{R,1}) + \sin\theta_i (Z_W - t_{R,3})}{-\sin\theta_i (X_W - t_{R,1}) + \cos\theta_i (Z_W - t_{R,3})} = -f\, \frac{X_{R,i}}{Z_{R,i}}, \qquad (27)

where (X_{j,i}, Y_{j,i}, Z_{j,i})^T are the coordinates of the point P_W in each camera frame, and t_{j,k} (k ∈ {1, 2, 3}) are components of the translation vectors defined in Eqs. (10) to (19).

Fig. 5 The continuous depth perception for all gazing directions.


The image-plane coordinates (x_{L,i} and x_{R,i}) can be used to expand dh_i and dh_{i+1} [Eq. (24)] as

dh_i = -f \left( \frac{X_{R,i} Z_{L,i} - X_{L,i} Z_{R,i}}{Z_{R,i} Z_{L,i}} \right), \qquad (28)

dh_{i+1} = -f \left( \frac{X_{R,i+1} Z_{L,i+1} - X_{L,i+1} Z_{R,i+1}}{Z_{R,i+1} Z_{L,i+1}} \right). \qquad (29)

Note that since Z_{R,i} = Z_{L,i} = Z_i and X_{R,i} = X_{L,i} + b hold for all i, the disparities in Eqs. (28) and (29) can be simplified to

dh_i = -f\, b \left( \frac{1}{Z_i} \right), \qquad (30)

dh_{i+1} = -f\, b \left( \frac{1}{Z_{i+1}} \right). \qquad (31)

3.4 Horizontal Disparity Error

The difference in the depth estimated from the perspectives of two neighboring stereoscopic samples is the parameter we define to study the depth continuity. This depth estimate is the distance of P_W along the Z_i axis when imaged by the stereoscopic cameras at θ_i and θ_{i+1}. Such depth estimates (Z_i, Z_{i+1}) are related to the horizontal disparities (dh_i, dh_{i+1}).

The mosaicking of overlapped stereoscopic images requires the registration and alignment of neighboring images. However, this prerequisite is independent of the calculation of (Z_i, Z_{i+1}). Furthermore, when warping the stereoscopic images onto a cylinder (or a topologically equivalent surface) centered at O, the horizontal angular disparity defined over the curved surface has to be consistent with dh_i and dh_{i+1} to convey the same illusion of depth captured by each stereoscopic camera.

After warping the stereoscopic images onto a curved surface centered at O, the angular disparity in the azimuth direction defined by corresponding left and right projections of P_W on any display has to produce the same illusion of depth to the viewer as the originally captured stereoscopic images, which is determined by dh_i and dh_{i+1} in a planar image. Hence, considering the similarity between Z_i and Z_{i+1} for each acquisition configuration is a valid approach to study the depth continuity.

The horizontal disparity in the vicinity of the horizontal stitching coordinate x_{j,i} = ±x_b should remain below a tolerable error. In order to quantify the difference between registered horizontal disparities in neighboring stereoscopic samples, we define the horizontal disparity error (e_h) as

e_h = |dh_{i+1} - dh_i|. \qquad (32)

The region of stitching and blending is critical since it is the region where the artifacts caused by the parallax between individual cameras appear. It is also the most critical area in terms of defining the omnistereoscopic FOV. Hence, it is important to know the closest distance to the scene that guarantees the continuity of the horizontal disparity among mosaicked stereoscopic snapshots.

The closed form of e_h is

e_h = f \left| \frac{X_{R,i+1} Z_{L,i+1} - X_{L,i+1} Z_{R,i+1}}{Z_{R,i+1} Z_{L,i+1}} - \frac{X_{R,i} Z_{L,i} - X_{L,i} Z_{R,i}}{Z_{R,i} Z_{L,i}} \right| = f\, b \left| \frac{1}{Z_{i+1}} - \frac{1}{Z_i} \right|. \qquad (33)

As defined, e_h is a function of P_W after being transformed into each camera frame of coordinates. Hence, in the analysis, we write e_h(P_W) to express the dependency of the horizontal disparity error on the depth of the point in the scene.

The depth of P_W can be written in terms of the cameras (Ω_{L,0}, Ω_{R,0}) and (Ω_{L,Δθ}, Ω_{R,Δθ}) as follows:

Z_i = Z_W - t_{R,3}, \qquad (34)

Z_{i+1} = -\sin\Delta\theta\, (X_W - t_{R,1}) + \cos\Delta\theta\, (Z_W - t_{R,3}), \qquad (35)

where t_{R,1} and t_{R,3} can be obtained from Eqs. (12) to (19) by replacing θ_{i+1} with Δθ.
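The simplified closed form in Eq. (33) is easy to evaluate; the following sketch (illustrative, with assumed depths and camera parameters) shows the order of magnitude of e_h when the same point is registered at slightly different depths by two neighboring samples:

```python
def horizontal_disparity_error(f: float, b: float, Z_i: float, Z_i1: float) -> float:
    """Simplified closed form of Eq. (33): e_h = f * b * |1/Z_{i+1} - 1/Z_i|."""
    return f * b * abs(1.0 / Z_i1 - 1.0 / Z_i)

# A point registered at depths 1.00 m and 1.02 m by two neighboring samples
# (f = 9.3 mm, b = 35 mm) gives e_h of about 6.4 um, i.e., roughly one pixel
# on the Canon 400D sensor (s = 5.71 um, Table 2).
print(horizontal_disparity_error(9.3e-3, 35e-3, 1.00, 1.02))
```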

3.5 Depth Resolution

The threshold for e_h is a distance ε, which is related to the depth resolution in humans. The threshold ε defines the tolerable difference between the horizontal disparities of the same point P_W projected in two consecutively acquired stereoscopic images. Each configuration must satisfy

e_h \le \varepsilon \qquad (36)

in the region of overlapping (Sec. 3.1). The value of ε can be estimated by assuming that the depth resolution of each stereoscopic camera is at least equal to the average depth resolution in humans.

The perceptual depth resolution of the average adult population (dZ_h) can be approximated by29

dZ_h = \frac{z^2\, \delta\theta}{d_e}, \qquad (37)

where δθ is the vergence acuity in humans (20 arcsec), d_e is the average interocular distance in adults (65 mm), and z is the distance from the reference system defined on the stereoscopic camera. A diagram that helps to understand these parameters is shown in Fig. 6(a).

The depth resolution (dZ_c) of a stereoscopic camera can be approximated by30

dZ_c = \frac{z^2\, \varepsilon}{f\, b}. \qquad (38)

In Fig. 6(b), we present an illustration that helps understand dZ_c.

The goal is to have dZ_c ≤ dZ_h for a camera to equal or surpass the human depth resolution. The largest threshold is defined by dZ_c = dZ_h; hence, from Eqs. (37) and (38), the disparity threshold is


\varepsilon = \frac{f\, b\, \delta\theta}{d_e}. \qquad (39)

The larger the product f b, the larger the horizontal disparity threshold. The latter expression can be written as

\varepsilon = \left( \frac{W_h\, b}{2 \tan\frac{\Delta_a}{2}} \right) \frac{\delta\theta}{d_e}. \qquad (40)

The depth accuracy of human vision leads to a very stringent restriction for a stereoscopic camera system.31 As an example, the sensor width of the Nikon D800 is W_h = 35.9 × 10⁻³ m, which requires a horizontal disparity resolution of ε = 0.19 μm in order to be comparable with the human depth resolution. However, the pixel size of this camera is 4.88 μm, which is approximately 25 times larger than ε. Hence, the achievable horizontal resolution of a stereoscopic camera using this sensor is well below the depth acuity of human vision.

Consequently, in a practical scenario, the maximum horizontal resolution of the camera, e.g., one pixel width (s), defines the ε threshold. This threshold has proven to be sufficient in our rendering experiments.3,4
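The perceptual threshold of Eqs. (39) and (40) can be computed directly. In this sketch (an illustration; the nominal vergence acuity and interocular distance are the values assumed above), the resulting ε is a small fraction of a pixel, which is why the pixel size s is used as the practical threshold:

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)   # radians per arcsecond

def disparity_threshold(W_h: float, delta_a_deg: float, b: float,
                        vergence_arcsec: float = 20.0, d_e: float = 0.065) -> float:
    """Perceptual threshold eps on the horizontal disparity error, Eq. (40)."""
    f = W_h / (2.0 * math.tan(math.radians(delta_a_deg) / 2.0))   # Eq. (5)
    return f * b * vergence_arcsec * ARCSEC / d_e                 # Eq. (39)

# Canon 400D with a 100-deg lens and b = 35 mm: eps ~ 0.5 um, well under
# the 5.71-um pixel, so in practice eps = s is the usable threshold.
print(disparity_threshold(W_h=22.2e-3, delta_a_deg=100.0, b=35e-3))
```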

4 Minimum Distance to the Scene

The problem of finding the minimum distance to the scene (r_min) that defines the omnistereoscopic FOV requires finding the points in the scene located in the intersection of the stereoscopic FOVs of cameras (Ω_{L,i}, Ω_{R,i}) and (Ω_{L,i+1}, Ω_{R,i+1}), constrained to e_h ≤ ε. This is done for the neighborhood of the stitching coordinate, as explained in Sec. 3.5.

This approach requires only two consecutively acquired stereoscopic images, e.g., the samples i = 0 (θ_0 = 0 deg) and i = 1 (θ_1 = Δθ). The search can be restricted to the XZ-plane for simplicity and without loss of generality. Hence, we assume that the point P_W is located on the XZ-plane. This is illustrated in Fig. 7.

First, we define a ray between the point P_W and O_{R,0}. This ray (L_{R,0}) intersects the image window of camera Ω_{R,0} at the coordinates p_{R,0} = (−x_b, 0)^T, as shown in Fig. 7. The ray L_{R,0} is modeled as

L_{R,0} = a_0 + t_r (a_1 - a_0), \qquad (41)

where a_0 is the location of O_{R,0}, a_1 is the intersection of the ray with the image window of camera Ω_{R,0}, both expressed in terms of the global reference frame XYZ, and t_r ∈ ℝ with t_r ≥ 1. The parameters defining L_{R,0} are given in Table 1 for all configurations.

All scene points P_W on the ray L_{R,0} are projected onto the same coordinate point x_{R,0} = −x_b, which is therefore used as the reference. The projections of P_W on the other three image planes are x_{L,0}, x_{R,1}, and x_{L,1}, since y_{j,0} = y_{j,1} = 0. These projection points are used to calculate the horizontal disparities dh_0 and dh_1, and e_h, for each point P_W ∈ L_{R,0}.

The analysis starts by defining a point P_0 ∈ L_{R,0} located inside the intersection of the stereoscopic FOVs of cameras (Ω_{L,0}, Ω_{R,0}) and (Ω_{L,1}, Ω_{R,1}), i.e., the stereoscopic FOV of the panoramic camera (Sec. 3.1). This can be guaranteed by calculating the projection of P_0 on the other three image planes; each projection must satisfy x_{j,i} ∈ [−W_h/2, W_h/2].

If e_h(P_0) > ε, then the points P_W ∈ L_{R,0} such that ‖P_W‖ > ‖P_0‖ must be evaluated. The first point P_1 ∈ L_{R,0} for which e_h(P_1) ≤ ε defines the minimum distance for omnistereoscopic rendering as r_min = ‖P_1‖.

The error in the horizontal disparity e_h is a monotonically decreasing function of the distance between the scene and the camera (‖P_W‖). Therefore, the search over points P_W ∈ L_{R,0} with ‖P_W‖ > ‖P_0‖ will converge to r_min. Any point in the scene whose distance from the camera is larger than r_min will be projected in both stereoscopic cameras with an error in the horizontal disparity below the perceptual threshold.

Fig. 6 Depth resolution scheme: (a) human eyes and (b) stereoscopic camera.

Fig. 7 Ray-tracing method to find the minimum distance r_min: the ray passing through O_{R,0} and x_b defines the points on the XZ-plane used to find the minimum distance for which e_h ≤ ε.

Table 1 Parameters to define L_{R,0}.

Configuration   a_0               a_1
1               (b/2, 0, 0)       (−x_b + b/2, 0, f)
2               (b, 0, 0)         (−x_b + b, 0, f)
3               (b, 0, ‖r_c‖)     (−x_b + b, 0, f + ‖r_c‖)
4               (b/2, 0, ‖r_c‖)   (−x_b + b/2, 0, f + ‖r_c‖)

This approach to calculate r_min, which defines the omnistereoscopic FOV, can be applied to all the acquisition models.

A comparison between acquisition models in terms of the achievable minimum distance to the scene for a consistent depth rendition is presented in Sec. 6.
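The whole search of Sec. 4 can be condensed into a few lines. The sketch below implements it for configuration 1 only, marching along the ray L_{R,0} defined by the Table 1 parameters and using the practical threshold ε = s; it is an illustrative reimplementation under these assumptions, not the authors' simulation code:

```python
import math

def r_min_config1(W_h, s, delta_a_deg, b, N, r0=0.3, dr=1e-3, r_max=50.0):
    """Ray-traced search of Sec. 4 for configuration 1: march along the ray
    L_{R,0} of Eq. (41) (a_0, a_1 from Table 1) and return the distance from
    O of the first point where the error of Eq. (33) drops below eps = s."""
    f = W_h / (2.0 * math.tan(math.radians(delta_a_deg) / 2.0))   # Eq. (5)
    dth = math.radians(360.0 / N)                                 # Eq. (3)
    x_b = f * math.tan(dth / 2.0)                                 # Eq. (4)
    ax, az = b / 2.0, 0.0                      # a_0 = O_{R,0} on the XZ-plane
    dx, dz = -x_b, f                           # direction a_1 - a_0 (Table 1)
    n = math.hypot(dx, dz)
    dx, dz = dx / n, dz / n                    # unit direction of L_{R,0}
    r = r0
    while r < r_max:
        Xw, Zw = ax + r * dx, az + r * dz      # candidate P_W on the ray
        Z0 = Zw                                # Eq. (34) at theta_0 = 0 deg
        Z1 = (-math.sin(dth) * (Xw - b / 2.0 * math.cos(dth))
              + math.cos(dth) * (Zw - b / 2.0 * math.sin(dth)))   # Eq. (35)
        if f * b * abs(1.0 / Z1 - 1.0 / Z0) <= s:                 # Eq. (33)
            return math.hypot(Xw, Zw)          # r_min = ||P_1|| from O
        r += dr
    return float("inf")

# Roughly the setup of Table 3 (Canon 400D, 100-deg lens, b = 35 mm, N = 6);
# the sketch returns a value close to the 1.1 m reported for configuration 1.
print(r_min_config1(W_h=22.2e-3, s=5.71e-6, delta_a_deg=100.0, b=35e-3, N=6))
```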

5 Vertical Disparities

There are two situations where vertical disparities need to be modeled: the first involves the vertical disparities that appear within stereoscopic images (im_{L,i}, im_{R,i}), and the other involves the vertical disparities between the left or right images to be stitched [(im_{L,i}, im_{L,i+1}) and (im_{R,i}, im_{R,i+1})].

The vertical disparities within stereoscopic samples are negligible unless there are misalignments between the stereoscopic pairs of cameras, e.g., when the optical axes of the cameras are not parallel. Even in that case, the vertical disparities can be eliminated using stereoscopic registration techniques.

The vertical disparities between consecutive samples appear within the overlapping regions between neighboring images because of the parallax between contiguous cameras. These disparities may lead to image artifacts, such as ghosting after mosaicking, if not corrected. The magnitude and variation of these vertical disparities depend on the gazing angle (ϕ) in elevation.

5.1 Vertical Disparity Equations

The coordinates of a point P_W after applying the projective transformation defined in Eqs. (22) and (23) are

y_{L,i} = -f\, \frac{Y_W}{-\sin\theta_i (X_W - t_{L,1}) + \cos\theta_i (Z_W - t_{L,3})} = -f\, \frac{Y_{L,i}}{Z_{L,i}}, \qquad (42)

y_{R,i} = -f\, \frac{Y_W}{-\sin\theta_i (X_W - t_{R,1}) + \cos\theta_i (Z_W - t_{R,3})} = -f\, \frac{Y_{R,i}}{Z_{R,i}}, \qquad (43)

where (X_{j,i}, Y_{j,i}, Z_{j,i})^T are the coordinates of the point P_W in each camera frame and t_{j,k} (k ∈ {1, 2, 3}) are components of the translation vectors defined in Eqs. (10) to (19). Since the XZ-plane contains all the projection centers and optical axes, Y_{j,i} = Y_W for all the cameras [Eqs. (42) and (43)].

The vertical disparity within a stereoscopic image pair (im_{L,i}, im_{R,i}) is given by

dv_i = y_{R,i} - y_{L,i} = -f\, Y_W \left( \frac{Z_{L,i} - Z_{R,i}}{Z_{R,i} Z_{L,i}} \right). \qquad (44)

5.2 Vertical Disparities Outside the Overlapping Region

Any point P_W in the stereoscopic FOV of (Ω_{L,i}, Ω_{R,i}) has projections with Z_{L,i} = Z_{R,i}. Therefore, dv_i = 0 for all Y_W [Eq. (44)]. This holds outside the overlapping regions of every stereoscopic image to be mosaicked (Sec. 2.3.7).

In a real scenario, cameras will exhibit slight deviations from the ideal case; e.g., the optical axes will not be perfectly parallel and the projection centers might not lie on the reference plane. Hence, Z_{L,i} ≠ Z_{R,i} and dv_i ≠ 0 for Y_W ≠ 0. In these cases, pairwise camera calibration followed by stereoscopic image registration will help to reduce or eliminate undesired vertical disparities before mosaicking.

5.3 Vertical Disparities Within the Overlapping Region

The parallax between projection centers produces unwanted vertical disparities in the overlapping regions. The vertical disparity between neighboring images (im_{j,i}, im_{j,i+1}) is given by

dv_{j,i} = y_{j,i+1} - y_{j,i} = -f\, Y_W \left( \frac{Z_{j,i} - Z_{j,i+1}}{Z_{j,i} Z_{j,i+1}} \right) = -f\, Y_W\, \vartheta_j, \qquad (45)

where

\vartheta_j = \frac{Z_{j,i} - Z_{j,i+1}}{Z_{j,i} Z_{j,i+1}}

for j ∈ {L, R} and i ∈ {0, …, N − 1}. As defined, this disparity measures the vertical component of the error between the projections of P_W in (im_{L,i}, im_{L,i+1}) or in (im_{R,i}, im_{R,i+1}).

The vertical disparities in the overlapping regions between images are null for Y_W = 0. However, dv_{j,i} ≠ 0 for gazing directions above and below the horizontal reference plane. Therefore, as can be seen from Eq. (45), the vertical disparities increase with the parameter ϑ_j in the overlapping region.

A nonzero ϑ_j explains the appearance of vertical disparities when mosaicking stereoscopic snapshots originating from cameras with distinct projection centers.3,8 This occurs because a point P_W has different projections in each camera, i.e., Z_{j,i} ≠ Z_{j,i+1}. This unwanted effect diminishes with the distance between the scene and the omnistereoscopic camera reference center.

The vertical disparities affect the stitching of neighboring images and are manifested as ghosting after the blending process. This ghosting affects the stereoscopic result only when it is not corrected before stitching and blending, e.g., by local registration and image warping,3 and it is restricted to the overlapping regions.

A comparison between the different configurations in terms of vertical disparities is presented in Sec. 6.6.
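For intuition, the coefficient ϑ_j of Eq. (45) can be evaluated with assumed depths; the example below (illustrative only) shows how quickly the induced vertical disparity becomes significant for near objects away from the horizontal plane:

```python
def vartheta(Z_i: float, Z_i1: float) -> float:
    """Coefficient of Eq. (45); the vertical disparity is dv = -f * Y_W * vartheta."""
    return (Z_i - Z_i1) / (Z_i * Z_i1)

# A point seen at depths 0.50 m and 0.52 m by neighboring cameras, with
# f = 9.3 mm and Y_W = 0.5 m, yields |dv| ~ 0.36 mm on the sensor (dozens
# of pixels); the effect vanishes as the two registered depths converge.
print(abs(-9.3e-3 * 0.5 * vartheta(0.50, 0.52)))
```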

6 Results

The goal of our simulations is to contrast the four configurations in order to identify characteristics useful for improving the design of the acquisition system. We used the native horizontal resolution of each camera, given by the pixel size, as the horizontal disparity threshold, i.e., ε = s. We used real values extracted from the specifications of three off-the-shelf cameras: one APS-C sensor (Canon 400D) and two full-frame sensors (Nikon D800 and Canon EOS 6D). The specifications of each camera are presented in Table 2. In terms of the lenses, we used three generic lenses


with horizontal FOV Δ_a ∈ {122 deg, 100 deg, 76 deg}. We chose this particular set of lenses only to illustrate the effect of changing the focal length [Eq. (5)] on the minimum distance to the scene.

The parameters of interest are r_min, which depends on e_h (Sec. 3.4), and the vertical disparities within the overlapping regions, which depend on ϑ_j, j ∈ {L, R} (Sec. 5.3).

6.1 Horizontal Disparity Continuity

In order to find the distance r_min, we first calculated e_h between two neighboring stereoscopic samples to be mosaicked, following the procedure described in Sec. 4. In brief, we defined a ray L_{R,0} on the XZ-plane, using the stitching position with horizontal coordinate x_{R,0} = −x_b in im_{R,0} and the projection center O_{R,0}. Then, we computed e_h for the points P_W ∈ L_{R,0}, starting at a minimum distance from the camera ‖r_0‖ = 0.3 m. We calculated e_h using Eq. (32) for all x_{j,0} and x_{j,1} within the overlapping region (Sec. 2.3.8). The depth continuity and vertical disparities were calculated for the overlapping region extending ±0.05 W_h around x_b in the horizontal dimension. The variable Δx is the deviation from x_b in the horizontal coordinate.

In the first simulation, we compared r_min for the four configurations using different numbers of stereoscopic samples N (different Δθ), for b = 35 mm (configurations 1 and 2) and the same radial distance ‖r_c‖ = b (configurations 3 and 4). We used the same f = 9.3 mm in all cases, which corresponds approximately to Δ_a = 100 deg. We used the sensor size of a Canon 400D (APS-C sensor) in landscape orientation. This simulation gives information about r_min as a function of the horizontal stitching coordinate x_{R,1} = x_b + Δx. The results of this simulation are presented in Fig. 8.

The critical minimum distance (ϱ_min) is defined by the largest value of r_min within the blending width Δ_b, which is a few pixels wide around x_b (Sec. 2.3.8). In other words, ϱ_min is the practical minimum distance between the omnistereoscopic camera and the scene for a given set of acquisition parameters. Based on the simulation results shown in Fig. 8, the values of ϱ_min for each configuration are presented in Table 3 for x_b (Δx = 0) and a given blending region Δ_b = 10·s. This table provides a numerical example of how the minimum achievable distance is reduced by increasing the number of stereoscopic samples for a given f.

The best performance in terms of ϱ_min is achieved by configuration 1, followed by configuration 2, both modeling acquisition methods that are not suitable for acquiring dynamic scenes. The ϱ_min increases for configurations 3 and 4, with configuration 3 being the worst in terms of the allowed minimum distance to the camera. Hence, the price to pay for having ‖r_c‖ ≠ 0 is an increase in the achievable minimum distance to the camera.

A small ϱ_min is necessary to render omnistereoscopic images of indoor scenes, where objects may be closer to the camera than in outdoor scenes. This can be achieved by increasing the number of samples N, but that option implies adding more cameras in configurations 3 or 4, or taking more samples in configurations 1 and 2, which increases the complexity and cost of the omnistereoscopic system. Another option is changing the focal length and the number of samples to reach a compromise between cost and efficiency. The effect of doing so is simulated in Sec. 6.4.

Another useful conclusion from this simulation is that the selection of the stitching position x_b, as defined by Eq. (4), can be optimized for each configuration. For instance, a stitching point shifted Δx = −0.05 W_h with respect to x_b leads to an effective minimum distance in the range of 4.5 to 5.2 m, instead of 1.2 to 1.9 m when stitching at x_b. Hence, the acquisition model shows that the stitching position x_b has a larger impact than the particular configuration used.

Probably the most important conclusion is that there is not much difference between configurations in terms of the parameter ϱ_min. For instance, it can be seen from Table 3 that ϱ_min varies by a few tens of centimeters between configurations, which means that all variations of the acquisition model perform similarly in real acquisition scenarios.

Table 2 Image sensor specifications: sensor width W_h, aspect ratio a_r, and pixel width s.

Camera          W_h              a_r    s
Canon 400D      22.2 × 10⁻³ m    1.5    5.71 × 10⁻⁶ m
Nikon D800      35.9 × 10⁻³ m    1.5    4.88 × 10⁻⁶ m
Canon EOS 6D    35.8 × 10⁻³ m    1.5    6.55 × 10⁻⁶ m

Fig. 8 The minimum distance for stereoscopic rendering r_min as a function of the horizontal bias Δx from the stitching coordinate x_b (x_{R,1} = x_b + Δx), for configurations 1 to 4 and for different numbers of stereoscopic samples N [Canon 400D, f = 9.3 mm (Δ_a = 100 deg), b = 35 mm, and ‖r_c‖ = b].


6.2 Effect of Radial Displacement

The radial displacement ‖r_c‖ is relevant for configurations 3 and 4 to model the physical limitation that prevents co-locating the projection centers in a multiple-camera system. The smaller ‖r_c‖, the more similar configuration 3 is to configuration 2 and, likewise, the more similar configuration 4 is to configuration 1. In order to see this effect, we recalculated r_min for the case presented in Fig. 8(b) for ‖r_c‖ = 0, ‖r_c‖ = b/2, and ‖r_c‖ = 2b. The results are presented in Fig. 9. Notice that configuration 1 coincides with configuration 4, and configuration 2 coincides with configuration 3, for ‖r_c‖ = 0 [Fig. 9(a)].

The larger ‖r_c‖, the larger is ϱ_min; in other words, the larger is the required distance between the camera and the scene. A possible acquisition system is to use configuration 4 with ‖r_c‖ ≤ b, but interleaving a number of stereoscopic rigs in order to reduce ϱ_min. Another method to reduce ‖r_c‖ is to use mirrors to relocate each camera projection center closer to O. However, the latter may require a smaller Δ_a (larger f) and more camera pairs, all of which increase the complexity and cost of the acquisition system. A better solution to reduce ϱ_min is to select a bias in the stitching point, in this case |x′_b| > |x_b|.

6.3 Optimum Stitching Position

The parameter Δθ represents the relative azimuthal angle between two consecutive stereoscopic samples. In addition, each Δθ determines a shift in the horizontal coordinate of the optimum stitching position x_b. The optimality of this new stitching position is determined by the closest r_min to the camera, as shown in Fig. 8.

The larger the number of stereoscopic samples to mosaic, the smaller is Δθ, and the closer the optimal stitching coordinate moves to the center of each image to mosaic. Hence, the position x_b can be corrected at rendering time to avoid discontinuities in horizontal disparities in gazing directions where the scene is too close to the omnistereoscopic camera. There is thus no need to use the same relative position x_b for the stitching; a different distance to the scene can determine a different stitching position for each stereoscopic pair in each configuration.

The results for r_min in Fig. 8 are presented as a function of the coordinate x_{R,1} rather than around a fixed stitching coordinate x_b. This helps to illustrate the different locations, relative to the image center, of the optimum stitching point when changing Δθ.

6.4 Effect of Focal Length

Using wide-angle lenses, i.e., larger Δ_a (shorter f), reduces the number of stereoscopic samples required. However, a larger Δ_a introduces distortions at the edges of the stereoscopic images, especially for large baselines. Despite this disadvantage, a slight improvement in the effective minimum distance can be achieved by reducing f. We repeated the calculation of r_min presented in Sec. 6.1, but this time with a smaller focal length (f = 6.1 mm, Δ_a = 122 deg). The results, presented in Fig. 10, show a small but measurable reduction of ϱ_min with respect to using f = 9.3 mm (Δ_a = 100 deg) on the same sensor.

Conversely, a larger ϱ_min is expected when using a larger f. This is shown in Fig. 11 for the same camera parameters as the previous example but for f = 14.2 mm (Δ_a = 76 deg).

6.5 Effect of the Sensor Size

The sensor size affects f for a given Δ_a [Eq. (5)]. For instance, a Canon 400D (APS-C sensor) has f = 9.3 mm for Δ_a = 100 deg, while f = 15.1 mm results for a full-frame sensor (W_h ≈ 36 mm) (Table 2). Compared to the pixel size of the Canon 400D sensor, the pixel size of the Canon EOS 6D is ∼15% wider, while the pixel width of the Nikon D800 is ∼15% smaller. The results for r_min for the four configurations are presented in Fig. 12.

For a comparable sensor size and the same focal length, the Nikon D800 sensor performs worse than the Canon EOS 6D sensor in terms of ϱ_min because its pixel size is smaller, which makes the threshold for e_h smaller and pushes the minimum tolerable distance away

from the camera. The Canon 400D sensor is smaller and its f is also smaller, so ϱ_min for the same sampling angle Δθ is closer to the omnistereoscopic camera.

Table 3 Example of the effective minimum distance ϱ_min at x_b (Δx = 0, Δ_b = 10·s) for the Canon 400D, f = 9.3 mm (Δ_a = 100 deg), and b = 35 mm.

Configuration   N = 5    N = 6    N = 8
1               1.2 m    1.1 m    0.9 m
2               1.7 m    1.5 m    1.3 m
3               2.3 m    2.0 m    1.6 m
4               1.9 m    1.6 m    1.35 m

Fig. 9 Comparison of r_min as a function of the stitching bias Δx for all configurations after changing the radial distance ‖r_c‖ in configurations 3 and 4 [Canon 400D, N = 6, f = 9.3 mm (Δ_a = 100 deg), and b = 35 mm].

6.6 Vertical DisparitiesAn important side effect of the spatial distribution betweencameras in each acquisition configuration is the manifesta-tion of vertical disparities in the overlapping region betweenneighbor stereoscopic images (Sec. 5.3). We compared thefour configurations in terms of the coefficients ϑj, whichdetermine the magnitude of the vertical disparities in theoverlapping areas. As seen in Eq. (45), the undesired verticaldisparities are proportional to YW. But, this error alsodepends on the distance between the camera and PW onthe XZ-plane. The results of calculating the value of ϑj asa function of the distance to the camera krk ¼ kPWk forthe four configurations are presented in Fig. 13. These resultsare valid for a camera Canon 400D, b ¼ 65 mm, krck ¼65 mm, N ¼ 6, and Δa ¼ 100 deg.

A nonzero ϑ_j leads to a vertical disparity error (Sec. 5.3) in the overlapping regions; hence, it is desirable to take this effect into account when positioning the omnistereoscopic camera, especially when acquiring scene content that is relatively close to the camera at certain elevation angles. This relationship among vertical disparities, proximity to the camera, and gazing direction in elevation explains the vertical disparities that appear at the top and bottom of the mosaicked images, which are both at a shorter distance ‖r‖ on the XZ-plane and, at the same time, have large Y_W components. Fortunately, both coefficients converge to zero at a relatively short distance from the camera for all the configurations of the acquisition model.

The magnitude of this error also depends on the particular stitching position x_b in the image pair. In Figs. 13(c) to 13(f), we present the value of ϑ_j over the overlapping region for distances of ‖r‖ = 50 cm and ‖r‖ = 100 cm from the camera.

Fig. 13 Vertical disparity coefficients ϑ_j as a function of the distance to the camera ‖r‖ for configurations 1 to 4 and for x_{R,1} = x_b (Δx = 0): (a) ϑ_L and (b) ϑ_R, and the variation of these coefficients around x_b as a function of Δx when the distance to the scene is [(c) and (d)] ‖r‖ = 50 cm and [(e) and (f)] ‖r‖ = 100 cm.

Fig. 10 Comparison of ϱ_min as a function of the stitching bias Δx for configurations 1 to 4 and for different N when reducing the focal length to f = 6.1 mm (Δa = 122 deg) (Canon 400D, b = 35 mm, and ‖r_c‖ = b).

Fig. 11 Comparison of ϱ_min as a function of the stitching bias Δx for configurations 1 to 4 and for different N when increasing the focal length to f = 14.2 mm (Δa = 76 deg) (Canon 400D, b = 35 mm, and ‖r_c‖ = b).

Fig. 12 Comparison of ϱ_min as a function of Δx for configurations 1 to 4 and for different sensors while maintaining a given field of view Δa.


These results show that the vertical disparities over each overlapping region decrease as the distance between the camera and the projection of the scene point onto the XZ-plane increases.

7 Conclusions

In this paper, we presented an acquisition model with four configurations suitable for describing a variety of imaging systems that acquire and render stereoscopic panoramas with horizontal stereo. In other words, we focused on acquisition techniques that capture the visual scene from two distinct and coplanar viewpoints with horizontal parallax. Among these methods, we concentrated on the mosaicking of a reduced number of partially overlapped stereoscopic images, given its benefits for acquiring dynamic scenes. We studied two parameters relevant to the design of omnistereoscopic acquisition systems based on mosaicking: one involves the constraints to reproduce a continuous depth illusion around the acquisition point, and the other is the appearance of undesired vertical disparities.

We proposed a projective approach based on pinhole cameras to model each configuration, which can easily be adapted to all four variations of the acquisition model to study the parameters of interest for the mosaicking approach. In order to model the depth continuity, we introduced theoretical and practical thresholds for the depth resolution. Based on our projective approach, we defined a parameter to study: the minimum distance between the camera and the scene required to reproduce a continuous depth illusion around the acquisition point. Furthermore, we introduced the concept of a safe distance around the omnistereoscopic camera for acquiring the scene stereoscopically, which we called the omnistereoscopic FOV, and we proposed a ray-traced method to determine its location. Based on extensive ray-tracing simulations, we compared the effects of the focal length, the number of stereoscopic samples, and the radial distance on the minimum distance from the camera.

The main conclusion from our simulations is that there is no substantial difference between camera geometries in terms of the minimum distance between the camera and the scene. However, we found differences in the optimal location for stitching stereoscopic samples, which can be attributed to the geometry of the omnistereoscopic system and which can be used to improve the rendering process. Also based on our simulation results, we proposed strategies affecting the design of simultaneous acquisition systems that may increase the efficiency of omnistereoscopic acquisition and rendering.

Finally, we used our acquisition model to derive a closed form for the vertical disparity equations. These equations model the relationship between the gazing angle in elevation and the parallax of the projection centers in the introduction of undesired vertical disparities. We simulated the effects of the distance to the scene and the elevation angle on the appearance of vertical disparities around the stitching coordinates.

We did not cover in this paper the perceptual distortions introduced by mosaicking a limited number of images. This is an interesting problem, related to the stereoscopic quality of omnistereoscopic image formation, that requires further research.

Acknowledgments

This work was supported by the Ontario Graduate Scholarship fund.

References

1. L. E. Gurrieri and E. Dubois, “Acquisition of omnidirectional stereoscopic images and videos of dynamic scenes: a review,” J. Electron. Imaging 22(3), 030902 (2013).

2. L. E. Gurrieri and E. Dubois, “Optimum alignment of panoramic images for stereoscopic navigation in image-based telepresence systems,” in Proc. of the 11th Workshop on Omnidirectional Vision, Camera Networks and Non-Classical Cameras, Vol. 11, pp. 351–358, IEEE (2011).

3. L. E. Gurrieri and E. Dubois, “Efficient panoramic sampling of real-world environments for image-based stereoscopic telepresence,” Proc. SPIE 8288, 82882D (2012).

4. L. E. Gurrieri and E. Dubois, “Stereoscopic cameras for the real-time acquisition of panoramic 3D images and videos,” Proc. SPIE 8648, 86481W (2013).

5. H. Baker et al., “Capture considerations for multiview panoramic cameras,” in 2012 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition Workshops, pp. 37–44, IEEE (2012).

6. C. Weissig et al., “The ultimate immersive experience: panoramic 3D video acquisition,” Lect. Notes Comput. Sci. 7131, 671–681 (2012).

7. V. Couture, M. S. Langer, and S. Roy, “Panoramic stereo video textures,” in IEEE Int. Conf. on Computer Vision, pp. 1251–1258, IEEE (2011).

8. V. Vanijja and S. Horiguchi, “A stereoscopic image-based approach to virtual environment navigation,” Comput. Internet Manage. 14(2), 68–81 (2006).

9. R. O. Reynolds, “Design of a stereo multispectral CCD camera for Mars Pathfinder,” Proc. SPIE 2542, 197–206 (1995).

10. F. Hongfei et al., “Immersive roaming of stereoscopic panorama,” in 2008 Int. Conf. on Cyberworlds, pp. 377–382, IEEE (2008).


11. H.-C. Huang and Y.-P. Hung, “Panoramic stereo imaging system with automatic disparity warping and seaming,” Graph. Models Image Process. 60(3), 196–208 (1998).

12. F. Huang and Z.-H. Lin, “Stereo panorama imaging and display for 3D VR system,” in IEEE Congress on Image and Signal Processing, Vol. 3, pp. 796–800, IEEE (2008).

13. H. Ishiguro, M. Yamamoto, and S. Tsuji, “Omni-directional stereo,” IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 257–262 (1992).

14. S. Peleg and M. Ben-Ezra, “Stereo panorama with a single camera,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 1, pp. 395–401, IEEE (1999).

15. K. Yamada et al., “Generation of high-quality stereo panoramas using a three-camera panorama capturing system,” J. Inst. Image Inf. Telev. Eng. 55(1), 151–158 (2001).

16. K. Yamada et al., “Structure analysis of natural scenes using census transform and region competition,” Proc. SPIE 4310, 228–237 (2000).

17. R. G. Baker, F. A. Baker, and J. A. Conellan, “Panoramic stereoscopic camera,” U.S. Patent Application 2008/0298674 A1 (2008).

18. H. H. Baker and P. Constantin, “Panoramic stereoscopic camera,” U.S. Patent Application 2012/0105574 (2010).

19. V. Vanijja and S. Horiguchi, “Omni-directional stereoscopic images from one omni-directional camera,” J. VLSI Signal Process. 42(1), 91–101 (2006).

20. W. A. Clay, “Methods of stereoscopic reproduction of images,” U.S. Patent 3225651 (1965).

21. F. Huang, R. Klette, and K. Scheibe, Panoramic Imaging: Sensor-Line Cameras and Laser Range-Finders, John Wiley & Sons Inc., Hoboken, NJ (2008).

22. S. Peleg and M. Ben-Ezra, “Stereo panorama with a single camera,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 395–401, IEEE (1999).

23. P. Bourke, “Omni-directional stereoscopic fisheye images for immersive hemispherical dome environments,” in Proc. of the Computer Games and Allied Technology, pp. 136–143, World Academy of Science, Engineering and Technology (2009).

24. R. Szeliski, “Video mosaics for virtual environments,” IEEE Comput. Graph. Appl. 16(2), 22–30 (1996).

25. V. Kwatra et al., “Graphcut textures: image and video synthesis using graph cuts,” Proc. ACM SIGGRAPH 22(3), 277–286 (2003).

26. S. J. Ha et al., “Panorama mosaic optimization for mobile camera systems,” IEEE Trans. Consum. Electron. 53(4), 1217–1225 (2007).

27. T. Yan et al., “Seamless stitching of stereo images for generating infinite panoramas,” in Proc. of the 19th ACM Symp. on Virtual Reality Software and Technology, pp. 251–258, Association for Computing Machinery (2013).

28. E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, Englewood Cliffs, NJ (1998).

29. J. M. Harris, “Monocular zones in stereoscopic scenes: a useful source of information for human binocular vision?,” Proc. SPIE 7524, 151–162 (2010).

30. C. Chang and S. Chatterjee, “Quantization error analysis in stereo vision,” in IEEE Asilomar Conf. on Signals, Systems and Computers, Vol. 2, pp. 1037–1041, IEEE (1992).

31. M. Kytö, M. Nuutinen, and P. Oittinen, “Method for measuring stereo camera depth accuracy based on stereoscopic vision,” Proc. SPIE 7864, I1–I9 (2011).

Luis E. Gurrieri received his BEng in electronic engineering from the University of Buenos Aires in 1998 and his MSc in electrical engineering from the University of Manitoba in 2006. From 1998 to 2005, he worked in IT and telecommunication companies, including Ericsson and AT&T. From 2005 to 2009, he was a research engineer at the Communications Research Center in Ottawa, Canada. He is currently working toward his PhD degree in electrical and computer engineering at the University of Ottawa, where his main research area is stereoscopic vision for image-based telepresence.

Eric Dubois is a professor at the School of Electrical Engineering and Computer Science, University of Ottawa, Canada. His research has centered on the compression and processing of still and moving images and on multidimensional digital signal processing theory. His current research is focused on stereoscopic and multiview imaging, image sampling theory, image-based virtual environments, and color signal processing. He is a fellow of the IEEE, the Canadian Academy of Engineering, and the Engineering Institute of Canada, and is a recipient of the 2013 George S. Glinski Award for Excellence in Research from the Faculty of Engineering.
