
R&D White Paper

WHP 035

September 2002

Use of image-based 3D modelling techniques in broadcast applications

O. Grau and G.A. Thomas

Research & Development BRITISH BROADCASTING CORPORATION


© BBC 2002. All rights reserved.

BBC Research & Development White Paper WHP 035

Use of image-based 3D modelling techniques in broadcast applications

Oliver Grau, Graham A. Thomas

Abstract

Conventional video- and audio-based content is still the mainstream media for the broadcast industry. However, there is an increasing demand to use 3D models, mainly in special effects or visualisation of virtual objects, like buildings that no longer exist. On the other hand there are several established or emerging technologies that make explicit use of 3D models and are of interest for the broadcast industry. These applications include: 3D games, 3D internet applications and 3D-TV.

This paper discusses several techniques and systems that allow cost-effective creation of this content. The approaches can be classified by the class of shape quality they deliver: 2D sprites, incomplete 3D (2.5D) and complete 3D models. This contribution describes an approach for classifying these shape types. An overview of requirements for the generation of object-based 3D models in a number of different applications is also given.

This document was originally published in the Proceedings of the 2002 Tyrrhenian International Workshop on Digital Communications (IWDC 2002), September 8th-11th, 2002, Capri, Italy.

Key words: Virtual Production, Virtual Studio, Special Effects, 3D Reconstruction


© BBC 2002. All rights reserved. Except as provided below, no part of this document may be reproduced in any material form (including photocopying or storing it in any medium by electronic means) without the prior written permission of BBC Research & Development except in accordance with the provisions of the (UK) Copyright, Designs and Patents Act 1988.

The BBC grants permission to individuals and organisations to make copies of the entire document (including this copyright notice) for their own internal use. No copies of this document may be published, distributed or made available to third parties whether by paper, electronic or other means without the BBC's prior written permission. Where necessary, third parties should be directed to the relevant page on BBC's website at http://www.bbc.co.uk/rd/pubs/whp for a copy of this document.

White Papers are distributed freely on request.

Authorisation of the Head of Research is required for publication.


Use of image-based 3D modelling techniques in broadcast applications

Oliver Grau, Graham Thomas

BBC R&D, Kingswood Warren, Tadworth, Surrey, KT20 6NP, UK

Email: oliver.grau | [email protected], Web: http://www.bbc.co.uk/rd

ABSTRACT

Conventional video- and audio-based content is still the mainstream media for the broadcast industry. However, there is an increasing demand to use 3D models, mainly in special effects or visualisation of virtual objects, like buildings that no longer exist. On the other hand there are several established or emerging technologies that make explicit use of 3D models and are of interest for the broadcast industry. These applications include: 3D games, 3D internet applications and 3D-TV. This paper discusses several techniques and systems that allow cost-effective creation of this content. The approaches can be classified by the class of shape quality they deliver: 2D sprites, incomplete 3D (2.5D) and complete 3D models. This contribution describes an approach for classifying these shape types. An overview of requirements for the generation of object-based 3D models in a number of different applications is also given.

1 INTRODUCTION

The main use of 3D models in broadcast applications at the moment is in virtual studios and special effects. These methods are often referred to as virtual production. The key idea of virtual production is the composition of virtual and real scene elements, or of different virtual elements. For this composition several optical phenomena must be harmonised between the different kinds of media or components from different sources. The most important phenomena are: (1) match of the camera perspective of 2D footage with rendered virtual components, (2) occlusions, (3) establishing proper lighting and shading of the different scene components and (4) reflections.

In a virtual studio one or more actors are captured in a studio using a camera fitted with a tracking system. The parameters of the real camera are then transferred to a virtual camera that renders the background of the scene. Finally the actor is keyed into this image using the chroma-keying technique. The method is effective but often lacks a sufficient degree of optical interaction. For instance, the example of figure 1 shows neither the correct lighting on the (real) actor nor a proper shadow on the virtual floor.
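As a concrete illustration of the keying step, here is a minimal sketch of distance-based chroma keying and compositing in Python/NumPy. The key colour, tolerances and function names are illustrative assumptions, not the keyer used in an actual virtual studio, which would typically work in a colour-difference space and suppress colour spill:

```python
import numpy as np

def chroma_key_alpha(image, key_color=(0, 177, 64), tol=60.0, soft=30.0):
    # Alpha from the per-pixel RGB distance to the backing colour:
    # fully transparent within `tol`, fully opaque beyond `tol + soft`.
    dist = np.linalg.norm(image.astype(float) - np.array(key_color, float), axis=-1)
    return np.clip((dist - tol) / soft, 0.0, 1.0)

def composite(foreground, background, alpha):
    # Linear blend of the keyed actor over the rendered virtual set.
    a = alpha[..., None]
    return (a * foreground + (1.0 - a) * background).astype(np.uint8)

# Usage: key a synthetic 'green screen' frame over a grey background.
fg = np.full((576, 720, 3), (0, 177, 64), dtype=np.uint8)  # empty backing
fg[200:400, 300:450] = (180, 120, 90)                      # stand-in 'actor'
bg = np.full_like(fg, 128)
out = composite(fg, bg, chroma_key_alpha(fg))
```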

In order to create a full and realistic-looking optical interaction, a 3D description of the real and virtual scene must be available for rendering the synthetic image. This paper investigates several image-based 3D modelling techniques for use in virtual production. The quality requirements of these 3D models vary depending on the application.

Figure 1: Virtual studio: scene from the BBC's programme "Tomorrow's World". Optical interactions are restricted.

Section 3.2 gives a summary of different applications and their requirements.

The following section gives an overview of related 3D reconstruction techniques.

Section 4 describes approaches and their implementation in the broadcast context. The article finishes with some results and conclusions.

2 RELATED WORK ON SHAPE RECONSTRUCTION

In recent years, a lot of research has been carried out in the field of 3D scene reconstruction and especially in the development of 3D depth sensors. This work can be divided into active and passive approaches.

Active 3D sensors use an active illumination technique and are usually more robust compared, for example, to passive stereo. Products using active techniques such as lasers or structured light are available from Cyberware (USA), Wicks & Wilson (UK), Vitronic (Germany) and others. Unfortunately none of these systems is able to capture a moving 3D scene at normal video frame rate. A promising system is the Z-Cam from 3DV Systems (Israel), which combines a special TV camera operating at video frame rate with a depth-sensing method based on the time of flight of light.

Passive 3D sensors generally use a two-camera rig in conjunction with a stereo vision approach (for example [1, 2, 3, 4]). These systems are not currently in common use in a studio environment, mainly due to their restrictions in accuracy and robustness.

There are also passive methods using more than two cameras. A promising approach is the shape-from-silhouette method [5, 6, 7].


This has been shown to be a fast, scalable and robust method for reconstructing 3D information from silhouettes, which can be computed easily in real-time in a specially equipped studio using chroma-key techniques. In order to use the method the cameras must be calibrated; that is, the internal and external camera parameters must be known. These are measured using a calibration pattern and a calibration procedure that estimates the camera parameters [8, 9]. Then the intersection of the bounding volumes of each silhouette is computed. From the resulting visual hull, in the form of a voxel representation, a surface description is computed. In the last step a texture map is created using the information from the camera images.
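To make the intersection step concrete, the following is a minimal voxel-based sketch under stated assumptions: binary silhouette masks, known 3x4 projection matrices from the calibration step, and a regular grid over the capture volume. Surface extraction (e.g. by marching cubes) and texture mapping would follow as separate steps:

```python
import numpy as np

def visual_hull(silhouettes, projections, grid_min, grid_max, res=64):
    # silhouettes: list of HxW boolean masks (foreground = True)
    # projections: list of 3x4 matrices taking homogeneous world points
    #              to image pixels (from the calibration procedure)
    axes = [np.linspace(grid_min[i], grid_max[i], res) for i in range(3)]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], axis=-1).reshape(-1, 4)

    occupied = np.ones(len(pts), dtype=bool)
    for mask, P in zip(silhouettes, projections):
        uvw = pts @ P.T                              # project voxel centres
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)
        h, w = mask.shape
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(pts), dtype=bool)
        hit[ok] = mask[v[ok], u[ok]]
        occupied &= hit          # a voxel must fall inside every silhouette
    return occupied.reshape(res, res, res)
```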

A disadvantage of the basic shape-from-silhouette algorithm is that no concave structures can be modelled. This problem has been addressed by several extensions of the approach. The voxel colouring and space carving techniques [10, 11] make use of the colour information, i.e. the differences between the generated model and the camera images.
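The heart of these colour-based extensions is a photo-consistency test on candidate surface voxels. A hedged sketch follows; the per-channel standard deviation and the fixed threshold are placeholder choices, not the criteria defined in [10, 11]:

```python
import numpy as np

def photo_consistent(samples, threshold=15.0):
    # samples: Nx3 RGB values observed for one voxel in the N cameras
    # that see it. Keep the voxel when the colours agree; carve it away
    # when the model disagrees with the camera images.
    return np.std(samples.astype(float), axis=0).max() < threshold
```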

Another advanced approach that fundamentally makes use of silhouette information is the incorporation of high-level generic models of human bodies [12, 13]. These methods give a good appearance, but are not intended for modelling a person photo-realistically at video frame rate, because they grab a texture map only once. The main application field of these methods is online applications with limited bandwidth.

In recent years the techniques used for virtual production have been influenced by image-based methods and plenoptic modelling. The original work on plenoptic modelling used little or no geometry. Meanwhile there is a spectrum of methods in the literature that use different qualities of geometrical description, e.g. view-dependent texture mapping [14], surface light fields [15] and the unstructured lumigraph [16]. Although these methods are very promising for photo-realistic modelling, even of very complex environments, they are not well suited to inserting or exchanging objects between environments or to changing the lighting of the scene.

3 APPLICATIONS AND REQUIREMENTS

The requirements for a 3D reconstruction of objects depend both on the application and on the kind of optical phenomena that should be realised. The most important optical phenomena concerning virtual production are:

a. camera perspective

b. occlusions

c. depth perception

d. shadows

e. light reflections

The degree of optical interaction gives a measure of how well the composited scene of virtual and real components realises these phenomena between the inserted objects. The relative importance of these phenomena and the feasibility of implementing them vary with the application. The following sections give a summary of applications in the broadcast context and criteria for the choice of 3D quality for the particular purposes.

3.1 Current and future use of 3D techniques in virtual production

Conventional video- and audio-based content is still the mainstream media for the broadcast industry. However, there is an increasing demand to use 3D models, in virtual studios, in special effects and for the visualisation of virtual objects.

The term virtual studio is usually taken to mean the technique of combining live action in a studio with a keyed background image. The latter is updated in real-time by a computer equipped with sufficiently fast graphics hardware to match the movement of the studio camera. Due to the real-time requirement the optical interactions are quite limited. The minimum is to match the camera parameters using camera tracking. The final compositing works in 2D; 3D data is not used except for a rough depth estimate for occlusion masking. The virtual camera cannot be moved to a position different from that of the real camera; control over lighting is also very poor.
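The occlusion masking just mentioned amounts to a per-pixel depth comparison during 2D compositing. A minimal sketch, assuming full depth maps for both layers (conventional systems often used only a single rough depth value per layer):

```python
import numpy as np

def composite_with_depth(real_rgb, real_depth, virt_rgb, virt_depth):
    # Wherever the virtual element is nearer than the (rough) depth
    # estimate of the real scene, the virtual pixel wins; elsewhere
    # the studio image shows through.
    out = real_rgb.copy()
    in_front = virt_depth < real_depth
    out[in_front] = virt_rgb[in_front]
    return out
```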

Special effects are normally not restricted to real-time and use more powerful rendering tools, such as ray-tracers, for image synthesis. Therefore, special effects usually establish full optical interaction, that is, occlusion, shadow casting and receiving, and reflections where needed. Control over the lighting is very important, and some effects even demand control over the virtual camera.

Pre-visualisation is a tool to help plan the arrangements on the set during an early stage of the production, i.e. the positions of actors, cameras and lighting. The 3D model must allow these positions to be changed interactively. Pre-visualisation is becoming more and more important and is also used independently of special effects, i.e. in conventional productions.

On-set visualisation is mainly intended as a tool during production, i.e. to give feedback to camera operators, actors and directors. Therefore, it does not require the best quality, but must be available in real-time.

3D-TV gives the observer the added value of depth perception. Due to the lack of suitable 3D display technology this technique is not yet available to home users, but it already plays a role in IMAX cinemas.

An important application for the future is interactive programmes that make explicit use of 3D models and are of interest for the broadcast industry. These applications include games, edutainment and 3D internet applications. Here the user gets full control over the virtual camera.

Before we discuss the question of the right shape quality or representation, it is important to note that virtual studio systems, and also most special effects today, use 2D compositing for the final integration of virtual and real scenes. In particular, conventional virtual studio systems do not allow optical interactions other than matching of the camera perspective and (on some systems) simple occlusions. For special effects some optical interactions like shadows can be realised by additional alpha mattes that 'dim' the real image in shadow areas. For features like the change of virtual viewpoints or full control over the lighting, a 3D description of the virtual and the real scene is needed.

Application       RT   shad.   ref.   lc    cc    Class
Virtual Studio    ++   (+)     -      -     -     I
On-set Visuali.   +    -       -      -     -     II
Pre-Visuali.      -    +       +      ++    ++    III
Special Effects   -    +       +      +     (+)   II-IV
3D-TV             -    +       +      +     -     II
Interactive       -    +       (+)    (+)   ++    III

Table 1: Shape quality.
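The alpha-matte shadow trick described above lends itself to a compact sketch; the matte itself would be rendered from the virtual scene, and the dimming factor here is an illustrative parameter rather than a value from any production system:

```python
import numpy as np

def apply_shadow_matte(real_rgb, shadow_alpha, dim=0.5):
    # shadow_alpha in [0, 1]: rendered matte marking where the virtual
    # object casts a shadow onto the real image; `dim` sets density.
    gain = 1.0 - dim * shadow_alpha[..., None]
    return (real_rgb.astype(float) * gain).astype(np.uint8)
```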

3.2 What shape quality is needed for which application?

[Figure 2 diagram: a scale of shape-quality classes, from 2D (class I: sprite, billboard), through incomplete 3D / 2.5D (class II: relief), to complete 3D (class III: coarse, class IV: rich detailed).]

Figure 2: Different classes of shape quality.

Figure 2 gives a classification of different shape qualities used in virtual production [17]. Which one to choose depends strongly on application requirements, such as whether the created model is used for real-time rendering or not. Moreover, the shape quality influences the degree of optical interaction that is possible. For example, a sprite (class I) used in a 2D layered system can realise simple occlusion effects, such as rendering the sprite object in front of or behind another layer, but it does not give fine control, such as pushing only a hand through a virtual object.

Table 1 gives a summary of the optical phenomena that are used in the applications mentioned in the previous section and the shape class that should be used. The following abbreviations are used in the header of the table: RT = real-time reconstruction and synthesis, shad. = shadows, ref. = reflections, lc = lighting control and cc = control over virtual camera. Since the right perspective and occlusions are always needed, these are omitted from the table. A '-' in the table indicates that a feature is not important; '+' and '++' indicate that it is important or very important, respectively.

Table 1 shows a clear correlation between the shape quality class and the number of optical phenomena that must be realised. The rich detailed 3D class allows the widest range of applications, but is not always needed, or is not possible due to real-time constraints, as in the virtual studio or on-set visualisation.

Special effects cannot be clearly categorised because they differ from case to case. Usually the best class (IV) might be desired, but it cannot always be realised due to budget restrictions, or may not actually be necessary at all.

Interactive applications usually require a complete 3D description, because the user has full control over the virtual camera. Due to limitations in resources, namely bandwidth and rendering speed, they will normally be restricted to a coarser and therefore more compact shape description, i.e. class III.

4 APPROACHES

This section gives an overview of 3D reconstruction techniques developed by the BBC. The techniques are developed for use in a studio environment like that shown in figure 3. An exception is the stereo method described in section 4.2, which can be used in a range of different environments.

Figure 3: Studio setup with different capturing systems

4.1 2D Sprite and Billboard models

The simplest 3D description of an object, according to figure 2, is a 3D polygon, texture-mapped with the camera image. These 'sprites' use alpha masks to represent arbitrarily-shaped 2D objects and can be created automatically by re-projecting the image frame from the camera position as a polygon into space. The position of this polygon should coincide with the position of the actor in the studio reference co-ordinate system. Therefore, the position and orientation of the camera and the position of the actor must be known.
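A minimal sketch of placing such a sprite polygon is given below, assuming a z-up studio coordinate system, an actor position on the floor and illustrative quad dimensions; the camera image with its chroma-key alpha mask would then be texture-mapped onto the returned quad:

```python
import numpy as np

def billboard_corners(actor_pos, cam_pos, width=1.0, height=2.0):
    # Upright quad standing at actor_pos (studio coordinates, z up),
    # rotated about the vertical axis to face the camera position.
    to_cam = np.array(cam_pos[:2], float) - np.array(actor_pos[:2], float)
    right = np.array([-to_cam[1], to_cam[0]]) / np.linalg.norm(to_cam)
    base = np.array(actor_pos, float)
    half = 0.5 * width * np.append(right, 0.0)
    up = np.array([0.0, 0.0, height])
    # Corners in counter-clockwise order, ready for texture mapping.
    return np.array([base - half, base + half, base + half + up, base - half + up])
```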

A camera tracking system, and an actor tracking system using an auxiliary camera, as depicted in figure 4, were developed by us as components of a virtual studio system [9, 18]. The optical interaction is limited here to simple occlusions. The virtual camera can be moved away from the real viewpoint only if the viewing angle is not significantly changed (see figures 7 and 8).

An extension of the planar sprite model was proposed by Grau et al. in [17]. This method is based on the idea that objects are usually more elevated in the middle. This assumption can be exploited, as depicted in figure 5, by computing the 2D distance function (figure 5 c) of the 2D silhouette (figure 5 b) and a polygonal elevation function (figure 5 d).
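A sketch of the first two steps of this 'shape-from-one-silhouette' idea, using SciPy's Euclidean distance transform; the mapping from distance to height (a scaled square root here) is an arbitrary stand-in for the polygonal elevation function of [17], and triangulating the grid into the mesh of figure 5 d) is left out:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def silhouette_to_elevation(mask, scale=0.05, exponent=0.5):
    # The distance transform is largest in the middle of the object,
    # matching the assumption that objects are more elevated there.
    dist = distance_transform_edt(mask)
    return scale * dist ** exponent
```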



[Figure 4 diagram: a main camera and an auxiliary camera for the actor position.]

Figure 4: The position of the actor is tracked by an auxiliary camera

Figure 5: Shape from one silhouette. a) camera image, b) object mask, c) 2D distance function, d) 3D mesh.

4.2 Incomplete 3D (Relief)

Within the EU-funded project MetaVision [19], we are studying applications that can be supported by the "2.5D" shape class of figure 2, and developing a method to capture such data. The main application we are considering is the composition of 3D objects into an image sequence using a depth map of the sequence to support optical interactions such as occlusions and shadows between real and virtual elements. Other potential applications are simulations of effects such as depth-of-focus and fog, and the generation of stereoscopic image sequences.

We have chosen to develop a multi-baseline stereo system, with an auxiliary camera positioned on either side of the main camera. The main camera could be any standard or high-definition TV or film camera, whereas the auxiliary cameras are lower-cost monochrome cameras with conventional TV resolution. It is not practical to use high-resolution colour cameras for the auxiliary cameras due to constraints of size, cost and data capture bandwidth. In our current experimental set-up, all three cameras are 576-line cameras running synchronised at 25 Hz with progressive scanning.

We adopted a passive multi-baseline stereo approach, instead of an active method, so that the system would be usable outdoors and over relatively long ranges. By adjusting the baseline between the cameras, the practical usable range can be 20 m or more, although depth resolution decreases significantly with distance. The use of two auxiliary cameras allows us to obtain a depth map for the central camera with significantly fewer occlusions than with a single auxiliary camera. Occlusions inevitably lead to uncertainties in the computed depth around object boundaries, which can be a particular problem for applications such as segmentation, where a reliable depth estimate at object boundaries is important.
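The fall-off in depth resolution follows directly from parallel-camera stereo geometry. A minimal derivation under a pinhole model, with focal length f in pixels and baseline b (symbols chosen here for illustration, not taken from the paper):

```latex
d = \frac{f\,b}{Z}
\qquad\Longrightarrow\qquad
|\Delta Z| \approx \frac{Z^{2}}{f\,b}\,|\Delta d|
```

With illustrative values f = 1000 pixels and b = 0.25 m, a half-pixel disparity error corresponds to about 0.05 m of depth error at Z = 5 m but roughly 0.8 m at Z = 20 m, which is why widening the baseline extends the practical range.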

As we are mainly considering applications such as special effects implemented in post-production, there is no requirement for a real-time depth image, so we can use non-real-time depth estimation software. However, we cannot afford to use a highly computationally-intensive algorithm, since we need to process sequences containing many thousands of images in a length of time acceptable to typical users of post-production software. A processing time of a few seconds per image on a modern PC is likely to be acceptable.

We have developed a block-based multi-baseline disparity estimator that seeks to minimise the match error between each block in the central camera image and corresponding regions of the images from the auxiliary cameras. We use spatial and temporal recursion to form an initial prediction of the disparity, both to increase processing speed and to ensure a smooth disparity field. To provide the pixel-level disparity field that most applications require, we refine the block-based field by testing disparities for each pixel using candidates from the block containing the pixel and the four immediately adjacent blocks, and measuring the match error over a small region centred on the pixel. Further details may be found in [20].
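A much-simplified sketch of a block-based multi-baseline search of this kind follows. The SAD cost, symmetric-baseline shifts, block size, search range and the narrowed search around a prediction are illustrative assumptions, and the pixel-level refinement stage described above is omitted; see [20] for the actual method:

```python
import numpy as np

def block_disparity(center, aux_left, aux_right, block=16, d_max=64, pred=None):
    # Exhaustive SAD search per block against both auxiliary images;
    # with symmetric baselines a point at disparity d shifts by +d in
    # one auxiliary image and -d in the other. `pred` (a disparity
    # field from the previous frame or neighbouring blocks) narrows
    # the search, standing in for the recursion described above.
    h, w = center.shape
    disp = np.zeros((h // block, w // block))
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = center[y:y + block, x:x + block].astype(float)
            lo, hi = 0, d_max
            if pred is not None:
                lo = max(0, int(pred[by, bx]) - 4)
                hi = min(d_max, int(pred[by, bx]) + 4)
            best, best_err = lo, np.inf
            for d in range(lo, hi + 1):
                if x - d < 0 or x + d + block > w:
                    continue  # candidate block would leave the image
                err = (np.abs(ref - aux_left[y:y + block, x + d:x + d + block]).sum()
                       + np.abs(ref - aux_right[y:y + block, x - d:x - d + block]).sum())
                if err < best_err:
                    best, best_err = d, err
            disp[by, bx] = best
    return disp
```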

4.3 Complete 3D

Within the EU-funded project ORIGAMI [21, 22] the BBC is developing a studio-based system that allows the generation of complete 3D models of dynamic scenes, both in real-time and offline. ORIGAMI addresses full 3D scene composition to achieve full optical interaction between real and virtual scenes. Therefore the real scene elements are virtualised. The 3D description is then imported into a commercial 3D animation package together with the virtual scene components. The optical interaction is established simply by using the rendering functionality of the animation package, which copes inherently with occlusions, shadows and reflections.

Current experiments use 6 fixed, calibrated cameras and one moving camera equipped with the Free-D camera tracking system developed by the BBC. The studio is equipped with a special retro-reflective cloth. Around the lens of each camera is a ring of blue LEDs. The light reflected from the retro-reflective cloth allows a robust chroma key of the images.

The real-time shape reconstruction is used for on-set visualisation, which is integrated into the studio system. It gives feedback to actors, camera operators and the director. The actor feedback is realised by generating view-dependent images and projecting them onto the studio floor and walls. On the other hand, no light should be projected onto the actor; therefore the area where the actor stands is masked out using a 3D model of the actor created in real-time.

The feedback for camera operators and directors is given on a computer screen. It uses a texture-mapped version of the 3D model used for the projection mask.



[Figure 6 block diagram: multi-camera images feed a shape reconstruction stage that produces a 3D shape; a texture mapping stage combines the 3D shape with a studio illumination model to produce texture colour and surface properties; a one-frame delay (Z⁻¹) feeds the 3D shape back into the reconstruction.]

Figure 6: Generation of dynamic 3D models of actors in ORIGAMI

For the basic shape reconstruction of the scenes in the studio, the shape-from-silhouette approach [5, 6, 7] is used, because it makes use of the available chroma-key facility.

Currently, initial experiments with shape-from-silhouette reconstruction are being carried out. First results are presented in section 5. The shape of the objects is represented as 3D triangular meshes. To create a coloured model, a texture map is computed, as depicted in the functional block diagram in figure 6. In order to be able to use the generated 3D models under different illumination, the texture maps contain diffuse and reflective surface properties. For the computation of these parameters the lighting situation of the studio is measured and used. An algorithm that makes use of this information to "un-light" the texture maps is under development.
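Under a purely Lambertian assumption, 'un-lighting' reduces to dividing the observed texture by the shading predicted from the measured illumination. The toy sketch below makes that assumption explicit; it is not the algorithm under development, which must also handle the reflective surface properties:

```python
import numpy as np

def unlight_texture(texture, shading):
    # texture, shading: float arrays in linear light, same shape.
    # `shading` is the per-texel irradiance predicted by the measured
    # studio illumination model; observed = albedo * shading, so
    # dividing recovers an approximate diffuse albedo. Specularities,
    # shadowing and sensor response are ignored here.
    return np.clip(texture / np.maximum(shading, 1e-3), 0.0, 1.0)
```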

An important requirement is that the techniques developed can be applied to dynamic objects, the final result being a sequence of images. Therefore it is important that the generated 3D description is also consistent over time; that is, the final sequence should not show significant temporal artefacts. In order to cope with this requirement a recursive approach will be developed that makes use of the previously computed 3D shape and texture. This is indicated by the signal feedback in figure 6.

5 RESULTS

This section presents some results of the approaches described in section 4.

Figure 7 shows an experiment with textured polygons. The original idea was to create a simple 3D description that can be used as a rough actor model in an MPEG-4 system [23].

The system also allows the movement of the virtual camera to a viewpoint different from that of the real camera. This gives reasonable results only if the virtual camera keeps roughly the same viewing angle to the scene as the real camera. If the virtual camera is moved around the scene object, as depicted in figure 8, the limitations of this shape representation become obvious.

An extension of the textured polygon is an elevation grid using the 'shape-from-one-silhouette' method described in section 4.1. The resulting shape approximation allows much better optical interactions; in particular the virtual camera can be moved to some extent.

Figure 7: Actor modelled using a textured polygon with alpha-channel

Figure 8: Image rendered from a viewpoint significantly different from that of the real camera

Figure 9 shows a still image from a sequence¹ created with an animation package. There is freedom to move the virtual camera viewpoint by about ±15° around the original camera viewing angle. Further, the dancer casts shadows and reflections onto the street and also receives a shadow.

Figure 10 shows some results from the use of depth information for keying, as described in section 4.2. The upper image shows one frame from an outdoor sequence. The tree is about 19 m away from the camera. The middle image shows the disparity map for this image, derived from the main image plus two monochrome images from auxiliary cameras positioned 0.25 m to either side. The computation time for this disparity image was approximately 7 seconds on a 1.4 GHz Pentium III PC, and the disparity range is approximately 5-65 pixels. By thresholding the disparity map at an appropriate value, an alpha signal was created and used to insert a virtual object between the person and the tree, as shown in the lower image. The depth map shows some artefacts around the edges of the person where there is low contrast between the foreground and background, but nevertheless a reasonable key can be achieved. Further work is required

¹ Available on www.ist-metavision.com



Figure 9: Model of a dancer integrated into a virtual street model

to improve the disparity map, particularly in low-contrast areas.
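The thresholding-and-insertion step lends itself to a compact sketch; the threshold value and the assumption that the virtual object arrives with its own alpha matte are illustrative:

```python
import numpy as np

def depth_key_insert(real_rgb, disparity, virt_rgb, virt_alpha, d_threshold):
    # Pixels with disparity above the threshold are nearer than the
    # virtual object and keep the real image; elsewhere the virtual
    # object is blended in using its own matte.
    fg = disparity > d_threshold
    a = np.where(fg, 0.0, virt_alpha)[..., None]
    return (a * virt_rgb + (1.0 - a) * real_rgb).astype(np.uint8)
```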

Figure 11 shows a visualisation of a multi-camera system as developed within the ORIGAMI project. It shows the original 6 cameras and a number of 3D models from a captured sequence, created using the shape-from-silhouette algorithm described in section 4.3. An example of the use of these models in a virtual environment is depicted in figure 12.

6 FUTURE WORK AND CONCLUSIONS

This contribution has discussed the use of 3D modelling techniques for broadcast applications. Recent applications are mainly focused on the integration of virtual and real scenes for TV production, as in virtual studios and special effects. Further 3D techniques are used to support the production process itself through pre- and on-set visualisation. In the future, the use of 3D media, as in internet services and games, might become an additional application area.

Section 3 gave an analysis of the requirements of the application areas on the one hand and proposed four different classes of shape quality on the other. Useful criteria for the choice of the shape class are the degree of optical interaction that is needed, the degree of freedom of the virtual camera and the virtual lighting.

The approaches discussed in section 4 give a selection of methods implemented mainly for use in a studio environment. The results give some examples of the quality of the optical interactions under the given constraints.

The technique using just a texture-mapped polygon is quite restricted in usability. The extension of that approach, which creates an elevation grid from the 2D silhouette of an actor, is surprisingly effective. It provides limited support for re-lighting the scene and creating shadow effects. Furthermore it allows the virtual viewpoint to be moved over a limited range.

In order to give more freedom in the variation of the virtual camera and lighting, a multi-camera approach is being investigated, currently under development in the IST ORIGAMI project. First results are quite promising.

The next steps in the development are dedicated to the reconstruction of proper texture maps that capture the colour and reflection properties of the surface.

The use of a depth map provides support for a range of applications where the viewpoint of the camera remains essentially fixed, but where effects such as occlusions of inserted virtual objects by real scene elements need to be handled. Initial results show promise, but have highlighted problems with real-world scenes, particularly where areas of low contrast make image segmentation difficult. Further work is planned in this area.

REFERENCES

[1] M.J.P.M. Lemmens, "A survey on stereo matching techniques," in International Archives of Photogrammetry and Remote Sensing, 1988, vol. 27, pp. 11–23.

[2] L. Falkenhagen, "Depth estimation from stereoscopic image pairs assuming piecewise continuous surfaces," in Image Processing for Broadcast and Video Production, November 1994, pp. 115–127.

[3] Manfred Ziegler, Lutz Falkenhagen et al., "Evolution of stereoscopic and three-dimensional video," Signal Processing: Image Communication, vol. 14, pp. 173–194, 1998.

[4] J.-R. Ohm et al., "A realtime hardware system for stereoscopic videoconferencing with viewpoint adaptation," Signal Processing: Image Communication, vol. 14, pp. 173–363, 1998.

[5] M. Potmesil, "Generating octree models of 3D objects from their silhouettes in a sequence of images," Computer Vision, Graphics and Image Processing, vol. 40, pp. 1–29, 1987.

[6] Richard Szeliski, "Rapid octree construction from image sequences," CVGIP: Image Understanding, vol. 58, no. 1, pp. 23–32, July 1993.

[7] Wolfgang Niem, "Robust and fast modelling of 3D natural objects from multiple views," in SPIE Proceedings, Image and Video Processing II, San Jose, February 1994, vol. 2182, pp. 388–397.

[8] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE J. Robotics and Automation, vol. 3, no. 4, pp. 323–344, 1987.

[9] Graham A. Thomas, J. Jin, T. Niblett, and Urquhart, "A versatile camera position measurement system for virtual reality TV production," in Conference Proc. of International Broadcasting Convention, Sept. 1997.

[10] S. Seitz and C. Dyer, "Photorealistic scene reconstruction by voxel coloring," International Journal of Computer Vision, vol. 35, no. 2, pp. 151–173, 1999.



[11] K. Kutulakos and S. Seitz, "A theory of shape by space carving," Intl. Journal of Computer Vision, vol. 38, no. 3, pp. 197–216, 2000.

[12] Adrian Hilton, D. Beresford, T. Gentils, R. Smith, and W. Sun, "Virtual people: Capturing human models to populate virtual worlds," IEEE Conf. on Computer Animation, 1999.

[13] S. Weik, J. Wingbermühle, and W. Niem, "Automatic creation of flexible anthropomorphic models for 3D videoconferencing," Journal of Visualization and Computer Animation, vol. 11, pp. 145–154, 2000.

[14] Paul E. Debevec, George Borshukov, and Yizhou Yu, "Efficient view-dependent image-based rendering with projective texture-mapping," in Proc. of 9th Eurographics Rendering Workshop, Vienna, Austria, June 1998.

[15] Daniel N. Wood, Daniel I. Azuma, Ken Aldinger, Brian Curless, Tom Duchamp, David H. Salesin, and Werner Stuetzle, "Surface light fields for 3D photography," in Siggraph 2000, Computer Graphics Proceedings, Kurt Akeley, Ed., 2000, pp. 287–296, ACM Press / ACM SIGGRAPH / Addison Wesley Longman.

[16] Chris Buehler, Michael Bosse, Leonard McMillan, Steven J. Gortler, and Michael F. Cohen, "Unstructured lumigraph rendering," in SIGGRAPH 2001, Computer Graphics Proceedings, Eugene Fiume, Ed., 2001, pp. 425–432, ACM Press / ACM SIGGRAPH.

[17] Oliver Grau, Marc Price, and Graham A. Thomas, "Use of 3D techniques for virtual production," in Proc. of SPIE, Conference Proc. of Videometrics and Optical Methods for 3D Shape Measurement, January 2001, vol. 4309.

[18] Marc Price and Graham A. Thomas, "3D virtual production and delivery using MPEG-4," in Conference Proc. of International Broadcasting Convention, Sept. 2000.

[19] "MetaVision project website," http://www.ist-metavision.com.

[20] Graham A. Thomas and Oliver Grau, "3D image sequence acquisition for TV & film production," in Conference Proc. of 1st Symp. on 3D Data Processing Visualization Transmission, Padova, Italy, June 2002.

[21] "ORIGAMI project website," http://www-dsp.elet.polimi.it/origami/.

[22] Giovanni Bazzoni, Enrico Bianchi, Oliver Grau et al., "The ORIGAMI project: advanced tools and techniques for high-end mixing and interaction between real and virtual content," in Conference Proc. of 1st Symp. on 3D Data Processing Visualization Transmission, June 2002.

[23] M. Price, J. Chandaria, O. Grau et al., "Real-time production and delivery of 3D media," in Conference Proc. of International Broadcasting Convention, Sept. 2002.



Figure 10: Depth-keying

Figure 11: Multi-camera system

Figure 12: 3D model from multi-camera system integrated into virtual scene
