Realistic Cloth Augmentation in Single View Video

Anna Hilsmann, Peter Eisert

Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute

Einsteinufer 37, 10587 Berlin, Germany
Email: {anna.hilsmann,peter.eisert}@hhi.fraunhofer.de

Abstract

Augmenting deformable surfaces like cloth in real video is a challenging task because not only geometric parameters describing the surface deformation have to be estimated, but also photometric parameters in order to recover realistic lighting and shading. In this paper we present a method for cloth augmentation in single-view video with realistic geometric deformation and photometric properties without a 3-dimensional reconstruction of the surface. These parameters are jointly estimated from direct image information using an extended version of optical flow with an explicit photometric color model and mesh-based models. We demonstrate how the proposed method is used to produce convincing augmented results.

1 Introduction

Merging computer generated content with real video is an important and challenging task in many applications such as movies [5] and augmented reality [11] (see Figure 1). For realistic augmentation, not only geometric parameters describing the motion of the object have to be estimated; photometric parameters describing lighting and shading also have to be recovered to assure that the virtual object merges with the real video content and to increase the perception that the augmented video truly exhibits the virtual object. We are particularly interested in retexturing non-rigid surfaces, e.g. cloth, in single-view videos for real-time augmented reality applications like the one depicted in Figure 1. As 3D reconstruction of elastically deforming surfaces in single-view video is an ill-posed problem, we approach the problem in the image plane. This is the most intuitive approach, as for the retexturing purpose we intend to render a virtual texture from the same point of view as the original surface and integrate it into the real video scene such that lighting conditions and shadings have to remain the same.

Current cloth augmentation methods usually do not account for shading and illumination at all [3] or treat geometric and photometric parameter estimation separately [1, 6, 11, 13, 14]. Some approaches require markers to establish a shading map by intensity interpolation [1, 6, 13], others restrict the surface to consist of a limited set of colors which can be easily classified [14]. However, this assumption of a-priori knowledge is problematic in many applications and limits the applicability to arbitrary video sequences.

In [7] we presented a parametric method utilizing direct image information for joint deformation and photometric parameter recovery from single-view video and showed in various experiments how taking the photometric changes in an image sequence into account improves the geometric tracking results. We exploit the optical flow constraint equation extended by a specific illumination model which allows the brightness of a scene point to vary with time, as proposed by Gennert and Negahdaripour [4]. Several other researchers have exploited similar ideas to make optical flow based tracking more robust against lighting changes [12, 9, 7]. In this paper, we propose to utilize the additional information about illumination changes to synthesize an augmented retextured version of the cloth by incorporating a specific color model that accounts not only for the light intensity but also for the color of the light. We will show that the addition of lighting substantially improves the realistic impression of the augmented cloth.

The remainder of this paper is structured as follows.



Figure 1: Augmented reality application: in a Virtual Mirror a user can see himself wearing virtually textured clothes.

In Section 2 we recall our method for simultaneous estimation of deformation and illumination parameters for deformable surfaces from [7] and expand the method by a specific color model that accounts not only for changes in the light intensity but also in the color of the light, before Section 3 describes our method for cloth augmentation from single-view video, i.e. shading map creation and cloth retexturing. Section 4 reports experimental results.

2 Joint Deformable Motion and Photometric Parameter Estimation

The problem of recovering geometric and photometric parameters for realistic retexturing can be seen as an image registration task solving for a warp that not only registers two images spatially but also photometrically. We use an intensity-based parametric approach which leads to a minimization of a cost function over a parameter vector $\Theta$ describing the spatial and photometric warp. In general, the cost function consists of two terms:

$$\Theta = \arg\min_{\Theta} \big( \mathcal{E}_D(\Theta) + \lambda^2 \mathcal{E}_S(\Theta) \big) \qquad (1)$$

with $\mathcal{E}_D(\Theta)$ being the data term and $\mathcal{E}_S(\Theta)$ representing prior knowledge on the shape and illumination model, often called the smoothness term. $\lambda$ is a regularization parameter which weights the influence of this prior knowledge against fitting to the data term.

2.1 Parameterizing the Warp

A 3D reconstruction of elastically deforming cloth from single-view video is an ill-posed task because it faces many ambiguities. Without a 3D reconstruction of the surface we cannot explicitly model the light conditions of the scene. However, we can explain the motion and deformation in the image plane and recover the impact of the illumination on the intensity of a scene point. This is because the shading and lighting properties that should be applied to the virtual texture are already present in the original image. We therefore describe the spatial warp of the surface by a dense 2-dimensional pixel displacement field $\mathbf{D}(\mathbf{x}) = (D_x(\mathbf{x}), D_y(\mathbf{x}))^T$ and define the photometric changes in terms of a dense multiplier field $S(\mathbf{x})$ describing the scale of the pixel intensity, where $\mathbf{x} = (x, y)^T$ represents the pixel position. The reason why we chose to explain the intensity changes in the image by a multiplier field and not by both multiplier and offset fields as in [10] is that the decomposition into multiplier and offset fields yields ambiguities and is not unique. Another advantage is that we do not have to care about gamma correction in the image for the retexturing purpose, as explained in Section 3.

We parameterize $\mathbf{D}(\mathbf{x})$ and $S(\mathbf{x})$ in a parameter vector $\Theta$, using a mesh-based model consisting of $K$ vertices $\mathbf{v}_k$ $(k = 1 \ldots K)$. Additionally, a third photometric parameter $\sigma_k$ describing the intensity scale at each vertex is incorporated, so that each vertex now has three parameters $\theta_k = (\delta v_{kx}, \delta v_{ky}, \sigma_k)$, as illustrated in Figure 2. The spatial warp is then parameterized by the vertex displacements $\delta\mathbf{v}_k = (\delta v_{kx}, \delta v_{ky})$ and the photometric change is parameterized by the photometric parameter $\sigma_k$:

$$\Theta = \big( \underbrace{\delta v_{x1} \ldots \delta v_{xK}, \delta v_{y1} \ldots \delta v_{yK}}_{\Theta_{\delta v}}, \underbrace{\sigma_1 \ldots \sigma_K}_{\Theta_{\sigma}} \big)^T \qquad (2)$$

where $\Theta_{\delta v}$ comprises the geometric deformation parameters of $\Theta$ and $\Theta_{\sigma}$ comprises the photometric parameters. Several parameterizations of the pixel displacement field $\mathbf{D}(\mathbf{x})$ and the brightness scale field $S(\mathbf{x})$ are possible. One is a parameterization using the barycentric coordinates of each pixel in the mesh. If a pixel $\mathbf{x}_i$ is surrounded by the three vertices $\mathbf{v}_a, \mathbf{v}_b, \mathbf{v}_c$, and $\beta_a, \beta_b, \beta_c$ are the three corresponding barycentric coordinates ($\beta_a + \beta_b + \beta_c = 1$, $0 \le \beta_{a,b,c} \le 1$), the pixel displacement field $\mathbf{D}(\mathbf{x})$ and the brightness scale field $S(\mathbf{x})$ can be parameterized in the following way:

$$\mathbf{D}(\mathbf{x}) = \sum_{j \in \{a,b,c\}} \beta_j\, \delta\mathbf{v}_j \qquad S(\mathbf{x}) = \sum_{j \in \{a,b,c\}} \beta_j\, \sigma_j \qquad (3)$$

Note that higher order interpolations, e.g. B-splines or thin-plate splines, are also possible.
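To make the barycentric parameterization of equation (3) concrete, the following sketch interpolates the per-pixel displacement and brightness scale from the three vertices of the enclosing triangle. The function and variable names (`barycentric_coords`, `tri_vertices`, `tri_delta_v`, `tri_sigma`) are illustrative and not part of the original implementation.

```python
import numpy as np

def barycentric_coords(x, va, vb, vc):
    """Barycentric coordinates (beta_a, beta_b, beta_c) of pixel x
    with respect to the triangle (va, vb, vc)."""
    # Solve [vb - va, vc - va] * (beta_b, beta_c)^T = x - va.
    T = np.column_stack((vb - va, vc - va))
    beta_b, beta_c = np.linalg.solve(T, x - va)
    return np.array([1.0 - beta_b - beta_c, beta_b, beta_c])

def interpolate_warp(x, tri_vertices, tri_delta_v, tri_sigma):
    """Per-pixel displacement D(x) and brightness scale S(x), Eq. (3).

    tri_vertices : (3, 2) positions of v_a, v_b, v_c
    tri_delta_v  : (3, 2) vertex displacements delta_v_j
    tri_sigma    : (3,)   vertex brightness scales sigma_j
    """
    beta = barycentric_coords(x, *tri_vertices)
    D_x = beta @ tri_delta_v   # sum_j beta_j * delta_v_j
    S_x = beta @ tri_sigma     # sum_j beta_j * sigma_j
    return D_x, S_x
```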

When dealing with color images one approach is to estimate a dense multiplier field

$$\mathbf{S}(\mathbf{x}) = (S_r(\mathbf{x}), S_g(\mathbf{x}), S_b(\mathbf{x}))^T \qquad (4)$$

for each color channel separately, where $S_r(\mathbf{x})$, $S_g(\mathbf{x})$ and $S_b(\mathbf{x})$ denote the local intensity changes of the red, green and blue color channels. This is time consuming and would increase the parameter vector $\Theta$ by twice the number of mesh vertices. Instead, we assume that the color temperature of the light is spatially constant in the image and only its intensity varies locally. This can be expressed through the following equation:

$$\mathbf{S}(\mathbf{x}) = S_g(\mathbf{x}) \cdot (1, s_{rg}, s_{bg})^T \qquad (5)$$

where $s_{rg}$ and $s_{bg}$ denote global red and blue gains of the light color representing the color temperature. Hence, the parameter vector $\Theta$ is extended by two further parameters for color images:

$$\Theta = \big( \underbrace{\ldots \delta v_{xi} \ldots, \ldots \delta v_{yi} \ldots}_{\Theta_{\delta v}}, \underbrace{\ldots \sigma_i \ldots}_{\Theta_{\sigma}}, \underbrace{s_{rg}, s_{bg}}_{\Theta_{s}} \big)^T \qquad (6)$$

such that now $\Theta_{\sigma}$ describes the intensity changes in the green channel that can vary spatially in the image and $\Theta_{s}$ comprises the red and blue intensity scales with respect to the green intensity. These parameters are global parameters for one image, i.e. they are assumed to be spatially constant.
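As an illustration of this color model, the sketch below expands the spatially varying green-channel scale field and the two global gains into per-channel scale factors; it assumes an image stored in (R, G, B) channel order and uses the relations $S_r = S_g \cdot s_{rg}$ and $S_b = S_g \cdot s_{bg}$ from Section 3. The function name `rgb_scale_field` is illustrative.

```python
import numpy as np

def rgb_scale_field(S_g, s_rg, s_bg):
    """Per-channel intensity scale field of Eq. (5).

    S_g  : (H, W) spatially varying intensity scale of the green channel
    s_rg : global red gain relative to the green channel
    s_bg : global blue gain relative to the green channel
    Returns an (H, W, 3) field with S_r = S_g * s_rg and S_b = S_g * s_bg
    (assuming R, G, B channel order).
    """
    return S_g[..., None] * np.array([s_rg, 1.0, s_bg])
```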

2.2 Deriving the Data Term

For readability reasons we drop the distinction between the different color channels in the following derivation of the data term from an extended optical flow equation and come back to dealing with the different color channels at the end of this section. In order to account for brightness changes in an image sequence, Negahdaripour [10] relaxed the optical flow constraint from its original form, which assumes brightness constancy between two successive image frames [8], allowing for multiplicative and additive deviations from brightness constancy. For the uniqueness reasons given above, we relax the optical flow constraint equation allowing only for multiplicative deviations from brightness constancy to deduce the data term $\mathcal{E}_D$:

$$\mathcal{I}(\mathbf{x} + \mathbf{D}(\mathbf{x}, t), t + \delta t) = S(\mathbf{x}, t) \cdot \mathcal{I}(\mathbf{x}, t) \qquad (7)$$

where $\mathcal{I}(\mathbf{x}, t)$ is the image intensity of each of the three color channels at pixel $\mathbf{x}$ and time $t$, $\mathbf{D}(\mathbf{x}, t)$ is the displacement vector field at pixel $\mathbf{x} = (x, y)^T$ and $S(\mathbf{x}, t)$ is the intensity scale field. For readability reasons we skip the time dependency of $\mathbf{D}(\mathbf{x}, t)$ and $S(\mathbf{x}, t)$ in the following equations and denote them by $\mathbf{D}(\mathbf{x})$ and $S(\mathbf{x})$. The data term $\mathcal{E}_D$ then is derived from the sum of squared differences

$$\mathcal{E}_D = \sum_{\mathbf{x} \in \mathcal{R}} \big( \mathcal{I}(\mathbf{x} + \mathbf{D}(\mathbf{x}), t + \delta t) - S(\mathbf{x}) \cdot \mathcal{I}(\mathbf{x}, t) \big)^2 \qquad (8)$$

where $\mathcal{R}$ denotes the region of interest in the image.
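For illustration, the following sketch evaluates the data term of equation (8) for given dense displacement and scale fields; it uses bilinear sampling from SciPy, and the array names (`I_prev`, `I_next`, `D`, `S`, `roi_mask`) are assumptions rather than part of the original implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def data_term(I_prev, I_next, D, S, roi_mask):
    """Sum of squared differences of Eq. (8):
    E_D = sum_x ( I(x + D(x), t + dt) - S(x) * I(x, t) )^2

    I_prev, I_next : (H, W) grayscale frames at t and t + dt
    D              : (H, W, 2) dense displacement field (dx, dy)
    S              : (H, W) dense brightness scale field
    roi_mask       : (H, W) boolean region of interest R
    """
    H, W = I_prev.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Sample the next frame at the displaced pixel positions (bilinear).
    warped = map_coordinates(I_next, [ys + D[..., 1], xs + D[..., 0]],
                             order=1, mode='nearest')
    residual = warped - S * I_prev
    return np.sum(residual[roi_mask] ** 2)
```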

We proceed in a hierarchical analysis-by-synthesis estimation approach where the parameters are not estimated between two successive frames $\mathcal{I}_{n-1}$ and $\mathcal{I}_n$, but between the current frame $\mathcal{I}_n$ and a synthetic version of the previous frame $\mathcal{I}_{n-1}$ generated with the previous parameter estimate $\Theta_{n-1}$. Between these two images the parameters are estimated with a hierarchical Gauss-Newton method. An initial approximation of the parameters between these two images is computed from low-pass filtered and sub-sampled versions of $\mathcal{I}_{n-1}$ and $\mathcal{I}_n$ and the estimation is refined on each level. On each pyramid level we approximate the above equation with a Gauss-Newton approximation by applying a first order Taylor expansion:

$$\mathcal{E}_D = \sum_{\mathbf{x} \in \mathcal{R}} \left( \nabla\mathcal{I}(\mathbf{x}) \cdot \mathbf{D}(\mathbf{x}, t) + \frac{\partial \mathcal{I}(\mathbf{x})}{\partial t} + (1 - S(\mathbf{x}, t)) \cdot \mathcal{I}(\mathbf{x}) \right)^2. \qquad (9)$$

Here, $\nabla\mathcal{I}(\mathbf{x}) = (\mathcal{I}_x(\mathbf{x}), \mathcal{I}_y(\mathbf{x}))$ denotes the spatial derivatives of the image $\mathcal{I}$ and $\frac{\partial \mathcal{I}(\mathbf{x})}{\partial t}$ denotes the temporal gradient.
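The coarse-to-fine procedure described above can be outlined as follows. This is only a sketch: the callable `gauss_newton_step` stands in for the per-level linearization and solve of equations (9)-(11), which is not reproduced here, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_pyramid(image, levels):
    """Low-pass filtered and sub-sampled image pyramid (finest level first)."""
    pyramid = [image]
    for _ in range(levels - 1):
        smoothed = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(smoothed[::2, ::2])
    return pyramid

def estimate_parameters(I_synth, I_next, theta_init, gauss_newton_step,
                        levels=3, iterations=5):
    """Hierarchical analysis-by-synthesis estimation (outline).

    I_synth           : synthetic previous frame rendered with Theta_{n-1}
    I_next            : current frame I_n
    theta_init        : initial parameter vector (displacements and sigmas)
    gauss_newton_step : callable(I_a, I_b, theta, level) returning a
                        parameter update for one pyramid level
    """
    theta = theta_init.copy()
    pyr_synth = build_pyramid(I_synth, levels)
    pyr_next = build_pyramid(I_next, levels)
    for level in reversed(range(levels)):      # coarse to fine
        for _ in range(iterations):
            theta = theta + gauss_newton_step(pyr_synth[level],
                                              pyr_next[level], theta, level)
    return theta
```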

To regularize the optical flow field $\mathbf{D}(\mathbf{x})$ and the illumination scale field $S(\mathbf{x})$ we incorporate the mesh-based motion and illumination model represented by the parameter vector $\Theta$. This leads to a cost function of the following form:

$$\mathcal{E}_D(\Theta) = \| J_D \cdot \Theta - \mathbf{r} \|^2 \qquad (10)$$

where $J_D$ is the sparse $n \times 3K$ Jacobian matrix of the Gauss-Newton approximation of the data term [7]. The $3K \times 1$ parameter vector $\Theta$ is given by equation (2) and $\mathbf{r}$ is an $n \times 1$ vector given by:

$$\mathbf{r} = \left( \frac{\partial \mathcal{I}(\mathbf{x}_1)}{\partial t} + \mathcal{I}(\mathbf{x}_1), \ldots, \frac{\partial \mathcal{I}(\mathbf{x}_n)}{\partial t} + \mathcal{I}(\mathbf{x}_n) \right)^T \qquad (11)$$

$n$ denotes the number of pixels in region $\mathcal{R}$ selected for contribution, i.e. pixels with sufficient spatial gradient.
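As a small illustration of this step, the sketch below selects contributing pixels by their spatial gradient magnitude and assembles the vector $\mathbf{r}$ of equation (11), approximating the temporal gradient by a frame difference. The threshold value and the function name `residual_vector` are assumptions, and the Jacobian $J_D$ of [7] is not reproduced.

```python
import numpy as np

def residual_vector(I_synth, I_next, grad_thresh=0.01):
    """Pixels with sufficient spatial gradient and the vector r of Eq. (11).

    I_synth : synthetic previous frame (current motion/illumination state)
    I_next  : current frame
    """
    Iy, Ix = np.gradient(I_synth)            # spatial derivatives
    It = I_next - I_synth                    # temporal gradient dI/dt
    mask = np.hypot(Ix, Iy) > grad_thresh    # pixels that constrain the flow
    r = (It + I_synth)[mask]                 # r_i = dI(x_i)/dt + I(x_i)
    return mask, r
```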

When dealing with color images $\mathcal{I}(\mathbf{x}) = (\mathcal{R}(\mathbf{x}), \mathcal{G}(\mathbf{x}), \mathcal{B}(\mathbf{x}))^T$, equation (7) is changed to

$$\mathcal{I}(\mathbf{x} + \mathbf{D}(\mathbf{x}), t + \delta t) = \mathbf{S}(\mathbf{x}) \circ \mathcal{I}(\mathbf{x}, t) \qquad (12)$$

where $\circ$ denotes the entrywise or Hadamard product of two vectors and $\mathcal{R}(\mathbf{x}), \mathcal{G}(\mathbf{x}), \mathcal{B}(\mathbf{x})$ denote the red, green and blue color channels of image $\mathcal{I}(\mathbf{x})$. $\mathbf{S}(\mathbf{x})$ denotes the local intensity changes of the red, green and blue color channels as given in equation (5). In each iteration step of the hierarchical Gauss-Newton estimation procedure, we approximate $\mathbf{S}(\mathbf{x})$ by a first order Taylor expansion

$$\mathbf{S}(\mathbf{x}) = S_g(\mathbf{x}) \cdot \begin{pmatrix} 1 \\ s_{rg} \\ s_{bg} \end{pmatrix} \approx \bar{S}_g(\mathbf{x}) \cdot \begin{pmatrix} 1 \\ \bar{s}_{rg} \\ \bar{s}_{bg} \end{pmatrix} + J_S\big|_{\bar{S}_g, \bar{s}_{rg}, \bar{s}_{bg}} \cdot \begin{pmatrix} S_g - \bar{S}_g \\ s_{rg} - \bar{s}_{rg} \\ s_{bg} - \bar{s}_{bg} \end{pmatrix} \qquad (13)$$

where $\bar{S}_g$ and $\bar{s}_{rg}, \bar{s}_{bg}$ are the current estimates of the local intensity scale field of the green channel and the red and blue gains at which $\mathbf{S}$ is linearized in each iteration step.

2.3 Choosing the Smoothness Term

In order to account for the smooth nature of the deformation and illumination fields we penalize the discrete second derivative of the motion and illumination parameters in the mesh in a smoothness term $\mathcal{E}_S$:

$$\mathcal{E}_S(\Theta) = \sum_{k=1}^{K} \Big\| \theta_k - w_k \sum_{n \in N_k} \theta_n \Big\|^2 \qquad (14)$$

Figure 2: Original frame and estimated model; photometric parameters are represented by the colors.

with $w_k = \frac{1}{|N_k|}$, where $N_k$ denotes the neighborhood of vertex $\mathbf{v}_k$, i.e. all vertices $\mathbf{v}_k$ is connected to, $K$ is the number of vertices in the mesh and $\theta_k = (\delta v_{kx}, \delta v_{ky}, \sigma_k)$. In order to give closer neighbors a higher influence on vertex $\mathbf{v}_k$ than neighbors with a larger distance, we can also weight each neighbor according to its distance to vertex $\mathbf{v}_k$. In this case the weight for neighbor $\mathbf{v}_l$ is $w_{k,l} = \frac{1/d_{k,l}}{\sum_{n \in N_k} 1/d_{k,n}}$, where $d_{k,l}$ denotes the distance between the two vertices $\mathbf{v}_k$ and $\mathbf{v}_l$. In matrix notation, $\mathcal{E}_S(\Theta)$ can be expressed as:

$$\mathcal{E}_S(\Theta) = \| \mathbf{L} \cdot \Theta \|^2 \qquad (15)$$

where $\mathbf{L} = (\lambda_d L, \lambda_d L, \lambda_\sigma L)$ is composed of three scaled copies of the Laplacian matrix $L$ of the mesh [2], one for each parameter block of $\Theta$. The scaled Laplacian matrix $L$ is a $K \times K$ matrix with one row and one column for each vertex, where $L(i, j) = -w_i$ if vertices $i$ and $j$ are connected and $L(i, i) = 1$. All other entries are set to zero. $\lambda_d$ and $\lambda_\sigma$ account for different weights of the smoothness term for the spatial deformation and for the change in the brightness scale.
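A minimal sketch of this regularization is given below: it builds the uniformly weighted mesh Laplacian implied by equation (14) and solves the normal equations of the combined cost of equations (1), (10) and (15) for the parameter vector. Dense matrices are used for brevity, and the function and argument names are assumptions rather than the original implementation.

```python
import numpy as np
from scipy.linalg import block_diag

def mesh_laplacian(num_vertices, edges):
    """Scaled mesh Laplacian of Eq. (14) with uniform weights:
    L[k, k] = 1 and L[k, n] = -1/|N_k| for every neighbor n of vertex k."""
    neighbors = [[] for _ in range(num_vertices)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    L = np.eye(num_vertices)
    for k, nk in enumerate(neighbors):
        for n in nk:
            L[k, n] = -1.0 / len(nk)
    return L

def solve_regularized(J_D, r, L, lam, lambda_d, lambda_sigma):
    """Solve (J_D^T J_D + lam^2 * Lbig^T Lbig) Theta = J_D^T r, where Lbig
    applies the Laplacian to the x-displacement, y-displacement and
    photometric parameter blocks with weights lambda_d and lambda_sigma."""
    L_big = block_diag(lambda_d * L, lambda_d * L, lambda_sigma * L)
    A = J_D.T @ J_D + lam ** 2 * (L_big.T @ L_big)
    return np.linalg.solve(A, J_D.T @ r)
```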

3 Retexturing with Correct Deformation and Shading

Figure 3: Retexturing with correct shading and deformation.

In our model the intensity of each color channel at pixel $\mathbf{x}_i$ in the image $\mathcal{I}(\mathbf{x}_i) = (\mathcal{R}(\mathbf{x}_i), \mathcal{G}(\mathbf{x}_i), \mathcal{B}(\mathbf{x}_i))^T$ is described as a product of a factor $\mathbf{S}(\mathbf{x}_i) = (S_r(\mathbf{x}_i), S_g(\mathbf{x}_i), S_b(\mathbf{x}_i))^T$, representing the photometric conditions of the scene (comprising e.g. physical lighting conditions, medium properties, spectral response characteristics etc.), and the actual color $\mathbf{C}_{p_i} = (C_r, C_g, C_b)^T$ of the underlying surface point $p_i$:



$$\mathcal{I}(\mathbf{x}_i) = \mathbf{S}(\mathbf{x}_i) \circ \mathbf{C}_{p_i} \qquad (16)$$

where $S_r(\mathbf{x}_i) = S_g(\mathbf{x}_i) \cdot s_{rg}$ and $S_b(\mathbf{x}_i) = S_g(\mathbf{x}_i) \cdot s_{bg}$. The factor $\mathbf{S}(\mathbf{x}_i)$ describes the change of the color intensity at surface point $p_i$ due to changes in the scene light. Here, we assume that the change of the light color, i.e. the fraction of the scale in the red and the blue channel to the scale in the green channel, is spatially constant in the image and only the light intensity varies spatially due to shading. Of course, this is a simplification and the above equation is only true if the first frame of the sequence we analyze is white balanced, as we always estimate the lighting changes compared to a reference frame. However, if the first frame is not white balanced but the color value $\mathbf{w} = (w_r, w_g, w_b)^T$ of a white surface point in the image is known, the above equation can be rewritten as:

$$\mathcal{I}(\mathbf{x}_i) = \mathbf{S}(\mathbf{x}_i) \circ \mathbf{C}_{p_i} \circ \mathbf{w} \qquad (17)$$

Retexturing now means to change the underlying surface color $\mathbf{C}_{p_i}$ to a new value, namely to that of the new synthetic texture, and keep the irradiance of the scene as well as the white factor $\mathbf{w}$ the same. The change of a pixel intensity in the original image sequence can arise from motion and lighting changes. With the method described in the previous section we can separate the lighting changes from motion and the underlying surface color.

Under the assumption that the reflectance of the new texture is the same as that of the original texture and the white value $\mathbf{w}$ is known, we can now use the estimated deformation and photometric parameters to retexture the deforming surface in the video sequence with correct deformation and lighting. The vertex displacements $\delta\mathbf{v}_k$ are used to spatially warp a new synthetic texture image $T(\mathbf{x})$ such that the original image $\mathcal{I}(\mathbf{x})$ and the warped synthetic texture $T_{warp}(\mathbf{x})$ are geometrically registered. Subsequently, a shading map $\mathbf{S}(\mathbf{x}) = (S_r(\mathbf{x}), S_g(\mathbf{x}), S_b(\mathbf{x}))^T$ is established as follows. For each pixel $\mathbf{x}_i$ belonging to a triangle with brightness scale parameters $\sigma_a, \sigma_b, \sigma_c$ at the surrounding vertices the shading map is

$$\mathbf{S}(\mathbf{x}_i) = \sum_{j \in \{a,b,c\}} \beta_j \sigma_j \cdot (1, s_{rg}, s_{bg})^T \qquad (18)$$

Again, higher order interpolation is possible. In fact, the parameterization given in equation (18) describes a two-dimensional Gouraud shading. It is furthermore possible to estimate the photometric parameters with the linear interpolation given in equation (3) in the hierarchical Gauss-Newton approach to make the estimation step fast, but use a higher order interpolation for the generation of the shading map to produce a smoother retexturing result.


Figure 4: Test pattern (left), real and estimated colors (red crosses) (center) and real and synthesized colors (right) of ten example squares (marked with a black square).

The color channels of the geometrically registered synthetic texture image $T_{warp}(\mathbf{x}_i)$ are then multiplied with the shading map $\mathbf{S}(\mathbf{x}_i)$ to generate a new synthetic texture image $T_{synth}(\mathbf{x}_i)$ with correct shading and illumination properties (see also Figure 3):

$$T_{synth}(\mathbf{x}_i) = \mathbf{S}(\mathbf{x}_i) \circ T_{warp}(\mathbf{x}_i) \circ \mathbf{w}. \qquad (19)$$

The synthetic texture is then integrated into the original image via alpha-blending at the mesh borders.
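The compositing step can be sketched as follows, assuming the warped texture, the shading map of equation (18) and a blending mask are already available; the array names and the clipping to the valid intensity range are assumptions for illustration, not the original implementation.

```python
import numpy as np

def retexture(I_orig, T_warp, S_map, w, alpha):
    """Compose the augmented frame according to Eq. (19) and blend it
    into the original image.

    I_orig : (H, W, 3) original frame
    T_warp : (H, W, 3) synthetic texture warped with the estimated
             vertex displacements
    S_map  : (H, W, 3) shading map from Eq. (18)
    w      : (3,) color value of a white surface point (white factor)
    alpha  : (H, W) blending mask, 1 inside the mesh and falling off
             to 0 towards the mesh borders
    """
    # Eq. (19): apply shading map and white factor to the warped texture.
    T_synth = S_map * T_warp * w
    T_synth = np.clip(T_synth, 0.0, 1.0)   # guard against saturated values
    # Alpha-blend the shaded texture into the original frame.
    a = alpha[..., None]
    return a * T_synth + (1.0 - a) * I_orig
```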

As stated above, equations (16) and (17) are not strictly true, especially in the case of saturation or specularities, which make the estimation of the photometric parameters unreliable. We treat these cases by simply thresholding the resulting values in $T_{synth}(\mathbf{x}_i)$, which is mathematically not strictly correct but perceptually convincing. Gamma correction does not need to be handled separately, as we chose a multiplicative photometric model in which a pixel intensity is composed as in equation (16) or (17), which simply means that the gamma correction factor is also applied to the multiplicative photometric parameter $\sigma$ at each vertex¹.

¹Let $\bar{\mathcal{I}}_1$ and $\bar{\mathcal{I}}_2$ be two geometrically registered images to which gamma correction has been applied and let $\mathcal{I}_1 = \bar{\mathcal{I}}_1^{1/\gamma}$ and $\mathcal{I}_2 = \bar{\mathcal{I}}_2^{1/\gamma}$ be the same images without gamma correction. Applying our approach yields $\bar{\mathcal{I}}_2(\mathbf{x}_i) = \bar{\sigma} \cdot \bar{\mathcal{I}}_1(\mathbf{x}_i)$ with $\bar{\sigma} = \sum_{j \in \{a,b,c\}} \beta_j \sigma_j$. Hence, $\mathcal{I}_2(\mathbf{x}_i) = \sigma \cdot \mathcal{I}_1(\mathbf{x}_i)$ with $\sigma = \bar{\sigma}^{1/\gamma}$.

4 Experimental Results

We applied our method to several video sequences of deforming cloth and generated video sequences of the synthetic texture as explained in the previous section. Figure 5 shows example frames from two video sequences of deforming cloth with different characteristics. The upper row shows thick cloth with very smooth deformations, while the lower row shows cloth that produces small crinkles and creases. The figure shows both the tracking mesh on the original frame and the synthetically generated texture image. Figure 6 shows different steps during the retexturing approach for several frames of tracking and retexturing a fold in a shirt, as well as the associated shading maps. The convincing illusion of the retexturing method is best evaluated via visual inspection when viewing the video sequences, so we refer the reader to the additional video material². In the videos it becomes obvious how crucial illumination recovery is for convincing texture augmentation of deforming surfaces and that the addition of real lighting increases the 3-dimensional perception of the virtual texture.

In order to visually evaluate the accuracy of the estimated color changes in an image sequence we analyzed the color variance due to lighting in a sequence of a checkerboard test pattern that comprises six fully saturated colors (red, green, blue, magenta, cyan, yellow) and black, as well as six graduations in saturation and intensity of each color. We plotted the actual variations of each color in RGB space and overlaid the estimated colors obtained from the parameters $\Theta_\sigma$ and $\Theta_s$ (see Figure 4, center). However, the important question is not if the color changes were estimated correctly, but if a new texture with different colors can be synthesized realistically from these estimations.

²http://iphome.hhi.de/hilsmann/assets/videos/VMV.mov


Figure 5: Tracking and retexturing cloth with different characteristics. Note that the mesh is purely 2-dimensional and the 3-dimensional illusion comes from shading.

Therefore, in the next step we synthesized the colors of the squares with the photometric parameters $\Theta_\sigma$ and $\Theta_s$ estimated at neighboring squares of different color, saturation and intensity, assuming that they should have experienced similar lighting conditions. Again, we compared the actual colors of the image sequence and the synthesized colors in an RGB plot. This provides us with a visual evaluation of the color variations of the synthesized colors, as we treat the neighboring original colors as ground truth. The right image of Figure 4 shows an example where we plotted these ground truth colors in an RGB plot and overlaid them with the synthetically generated color values (red crosses). The trajectories of the synthetic colors match the trajectories of the real colors, although they were generated with photometric parameters estimated at colors that differed in hue, saturation and intensity from the selected synthetic color.

5 Conclusion

We presented a method for augmentation of deformable surfaces, like e.g. cloth, from single-view video for augmented reality applications. We exploit the fact that for retexturing the retrieval of deformation and motion in the image plane is sufficient, as the augmented surface is rendered from the same point of view as the original one. Furthermore, the shadows and lighting conditions that should be cast onto the virtual texture are already present in the original image. To estimate deformation and photometric parameters simultaneously we use an extended optical flow version together with mesh-based models and a specific color model that not only accounts for changes in the light intensity but also in the light color. From the estimated photometric parameters a shading map is established which is applied to the spatially warped synthetic texture. We showed several results for cloth retexturing in single-view video.


Figure 6: Tracking and retexturing a fold on a shirt. Left to right: tracking result, shading map, retexturing result. Note that the mesh is purely 2-dimensional and the 3-dimensional illusion comes from shading.

Our method is so far limited to smooth deformations and shading. Future work will therefore concentrate on modeling discontinuities in both the warp, e.g. in the case of folding, and in the shading map, e.g. in the case of sharp shadows.

References

[1] D. Bradley, G. Roth and P. Bose. Augmented Clothing. In Proc. Graphics Interface, May 2005, Victoria, Canada.

[2] D. M. Cvetković, M. Doob and H. Sachs. Spectra of Graphs: Theory and Applications. Wiley, 1998.

[3] V. Gay-Bellile, A. Bartoli and P. Sayd. Direct Estimation of Non-Rigid Registrations with Image-Based Self-Occlusion Reasoning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.

[4] M. A. Gennert and S. Negahdaripour. Relaxing the Brightness Constancy Assumption in Computing Optical Flow. MIT, Cambridge, USA, 1987.

[5] Y. Guo, H. Sun, Q. Peng and Z. Jiang. Mesh-Guided Optimized Retexturing for Image and Video. IEEE Trans. on Visualization and Computer Graphics, 14(2): 426–439, 2008.

[6] A. Hilsmann and P. Eisert. Optical Flow Based Tracking and Retexturing of Garments. In Proc. International Conference on Image Processing (ICIP), October 2008, San Diego, USA.

[7] A. Hilsmann and P. Eisert. Joint Estimation of Deformable Motion and Photometric Parameters in Single View Video. In ICCV Workshop on Non-Rigid Shape Analysis and Deformable Image Alignment (NORDIA), Sept. 2009, Kyoto, Japan.

[8] B. K. P. Horn and B. G. Schunck. Determining Optical Flow. Massachusetts Institute of Technology, Cambridge, MA, USA, 1980.

[9] C. Nastar, B. Moghaddam and A. Pentland. Generalized Image Matching: Statistical Learning of Physically Based Deformations. In Proc. of European Conf. on Computer Vision (ECCV), 589–598, April 1996, UK.

[10] S. Negahdaripour and C. H. Yu. A Generalized Brightness Change Model for Computing Optical Flow. In Proc. Int. Conf. on Computer Vision (ICCV), 2–11, 1993, Berlin, Germany.

[11] J. Pilet, V. Lepetit and P. Fua. Augmenting Deformable Objects in Real-Time. In Int. Symposium on Mixed and Augmented Reality, Oct. 2005, Vienna, Austria.

[12] G. Silveira and E. Malis. Real-time Visual Tracking under Arbitrary Illumination Changes. In Proc. of Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 305–312, 2006, Minneapolis, USA.

[13] V. Scholz and M. Magnor. Texture Replacement of Garments in Monocular Video Sequences. In Rendering Techniques 2006: Eurographics Symposium on Rendering, 305–312, June 2006, Nicosia, Cyprus.

[14] R. White and D. A. Forsyth. Retexturing Single Views Using Texture and Shading. In Proc. European Conf. on Computer Vision (ECCV), 70–81, May 2006, Graz, Austria.