Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member...

7
IEEE Proof 1 Multiview Rectification 2 of Folded Documents 3 Shaodi You, Member, IEEE, 4 Yasuyuki Matsushita, Member, IEEE, 5 Sudipta Sinha, Member, IEEE, 6 Yusuke Bou, Member, IEEE, and 7 Katsushi Ikeuchi, Fellow, IEEE 8 Abstract—Digitally unwrapping images of paper sheets is crucial for accurate 9 document scanning and text recognition. This paper presents a method for 10 automatically rectifying curved or folded paper sheets from a few images captured 11 from multiple viewpoints. Prior methods either need expensive 3D scanners or 12 model deformable surfaces using over-simplified parametric representations. In 13 contrast, our method uses regular images and is based on general developable 14 surface models that can represent a wide variety of paper deformations. Our main 15 contribution is a new robust rectification method based on ridge-aware 3D 16 reconstruction of a paper sheet and unwrapping the reconstructed surface using 17 properties of developable surfaces via 1 conformal mapping. We present results 18 on several examples including book pages, folded letters and shopping receipts. 19 Index Terms—Robust digitally unwarpping, ridge-aware surface reconstruction, 20 mobile phone friendly algorithms Ç 21 1 INTRODUCTION 22 DIGITALLY scanning paper documents for sharing and editing is 23 becoming a common daily task. Such paper sheets are often curved 24 or folded, and proper rectification is important for high-fidelity 25 digitization and text recognition. Flatbed scanners allow physical 26 rectification of such documents but are not suitable for hardcover 27 books. For a wider applicability of document scanning, it is wanted 28 a flexible technique for digitally rectifying folded documents. 29 There are two major challenges in document image rectification. 30 First, for a proper rectification, the 3D shape of curved and folded 31 paper sheets must be estimated. Second, the estimated surface must 32 be flattened without introducing distortions. Prior methods for 3D 33 reconstruction of curved paper sheets either use specialized hard- 34 ware [1], [2], [3] or assume simplified parametric shapes [2], [4], [5], 35 [6], [7], [8], [9], such as generalized cylinders (Fig. 2a). However, 36 these methods are difficult to use due to bulky hardware or make 37 restrictive assumptions about the deformations of the paper sheet. 38 In this paper, we present a convenient method for digitally 39 rectifying heavily curved and folded paper sheets from a few 40 uncalibrated images captured with a hand-held camera from mul- 41 tiple viewpoint. Our method uses structure from motion (SfM) to 42 recover an initial sparse 3D point cloud from the uncalibrated 43 images. To accurately recover the dense 3D shape of paper sheet 44 without losing high-frequency structures such as folds and creases, 45 we develop a ridge-aware surface reconstruction method. 46 Furthermore, to achieve robustness to outliers present in the sparse 47 SfM 3D point cloud caused by repetitive document textures, we 48 pose the surface reconstruction task as a robust Poisson surface 49 reconstruction based on 1 optimization. Next, to unwrap the 50 reconstructed surface, we propose a robust conformal mapping 51 method by incorporating ridge-awareness priors and 1 optimiza- 52 tion technique. See Fig. 1 for an overview. 53 The contributions of our work are threefold. First, we show how 54 ridge-aware regularization can be used for both 3D surface recon- 55 struction and flattening (conformal mapping) to improve accuracy. 56 Our ridge-aware reconstruction method preserves the sharp struc- 57 ture of folds and creases. Ridge-awareness priors act as non-local 58 regularizers that reduce global distortions during the surface flat- 59 tening step. Second, we extend the Poisson surface reconstruc- 60 tion [10] and least-squares conformal mapping (LSCM) [11] 61 algorithms by explicitly dealing with outliers using 1 optimization. 62 Finally, we describe a practical system for rectifying curved and 63 folded documents that can be used with ordinary digital cameras. 64 2 RELATED WORK 65 The topic of digital rectification of curved and folded documents has 66 been actively studied in both the computer vision and document 67 processing communities. It is common to model paper sheets as 68 developable surfaces which have underlying rulers corresponding 69 to lines with zero Gaussian curvature. Many existing methods 70 assume generalized cylindrical surfaces where the paper is curved 71 only in one direction and thus can be parameterized using a 1D 72 smooth function. Such surfaces do not require an explicit parame- 73 terization of the rulers. See Fig. 2a for an example. A variety of exist- 74 ing techniques recover surface geometry using this assumption. 75 Shape from shading methods were first used by Wada et al. [4], 76 Tan et al. [12], [13], Courteille et al. [14] and Zhang et al. [5] whereas 77 shape from boundary methods were explored by Tsoi et al. [6], [15]. 78 Binocular stereo matching with calibrated cameras was used by 79 Yamashita et al. [16], Koo et al. [7] and Tsoi et al. [6]. Shape from text 80 lines is another popular method for reconstructing the document 81 surface geometry [8], [9], [12], [17], [18], [19], [20], [21], [22], [23], 82 [24], [25]. However, these methods assume that the document con- 83 tains well-formatted printed characters. 84 Some recent methods relax the parallel ruler assumption 85 (see Fig. 2b). However, the numerous parameters in these models 86 makes the optimization quite challenging. Liang et al. [26] and 87 Tian et al. [27] use text lines. Although these methods can handle 88 a single input image, the strong assumptions on surface geometry, 89 contents and illumination limit the applicability. Meng et al. 90 designed a special calibrated active structural light device to 91 retrieve the two parallel 1D curvatures [2], the surface can be 92 parameterized by assuming appropriate boundary conditions and 93 constraints on ruler orientations. Perriollat et al.[28] use sparse SfM 94 points but assume they are reasonably dense and well distributed. 95 Their parameterization is sensitive to noise and can be unreliable 96 when the 3d point cloud is sparse or has varying density. 97 For rectification of documents with arbitrary distortion and con- 98 tent (Fig. 2c), other methods require specialized devices and use 99 non-parametric approaches. Brown et al. [3] use a calibrated mirror 100 system to obtain 3D geometry using multi-view stereo. They 101 unwrap the reconstructed surface using constraints on elastic 102 energy, gravity and collision. The model is not ideal for paper docu- 103 ments because developable surfaces are not elastic. Later, they pro- 104 pose using dense 3D range data [29] after which they flatten the 105 surface using least square conformal mapping [11]. Zhang et al. [30] 106 also use dense range scans and use rigid constraints instead of elas- 107 tic constraints with the method proposed in [3]. Pilu [1] assumes 108 that a dense 3D mesh is available and minimizes the global bending S. You is with Data61-CSIRO, Canberra, ACT 2601, Australia, and the Australian National University, Canberra, ACT 0200, Australia. E-mail: [email protected]. Y. Matsushita is with Osaka University, Suita, Osaka Prefecture 565-0871, Japan. E-mail: [email protected]. S. Sinha is with Microsoft Research, Redmond, WA 98052. E-mail: [email protected]. Y. Bou is with Microsoft, Tokyo, 108-0075, Japan. E-mail: [email protected]. K. Ikeuchi is with Microsoft Research Asia, Beijing 100080, China. E-mail: [email protected]. Manuscript received 30 May 2016; revised 25 Jan. 2017; accepted 13 Feb. 2017. Date of publication 0 . 0000; date of current version 0 . 0000. Recommended for acceptance by J. Yu. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee. org, and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TPAMI.2017.2675980 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017 1 0162-8828 ß 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Transcript of Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member...

Page 1: Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member ...users.cecs.anu.edu.au/~u1018276/Downloads/ShaodiYou_Origami.pdf · Next, to unwrap the 49 reconstructed surface, we propose

IEEE P

roof

1 Multiview Rectification2 of Folded Documents

3 Shaodi You,Member, IEEE,4 Yasuyuki Matsushita,Member, IEEE,5 Sudipta Sinha,Member, IEEE,6 Yusuke Bou,Member, IEEE, and7 Katsushi Ikeuchi, Fellow, IEEE

8 Abstract—Digitally unwrapping images of paper sheets is crucial for accurate

9 document scanning and text recognition. This paper presents a method for

10 automatically rectifying curved or folded paper sheets from a few images captured

11 from multiple viewpoints. Prior methods either need expensive 3D scanners or

12 model deformable surfaces using over-simplified parametric representations. In

13 contrast, our method uses regular images and is based on general developable

14 surface models that can represent a wide variety of paper deformations. Our main

15 contribution is a new robust rectification method based on ridge-aware 3D

16 reconstruction of a paper sheet and unwrapping the reconstructed surface using

17 properties of developable surfaces via ‘1 conformal mapping. We present results

18 on several examples including book pages, folded letters and shopping receipts.

19 Index Terms—Robust digitally unwarpping, ridge-aware surface reconstruction,

20 mobile phone friendly algorithms

Ç

21 1 INTRODUCTION

22 DIGITALLY scanning paper documents for sharing and editing is23 becoming a common daily task. Such paper sheets are often curved24 or folded, and proper rectification is important for high-fidelity25 digitization and text recognition. Flatbed scanners allow physical26 rectification of such documents but are not suitable for hardcover27 books. For a wider applicability of document scanning, it is wanted28 a flexible technique for digitally rectifying folded documents.29 There are two major challenges in document image rectification.30 First, for a proper rectification, the 3D shape of curved and folded31 paper sheets must be estimated. Second, the estimated surface must32 be flattened without introducing distortions. Prior methods for 3D33 reconstruction of curved paper sheets either use specialized hard-34 ware [1], [2], [3] or assume simplified parametric shapes [2], [4], [5],35 [6], [7], [8], [9], such as generalized cylinders (Fig. 2a). However,36 these methods are difficult to use due to bulky hardware or make37 restrictive assumptions about the deformations of the paper sheet.38 In this paper, we present a convenient method for digitally39 rectifying heavily curved and folded paper sheets from a few40 uncalibrated images captured with a hand-held camera from mul-41 tiple viewpoint. Our method uses structure from motion (SfM) to42 recover an initial sparse 3D point cloud from the uncalibrated43 images. To accurately recover the dense 3D shape of paper sheet44 without losing high-frequency structures such as folds and creases,45 we develop a ridge-aware surface reconstruction method.

46Furthermore, to achieve robustness to outliers present in the sparse47SfM 3D point cloud caused by repetitive document textures, we48pose the surface reconstruction task as a robust Poisson surface49reconstruction based on ‘1 optimization. Next, to unwrap the50reconstructed surface, we propose a robust conformal mapping51method by incorporating ridge-awareness priors and ‘1 optimiza-52tion technique. See Fig. 1 for an overview.53The contributions of our work are threefold. First, we show how54ridge-aware regularization can be used for both 3D surface recon-55struction and flattening (conformal mapping) to improve accuracy.56Our ridge-aware reconstruction method preserves the sharp struc-57ture of folds and creases. Ridge-awareness priors act as non-local58regularizers that reduce global distortions during the surface flat-59tening step. Second, we extend the Poisson surface reconstruc-60tion [10] and least-squares conformal mapping (LSCM) [11]61algorithms by explicitly dealing with outliers using ‘1 optimization.62Finally, we describe a practical system for rectifying curved and63folded documents that can be usedwith ordinary digital cameras.

642 RELATED WORK

65The topic of digital rectification of curved and folded documents has66been actively studied in both the computer vision and document67processing communities. It is common to model paper sheets as68developable surfaces which have underlying rulers corresponding69to lines with zero Gaussian curvature. Many existing methods70assume generalized cylindrical surfaces where the paper is curved71only in one direction and thus can be parameterized using a 1D72smooth function. Such surfaces do not require an explicit parame-73terization of the rulers. See Fig. 2a for an example. A variety of exist-74ing techniques recover surface geometry using this assumption.75Shape from shading methods were first used by Wada et al. [4],76Tan et al. [12], [13], Courteille et al. [14] and Zhang et al. [5] whereas77shape from boundary methods were explored by Tsoi et al. [6], [15].78Binocular stereo matching with calibrated cameras was used by79Yamashita et al. [16], Koo et al. [7] and Tsoi et al. [6]. Shape from text80lines is another popular method for reconstructing the document81surface geometry [8], [9], [12], [17], [18], [19], [20], [21], [22], [23],82[24], [25]. However, these methods assume that the document con-83tains well-formatted printed characters.84Some recent methods relax the parallel ruler assumption85(see Fig. 2b). However, the numerous parameters in these models86makes the optimization quite challenging. Liang et al. [26] and87Tian et al. [27] use text lines. Although these methods can handle88a single input image, the strong assumptions on surface geometry,89contents and illumination limit the applicability. Meng et al.90designed a special calibrated active structural light device to91retrieve the two parallel 1D curvatures [2], the surface can be92parameterized by assuming appropriate boundary conditions and93constraints on ruler orientations. Perriollat et al.[28] use sparse SfM94points but assume they are reasonably dense and well distributed.95Their parameterization is sensitive to noise and can be unreliable96when the 3d point cloud is sparse or has varying density.97For rectification of documents with arbitrary distortion and con-98tent (Fig. 2c), other methods require specialized devices and use99non-parametric approaches. Brown et al. [3] use a calibrated mirror100system to obtain 3D geometry using multi-view stereo. They101unwrap the reconstructed surface using constraints on elastic102energy, gravity and collision. The model is not ideal for paper docu-103ments because developable surfaces are not elastic. Later, they pro-104pose using dense 3D range data [29] after which they flatten the105surface using least square conformal mapping [11]. Zhang et al. [30]106also use dense range scans and use rigid constraints instead of elas-107tic constraints with the method proposed in [3]. Pilu [1] assumes108that a dense 3Dmesh is available and minimizes the global bending

� S. You is with Data61-CSIRO, Canberra, ACT 2601, Australia, and the AustralianNational University, Canberra, ACT 0200, Australia. E-mail: [email protected].

� Y. Matsushita is with Osaka University, Suita, Osaka Prefecture 565-0871, Japan.E-mail: [email protected].

� S. Sinha is with Microsoft Research, Redmond, WA 98052.E-mail: [email protected].

� Y. Bou is with Microsoft, Tokyo, 108-0075, Japan. E-mail: [email protected].� K. Ikeuchi is with Microsoft Research Asia, Beijing 100080, China.

E-mail: [email protected].

Manuscript received 30 May 2016; revised 25 Jan. 2017; accepted 13 Feb. 2017. Date ofpublication 0 . 0000; date of current version 0 . 0000.Recommended for acceptance by J. Yu.For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.Digital Object Identifier no. 10.1109/TPAMI.2017.2675980

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017 1

0162-8828� 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member ...users.cecs.anu.edu.au/~u1018276/Downloads/ShaodiYou_Origami.pdf · Next, to unwrap the 49 reconstructed surface, we propose

IEEE P

roof109 potential energy to flatten the surface. None of these existing meth-

110 ods are as practical and convenient as ourmethod that only requires111 a hand-held camera and a few images.

112 3 PROPOSED METHOD

113 Our method has two main steps-3D document surface reconstruc-114 tion and unwrapping of the reconstructed surface. For now, we115 assume that a set of sparse 3D points on the surface are available.116 Next, we describe our new algorithms for ridge-aware surface117 reconstruction and robust surface unwrapping.

118 3.1 Ridge-Aware Surface Reconstruction

119 Dense methods are favored for 3D scanning of folded and curved120 documents [31], [32]. This is because existing methods for surface121 reconstruction from sparse 3D points tend to produce excessive122 smoothing and fail to preserve sharp creases and folds, i.e., ridges123 on the surface. Such methods are typically also inadequate for deal-124 ing with noisy 3D points caused by repetitive textures present in125 documents. We address these issues by developing a robust ridge-126 aware surface reconstruction method for sparse 3D points. Specifi-127 cally, we extend the Poisson surface reconstruction method [10]128 by incorporating ridge constraints and by adding robustness to129 outliers.130 Robust Poisson Surface Reconstruction.We denote a set ofN sparse131 3D points obtained from SfM as fxn; yn; zng; n ¼ 1; 2; . . . ; N , where132 only 3D points triangulated from at least three images are retained.133 For our input images, N typically lies between 700 to 2,000. For a134 selected reference image (and viewpoint), we use a depth map135 parameterization zðx; yÞ for the document surface. We aim to esti-136 mate depth at the mesh grid vertices ziðxi; yiÞ, where i is the mesh137 grid index, 1 � i � I. Our method computes the optimal depth val-138 ues z� ¼ z1; . . . ; zI½ �> by solving the following optimization problem

z� ¼ argminz

EdðzÞ þ �EsðzÞ: (1)140140

141 Here, Ed and Es are the data and smoothness terms respectively142 and � is a parameter to balance the two terms. The original Poisson

143surface reconstruction method uses the squared ‘2-norm for both144terms. Instead, we propose using the ‘1-norm for the data term Ed

145to deal with outliers

EdðzÞ ¼Xn

kzn � zik1: (2)

147147

148This encourages the estimated depth zi to be consistent with zn, the149observed depth of the nearest 3D point.150We rewrite Eq. (2) in vector form

EdðzÞ ¼ kz� PVzk1; (3)152152

153where PV is a permutation matrix that selects and aligns observed154entries V by ensuring correspondence between zn and zi. The smo-155othness term Es is defined using the squared Frobenius norm of156the gradient of depth vector z along x and y in camera coordinates

EsðzÞ ¼ kr2zk2F ¼ @2z

@x2;@2z

@y2

� ���������2

F

: (4) 158158

159

160By preparing a sparse derivative matrixD that replaces the Lap-161lace operatorr2 in a linear form,

D ¼

di;j ¼2 if i ¼ j�1 if zj is left=right to zi0 otherwise

8<:

diþI;j ¼2 if i ¼ j�1 zj is above=below zi0 otherwise

8<:

26666664

377777752I � I

; (5)

163163

164we have a special form of the Lasso problem [33]

z� ¼ argminz

kz� PVzk1 þ �kDzk22: (6)166166

167While this problem (Eq. (6)) does not have a closed form solution,168we use a variant of iteratively reweighted least squares (IRLS) [34]169for deriving the solution. By rewriting the data terms in Eq. (6) as a170weighted ‘2 norm using a diagonal matrix W with positive values171on the diagonal, we have

z� ¼ argminz

z� PVzð Þ>W>W z� PVzð Þ þ �z>D>Dz: (7)173173

174In contrast to Eq. (6), the data term now uses ‘2 norm instead of ‘1175norm. We solve this problem (Eq. (7)) using alternation as176described next.177Step 1: Update z178Eq. (7) can be rewritten as z� ¼ argminz k Az� b k22, where A ¼

179WPVffiffiffi�

pD

� �and b ¼ Wz

02I�1

� �and 02I�1 is a zero vector of length 2I.

180This is a squared ‘2 sparse linear system. It has the closed form181solution

z� ¼ ½A>Aþ aI��1A>b; (8)183183

184where I is the identity matrix, a is a regularization parameter (we185use a ¼ 1:0e-8).186Step 2: UpdateW187We initialize W to the identity matrix. During each iteration,188each diagonal element wi of W is updated given the residual189r ¼ WPVz

� �Wb, as follows:

Fig. 2. Developable surfaces with underlying rulers (lines with zero Gaussian cur-vature) and fold lines (ridges) shown as dotted and solids lines respectively. Exam-ples of (a) smooth parallel rulers, (b) smooth rulers not parallel to each other and(c) rulers and ridges in arbitrary directions.

Fig. 1. Our technique recovers a ridge-aware 3D reconstruction of the document surface from a sparse 3D point cloud. The final rectified image is then obtained via robustconformal mapping.

2 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017

Page 3: Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member ...users.cecs.anu.edu.au/~u1018276/Downloads/ShaodiYou_Origami.pdf · Next, to unwrap the 49 reconstructed surface, we propose

IEEE P

roof

wi ¼1

jrij þ �; (9)

191191

192 Here, ri is the ith element of r and � is a small positive value (we use193 � ¼ 1:0e-8). These steps are repeated until convergence; namely,194 until the estimate at tth iteration z�ðtÞ becomes similar to the pre-195 vious estimate z�ðt�1Þ, i.e., kz�ðtÞ � z�ðt�1Þk2 < 1:0e-8. Fig. 3b shows196 an example of the reconstructedmesh.197 Ridge-Aware Reconstruction. Developable surfaces are ruled [35],198 i.e., contain straight lines on the surface as shown in Fig. 2. Our199 method exploits this geometric property as described in this sec-200 tion. Unlike existing parameterization-based methods which only201 handle smooth rulers, [2], [26], [27], [28], extracting arbitrary202 creases and ridges is more difficult when only sparse 3D points are203 available. In particular, the sparse SfM points can be quite noisy.204 We propose a sequential approach by first detecting ridges on the205 mesh z� that was obtained using our robust Poisson reconstruction206 method. After selecting the ridge candidates, we instantiate addi-207 tional linear ridge constraints and incorporate them into the linear208 system that was solved earlier. This sequential approach is quite209 general and avoids overfitting. It also avoids spurious ridge candi-210 dates arising due to noise.211 For each point zðx; yÞ on the mesh z�, we compute the HessianK212 as follows:

KðzÞ ¼@2z@x2

@2z@x@y

@2z@x@y

@2z@y2

" #: (10)

214214

215 Based on the following Eigen decomposition of KðzÞ,

KðzÞ ¼ p1;p2

� � k1 00 k2

� �p1;p2

� �>; (11)

217217

218 we obtain principal curvatures k1 and k2 (jk1j � jk2j) and the corre-219 sponding eigenvectors p1 and p2.220 The value of k1 is equal to zero at all points on a developable221 surface. Thus, at any point zi, a straight line along direction p1

222 must lie on the surface. As discussed earlier and shown in Fig. 2,223 the curvature along the ridge is zero while the curvature orthogo-224 nal to the ridge reaches a local extremum. We use this observation225 to select ridge candidates using the value of jk2j. Mesh points226 ziðxi; yiÞ with jk2ðiÞj greater than the threshold kth are selected as

227ridge candidates. (see Fig. 3c for an example). The associated228smoothness constraints in Eq. (4) are adjusted as follows:

~di;j ¼ ’ðhp1; e1iÞdi;j~diþI;j ¼ ’ðhp1; e2iÞdiþ1;j;

(12)

230230

231where h ; i is the inner product and e1 ¼ ½1; 0�>; e2 ¼ ½0; 1�> are232orthonormal bases. ’ð�Þ is a convex monotonic function defined

233as ’ðxÞ ¼ bx2�1

b�1 , which places a greater weight b 1 along the ridge

and smallerweight orthogonal to it.We also consider twomore direc-

tional smoothness constraints similar to those stated in Eq. (12), for

234the two diagonal directions e3 ¼ ½ffiffi2

p

2 ;ffiffi2

p

2 �> and e4 ¼ ½

ffiffi2

p

2 ;�ffiffi2

p

2 �>.

235Finally, we modify EsðzÞ defined in Eq. (4) by adding these236ridge constraints and solve a new sparse linear system (similar to237the earlier one) to obtain the final reconstruction. Fig. 3d shows238that this method can preserve accurate folds and creases.

2393.2 Surface Unwrapping

240Given the 3D surface reconstruction, our next step is to unwrap the241surface. We take a conformal mapping approach to this problem,242amongst which, Least Squares Conformal Mapping [11], [29] is a243suitable choice. However, it is not resilient to the presence of

Fig. 3. Example of ridge-aware 3D surface reconstruction.

Fig. 4. Vertices of a triangle in a local coordinate basis.

Fig. 5. Rectification results from combination of methods. Acronyms RA and Podenote our ridge-aware method and Poisson reconstruction respectively. L1denotes our ‘1 conformal mapping method with non-local constraints; L2 indicatesLSCM [29] and Geo indicates geodesic unwrapping [30].

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017 3

Page 4: Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member ...users.cecs.anu.edu.au/~u1018276/Downloads/ShaodiYou_Origami.pdf · Next, to unwrap the 49 reconstructed surface, we propose

IEEE P

roof

244 outliers and susceptible to global distortion which can occur due to245 the absence of long-range constraints. We address both these issues246 and extend LSCM by incorporating an appropriate robustifier as247 well as ridge constraints to reduce global drift.248 Conformal Mapping. For our mesh topology, each 3D point249 ziðxi; yiÞ; i ¼ 1; . . . ; I, on the grid on z forms two triangles, one with250 its upper and left neighbor, the other with its lower and right251 neighbor on the grid. The triangulated 3D mesh is denoted as252 fT ; zg. A conformal map will produce an associated 2D mesh with253 the same connectivity but with 2D vertex positions such that the254 angle of all the triangles are best preserved. We denote the 2D255 mesh as fT ;ug, where u ¼ ðui; viÞ.256 For a particular 3D triangle t with vertices at ðx1; y1; z1Þ,257 ðx2; y2; z2Þ, and ðx3; y3; z3Þ, we seek its associated 2D vertex posi-258 tions (ðu1; v1Þ, ðu2; v2Þ and ðu3; v3Þ) under the conformal map. Using259 a local 2D coordinate basis for triangle t, the conformality con-260 straint is captured by the following linear equations

1

S

DX1 DX2 DX3 �DY1 �DY2 �DY3

DY1 DY2 DY3 DX1 DX2 DX3

� �ut ¼ 0; (13)

262262

263Here, ut ¼ ½u1; u2; u3; v1; v2; v3�>, S is the area of t, DX1 ¼ ðX3 �X2Þ,264DX2 ¼ ðX1 �X3Þ and DX3 ¼ ðX2 �X1Þ (DY is similarly defined).265Note that variables ðX1; Y1Þ, ðX2; Y2Þ, and ðX3; Y3Þ were obtained266from t’s vertex coordinates (see Fig. 4). Putting together the267constraints for all the triangles, we have the following sparse linear268system

Cu ¼ 0: (14)270270

271Using indices i and j to index the I vertices and J triangles respec-272tively, the 2J � 2I matrix C in Eq. (14) has the following non-zero273entries

cj;i ¼ DXSj

; cj;iþI ¼ � DYSj

cjþJ;i ¼ DYSj

; cjþJ;iþI ¼ DXSj

: (15)

275275

276Ridge Constraints. Notice that the original conformal mapping has277only local constraints, which will result in global distortion, Fig. 5f.278To reduce global distortions during unwrapping, we add ridge279and boundary constraints to constrain the solution further.280We take into consideration two facts. First, the ridge lines281remain straight after flattening but should essentially become282invisible on the flattened surface. Second, the conformal mapping283constraint Eq. (13) applies to beyond triangles. In particular, it is

Fig. 6. [OUR RESULTS] Original images are shown in rows 1 and 3 and our rectification results are shown in rows 2 and 4.

4 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017

Page 5: Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member ...users.cecs.anu.edu.au/~u1018276/Downloads/ShaodiYou_Origami.pdf · Next, to unwrap the 49 reconstructed surface, we propose

IEEE P

roof

284 true for three collinear points. Therefore, we propose using the col-285 linearity property to derive non-local constraints during flattening.286 and add it to our conformal map estimation problem. Referring to287 Fig. 4, and imagine the collinear case, that is when point ðx2; y2; z2Þ288 is also lying on the X axis; in such case, Y2 ¼ Y1 ¼ Y3 ¼ 0. In addi-289 tion, the area of the triangle T is zero. Hence, the ridge constraints290 can be written in a form similar to Eq. (13)

DX1 DX2 DX3 0 0 00 0 0 DX1 DX2 DX3

� �uR ¼ 0: (16)

292292

293 where uR ¼ ½u1; u2; u3; v1; v2; v3�> are the targeted 2D coordinates294 similarly defined as uut. We select ridge candidates in the sameway as295 we did earlier during reconstruction. However, this step is nowmore296 accurate because the surface iswell reconstructed. For each ridge can-297 didate (vertex), we find two farthest ridge candidates along the ridge298 line in opposite directions and instantiate the above mentioned con-299 straint for the three points. We assume that the boundary of the flat-300 tened 2D document image has straight line segments (they need not301 be straight lines on the 3D surface). These boundary constraints can302 be expressed in a form similar to Eq. (16). We incorporate all ridge303 and boundary constraints into a systemof linear equations

Ru ¼ 0: (17)305305

306

307 Robust Conformal Mapping. We propose using an ‘1 norm instead308 of the standard squared ‘2 norm to make conformal mapping robust309 to outliers. Putting together Eqs. (14) and (17) in the ‘1 sense, we have

u� ¼ argminu

k Cu k1 þg k Ru k1; (18)311311

312 where g balances the ridge and boundary constraints. To avoid the313 trivial solution u ¼ 0, we fix two points of u to ðui; viÞ ¼ ð0; 0Þ and314 ðuj; vjÞ ¼ ð0; 1Þ. Eq. (18) is then rewritten as

u� ¼ argminu

k Cu k1 þg k Ru k1 þu k Efix k22; (19)316316

317 where Efix is the energy function for the two fixed points. We solve318 the objective function using the iterative reweighted least squares319 method [34]. Fig. 5 shows a result from the conventional LSCM320 (‘2 method) and our proposed ‘1 method.

3213.3 Implementation Details

322Sparse 3D Reconstruction. We recover the initial sparse 3D point323cloud using SfM. While any existing SfM method is applicable,324we use the popular incremental SfM technique et al. [36] in our sys-325tem. We typically capture five to ten still images for each document326from different viewpoints. Capturing these images or equivalently327a set of burst photos or a short video clip only takes a few seconds.328After running SfM, we segment the document from the back-329ground surface in the reference image using a simple method330based on color difference and edge detection. In our experiments,331we assumed that the document has a sufficiently different color332from the background and therefore the document boundary is visi-333ble with sufficient contrast. Fig. 1a shows an input example and334the corresponding reconstruction.335Image Warping. After recovering the flattened mesh grid u ¼336fui; vig, we unwrap the input image with the maximum document337area in pixels. To obtain correspondence between the input image338and fui; vig, we project the 3Dmesh points fziðxi; yiÞg into the image339to obtain image coordinates f ~xi; ~yig using the camera pose estimated340using SfM. We then warp the image according to the correspon-341dence between f ~xi; ~yig and fui; vigwith bilinear interpolation.

3424 EXPERIMENTS

343We perform qualitative and quantitative evaluation on a wide vari-344ety of input documents. The first set of experiments show that our345method can handle different paper types, document content and346various types of folds and creases. Next, we report a quantitative347evaluation based on known ground truth using local and global348metrics where we demonstrate the superior performance and349advantages of our method over existing methods [28], [29], [30]. In350all the experiments, we set parameters as follows: � ¼ 1e-5; b ¼ 40;351g ¼ 1e3, u ¼ 1e2 and kth ¼ 0:006. Our method is insensitive to these352parameters. Varying �; g; u by factors of 0.1-1.0 or varying b or kth353by 50 percent from these settings did not change the result354significantly.

3554.1 Test Data

356The first six out of the 12 test sequences (I-VI) contain documents357with no fold lines, one fold line, two to three parallel fold lines, and358two to three crossing fold lines respectively. The other six sequences359(VII-XII) contain documentswith an increasing number of fold lines.360Irregular fold lines were intentionally added to make the rectifica-361tion more challenging. All documents were either placed on a pla-362nar or curved background surface. Sequence VII contains a363shopping receipt on a paper roll whereas II and VIII contain pages

Fig. 7. Distortion metrics for datasets shown in Fig. 6. Abbreviations are consistentwith Fig. 5 and the text.

Fig. 8. Comparison with Periollat et al.’s method [28].

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017 5

Page 6: Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member ...users.cecs.anu.edu.au/~u1018276/Downloads/ShaodiYou_Origami.pdf · Next, to unwrap the 49 reconstructed surface, we propose

IEEE P

roof

364 from a book. Sequences III, IV, IX and X contain letters foldedwithin365 envelopes. Sequence V, VI, XI and XII contain examples of docu-366 ments folded inside a purse or notebook. The input images as well367 as the results from our method are shown in Fig. 6. Our method368 does not rely on the content, formatting, layout or color of the docu-369 ment. Thus, it is generally applicable as long as a sufficient number370 of sparse keypoints in the input images are available for SfM.

371 4.2 Quantitative Evaluation Metrics

372 We quantitatively evaluate the global and local distortion between373 the ground truth digital image and our rectified result using local374 and global metrics. The digital version of six out of the 12 test375 documents were available. We treat those images as ground truth376 and resize them by setting their height to 1,000 pixels.377 Global Distortion Metric. We first register the rectified image to378 the ground truth by estimating a global affine transform T esti-379 mated using SIFT keypoint correspondences in these images [37]

T ¼a1 a2 t1a3 a4 t20 0 s

24

35; (20)

381381

382 This is achieved by minimizing the squared error

T� ¼ argminT

k Tp� p k22 : (21)

384384

385 where, p and p denote corresponding 2D keypoint positions using386 homogenous coordinates. We compute the global distortion metric387 G as follows:

G ¼ ða1a4 � a2a3Þ=s2G ¼ maxðG; 1=GÞ: (22)

389389

390 A perfect result has G ¼ 1; and larger values indicate more distor-391 tion (see comparative results in Fig. 7).392 Local Distortion Metric. After warping the rectified image with393 the affine transform T, we have removed the global distortion as394 well as the scaling, rotation, and translation of the rectified image.395 After that, we further evaluate the remaining local distortion. We396 register the resulting image with the ground truth using SIFT-flow397 [38] for dense registration. This flow map is used to compute the398 local distortion metric which cannot be removed by global affine399 warping. The frequency distribution of local displacements are400 shown in lower figure in Fig. 7 and compared with existing meth-401 ods. We found dense registration to be more useful for an unbiased402 evaluation than sparse SIFT keypoint-based registration because403 sparse methods are more likely to ignore many matches if the404 result contains large deformations.

4054.3 Comparison with Existing Methods

406We first compare with three methods [28], [29], [30] on various real407images and then use synthetically generated data to further com-408pare with the methods designed for dense 3D point data [29], [30].409Perriollat et al. [28]. Their method explicitly parameterizes410smooth rulers but cannot handle our document images with creases411and folds. Our method works fine on their dataset and produces a412more accurate result than the one obtained by running their code1

413(see Fig. 8). Although our result hasminor artifacts due to self-occlu-414sion and fore-shortening, the flattening result is quite accurate.415Brown et al. [29], Zhang et al. [30]. We compare to both methods416using our sequences where ground truth is available (Fig. 6). Since417they require 3D range data, we use our reconstructed surface as418their input and compare the surface flattening quality. We also419compare our ridge-aware reconstruction to the standard Poisson420reconstruction method. As shown in Fig. 7, the global and local dis-421tortion metrics introduced earlier are used in the evaluation. Our422method has higher accuracy in terms of both metrics. Results from423various methods have been compared in Fig. 5.424Evaluation on Synthetic Data.We compared ourmethodwith [29],425[30] on synthetically generated dense 3D points because these meth-426ods require dense 3D points. We vary the point cloud size from 2K427to 300 K (common in 3D range data) and inject varying levels of428Gaussian noise. The results from the threemethods are compared in429Fig. 9. These experiments show that with low noise and high point430density, all three methods are comparable in accuracy. However,431when the points are sparser or when the noise level is higher, our432method is more accurate than prior methods [29], [30].

4335 CONCLUSION AND FUTURE WORK

434In this paper, we propose a method for automatically rectifying

435curved or folded paper sheets from a small number of images cap-

436tured from different viewpoints. We use SfM to obtain sparse 3D

437points from images and propose ridge-aware surface reconstruc-

438tion method which utilizes the geometric property of developable

439surface for accurate and dense 3D reconstruction of paper sheets.

440We also robustify the algorithms using ‘1 optimization techniques.

441After recovering surface geometry, we unwrap the surface by

442adopting conformal mapping with both local and non-local con-

443straints in a robust estimation framework. In the future we will

Fig. 9. (a) Comparison of the global distortion metric between our method (top) and Zhang et al. [30] and Brown et al.[29] with varying point density and noise. Here lowervalues indicate higher accuracy. (b) Frequency distribution of local distortion metrics for the associated experiments. Our method is more accurate when input point aresparser or have more noise.

1. Their result shown here was generated by the original code provided bythe authors. These result do not agree with the results in their paper. This is prob-ably due to a difference in initialization.

6 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017

Page 7: Shaodi You, Member, IEEE 1 Yasuyuki Matsushita, Member ...users.cecs.anu.edu.au/~u1018276/Downloads/ShaodiYou_Origami.pdf · Next, to unwrap the 49 reconstructed surface, we propose

IEEE P

roof

444 address the correction of photometric inconsistencies in the docu-

445 ment image caused by shading under scene illumination.

446 REFERENCES

447 [1] M. Pilu, “Undoing paper curl distortion using applicable surfaces,” in Proc.448 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2001, pp. I-67–I-72.449 [2] G. Meng, Y. Wang, S. Qu, S. Xiang, and C. Pan, “Active flattening of curved450 document images via two structured beams,” in Proc. IEEE Conf. Comput.451 Vis. Pattern Recognit., 2014, pp. 3890–3897.452 [3] M. S. Brown andW. B. Seales, “Image restoration of arbitrarily warped doc-453 uments,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 10, pp. 1295–454 1306, Oct. 2004.455 [4] T. Wada, H. Ukida, and T. Matsuyama, “Shape from shading with456 interreflections under a proximal light source: Distortion-free copying457 of an unfolded book,” Int. J. Comput. Vis., vol. 24, no. 2, pp. 125–135,458 1997.459 [5] L. Zhang, A. M. Yip, M. S. Brown, and C. L. Tan, “A unified framework for460 document restoration using inpainting and shape-from-shading,” Pattern461 Recognit., vol. 42, no. 11, pp. 2961–2978, 2009.462 [6] Y.-C. Tsoi and M. S. Brown, “Multi-view document rectification using463 boundary,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1–8.464 [7] H. I. Koo, J. Kim, and N. I. Cho, “Composition of a dewarped and enhanced465 document image from two view images,” IEEE Trans. Image Process.,466 vol. 18, no. 7, pp. 1551–1562, Jul. 2009.467 [8] N. Stamatopoulos, B. Gatos, I. Pratikakis, and S. J. Perantonis, “Goal-468 oriented rectification of camera-based document images,” IEEE Trans.469 Image Process., vol. 20, no. 4, pp. 910–920, Apr. 2011.470 [9] Z. Zhang, X. Liang, and Y. Ma, “Unwrapping low-rank textures on471 generalized cylindrical surfaces,” in Proc. IEEE Int. Conf. Comput. Vis., 2011,472 pp. 1347–1354.473 [10] M. Kazhdan, M. Bolitho, and H. Hoppe, “Poisson surface reconstruction,”474 in Proc. 4th Eurographics Symp. Geometry Process., 2006, pp. 61–70.475 [11] B. L�evy, S. Petitjean, N. Ray, and J. Maillot, “Least squares conformal maps476 for automatic texture atlas generation,” ACM Trans. Graph., vol. 21,477 pp. 362–371, 2002.478 [12] Z. Zhang, C. Lim, and L. Fan, “Estimation of 3D shape of warped document479 surface for image restoration,” in Proc. 17th Int. Conf. Pattern Recognit., 2004,480 pp. 486–489.481 [13] C. L. Tan, L. Zhang, Z. Zhang, and T. Xia, “Restoring warped document482 images through 3D shape modeling,” IEEE Trans. Pattern Anal. Mach. Intell.,483 vol. 28, no. 2, pp. 195–208, Feb. 2006.484 [14] F. Courteille, A. Crouzil, J.-D. Durou, and P. Gurdjos, “Shape from shading485 for the digitization of curved documents,” Mach. Vis. Appl., vol. 18, no. 5,486 pp. 301–316, 2007.487 [15] Y.-C. Tsoi and M. S. Brown, “Geometric and shading correction for488 images of printed materials: A unified approach using boundary,”489 in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2004,490 pp. I-240–I-246.491 [16] A. Yamashita, A. Kawarago, T. Kaneko, and K. T. Miura, “Shape recon-492 struction and image restoration for non-flat surfaces of documents with a493 stereo vision system,” in Proc. 17th Int. Conf. Pattern Recognit., 2004,494 pp. 482–485.495 [17] H. Cao, X. Ding, and C. Liu, “A cylindrical surface model to rectify the496 bound document image,” in Proc. 9th IEEE Int. Conf. Comput. Vis., 2003,497 pp. 228–233.498 [18] H. Ezaki, S. Uchida, A. Asano, and H. Sakoe, “Dewarping of document499 image by global optimization,” in Proc. 8th Int. Conf. Document Anal. Recog-500 nit., 2005, pp. 302–306.501 [19] A. Ulges, C. H. Lampert, and T. M. Breuel, “Document image dewarping502 using robust estimation of curled text lines,” in Proc. 8th Int. Conf. Document503 Anal. Recognit., 2005, pp. 1001–1005.504 [20] S. Lu, B. M. Chen, and C. C. Ko, “A partition approach for the restoration of505 camera images of planar and curled document,” Image Vis. Comput., vol. 24,506 no. 8, pp. 837–848, 2006.507 [21] B. Fu, M. Wu, R. Li, W. Li, Z. Xu, and C. Yang, “A model-based book dew-508 arping method using text line detection,” in Proc. 2nd Int. Workshop Camera509 Based Document Anal. Recognit., 2007, pp. 63–70.510 [22] G. Meng, C. Pan, S. Xiang, J. Duan, and N. Zheng, “Metric rectification of511 curved document images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34,512 no. 4, pp. 707–722, Apr. 2012.513 [23] C. Liu, Y. Zhang, B. Wang, and X. Ding, “Restoring camera-captured dis-514 torted document images,” Int. J. Document Anal. Recognit., vol. 18, no. 2,515 pp. 111–124, 2014.516 [24] B. S. Kim, H. I. Koo, and N. I. Cho, “Document dewarping via text-line517 based optimization,” Pattern Recognit., vol. 48, pp. 3600–3614, 2015.518 [25] D. Salvi, K. Zheng, Y. Zhou, and S. Wang, “Distance transform based active519 contour approach for document image rectification,” in Proc. IEEE Winter520 Conf. Appl. Comput. Vis., 2015, pp. 757–764.521 [26] J. Liang, D. DeMenthon, and D. Doermann, “Geometric rectification of522 camera-captured document images,” IEEE Trans. Pattern Anal. Mach. Intell.,523 vol. 30, no. 4, pp. 591–605, Apr. 2008.524 [27] Y. Tian and S. G. Narasimhan, “Rectification and 3D reconstruction of525 curved document images,” in Proc. IEEE Comput. Soc. Conf. Comput.526 Vis. Pattern Recognit., 2011, pp. 377–384.

527[28] M. Perriollat and A. Bartoli, “A computational model of bounded develop-528able surfaces with application to image-based three-dimensional529reconstruction,” Comput. Animation Virtual Worlds, vol. 24, no. 5, pp. 459–530476, 2013.531[29] M. S. Brown, M. Sun, R. Yang, L. Yun, and W. B. Seales, “Restoring 2D532content from distorted documents,” IEEE Trans. Pattern Anal. Mach. Intell.,533vol. 29, no. 11, pp. 1904–1916, Nov. 2007.534[30] L. Zhang, Y. Zhang, and C. L. Tan, “An improved physically-based method535for geometric restoration of distorted document images,” IEEE Trans. Pat-536tern Anal. Mach. Intell., vol. 30, no. 4, pp. 728–734, Apr. 2008.537[31] H. Avron, A. Sharf, C. Greif, and D. Cohen-Or, “1-sparse reconstruction of538sharp point set surfaces,” ACM Trans. Graph., vol. 29, 2010, Art. no. 135.539[32] A. C. €Oztireli, G. Guennebaud, and M. Gross, “Feature preserving point set540surfaces based on non-linear kernel regression,” Comput. Graph. Forum,541vol. 28, no. 2, pp. 493–501, 2009.542[33] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” J. Roy.543Statist. Soc. Series B (Methodological), vol. 58, pp. 267–288, 1996.544[34] E. J. Candes, M. B. Wakin, and S. P. Boyd, “Enhancing sparsity by545reweighted ‘1 minimization,” J. Fourier Anal. Appl., vol. 14, no. 5/6,546pp. 877–905, 2008.547[35] E. Portnoy, “Developable surfaces in hyperbolic space,” Pacific J. Math.,548vol. 57, no. 1, pp. 281–288, 1975.549[36] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision,5502nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2004.551[37] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,”552Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 39, NO. X, XXXXX 2017 7