
Processing Video Frames in iPad for Augmented Reality Applications

Cesar David Corona Arzola and Luis Gerardo de la Fraga
Cinvestav, Computer Science Department
Av. IPN 2508, 07360 Mexico, D.F., Mexico
E-mail: [email protected]

Abstract—In this work we document the image processing steps necessary to recognize fiducial markers for augmented reality applications. Although we use ARToolKitPlus, we believe that neither the library nor the related articles document all the details of how it works. We created a new simplified library for the iPad, and it is now publicly available. Besides, we set out some criteria for the development of augmented reality systems on the iPad, a device restricted in both computing power and memory. The iPad is a tablet with a very intuitive use and has the advantage of incorporating a display screen of a proper size for augmented reality applications, a camera, and the sensors necessary to interact with it: a touch screen, a gyroscope and an accelerometer. We present three applications created with this library: (1) a booklet of species for children, where the model of each species appears virtually on a marker; (2) a Japanese language learning application based on cards, where the meaning of the symbols is shown using virtual models on a marker; and (3) a virtual ball on a maze, which additionally uses the iPad's gyroscope and accelerometer.

Keywords—Augmented reality, image processing, fiducial marker, marker detection, principal component analysis

I. INTRODUCTION

In general terms, a fiducial marker is an object placed on a scene captured by an image processing system, and it is used as a point of reference or as a measure of some kind. In this sense, those markers are called fiducial because we can surely trust the information given by them. Also, these markers were designed to be easily recognized by computer algorithms [1].

In an augmented reality environment, fiducial markers based on a square contour are commonly used because the algorithms to recognize them are quite simple and no special sensor is needed to identify them. Besides, mobile devices employ these markers for pose detection and tracking of an object, since GPS does not perform well indoors [2].

We chose to work with two types of markers: template markers and id simple markers. Template markers are basically white square figures with a thick black border and a black image pattern in the center. This pattern can be freely designed; however, template markers have some inconveniences: first of all, during template matching the pattern must be compared with all other patterns stored in memory, so the more patterns are used in the scene, the slower the implementation becomes. Also, the patterns must be designed, and the system must be trained, before we can use them. Finally, pattern complexity affects performance. An example of a template marker is shown in Figure 1(a).

(a) Template marker (b) Id simple marker

Figure 1. Two different types of fiducial markers

On the other hand, id simple markers (see Figure 1(b)) can be used in an identifier-aware environment in which an ID number is bit-coded inside the marker. The detection process of these markers is faster than that of template markers because they do not need to be compared with other patterns. Also, these markers need not be stored in memory, because any valid marker is implicitly known to the system [3], and we can choose any marker from a fixed set instead of providing the system with marker images.

During our search for fiducial marker recognition processes, we repeatedly found code fragments of ARToolKit [4], [5]. ARToolKit is a free library, available at [6], but there are practically no useful comments inside its source code about how the library works internally. The articles [4], [5] do not give any useful explanation about how it works either.

Wagner and Schmalstieg created ARToolKitPlus by adapting ARToolKit to mobile devices. Their source code is simpler than ARToolKit's, but in some of their comments they admit that even they do not understand how some of ARToolKit's functions work.

The ARToolKitPlus library is big, with more than 80 source files, and it considers too many types of mobile devices, even some that are not used anymore (like old-style HTC mobile phones and PDAs, especially those with fixed-point units). We took this library and simplified it to be used with the iPad; e.g., we only consider one pixel format, in this case BGRA.

This article is organized as follows: in the next section we describe all the image processing steps necessary to recognize a marker; Sec. III details how video frames are processed on the iPad; in Sec. IV the results of the three applications that use our library are shown; finally, in Sec. V the conclusions of this work are given.

II. RECOGNIZING MARKERS

A marker in an image is detected by image processing techniques, and once it is detected we can extract valuable information and store it in a proper structure like the one suggested in [7]. The steps in the detection process are: (1) segmentation; (2) extraction and labeling of all connected components; (3) finding a first vertex of the marker and then the other three vertices; (4) fitting a line, with principal component analysis (PCA), to the points belonging to each edge of the marker; (5) extracting the marker's central image. All five steps will be described in detail.
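For illustration, a minimal C++ container for the data recovered per marker could look like the sketch below; the field names are our own invention, not the exact structure of [7] nor ARToolKitPlus's internal one.

```cpp
#include <array>

// Minimal container for the data recovered from one detected marker.
// Illustrative only: field names are ours, not those of [7] or ARToolKitPlus.
struct Point2f { float x, y; };

struct MarkerInfo {
    int id;                           // decoded identifier (id simple markers)
    int rotation;                     // orientation, in quarter turns: 0..3
    float confidence;                 // reliability of the decoded pattern
    std::array<Point2f, 4> vertices;  // the four corners after PCA line fitting
    Point2f center;                   // centroid; origin of the AR coordinate frame
};
```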

The segmentation step consists of converting the input color image to a gray-tone one and applying a global threshold level to binarize the gray-tone image. In the current implementation the threshold level is calculated adaptively. The conversion to gray tone is the simplest one: (r + g + b)/3.
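As a minimal sketch of this step (our own code, not the library's; the library computes the threshold adaptively, here it is simply passed in), the gray conversion and binarization of a BGRA frame can be written as:

```cpp
#include <cstdint>
#include <vector>

// Convert a BGRA frame to gray with (r + g + b) / 3 and binarize it against
// a global threshold. A minimal sketch; ARToolKitPlus computes the threshold
// adaptively, here it is an input parameter.
std::vector<uint8_t> binarize(const uint8_t* bgra, int width, int height,
                              uint8_t threshold) {
    std::vector<uint8_t> binary(static_cast<size_t>(width) * height);
    for (int i = 0; i < width * height; ++i) {
        const uint8_t b = bgra[4 * i + 0];
        const uint8_t g = bgra[4 * i + 1];
        const uint8_t r = bgra[4 * i + 2];  // byte 3 is alpha, ignored
        const uint8_t gray = static_cast<uint8_t>((r + g + b) / 3);
        binary[i] = (gray < threshold) ? 1 : 0;  // 1 = dark (marker) pixel
    }
    return binary;
}
```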

Step (2), extracting and labeling all connected components, can be performed with the form-extraction algorithm in [8].
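We do not reproduce the algorithm of [8] here; as a stand-in, a generic flood-fill labeling of the binary image can be sketched as follows:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Label 4-connected components of dark pixels. A generic BFS flood fill,
// not the form-extraction algorithm of [8]; labels start at 1, background is 0.
std::vector<int> labelComponents(const std::vector<uint8_t>& binary,
                                 int width, int height) {
    std::vector<int> labels(binary.size(), 0);
    int nextLabel = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int start = y * width + x;
            if (binary[start] == 0 || labels[start] != 0) continue;
            ++nextLabel;                       // new component found
            std::queue<int> frontier;
            frontier.push(start);
            labels[start] = nextLabel;
            while (!frontier.empty()) {
                const int p = frontier.front();
                frontier.pop();
                const int px = p % width, py = p / width;
                const int dx[] = {1, -1, 0, 0}, dy[] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {  // visit the 4 neighbors
                    const int nx = px + dx[k], ny = py + dy[k];
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
                    const int n = ny * width + nx;
                    if (binary[n] == 1 && labels[n] == 0) {
                        labels[n] = nextLabel;
                        frontier.push(n);
                    }
                }
            }
        }
    }
    return labels;
}
```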

The next steps, (3) and (4), will be described in the following subsections.

A. Find the vertices of a marker

To perform this task, first the contour pixels of the biggest extracted and labeled form must be obtained. Algorithm 1 obtains all the contour pixels of the marker: given a starting point on the contour, it moves clockwise around the form's pixels to find a chain of vertices that represents the contour of the projected square marker.

Algorithm 1 Pseudocode for detecting a contour.
Require: The biggest labeled form. This is the marker.
Ensure: The contour of the marker.
 1: Make a linear search from left to right in order to find the first pixel on the marker's contour (V_start).
 2: P_current ← V_start
 3: P′_current ← pixel above P_current
 4: while P′_current is not black do   ▷ pixel with label 0
 5:     P′_current ← next pixel one step around
 6: end while
 7: P_current ← black pixel found
 8: Add pixel to the chain of the contour.
 9: if P_current = V_start then
10:     return
11: else
12:     P′_current ← last pixel added to the chain
13:     Step forward clockwise over P_current
14:     Go back to line 4
15: end if

Once the contour is obtained, some of those pixels are the corners (or vertices) of the figure, and since we do not know which pixels belong to the corners, we perform the following [9]:

• Starting at the first point in the contour, identify the point that is farthest from it; this will be the first vertex. The second corner is either at the opposite corner of the projected square marker or directly connected to the first vertex found.

• Then, to find the remaining corners, we maximize the length of two perpendicular lines between the two opposite corners previously found, through the start and end point of each chain segment; the vertex of the chain that maximizes this length (above a threshold) will be the third corner.

• But if the first and second corners are directly connected on the same line, we have to create an intermediate point on the parallel line of the marker and then apply the previous step.

• These last two steps are applied to the chain of vertices recursively to find all the possible corners in the contour.

These steps are detailed in Algorithm 2.

Algorithm 2 Pseudocode for detecting the corners of a contour.
Require: The coordinates of the vertices of a contour, and a start and end point between which we search for a corner.
Ensure: The corners of a contour.
 1: a ← Y_end − Y_start
 2: b ← X_start − X_end
 3: c ← (X_end × Y_start) − (Y_end × X_start)
 4: maxDistance ← 0
 5: for i = start to end do
 6:     d ← a × X_i + b × Y_i + c
 7:     if d × d > maxDistance then
 8:         maxDistance ← d × d
 9:         v ← i
10:     end if
11: end for
12: if maxDistance/(a² + b²) > threshold then
13:     if get_vertex(X, Y, start, v) < 0 then return −1
14:     end if
15:     if totalVertices > 5 then return −1
16:     end if
17:     vertices[totalVertices] ← v
18:     totalVertices ← totalVertices + 1
19:     if get_vertex(X, Y, v, end) < 0 then return −1
20:     end if
21: end if
22: return 0
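The core of Algorithm 2, lines 1 to 12, is the classic farthest-point-from-a-chord test; a minimal C++ sketch (the helper name and signature are ours):

```cpp
#include <vector>

struct Pixel { int x, y; };

// Find the contour point between indices `start` and `end` that is farthest
// from the line through contour[start] and contour[end] (lines 1-11 of
// Algorithm 2). Returns the index, and writes the squared distance divided
// by (a^2 + b^2) so the caller can compare it against the threshold.
int farthestFromChord(const std::vector<Pixel>& contour, int start, int end,
                      double& normalizedDist) {
    // Implicit line a*x + b*y + c = 0 through the two chord endpoints.
    const double a = contour[end].y - contour[start].y;
    const double b = contour[start].x - contour[end].x;
    const double c = double(contour[end].x) * contour[start].y
                   - double(contour[end].y) * contour[start].x;
    double maxDistance = 0.0;
    int v = start;
    for (int i = start; i <= end; ++i) {
        const double d = a * contour[i].x + b * contour[i].y + c;
        if (d * d > maxDistance) {  // compare squared distances, no sqrt needed
            maxDistance = d * d;
            v = i;
        }
    }
    normalizedDist = maxDistance / (a * a + b * b);  // line 12 of Algorithm 2
    return v;
}
```

The caller then recurses on the sub-chains [start, v] and [v, end] while the normalized distance stays above the threshold, rejecting the contour once more than five candidate corners accumulate, as lines 13 to 21 of the pseudocode do.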

As we can see in Algorithm 2, at the same time that we find corners, we check whether the figure is still a square (line 15) and, if not, the whole marker detection procedure stops. After the corner detection process finishes, the following steps are performed:

• We take the lines formed between each pair of corners of the square and fit them by a linear regression method, in this case using PCA. The fitting process is applied to each line of the square and, because of that, we obtain the square corners with a higher confidence level.

• Finally, the pattern of the marker (a 2D rectified image) is obtained using the border width and considering the original pattern size.


Figure 2. Two examples of fitting a set of points to a line using PCA. The arrows start at the mean of the data and their drawn length is twice the real one.

B. PCA for a 2×2 matrix

It is possible to use an optimized procedure to calculate PCA, like the one described in [10] that uses the QR decomposition method. It is worth mentioning that in [10] the authors report results with data of 5,000 to 10,000 dimensions and 3,000 data samples. But fitting a set of 2D points to a line is only a bidimensional problem that can be solved without any special numerical function. We created our own procedure (Algorithm 3) to calculate the PCA of a set of n points, stored in a matrix P of size n×2. PCA calculates the direction of the biggest variation of the points, as shown in Fig. 2.

Algorithm 3 PCA pseudocode for a set of 2D points.
Require: Input set of points stored in a matrix P of size n×2.
Ensure: The eigenvector corresponding to the biggest eigenvalue.
 1: mean_x = mean(P(:,1))   ▷ the mean of the x values
 2: mean_y = mean(P(:,2))   ▷ the mean of the y values
 3: P(:,1) = P(:,1) − mean_x
 4: P(:,2) = P(:,2) − mean_y
 5: a = 0, b = 0, c = 0
 6: for i = 1 to n do   ▷ calculates M = PᵀP incrementally
 7:     a = a + P(i,1) · P(i,1)   ▷ x²
 8:     b = b + P(i,1) · P(i,2)   ▷ xy
 9:     c = c + P(i,2) · P(i,2)   ▷ y²
10: end for
11: v = √((a − c)² + 4b²)
12: λ_min = ((c + a) − v)/2;  λ_max = ((c + a) + v)/2   ▷ eigenvalues
13: A = M − λ_min I
14: δ = 8 λ_max DBL_EPSILON
15: if norm(A(1,:)) < δ then
16:     r = [0; 1]
17: else if norm(A(2,:)) < δ then
18:     r = [1; 0]
19: else
20:     r = A(1,:)/norm(A(1,:))
21: end if
return vector r   ▷ eigenvector r associated to λ_max

In line 6 of Algorithm 3, the matrix M is formed as

    M = | a  b |
        | b  c |

The two eigenvalues are calculated by solving det(M − λI) = 0, and the eigenvector is calculated with the Cayley-Hamilton theorem. Algorithm 3 is used in the calculation of the vectors in Fig. 2.
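A direct C++ transcription of Algorithm 3 could be the following sketch (our own code; it assumes at least two non-coincident points):

```cpp
#include <cfloat>
#include <cmath>
#include <vector>

struct Point2d { double x, y; };

// Direction of largest variation of a 2D point set: the eigenvector of the
// centered scatter matrix M = P^T P for the largest eigenvalue, computed in
// closed form as in Algorithm 3 via the Cayley-Hamilton theorem.
Point2d principalDirection(const std::vector<Point2d>& pts) {
    // Center the points on their mean (lines 1-4).
    double mx = 0.0, my = 0.0;
    for (const Point2d& p : pts) { mx += p.x; my += p.y; }
    mx /= pts.size(); my /= pts.size();
    // Accumulate the 2x2 scatter matrix M = [a b; b c] (lines 5-10).
    double a = 0.0, b = 0.0, c = 0.0;
    for (const Point2d& p : pts) {
        const double x = p.x - mx, y = p.y - my;
        a += x * x; b += x * y; c += y * y;
    }
    // Closed-form eigenvalues of the symmetric 2x2 matrix (lines 11-12).
    const double v = std::sqrt((a - c) * (a - c) + 4.0 * b * b);
    const double lmin = ((c + a) - v) / 2.0;
    const double lmax = ((c + a) + v) / 2.0;
    // Rows of A = M - lmin*I; by Cayley-Hamilton a nonzero row of A is
    // proportional to the eigenvector of lmax (lines 13-21).
    const double r1x = a - lmin, r1y = b;        // first row of A
    const double r2x = b,        r2y = c - lmin; // second row of A
    const double eps = 8.0 * lmax * DBL_EPSILON;
    const double n1 = std::hypot(r1x, r1y);
    if (n1 < eps) return {0.0, 1.0};             // degenerate: M is diagonal
    if (std::hypot(r2x, r2y) < eps) return {1.0, 0.0};
    return {r1x / n1, r1y / n1};
}
```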

C. Last step

In the template markers method, the template matching process begins and then the identifier is obtained, together with the marker's orientation and a confidence level. In order to perform template matching, the system must be previously trained.

In the case of Id markers, once we get the pattern from inside the marker, we follow these steps (a code sketch follows the list):

• Bitmasks are applied in order to extract the necessary information about the marker (id, orientation and confidence value);

• the pattern is rotated 90° four times (because there are four possible orientations) and each time we get and store the id, orientation and confidence value;

• finally, we compare all confidence values and choose the reading with the most reliable result.
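The exact bit layout of the simple id markers is not reproduced here; the sketch below only illustrates the rotate-four-times-and-keep-the-best-reading structure of these steps, with the 6×6 pattern size and the decode callable standing in, as assumptions, for the library's real bitmask logic.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Rotate a 6x6 bit pattern by 90 degrees clockwise: (x, y) -> (5 - y, x).
// The 6x6 size is an assumption for illustration.
std::array<uint8_t, 36> rotate90(const std::array<uint8_t, 36>& bits) {
    std::array<uint8_t, 36> out{};
    for (int y = 0; y < 6; ++y)
        for (int x = 0; x < 6; ++x)
            out[x * 6 + (5 - y)] = bits[y * 6 + x];
    return out;
}

// Try the four possible orientations and keep the most confident reading.
// `decode` stands in for the library's bitmask logic: it must map a pattern
// to an (id, confidence) pair.
template <typename Decoder>
void decodeMarker(std::array<uint8_t, 36> bits, Decoder decode,
                  int& bestId, int& bestRotation, float& bestConfidence) {
    bestConfidence = -1.0f;
    for (int rot = 0; rot < 4; ++rot) {
        const std::pair<int, float> reading = decode(bits);
        if (reading.second > bestConfidence) {  // keep the most reliable result
            bestId = reading.first;
            bestRotation = rot;
            bestConfidence = reading.second;
        }
        bits = rotate90(bits);  // next of the four possible orientations
    }
}
```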

III. VIDEO FRAMES PROCESSING

First of all, we have to know that the iPad can store image data using the BGRA pixel format, and this format is well supported by Core Video (an Objective-C framework that allows us to access image data from the camera). So when you set the video settings on an AVCaptureSession, you must make sure to establish the pixel format using kCVPixelFormatType_32BGRA. Core Video does not provide support for all formats; that is why we only need to consider this format in ARToolKitPlus's arGetPatt function, and it is not necessary to support other formats like RGBA or RGB565.
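Since frames requested with kCVPixelFormatType_32BGRA arrive as CVPixelBuffer objects and Core Video exposes a plain C API for them, the capture callback can hand the raw BGRA bytes straight to the C++ library. A minimal sketch, meant to be compiled as Objective-C++ inside the AVCaptureSession sample-buffer delegate; detectMarkers is a hypothetical name for our detector's entry point:

```cpp
#include <CoreVideo/CoreVideo.h>
#include <cstdint>

// Hypothetical entry point of our detector; the name is ours, for illustration.
void detectMarkers(const uint8_t* bgra, size_t width, size_t height,
                   size_t bytesPerRow);

// Hand one captured BGRA frame to the C++ detection code.
void processFrame(CVPixelBufferRef pixelBuffer) {
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLockFlagReadOnly);
    const uint8_t* base =
        static_cast<const uint8_t*>(CVPixelBufferGetBaseAddress(pixelBuffer));
    const size_t width = CVPixelBufferGetWidth(pixelBuffer);
    const size_t height = CVPixelBufferGetHeight(pixelBuffer);
    const size_t rowBytes = CVPixelBufferGetBytesPerRow(pixelBuffer);
    detectMarkers(base, width, height, rowBytes);  // rows may be padded
    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLockFlagReadOnly);
}
```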

We do not need any support for fixed-point devices, so all the functions specially built for those devices are not required. The exclusive ARFloat type, for handling fixed- and floating-point operations at the same time, was suppressed, and the byteSwap class as well. Likewise, there is no need for a memory manager nor a Logger class for debugging.

Because of these modifications, the image labeling function shrinks considerably, since there is no need for any optimizations or special considerations in the labeling due to the pixel format used.

ARToolKitPlus implements a special series of classes focused on handling the Camera, and offers the possibility of instantiating more than one object of this type for multi-marker tracking. In our applications we only need to track one object at a time; that is why all the classes designed to support multi-marker tracking, and the Camera classes, were removed. It is important, however, to still keep a class to store and manage the calibration matrix (K), and to preserve Camera's changeFrameSize function, because it allows us to support more devices that use iOS.

We do not need a config class, because the iPad's performance allows it to support all the default configuration values, even when this means more processing. Also, we decided to use the functions that exploit information from previous frames, to avoid jittering when detecting template markers.

On the other hand, as we use only Id simple markers in some of our applications, there is no support for BCH Id markers.


For camera calibration, ARToolKitPlus supports two methods: the original ARToolKit calibration process described by Kato [11], and a newer one based on the GML MatLab Camera Calibration Toolbox [12] that uses a lookup table to determine lens undistortion. We removed the lookup table processes and found no problems in object tracking.

Finally, ARToolKitPlus proposes a robust planar pose tracking algorithm, but since this algorithm is focused on improving desktop applications' performance, we do not need it.

IV. RESULTS

Each application uses a single marker on the screen, but uses several markers internally to identify the correspondence between each marker and a single card. The marker is also the reference coordinate system for the augmented 3D objects; therefore, if one moves the marker (or the iPad), the 3D object moves following the marker. The applications made with these modifications of ARToolKitPlus are:

Booklet of species: the user has a little booklet with some information about animals. Every page has the species information on the left side and a template marker on the right. When the iPad looks at the booklet, it draws a 3D model of the animal on the preview layer of the camera, as in the example shown in Figure 3.

Figure 3. Application: Booklet of species.

By touching the screen, the user can see a little menu with options to rotate or scale the 3D model, or simply to read a bit of information about the animal. The target users for this application are children, but an application such as this one could also be useful to describe or show a new product.

Kanjirama: an application for Japanese language students to learn Japanese kanji. It is a game based on the work of Wagner and Barakonyi [13]: a concentration game with a deck of 27 augmented reality cards, which have a kanji symbol on one side and an Id simple marker on the other. Initially, all the cards are face down on a table; when the game starts, an image to guess appears at the right corner of the screen, and the user must turn face up the card whose meaning matches the image displayed on the screen.

When the iPad looks at the markers, a 3D model is rendered on the preview layer of the camera, and if the user turns the right card face up, the score increases. There is also a timer which shows the time left to turn up a card.

Figure 4. Application: Kanjirama.

Whenever the user turns a card face up, the kanji information is displayed on the screen. An example of Kanjirama is shown in Figure 4. The game ends after ten rounds.

Labyrinth: the last application developed simulates the movement of a ball in a labyrinth using the iPad's accelerometer and gyroscope (Figure 5). It uses an augmented reality card; when the marker is detected, the iPad renders both the labyrinth and the ball on the screen.

In this case, the gyroscope gives the direction in which to move the ball, and the accelerometer is used to calculate the correct orientation of the device, to make the right mapping between the coordinates of the virtual space and the coordinates of the 2D image. We use 10 different labyrinth images.

Figure 5. Application: Labyrinth.

V. CONCLUSIONS

With our own lite version of the original ARToolKitPlus library we are able to develop applications with an educational approach that fit the requirements of a restricted environment such as the iPad's, although not as limited as the environment of the old mobile devices.

Using the original conditions for camera calibration, and using information from previous frames to obtain the transformation matrix, we obtain the same results while tracking an object; i.e., markers can be detected from farther away, and Id simple markers remain preferable to template markers.

Moreover, it is truly important to preserve the linear regression method in order to detect the marker even if the paper of an augmented reality card folds a little. It is important to know the meaning of these processes; that is why we have covered all the steps that ARToolKitPlus performs internally.

As future work, it would be necessary to compare the performance of several libraries and to measure how many markers could be used at the same time on the iPad screen by our library.

Finally, we make publicly available this lite version of ARToolKitPlus (at http://cs.cinvestav.mx/~fraga/Softwarelibre/VisioniPad.zip), which fits iOS devices best when augmented reality applications focus on single marker detection.

REFERENCES

[1] Mohamed Ashraf Nassar and Fatma Meawad. An Augmented Reality Exhibition Guide for the iPhone. In 2010 International Conference on User Science and Engineering (i-USEr).

[2] Michael Gervautz and Dieter Schmalstieg. Anywhere Interfaces Using Handheld Augmented Reality. IEEE Computer, 45(7):26–31, July 2012.

[3] Daniel Wagner and Dieter Schmalstieg. ARToolKitPlus for Pose Tracking on Mobile Devices. Technical report, Institute for Computer Graphics and Vision, Graz University of Technology, February 2007.

[4] S. Prince, A. D. Cheok, F. Farbiz, T. Williamson, N. Johnson, M. Billinghurst, and H. Kato. 3D live: real time captured content for mixed reality. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR 2002), pages 7–317, 2002.

[5] S. Prince, A. D. Cheok, F. Farbiz, T. Williamson, N. Johnson, M. Billinghurst, and H. Kato. 3D live: real time interaction for mixed reality. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW 2002), pages 364–371, 2002.

[6] ARToolKit's official site. http://www.hitl.washington.edu/artoolkit/. Checked January 26th, 2013.

[7] Francisco Jurado, Javier A. Albusac, José J. Castro, David Vallejo, Luis Jiménez, Félix J. Villanueva, David Villa, Carlos González, and Guillermo Simmross. Desarrollo de Videojuegos: Desarrollo de Componentes. Bubok.

[8] J. Cornejo Herrera, A. Lara López, R. Landa Becerra, and L. G. de la Fraga. Scimagen: An image processing library (in Spanish). In VIII Conferencia de Ingeniería Eléctrica, September 4–6, 2002. Cinvestav. Available at: http://cs.cinvestav.mx/~fraga/Publicaciones/bibliotecapdi.pdf.gz.

[9] Dismantling ARToolKit, part 3: finding vertices. http://chriskirkham.co.uk/2011/07/14/dismantling-artoolkit-part-3-locating-vertices/.

[10] Alok Sharma, Kuldip K. Paliwal, Seiya Imoto, and Satoru Miyano. Principal component analysis using QR decomposition. International Journal of Machine Learning and Cybernetics, pages 1–5, 2012.

[11] Mark Billinghurst, Hirokazu Kato, Suzanne Weghorst, and Tom Furness. A Mixed Reality 3D Conferencing Application. Technical report, Human Interface Technology Laboratory, University of Washington, Seattle, 1999. http://www.hitl.washington.edu/publications/r-99-1/.

[12] GML MatLab Camera Calibration Toolbox. http://research.graphicon.ru/calibration/gml-matlab-camera-calibration-toolbox.html.

[13] Daniel Wagner and Istvan Barakonyi. Augmented Reality Kanji Learning. In Proceedings of the Second IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2003), pages 335–336, October 2003.
