

  • Object Tracking with Iphone 3Gs

    Lars Alin

    May 25, 2010
    Master’s Thesis in Computing Science, 30 credits

    Supervisor at CS-UmU: Ola Ågren
    Examiner: Per Lindström

    Umeå University
    Department of Computing Science

    SE-901 87 UMEÅ
    SWEDEN

  • Abstract

    In June of 2007 Apple Inc. released the smartphone Iphone. It was a groundbreaking success that set a new standard for what a smartphone should be able to do. Apple has improved the Iphone every year since then, and the 3Gs is the newest Iphone model. As the phones have improved, in both hardware and software, the applications have improved as well. The Iphone 3Gs makes it possible to use the camera as an application background and thereby to analyze the surroundings, so that objects the phone is pointed towards can be tracked.

    This thesis examines how object tracking can be implemented in applications for the Iphone 3Gs and provides a survey of four different areas of use that have been implemented in Xcode: an augmented reality car game, a letter tracking application, a face recognition application and an object recognition application.


  • Contents

    1 Introduction
      1.1 Task
      1.2 Iphone 3Gs
      1.3 Augmented Reality and Tracking
      1.4 Outline of the Thesis

    2 Problem Description
      2.1 Problem Statement
      2.2 Purposes
      2.3 Methods
      2.4 Related Work

    3 Tracking in Handheld Devices
      3.1 Introduction
      3.2 Algorithms
        3.2.1 Markertracking
        3.2.2 Edge detection
        3.2.3 Mean-shift algorithm
        3.2.4 Parallel Tracking and Mapping
      3.3 Fields of interest

    4 Accomplishment
      4.1 Preliminaries
      4.2 How the Work was done
        4.2.1 The Preparing and Designing phase
        4.2.2 Early Development phase
        4.2.3 Development of Tracking
        4.2.4 Completion
      4.3 Conclusions

    5 Results
      5.1 Tracking
      5.2 Augmented Reality Car Game
        5.2.1 Icon and Menus
        5.2.2 Game Mode
      5.3 Object Tracking
        5.3.1 Icon and Menus
        5.3.2 Object recognition
        5.3.3 Letter recognition
        5.3.4 Face recognition

    6 Conclusions
      6.1 Limitations
      6.2 Future Work

    7 Acknowledgements

    References

    A Concept sketches

    B Lo-Fi

    C Interactive prototypes

  • List of Figures

    1.1 Different companies' share of the smartphone market worldwide in percent
    1.2 The front and the camera on the back of an Iphone 3Gs

    3.1 An Iphone using a mean-shift algorithm tracking an orange object
    3.2 An early marker used by the ARToolKit
    3.3 An illustration of how camera angle and marker angle are mapped
    3.4 Edge detection on calculator and pen
    3.5 Histogram of a normalized colorspace
    3.6 Parallel tracking and mapping

    4.1 Preliminary time chart on the project

    5.1 Cargame icon
    5.2 Cargame splashscreen
    5.3 Screenshot from the game played on a desk at North Kingdom
    5.4 Icon to the application: What?What?
    5.5 Splashscreen and menu of the application
    5.6 Tracking of the Apple logo on a MacBook
    5.7 The letter tracking function in progress
    5.8 Face tracking in progress

    A.1 Concept sketch on the car game, before it was reduced to 2D
    A.2 Concept sketch on the letter reading application

    B.1 Lo-fi sketches on possible ways to steer the car
    B.2 Lo-fi sketches on possible ways to steer the car

    C.1 HiFi prototype to test the usability of a spinning steering wheel with gas and brake pedals
    C.2 HiFi prototype to test the usability of a steering cross

  • Chapter 1

    Introduction

    With the technical progress of smartphones today, designers and developers of software for these devices strive to push the limits of what is possible to create. One field in which this technical progress is essential is augmented reality. Augmented reality (AR) is a term for merging computer-generated material into the physical world in real time, see section 1.3.

    AR applications can be created in two ways. In the first, the application does not consider its surroundings and merges digital objects into the view irrespective of what is displayed in the physical world. In the second, the application reacts to what is displayed in the physical world and merges it with appropriate digital objects. In order to match the digital objects with physical objects, the physical world has to be analyzed. This is where object tracking comes into play, and this is the kind of augmented reality that this thesis is built around. Tracking makes it possible for the computer, which in this thesis is the Iphone 3Gs, to find and identify objects and then react to different situations.

    There are many reasons for choosing the Iphone 3Gs. First of all, it contains all the hardware needed to run these kinds of applications, and it benefits from the hype and continuous growth of the smartphone market, see section 1.2. In just a couple of years the Iphone has taken approximately 17.8 percent of the worldwide smartphone market, as of the third quarter of 2009 (Figure 1.1).

    Figure 1.1: Different companies' share of the smartphone market worldwide in percent [1]


    1.1 Task

    The task is to investigate how far it is possible to take AR and object tracking with the Iphone 3Gs. The main goal is to produce software that demonstrates the possibilities, in the form of an augmented reality car game and a tracking application that both tracks objects and performs some recognition of what is tracked.

    The work is carried out in collaboration with North Kingdom, a digital creative agency from Sweden with its main offices in Stockholm and Skellefteå. North Kingdom provides clients with digital media through innovative digital storytelling [2]. The assignment provided by North Kingdom is currently outside their ordinary area of work but, as they strive to be at the leading edge of development, investigations like this one are essential for pushing the limits of what they can offer their clients.

    1.2 Iphone 3Gs

    Iphone 3Gs is the third version of Apple's praised smartphone.

    Figure 1.2: The front and the camera on the back of an Iphone 3Gs [3]


    It contains features such as a touch screen, voice control, accelerometers, a proximity sensor, an ambient light sensor, Wi-Fi, a digital compass, GPS and more. When using it as a tool for augmented reality and tracking, the Iphone's key features and limitations are:

    – Camera. The smartphone is equipped with a 3 megapixel camera with autofocus and a frame rate of 30 frames per second [3]. A limitation is that no flash is provided, which limits the use to areas that are already well lit.

    – 3.5 inch multi-touch display. The screen has a resolution of 480 by 320 pixels and enables the user to easily interact with the phone [3]. Because of the widescreen format, the camera view has to be scaled up by a factor of 1.3 in order to fill the whole screen.

    – Processor. The Iphone 3Gs has a 600 MHz CPU and 256 MB of RAM, which contribute to a fast and powerful handheld device [4].

    – Iphone SDK. At the time of writing, the newest version of the Iphone SDK is 3.1.2 [3]. This update makes it possible to print (capture) the screen and then analyze the resulting screenshot. A major limitation of this version of the SDK is that it is impossible to access the raw data stream from the camera, either through the SDK or through any workaround supported by Apple.

    1.3 Augmented Reality and Tracking

    Although this thesis focuses on the tracking part of augmented reality, there is a need to go a little deeper into what AR actually is. The term augmented reality was coined in the 1990s and, as stated in the introduction, the most commonly used description is that AR is digital objects merged into the physical world [5]. The technology is traditionally used to enhance the physical world, providing the user with information and assistance in the field where it is used [5].

    Some fields where AR has been implemented throughout the years:

    – Military. Aircraft pilots use head-mounted displays to help with navigation [5]. Surveys have also been done regarding the use of AR in military operations in urban terrain [6].

    – Healthcare. AR is used, for example, to create live scenarios in simulators where surgeons can develop their skills [7]. It could also be used in real surgeries to assist the doctor [5].

    – Entertainment. The entertainment industry has also adopted this technology. The idea of having digital creatures in the real world can be found in several games, such as ARhrrrr - An augmented reality shooter [8].

    In order to map digital objects to the physical world, some kind of analysis of the world has to be done. This is usually performed with a tracking algorithm. An in-depth study of how such algorithms are implemented and how they work in handheld devices is presented in Chapter 3.


    1.4 Outline of the Thesis

    – Chapter 2 presents a detailed view of the task. In this chapter the task is stated and its purpose is defined. It also contains an overview of the methods used in this thesis and a look at what has already been done within this subject.

    – Chapter 3 presents a theoretical study of tracking in handheld devices.

    – Chapter 4 presents the preliminary time frame and what was planned to be done. It also presents a detailed description of how the work was actually done and ends with a comparison of planned and actual activities.

    – Chapter 5 presents the final results of this project: a walk through the central parts of the resulting applications, complete with screenshots and pseudo code.

    – Chapter 6 presents the conclusions drawn from the results. This chapter also states the limitations of the results and future work that they could lead to.

    – Chapter 7 presents acknowledgements to those who contributed to this master's thesis.

  • Chapter 2

    Problem Description

    This chapter presents an in-depth explanation of the task. To clarify things, the problem is divided into sub-problems. The chapter also covers the purpose of the task, how the task is solved and related work.

    2.1 Problem Statement

    The main problem is stated as: how well suited is the Iphone 3Gs as a platform for augmented reality and object tracking?

    This is not merely a rhetorical question but a starting point for developing applications for the Iphone, testing the question by pushing the limits of what can be done.

    The sub-tasks are:

    – Augmented reality car game. The main idea of this game is that the user can play a car game on any physical surface, with physical objects acting as obstacles.

    – Tracking application. The focus of this application is to track and recognize objects and patterns. Examples include logotypes, desktop items and human faces. Another function is to read handwritten letters and display the resulting word.

    2.2 Purposes

    The purpose of this master's thesis is to provide insight into the abilities of the Iphone 3Gs when it comes to handling applications with object tracking and augmented reality. The applications are therefore not meant to be uploaded to Apple's App Store and released to the public, but rather to be used to demonstrate what is possible to create within this field.

    The purpose is not to transfer this knowledge directly into North Kingdom's ordinary activities, but as the mobile application market progresses this kind of work will certainly become part of those activities in the future.

    2.3 Methods

    This master's thesis starts with a literature review on the subject of tracking in handheld devices. This review is the foundation of the thesis and influences both the design process and the development process of the project, see chapter 3.


    After the literature review the project moves on to its second phase: development. After a review of the capabilities of the development environment Xcode, a couple of applications are designed. The design process includes sketches, LoFi prototypes, HiFi prototypes and usability testing. The finished designs are implemented in Xcode.

    2.4 Related Work

    Numerous companies have produced demo videos presenting their visions of what they think they can do with augmented reality and tracking. Since no actual applications are shown, these are not considered here. One example of an AR and object tracking application is Sudoku Grab [9]. This application can track a sudoku puzzle and solve it, adding the missing numbers to the empty sudoku slots. At the time of writing, the application is in the running for an award for the most innovative use of hardware in an Iphone application [10]. I use ideas similar to those of Sudoku Grab in my implementations.

    Another example is produced by Georg Klein and David Murray. They have created an application where the surroundings are analyzed, making it possible to render 3D characters so that they look like they are sitting on a desk in the physical world [11]. Their study was conducted on an older version of the Iphone, on which it was possible to access the raw camera stream.

    An example of an application that tracks its environment is Red Laser. It is developed by Occipital [12] and is a good example of how the camera view can be scanned and analyzed on the Iphone.

  • Chapter 3

    Tracking in Handheld Devices

    This chapter gives an in-depth survey of some of the ways tracking can be used in mobile devices and what it is used for. It discusses some of the most commonly used tracking algorithms, such as marker tracking and edge detection, and also highlights some alternative methods.

    3.1 Introduction

    Tracking in handheld devices is almost synonymous with marker tracking. The reason is how easy it is to calculate the angle of the camera and then rotate the AR object according to that angle [13]. The process is described in the following section. One of the earliest successful attempts to implement this on “off-the-shelf hardware” was made by Daniel Wagner and Dieter Schmalstieg in 2003 [14]. They implemented an AR marker tracking system on an unmodified personal digital assistant (PDA).

    The single biggest contribution to the field of marker tracking was made by Hirokazu Kato. He developed the ARToolKit, an AR and marker tracking framework that became open source in 2004 and has since had hundreds of thousands of downloads [15]. The ARToolKit has since evolved into versions that are better optimized for handheld devices [16].

    Even if marker tracking is a big part of tracking with handheld devices, the subject has a lot more to offer. If the main goal of the tracking is to detect shapes, the best choice is an edge detection algorithm [17]. This can be done using a number of different approaches [18, 19], but the main idea is to highlight pixels that do not match a fixed threshold value in order to detect object edges.

    Another, more unconventional, method is tracking with a mean-shift algorithm. This method relies on features in the picture, such as the histogram of a specific area, in order to track an object [20]. It is often used only with single objects that are present in the view at all times. Figure 3.1 shows this method implemented on an Iphone [21].


    Figure 3.1: An Iphone using a mean-shift algorithm tracking an orange object [21]

    Positioning a 3D-generated object at the correct angles without a marker is a far more complicated process. Here there is no use in tracking a single object; instead the whole environment is tracked and a grid is matched to specific points in the environment. It is this grid that changes its position, which results in the 3D object changing angle [11]. A great example of how to do this was created by Georg Klein and David Murray under the label parallel tracking and mapping [11].

    3.2 Algorithms

    For a better understanding of how these kinds of algorithms work, this section explains how the algorithms mentioned in the section above operate. It focuses on the theory, explaining how each algorithm works rather than showing exactly how it is implemented.

    3.2.1 Markertracking

    The essential part of this method is the marker. A key feature of the marker is that the pattern must not look identical from two different angles. In addition, both the pattern and its size have to be known. Figure 3.2 shows a marker used in the ARToolKit project [15].

    Figure 3.2: An early marker used by the ARToolKit

    Because the size of the marker is known, it is possible to map the camera angle to how much of the marker is seen. This is done by image processing: the black borders of the marker are searched for and, when they are found, the pattern inside is analyzed and the angle is calculated. Figure 3.3 illustrates how this is done. By knowing the angle of the camera with respect to the x, y and z axes, it is possible to rotate a 3D object, creating the illusion of a digital object in the physical world [13]. Changing the distance between the marker and the camera adjusts the size of the 3D object. Increasing the distance shrinks the object, as long as the camera is still able to recognize the marker, and decreasing the distance enlarges it.

    Figure 3.3: An illustration of how camera angle and marker angle are mapped
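
    The relation between marker distance and object size described above can be illustrated with a small sketch. It assumes a simple pinhole camera model, which is not necessarily what the ARToolKit uses internally, and the function names, the 8 cm marker and the focal length of 600 pixels are hypothetical values chosen only for the example.

```python
def apparent_size_px(marker_size_m, distance_m, focal_length_px):
    """Side length of the marker in the image, in pixels (pinhole model)."""
    return focal_length_px * marker_size_m / distance_m


def estimate_distance_m(marker_size_m, apparent_px, focal_length_px):
    """Invert the pinhole relation: a known marker size gives the distance."""
    return focal_length_px * marker_size_m / apparent_px


# Hypothetical numbers: an 8 cm marker and a focal length of 600 pixels.
near = apparent_size_px(0.08, 0.40, 600.0)  # marker 40 cm from the camera
far = apparent_size_px(0.08, 0.80, 600.0)   # same marker 80 cm away

# Relative size at which the 3D object would be drawn at the larger distance.
scale = far / near
```

    Because the physical size of the marker is known, the same relation can be read in both directions: a smaller marker in the image means a larger distance, and the 3D object is scaled down by the same ratio.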

    3.2.2 Edge detection

    There are many ways to implement edge detection, but the two main categories are search-based and zero-crossing-based. Search-based algorithms use edge strength as the measurement and search for local maxima of that value, while zero-crossing-based algorithms use, as the name implies, zero crossings computed from the image to find edges. Common to both approaches is that they use deviations in the picture to localize edges. The most common way is a gradient operation that determines the level of variance between selected pixels [18]. Figure 3.4 shows how edges are detected in a mobile camera photo. In this case all pixels in the picture are processed, and if the color value of a pixel deviates from that of its neighbours, the pixel is colored white. If there is no deviation, the pixel is colored black. After all pixels are processed, the resulting picture has a black background with the edges of the original picture in white.

    The resulting edge image makes it possible to identify objects in the picture, and with objects identified it is possible to position digital artifacts in relation to the physical objects. A walkthrough of an edge detection algorithm is given in section 5.1. This is a simple way of implementing edge detection and a large reduction of the algorithms used in computer vision, such as the extensive work of John Canny [19] and the work by Harris and Stephens [22]. The reduction is vital because of the difference in computing power between a handheld device and a stationary computer. The simplification comes at a cost in performance, but that is a sacrifice that has to be made.


    Figure 3.4: Edge detection on calculator and pen
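
    The neighbour comparison described in this section can be sketched as follows. This is a minimal illustration and not the implementation used in the applications: it assumes a grayscale image stored as a list of rows with values from 0 to 255, compares each pixel with its four direct neighbours, and uses the hypothetical names detect_edges and threshold.

```python
def detect_edges(gray, threshold):
    """Return a black image (0) with white pixels (255) where a pixel
    deviates from one of its four neighbours by more than `threshold`."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and abs(gray[y][x] - gray[ny][nx]) > threshold:
                    edges[y][x] = 255  # deviation found: mark as edge
                    break
    return edges


# A dark area (0) next to a bright area (200): the border becomes white.
image = [[0, 0, 200, 200] for _ in range(4)]
edge_image = detect_edges(image, threshold=50)
```

    Raising the threshold makes fewer pixels count as deviating, so a larger value gives fewer edges, which is the trade-off a handheld implementation has to tune.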


    3.2.3 Mean-shift algorithm

    A mean-shift algorithm needs a pre-decided object to track. It starts with the selection of a cluster of pixels, an area that must contain the object. After this initial stage the mean-shift algorithm works frame by frame, finding the area in each frame whose color distribution is closest to that of the pre-selected area [20]. In Figure 3.1 the pre-selected area is the bottom left side of the orange object; the picture to the right shows how the object has moved and how the blue box surrounding the area moves to the best matching area. To illustrate this, Figure 3.5 shows a selected histogram of a normalized colorspace [23]. The method tries to find the local maximum that matches this histogram and, in the case of Figure 3.1, moves the blue square in that direction [24].

    Figure 3.5: Histogram of a normalized colorspace
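
    A heavily simplified sketch of one way to realize this idea is given below. It is an assumption-laden illustration rather than the method of [20] or [21]: the image is a single-channel list of rows with values from 0 to 255, every pixel in the window is weighted by how common its value is in the target histogram, and the window is moved to the weighted centroid until it stops moving. The names histogram and mean_shift and the choice of 8 bins are hypothetical.

```python
import math


def histogram(image, window, bins=8):
    """Normalized histogram of the pixel values inside window = (x, y, w, h)."""
    x0, y0, w, h = window
    hist = [0.0] * bins
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            hist[image[y][x] * bins // 256] += 1.0
    total = sum(hist)
    return [value / total for value in hist]


def mean_shift(image, window, target_hist, bins=8, max_iter=20):
    """Move the window towards the area that best matches target_hist."""
    x0, y0, w, h = window
    height, width = len(image), len(image[0])
    for _ in range(max_iter):
        weight_sum = sum_x = sum_y = 0.0
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                weight = target_hist[image[y][x] * bins // 256]
                weight_sum += weight
                sum_x += weight * x
                sum_y += weight * y
        if weight_sum == 0.0:
            break  # nothing in the window resembles the target
        # Centre the window on the weighted centroid (rounded half up).
        new_x = int(math.floor(sum_x / weight_sum - (w - 1) / 2.0 + 0.5))
        new_y = int(math.floor(sum_y / weight_sum - (h - 1) / 2.0 + 0.5))
        new_x = max(0, min(width - w, new_x))
        new_y = max(0, min(height - h, new_y))
        if (new_x, new_y) == (x0, y0):
            break  # converged
        x0, y0 = new_x, new_y
    return (x0, y0, w, h)
```

    The target histogram is computed once from the pre-selected area; for each new frame, mean_shift is started from the window found in the previous frame.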

    3.2.4 Parallel Tracking and Mapping

    The main goal of this method is the same as for marker tracking: to estimate the camera pose in order to adjust the augmented reality content. It does so by tracking key points in the user's environment and mapping them to a digital representation of the environment. The method consists of two processes running in parallel: a point-based tracking system and a mapping system that bundles points and keyframes into a map representation of the environment [25]. This section only covers the point-based tracking system.


    One way of performing point-based tracking is described in six steps in Parallel Tracking and Mapping for Small AR Workspaces [25]:

    – Step 1. A new frame is received and a prior estimate of the camera pose is generated.

    – Step 2. Based on the estimate from step 1, the map points are projected into the frame.

    – Step 3. The 50 coarsest-scale points are searched for in the frame.

    – Step 4. When they are found, the estimated camera pose is updated accordingly.

    – Step 5. 1000 points are projected into the frame again and searched for.

    – Step 6. Finally, a new camera pose is estimated from all the points that were found in step 5.
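
    The coarse-to-fine structure of these six steps can be illustrated with a toy sketch. It is not the implementation from [25]: to stay short it assumes that the camera pose is only a 2D image translation, that map points are (x, y, scale level) tuples, and that the frame is already reduced to a list of detected feature positions. All function names and the search radii are hypothetical.

```python
def nearest(point, features, radius):
    """Closest detected feature to `point` within `radius`, or None."""
    best, best_d2 = None, radius * radius
    for feature in features:
        d2 = (feature[0] - point[0]) ** 2 + (feature[1] - point[1]) ** 2
        if d2 <= best_d2:
            best, best_d2 = feature, d2
    return best


def update_pose(pose, points, features, radius):
    """Project the points with `pose`, search for them and shift the pose
    by the mean offset between projected and found positions."""
    offsets = []
    for x, y in points:
        projected = (x + pose[0], y + pose[1])
        match = nearest(projected, features, radius)
        if match is not None:
            offsets.append((match[0] - projected[0], match[1] - projected[1]))
    if not offsets:
        return pose
    count = len(offsets)
    return (pose[0] + sum(o[0] for o in offsets) / count,
            pose[1] + sum(o[1] for o in offsets) / count)


def track_frame(prior_pose, map_points, features, n_coarse=50, n_fine=1000):
    """Steps 1-6: a coarse search with few points, then a fine search."""
    by_coarseness = sorted(map_points, key=lambda p: -p[2])
    coarse = [(x, y) for x, y, _ in by_coarseness[:n_coarse]]
    pose = update_pose(prior_pose, coarse, features, radius=20.0)  # steps 3-4
    fine = [(x, y) for x, y, _ in by_coarseness[:n_fine]]
    return update_pose(pose, fine, features, radius=3.0)           # steps 5-6
```

    The point of the two stages is that the first, wide search only needs a handful of points to correct a poor prior estimate, after which the many remaining points can be searched for in a much smaller area.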

    The picture below shows the mapping in progress.

    Figure 3.6: Parallel tracking and mapping, courtesy of Georg Klein and David Murray [11]


    3.3 Fields of interest

    At the moment, interest in tracking in handheld devices is rising, but there is not much commercial use yet. Traditionally, tracking algorithms are part of image processing, and that is the biggest challenge. Handheld devices will always produce shaky images, so the better an algorithm withstands obstacles like motion blur, the more effective the method will be [11].

    Tracking is an essential part of computer vision, a field that reaches from special effects in movies to industrial robots inspecting manufacturing. At this time, most tracking is performed to filter data, keeping the interesting parts of the picture and removing the unnecessary ones to reduce the amount of data [19]. Therefore, various types of edge detection are the most commonly used tracking algorithms [19].

    Even within computer vision, the field of handheld tracking is limited as of today. Aside from a couple of barcode readers and augmented reality games, there is not a big market yet. Leading AR researchers predict that the market for AR, on computers as well as on handheld devices, will increase rapidly in the not so distant future, so the market for and contributions to handheld tracking will probably increase as well [26].


  • Chapter 4

    Accomplishment

    This chapter compares the preliminary time plan and order of execution with the way the work was actually carried out.

    4.1 Preliminaries

    Below is the preliminary estimation of how the work would proceed.

    Figure 4.1: Preliminary time chart on the project


    4.2 How the Work was done

    The work is divided into four phases, which are described in this section. The description of the first phase, the preparing and designing phase, refers to the appendices, which contain the design concepts as well as some of the LoFi and HiFi prototypes that were tested.

    4.2.1 The Preparing and Designing phase

    The preparation weeks at the beginning of the project were spent gaining insight into Apple's development environment Xcode. As I was a fairly new user of both Xcode and the development language Objective-C, this process was important preparation for the work later on. In addition to getting familiar with the development environment, studies were done in order to see what had already been done within this field and to discover essential limitations.

    During weeks 40 and 41, the second and third weeks of the project, design concepts were created and finalized. Appendix A shows the first conceptual sketches of the two applications. Because of the extensive work required on the tracking mechanisms, the surrounding framework was stripped down and kept very minimalistic. Both applications present a splash screen with information on launch, and tapping the screen starts the tracking mode. Some LoFi design sketches of how to steer the car are presented in Appendix B. The final decision was based on the fact that the steering wheel metaphor is obviously suited for a car game, so the steering wheel became the steering device of choice. For the other application the main focus was on the back end, and the interface was simplified as much as possible by using the standard Iphone button and label classes. This simplifies the user interaction, since the user will immediately be familiar with the environment.

    The interaction with the game mode was tested with interactive prototypes. A screenshot of such a prototype and a substitute for the steering wheel can be found in Appendix C.

    In this prototype both gas and brake pedals were tested, but neither made it into the final version, in order to minimize the number of on-screen objects.

    4.2.2 Early Development phase

    Due to a delay at Telia, who were providing the Iphone, only the framework and menus could be created at first. These were created and tested in the Iphone simulator provided by Xcode. This took place during weeks 42 and 43.

    At the beginning of November, a couple of weeks behind schedule because of the wait for the Iphone, the live video feed was implemented. This set the starting point for the tracking implementations.

    4.2.3 Development of Tracking

    Between weeks 46 and 50 different tracking algorithms were implemented and tested, striving for as efficient an algorithm as possible. In parallel with the tracking adjustments, numerous attempts were made to bypass the standard SDK in order to access the raw data feed directly from the camera. As all attempts failed, it had to be accepted that the only way of analyzing the camera stream was by printing (capturing) the whole screen. Because of the print screen problem, the original idea of a 3D-generated car had to be withdrawn and a concept for a 2D game was created instead. A 2D solution without depth consideration reduces the calculation needed and retains the frame rate, and with the 3D rendering performed with OpenGL ES removed, the print screen method could be used. This also added to the problem that the project was running late, and I had no choice but to postpone the presentation from January to February.

    4.2.4 Completion

    Finally an edge detecting algorithm was chosen as the most efficient due to its capabilityto sustain satisfying frame rate despite the limitations. An interpolation technique wasimplemented to leave no trace on the prints. This was created by printing the edge marksin a separable layer on top of the camera stream with just enough alpha for the user tosee but also for the image to be usable. This makes it possible to interpolate the edgemarks so that they wont be present in the next frame captured, see chapter 5.1. As thiswas done by the end of the year 2009 the project started again week 2 and in the followingweeks two applications were built. The car game was now created as a 2D game seen fromabove with the edge detecting algorithm keeping track of the location of physical objectsand adjusting the car to these objects. The edge detecting algorithm was also used in theother application. In this application a backend thread was created to match the edgesdetected with the algorithm to pre-computed edges of letters, logos and human faces.

    In week 5 I held a presentation of the project for a couple of companies also located in Skellefteå and showed a few demos of my work. In week 6 this report was written and it was completed at the start of week 7, 2010.

    4.3 Conclusions

    Comparing the preliminary schedule to the actual outcome, it is clear that there is a rather big difference. Knowing in advance that it would take several weeks for Telia to supply the phone might have avoided some of the delay, but not enough to finish on time. The main reason for the delay was the excessive amount of time spent on trying to access the raw data of the live video feed. The order in which the tasks had to be done was, however, rather accurately planned. The number of weeks spent on each task was also accurate, except for the tracking algorithm task, which could be prolonged because the time for graphics could be shortened when the 3D idea was scrapped.

  • Chapter 5

    Results

    In this chapter the outcome of the project is presented. Every screenshot of the working applications has had its edge detection points colored in order to give the reader a better understanding of what is tracked in the picture. The car game has white tracking points and the tracking points for the object tracking application are red.

    5.1 Tracking

    The heart of the following applications is the edge detection algorithm that I have implemented. It is therefore essential to explain how the algorithm works. Two versions of the same algorithm have been implemented and the following pseudo code explains the different steps. A threshold value is chosen before the algorithm starts: the larger the value, the less sensitive the edge tracking will be. A low value generates more and thicker edges.

    1. Capture a screenshot of the whole display

    2. Loop through all the pixels of the captured screenshot

    (a) Check color value of the pixel

    (b) Check color value of pixels that border on the current pixel

    (c) If the color value differs from that of a bordering pixel by more than the threshold value, save the position in an array

    3. Update the array by removing pixels whose difference no longer exceeds the threshold value

    4. Start all over again at 1.
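
    The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the applications: the screenshot is assumed to be a grayscale grid (a list of rows of brightness values), only the four directly bordering pixels are compared, and the names detect_edges and update_edges are hypothetical.

```python
def detect_edges(image, threshold):
    """Step 2: return the set of (row, col) positions whose value differs
    from at least one bordering pixel by more than `threshold`.

    Assumption: `image` is a list of rows of grayscale values.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    edges = set()
    for row in range(height):
        for col in range(width):
            value = image[row][col]
            # Compare with the pixels above, below, left and right.
            for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                n_row, n_col = row + d_row, col + d_col
                if 0 <= n_row < height and 0 <= n_col < width:
                    if abs(value - image[n_row][n_col]) > threshold:
                        edges.add((row, col))
                        break
    return edges


def update_edges(tracked, image, threshold):
    """Step 3: recompute the edges for a new screenshot and report which
    previously tracked positions are no longer edges."""
    current = detect_edges(image, threshold)
    removed = tracked - current
    return current, removed
```

    A larger threshold gives fewer edge positions, which matches the description of the threshold value above. In a real application the loop in step 4 would call update_edges once per captured screenshot.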

    In order to leave colored traces, as in the screenshots below, some more steps have to be completed. The trick here is to print the points on a clear canvas instead of saving them in an array. To make sure that the points are not captured in the next screenshot, which would eventually lead to a single-colored screen, interpolation has to be applied. With interpolation, the color of each drawn pixel is substituted by the mean value of the surrounding pixel colors.
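
    The interpolation step can be illustrated as follows. The sketch assumes a grayscale grid and a set of marked positions, and it additionally assumes that neighbouring pixels that are themselves marked are left out of the mean, which the text above does not specify. The function name erase_marks is hypothetical.

```python
def erase_marks(image, marks):
    """Return a copy of `image` in which every marked pixel is replaced by
    the mean of its surrounding, unmarked pixels.

    Assumptions: `image` is a list of rows of grayscale values and `marks`
    is a set of (row, col) positions where tracking points were drawn.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [list(row) for row in image]
    for row, col in marks:
        neighbours = [
            image[r][c]
            for r in range(max(0, row - 1), min(height, row + 2))
            for c in range(max(0, col - 1), min(width, col + 2))
            if (r, c) != (row, col) and (r, c) not in marks
        ]
        # A mark surrounded only by other marks is left untouched.
        if neighbours:
            cleaned[row][col] = sum(neighbours) / len(neighbours)
    return cleaned
```

    Running the edge detection on the cleaned image rather than on the raw screenshot keeps the drawn points from being detected as edges themselves.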

    5.2 Augmented Reality Car Game

    As mentioned before, the car game is implemented in 2D and therefore has a couple of limitations. The game has to be played directly from above, with the camera pointing straight down. Because it is a 2D game, changing the angle will not rotate the car, and the illusion of a digital object merged into the real world is therefore lost. Changing the distance between table and camera is an implementation limitation. For the illusion to make sense, the car has to be scaled down when the distance increases and scaled up when the distance decreases. The decrease part is the problem when it is not possible to access the raw camera stream. If the camera gets too close, the car will probably fill the whole screen; by then no tracking will be possible and the car will never scale down even if the distance increases again. Another limitation is that some of the physical objects present when the game is started have to be present at all times. If all of the original objects are substituted, it becomes another scenario and the car will be adjusted to that scenario instead.

    5.2.1 Icon and Menus

    To start the game, the application icon has to be pressed in the Iphone menu. The picture below shows the application icon.

    Figure 5.1: Cargame icon

    The game is started by pressing the play button on the splash screen that appears at startup, and the game mode will soon appear.

    Figure 5.2: Cargame splashscreen

    5.2.2 Game Mode

    The play sequence in this game is rather simple. The steering wheel to the right controls the car and the player rotates it by touch interaction. The phone can be moved in two dimensions as long as the angle of the camera and the distance between the objects and the phone do not differ too much. The key feature of this game is that the digital car appears as if it is merged into the physical world, staying at the same place relative to the physical objects even when the phone is moved. The example below shows two pictures where the digital car keeps its angle and distance to the physical object, which in this case is a stapler, even when the position of the phone has been changed.

    Figure 5.3: Screenshot from the game played on a desk at North Kingdom
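
    One simple way to keep a 2D sprite anchored to tracked points is sketched below. It is an assumption made for illustration, not a description of how the game is implemented: the car is moved by the same amount as the centroid of the tracked edge points moves between two frames. The names centroid and reposition_car are hypothetical.

```python
def centroid(points):
    """Mean position of a non-empty list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def reposition_car(car, previous_points, current_points):
    """Shift the car's screen position by the displacement of the tracked
    points, so that it appears fixed relative to the physical objects."""
    if not previous_points or not current_points:
        # Nothing to track against: leave the car where it is.
        return car
    prev_x, prev_y = centroid(previous_points)
    cur_x, cur_y = centroid(current_points)
    return car[0] + (cur_x - prev_x), car[1] + (cur_y - prev_y)
```

    A translation-only update like this one matches the limitations described earlier: it cannot compensate for a changed camera angle or distance.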

    5.3 Object Tracking

    This application has three features based on object tracking and object recognition. It should be pointed out that the main focus of this thesis is object tracking, so the recognition part of this application has received less attention; the face recognition function in particular is a bit of showcase work. It will not tell the difference between faces, only recognize whether a human face is present.

    5.3.1 Icon and Menus

    To start this application you have to tap the icon below in the Iphone menu.

    Figure 5.4: Icon to the application: What?What?

    When the application is started, the splash screen to the left in the picture below is visible. When the screen is tapped, the picture to the right appears. It contains a toolbar at the bottom of the screen where the user can change from the default mode, which is object and face recognition, to the letter recognition mode by tapping the cross to the right in the toolbar. The done button exits the application and the space to the left contains a label that prints what the application has found.

    Figure 5.5: Splashscreen (left) and menu (right) of the application

    5.3.2 Object recognition

    In order for this function to work, the application has to have the ratios between different edges in the objects precalculated. As shown in Figure 5.6, the ratios between six different edges are compared and matched. Once again it should be pointed out that this is probably not the most efficient way to build this kind of application, but as stated at the beginning of Section 5.3, it is only implemented to show the potential of this kind of tracking.

    Figure 5.6: Tracking of the Apple logo on a MacBook
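
    Matching on edge ratios can be sketched as below. The sketch assumes that the lengths of the relevant edges have already been measured in the image, normalises them by the longest edge so that the result does not depend on the distance to the object, and compares them with precalculated ratios within a tolerance. The function names, the tolerance value and the template values in the usage note are hypothetical.

```python
def edge_ratios(lengths):
    """Normalise positive edge lengths by the longest one."""
    longest = max(lengths)
    return [length / longest for length in lengths]


def matches(measured_lengths, reference_ratios, tolerance=0.1):
    """True if every measured ratio is within `tolerance` of the reference."""
    ratios = edge_ratios(measured_lengths)
    if len(ratios) != len(reference_ratios):
        return False
    return all(abs(m - r) <= tolerance for m, r in zip(ratios, reference_ratios))


def recognise(measured_lengths, templates, tolerance=0.1):
    """Return the name of the first template that matches, or None."""
    for name, reference in templates.items():
        if matches(measured_lengths, reference, tolerance):
            return name
    return None
```

    With a template such as {"logo": [1.0, 0.5, 0.25]}, the measured lengths [40, 20, 10] and [80, 40, 20] both match, since only the proportions between the edges matter.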

    5.3.3 Letter recognition

    The letter recognition function can be accessed through the bottom menu by pressing the cross on the right hand side. When this button is pressed, a red aim appears on the screen. The user then has to fit the letters within this aim in order for the recognition to do its job.

    Figure 5.7: The letter tracking function in progress

    In this mode the regular recognition is turned off and the application focuses only on what is present within the aim. The recognition is built like a grid, with every letter having unique tracking points within that grid. The example above shows how the letter R is recognized.
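
    A grid-based comparison of this kind could look like the following sketch. It assumes that the aim is a rectangle given as (left, top, width, height), that each letter template is a set of grid cells, and that the best match is chosen by the overlap between occupied and template cells (Jaccard similarity). The scoring rule and all names are assumptions made for illustration.

```python
def occupied_cells(edge_points, aim, rows, cols):
    """Map edge pixels inside the aim rectangle to the grid cells they hit.

    `aim` is (left, top, width, height); points outside it are ignored.
    """
    left, top, width, height = aim
    cells = set()
    for x, y in edge_points:
        if left <= x < left + width and top <= y < top + height:
            col = int((x - left) * cols / width)
            row = int((y - top) * rows / height)
            cells.add((row, col))
    return cells


def best_letter(cells, templates):
    """Return (letter, score) for the template that overlaps `cells` most."""
    best, best_score = None, 0.0
    for letter, template in templates.items():
        union = cells | template
        score = len(cells & template) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = letter, score
    return best, best_score
```

    A finer grid separates similar letters better but requires the user to fit the letter more precisely within the aim.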

    5.3.4 Face recognition

    The face recognition tracks the ratio between eyes and mouth. Both the eyes and the mouth generate large clusters of edge pixels, making them easy to locate in a picture. If such a cluster is found, triangulation is used to check whether there are further clusters within a precalculated range. This range is based on the ratio between the eyes and mouth of a human face. A feature in this mode is that if a face is recognized, the Facebook page of the detected person is accessible through the Facebook icon appearing at the bottom of the screen.

    Figure 5.8: Face tracking in progress
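
    The ratio check can be illustrated with the sketch below. It assumes that the centres of three edge clusters have already been found and tests whether the distance from the midpoint between the eyes to the mouth, divided by the distance between the eyes, falls within a given range. The function name and the range (0.8, 1.3) are placeholders chosen for the example and are not values taken from this thesis.

```python
import math


def looks_like_face(left_eye, right_eye, mouth, ratio_range=(0.8, 1.3)):
    """Check whether three cluster centres form an eye-eye-mouth triangle
    with plausible proportions.

    Each argument is an (x, y) position; `ratio_range` is a placeholder.
    """
    eye_distance = math.dist(left_eye, right_eye)
    if eye_distance == 0:
        return False
    eye_midpoint = (
        (left_eye[0] + right_eye[0]) / 2,
        (left_eye[1] + right_eye[1]) / 2,
    )
    ratio = math.dist(eye_midpoint, mouth) / eye_distance
    return ratio_range[0] <= ratio <= ratio_range[1]
```

    Because only a ratio is tested, the check does not depend on how far the face is from the camera, but it also cannot tell one face from another.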

  • Chapter 6

    Conclusions

    To summarize this thesis, it is appropriate to revisit the main goal of the project. The goal was to find out how well suited the Iphone 3Gs is as a platform for augmented reality and object tracking. From this point of view the project has been a success, despite setbacks such as redesigning the whole concept from an augmented reality game in 3D to an augmented reality game in 2D, because testing the limits was what the whole project was about. The main conclusion of this thesis can be summed up in one sentence:

    “As long as the SDK does not allow developers to access the raw camera stream, the Iphone 3Gs is not suited for augmented reality that depends on analyses of the physical world”

    Adding pictures on top of the camera stream works just fine, but as long as the print screen method discussed in the previous chapters is the only way of analyzing what is present on the screen, the pictures will also be part of the equation and complicate everything. This can be seen in Figure 5.3 where, looking closely, both the steering wheel and the car have white edges around them, showing that the steering wheel is also considered a physical object by the algorithm.

    Tracking of the physical world without the interference of digital objects merged with it is another story. The phone possesses enough features and CPU power to complete very demanding operations. Recognition software in the form of Iphone applications is just at the beginning of what I believe is an upcoming trend. The area of use is almost endless, and as long as the Iphone continues to thrive on the market, the development will continue.

    6.1 Limitations

    Some of the limitations of my work have already been mentioned. Because the work is best described as an overview of the possibilities of the Iphone 3Gs, the focus has not been on developing bug-free and solid applications. All of the applications should be seen as demonstrations of what can be done, and optimizing any one of them would probably be a master's thesis on its own. To sum up some of the limitations, the car game is used as a starting point: it only works in two dimensions, some of the objects that are present at the start have to be present at all times, and the car has no collision detection against objects. The letter tracker only tracks a bold handwritten font. The object tracker only tracks three different shapes at the moment, and the face tracker only detects that a face is present, not who the face belongs to; the name of the person and the Facebook link are therefore implemented for one person only.

    6.2 Future Work

    There is quite a bit of future work that can be done. The work on the letter tracker will continue, and the first step will be to make it possible to track whole words in a common font such as Times New Roman. It will also be made possible to save the text to a document.

    When it comes to object and face recognition, a database of objects and faces at different angles has to be implemented. My guess is that if the database grows large, the most efficient way to use it is to create a client/server application. In such a case the phone only provides photos to a back-end server that does all the calculations.

  • Chapter 7

    Acknowledgements

    This master's thesis has been really interesting and I am sure that I will have great use of this experience later on in my career. I would therefore like to take this opportunity to thank the CEO of North Kingdom, David Eriksson, for providing me with the subject and letting me work at their firm. I would also like to thank my supervisor at North Kingdom, Hans Eklund, for helping me with my work, and my supervisor at the Department of Computing Science, Ola Ågren, for the help and feedback on this report.

  • References

    [1] Canalys. Smart phone market shows modest growth in q3 - but apple and rim hitrecord volumes. http://www.canalys.com/pr/2009/r2009112.html, December 25 2009.

    [2] North Kingdom. Official website. http://www.northkingdom.com/about/, January 202010.

    [3] Apple Inc. Apple iphone. http://www.apple.com/iphone, January 10 2010.

    [4] T-mobile Netherland. Leaked Iphone secret. http://www.mobilewhack.com/t-mobile-netherlands-leaks-iphone-3g-s-hardware, February 10 2010.

    [5] R. T. Azuma. A survey of augmented reality. Presence: Teleoperators and VirtualEnvironments, pages 355–385, 1997.

    [6] M. A. Livingston, L. J. Rosenblum, S. J. Julier, D. Brown, Y. Baillot. J, E. Swan II,J L. Gabbard, and D. Hix. An augmented reality system for military operations inurban terrain. I/ITSEC, page 89, 2002.

    [7] J. Moline. Virtual reality for health care: a survey. Technical report, National Instituteof Standards and Technology, Gaithersburg, MD, 1997.

    [8] Augmented environments lab. Arhrrrr! http://www.augmentedenvironments.org/lab/research/handheld-ar/arhrrrr/, February 14 2010.

    [9] CMG Research. Sudoku grab. http://www.cmgresearch.com/sudokugrab/, Febru-ary 12 2010.

    [10] Best App Ever Awards. Second annual iphone os application achievement awards.http://bestappever.com/awards/2009/, December 17 2009.

    [11] G. Klein and D. Murray. Parallel tracking and mapping on a camera phone. ISMAR’09,2009.

    [12] Occipital. Redlaser. http://redlaser.com/, February 10 2010.

    [13] H. Kato and M. Billinghurst. Marker tracking and hmd calibration for a video-basedaugmented reality conferencing system. IWAR’99, pages 85–94, 1999.

    [14] D. Wagner and D Schmalstieg. First steps towards handheld augmented reality. ISWC2003, pages 127–137, 2003.

    [15] ARToolKit. Official website. http://www.hitl.washington.edu/artoolkit/, January 202010.

    31

  • 32 REFERENCES

    [16] D. Wagner and D Schmalstieg. Artoolkitplus for pose tracking on mobile devices.CVWW’07, pages 139–146, 2007.

    [17] D. Marr and E. Hildreth. Theory of edge detection. PROC. ROY. SOC.(London), vol.B207, pages 187–217, 1980.

    [18] H. S. Neoh and A. Hazanchuk. Adaptive edge detection for real-time video processingusing fpgas. GSPx 2004, 2004.

    [19] J. Canny. A computational approach to edge detection. IEEE Trans. Pattern Analysisand Machine Intelligence, pages 679–698, 1986.

    [20] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects usingmean shift. Proceedings of 2000 IEEE Conference on Computer Vision and PatternRecognition, pages 142–149, 2000.

    [21] I. Halil. Mean-shift based moving object tracker.http://www.cs.bilkent.edu.tr/∼ismaila/MUSCLE/MSTracker.htm, January 14 2010.

    [22] C. Harris and M. Stephens. A combined corner and edge detector. Fourth Alvey VisionConference, pages 147–151, 1988.

    [23] D. Comaniciu and P. Meer. Mean shift: A robust approach towards feature spaceanalysis. IEEE Trans. Pattern Anal. Machine Intell., pages 603–619, 2002.

    [24] R. Collins, O. Amidi, and T. Kanade. An active camera system for acquiring multi-view video. Proceedings of the International Conference on Image Processing, pages517–520, 2002.

    [25] G. Klein and D. Murray. Parallel tracking and mapping for small ar workspaces.ISMAR’07, pages 225–234, 2007.

    [26] T. Carpenter. Gamesalfresco. http://gamesalfresco.com/, January 02 2010.

  • Appendix A

    Concept sketches

    Figure A.1: Concept sketch on the car game, before it was reduced to 2D

    Figure A.2: Concept sketch on the letter reading application

  • Appendix B

    Lo-Fi

    Figure B.1: Lo-fi sketches on possible ways to steer the car

    Figure B.2: Lo-fi sketches on possible ways to steer the car

  • Appendix C

    Interactive prototypes

    Figure C.1: HiFi prototype to test the usability of a spinning steering wheel with gas and brake pedals

    Figure C.2: HiFi prototype to test the usability of a steering cross