POLITECNICO DI MILANO
Como Campus
Faculty of Information Engineering
Master of Science in Computer Engineering
Department of Electronics and Information
The Earring Store: An Android virtual mirror augmented reality
application based on the new Android face detection feature
Advisor: Prof. Thimoty Barbieri
Master's thesis by:
Leandro Francisco Bruno
Student ID no. 770321
Academic Year 2011-2012
I dedicate this thesis to my family, far away and nearby, who have always
supported me.
Acknowledgments
This degree has been marked by highs and lows, and I therefore wish to
thank everyone who stood by me when things did not go as I would have liked
and who celebrated with me when they did. A big thank you to Kappa, Fabio,
Cardani, Mario, Misha, Catta and Nielo, friends since my bachelor's years and
always available for my countless doubts. Thanks also to Luca, who was
initiated into Android together with me and was always ready for an exchange
of opinions.
Thanks to Michele, Giammarco, Leonardo, Giuseppe, Matteo and the whole
Como crew, who made the two years of my master's fun and never dull,
especially the five-a-side football! Thanks also to PG, il Fo, Artiòm, il Groppo,
Jhonny, Rodrigo, Crespi and Gadio, with whom I alternated moments of pure
madness with hard work, not to mention the unforgettable games of Worms.
Thanks also to Vincenzo and Frank, my faithful followers. Thanks to Tosco,
Gardo and the Ferrara crowd who, however far away, are always with me.
Thanks to Prof. Barbieri, for listening to me and guiding me through the final
effort of this degree.
To all of you, heartfelt thanks.
Contents
Abstract x
Estratto xii
1 Introduction 1
1.1 Overture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Document structure . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Background information 4
2.1 About smartphones . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Android overview . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 Software architecture . . . . . . . . . . . . . . . . . . . 7
2.2.2 Applications structure . . . . . . . . . . . . . . . . . . 9
2.2.3 SDK and NDK . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Augmented reality . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Taxonomies . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.2 Applications and human vision . . . . . . . . . . . . . 17
2.3.3 Hardware setups for augmented reality . . . . . . . . 18
2.3.4 Application areas . . . . . . . . . . . . . . . . . . . . . 21
3 "The Earring Store": Objectives and Related Works 30
3.1 Introducing "The Earring Store" project . . . . . . . . . . . . 30
3.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Virtual Mirror . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Other E-commerce virtual mirror projects . . . . . . . 33
3.2.3 Face detection and head pose estimation . . . . . . . 35
3.3 Motivations and Goals . . . . . . . . . . . . . . . . . . . . . . 36
4 Designing the prototype of the application 40
4.1 Establishing requirements . . . . . . . . . . . . . . . . . . . . 40
4.2 Users analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3 Architectural design . . . . . . . . . . . . . . . . . . . . . . . 43
4.3.1 Web service adopted technologies . . . . . . . . . . . 44
4.3.2 Data model and methods . . . . . . . . . . . . . . . . 47
4.4 3D modeling tools . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4.1 Importing the 3D models . . . . . . . . . . . . . . . . 51
4.5 Software design . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5.1 Class analysis and software choices . . . . . . . . . . 56
4.5.2 Head pose estimation issues . . . . . . . . . . . . . . 58
5 Prototype overview 63
5.1 Generated data analysis . . . . . . . . . . . . . . . . . . . . . 63
5.2 Activity flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.1 Main Gallery . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.2 Earring in detail . . . . . . . . . . . . . . . . . . . . . . 67
5.2.3 Try on photo . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2.4 Try on virtual mirror . . . . . . . . . . . . . . . . . . . 69
6 Users evaluation 71
6.1 Application operational time list . . . . . . . . . . . . . . . . 71
6.1.1 Time performance evaluation . . . . . . . . . . . . . . 73
6.2 Virtual mirror activity analysis . . . . . . . . . . . . . . . . . 74
6.3 Application effectiveness . . . . . . . . . . . . . . . . . . . . . 76
6.4 Application evaluation . . . . . . . . . . . . . . . . . . . . . . 81
7 Conclusions and future works 83
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
List of Figures
2.1 Android architecture structure . . . . . . . . . . . . . . . . . 8
2.2 Android Manifest example . . . . . . . . . . . . . . . . . . . . 11
2.3 Continuum space and mixed reality . . . . . . . . . . . . . . 13
2.4 Mediated Reality taxonomy . . . . . . . . . . . . . . . . . . . 14
2.5 Functional augmented reality taxonomy graph . . . . . . . . 16
2.6 Head Mounted Display vision schema . . . . . . . . . . . . . 20
2.7 Haptic sensor system . . . . . . . . . . . . . . . . . . . . . . 21
2.8 Virtual Heliodon lighting simulation . . . . . . . . . . . . . . 22
2.9 Example of Surface rendering of preoperative data . . . . . . 24
2.10 Example of augmented reality game . . . . . . . . . . . . . . 26
2.11 TranslateAR in operation . . . . . . . . . . . . . . . . . . . . . 27
2.12 Military mechanics conducting routine maintenance prototype 29
4.1 Architectural design Deployment Diagram . . . . . . . . . . 43
4.2 .Net architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 J2EE architecture . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Database Modello table . . . . . . . . . . . . . . . . . . . . . . 47
4.5 "The Earring Store" UML 2.0 class diagram . . . . . . . . . . 55
4.6 Finding the face and the triangle made of the middle point
of the mouth and the eyes . . . . . . . . . . . . . . . . . . . . 59
4.7 Modeling the face: ABC triangle built on the face . . . . . . 60
5.1 Images used for the catalog’s earrings . . . . . . . . . . . . . 63
5.2 Generated earring 3D models . . . . . . . . . . . . . . . . . . 64
5.3 Prototype application Activity Flow . . . . . . . . . . . . . . 64
5.4 Maingallery activity start . . . . . . . . . . . . . . . . . . . . . 65
5.5 Maingallery activity operations . . . . . . . . . . . . . . . . . 66
5.6 Earring in detail activity overview . . . . . . . . . . . . . . . 67
5.7 Try on photo activity operations . . . . . . . . . . . . . . . . . 68
5.8 Try virtual mirror activity overview . . . . . . . . . . . . . . 69
5.9 Android face detection performance under different illumination conditions . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.1 Answers to question B . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Answers to questions D1, D2 and D3 (left to right) . . . . . . 75
6.3 Answers to question E . . . . . . . . . . . . . . . . . . . . . . 75
6.4 Answers to question E1 . . . . . . . . . . . . . . . . . . . . . 75
6.5 Answers to question F . . . . . . . . . . . . . . . . . . . . . . 76
6.6 Answers to question F1 . . . . . . . . . . . . . . . . . . . . . . 76
6.7 Answers to question F2 . . . . . . . . . . . . . . . . . . . . . . 77
6.8 Answers to question F3 . . . . . . . . . . . . . . . . . . . . . . 77
6.9 Answers to question G . . . . . . . . . . . . . . . . . . . . . . 77
6.10 Answers to question G1 . . . . . . . . . . . . . . . . . . . . . 78
6.11 Answers to question G2 . . . . . . . . . . . . . . . . . . . . . 78
6.12 Answers to question H . . . . . . . . . . . . . . . . . . . . . . 79
6.13 Answers to question H1 . . . . . . . . . . . . . . . . . . . . . 79
6.14 Answers to question H2 . . . . . . . . . . . . . . . . . . . . . 80
6.15 Answers to question I . . . . . . . . . . . . . . . . . . . . . . . 80
List of Tables
6.1 Answers to questions A1, A2, A3 and A4 . . . . . . . . . . . 73
6.2 Answers to questions C1, C2, C3, C4 and C5 . . . . . . . . . 74
Abstract
Mobile devices, and in particular smartphones based on the Android OS,
have spread widely over the years, bringing with them a huge number of
applications. In this thesis I introduce virtual mirror functionality into an
Android E-commerce prototype application in order to evaluate users' reactions
and behaviors. Augmented reality is being adopted in a growing number of
applications, and my aim is to establish whether its use in E-commerce
applications can attract new customers and improve sales.
To verify this thesis, I designed a prototype application called "The Earring
Store". "The Earring Store" is an Android application that simulates a virtual
earring store. I adopted the Android 4.0 face detection functionality and used it
as the basis for head pose estimation in order to produce the virtual mirror
effect. To improve the realism of the application, 3D models of the earrings
were designed. The application's catalogue stores an image, a 3D model and
other data for each earring; it is managed entirely through a RESTful web
service that I designed and implemented, which is in charge of accessing the
database and providing the required data, in JSON format, to the application.
An analysis of time performance, virtual mirror performance and user
impressions was used to establish the effectiveness of the prototype and, more
generally, of virtual mirror functionality in E-commerce applications. Users
broadly appreciated the virtual mirror functionality, even though, due to some
issues with the Android 4.0 face detection function, it does not work properly
when the user rotates his or her head. Time performance was also judged
acceptable: the download times of the catalogue and 3D models do not bother
users. Users also stated that the virtual mirror functionality increases their
interest in the application and its items, and that they are more willing to
purchase an item from an application that includes such functionality than
from one that does not.
Estratto
Mobile devices, and in particular smartphones based on the Android OS,
have spread widely in recent years, bringing with them a great number of
applications. In this document I evaluate users' reactions and behaviors
towards an E-commerce application that includes virtual mirror functionality.
Augmented reality is used in a growing number of applications of various
kinds, and what I aim to establish is whether its use in E-commerce
applications can attract new users and increase sales figures.
To verify this thesis, I built a prototype application called "The Earring
Store", an Android application for a virtual earring shop. I employed the face
detection functionality of Android 4.0 and used it as the basis for the head
pose estimation behind the virtual mirror effect. To increase the realism of the
application, I created the 3D models of the earrings. The application's
catalogue, which for each earring contains an image, its 3D model and other
data, is managed entirely through a RESTful web service that I designed and
implemented, which is in charge of accessing the database and providing the
requested data, in JSON format, to the application.
Analyses of time performance, of the virtual mirror functionality and of user
impressions were carried out to establish the effectiveness of the prototype
and, more generally, of virtual mirror functionality in an E-commerce
application. Users broadly appreciated the virtual mirror functionality, even
though, due to some problems with the Android 4.0 face detection function, it
becomes highly imprecise when the user rotates his or her head. Time
performance was judged acceptable: the download time of the catalogue and
of the 3D models does not bother users. Users stated that the virtual mirror
functionality increases their interest in the application and the items "on
display", and that they feel more inclined to purchase those items than items
from other applications that do not include virtual mirror functionality.
Chapter 1
Introduction
In this chapter I first provide a brief summary of this thesis, pointing out its
motivations and goals, and then explain how this document is organized.
1.1 Overture
This thesis has as its purpose the creation of an Android mobile application
that integrates virtual mirror augmented reality functionality based on the new
Android 4.0 Ice Cream Sandwich face detection function, in order to allow
users not only to see but also to "try on" the items they are looking at. Since
the application belongs to the E-commerce field, the focus will not be on
typical E-commerce functionality but on the virtual mirror functionality and
the impression it makes on users.
In this work I provide my own approach to the creation of an Android
augmented reality application. It includes face detection and head pose
estimation functionality. The application interacts with a web service to obtain
3D models of the items and a catalogue with images and other information.
The application has to be light and fast. The communication with the web
service and the 3D model rendering operations are designed to be fast and
smooth, improving the user experience. The augmented reality functionality is
handled with the new Android face detection only, in order to also provide an
analysis of this new feature. The effectiveness of this approach will be
evaluated by questioning testers about their opinion of the application's
time-related performance, about the augmented reality virtual mirror
performance, and about the effectiveness of such an approach in persuading
them to purchase an item from the catalogue.
The application is just a prototype, so there will be no features typical of
online shopping nor any data regarding a (real or fictitious) store.
The testers will be chosen regardless of age and gender and will try the
prototype firsthand before providing their evaluation. These evaluations will
be gathered and analyzed in order to find out whether the application satisfies
them and, if not, where and what it is lacking.
1.2 Document structure
This work is divided into seven chapters, including the current one, which
briefly introduces the objectives of this thesis and some functionalities of the
application.
• In chapter two I provide the background information indispensable for
understanding the concepts behind the application.
• In chapter three I present the key concepts this work is based on, the
motivations of the work, its purposes and goals, and related works from the
literature.
• In chapter four I provide an overview of the design of the application. This
overview includes the establishment of the requirements, the user analysis and
the architectural design, and the technologies adopted for the web service, the
database, the creation of the 3D models and the client application itself.
• In chapter five I give an overview of the application by means of screenshots,
its functionalities and an analysis of how it works.
• In chapter six I establish the evaluation parameters and the metrics used for
the evaluation.
• In chapter seven I draw the conclusions, with a final analysis of the
evaluated parameters and some guidelines for future improvements and
developments.
Chapter 2
Background information
In this chapter I introduce the main background information necessary to
understand this work and its core concepts. In the first part of the chapter I
introduce smartphones, this era's boom of mobile devices and the Android
operating system, while in the second part I describe web service approaches and
analyze the concept of augmented reality both from a theoretical and from a
more practical point of view.
2.1 About smartphones
A feature phone can be defined as a mobile phone which, at the time of
production, has additional functions over and above a basic mobile phone that
is only capable of voice calling and text messaging. The distinction between
smartphones and feature phones is vague, and there is no official definition of
what constitutes the difference between them. One of the most significant
differences, however, is that the advanced application programming interfaces
(APIs) for running third-party applications on smartphones allow those
applications to integrate better with the phone's OS and hardware than is
typical of feature phones.
The first smartphone was designed by IBM in 1992 and was called Simon; it
was shown as a concept product that year at the COMDEX expo. A refined
version was commercialized to the public in 1994 at $1,099.
Apart from being a mobile phone, it weighed 500 g and contained applications
such as a calendar, world clock, calculator, address book and email. It could
moreover send and receive faxes and included games. It had no physical
buttons to dial, but a touch screen to be used with an optional stylus.
Since 1994 technology has made huge progress, both in hardware and
software, allowing the production of more powerful, smaller and more efficient
devices. Current smartphones have considerably decreased in size and weight
(e.g. the Samsung Galaxy S3 weighs 133 g, 3.75 times less than the IBM
Simon); they use low-consumption multi-core CPUs and have integrated
GPUs.
From the commercial point of view, the number of smartphones in use
worldwide reached 1.038 billion in the third quarter of 2012, an increase of
47% with respect to the third quarter of 2011. Neil Mawston, Executive
Director at Strategy Analytics, notes that 1.038 billion smartphones works out
to roughly 1 person in every 7 owning a smartphone, meaning that there are
still more feature phone users out there, and that there is still much more
growth to come, at an even faster pace: "Smartphone penetration is still
relatively low," he writes. "Most of the world does not yet own a smartphone
and there remains huge scope for future growth, particularly in emerging
markets such as China, India and Africa. The first billion smartphones in use
worldwide took 16 years to reach, but we forecast the next billion to be
achieved in less than three years, by 2015." [1]
The previous statement shows that the smartphone adoption rate is increasing
at an incredible pace, that this phenomenon is not going to stop anytime soon,
and that developers therefore need to get into smartphone application
development in order to reach a larger market. Among smartphones there is
no standard OS, which implies that each OS has its own features and requires
specific attention when developing applications (though there are tools that
allow multi-platform application development). Modern mobile operating
systems combine the features of a personal computer operating system with a
touchscreen, cellular connectivity, Bluetooth, WiFi, GPS navigation, a camera,
speech recognition, a voice recorder, a music player, personal digital assistant
(PDA) functions and other features.
According to International Data Corporation (IDC), the majority of the
smartphone OS market (in the second quarter of 2012) was divided between
Android (68.1%), iOS (16.9%), Blackberry (4.8%), Symbian (4.4%) and
Windows Phone (3.5%).
While these data show that Android-based smartphones are the ones that sell
the most, it is important to notice that Android's success in the market can be
traced directly to Samsung, which accounted for 44.0% of all Android
smartphones shipped in 2Q12, more than the next seven Android vendors'
volumes combined [2].
In the next section I will introduce some of the main features of the Android
OS that have led it to be the leader in the smartphone OS market.
2.2 Android overview
Developed by Android Inc. and acquired by Google in 2005, Android OS is a
free and open source operating system for smartphones and tablets distributed
by Google.
The first Android-based phone (the HTC Dream) was sold in October 2008
and ran Android 1.0, whose features included a web browser, a camera,
support for Google Maps and Google Search, a media player, a YouTube
player, and Wi-Fi and Bluetooth support. In later versions Android improved
and introduced many features, such as widget support, video recording and
playback in MPEG-4, multi-lingual speech synthesis, multi-touch event
tracking, native support for sensors such as gyroscopes and barometers,
multitasking, USB connectivity and more. Version 4.2 Jelly Bean (API 17) has
recently been released.
In line with the open source philosophy, the Android OS has been developed
in Java and relies on the Linux kernel. However, Android does not include a
native X Window System by default, nor does it support the full set of
standard GNU libraries, and this makes it difficult to port existing Linux
applications or libraries to Android.
Although Android is an open source OS, in order to release a commercial
device under the Android trademark, the device is required to satisfy the
Compatibility Definition Document (CDD), a document that contains
hardware and software requirements. Among those are API compatibility,
supported standards, minimum application compatibility and minimum
hardware requirements, which ensure that the device possesses a set of
characteristics allowing cross-compatibility with other devices.
2.2.1 Software architecture
The Android system architecture is made of four levels. As introduced in the
previous section, the Android software stack includes, at the lowest level, the
Linux kernel, which handles energy management, process management, driver
input/output, etc.
In the level immediately above there are the custom 2D graphics libraries,
called SGL, while the 3D graphics libraries are based on OpenGL (currently
Android supports OpenGL ES 2.0) with optional hardware acceleration.
SQLite is in charge of archiving data using a relational database, and FreeType
handles text rendering, both vectorial and bitmap. Other libraries at the same
level include SSL security protocol support, multimedia playback, streaming,
etc.
On top of the libraries level there are the Application Framework level and the
Application level. While the second is in charge of handling the applications,
the first works as a base on which applications lie.
Figure 2.1: Android architecture structure
Android includes a minimal set of applications and the Google Play service,
which allows users to download certified applications, some for free and others
for a price; it also allows developers to upload their own.
While the lower levels are written in C, applications are written in Java and
executed via a virtual machine called Dalvik, which has a register-based
architecture and compiles just-in-time.
The byte-code from .class files is converted into the .dex format (Dalvik
executable), which is optimized for devices with low memory and
performance. The code and any resources or data are then compressed into a
single .apk (Android package) file, which is used to install the application.
Each application runs as a separate process and, thanks to the Linux kernel's
multiuser support, each application has its own user ID, with exclusive access
enforced by the OS.
Android makes available to each application only the components that are
immediately necessary for its proper execution, and denies access to other
parts of the system for which it has no access privileges. Data and resource
sharing is still possible using a mechanism that allows applications to share the
same user ID, although it requires that the applications be signed with the
same digital certificate. There are, however, some data that are accessible only
if the user provides his or her consent at installation time (such as the
telephone book, SMS messages, etc.) and which are not reachable without it.
2.2.2 Applications structure
Android application projects have a well defined structure and are written in
Java (with some exceptions). The /src folder contains all the classes (unless
they are written in C), the /libs folder contains the external jars, and resources
are placed in the /res folder. Resources include the /drawable folders, used to
contain images at different resolutions (in order to adapt them to the device's
resolution). In /layout we find all the layouts (XML files that define the
graphical elements) used by the application. In /values there are XML files
that define the colours, strings and styles used by the application. Files such as
videos and other raw assets are stored in the /raw folder.
Android applications are based on classes called Activities. An activity is a
single, focused thing that the user can do. Almost all activities interact with
the user, so the Activity class takes care of creating a window in which it is
possible to place the User Interface (UI), either by including a layout or by
code. Since they are driven by users, activities have methods for pausing,
resuming, etc. that allow them to keep their status even when interrupted (for
example, by a phone call). Activities can communicate among themselves via
an "intent", which is an abstract description of an operation to be performed
(to start either a new activity or a service).
A layout defines the visual structure for a user interface, such as the UI of an
activity. A layout can be declared in two ways:
• Declare UI elements in XML. Android provides a straightforward
XML vocabulary that corresponds to the View classes and subclasses,
such as those for widgets and layouts.
• Instantiate layout elements at runtime. The application can create View
and ViewGroup objects (and manipulate their properties) program-
matically.
Each element can have other graphical properties and requires a unique id.
The id of the element is the link between the element itself and the activity: it
allows particular properties to be associated with that element and its behavior
to be handled.
The structure of the application is defined in an XML file called the
"manifest". This file declares the name of the application, together with its
icon, its package and the label it uses. Moreover, it lists all the activities that
the application can use (logic classes are not required to be declared) and
indicates which one is the starting activity. It also declares the target SDK and
the minimum SDK the application is compatible with. Moreover, all the
permissions the application needs in order to access protected API calls (such
as accessing the Internet or using the camera) are declared there.
While the manifest stores the main elements of the application, it is important
to remark that it does not store any information about what those elements do
or how they interact with each other.
Figure 2.2: Android Manifest example
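A minimal manifest along the lines just described might be sketched as follows; the package name, activity names and version values here are illustrative, not those of the actual project.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- AndroidManifest.xml - hypothetical sketch: declares the package, the
     permissions needed for protected APIs, the SDK levels, and the
     activities, marking the starting one with the MAIN/LAUNCHER filter -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.earringstore"
    android:versionCode="1"
    android:versionName="1.0">

    <!-- minSdkVersion 14 corresponds to Android 4.0, required here for
         the face detection API -->
    <uses-sdk android:minSdkVersion="14" android:targetSdkVersion="15" />

    <!-- protected API calls the application needs -->
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.CAMERA" />

    <application android:icon="@drawable/ic_launcher"
        android:label="@string/app_name">
        <activity android:name=".MainGalleryActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
        <activity android:name=".VirtualMirrorActivity" />
    </application>
</manifest>
```

Note that, as remarked above, the manifest only enumerates these elements: what each activity does is defined entirely in its Java class.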
2.2.3 SDK and NDK
Android applications are written mainly in Java; libraries and documentation
are provided in the Software Development Kit (SDK). The SDK can be used
either via scripts and command-line tools or via the Android Development
Tools (ADT) plugin for Eclipse. Using the SDK Manager tool it is possible to
download, update and delete extra components, among which are the different
Android APIs, tools and sample projects. Android emulators can be created,
configured, edited and deleted using the Android Virtual Device (AVD)
Manager. For each emulator the installed API, CPU architecture, RAM size,
SD card size, internal storage size, etc. can be configured. It is also possible to
select existing devices: by doing so, the emulator will be configured like those
devices, though it will still be possible to change the size of the RAM, internal
storage, etc.
While the Android libraries are written in Java (on top of native C/C++ code
and the Linux kernel), native code in C and C++ can be included: by using
the Native Development Kit (NDK) and the Java Native Interface (JNI),
native code can be reused and included in Android applications. Sangchul Lee
and Jae Wook Jeon have carried out a performance analysis between an
application written in Java and another one using native C code. They focused
the comparison on mathematical operations and stated that in every part of
the experiment (integer calculation, floating-point calculation, a memory
access algorithm, a heap memory allocation algorithm) using the native C
library achieves faster results than running the same algorithm on the Dalvik
virtual machine alone [3].
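The Java half of such a comparison can be sketched with an ordinary micro-benchmark. The loop below is an illustrative integer workload (not the exact code used by Lee and Jeon); in their setup, the same workload would also be implemented in C, exposed to the application through a JNI method, and the two elapsed times compared.

```java
// Sketch: times an integer-heavy loop on the Java side. The native (C)
// counterpart would expose the same loop through a JNI method instead.
public class IntBench {
    // Integer workload: sum of squares 0^2 + 1^2 + ... + (n-1)^2
    static long sumOfSquares(long n) {
        long acc = 0;
        for (long i = 0; i < n; i++) {
            acc += i * i;
        }
        return acc;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long result = sumOfSquares(1_000_000L);
        long elapsedUs = (System.nanoTime() - t0) / 1_000;
        System.out.println("result=" + result + " elapsed=" + elapsedUs + "us");
    }
}
```

Repeating the measurement against a JNI implementation of the same loop reproduces, in miniature, the kind of comparison described above.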
While Sangchul Lee and Jae Wook Jeon's work shows that native C code can
greatly outperform Java code, and that the delay introduced by JNI
communication did not influence their experimental results in any meaningful
way, it is also important to keep in mind that for operations that do not
involve complex calculations the performance gain of native code can be
minimal, if not slower than Java code. In particular, the Android developers
suggest the use of native code (either C or C++) for self-contained,
CPU-intensive operations that do not allocate much memory, such as signal
processing, physics simulation, and so on [4].
2.3 Augmented reality
2.3.1 Taxonomies
In the modern era, the concepts of what is virtual and what is real are
becoming more diffuse due to the technological innovations being introduced.
These concepts, which seem to be opposites, are not totally separate though:
they are connected through the virtuality continuum, which is the space that
links the real environment to the virtual environment (and vice versa).
In this space, Fumio Kishino and Paul Milgram have provided a taxonomy for
the definition of Mixed Reality: a subclass of Virtual Reality related
technologies that involve the merging of real and virtual worlds [5].
Having defined the range as the virtuality continuum and its bounds as the
real environment and the virtual environment, we can define, inside Mixed
Reality, Augmented Virtuality as the augmentation of the virtual environment
by real phenomena such as real objects or physical laws, and Augmented
Reality as the enhancement of the real environment by means of virtual
(computer graphics generated) objects or information.
Augmented reality applications allow users to visualize, on a display, data that
are not present in reality, by performing real-time processing of inputs from
reality (which can be geospatial data, visual data from a camera, etc.).
Figure 2.3: Continuum space and mixed reality
Steven Mann extends Fumio Kishino and Paul Milgram's Mixed Reality
definition by adding a second dimension called Mediality [6]. Mediality
introduces the concept of Mediated Reality, which generalizes the Mixed
Reality concept and allows a classification that is not necessarily dependent on
virtual elements. Furthermore, Mediality introduces the concept of
Diminished Reality, which indicates the digital processing and removal of real
elements from the input video; this is not classifiable in Milgram's continuum,
since no virtual element is present. In further detail, Mann's taxonomy can be
described by a Cartesian plane that has pure Reality (R) as its origin,
Virtuality (V) as its x axis and Mediality (M) as its y axis. Mediality indicates
the degree of the changes applied: the further from the origin, the more
changes have been applied and the further one gets from reality. For
significant values of M, Augmented Reality and Augmented Virtuality are
classified as Mediated Reality and Mediated Virtuality. At the point
diametrically opposite the origin, the perception of pure virtuality is so strong
that it is severed from reality (severely mediated virtuality).
Figure 2.4: Mediated Reality taxonomy
The previous schema shows the relationship between Augmentation and
Mediation of Reality and Virtuality. In this case too, as in the virtuality
continuum, the boundary between Reality and Virtuality is not sharply
defined. Moreover, by extending the space with an additional dimension,
Mann's taxonomy has to deal with the consequent introduction of a boundary
in the new dimension, in particular between Mediation and Augmentation.
Since there is no constraint that forbids introducing virtual elements while
removing real ones from the input, the separation between Mediation and
Augmentation is not sharply defined either.
The previous definitions of Augmented Reality do not make any further
distinction within Augmented Reality itself, as other taxonomies do: they
simply delimit the main area of Augmented Reality without analyzing its
content. Below is a short list of taxonomies that provide a more detailed
analysis of Augmented Reality.
Wendy E. Mackay's analysis of Augmented Reality identifies three ways to
perform the augmentation, based on what is augmented [7]:
• Augment the user: the first approach to Augmented Reality ever used.
The user wears the interface (usually holding the device in the hands or
wearing it on the head).
• Augment the physical object: sensors are placed on the physical object
and interact with a computer, providing information (about location,
temperature, etc.).
• Augment the environment surrounding the user and the object: the
environment is (usually) augmented by means of a projector, while at least
one camera gathers information from the users (gestures, gaze tracking,
etc.) and a computer processes this information in order to provide the
appropriate response.
Olivier Hugues, Philippe Fuchs and Olivier Nannipieri instead propose a
function-based taxonomy of Augmented Reality applications [8]. The first
level divides functionalities by whether they augment perception or the
environment; the next level specifies the kind of augmentation, and the last
level subdivides by the geometrical and physical interactions between real
and virtual objects:
• Incrustation: objects are displayed on the image regardless of any law
of physics or geometry.
• Integration: objects are overlaid on the image taking physics and
geometry into account.
Figure 2.5: Functional augmented reality taxonomy graph
It is important to remark that the last level of the hierarchy can classify
both applications that incrustate virtual entities over real images and
applications that incrustate real images in a virtual environment.
Marcus Toennis and David A. Plecher's proposal is based on six classes of
presentation principles that cover different aspects of an application [9]:
• Temporal: the temporal property of a presentation; it can be continuous
or discrete.
• Dimensionality: the continuum between 2D (symbolic) and 3D (virtual
object) information presentation.
• Registration: virtual objects can be unregistered with respect to the
environment (no alignment to the 3D world), registered (the object is shown
as if embedded in the environment, having the same perspective and appearing
at the correct 2D position on the screen), or presented in a so-called
contact-analog manner (in addition to being registered on the 2D screen,
contact-analog presentation displays the objects at the correct focal
depth).
• Frame of Reference: the continuum spans between egocentric presentation
(the virtual camera used for object rendering takes the same point of view
from which the user perceives the real scenery) and exocentric presentation
(an object is shown from another point of view, such as a mini-map).
• Referencing: deals with the relation of the object of concern to what
the user of the AR system can see.
• Mounting: differentiates where a virtual object or piece of information
is mounted in the real world (objects can furthermore have more than one
mounting point).
Each application is categorized by classifying it along each class and
listing the results in a table; in this way, applications are placed in the
six-dimensional space previously defined.
2.3.2 Applications and human vision
Augmented Reality applications require input, processing and output
components in order to be used. Devices such as computers, smartphones and
laptops usually contain all the hardware components required to run
Augmented Reality applications.
Hardware components have a big impact on the quality of these applications:
if they are not powerful enough, the quality, and in particular the frame
rate, of the application will decrease drastically.
Human eyes perceive reality as continuous, yet people usually think that 30
Frames Per Second (FPS) is the limit of our visual system. This
misconception has been addressed by Dustin D. Brand, who wrote an article
about how many frames per second human eyes can perceive. In his article,
Brand remarks that the limitation on perceived FPS is due to the viewing
device and not to the human eye; in fact, he writes:
"The United States Air Force ( USAF ), in testing their pilots for visual re-
sponse time, used a simple test to see if the pilots could distinguish small
changes in light. In their experiment a picture of an aircraft was flashed on
a screen in a dark room at 1/220th of a second. Pilots were consistently able
to "see" the afterimage as well as identify the aircraft. This simple and spe-
cific situation not only proves the ability to perceive 1 image within 1/220
of a second, but the ability to interpret higher FPS." [10].
While Brand showed that human perception can go far beyond 30 FPS, what
truly concerns Augmented Reality applications is the frame rate below which
perception goes from fluid to choppy, with a clear and distinct perception
of the transition from one frame to the next.
In order to avoid this effect when designing and testing any
visually-oriented application, the minimum target frame rate should be
around 15-20 Hz on less powerful devices, while the optimal frame rate is
equal to or greater than 30 Hz on more powerful ones.
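As a minimal sketch of how such a target can be monitored (the class below is illustrative, not part of any cited system), one can count rendered frames over a sliding one-second window:

```java
// Illustrative frame-rate monitor: feed it one timestamp per rendered
// frame and it reports the average FPS of the last full window.
public class FpsMeter {
    private long windowStart;   // start of the current window (ns)
    private int frames;         // frames seen in the current window
    private double fps;         // last computed frame rate

    /** Call once per frame with a monotonic timestamp in nanoseconds. */
    public void onFrame(long nowNanos) {
        if (frames == 0) windowStart = nowNanos;
        frames++;
        long elapsed = nowNanos - windowStart;
        if (elapsed >= 1_000_000_000L) {
            // frames - 1 inter-frame intervals fit inside the window
            fps = (frames - 1) * 1e9 / elapsed;
            frames = 0; // start a new window on the next frame
        }
    }

    public double fps() { return fps; }
}
```

An application could log a warning whenever the reported value drops below the 15-20 Hz floor discussed above.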
2.3.3 Hardware setups for augmented reality
The choice of hardware sensor devices is strictly application-dependent.
Based on the purpose of the application, these components are selected and
integrated (if missing) into the augmented reality system.
The input hardware of an Augmented Reality application usually consists of
at least one camera for the acquisition of visual data. Multiple cameras can
be used simultaneously either to produce 3D stereoscopic output, which
creates the illusion of depth in an image by means of stereopsis for
binocular vision, or to perceive depth and analyze 3D motion, as in
gesture-based Kinect applications.
Other input components may include the accelerometer, GPS, gyroscope, etc.,
which are typical of smartphones and tablets. These sensors, which provide
information about device position, orientation and motion, are mostly used
in applications that require georeferenced data or some specific movement of
the device.
Other sensors, such as thermal sensors, can also be used as input for
augmented reality applications.
The effectiveness of augmented reality applications relies on the
correctness of sensor data and on transmission speed: wrong data can lead to
inexact input analysis and meaningless output, while slow data transmission
reduces the application speed and leads to a choppy frame rate.
Output hardware devices allow the user to interact with the application. It
is essential that one or more displays be available in order to visualize
the video stream output. Among the visual displays used in augmented reality
applications are computer monitors, mobile device displays and head-mounted
displays.
Computer monitors and mobile device displays are the most commonly used
output devices (since they are essential components of computers and mobile
devices, it can be taken for granted that most users of the application have
them).
Head-mounted displays (HMD) have either one or two small displays with
lenses and semi-transparent mirrors embedded in a helmet, eyeglasses (also
known as data glasses) or a visor, which project the data processed after
camera acquisition, adapted to match the position of the eyes (and sometimes
even their gaze). The display units are miniaturized and may be based on
LCD, OLED, etc. Head-mounted displays can also adopt multiple micro-displays
in order to increase the total resolution and field of view.
Figure 2.6: Head Mounted Display vision schema
Another kind of output hardware is the haptic device. Haptic devices are
used to enhance the user experience and deliver further tactile knowledge
about the properties of an object. Additional force stimuli are transferred
via the haptic interface in conjunction with whatever force response is
being received from the real object, augmenting the overall perception of
the feel of the object [11]. Haptic technology is often used in areas such
as medicine where, even if necessary, it is not always possible to perform
tests on real subjects: heart surgery training performed by a medical
student on a real patient, for example, could lead to disastrous
consequences if a mistake were made.
Figure 2.7: Haptic sensor system
2.3.4 Application areas
• Archeology/Architecture: archeological remnants in urban areas tend to
be absorbed into the urban landscape or even remain hidden in subterranean
locations which are not visible and, for these reasons, are difficult for
visitors to access [12]. Moreover, they are (often) deteriorated and their
appearance can be far removed from the original one. Augmented Reality
technology is used to provide users (in this case, visitors) not only with
multimedia information such as the historical background of the
archeological site (through text, images, video or even animated 3D models)
but also with a virtual reconstruction of the original building, improving
the experience of the tour. In this case, the most commonly adopted hardware
solution is the smartphone, used to recognize the different markers, each
representing a particular piece of information or historical building, in
order to retrieve the correct multimedia information and provide it on the
display and/or through the headphones in the case of audio information.
In the architectural field, Augmented Reality is mainly used to prepare,
analyze and study the development and status of a model or building. Among
possible research fields, illumination and spatial issues are the most
studied. In particular, Yu Sheng, Theodore C. Yapo, Christopher Young and
Barbara Cutler [13] present the Virtual Heliodon, an application of
interactive global illumination and spatially augmented reality to
architectural daylight modelling that allows designers to explore
alternative designs and new technologies for improving the sustainability of
their buildings. Images of a model in the real world, captured by a camera
above the scene, are processed to construct a virtual 3D model.
Figure 2.8: Virtual Heliodon lighting simulation
• Education: Augmented Reality has been found to be very effective in the
field of teaching, and it can be used to build interest among students and
young children in concepts which are abstract and difficult to understand.
By merging Augmented Reality and mobile learning, and by studying the
concept of mobile augmented reality in depth, the idea is to develop an
interactive mobile augmented reality application based on best learning
practices, in which an interactive science book acts as a marker and the web
and the mobile camera work as a tracking device, enabling a new level of
study experience for general science concepts: the study of materials;
solids, liquids and gases and the different phenomena they undergo; the
universe and the galaxies; the basic parts of the human skeleton; the
digestive and respiratory systems, etc. Mobile learning, or M-learning,
through the use of mobile devices allows anyone to access information and
learning materials from anywhere and at any time. M-learning focuses on the
mobility of the learner and on interaction with portable devices such as
laptops, PDAs, smartphones, etc. [14].
An example of an M-learning Augmented Reality application is presented by
Sejin Oh and Yungcheol Byun [15]. They propose an augmented reality learning
system that enables users to experience flower gardening with an interactive
agent over a physical book. To allow users to cooperate with the interactive
agents in a real space, the system overlays a virtual flower garden on a
physical book by detecting and tracking the pages of the book through a
camera, and assigns collaborative gardening tasks to the learners. To
improve learners' engagement, the picture is augmented with an interactive
agent that assists users in achieving the desired goals in the gardening
environment. Specifically, it allows users to seamlessly interact with the
agent through their mobile devices.
• Medical: in the medical field, Augmented Reality technology has found
many research opportunities. In particular, for surgical operations,
techniques such as Image-Guided Surgery (IGS) and Computer-Aided Surgery
(CAS) have been developed in order to improve success rates, reduce risks
and obtain an improved view of the status of the patient. Image-guided
surgery is mostly employed in minimally invasive operations. The IGS model
can be depicted as four stages linked in a continuous chain. At each stage
of the model, one or more computations or mathematically based procedures
are executed before moving on to the next stage. The stages, in order of
execution, are: (a) image acquisition: the IGS protocol employs aspects of
patient registration such as patient layout, imaging modality, field
strength, scan sequence, slice thickness, et cetera, needed to correlate the
reference position of a virtual 3D dataset with the patient's reference
position, which will be useful in the other stages of IGS; (b) pre-operation
planning: using the image acquired in the previous step, together with
information such as the slice thickness, a 3D virtual image is produced (in
which the "damaged area" is segmented in order to avoid the risk of
post-surgical morbidity from accidentally damaging adjacent structures) and
the surgeon performs simulations of the operation on the reconstructed
image;
Figure 2.9: Example of Surface rendering of preoperative data
(c) surgical intervention stage: at this point, the integration of medical
images and other sources of information, such as tracked instruments, is
accomplished by exploiting the earlier segmentation, and the actual surgery
is performed; and (d) post-operation monitoring, which is mostly handled at
the outpatient level and also encompasses the use of radiological images for
post-operation monitoring, treatment and medication of patients [16].
Another application of IGS is the 3D pose estimation of fractured bones. The
typical imaging modality currently used intraoperatively in orthopedic IGS
is fluoroscopy. Fluoroscopic images can be acquired in real time but are 2D;
hence they lack the spatial information contained in 3D volumetric
modalities. A crucial enhancement of current orthopedic IGS systems would
therefore be to enable the use of 3D volumetric information during
interventional procedures. This can be achieved by registering
pre-operatively obtained 3D volumetric data to 2D images acquired
intraoperatively. This 2D-3D registration identifies the position and
orientation of the patient's anatomy in six degrees of freedom and provides
real-time 3D visualization intraoperatively. Fracture reduction, hip and
knee arthroplasties and several spinal procedures (pedicle screw placement)
are currently seen as areas of orthopedic surgery that could benefit from
intra-operative 3D visualization through a 2D-3D registration process [17].
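The projection step at the heart of such 2D-3D registration can be sketched with a pinhole camera model. The intrinsic values below are illustrative, not taken from [17]: a 3D point in camera coordinates is mapped onto the 2D image plane, so that candidate 3D poses can be compared against the intraoperative 2D image.

```java
// Pinhole projection sketch: maps a 3D point (x, y, z), expressed in
// camera coordinates with z > 0, to pixel coordinates (u, v).
public class Pinhole {
    final double fx, fy;  // focal lengths in pixels (illustrative)
    final double cx, cy;  // principal point (illustrative)

    public Pinhole(double fx, double fy, double cx, double cy) {
        this.fx = fx; this.fy = fy; this.cx = cx; this.cy = cy;
    }

    public double[] project(double x, double y, double z) {
        return new double[]{fx * x / z + cx, fy * y / z + cy};
    }
}
```

A registration loop would adjust the six pose parameters until the projected contour of the pre-operative 3D data matches the fluoroscopic image.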
These techniques require the highest precision, as even the smallest error
could have the worst consequences, so new techniques are continuously being
developed and tested in order to prevent this from happening.
• Games and entertainment: Augmented Reality and games have been
successfully combined, increasing Augmented Reality's popularity and
diffusion. Among the first successful Augmented Reality videogames is
"EyeToy: Play" (2003), a hardware and software system that interacts with
the Sony PlayStation 2 console. EyeToy incorporates a unique USB camera that
uses motion-tracking technology for gesture recognition, so that gamers
instantly become the main character of their own game. In 2007 Sony released
an upgraded version of EyeToy, called "PlayStation Eye", bundled with an
Augmented Reality game called "The Eye of Judgment" for the PlayStation 3
console. The game required players to arrange physical cards on a grid in
order to play against other players or the console itself. The cards were
used as markers that were recognized by the camera and overlaid with their
own 3D models on the screen.
In 2009, Microsoft presented the Kinect peripheral, which consists of an
infra-red projector, an infra-red camera and an RGB camera. Kinect has
received a lot of attention thanks to the rapid human pose recognition
system developed on top of its 3D measurements. The low cost, reliability
and speed of these measurements promise to make Kinect a primary 3D
measuring device for indoor robotics, 3D scene reconstruction and object
recognition. Kinect can be used both with the Xbox 360 console and with
Windows (7 and 8) based computers [18]. In 2011 Microsoft released official
non-commercial drivers for Kinect application development, which gave
another boost to Augmented Reality game development. Nintendo, in turn,
released the portable console Nintendo 3DS (2010), which is equipped with
the AR Games software. It is particularly suited to marker-recognition-based
games: since the console has two cameras on its back, no extra peripheral is
required. Furthermore, the console's display allows 3D visualization without
the need for goggles.
Figure 2.10: Example of augmented reality game
Currently, the Augmented Reality games trend is moving towards the mobile
market of tablets and smartphones. The Android and iOS operating systems
support a large number of augmented reality games, among which are "AR
Invaders", "iSnipe you" and "DroidShooting". Smartphone games usually rely
on the device camera, GPS and accelerometer to capture information about the
surroundings and the smartphone's own position (and orientation); this can
be combined with marker or object recognition to provide the augmentation of
the game.
• Translation: using text recognition algorithms, an application can
recognize writing in a foreign language, translate it into the selected
language and display the translated text on the device's display. Victor
Fragoso, Steffen Gauglitz, Shane Zamora, Jim Kleban and Matthew Turk [19]
realized an Augmented Reality application for smartphones called
TranslateAR. The application applies text detection algorithms to find text
in the scene. Once the text has been found, it is enclosed in a bounding
box. The resulting quadrilateral region of interest is warped into a
rectangle, correcting any perspective distortion and showing the text as if
seen orthogonally. The warped image is used to extract the background and
foreground colours, as well as to "read" the word via Optical Character
Recognition (OCR). Finally, the user taps the text that he/she wants
translated into the desired language; the translation is performed by Google
and then overlaid on the screen.
Figure 2.11: TranslateAR in operation
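The "warp the quadrilateral into a rectangle" step can be sketched as follows. Note that this is a simplified bilinear mapping of the four corners, not the full perspective homography a system like the one above would use; all names are illustrative.

```java
// Maps each output pixel of a w x h rectangle back into the source
// quadrilateral by bilinear interpolation of its four corners, then
// samples the source image with nearest-neighbour lookup.
public class QuadUnwarp {
    /** c: corners {topLeft, topRight, bottomRight, bottomLeft}, each {x, y}. */
    public static int[][] unwarp(int[][] src, double[][] c, int w, int h) {
        int[][] out = new int[h][w];
        for (int j = 0; j < h; j++) {
            for (int i = 0; i < w; i++) {
                double s = (w == 1) ? 0 : (double) i / (w - 1);
                double t = (h == 1) ? 0 : (double) j / (h - 1);
                // Blend the four corners to find the source location.
                double x = (1 - t) * ((1 - s) * c[0][0] + s * c[1][0])
                         +      t  * ((1 - s) * c[3][0] + s * c[2][0]);
                double y = (1 - t) * ((1 - s) * c[0][1] + s * c[1][1])
                         +      t  * ((1 - s) * c[3][1] + s * c[2][1]);
                int sx = (int) Math.round(x);
                int sy = (int) Math.round(y);
                if (sy >= 0 && sy < src.length && sx >= 0 && sx < src[0].length)
                    out[j][i] = src[sy][sx];
            }
        }
        return out;
    }
}
```

The resulting rectangular image is what would then be handed to the OCR stage.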
• Military: among the many application fields of augmented reality
technology is the military field. In a military environment, soldiers can
benefit from AR, which is often used to increase Situation Awareness (SA) or
to assist in maintenance and repair operations through HMDs, improving their
efficiency. With SA-based AR technology, soldiers obtain through the display
dynamic information about their surroundings, their own location, target
positions and other multimedia information that is synchronized with the
operations center and teammates using satellite or wireless technologies.
Maintenance- and repair-oriented augmented reality applications, on the
other hand, are used, as their name says, to help soldiers perform
maintenance and repair operations. The majority of these applications focus
on specific subsets of the domain, which can be categorized as activities
involving the inspection, testing, servicing, alignment, installation,
removal, assembly, repair, overhaul or rebuilding of human-made systems.
Steven J. Henderson and Steven Feiner [20] propose an analysis of
maintenance- and repair-based augmented reality applications together with
their own prototype. For each task, the application provides five forms of
augmented content to assist the mechanic:
1. Attention-directing information in the form of 3D and 2D arrows.
2. Text instructions describing the task, with accompanying notes and
warnings.
3. Registered labels showing the location of the target component and the
surrounding context.
4. A close-up view depicting a 3D virtual scene centered on the target at
close range and rendered on a 2D screen-fixed panel.
5. 3D models of tools (e.g., a screwdriver) and turret components (e.g.,
fasteners or larger components), if applicable, registered at their current
or projected locations in the environment.
Figure 2.12: Military mechanics conducting routine maintenance prototype
The aforementioned augmentations are visualized through an HMD (using two
displays). A wireless wrist-worn controller, based on an Android smartphone,
has also been integrated to allow the user to replay an animated sequence or
control its speed. It also provides forward and back buttons that allow the
mechanic to navigate between maintenance tasks. When viewing tasks with
supporting animation, additional buttons and a slider are provided to start,
stop and control the speed of the animated sequences. This controller is
used as the interface since the head-mounted display does not provide any
physical or virtual interface itself.
Chapter 3
"The Earring Store": Objectives
and Related Works
In this chapter I first introduce the project. Then I provide a brief
introduction to the prototype I have created and to some related works.
Lastly, I present the goals and motivations of this thesis.
3.1 Introducing "The Earring Store" project
"The Earring Store" is the prototype of an Augmented Reality application
for Android that I have developed for this thesis. The main idea of this
prototype is a catalogue-like application for an earring store that
integrates the Augmented Reality functionality known as a Virtual Mirror.
The client application interacts with a custom web service to fetch the
catalogue of earrings (made up of images, prices and other data) and "build"
a local catalogue on the user's device. Once the user selects a particular
item from the catalogue, he/she can see the available information and,
possibly, decide to "try" it, either using the smartphone as a virtual
mirror or by taking a picture and trying it on the picture. If the user
decides to try the selected item, the application downloads the item's 3D
model from the web service and allows the user to try it on.
This concept application falls into the categories of virtual mirror and
E-commerce/advertisement applications (since it is, in fact, an online
shopping application), where the first category indicates a particular
functionality and the second its purpose.
Being an Augmented Reality application for smartphones, "The Earring Store"
requires access to the device camera. Moreover, it also requires Internet
access in order to fetch the catalogue and the 3D models from the web
service. These requirements, together with the required Android version,
will be visible to users, as introduced in the previous chapter, on the
application's page in the Google Play store once the application is
uploaded.
Android 4.0 "Ice Cream Sandwich" is the minimum version required for the
application to work correctly. This is because Android 4.0 introduces new
camera-related functions, among them an improved face detection function,
exposed from native code to the Dalvik runtime, which I selected and used to
implement the virtual mirror functionality.
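One detail of this API is worth noting here: android.hardware.Camera.Face reports face and eye positions in a normalized space from (-1000, -1000) at the top-left of the camera's field of view to (1000, 1000) at the bottom-right, so positions must be mapped to preview pixels before anything can be anchored to the face. A minimal sketch of that mapping (illustrative class name; preview mirroring and rotation, which a real application must also handle, are ignored):

```java
// Maps a coordinate from the Camera.Face space, which spans
// [-1000, 1000] on both axes, to pixel coordinates of a preview
// surface of size previewW x previewH.
public class FaceCoords {
    public static double[] toPreview(int x, int y, int previewW, int previewH) {
        double u = (x + 1000) / 2000.0 * previewW;
        double v = (y + 1000) / 2000.0 * previewH;
        return new double[]{u, v};
    }
}
```

In a virtual mirror, the detected eye positions mapped this way give the anchor points from which ear positions, and hence earring placement, can be estimated.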
3.2 Related works
3.2.1 Virtual Mirror
Mirrors are well known and widely used in computer graphics to enhance the
realism of virtual scenes [21]. The Magic Mirror is a user interface
technique that mimics a hand mirror. In addition to providing the optical
effect of a real mirror, several non-physical extensions have also been
proposed. As a metaphor, the Magic Mirror is an intuitive and easy-to-learn
interaction technique. It can be combined seamlessly with most navigation
techniques, and it is easy to use because the only task involved is similar
to the manipulation of an object [22]. A mirror is supposed to reflect
reality, to show us the reality that surrounds us from another point of
view. Alexandre François, Elaine Kang and Umberto Malesci [23] presented
their "handheld virtual mirror", where they introduced a setup consisting of
a camera mounted on a display monitor and a magnetic tracking system. It
records streaming video as input and shows on the display the video
reflected as if it were a real mirror. However, it is not an augmented
reality application, since no virtual objects are introduced into the scene.
A mirror can also mislead and distort reality instead of reproducing it
accurately. T. Darrell, G. Gordon, J. Woodfill and M. Harville [24] describe
a virtual mirror interface which can react to people using robust, real-time
face tracking. The display can directly combine a user's face with various
graphical effects, applied only to the face region of the image.
Among the many situations where a mirror is needed, there is driving. The
main purpose here is to provide the driver of a car, beside the information
from the traditional rear-view mirror, with additional information in a
mirror-like display about what is not directly visible. Sameer Pardhy, Craig
Shankwitz and Max Donath [25] report on the idea of extending the rear-view
mirror of a car with additional information from a DGPS (Differential Global
Positioning System), "an onboard geo-spatial database" and "radar or
inter-vehicle communication" to provide the driver with complete knowledge
about the position, distance and speed of other vehicles and objects close
to his or her own car. Donath, from the same group, filed a patent on this
topic and called it a "virtual mirror".
All of the mirrors mentioned refer to mixed or augmented reality
applications that use a real display to reflect the world like a mirror.
Some of them are capable of augmenting the reflected world with additional
information registered with real objects. My mirror application belongs to
this second category.
3.2.2 Other E-commerce virtual mirror projects
The mirror metaphor, combining reality with an augmentation of virtual
objects, has been used to improve various commercial applications: the
customer is able to view himself or herself in a stationary mirror-like
display that adds, for instance, clothes or accessories. The following
projects are just a brief selection of the existing E-commerce virtual
mirror applications and prototypes.
Probably the most famous virtual mirror E-commerce application is the Ray
Ban Virtual Mirror [26]. Developed by Ray Ban for desktop PCs, it can be
downloaded from the Ray Ban web site. Once started, it connects to the web
site to check whether there are new models and then opens the video stream
from the computer's webcam. The application first performs face detection by
making the user position his/her face inside a small ellipse on the display,
fitting the face shape and eye positions. It then tracks the user's face,
performs head pose estimation and overlays on the face the glasses that the
user wants to try. The application allows only one user at a time, and if it
"loses" the user, it compels him/her to move back into the ellipse and start
from the beginning; however, it also allows the user to try the glasses and
change the model while trying them.
Silhouette iMirror [27] is a free iPhone/iPad application that essentially
has every feature the Ray Ban Virtual Mirror has, but for mobile Apple
devices. Moreover, unlike the Ray Ban version, it does not require the user
to keep still inside an ellipse while face detection is performed, neither
at initialization time nor when the application "loses" the user. Among its
features, it allows the user to change the glasses model while trying them
on, to resize the model in case it is displayed too small or too big, and to
purchase the real glasses directly from the application (which the Ray Ban
Virtual Mirror does not allow).
P. Eisert, P. Fechteler, J. Rurainsky [28] presented a project for the real-
CHAPTER 3. "THE EARRING STORE": OBJECTIVES AND RELATED WORKS34
time visualization of customized sports shoes. Their application setup con-
sists in a green platform where the user has to stand, a camera and a dis-
play. With an initial segmentation they extract the shoes of the user from
the image if both shoes are visible. Under the assumption that the user
stands on the platform, they perform a rough estimation of the shoes’ pose,
which is improved by using 3D tracking and gradient-based 3D motion es-
timation. The shoes are composed of several 3D sub-objects (instead of
being a unique 3D model), which allows the user to customize his/her
shoes by changing some of them or their colour. By adding an invisi-
ble 3D model of the user's leg they obtain the integration effect previously
discussed in 2.3.1 Taxonomies.
Lu Wang, Ryan Villamil, Supun Samarasekera and Rakesh Kumar [29]
propose a Kinect-based application for on-line handbag shopping. The sys-
tem requires a Kinect sensor. To start, a user stands in the defined initial
calibration pose for a couple of seconds, so that the system can perform
image segmentation and edge-based background modeling to fully extract
the user from the background. Once the user is detected, he (or she) can
select virtual handbags on a screen with gestures. To check if there is an in-
tersection, the 3D handbag model is sampled with a set of 3D points which
are projected onto the image plane of the Kinect camera. If the depth values
of the 3D points are all smaller than the corresponding values in the depth
map generated by the Kinect, the handbag has no intersection with the
human body. The selected virtual handbag is initially displayed in one of
the user's hands. When the user raises an arm, the virtual handbag slides
to the elbow or shoulder depending on the slope of the forearm and upper
arm (the part between shoulder and elbow); when the user lowers the arm,
the handbag slides back to the hand. There are three stable positions for
the handbag: hand, elbow and shoulder.
3.2.3 Face detection and head pose estimation
Face detection, which is often mistaken for face recognition (recogniz-
ing a particular face as a certain person from a database), and head pose
estimation are two computer vision tasks: the former finds a face (either
in an image or in a video), the latter determines its orientation and posi-
tion in space. "The Earring Store" project does indeed need to perform
both of them in order to accurately arrange the earrings and accomplish
its function as a virtual mirror.
Face detection is usually performed as a binary pattern-classification
task: features are extracted from a given region of an image and a trained
classifier decides whether that particular region is a face or not. Often, a
window-sliding technique is employed: the classifier is applied to (usually
square or rectangular) portions of the image, at all locations and scales,
classifying each as either a face or a non-face.
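The window-sliding idea can be sketched as follows. The classifier here is a trivial placeholder standing in for a real trained classifier, and all names are illustrative:

```java
// Skeleton of window-sliding face detection; the classifier is a
// placeholder for a real trained binary classifier.
import java.util.ArrayList;
import java.util.List;

public class SlidingWindow {

    // Placeholder decision function: a real detector applies a trained
    // classifier to features extracted from the window.
    interface Classifier {
        boolean isFace(int[][] image, int x, int y, int size);
    }

    // Scan square windows of the given size at every location (a full
    // detector would also repeat this at several scales).
    public static List<int[]> detect(int[][] image, int size, Classifier c) {
        List<int[]> hits = new ArrayList<>();
        for (int y = 0; y + size <= image.length; y++) {
            for (int x = 0; x + size <= image[0].length; x++) {
                if (c.isFace(image, x, y, size)) {
                    hits.add(new int[]{x, y}); // window top-left corner
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] img = new int[4][4];
        img[1][1] = 255; // toy rule: a "face" is any window whose top-left pixel is bright
        List<int[]> hits = detect(img, 2, (im, x, y, s) -> im[y][x] > 128);
        System.out.println(hits.size()); // 1
    }
}
```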
The head pose estimation problem is a specific case of the pose estima-
tion problem, i.e., the problem of recovering the 3D pose of an object
from 2D images. Erik Murphy-Chutorian and Mohan Manubhai Trivedi
[30] provide a survey of the approaches to the head pose estimation prob-
lem and propose the following categories:
• Appearance Template Methods: compare a new image of a head to a
set of exemplars (each labelled with a discrete pose) in order to find
the most similar view.
• Detector Array Methods: train a series of head detectors each attuned
to a specific pose and assign a discrete pose to the detector with the
greatest support.
• Nonlinear Regression Methods: use nonlinear regression tools to de-
velop a functional mapping from the image or feature data to a head
pose measurement.
• Manifold Embedding Methods: seek low-dimensional manifolds that
model the continuous variation in head pose. New images can be em-
bedded into these manifolds and then used for embedded template
matching or regression.
• Flexible Models: fit a non-rigid model to the facial structure of each
individual in the image plane. Head pose is estimated from feature-
level comparisons or from the instantiation of the model parameters.
• Geometric Methods: use the location of features such as the eyes,
mouth, and nose tip to determine pose from their relative configura-
tion.
• Tracking Methods: recover the global pose change of the head from
the observed movement between video frames.
• Hybrid Methods: combine one or more of these aforementioned meth-
ods to overcome the limitations inherent in any single approach.
Such methods differ in computational time, accuracy and input require-
ments (e.g., calibrated camera, multiple cameras, etc.). For "The Earring
Store" project, the input is the video stream from a single uncalibrated
camera. My approach can be classified as a Hybrid Method (Geometric
plus Tracking) and will be described in the next chapter.
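As a minimal example of the geometric component of such a hybrid approach, the in-plane rotation (roll) of the head can be estimated from the two eye positions alone. This is only an illustrative sketch under my own simplifying assumptions, not the full estimator described in the next chapter:

```java
// Minimal geometric cue: estimate head roll from the two eye positions.
// This is only a sketch of the geometric idea, not the full estimator.
public class HeadRoll {

    // Roll angle in degrees: the slope of the segment joining the eyes.
    // 0 degrees means the eyes are level (upright head).
    public static double rollDegrees(double leftX, double leftY,
                                     double rightX, double rightY) {
        return Math.toDegrees(Math.atan2(rightY - leftY, rightX - leftX));
    }

    public static void main(String[] args) {
        System.out.println(rollDegrees(100, 200, 200, 200)); // 0.0  : level eyes
        System.out.println(rollDegrees(100, 200, 200, 300)); // 45.0 : tilted head
    }
}
```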
3.3 Motivations and Goals
Online shopping has become commonplace and more convenient than
ever, allowing shoppers to buy not only typical products with ease, but
also products which may be difficult to find in stores or may have a more
diverse selection online. The foremost limitation in online shopping, how-
ever, is apparel: clothing and accessories cannot be "dressed" by the user
before buying, which makes shopping for these things online impractical,
inconvenient, or costly.
Software can be used to address some of these limitations. While many
accessories do not require a mirror (i.e., there is a low degree of necessity
for modeling them on individuals before buying), clothing often requires
more than a mirror (by virtue of factors such as physical fit and stretch of
material). This issue has been engaged by some companies using a static
approach: the user had to take a snapshot of himself/herself and manually
place the object in the exact position, handling its orientation and possibly
its scaling. Later, companies moved instead to a more intuitive and auto-
matic approach: the Augmented Reality virtual mirror. By taking a contin-
uous video stream, the software first detects the body of the user, then
finds its pose and lastly overlays on the video the 3D model of the object
the user wants to try. This second approach simplifies and at the same
time stimulates user interaction. Users perceive the
3D model as a real object that follows each of their movements, having the
feeling that what they "are trying" is actually the store's object. Jun Park
and Woohun Lee state in their paper: "Since the middle
of 1990’s web-based E-Commerce markets have grown quickly. However,
Two-dimensional images and text in internet cannot provide enough infor-
mation of products to customers. The difference between the impressions
on the images and the actual products is due to the fundamental discrep-
ancy between the internet-based cyber world and the real environment. To
resolve the discrepancy, 3D virtual products can be provided ... but still 3D
virtual products are not in the same context as the users’ real environment.
To correctly resolve the discrepancy, the user’s real environments (user’s
office or home) and the virtual object (the product) should be seamlessly
mingled" [31].
What can be inferred from the previous statement is the importance in E-
commerce of integrating 3D models by means of Augmented Reality.
Android released a face detection functionality for developers with API
1, which was (in theory) able to provide the middle point between the
eyes, the eye distance and, more importantly, the pose of the head of the
detected faces. Unfortunately, there is a bug in that function: for any real
pose of the head, it returns 0 degrees for the X, Y and Z angles. While
that bug is widely known, no fix has been released. With the new 4.0 Ice
Cream Sandwich version (API 14), a new face detection function has been
released. It provides the developers with different parameters from the
previous one: for each face, a rectangle, the pixel coordinates of the left
eye, the right eye and the center of the mouth, and an ID. Using these
parameters it should be possible to implement a head pose estimator for
the detected face and increase the sense of realism of the virtual mirror
effect.
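For instance, from the eye coordinates alone one can already derive an anchor point and a scale factor for the earring model. The following is a small illustrative sketch; the reference distance is an arbitrary value of mine, not a parameter of the Android API:

```java
// From the eye coordinates returned by the new face detection, the
// mid-point between the eyes and the inter-eye distance can be derived;
// the distance gives a natural scale factor for the earring model.
// (The reference distance used below is an arbitrary illustrative value.)
public class FaceGeometry {

    public static double[] eyeMidpoint(double lx, double ly, double rx, double ry) {
        return new double[]{(lx + rx) / 2.0, (ly + ry) / 2.0};
    }

    public static double interEyeDistance(double lx, double ly, double rx, double ry) {
        return Math.hypot(rx - lx, ry - ly);
    }

    // Scale for the 3D model, relative to a reference inter-eye distance
    // fixed when the model was authored.
    public static double modelScale(double interEyeDistance, double referenceDistance) {
        return interEyeDistance / referenceDistance;
    }

    public static void main(String[] args) {
        double[] mid = eyeMidpoint(120, 240, 220, 240);
        double d = interEyeDistance(120, 240, 220, 240);
        System.out.println(mid[0] + "," + mid[1]); // 170.0,240.0
        System.out.println(modelScale(d, 100));    // 1.0
    }
}
```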
The purpose of this thesis is to provide an analysis of the user inter-
action with an E-commerce virtual mirror Android application prototype
based on the new Android face detection function, taking into account user
experience and software performance (focusing on the time consumption
factor) and an evaluation of this new face detection. Regarding user expe-
rience, an evaluation of the prototype will be performed over different
aspects of its usability, such as the navigation of the catalogue, the overview
of selected items and the use of the virtual mirror mode. Particular focus
will be given to the virtual mirror experience, not only from the usabil-
ity point of view but also analyzing its effectiveness and persuasiveness in
the process of convincing users to purchase items that are available on the
catalogue. Concerning software performance, as previously stated, I have
chosen to focus on speed. This choice has been driven by the fact that, as
Jakob Nielsen says: "Even a few seconds' delay is enough
to create an unpleasant user experience. Users are no longer in control, and
they’re consciously annoyed by having to wait for the computer. Thus, with
repeated short delays, users will give up unless they are extremely commit-
ted to completing the task. " [32].
In the speed evaluation I am going to analyze the communication time
between the web service and the client application, the rendering of 3D
models on the client application and the time needed to perform transfor-
mations (such as rotation, translation and scaling) of the previously men-
tioned 3D models. Evaluations will be performed both by making users
fill in a survey after testing the prototype and by taking analytical data
wherever possible (i.e., clocking the web service response time).
Such evaluations and evaluations’ analysis will be provided in chapter 6.
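Wherever analytical timing is taken, the measurements can be clocked along these lines; this is an illustrative sketch, not the actual instrumentation code of the prototype:

```java
// Sketch of how an operation (e.g. a web service call or a model
// transformation) can be clocked with System.nanoTime().
public class Stopwatch {

    // Runs the task and returns the elapsed wall-clock time in milliseconds.
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            double sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += Math.sqrt(i);
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```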
It is also important to remark that there are no virtual mirror applications
on the market (at least known to me) that include real-time face detection
and reality augmentation for Android; there are instead applications that
provide a simple mirror effect by accessing the camera and reproducing its
input stream on the display. However, virtual mirror applications that in-
clude face detection, pose estimation and reality augmentation are avail-
able for computers and iPhone/iPad devices (e.g., the Ray Ban virtual
mirror). There are no virtual mirror applications known to me that aug-
ment reality using earrings (most of the existing applications use glasses).
Evaluations will be performed in order to establish whether the user ex-
perience is satisfactory and to verify whether the (eventual) introduction
of augmented reality into an E-commerce (prototype) application can in-
crease the persuasion factor and stimulate purchasing.
Chapter 4
Designing the prototype of the
application
In this chapter the focus will be on the design of the prototype of the appli-
cation itself. A brief user analysis, the prototype application's requirements and
an architectural and software design overview will be provided.
4.1 Establishing requirements
While designing the prototype, I have decided to have the client appli-
cation separated from the catalog and 3D models. This is due to the fact
that I wanted to de-couple the application from the catalog and to reduce
the amount of memory that the application needs on the SD card of the
smartphone. This means that all the data concerning the catalog and its
items will be kept in a database and accessed through the web service.
The database of the catalog will contain, for each earring, an image, the
price, the name of the model, the material that composes the earring (e.g.,
gold, silver), the kind of closure, the stone (if any) and the 3D model
of the earring.
The client application will always be in portrait mode, regardless of the
orientation of the smartphone. This is because, unless told to avoid it, An-
droid restarts the activity from scratch whenever the smartphone changes
its orientation. Since almost every graphic component of the application
is created after downloading the data from the database, the application
would have to save the status of each element before changing the orien-
tation of the layout and then build every element again. This process is
discouraged on the official Android developers website:
"If restarting your activity requires that you recover large sets of data,
re-establish a network connection, or perform other intensive operations,
then a full restart due to a configuration change might be a slow user expe-
rience. Also, it might not be possible for you to completely restore your ac-
tivity state with the Bundle that the system saves for you with the ’onSave-
InstanceState()’ callback, it is not designed to carry large objects (such as
bitmaps) and the data within it must be serialized then deserialized, which
can consume a lot of memory and make the configuration change slow. In
such a situation, you can alleviate the burden of re-initializing your activ-
ity by retaining a stateful Object when your activity is restarted due to a
configuration change. " [33].
The virtual mirror part of the application is meant to be used by only
one person at a time. While it would be possible to remove this limit I
have imposed, I have decided not to, because having more than one per-
son would increase the computational cost of face detection and pose es-
timation and require more instances of the 3D model of the earrings,
which could lead to memory-related issues such as running out of memory.
Once the application starts, the catalogue will be in the lower part of
the screen, while above the catalogue there will be a box for the selected
item. The catalogue can be scrolled (horizontally) by simply touching the
screen. To select an item, it is necessary to click on it and keep the
finger down for at least 2 seconds. Once the item is selected, its image will
appear inside the "selected item" box and, if the user decides to explore
the data about that item, another click on the box is needed. This allows
the user to compare earrings and avoids exploring items they are not inter-
ested in. When viewing a single item of the catalogue, the information will
be arranged vertically and it will be possible to scroll it (vertically).
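The 2-second selection rule can be sketched as a simple threshold on the touch timestamps. This is an illustrative sketch; in the real application these timestamps come from Android touch events:

```java
// Sketch of the 2-second long-press selection rule described above:
// an item is selected only if the finger stays down for at least the
// threshold (timestamps in milliseconds, as delivered by touch events).
public class LongPressSelector {

    static final long THRESHOLD_MS = 2000;

    public static boolean isSelection(long downTimeMs, long upTimeMs) {
        return (upTimeMs - downTimeMs) >= THRESHOLD_MS;
    }

    public static void main(String[] args) {
        System.out.println(isSelection(0, 2500)); // true: held for 2.5 s
        System.out.println(isSelection(0, 500));  // false: a tap or scroll touch
    }
}
```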
Hardware requirements for the prototype are an Android-based smart-
phone to test the client version and a host for both the database and the
web service. The smartphone needs a front camera (because it would be
meaningless to attempt to reproduce the virtual mirror effect with the rear
one) and an internet connection in order to receive the catalogue and the
3D models from the web service.
Software requirements concern the Android version of the device, which
must be 4.0 Ice Cream Sandwich. This is due to the choice of using the
new Android face detection function that was released in that version.
Since this is a prototype and not a commissioned application, there will
be no information about the brand (either of the store or of the earrings),
nor store contacts (i.e., email, telephone number, Facebook page, Twitter
address), nor information about the store location (i.e., a Google Maps
address), nor opening hours. Moreover, no purchasing or login function-
alities will be implemented.
4.2 Users analysis
The prototype is meant for anyone who is interested in buying one or
more earrings and decides to check an online catalogue. The application
should be appealing and at the same time fluid, providing an entertaining
and interesting user experience. To this end, layouts will be kept as neat
as possible and, in case of incorrect actions from the user, directions will
be provided by the application. The target user of the application is either
male or female (most likely the majority of users will be female, since
fewer males wear earrings than females), has a variable age range (most
likely it will be used by young people, who are more acquainted with
technology) and possesses an Android smartphone running version 4.0 or
higher. The application itself does not expect any "selection" of its users;
everybody is free to use it.
4.3 Architectural design
The prototype will rely, as previously introduced, on a database for stor-
ing the data, a web service to acquire the data and a client application to
visualize it. The diagram below summarizes the architecture of the appli-
cation.
Figure 4.1: Architectural design Deployment Diagram
4.3.1 Web service adopted technologies
I have built the web service for my prototype application using WCF
.NET technology on top of REST and the JSON format. The choice of REST
over SOAP was led by the paper "SOAP-Based vs. RESTful Web Services:
A Case Study", written by Fatna Belqasmi, Jagdeep Singh, Suhib Younis
Bani Melhem and Roch H. Glitho [34], in which a SOAP vs. REST analysis
is performed by realizing a prototype of a conference manager for mobile
smartphones and evaluating it in terms of end-to-end time delay and net-
work load when executing different conferencing application operations.
The results highlight the faster performance of the REST-based architec-
ture over the SOAP-based one. Moreover, they show that "processing SOAP-
based Web service requests in a mobile environment can take 10 times
longer and consume eight times more memory than an equivalent REST-
ful Web service request". Hence my decision to realise the web service
upon a REST architecture instead of SOAP.
Concerning the response format of my web service, the choice was be-
tween the JSON and XML formats. In Tommi Aihkisalo and Tuomas
Paaso's paper "Latencies of Service Invocation and Processing of the REST
and SOAP Web Service Interfaces" [35], the authors compare the Web ser-
vice invocation latencies, and their causes, experienced by the client during
the service request-response round trip. The system utilized for the mea-
surements consisted of a suitably instrumented client and a server stack
implemented for sending and receiving messages containing arbitrary bi-
nary or text data content between the client and server. Both SOAP and
REST versions were implemented; in particular, the RESTful implementa-
tion was tested using the wire formats of JSON, XML and Google Proto-
stuff, while SOAP relied on SOAP-XML and SOAP XOP/MTOM for the
binary content. The analysis of the results clearly shows that, in all the
cases of the described scenario,
REST is the fastest. The experimental data regarding XML and JSON la-
tencies are quite similar: "the only benefit that the JSON solution is able
to achieve in real life applications is the possibility for the faster network
transmission of the marshalled request-response message objects due to the
tighter encoding with JSON". Since I’m trying to make this service as fast
as possible in order not to slow down the client application, I’ve selected
the JSON format as the response format of my web service.
For the development framework of the web service, the main options
were WCF .NET and J2EE.
Windows Communication Foundation (WCF) is a framework for Vi-
sual Studio for building service-oriented applications. Services are exposed
as service endpoints, which can be part of a continuously available service
hosted by IIS or can be services hosted in an application. Clients of
a service that request data from a service endpoint are considered end-
points themselves. The messages can be as simple as a single character or
word sent as XML, or as complex as a stream of binary data and can be
sent asynchronously from one service endpoint to another or to another
endpoint [36].
Figure 4.2: .Net architecture
J2EE, short for Java 2 Platform, Enterprise Edition, is a platform-independent,
Java-centric environment from Sun for developing, building and deploying
Web-based enterprise applications online. J2EE is deployed in a single
language (Java), although it does have support for other languages [37].
The latest version of the J2EE specification
has been augmented with the addition of several libraries to support Web
services. The two primary APIs are as follows:
• Java API for XML-Based RPC (JAX-RPC) is an API that enables de-
velopers to develop and deploy Web services.
• Java API for XML Registries (JAXR) provides a uniform and standard
API to access different kinds of XML registries.
• Several other APIs provide functionalities like sending and receiving
XML-based messages (JAXM), processing XML (JAXP), and binding
Java objects to XML documents (JAXB).
Figure 4.3: J2EE architecture
John Grundy, Zhong Wei, Radu Nicolescu and Yuhong Cai [38] present
a paper where they improve their own tool in order to investigate support
for thin-client architecture modelling and performance analysis. Their work
was performed over J2EE and ASP.NET web services, and tests were run
with three networked PCs, one each used to host the client (ACT tool), the
web server and components (JSPs/ASPs), and the database (SQL Server
2000). The client requests and database server tables were identical, the
middle-tier web components and servers being the difference. The
C#/ASP.NET version performed much faster than the JSP version in their
example; however, they used the Microsoft IIS web server, a commercial
performance-optimised platform, for the ASP.NET hosting, but an unopti-
mised J2EE SDK application server to host the JSP web components.
Based on these considerations, I’ve decided to build the web service using
the .NET WCF framework.
4.3.2 Data model and methods
The Database, implemented on Microsoft SQL Server 2008 R2, is com-
posed of a single table that contains the data that I have defined while es-
tablishing the requirements. The table structure is:
Figure 4.4: Database Modello table
Where:
ID [Primary Key, int, Not Null]: the primary key of the element.
Nome [nvarchar(50), Not Null]: the name of the earring.
Prezzo [decimal(10, 2), Not Null]: the price of the earring.
Descrizione [nvarchar(200), Not Null]: a brief description of the earring.
Materiali [nvarchar(50), Not Null]: the materials of the earring, excluding
the stone (if any).
Pietra [nvarchar(50), Not Null]: the kind of stone (if any) of the earring.
TipoChiusura [nvarchar(50), Not Null]: the kind of closure of the earring.
FilePath [nvarchar(200), Not Null]: the path to the xml file of the 3D model
of the earring.
Image [nvarchar(200), Not Null]: the path to the png file of the image of
the earring.
The database is accessed by the web service through the following ser-
vice endpoints:
GetCatalogo: This service endpoint retrieves all the information about
every earring except their 3D models. It is invoked before building the
scroll list of earrings, in order to obtain the image and the name of each
earring and to store the rest of the data for when the user decides to ac-
cess it. While most of the data is passed as a String, the ID is passed as
an integer and the price as a float; the image file is retrieved using its
path, read as a byte array and sent in that format.
The data is encoded in JSON format as follows:
{
"EarringData": [{
"descrizione":"This elegant earring is from the new collection",
"id":1,
"imagepng":[137,80,78,71,13,10,26,10,0, ...],
"materiali":"Yellow gold",
"nome":"golden heart",
"pietra":"zircone",
"prezzo":23.45,
"tipochiusura":"pressure"
},
{
...
}]
}
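On the client side, the "imagepng" array of numbers has to be turned back into a byte array before the image can be decoded. The sketch below shows this conversion together with a sanity check on the fixed PNG signature; the helper names are mine, and the real client uses a JSON library for the parsing itself:

```java
// The "imagepng" JSON array of numbers must be turned back into a byte[]
// before decoding the image. The first eight bytes of any PNG file are the
// fixed signature 137,80,78,71,13,10,26,10, usable as a sanity check.
public class PngBytes {

    public static byte[] toBytes(int[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i]; // 0..255 -> signed byte
        }
        return out;
    }

    public static boolean hasPngSignature(byte[] data) {
        int[] sig = {137, 80, 78, 71, 13, 10, 26, 10};
        if (data.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) return false; // mask back to 0..255
        }
        return true;
    }

    public static void main(String[] args) {
        int[] fromJson = {137, 80, 78, 71, 13, 10, 26, 10, 0};
        System.out.println(hasPngSignature(toBytes(fromJson))); // true
    }
}
```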
GetXMLModel/"param": This service endpoint retrieves the 3D model
of an earring. The parameter "param" indicates the ID of the earring whose
3D model is to be retrieved. This method was designed after deciding how
to import 3D models into Android. In particular, among the established
requirements I decided that I did not want to save the 3D model file itself
on the smartphone, so I settled on a simple method whose inner workings
I know, along with the parameters it retrieves from the model file: four
arrays, one for the colours (Red, Green, Blue, Alpha), one for the faces, one
for the normals to the faces, and one for the positions in space (X, Y, Z).
The data is encoded in JSON format as follows:
{
"Modello": {
"color":[0.909804,0.658824,0.364706,1,...],
"faces":[0,1,3,1,2,3,4,5, ...],
"id":2,
"normal":[0,0,-1,0,0,-1,0, ...],
"position":[0.480056,-0.131201,1.758275,0.459837,...]
}
}
4.4 3D modeling tools
Among the modelling tools, the most famous ones are Autodesk 3ds
Max and Blender.
Autodesk 3ds Max is proprietary software for modeling, animating
and rendering virtual elements. Concerning 3D modeling, the software is
primarily based around polygon modeling. It is possible to create a wide ar-
ray of primitives, including cubes, cones, pyramids and teapots, to serve as
bases for more complex models. 3ds Max supports subdivision surfaces as
well, which makes your models look smoother. 3ds Max comes with many
different editing tools for manipulating your models, among which there is
the Soft Selection tool, which allows you to grab vertex clouds and tug on
them without creating unwanted geometry in your mesh; everything stays
smooth. As an alternative to polygon and sculpting methods, designers can
also build NURBs models. This curve-based modeling approach is used for
creating smooth surfaces and is particularly useful in simulating mechani-
cal parts where accuracy is essential [39]. Furthermore, 3ds Max is capable
of photorealistic global illumination. This refers to a type of lighting that
provides heightened detail to parts of an image that fall outside the reach
of direct lights. To accomplish this, 3ds Max employs a simplified radiosity
algorithm that simulates bouncing light. This is a computationally expen-
sive operation, which makes it impractical for real-time light-rendering sce-
narios like video games, but it is ideal for rendering movies and broadcast
television segments.
Blender is an open source, cross-platform 3D software solution covering
modeling, animation, rendering and post-production through to interac-
tive creation and playback. It has a vast range of 3D modeling and shad-
ing features, among which: a range of 3D object types including polygon
meshes, NURBS surfaces, Bezier and B-spline curves, metaballs and vector
fonts (TrueType, PostScript, OpenType); mesh modeling based on vertex,
edge and/or face selection; material previews rendered by the main ren-
der engine; modifier-stack deformers such as Lattice, Curve, Armature or
Displace; smooth soft-selection editing tools for organic modeling; and
more. Concerning illumination and shaders, it allows the use of diffuse
shaders such as Lambert, Minnaert, Toon and Oren-Nayar, and specular
shaders such as WardIso, Toon, Blinn, Phong and CookTorr [40].
Both programs are equipped with a huge number of features and effects
for 3D modeling, textures and shaders. In this case, since I already had
some experience using Blender from my master course "Computer Graph-
ics and Applications", I’ve decided to use it to design the 3D models of the
earrings.
4.4.1 Importing the 3D models
Android includes support for 2D and 3D graphic development by means
of the Open Graphics Library (OpenGL), specifically, the OpenGL ES API.
OpenGL is a cross-platform graphics API that specifies a standard software
interface for 3D graphics processing hardware. OpenGL ES is a subset of
the OpenGL specification intended for embedded devices [41].
OpenGL is natively a C API, but the Android framework exposes OpenGL
through a Java API. Native OpenGL code can still be used by means of
the Native Development Kit (NDK).
At the moment, there are two available versions of OpenGL for Android:
OpenGL ES 1.0 and OpenGL ES 2.0. While the first version was released
together with the first version of Android, the second one was released
with the Android Froyo version (API 8). The major difference between
OpenGL ES 1.x and OpenGL ES 2.0 is the removal of the fixed pipeline,
which is replaced by a shader-based pipeline. The OpenGL ES 2.0 API does
not provide any formal functions for setting up lighting, or setting mate-
rial, or rasterization parameters. Instead, the programmer creates their own
'per vertex' and 'per fragment' programs which will run directly on the
graphics hardware. The OpenGL ES Shading Language is used to write
these ’shader’ programs; it is a subset of the OpenGL Shading Language.
Unlike desktop OpenGL 2.0, OpenGL ES 2.0 does not allow use of the fixed
function pipeline at all, so applications written for OpenGL ES 1.x are not
compatible with OpenGL ES 2.0 [42].
Having said this, the more powerful OpenGL ES 2.0 allows a more cus-
tomized handling of shaders and lights, which I have decided not to use.
This choice stems from the fact that the 3D models of the earrings are rel-
atively small, so such effects would not be fully appreciated, since most
users would not notice them. Moreover, I have avoided any additional ef-
fect or detail that could slow down the rendering; this also includes tex-
tures for the earrings (which, of course, would also slow down the down-
load of the model from the web service by increasing the file size, and
would increase both the parsing time and the space required). Since I had
already established that I am using the new Android face detection func-
tionality, which requires API 14 (and that means both OpenGL ES 1.0 and
OpenGL ES 2.0 are available), my choice of OpenGL ES 1.0 over ES 2.0
was based only on the previous considerations and on performance-related
requirements.
By using OpenGL ES 1.0, it is possible to create simple 3D objects by
just preparing four buffers: one for the vertices, one for the colors, one for
the normals and one for the faces:
gl.glVertexPointer(int size, int type, int stride, Buffer pointer)
gl.glColorPointer(int size, int type, int stride, Buffer pointer)
gl.glNormalPointer(int size, int type, int stride, Buffer pointer)
gl.glDrawElements(int mode, int count, int type, Buffer indices)
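These buffers must be direct, native-order NIO buffers. A minimal sketch of the preparation step follows; the array contents are placeholders for the data parsed from the model file:

```java
// OpenGL ES 1.0 expects the vertex, color and normal data in direct,
// native-order NIO buffers; the faces go in a ShortBuffer of indices.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

public class GlBuffers {

    public static FloatBuffer toFloatBuffer(float[] data) {
        FloatBuffer fb = ByteBuffer.allocateDirect(data.length * 4) // 4 bytes per float
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        fb.put(data).position(0);
        return fb;
    }

    public static ShortBuffer toShortBuffer(short[] data) {
        ShortBuffer sb = ByteBuffer.allocateDirect(data.length * 2) // 2 bytes per short
                .order(ByteOrder.nativeOrder())
                .asShortBuffer();
        sb.put(data).position(0);
        return sb;
    }

    public static void main(String[] args) {
        // One placeholder triangle (three X,Y,Z vertices) and its face indices.
        FloatBuffer vertices = toFloatBuffer(new float[]{0f, 0f, 0f, 1f, 0f, 0f, 0f, 1f, 0f});
        ShortBuffer faces = toShortBuffer(new short[]{0, 1, 2});
        System.out.println(vertices.capacity() + " " + faces.capacity()); // 9 3
        // These buffers would then be handed to gl.glVertexPointer(3,
        // GL10.GL_FLOAT, 0, vertices) and gl.glDrawElements(GL10.GL_TRIANGLES,
        // 3, GL10.GL_UNSIGNED_SHORT, faces).
    }
}
```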
OGRE (Object-Oriented Graphics Rendering Engine) is a scene-oriented,
flexible 3D engine written in C++ designed to make it easier and more
intuitive for developers to produce applications utilising hardware-accelerated
3D graphics. OGRE is made available under the MIT License [43]. OGRE
has released Blender Exporter, a Blender plugin that allows exporting a
3D model into an xml format [44], and OgreMaxSceneExporter, its coun-
terpart developed for Autodesk 3ds Max. I have decided to use the Blender
Exporter plugin with Blender because I have found a 3D model loader
function based on it. In fact, the blog "Bay Nine Studios" provides a really
simple object loader function, written for Android and compatible with
OpenGL ES 1.0 [45], that parses such an .xml model file and uses the pre-
viously mentioned OpenGL ES 1.0 functions to render it.
My choice was led by the necessity of keeping the application performance
as fast as possible. Knowing how the parser of the 3D model
works, I have moved it onto the web service and reduced the computational
load on the client prototype application. There are, however, several
external libraries for importing 3D models on Android; the most famous are
min3D ( recently deprecated in favour of the new Rajawali project ) and libgdx
( which is actually a cross-platform game development library written in Java ).
The min3D framework is based on OpenGL ES and can be used to
develop 3D apps for Android. Its major feature is that you do not need
to be an OpenGL specialist: min3D is a lightweight 3D library/framework
for Android, written in Java on top of OpenGL ES and targeting compatibility
with Android v1.5/OpenGL ES 1.0 and higher. It tracks closely
with the OpenGL ES API, which makes it ideal for gaining an understanding
of the OpenGL ES API while providing the convenience of an object-oriented
class library [46]. Recently, the min3D project has been abandoned in favour of
the Rajawali project. "Rajawali is a 3D framework for Android built on top
of the OpenGL ES 2.0 API. Its main purpose is to make things easy and to
take away the hard work that’s involved in OpenGL programming" [47].
Rajawali allows users to import .obj model files and animations in .md2 and
.md5 format, and supports point and directional lights, customizable materials
( simple, diffuse, Phong, Gouraud, toon, bump map, environment cube map,
sphere map, masked, particle ), Bezier splines, Catmull-Rom splines, particles
and more.
Libgdx is a cross-platform game and visualization development framework.
It currently supports Windows, Linux, Mac OS X, Android, iOS and
HTML5 as target platforms. Libgdx allows you to write your code once and
deploy it to multiple platforms without modification [48]. It offers support
for OpenGL ES 2.0 on Android 2.0 and above through custom JNI bindings.
Libgdx allows developers to render through OpenGL ES 1.0,
1.1 and 2.0 on all platforms, and provides vertex arrays, vertex buffer objects,
textures, shaders, .OBJ and .MD5 model loaders and many other features.
Libgdx also includes physics libraries: Box2D, which is used for 2D
physics, and an experimental Bullet Physics wrapper which can be used
for 3D physics.
These libraries were also examined at application design and technology
selection time, but both were discarded: while they offer a large number of
effects and features, they would be underemployed here, and using them
would have required extracting the parsing of the .obj files from their inner
functions, moving it onto the web service and creating a custom function
that loads the 3D models from only the minimum parameters instead of
the 3D model file itself.
4.5 Software design
This is the UML class diagram that I have used for the prototype of the
application:
Figure 4.5: "The Earring Store" UML 2.0 class diagram
4.5.1 Class analysis and software choices
For space reasons, I had to cut down most of the methods and variables
from each class, so in this paragraph I am going to explain the most
significant ones and the purpose of each class that I have used.
• MainGallery: The main activity; it receives the data from the
classes ContactWebService and EarringJsonParse and builds up the
gallery.
• ContactWebService: This class’s purpose is to download the requested
content from the web service. It handles the different kinds of request
to the endpoints ( while the "GetCatalogue" endpoint does not require
any parameter, the "GetXMLModel" one does require the ID of the
model to be downloaded ) with one method and its overload. Such
methods return the JSON string acquired from the web service.
• EarringJsonParse: This class uses the Jackson library ( a library
for parsing JSON strings directly into objects, without having to
explore the tree structure "by hand" ). Its methods return either the
full catalogue or the 3D model of an earring, according to the request.
• EarringData: The data structure for the general earring data
acquired from the web service. Its variables are the id, name, kind
of closure, description, material, stone and price of the earring;
their names match the ones on the web service, so that the Jackson
library can perform the initialization automatically.
• EarringList: Extension of the EarringData class; it is required by the
Jackson library to perform the initialization.
• ThreeDModel: The data structure for the 3D model of an earring
acquired from the web service. Its variables are the id, colour,
faces, normals and position of the earring; their names match the
ones on the web service, so that the Jackson library can perform the
initialization automatically.
• ThreeDModelList: Extension of the ThreeDModel class; it is required
by the Jackson library to perform the initialization.
• DetailEarring: The second activity of the application; it builds the view
of the single earring that the user has selected.
• TryOnPhoto: This activity allows the user to take a picture of himself/herself
and manually place the earring model on the picture. It allows the user
to resize the model, add a second model ( there cannot be more than
two or fewer than one model at a time ), rotate the model, change the
focus from one model to the other ( if present ) and remove one model
( if there are two models ).
• AugmentedRealityEarring: This activity allows the user to try the
virtual mirror effect. It shows the image preview acquired from the
camera and the detected face ( one at most ) surrounded by a green
rectangle. The earrings’ 3D models are positioned on the sides of the
rectangle. It allows the user to increase or decrease the size of the 3D models
and their distance ( only horizontally ).
• Utils: This class contains the methods to determine whether the camera is
being accessed from the emulator or from a real device and to calculate
the best size of the camera preview, both for taking a photo and for
the mirror effect.
• GLEarringModel: This class builds the OpenGL buffers required for
the renderer to process and draw the 3D model of the earrings.
• GLEarringRenderer: This class renders the 3D models of the earrings
and handles every functionality that concerns them, such as the
translation around the display, the rotation, the scaling etc.
• MatrixGrabber, MatrixStack, MatrixTrackingGL: These three classes
are required in order to perform the gluUnProject function. OpenGL
ES 1.0 does not have any method to grab the modelview and projection
matrices as OpenGL ES 2.0 does. The gluUnProject function takes as input
the original x, y and z coordinates in the display coordinate system and the
current modelview and projection matrices, and returns the output vector
objX, objY, objZ, which contains the computed object coordinates in the
OpenGL coordinate system.
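The overload-based request handling of the ContactWebService class described above can be sketched as follows. Only the two endpoint names come from the text; the base URL, the method name and the id query parameter are illustrative assumptions, since the real endpoint addresses are not given.

```java
public class ContactWebServiceSketch {
    // Hypothetical base URL: the real web service address is not part of this document.
    private static final String BASE_URL = "http://example.com/earringservice/";

    // Parameterless endpoint: requests the whole catalogue.
    static String buildRequestUrl() {
        return BASE_URL + "GetCatalogue";
    }

    // Overload for the endpoint that needs the ID of the model to download.
    static String buildRequestUrl(int modelId) {
        return BASE_URL + "GetXMLModel?id=" + modelId;
    }

    public static void main(String[] args) {
        System.out.println(buildRequestUrl());  // ends with "GetCatalogue"
        System.out.println(buildRequestUrl(3)); // ends with "GetXMLModel?id=3"
        // The JSON string would then be fetched from these URLs (for example
        // with HttpURLConnection) and handed to EarringJsonParse.
    }
}
```

One method plus its overload, as in the class description, keeps the calling code identical for both endpoints while still allowing the model ID to be passed when needed.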
4.5.2 Head pose estimation issues
While the face detection function of Android 4.0 Ice Cream Sandwich works
fine, there are still some issues that affect part of the work, in particular
the head pose estimation. The official Android developers website specifies,
for the coordinates of the mouth, left and right eye: "This is
an optional field, may not be supported on all devices. If not supported, the
value will always be set to null. The optional fields are supported as a set.
Either they are all valid, or none of them are." [49]. So far,
I have tested the aforementioned functionality on the Samsung Galaxy Nexus,
the Samsung Galaxy S3 ( which is, at the moment, the newest and most
powerful Android-based device on the market ) and the LG-P700 smartphones.
None of those smartphones returns anything other than null when
asked for the features of the found face. Searching the web, I have not
been able to find any Android-based smartphone that has full compatibility
with the new face detection function that I have adopted in my project;
on the contrary, on the website Stack Overflow a user says that he/she has
tested that function on a Galaxy Nexus, Nexus 4, Nexus 7 and Nexus 10
and that none of those devices has full compatibility [50].
This means that, even though the face detection performs well, I cannot
get the data necessary for the estimation of the head’s pose. Since the
rectangle that surrounds the face does not change its shape according to the
face pose when the user turns the face ( it just shifts towards the
direction the face is now looking ), I have decided that, even if it is a
fundamental part of the virtual mirror effect, it was necessary to cut head
pose estimation out of the project. Unfortunately, the old Android FaceDetector
( API level 1 ) does not provide a sufficient amount of features to build a
head pose estimation algorithm on top of it.
My approach to the problem was inspired by A. Nikolaidis and I. Pitas
[51]. They designed a gaze direction algorithm based only on geometrical
features: "face symmetry properties can be exploited in order to determine
the angle between the plane defined by these three facial features and the
image plane". They suggest tracking the gaze of a person by knowing only
the position of his/her eyes and the middle point of the mouth ( which are
exactly the features that the Android face detection should have provided ).
In my approach, I intended to use as data the left eye, right eye and middle
point of the mouth obtained from the face detection. By supposing them
coplanar ( and, obviously, non-collinear ), I thought of building a triangle
in space.
Figure 4.6: Finding the face and the triangle made of the middle point of the mouth and the
eyes
Once the face is found on the image, the three points form a triangle that
they use to calculate the gaze direction. By generalizing their thesis ( and
introducing a certain degree of error ), I intended to use the same angular
approach they use to calculate the gaze direction in order to estimate the
user’s head pose. Android face detection should provide the coordinates
of the eyes as the average middle point of each eye.
Figure 4.7: Modeling the face: ABC triangle built on the face
Considering the triangle CDE denoted by the dashed lines, we can state
that:
cos Φ = |CD| / |DE| (equation 1)
And from the triangle ACD we know that:
|CD| = |AD| sin θ (equation 2)
|AC| = |AD| cos θ (equation 3)
Knowing that from the triangle ADE we get:
|AE|² = |AD|² + |DE|² (equation 4)
We also know that the triangle ABE is isosceles ( due to the symmetry of
the human face ), which gives us:
|AB| = |AE| ⇒ |AB|² = |AE|² (equation 5)
Ψ = 180° − 90° − Θ (equation 6)
ρ = 180° − 90° − Θ (equation 7)
By storing at initialization time the value of Ψ as Ψf and of ρ as ρf as
"frontal values", with Ψ = 0 when the face is 90° towards the right and ρ = 0
when the face is 90° towards the left, I can declare that the face is looking
towards the right when the Ψ value is greater than the one stored for the
frontal face, and apply the same concept ( just towards the left instead of
towards the right ) to ρ. By defining my right rotation range as [ Ψf , 90° ]
and my left one as [ ρf , 90° ], I can normalize each range into [ 0° , 90° ]
and calculate the rotation as:
Yaw = 90 · (ρ − ρf) / (90 − ρf), if ρf < ρ < 90 and Ψ < Ψf
Yaw = −90 · (Ψ − Ψf) / (90 − Ψf), if Ψf < Ψ < 90 and ρ < ρf
(equation 8)
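Equation 8 can be turned into a small helper. This is only a sketch: the method name is hypothetical, the sign convention follows equation 8 as written, and the denominator of the first branch is taken as (90 − ρf), assuming symmetry with the second branch.

```java
public class YawEstimator {
    // Normalizes the current angles into a yaw estimate, following equation 8.
    // psiF and rhoF are the "frontal values" stored at initialization time.
    static double yaw(double psi, double rho, double psiF, double rhoF) {
        if (rho > rhoF && rho < 90 && psi < psiF) {
            // Rotation in the rho direction: positive yaw, normalized into [0, 90].
            return 90.0 * (rho - rhoF) / (90.0 - rhoF);
        }
        if (psi > psiF && psi < 90 && rho < rhoF) {
            // Rotation in the psi direction: negative yaw.
            return -90.0 * (psi - psiF) / (90.0 - psiF);
        }
        return 0.0; // frontal face (neither branch applies)
    }

    public static void main(String[] args) {
        double psiF = 45, rhoF = 45; // example frontal values
        System.out.println(yaw(45, 45, psiF, rhoF));   // 0.0  (frontal)
        System.out.println(yaw(30, 67.5, psiF, rhoF)); // 45.0 (half-way through the range)
    }
}
```

With ρf = 45°, a measured ρ of 67.5° lies half-way between the frontal value and 90°, so the normalized yaw is 45°, as expected.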
The above-mentioned approach is used to calculate the yaw angle. For the
pitch angle, I intended to use the angle between the 2D line that passes
through the eyes and the X-axis. For the roll angle, I intended to use the
proportion between the distance between the eyes ( multiplied by the cosine
of the yaw angle ) and the distance between the middle point of the mouth
and its projection on the line that links the eyes, taking of course into
account the previous positions of the eyes and of the middle point of the
mouth in order to establish whether the rotation is upwards or downwards.
Concerning the position of the head, its centre can be located at the
intersection between CD and BF for the x and y coordinates; for the
z coordinate it is necessary to obtain a parametric function that estimates
the change in the size of the head according to its position in space.
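The two measurements above can be sketched with plain 2D geometry; the method names and the pixel coordinates are illustrative, not taken from the project code.

```java
public class PoseAngles {
    // Angle (in degrees) between the line through the two eyes and the X-axis,
    // used in the text as the pitch angle. (lx, ly) is the left eye, (rx, ry) the right.
    static double eyeLineAngle(double lx, double ly, double rx, double ry) {
        return Math.toDegrees(Math.atan2(ry - ly, rx - lx));
    }

    // Perpendicular distance from the mouth middle point (mx, my) to the eye line,
    // used (in proportion to the eye distance) for the roll angle.
    static double mouthToEyeLine(double lx, double ly, double rx, double ry,
                                 double mx, double my) {
        double dx = rx - lx, dy = ry - ly;
        double len = Math.hypot(dx, dy);
        // |cross product| / segment length = point-to-line distance.
        return Math.abs(dx * (my - ly) - dy * (mx - lx)) / len;
    }

    public static void main(String[] args) {
        // Level eyes at y = 0, mouth 50 pixels below their midpoint.
        System.out.println(eyeLineAngle(0, 0, 60, 0));            // 0.0
        System.out.println(mouthToEyeLine(0, 0, 60, 0, 30, 50));  // 50.0
        // A tilted head: the right eye 60 px right and 60 px down of the left one.
        System.out.println(eyeLineAngle(0, 0, 60, 60));           // 45.0
    }
}
```

Comparing the mouth-to-eye-line distance against the (yaw-corrected) eye distance then gives the proportion described above, with the change in sign between consecutive frames indicating whether the rotation is upwards or downwards.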
This method is conceived with the intent of being fast and depends only
on the ( few ) data I should have available ( but do not ). Physical
constraints on the motion of the face were supposed to be introduced in an
Extended Kalman Filter in order to improve the performance. Since,
as mentioned before, the new Android face detection function does
not provide the position of the points as it claims, I have not developed this
concept algorithm, due to the impossibility of performing real, meaningful
tests and improving the model.
Chapter 5
Prototype overview
In this chapter I am going to provide a description of the data generated for the
application, how the application works and screenshots of the application.
5.1 Generated data analysis
In order to test the prototype application, I have created four simple earring
3D models starting from four earring images. The original images
were:
Figure 5.1: Images used for the catalog’s earrings
Their sizes, in order from left to right, are 52.3 KB, 159 KB, 124 KB and 26.1 KB.
With the only exception of the last image ( created by me ), the other ones
have been downloaded from the web. I have selected simple earrings in order
to simplify their 3D models as much as possible.
The 3D models generated by Blender and rendered in the application
look like:
Figure 5.2: Generated earring 3D models
The sizes of the parsed .xml files of the 3D models are, left to right,
214 KB, 121 KB, 121 KB and 223 KB. I believe that the size of the 3D models
is quite big, especially taking into account that for none of them the material
was stored. While being simple, their similarity to the original images is
sufficient, especially taking into account the size at which they are rendered.
The size of the images and of the earrings was meant to be kept to a minimum,
since it influences both the download time and the rendering execution.
5.2 Activity flow
The following schema represents the activity flow of the application.
Figure 5.3: Prototype application Activity Flow
Each of the four activities ( Main gallery, Earring in detail, Try on photo,
Try on virtual mirror ) will be analyzed in the following paragraphs.
5.2.1 Main Gallery
The initial activity of the client application. It starts with a slideshow
( made of a sequence of four images ). The slideshow is used to mask the
fact that, as soon as the application starts, it begins downloading the
catalogue ( images and data ).
Figure 5.4: Maingallery activity start
The average download time is about 8 seconds, and the download is done from an
inner class that extends AsyncTask, which is the class used to handle
asynchronous tasks. Once the download finishes, the activity loads the downloaded
images and adds them to the HorizontalScrollView that contains the
elements of the catalogue, and the slideshow stops. Each image is LongClickable,
which means that if the user keeps the finger over the image for longer than
two seconds, it counts as a normal click and the selected earring’s image is
moved into the "current focused earring window". This has been done
so that simply scrolling the menu does not change the selected earring.
Once the user clicks on the selected image, the application
goes from the first to the second activity, which receives the image of the
selected earring and its data.
Figure 5.5: Maingallery activity operations
If the user decides to change the selected earring before seeing its details,
he/she just has to press for at least two seconds on another earring in
the catalogue. In case no earring has been selected and the user clicks on the
"current focused earring window", a message warns him/her that he/she
needs to select an earring before being able to proceed to the next activity.
5.2.2 Earring in detail
As soon as the activity starts, it receives as parameters the image and the
data of the selected earring, fills in the content, and then starts downloading
the 3D model of the earring and preparing it to be passed to either the
"Try on photo" or the "Try on virtual mirror" activity. Meanwhile, the buttons
are disabled until the download is finished.
Figure 5.6: Earring in detail activity overview
The average download and "preparation" time is 2 seconds, and the operation is
done from an inner class that extends AsyncTask. In order to go to the
next activity, the user has to click on the button of the desired activity.
5.2.3 Try on photo
The activity receives the earring’s 3D model and starts preparing it in the
background. Meanwhile, it starts previewing the front camera on the display.
The user can take a photo ( which is not saved ) and decide whether
to keep it or redo it. If the user decides to keep it, the 3D
model of the earring is rendered on the display. The user has three
transformation options ( translation, rotation and scaling ) and three operation
options ( add earring, remove earring, change focus ). While the transformation
options are self-explanatory, the operations have some constraints. In
order to add an earring, there must be only one earring on the display. In
order to remove an earring, there must be two earrings on the display. The
change focus operation switches the focus from one earring to the other; of
course, it only works with two earrings on the display.
Figure 5.7: Try on photo activity operations
Since keeping 3D models allocated is memory consuming, they are
never deleted or really removed: just an internal variable of the earring
( visibility ) is set to false, and the earring is not rendered until the variable
changes back to true.
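This visibility mechanism can be sketched as follows; the class and method names are hypothetical, not taken from the project code.

```java
import java.util.ArrayList;
import java.util.List;

public class EarringScene {
    // Minimal stand-in for the 3D model: only the visibility flag matters here.
    static class EarringModel {
        boolean visible = true;
    }

    private final List<EarringModel> models = new ArrayList<>();

    void add(EarringModel m) { models.add(m); }

    // "Removing" an earring only hides it: its buffers stay allocated,
    // so showing it again later costs nothing.
    void remove(EarringModel m) { m.visible = false; }

    void restore(EarringModel m) { m.visible = true; }

    // The renderer simply skips hidden models.
    int render() {
        int drawn = 0;
        for (EarringModel m : models) {
            if (m.visible) {
                drawn++; // here the real renderer would issue the GL draw calls
            }
        }
        return drawn;
    }

    public static void main(String[] args) {
        EarringScene scene = new EarringScene();
        EarringModel left = new EarringModel(), right = new EarringModel();
        scene.add(left);
        scene.add(right);
        System.out.println(scene.render()); // 2
        scene.remove(right);
        System.out.println(scene.render()); // 1 -- model kept in memory, just hidden
        scene.restore(right);
        System.out.println(scene.render()); // 2
    }
}
```

The trade-off is a little extra memory held for hidden models in exchange for avoiding repeated buffer allocation, which matters on a memory-constrained device.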
5.2.4 Try on virtual mirror
The activity receives the earring’s 3D model and starts preparing it in the
background. Meanwhile, it starts previewing the front camera on the
display. The face detection function starts as soon as the camera preview
does. There are four buttons at the bottom of the display: "Change the
earring size", "Change the distance" ( used to change the horizontal
distance between the earrings ), "Restore default size" and "Restore default
distance". The buttons can be used to improve the accuracy of the earring
location or to enlarge the earring in case it is not visible.
Figure 5.8: Try virtual mirror activity overview
It is important to remark that, even though the face detection function does
not provide the features necessary for head pose estimation, it has
proved fast and robust under different illumination conditions.
Figure 5.9: Android face detection performance on different illumination conditions
Since this is a mobile application, it is expected to work in different places,
each under different illumination conditions, rather than in a single room
with constant illumination, as happens for most desktop applications.
Chapter 6
Users evaluation
In this chapter I will evaluate the application. I am going to establish questions
that the users have to answer in order to provide an evaluation of the application
and to verify whether the goals of this thesis ( that is to say: increase
as much as possible the time performance of the application, verify the acceptance
of the mirror functionality based on the Android 4.0 face detection, and verify the
effectiveness of an E-commerce application with virtual mirror functionality ) have
been achieved.
6.1 Application operational time list
The following list contains the required amount of time for each activ-
ity:
Main gallery:
• Download the catalogue: ∼ 7.85 seconds
• Prepare the catalogue: no delay
• Update the "current focused earring window": no delay
• Prepare the data of the selected earring for the next activity: ∼0.13
seconds
• Start the "Try on photo" activity: no delay
Earring in detail:
• Download the 3D model: ∼1.73 seconds
• Prepare the downloaded 3D model to be passed ( added to the intent )
to the next activity ( either "Try on photo" or "Try on virtual mirror" ):
∼0.19 seconds
• Start the next activity: no delay
Try on photo:
• Take a picture: ∼0.57 seconds
• Confirm the picture: ∼0.34 seconds
• Go back to take another picture: ∼0.57 seconds
• Prepare and render the 3D models: ∼0.15 seconds
• Translation, Rotation, Scaling of the earring: no delay
• Add earring, Delete earring, Change focus operations: no delay
Try on virtual mirror:
• Start camera preview: ∼0.57 seconds
• Prepare and render the 3D models: ∼0.15 seconds
• Change earrings’ size or distance operations: no delay
• Restore earrings’ original size or distance: no delay
In the following paragraph I am going to provide users impressions of
the application’s time performance.
6.1.1 Time performance evaluation
Speed performance of the application was one of the main goals of this
thesis. If the application is reactive, the user is less likely to stop using it.
In order to evaluate the time performance of the application, users were asked
to fill in the following benchmark form:
A. On a scale from 1 ( very slow ) to 5 ( very fast ), how do you evaluate:
1. The required time to load the catalogue.
2. The required time to see the details of the earring.
3. The required time to be able to try the "try on photo" functionality.
4. The required time to be able to try the "try on virtual mirror" func-
tionality.
Users ( 40 testers ) answers were:
Question Answer = 1 Answer = 2 Answer = 3 Answer = 4 Answer = 5
#1 2 [5%] 8 [20%] 13 [32.5%] 14 [35%] 3 [7.5%]
#2 0 [0%] 0 [0%] 0 [0%] 3 [7.5%] 37 [92.5%]
#3 0 [0%] 0 [0%] 0 [0%] 5 [12.5%] 35 [87.5%]
#4 0 [0%] 0 [0%] 0 [0%] 5 [12.5%] 35 [87.5%]
Table 6.1: Answers to question A1, A2, A3 and A4
B. Do you think that the initial slideshow is useful to mask the catalogue
loading time? Yes/No
Figure 6.1: Answers to question B ( Yes: 92.5%, No: 7.5% )
6.2 Virtual mirror activity analysis
The android 4.0 face detection time performance is satisfactory. On the
"Samsung Galaxy Nexus" smartphone, the average frame rate ( with the 3D
models of the earrings in the display ) is ∼20 FPS while the required time
to find the face on the preview is ∼1.82 seconds. In order to evaluate the
virtual mirror functionality, users were asked to evaluate it by filling the
following form:
C. On a scale from 1 ( minimum ) to 5 ( maximum ), how do you evaluate:
1. The fluency of the virtual mirror activity.
2. The speed required to find the face over the display.
3. The accuracy of finding the face over the display.
4. The accuracy of the activity in positioning the models on the face.
5. The effectiveness of the additional commands ( change size, change
distance etc).
Users ( 40 testers ) answers were:
Question Answer = 1 Answer = 2 Answer = 3 Answer = 4 Answer = 5
#1 0 [0%] 0 [0%] 0 [0%] 2 [5%] 38 [95%]
#2 0 [0%] 0 [0%] 3 [7.5%] 33 [82.5%] 4 [10%]
#3 2 [5%] 8 [20%] 13 [32.5%] 14 [35%] 3 [7.5%]
#4 9 [22.5%] 19 [47.5%] 12 [30%] 0 [0%] 0 [0%]
#5 0 [0%] 0 [0%] 2 [5%] 7 [17.5%] 31 [77.5%]
Table 6.2: Answers to questions C1, C2, C3, C4 and C5
D. What do you think of the following virtual mirror activity issues:
1. Are the models not realistic enough ? Yes/No
2. Do the models follow accurately head’s motions ? Yes/No
3. Are the models too small ? Yes/No
Users ( 40 testers ) answers were:
Figure 6.2: Answers to, left to right, questions D1 ( Yes: 57.5%, No: 42.5% ),
D2 ( Yes: 15%, No: 85% ) and D3 ( Yes: 72.5%, No: 27.5% )
E. Do you think that increasing the accuracy of the earrings’ placement over
the head would increase the appreciation level of the activity? Yes/No
Figure 6.3: Answers to question E ( Yes: 100%, No: 0% )
1. If yes, do you think it would even if this meant ( considerably )
slowing down the fluency of the activity? Yes/No
Figure 6.4: Answers to question E1 ( Yes: 10%, No: 90% )
6.3 Application effectiveness
The application is an E-commerce one that includes augmented reality
virtual mirror activity in order to persuade the users to purchase an item
from the catalogue. I am going to now provide the benchmarks used to
evaluate if the "user persuasion" goal has been achieved or not.
F. Have you ever purchased clothes or accessories from a website or an
application? Yes/No
Users ( 40 testers ) answers were:
Figure 6.5: Answers to question F ( Yes: 67.5%, No: 32.5% )
1. If yes, how frequently? Once per: Day/Week/Month/Year/Other
Users ( 27 testers that voted "Yes" at question F ) answers were:
Figure 6.6: Answers to question F1 ( Day: 0%, Week: 14.8%, Month: 55.6%,
Year: 7.4%, Other: 22.2% )
2. If yes, do you usually purchase items from your desktop computer or
a mobile device ? Desktop / Smartphone or Tablet / Both
Users ( 27 testers that voted "Yes" at question F ) answers were:
Figure 6.7: Answers to question F2 ( Desktop: 62.9%, Smartphone or Tablet: 11.1%,
Both: 26% )
3. If yes, did ( at least ) one of the websites/applications you have purchased
from include a virtual mirror functionality to try the models? Yes/No
Users ( 27 testers that voted "Yes" at question F ) answers were:
Figure 6.8: Answers to question F3 ( Yes: 0%, No: 100% )
G. Regardless of the fact that you may not purchase items online and may
not like the earring models in the catalogue, do you think you would ever
buy an earring from the application? Yes/No
Users ( 40 testers ) answers were:
Figure 6.9: Answers to question G ( Yes: 77.5%, No: 22.5% )
1. If yes, which aspect of the application persuaded you? It’s simple
and neat / The virtual mirror functionality / The try on photo functionality
/ All of these
Users ( 31 testers that answered "Yes" ) answers were:
Figure 6.10: Answers to question G1 ( It’s simple and neat: 0%, The virtual
mirror functionality: 22.6%, The try on photo functionality: 6.4%, All of these: 71% )
2. If no, why? I do not like the items / The application doesn’t seem
trustworthy / Other
Users ( 9 testers that answered "No" ) answers were:
Figure 6.11: Answers to question G2 ( I do not like the items: 22.2%,
The application doesn’t seem trustworthy: 77.8%, Other: 0% )
H. Do you think the virtual mirror functionality improves the persuasiveness
of the application?
Users ( 40 testers ) answers were:
Figure 6.12: Answers to question H ( Yes: 95%, No: 5% )
1. If yes, why? It allows to try the model / It entertains / It adds prestige
to the application
Users ( 31 testers that answered "Yes" ) answers were:
Figure 6.13: Answers to question H1 ( It allows to try the model: 92.1%,
It entertains: 7.9%, It adds prestige to the application: 0% )
2. Would you like other websites/applications to include virtual
mirror functionality? Yes/No
Users ( 40 testers ) answers were:
Figure 6.14: Answers to question H2 ( Yes: 100%, No: 0% )
I. Are there other features ( either from other websites/applications or
still unavailable ) that you would like to add to the application? Yes/No
Users ( 40 testers ) answers were:
Figure 6.15: Answers to question I ( Yes: 77.5%, No: 22.5% )
1. If yes, which features?
All the 31 users that answered "Yes" to question "I", asked for the possibility
to change the earring from one model to another, dynamically, in both "try
on photo" and "try on virtual mirror".
6.4 Application evaluation
The application is an E-commerce application that includes augmented
reality functionality, based on the Android 4.0 face detection, in order to
increase its persuasiveness over users and encourage them to purchase items
from the catalogue. The users have been asked many questions about the
application’s time performance and the importance of the virtual mirror
function inside the application.
From the results of the questions, it is possible to establish the following.
From the time performance point of view, users are satisfied with the
application; their answers indicate that they are not likely to stop using the
application because of its speed. In particular, users appreciated the use of the
slideshow to mask the download time, since their attention focuses on the
slideshow rather than on the download time. This shows that the goal of
time performance has been achieved.
Concerning the virtual mirror functionality from a technical point of view,
users stated that it is fluent and enjoyable, with some non-negligible shortcomings
concerning its accuracy in case of a non-frontal face and the unrealistic
earring models. When proposed the trade-off between accuracy
and computational speed, users showed more appreciation for the current
status than for a more accurate but slower virtual mirror functionality. The
Android 4.0 face detection has proved fast and efficient; however, in order
to increase users’ appreciation of the virtual mirror functionality, it is
necessary to implement a new face detection function, both fast and
efficient, that can be used to build a head pose estimator on top
of it. The goal of building a virtual mirror application using the Android 4.0
face detection function is partially achieved.
Concerning the effectiveness of introducing the virtual mirror functionality in
an E-commerce application, users stated that it has been successful: users
considerably appreciated the possibility of having a preview of how the
item would look on them. The virtual mirror functionality is a feature
that most of the current E-commerce applications lack and that 95%
of the users appreciated. The choice of using Android-based smartphones as
the platform has proved, at least for the moment, unsatisfactory, since only
11% of the users purchase items online exclusively from a smartphone or
tablet ( and this figure does not take into account that a percentage of
users use non-Android-based devices ). The figure increases significantly
( 38% ) if we consider the number of users that perform some online
purchases from both computer and smartphone/tablet ( again, this figure does
not distinguish Android users from other mobile OS users ). Finally, the
application has successfully achieved the goal of increasing its own persuasion
level over users by means of the virtual mirror functionality: 77.5% of the
users would purchase an item from the application ( while only 67.5% of the
users had declared that they perform online purchases ).
Chapter 7
Conclusions and future works
In this last chapter, I am going to provide the conclusions of this thesis based on
the users’ answers, and the possible future work to continue this thesis.
7.1 Conclusions
Tests have proved that integrating virtual mirror functionality into E-commerce
applications increases the persuasiveness of the application.
Users are more likely to purchase the items that they can not only see in
detail, but also "try", getting a forecast of how that item would look on
them. The persuasion level, however, could be further increased by solving
the virtual mirror functionality issues: increasing the realism and the accuracy
of the 3D models, adding incrustation effects and using bigger objects
( like, for example, glasses, hats etc. ) instead of small objects like earrings.
The download time has been judged satisfactory and is efficiently
masked by the initial slideshow; users are unlikely to stop using the
application because it is slow. The goals of this thesis have been achieved
( although the virtual mirror functionality goal has only partially been
achieved ): adding virtual mirror functionality to an E-commerce application
increases its effectiveness and grabs users’ attention. Virtual mirror
functionality should be implemented in every E-commerce application in
order to improve its persuasion effect and increase both the visit rate and
the sale rate.
7.2 Future works
The Android 4.0 face detection API has proved efficient and fast. However,
it has also emerged that, at the moment, no smartphone can benefit from all
the features it claims to offer. Using computer vision libraries for Android
such as OpenCV's Android port or Qualcomm's FastCV, it is possible to develop
a customized, Android-compatible face detection functionality. Moreover,
since such a face detector would be self-made and customized, it could be
tuned to implement a head pose estimator, increasing the effectiveness of
the virtual mirror functionality.
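As an illustration of the kind of head pose estimator that could be layered
on top of a face detector, the following sketch derives rough roll and yaw
angles from the three landmarks that the Android 4.0 Camera.Face API already
exposes (left eye, right eye, mouth). The geometry is a deliberately crude
assumption of mine, not the implementation of this thesis, and the function
name is hypothetical:

```python
import math

def estimate_pose(left_eye, right_eye, mouth):
    """Rough head pose from the three (x, y) pixel landmarks returned
    by a face detector. Returns (roll, yaw) in degrees. This is a
    geometric approximation, not a full 3D pose solver."""
    # Roll: angle of the inter-ocular line with respect to the horizontal.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll = math.degrees(math.atan2(dy, dx))
    # Yaw: horizontal offset of the mouth from the eye midpoint,
    # normalised by the inter-ocular distance; 0 for a frontal face.
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_dist = math.hypot(dx, dy)
    yaw = math.degrees(math.atan((mouth[0] - mid_x) / eye_dist))
    return roll, yaw
```

A frontal face with level eyes and a centred mouth yields roll and yaw close
to zero; tilting the head or turning it sideways moves the corresponding
angle away from zero, which is enough to drive the rotation of a rendered
earring model.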
The computational speed of the application has proved satisfactory. However,
with a larger catalogue, the download time is likely to grow beyond the
current ∼7.85 seconds. Using smaller, more heavily compressed images is
likely to improve performance, but not to solve the problem entirely; further
research on the transmission of data over the web could improve it as well.
The application occupies 3.15 MB on the user's smartphone, which is
considerably less than many other E-commerce applications. It could therefore
be worth considering the tradeoff between download time and application
storage space: by saving the catalogue on the smartphone, downloads would be
drastically reduced to catalogue updates only. Concerning the 3D models,
their realism should be increased: the models should be optimized while
being designed in the 3D modeling tool, and the OpenGL ES 2.0 libraries
should be adopted in the client application (instead of the current OpenGL
ES 1.0) to improve the rendering quality. Bigger models should be used as
well: 72.5% of the users declared that the models are too small, so changing
the catalogue to bigger items (such as hats or glasses) should increase
usability. Moreover, bigger objects require different techniques to create
the incrustation effect of the models and to increase the realism of the
application.
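The catalogue-caching tradeoff discussed above amounts to a simple version
check: the client keeps a local copy of the catalogue and re-downloads it
only when the server reports a newer version. This is a minimal sketch under
assumptions of mine (a JSON catalogue carrying a version field, and a
hypothetical `fetch` callable standing in for the actual web-service call):

```python
import json
from pathlib import Path

def load_catalogue(cache_dir, server_version, fetch):
    """Return the catalogue, downloading it only when the server
    reports a version newer than the locally cached copy.
    `server_version` would come from a lightweight version-check
    request; `fetch` downloads the full catalogue as a dict."""
    cache = Path(cache_dir) / "catalogue.json"
    if cache.exists():
        data = json.loads(cache.read_text())
        if data.get("version") == server_version:
            return data          # cache hit: no download needed
    data = fetch()               # cache miss or stale: full download
    data["version"] = server_version
    cache.write_text(json.dumps(data))
    return data
```

With this scheme the ∼7.85-second download is paid only on first launch and
after a catalogue update, at the cost of storing the catalogue alongside the
3.15 MB application.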