POLITECNICO DI MILANO
Como Campus
Faculty of Information Engineering
Master of Science in Computer Engineering
Department of Electronics and Information
The Earring Store: An Android virtual mirror augmented reality
application based on the new Android face detection feature
Advisor: Prof. Thimoty Barbieri
Master's thesis by:
Leandro Francisco Bruno
Student ID no. 770321
Academic Year 2011-2012
I dedicate this thesis to my family, far away and nearby, who have always
supported me.
Acknowledgments
This degree has been marked by highs and lows, and I therefore wish to
thank everyone who stood by me when things did not go as I would have liked
and who celebrated with me when they did. A big thank you to Kappa, Fabio,
Cardani, Mario, Misha, Catta and Nielo, friends since my bachelor's years and
always available for my countless doubts. Thanks also to Luca, who was
initiated into Android together with me and was always ready for an exchange
of opinions.
Thanks to Michele, Giammarco, Leonardo, Giuseppe, Matteo and the whole
Como crew, who made the two years of my master's fun and never dull,
especially the five-a-side football! Thanks also to PG, il Fo, Artiòm, il Groppo,
Jhonny, Rodrigo, Crespi and Gadio, with whom I alternated moments of pure
madness with hard work, not to mention the unforgettable games of Worms.
Thanks also to Vincenzo and Frank, my faithful followers. Thanks to Tosco,
Gardo and the Ferrara crowd who, however far away, are always with me.
Thanks to Prof. Barbieri, for listening to me and guiding me through the final
effort of this degree.
To all of you, heartfelt thanks.
Contents
Abstract x
Estratto xii
1 Introduction 1
1.1 Overture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Document structure . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Background information 4
2.1 About smartphones . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Android overview . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 Software architecture . . . . . . . . . . . . . . . . . . . 7
2.2.2 Applications structure . . . . . . . . . . . . . . . . . . 9
2.2.3 SDK and NDK . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Augmented reality . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Taxonomies . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.2 Applications and human vision . . . . . . . . . . . . . 17
2.3.3 Hardware setups for augmented reality . . . . . . . . 18
2.3.4 Application areas . . . . . . . . . . . . . . . . . . . . . 21
3 "The Earring Store": Objectives and Related Works 30
3.1 Introducing "The Earring Store" project . . . . . . . . . . . . 30
3.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 Virtual Mirror . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.2 Other E-commerce virtual mirror projects . . . . . . . 33
3.2.3 Face detection and head pose estimation . . . . . . . 35
3.3 Motivations and Goals . . . . . . . . . . . . . . . . . . . . . . 36
4 Designing the prototype of the application 40
4.1 Establishing requirements . . . . . . . . . . . . . . . . . . . . 40
4.2 Users analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3 Architectural design . . . . . . . . . . . . . . . . . . . . . . . 43
4.3.1 Web service adopted technologies . . . . . . . . . . . 44
4.3.2 Data model and methods . . . . . . . . . . . . . . . . 47
4.4 3D modeling tools . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4.1 Importing the 3D models . . . . . . . . . . . . . . . . 51
4.5 Software design . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5.1 Class analysis and software choices . . . . . . . . . . 56
4.5.2 Head pose estimation issues . . . . . . . . . . . . . . 58
5 Prototype overview 63
5.1 Generated data analysis . . . . . . . . . . . . . . . . . . . . . 63
5.2 Activity flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2.1 Main Gallery . . . . . . . . . . . . . . . . . . . . . . . 65
5.2.2 Earring in detail . . . . . . . . . . . . . . . . . . . . . . 67
5.2.3 Try on photo . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2.4 Try on virtual mirror . . . . . . . . . . . . . . . . . . . 69
6 Users evaluation 71
6.1 Application operational time list . . . . . . . . . . . . . . . . 71
6.1.1 Time performance evaluation . . . . . . . . . . . . . . 73
6.2 Virtual mirror activity analysis . . . . . . . . . . . . . . . . . 74
6.3 Application effectiveness . . . . . . . . . . . . . . . . . . . . . 76
6.4 Application evaluation . . . . . . . . . . . . . . . . . . . . . . 81
7 Conclusions and future works 83
7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
List of Figures
2.1 Android architecture structure . . . . . . . . . . . . . . . . . 8
2.2 Android Manifest example . . . . . . . . . . . . . . . . . . . . 11
2.3 Continuum space and mixed reality . . . . . . . . . . . . . . 13
2.4 Mediated Reality taxonomy . . . . . . . . . . . . . . . . . . . 14
2.5 Functional augmented reality taxonomy graph . . . . . . . . 16
2.6 Head Mounted Display vision schema . . . . . . . . . . . . . 20
2.7 Haptic sensor system . . . . . . . . . . . . . . . . . . . . . . 21
2.8 Virtual Heliodon lighting simulation . . . . . . . . . . . . . . 22
2.9 Example of Surface rendering of preoperative data . . . . . . 24
2.10 Example of augmented reality game . . . . . . . . . . . . . . 26
2.11 TranslateAR in operation . . . . . . . . . . . . . . . . . . . . . 27
2.12 Military mechanics conducting routine maintenance prototype 29
4.1 Architectural design Deployment Diagram . . . . . . . . . . 43
4.2 .Net architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 J2EE architecture . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 Database Modello table . . . . . . . . . . . . . . . . . . . . . . 47
4.5 "The Earring Store" UML 2.0 class diagram . . . . . . . . . . 55
4.6 Finding the face and the triangle made of the middle point
of the mouth and the eyes . . . . . . . . . . . . . . . . . . . . 59
4.7 Modeling the face: ABC triangle built on the face . . . . . . 60
5.1 Images used for the catalog’s earrings . . . . . . . . . . . . . 63
5.2 Generated earring 3D models . . . . . . . . . . . . . . . . . . 64
5.3 Prototype application Activity Flow . . . . . . . . . . . . . . 64
5.4 Maingallery activity start . . . . . . . . . . . . . . . . . . . . . 65
5.5 Maingallery activity operations . . . . . . . . . . . . . . . . . 66
5.6 Earring in detail activity overview . . . . . . . . . . . . . . . 67
5.7 Try on photo activity operations . . . . . . . . . . . . . . . . . 68
5.8 Try virtual mirror activity overview . . . . . . . . . . . . . . 69
5.9 Android face detection performance under different illumination conditions . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.1 Answers to question B . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Answers to questions D1, D2 and D3 (left to right) . . . . . . 75
6.3 Answers to question E . . . . . . . . . . . . . . . . . . . . . . 75
6.4 Answers to question E1 . . . . . . . . . . . . . . . . . . . . . 75
6.5 Answers to question F . . . . . . . . . . . . . . . . . . . . . . 76
6.6 Answers to question F1 . . . . . . . . . . . . . . . . . . . . . . 76
6.7 Answers to question F2 . . . . . . . . . . . . . . . . . . . . . . 77
6.8 Answers to question F3 . . . . . . . . . . . . . . . . . . . . . . 77
6.9 Answers to question G . . . . . . . . . . . . . . . . . . . . . . 77
6.10 Answers to question G1 . . . . . . . . . . . . . . . . . . . . . 78
6.11 Answers to question G2 . . . . . . . . . . . . . . . . . . . . . 78
6.12 Answers to question H . . . . . . . . . . . . . . . . . . . . . . 79
6.13 Answers to question H1 . . . . . . . . . . . . . . . . . . . . . 79
6.14 Answers to question H2 . . . . . . . . . . . . . . . . . . . . . 80
6.15 Answers to question I . . . . . . . . . . . . . . . . . . . . . . . 80
List of Tables
6.1 Answers to questions A1, A2, A3 and A4 . . . . . . . . . . . 73
6.2 Answers to questions C1, C2, C3, C4 and C5 . . . . . . . . . 74
Abstract
Mobile devices, and in particular smartphones based on the Android OS,
have spread widely over the years, bringing with them a huge number of
applications. In this thesis I introduce virtual mirror functionality into an
Android E-commerce prototype application in order to evaluate users' reactions
and behaviors. Augmented reality is being adopted in a growing number of
applications, and my aim is to establish whether its use in E-commerce
applications can attract new customers and improve sales.
To verify this thesis, I designed a prototype application called "The Earring
Store". "The Earring Store" is an Android application that simulates a virtual
earring store. I adopted the Android 4.0 face detection functionality and used it
as the basis for head pose estimation in order to produce the virtual mirror
effect. To improve the realism of the application, 3D models of the earrings
were designed. The application's catalogue stores an image, a 3D model and
other data for each earring; it is managed entirely through a RESTful web
service that I designed and implemented, which is in charge of accessing the
database and providing the required data, in JSON format, to the application.
An analysis of time performance, virtual mirror performance and user
impressions was used to establish the effectiveness of the prototype and, more
generally, of virtual mirror functionality in E-commerce applications. Users
broadly appreciated the virtual mirror functionality, even though, due to some
issues with the Android 4.0 face detection function, it does not work properly
when the user rotates his or her head. Time performance was also judged
acceptable: the download times of the catalogue and 3D models do not bother
users. Users also stated that the virtual mirror functionality increases their
interest in the application and its items, and that they are more willing to
purchase an item from an application that includes such functionality than
from one that does not.
Estratto
Mobile devices, and in particular smartphones based on the Android OS,
have spread widely in recent years, bringing with them a great number of
applications. In this document I evaluate users' reactions and behaviors
towards an E-commerce application that includes virtual mirror functionality.
Augmented reality is used in a growing number of applications of various
kinds, and what I aim to establish is whether its use in E-commerce
applications can attract new users and increase sales figures.
To verify this thesis, I built a prototype application called "The Earring
Store", an Android application for a virtual earring shop. I employed the face
detection functionality of Android 4.0 and used it as the basis for the head
pose estimation behind the virtual mirror effect. To increase the realism of the
application, I created the 3D models of the earrings. The application's
catalogue, which for each earring contains an image, its 3D model and other
data, is managed entirely through a RESTful web service that I designed and
implemented, which is in charge of accessing the database and providing the
requested data, in JSON format, to the application.
Analyses of time performance, of the virtual mirror functionality and of user
impressions were carried out to establish the effectiveness of the prototype
and, more generally, of virtual mirror functionality in an E-commerce
application. Users broadly appreciated the virtual mirror functionality, even
though, due to some problems with the Android 4.0 face detection function, it
becomes highly imprecise when the user rotates his or her head. Time
performance was judged acceptable: the download time of the catalogue and
of the 3D models does not bother users. Users stated that the virtual mirror
functionality increases their interest in the application and the items "on
display", and that they feel more inclined to purchase those items than items
from other applications that do not include virtual mirror functionality.
Chapter 1
Introduction
In this chapter I first provide a brief summary of this thesis, pointing out its
motivations and goals, and then explain how this document is organized.
1.1 Overture
This thesis has as its purpose the creation of an Android mobile application
that integrates virtual mirror augmented reality functionality based on the new
Android 4.0 Ice Cream Sandwich face detection function, in order to allow
users not only to see but also to "try on" the items they are looking at. Since
the application belongs to the E-commerce field, the focus will not be on
typical E-commerce functionality but on the virtual mirror functionality and
the impression it makes on users.
In this work I provide my own approach to the creation of an Android
augmented reality application. It includes face detection and head pose
estimation functionality. The application interacts with a web service to obtain
3D models of the items and a catalogue with images and other information.
The application has to be light and fast. The communication with the web
service and the 3D model rendering operations are designed to be fast and
smooth, improving the user experience. The augmented reality functionality is
handled with the new Android face detection only, in order to also provide an
analysis of this new feature. The effectiveness of this approach will be
evaluated by questioning testers about their opinion of the application's
time-related performance, about the augmented reality virtual mirror
performance, and about the effectiveness of such an approach in persuading
them to purchase an item from the catalogue.
The application is just a prototype, so there will be no features typical of
online shopping nor any data regarding a (real or fictitious) store.
The testers will be chosen regardless of age and gender and will try the
prototype firsthand before providing their evaluation. These evaluations will
be gathered and analyzed in order to find out whether the application satisfies
them and, if not, where and what it is lacking.
1.2 Document structure
This work is divided into seven chapters, including the current one, which
briefly introduces the objectives of this thesis and some functionalities of the
application.
• In chapter two I provide the background information indispensable for
understanding the concepts behind the application.
• In chapter three I present the key concepts this work is based on, the
motivations of the work, its purposes and goals, and related works from the
literature.
• In chapter four I provide an overview of the design of the application. This
overview includes the establishment of the requirements, the user analysis and
the architectural design, and the technologies adopted for the web service, the
database, the creation of the 3D models and the client application itself.
• In chapter five I give an overview of the application by means of screenshots,
its functionalities and an analysis of how it works.
• In chapter six I establish the evaluation parameters and the metrics used for
the evaluation.
• In chapter seven I draw the conclusions, with a final analysis of the
evaluated parameters and some guidelines for future improvements and
developments.
Chapter 2
Background information
In this chapter I introduce the main background information necessary to
understand this work and its core concepts. In the first part of the chapter I
introduce smartphones, this era's boom of mobile devices and the Android
operating system, while in the second part I describe web service approaches and
analyze the concept of augmented reality both from a theoretical and from a
more practical point of view.
2.1 About smartphones
A feature phone can be defined as a mobile phone which, at the time of
production, has additional functions over and above a basic mobile phone that
is only capable of voice calling and text messaging. The distinction between
smartphones and feature phones is vague, and there is no official definition of
what constitutes the difference between them. One of the most significant
differences, however, is that the advanced application programming interfaces
(APIs) for running third-party applications on smartphones allow those
applications to integrate better with the phone's OS and hardware than is
typical of feature phones.
The first smartphone was designed by IBM in 1992 and was called Simon; it
was shown as a concept product that year at the COMDEX expo. A refined
version was commercialized to the public in 1994 at $1,099.
Apart from being a mobile phone, it weighed 500 g and contained applications
such as a calendar, world clock, calculator, address book and email. It could
moreover send and receive faxes and included games. It had no physical
buttons to dial, but a touch screen to be used with an optional stylus.
Since 1994 technology has made huge progress, both in hardware and
software, allowing the production of more powerful, smaller and more efficient
devices. Current smartphones have considerably decreased in size and weight
(e.g. the Samsung Galaxy S3 weighs 133 g, 3.75 times less than the IBM
Simon); they use low-consumption multi-core CPUs and have integrated
GPUs.
From the commercial point of view, the number of smartphones in use
worldwide reached 1.038 billion in the third quarter of 2012, an increase of
47% with respect to the third quarter of 2011. Neil Mawston, Executive
Director at Strategy Analytics, notes that 1.038 billion smartphones works out
to roughly 1 person in every 7 owning a smartphone, meaning that there are
still more feature phone users out there, and that there is still much more
growth to come, at an even faster pace: "Smartphone penetration is still
relatively low," he writes. "Most of the world does not yet own a smartphone
and there remains huge scope for future growth, particularly in emerging
markets such as China, India and Africa. The first billion smartphones in use
worldwide took 16 years to reach, but we forecast the next billion to be
achieved in less than three years, by 2015." [1]
The previous statement shows that the smartphone adoption rate is increasing
at an incredible pace, that this phenomenon is not going to stop anytime soon,
and that developers therefore need to get into smartphone application
development in order to reach a larger market. Among smartphones there is
no standard OS, which implies that each OS has its own features and requires
specific attention when developing applications (though there are tools that
allow multi-platform application development). Modern mobile operating
systems combine the features of a personal computer operating system with a
touchscreen, cellular connectivity, Bluetooth, WiFi, GPS navigation, a camera,
speech recognition, a voice recorder, a music player, personal digital assistant
(PDA) functions and other features.
According to International Data Corporation (IDC), the majority of the
smartphone OS market (in the second quarter of 2012) was divided between
Android (68.1%), iOS (16.9%), Blackberry (4.8%), Symbian (4.4%) and
Windows Phone (3.5%).
While these data show that Android-based smartphones are the ones that sell
the most, it is important to notice that Android's success in the market can be
traced directly to Samsung, which accounted for 44.0% of all Android
smartphones shipped in 2Q12, more than the next seven Android vendors'
volumes combined [2].
In the next section I will introduce some of the main features of the Android
OS that have led it to be the leader in the smartphone OS market.
2.2 Android overview
Developed by Android Inc. and acquired by Google in 2005, Android OS is a
free and open source operating system for smartphones and tablets distributed
by Google.
The first Android-based phone (the HTC Dream) was sold in October 2008
and ran Android 1.0, whose features included a web browser, a camera,
support for Google Maps and Google Search, a media player, a YouTube
player, and Wi-Fi and Bluetooth support. In later versions Android improved
and introduced many features, such as widget support, video recording and
playback in MPEG-4, multi-lingual speech synthesis, multi-touch event
tracking, native support for sensors such as gyroscopes and barometers,
multitasking, USB connectivity and more. Version 4.2 Jelly Bean (API 17) has
recently been released.
In line with the open source philosophy, the Android OS has been developed
in Java and relies on the Linux kernel. However, Android does not include a
native X Window System by default, nor does it support the full set of
standard GNU libraries, and this makes it difficult to port existing Linux
applications or libraries to Android.
Although Android is an open source OS, in order to release a commercial
device under the Android trademark, the device is required to satisfy the
Compatibility Definition Document (CDD), a document that contains
hardware and software requirements. Among those are API compatibility,
supported standards, minimum application compatibility and minimum
hardware requirements, which ensure that the device possesses a set of
characteristics allowing cross-compatibility with other devices.
2.2.1 Software architecture
The Android system architecture is made of four levels. As introduced in the
previous section, the Android software stack includes, at the lowest level, the
Linux kernel, which handles energy management, process management, driver
input/output, etc.
In the level immediately above there are the custom 2D graphics libraries,
called SGL, while the 3D graphics libraries are based on OpenGL (currently
Android supports OpenGL ES 2.0) with optional hardware acceleration.
SQLite is in charge of archiving data using a relational database, and FreeType
handles text rendering, both vectorial and bitmap. Other libraries at the same
level include SSL security protocol support, multimedia playback, streaming,
etc.
On top of the libraries level there are the Application Framework level and the
Application level. While the second is in charge of handling the applications,
the first works as a base on which applications lie.
Figure 2.1: Android architecture structure
Android includes a minimal set of applications and the Google Play service,
which allows users to download certified applications, some for free and others
for a price; it also allows developers to upload their own.
While the lower levels are written in C, applications are written in Java and
executed via a virtual machine called Dalvik, which has a register-based
architecture and compiles just-in-time.
The byte-code from .class files is converted into the .dex format (Dalvik
executable), which is optimized for devices with low memory and
performance. The code and any resources or data are then compressed into a
single .apk (Android package) file, which is used to install the application.
Each application runs as a separate process and, thanks to the Linux kernel's
multiuser support, each application has its own user ID, with exclusive access
enforced by the OS.
Android makes available to each application only the components that are
immediately necessary for its proper execution, and denies access to other
parts of the system for which it has no access privileges. Data and resource
sharing is still possible using a mechanism that allows applications to share the
same user ID, although it requires that the applications be signed with the
same digital certificate. There are, however, some data that are accessible only
if the user provides his or her consent at installation time (such as the
telephone book, SMS messages, etc.) and which are not reachable without it.
2.2.2 Applications structure
Android application projects have a well defined structure and are written in
Java (with some exceptions). The /src folder contains all the classes (unless
they are written in C), the /libs folder contains the external jars, and resources
are placed in the /res folder. Resources include the /drawable folders, used to
contain images at different resolutions (in order to adapt them to the device's
resolution). In /layout we find all the layouts (XML files that define the
graphical elements) used by the application. In /values there are XML files
that define the colours, strings and styles used by the application. Files such as
videos and other raw assets are stored in the /raw folder.
Android applications are based on classes called Activities. An activity is a
single, focused thing that the user can do. Almost all activities interact with
the user, so the Activity class takes care of creating a window in which it is
possible to place the User Interface (UI), either by including a layout or by
code. Since they are driven by users, activities have methods for pausing,
resuming, etc. that allow them to keep their status even when interrupted (for
example, by a phone call). Activities can communicate among themselves via
an "intent", which is an abstract description of an operation to be performed
(to start either a new activity or a service).
A layout defines the visual structure for a user interface, such as the UI of an
activity. A layout can be declared in two ways:
• Declare UI elements in XML. Android provides a straightforward
XML vocabulary that corresponds to the View classes and subclasses,
such as those for widgets and layouts.
• Instantiate layout elements at runtime. The application can create View
and ViewGroup objects (and manipulate their properties) program-
matically.
Each element can have other graphical properties and requires a unique id.
The id of the element is the link between the element itself and the activity: it
allows particular properties to be associated with that element and its behavior
to be handled.
The structure of the application is defined in an XML file called the
"manifest". This file declares the name of the application, together with its
icon, its package and the label it uses. Moreover, it lists all the activities that
the application can use (logic classes are not required to be declared) and
indicates which one is the starting activity. It also declares the target SDK and
the minimum SDK the application is compatible with. Moreover, all the
permissions the application needs in order to access protected API calls (such
as accessing the Internet or using the camera) are declared there.
While the manifest stores the main elements of the application, it is important
to remark that it does not store any information about what those elements do
or how they interact with each other.
Figure 2.2: Android Manifest example
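A minimal manifest along the lines just described might be sketched as follows; the package name, activity names and version values here are illustrative, not those of the actual project.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- AndroidManifest.xml - hypothetical sketch: declares the package, the
     permissions needed for protected APIs, the SDK levels, and the
     activities, marking the starting one with the MAIN/LAUNCHER filter -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.earringstore"
    android:versionCode="1"
    android:versionName="1.0">

    <!-- minSdkVersion 14 corresponds to Android 4.0, required here for
         the face detection API -->
    <uses-sdk android:minSdkVersion="14" android:targetSdkVersion="15" />

    <!-- protected API calls the application needs -->
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.CAMERA" />

    <application android:icon="@drawable/ic_launcher"
        android:label="@string/app_name">
        <activity android:name=".MainGalleryActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
        <activity android:name=".VirtualMirrorActivity" />
    </application>
</manifest>
```

Note that, as remarked above, the manifest only enumerates these elements: what each activity does is defined entirely in its Java class.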
2.2.3 SDK and NDK
Android applications are written mainly in Java; libraries and documentation
are provided in the Software Development Kit (SDK). The SDK can be used
either via scripts and command-line tools or via the Android Development
Tools (ADT) plugin for Eclipse. Using the SDK Manager tool it is possible to
download, update and delete extra components, among which are the different
Android APIs, tools and sample projects. Android emulators can be created,
configured, edited and deleted using the Android Virtual Device (AVD)
Manager. For each emulator the installed API, CPU architecture, RAM size,
SD card size, internal storage size, etc. can be configured. It is also possible to
select existing devices: by doing so, the emulator will be configured like those
devices, though it will still be possible to change the size of the RAM, internal
storage, etc.
While the Android libraries are written in Java (on top of native C/C++ code
and the Linux kernel), native code in C and C++ can be included: by using
the Native Development Kit (NDK) and the Java Native Interface (JNI),
native code can be reused and included in Android applications. Sangchul Lee
and Jae Wook Jeon have carried out a performance analysis between an
application written in Java and another one using native C code. They focused
the comparison on mathematical operations and stated that in every part of
the experiment (integer calculation, floating-point calculation, a memory
access algorithm, a heap memory allocation algorithm) using the native C
library achieves faster results than running the same algorithm on the Dalvik
virtual machine alone [3].
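The Java half of such a comparison can be sketched with an ordinary micro-benchmark. The loop below is an illustrative integer workload (not the exact code used by Lee and Jeon); in their setup, the same workload would also be implemented in C, exposed to the application through a JNI method, and the two elapsed times compared.

```java
// Sketch: times an integer-heavy loop on the Java side. The native (C)
// counterpart would expose the same loop through a JNI method instead.
public class IntBench {
    // Integer workload: sum of squares 0^2 + 1^2 + ... + (n-1)^2
    static long sumOfSquares(long n) {
        long acc = 0;
        for (long i = 0; i < n; i++) {
            acc += i * i;
        }
        return acc;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long result = sumOfSquares(1_000_000L);
        long elapsedUs = (System.nanoTime() - t0) / 1_000;
        System.out.println("result=" + result + " elapsed=" + elapsedUs + "us");
    }
}
```

Repeating the measurement against a JNI implementation of the same loop reproduces, in miniature, the kind of comparison described above.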
While Sangchul Lee and Jae Wook Jeon's work shows that native C code can
greatly outperform Java code, and that the delay introduced by JNI
communication did not influence their experimental results in any meaningful
way, it is also important to keep in mind that for operations that do not
involve complex calculations the performance gain of native code can be
minimal, if not slower than Java code. In particular, the Android developers
suggest the use of native code (either C or C++) for self-contained,
CPU-intensive operations that do not allocate much memory, such as signal
processing, physics simulation, and so on [4].
2.3 Augmented reality
2.3.1 Taxonomies
In the modern era, the concepts of what is virtual and what is real are
becoming more diffuse due to the technological innovations being introduced.
These concepts, which seem to be opposites, are not totally separate though:
they are connected through the virtuality continuum, which is the space that
links the real environment to the virtual environment (and vice versa).
In this space, Fumio Kishino and Paul Milgram have provided a taxonomy for
the definition of Mixed Reality: a subclass of Virtual Reality related
technologies that involve the merging of real and virtual worlds [5].
Having defined the range as the virtuality continuum and its bounds as the
real environment and the virtual environment, we can define, inside Mixed
Reality, Augmented Virtuality as the augmentation of the virtual environment
by real phenomena such as real objects or physical laws, and Augmented
Reality as the enhancement of the real environment by means of virtual
(computer graphics generated) objects or information.
Augmented reality applications allow users to visualize, on a display, data that
are not present in reality, by performing real-time processing of inputs from
reality (which can be geospatial data, visual data from a camera, etc.).
Figure 2.3: Continuum space and mixed reality
Steven Mann extends Fumio Kishino and Paul Milgram's Mixed Reality
definition by adding a second dimension called Mediality [6]. Mediality
introduces the concept of Mediated Reality, which generalizes the Mixed
Reality concept and allows a classification that is not necessarily dependent on
virtual elements. Furthermore, Mediality introduces the concept of
Diminished Reality, which indicates the digital processing and removal of real
elements from the input video; this is not classifiable in Milgram's continuum,
since no virtual element is present. In further detail, Mann's taxonomy can be
described by a Cartesian plane that has pure Reality (R) as its origin,
Virtuality (V) as its x axis and Mediality (M) as its y axis. Mediality indicates
the degree of the changes applied: the further from the origin, the more
changes have been applied and the further one gets from reality. For
significant values of M, Augmented Reality and Augmented Virtuality are
classified as Mediated Reality and Mediated Virtuality. At the point
diametrically opposite the origin, the perception of pure virtuality is so strong
that it is severed from reality (severely mediated virtuality).
Figure 2.4: Mediated Reality taxonomy
The previous schema shows the relationship between Augmentation and
Mediation of Reality and Virtuality. In this case too, as in the virtuality
continuum, the boundary between Reality and Virtuality is not sharply
defined. Moreover, by extending the space with an additional dimension,
Mann's taxonomy has to deal with the consequent introduction of a boundary
in the new dimension, in particular between Mediation and Augmentation.
Since there is no constraint that forbids introducing virtual elements while
removing real ones from the input, the separation between Mediation and
Augmentation is not sharply defined either.
The previous definitions of Augmented Reality do not make any further
distinction within Augmented Reality itself, as other taxonomies do: they
simply delimit the main area of Augmented Reality without analyzing its
content. Below is a short list of taxonomies that provide a more detailed
analysis of Augmented Reality.
Wendy E. Mackay's analysis of Augmented Reality identifies three ways to
perform the augmentation, based on what is augmented [7]:
• Augment the user: the first approach to Augmented Reality ever used.
The user wears the interface (usually holding the device in the hands or
wearing it on the head).
• Augment the physical object: sensors are placed on the physical object
and interact with a computer, providing information (about location,
temperature, etc.).
• Augment the environment surrounding the user and the object: the
environment is (usually) augmented by means of a projector, while at least
one camera gathers information from the users (gestures, gaze tracking,
etc.) and a computer processes this information in order to provide the
appropriate response.
Olivier Hugues, Philippe Fuchs and Olivier Nannipieri instead propose a
function-based taxonomy of Augmented Reality applications [8]. The first
level divides functionalities by whether they augment perception or the
environment; the next level specifies the kind of augmentation, and the last
level subdivides by the geometrical and physical interactions between real
and virtual objects:
• Incrustation: objects are displayed on the image regardless of any law
of physics or geometry.
• Integration: objects are overlaid on the image taking physics and
geometry into account.
Figure 2.5: Functional augmented reality taxonomy graph
It is important to remark that the last level of the hierarchy can classify
both applications that incrustate virtual entities over real images and
applications that incrustate real images in a virtual environment.
Marcus Toennis and David A. Plecher's proposal is based on six classes of
presentation principles that cover different aspects of an application [9]:
• Temporal: the temporal property of a presentation; it can be continuous
or discrete.
• Dimensionality: the continuum between 2D (symbolic) and 3D (virtual
object) information presentation.
• Registration: virtual objects can be unregistered with respect to the
environment (no alignment to the 3D world), registered (the object is shown
as if embedded in the environment, having the same perspective and appearing
at the correct 2D position on the screen), or presented in a so-called
contact-analog manner (in addition to being registered on the 2D screen,
contact-analog presentation displays the objects at the correct focal
depth).
• Frame of Reference: the continuum spans between egocentric presentation
(the virtual camera used for object rendering takes the same point of view
from which the user perceives the real scenery) and exocentric presentation
(an object is shown from another point of view, such as a mini-map).
• Referencing: deals with the relation of the object of concern to what
the user of the AR system can see.
• Mounting: differentiates where a virtual object or piece of information
is mounted in the real world (objects can furthermore have more than one
mounting point).
Each application is categorized by classifying it along each class and
listing the results in a table; in this way, applications are placed in the
six-dimensional space previously defined.
2.3.2 Applications and human vision
Augmented Reality applications require input, processing and output
components in order to be used. Devices such as computers, smartphones and
laptops usually contain all the hardware components required to run
Augmented Reality applications.
Hardware components have a big impact on the quality of these applications:
if they are not powerful enough, the quality, and in particular the frame
rate, of the application will decrease drastically.
Human eyes perceive reality as continuous, yet people usually think that 30
Frames Per Second (FPS) is the limit of our visual system. This
misconception has been addressed by Dustin D. Brand, who wrote an article
about how many frames per second human eyes can perceive. In his article,
Brand remarks that the limitation on perceived FPS is due to the viewing
device and not to the human eye; in fact, he writes:
"The United States Air Force ( USAF ), in testing their pilots for visual re-
sponse time, used a simple test to see if the pilots could distinguish small
changes in light. In their experiment a picture of an aircraft was flashed on
a screen in a dark room at 1/220th of a second. Pilots were consistently able
to "see" the afterimage as well as identify the aircraft. This simple and spe-
cific situation not only proves the ability to perceive 1 image within 1/220
of a second, but the ability to interpret higher FPS." [10].
While Brand showed that human perception can go far beyond 30 FPS, what
truly concerns Augmented Reality applications is the frame rate below which
perception goes from fluid to choppy, with a clear and distinct perception
of the transition from one frame to the next.
In order to avoid this effect when designing and testing any
visually-oriented application, the minimum target frame rate should be
around 15-20 Hz on less powerful devices, while the optimal frame rate is
equal to or greater than 30 Hz on more powerful ones.
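As a minimal sketch of how such a target can be monitored (the class below is illustrative, not part of any cited system), one can count rendered frames over a sliding one-second window:

```java
// Illustrative frame-rate monitor: feed it one timestamp per rendered
// frame and it reports the average FPS of the last full window.
public class FpsMeter {
    private long windowStart;   // start of the current window (ns)
    private int frames;         // frames seen in the current window
    private double fps;         // last computed frame rate

    /** Call once per frame with a monotonic timestamp in nanoseconds. */
    public void onFrame(long nowNanos) {
        if (frames == 0) windowStart = nowNanos;
        frames++;
        long elapsed = nowNanos - windowStart;
        if (elapsed >= 1_000_000_000L) {
            // frames - 1 inter-frame intervals fit inside the window
            fps = (frames - 1) * 1e9 / elapsed;
            frames = 0; // start a new window on the next frame
        }
    }

    public double fps() { return fps; }
}
```

An application could log a warning whenever the reported value drops below the 15-20 Hz floor discussed above.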
2.3.3 Hardware setups for augmented reality
The choice of hardware sensor devices is strictly application-dependent.
Based on the purpose of the application, these components are selected and
integrated (if missing) into the augmented reality system.
The input hardware of an Augmented Reality application usually consists of
at least one camera for the acquisition of visual data. Multiple cameras can
be used simultaneously either to produce 3D stereoscopic output, which
creates the illusion of depth in an image by means of stereopsis for
binocular vision, or to perceive depth and analyze 3D motion, as in
gesture-based Kinect applications.
Other input components may include the accelerometer, GPS, gyroscope, etc.,
which are typical of smartphones and tablets. These sensors, which provide
information about device position, orientation and motion, are mostly used
in applications that require georeferenced data or some specific movement of
the device.
Other sensors, such as thermal sensors, can also be used as input for
augmented reality applications.
The effectiveness of augmented reality applications relies on the
correctness of sensor data and on transmission speed: wrong data can lead to
inexact input analysis and meaningless output, while slow data transmission
reduces the application speed and leads to a choppy frame rate.
Output hardware devices allow the user to interact with the application. It
is essential that one or more displays be available in order to visualize
the video stream output. Among the visual displays used in augmented reality
applications are computer monitors, mobile device displays and head-mounted
displays.
Computer monitors and mobile device displays are the most commonly used
output devices (since they are essential components of computers and mobile
devices, it can be taken for granted that most users of the application have
them).
Head-mounted displays (HMD) have either one or two small displays with
lenses and semi-transparent mirrors embedded in a helmet, eyeglasses (also
known as data glasses) or a visor, which project the data processed after
camera acquisition, adapted to match the position of the eyes (and sometimes
even their gaze). The display units are miniaturized and may be based on
LCD, OLED, etc. Head-mounted displays can also adopt multiple micro-displays
in order to increase the total resolution and field of view.
Figure 2.6: Head Mounted Display vision schema
Another kind of output hardware is the haptic device. Haptic devices are
used to enhance the user experience and deliver further tactile knowledge
about the properties of an object. Additional force stimuli are transferred
via the haptic interface in conjunction with whatever force response is
being received from the real object, augmenting the overall perception of
the feel of the object [11]. Haptic technology is often used in areas such
as medicine where, even if necessary, it is not always possible to perform
tests on real subjects: heart surgery training performed by a medical
student on a real patient, for example, could lead to disastrous
consequences if a mistake were made.
Figure 2.7: Haptic sensor system
2.3.4 Application areas
• Archeology/Architecture: archeological remnants in urban areas tend to
be absorbed into the urban landscape or even remain hidden in subterranean
locations which are not visible and, for these reasons, are difficult for
visitors to access [12]. Moreover, they are (often) deteriorated and their
appearance can be far removed from the original one. Augmented Reality
technology is used to provide users (in this case, visitors) not only with
multimedia information such as the historical background of the
archeological site (through text, images, video or even animated 3D models)
but also with a virtual reconstruction of the original building, improving
the experience of the tour. In this case, the most commonly adopted hardware
solution is the smartphone, used to recognize the different markers, each
representing a particular piece of information or historical building, in
order to retrieve the correct multimedia information and provide it on the
display and/or through the headphones in the case of audio information.
In the architectural field, Augmented Reality is mainly used to prepare,
analyze and study the development and status of a model or building. Among
possible research fields, illumination and spatial issues are the most
studied. In particular, Yu Sheng, Theodore C. Yapo, Christopher Young and
Barbara Cutler [13] present the Virtual Heliodon, an application of
interactive global illumination and spatially augmented reality to
architectural daylight modelling that allows designers to explore
alternative designs and new technologies for improving the sustainability of
their buildings. Images of a model in the real world, captured by a camera
above the scene, are processed to construct a virtual 3D model.
Figure 2.8: Virtual Heliodon lighting simulation
• Education: Augmented Reality has been found to be very effective in the
field of teaching, and it can be used to build interest among students and
young children in concepts which are abstract and difficult to understand.
By merging Augmented Reality and mobile learning, and by studying the
concept of mobile augmented reality in depth, the idea is to develop an
interactive mobile augmented reality application based on best learning
practices, in which an interactive science book acts as a marker and the web
and the mobile camera work as a tracking device, enabling a new level of
study experience for general science concepts: the study of materials;
solids, liquids and gases and the different phenomena they undergo; the
universe and the galaxies; the basic parts of the human skeleton; the
digestive and respiratory systems, etc. Mobile learning, or M-learning,
through the use of mobile devices allows anyone to access information and
learning materials from anywhere and at any time. M-learning focuses on the
mobility of the learner and on interaction with portable devices such as
laptops, PDAs, smartphones, etc. [14].
An example of an M-learning Augmented Reality application is presented by
Sejin Oh and Yungcheol Byun [15]. They propose an augmented reality learning
system that enables users to experience flower gardening with an interactive
agent over a physical book. To allow users to cooperate with the interactive
agents in a real space, the system overlays a virtual flower garden on a
physical book by detecting and tracking the pages of the book through a
camera, and assigns collaborative gardening tasks to the learners. To
improve learners' engagement, the picture is augmented with an interactive
agent that assists users in achieving the desired goals in the gardening
environment. Specifically, it allows users to seamlessly interact with the
agent through their mobile devices.
• Medical: in the medical field, Augmented Reality technology has found
many research opportunities. In particular, for surgical operations,
techniques such as Image-Guided Surgery (IGS) and Computer-Aided Surgery
(CAS) have been developed in order to improve success rates, reduce risks
and obtain an improved view of the status of the patient. Image-guided
surgery is mostly employed in minimally invasive operations. The IGS model
can be depicted as four stages linked in a continuous chain. At each stage
of the model, one or more computations or mathematically based procedures
are executed before moving on to the next stage. The stages, in order of
execution, are: (a) image acquisition: the IGS protocol employs aspects of
patient registration such as patient layout, imaging modality, field
strength, scan sequence, slice thickness, et cetera, needed to correlate the
reference position of a virtual 3D dataset with the patient's reference
position, which will be useful in the other stages of IGS; (b) pre-operation
planning: using the image acquired in the previous step, together with
information such as the slice thickness, a 3D virtual image is produced (in
which the "damaged area" is segmented in order to avoid the risk of
post-surgical morbidity from accidentally damaging adjacent structures) and
the surgeon performs simulations of the operation on the reconstructed
image;
Figure 2.9: Example of Surface rendering of preoperative data
(c) surgical intervention stage: at this point, the integration of medical
images and other sources of information, such as tracked instruments, is
accomplished by exploiting the earlier segmentation, and the actual surgery
is performed; and (d) post-operation monitoring, which is mostly handled at
the outpatient level and also encompasses the use of radiological images for
post-operation monitoring, treatment and medication of patients [16].
Another application of IGS is the 3D pose estimation of fractured bones. The
typical imaging modality currently used intraoperatively in orthopedic IGS
is fluoroscopy. Fluoroscopic images can be acquired in real time but are 2D;
hence they lack the spatial information contained in 3D volumetric
modalities. A crucial enhancement of current orthopedic IGS systems would
therefore be to enable the use of 3D volumetric information during
interventional procedures. This can be achieved by registering
pre-operatively obtained 3D volumetric data to 2D images acquired
intraoperatively. This 2D-3D registration identifies the position and
orientation of the patient's anatomy in six degrees of freedom and provides
real-time 3D visualization intraoperatively. Fracture reduction, hip and
knee arthroplasties and several spinal procedures (pedicle screw placement)
are currently seen as areas of orthopedic surgery that could benefit from
intra-operative 3D visualization through a 2D-3D registration process [17].
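The projection step at the heart of such 2D-3D registration can be sketched with a pinhole camera model. The intrinsic values below are illustrative, not taken from [17]: a 3D point in camera coordinates is mapped onto the 2D image plane, so that candidate 3D poses can be compared against the intraoperative 2D image.

```java
// Pinhole projection sketch: maps a 3D point (x, y, z), expressed in
// camera coordinates with z > 0, to pixel coordinates (u, v).
public class Pinhole {
    final double fx, fy;  // focal lengths in pixels (illustrative)
    final double cx, cy;  // principal point (illustrative)

    public Pinhole(double fx, double fy, double cx, double cy) {
        this.fx = fx; this.fy = fy; this.cx = cx; this.cy = cy;
    }

    public double[] project(double x, double y, double z) {
        return new double[]{fx * x / z + cx, fy * y / z + cy};
    }
}
```

A registration loop would adjust the six pose parameters until the projected contour of the pre-operative 3D data matches the fluoroscopic image.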
These techniques require the highest precision, as even the smallest error
could have the worst consequences, so new techniques are continuously being
developed and tested in order to prevent this from happening.
• Games and entertainment: Augmented Reality and games have been
successfully combined, increasing Augmented Reality's popularity and
diffusion. Among the first successful Augmented Reality videogames is
"EyeToy: Play" (2003), a hardware and software system that interacts with
the Sony PlayStation 2 console. EyeToy incorporates a unique USB camera that
uses motion-tracking technology for gesture recognition, so that gamers
instantly become the main character of their own game. In 2007 Sony released
an upgraded version of EyeToy, called "PlayStation Eye", bundled with an
Augmented Reality game called "The Eye of Judgment" for the PlayStation 3
console. The game required players to arrange physical cards on a grid in
order to play against other players or the console itself. The cards were
used as markers that were recognized by the camera and overlaid with their
own 3D models on the screen.
In 2009, Microsoft presented the Kinect peripheral, which consists of an
infra-red projector, an infra-red camera and an RGB camera. Kinect has
received a lot of attention thanks to the rapid human pose recognition
system developed on top of its 3D measurements. The low cost, reliability
and speed of these measurements promise to make Kinect a primary 3D
measuring device for indoor robotics, 3D scene reconstruction and object
recognition. Kinect can be used both with the Xbox 360 console and with
Windows (7 and 8) based computers [18]. In 2011 Microsoft released official
non-commercial drivers for Kinect application development, which gave
another boost to Augmented Reality game development. Nintendo, in turn,
released the portable console Nintendo 3DS (2010), which is equipped with
the AR Games software. It is particularly suited to marker-recognition-based
games: since the console has two cameras on its back, no extra peripheral is
required. Furthermore, the console's display allows 3D visualization without
the need for goggles.
Figure 2.10: Example of augmented reality game
Currently, the Augmented Reality games trend is moving towards the mobile
market of tablets and smartphones. The Android and iOS operating systems
support a large number of augmented reality games, among which are "AR
Invaders", "iSnipe you" and "DroidShooting". Smartphone games usually rely
on the device camera, GPS and accelerometer to capture information about the
surroundings and the smartphone's own position (and orientation); this can
be combined with marker or object recognition to provide the augmentation of
the game.
• Translation: using text recognition algorithms, an application can
recognize writing in a foreign language, translate it into the selected
language and display the translated text on the device's display. Victor
Fragoso, Steffen Gauglitz, Shane Zamora, Jim Kleban and Matthew Turk [19]
realized an Augmented Reality application for smartphones called
TranslateAR. The application applies text detection algorithms to find text
in the scene. Once the text has been found, it is enclosed in a bounding
box. The resulting quadrilateral region of interest is warped into a
rectangle, correcting any perspective distortion and showing the text as if
seen orthogonally. The warped image is used to extract the background and
foreground colours, as well as to "read" the word via Optical Character
Recognition (OCR). Finally, the user taps the text that he/she wants
translated into the desired language; the translation is performed by Google
and then overlaid on the screen.
Figure 2.11: TranslateAR in operation
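The "warp the quadrilateral into a rectangle" step can be sketched as follows. Note that this is a simplified bilinear mapping of the four corners, not the full perspective homography a system like the one above would use; all names are illustrative.

```java
// Maps each output pixel of a w x h rectangle back into the source
// quadrilateral by bilinear interpolation of its four corners, then
// samples the source image with nearest-neighbour lookup.
public class QuadUnwarp {
    /** c: corners {topLeft, topRight, bottomRight, bottomLeft}, each {x, y}. */
    public static int[][] unwarp(int[][] src, double[][] c, int w, int h) {
        int[][] out = new int[h][w];
        for (int j = 0; j < h; j++) {
            for (int i = 0; i < w; i++) {
                double s = (w == 1) ? 0 : (double) i / (w - 1);
                double t = (h == 1) ? 0 : (double) j / (h - 1);
                // Blend the four corners to find the source location.
                double x = (1 - t) * ((1 - s) * c[0][0] + s * c[1][0])
                         +      t  * ((1 - s) * c[3][0] + s * c[2][0]);
                double y = (1 - t) * ((1 - s) * c[0][1] + s * c[1][1])
                         +      t  * ((1 - s) * c[3][1] + s * c[2][1]);
                int sx = (int) Math.round(x);
                int sy = (int) Math.round(y);
                if (sy >= 0 && sy < src.length && sx >= 0 && sx < src[0].length)
                    out[j][i] = src[sy][sx];
            }
        }
        return out;
    }
}
```

The resulting rectangular image is what would then be handed to the OCR stage.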
• Military: among the many application fields of augmented reality
technology is the military field. In a military environment, soldiers can
benefit from AR, which is often used to increase Situation Awareness (SA) or
to assist in maintenance and repair operations through HMDs, improving their
efficiency. With SA-based AR technology, soldiers obtain through the display
dynamic information about their surroundings, their own location, target
positions and other multimedia information that is synchronized with the
operations center and teammates using satellite or wireless technologies.
Maintenance- and repair-oriented augmented reality applications, on the
other hand, are used, as their name says, to help soldiers perform
maintenance and repair operations. The majority of these applications focus
on specific subsets of the domain, which can be categorized as activities
involving the inspection, testing, servicing, alignment, installation,
removal, assembly, repair, overhaul or rebuilding of human-made systems.
Steven J. Henderson and Steven Feiner [20] propose an analysis of
maintenance- and repair-based augmented reality applications together with
their own prototype. For each task, the application provides five forms of
augmented content to assist the mechanic:
1. Attention-directing information in the form of 3D and 2D arrows.
2. Text instructions describing the task, with accompanying notes and
warnings.
3. Registered labels showing the location of the target component and the
surrounding context.
4. A close-up view depicting a 3D virtual scene centered on the target at
close range and rendered on a 2D screen-fixed panel.
5. 3D models of tools (e.g., a screwdriver) and turret components (e.g.,
fasteners or larger components), if applicable, registered at their current
or projected locations in the environment.
Figure 2.12: Military mechanics conducting routine maintenance prototype
The aforementioned augmentations are visualized through an HMD (using two
displays). A wireless wrist-worn controller, based on an Android smartphone,
has also been integrated to allow the user to replay an animated sequence or
control its speed. It also provides forward and back buttons that allow the
mechanic to navigate between maintenance tasks. When viewing tasks with
supporting animation, additional buttons and a slider are provided to start,
stop and control the speed of the animated sequences. This controller is
used as the interface since the head-mounted display does not provide any
physical or virtual interface itself.
Chapter 3
"The Earring Store": Objectives
and Related Works
In this chapter I first introduce the project. Then I provide a brief
introduction to the prototype I have created and to some related works.
Lastly, I present the goals and motivations of this thesis.
3.1 Introducing "The Earring Store" project
"The Earring Store" is the prototype of an Augmented Reality application
for Android that I have developed for this thesis. The main idea of this
prototype is a catalogue-like application for an earring store that
integrates the Augmented Reality functionality known as a Virtual Mirror.
The client application interacts with a custom web service to fetch the
catalogue of earrings (made up of images, prices and other data) and "build"
a local catalogue on the user's device. Once the user selects a particular
item from the catalogue, he/she can see the available information and,
possibly, decide to "try" it, either using the smartphone as a virtual
mirror or by taking a picture and trying it on the picture. If the user
decides to try the selected item, the application downloads the item's 3D
model from the web service and allows the user to try it on.
This concept application falls into the categories of virtual mirror and
E-commerce/advertisement applications (since it is, in fact, an online
shopping application), where the first category indicates a particular
functionality and the second its purpose.
Being an Augmented Reality application for smartphones, "The Earring Store"
requires access to the device camera. Moreover, it also requires Internet
access in order to fetch the catalogue and the 3D models from the web
service. These requirements, together with the required Android version,
will be visible to users, as introduced in the previous chapter, on the
application's page in the Google Play store once the application is
uploaded.
Android 4.0 "Ice Cream Sandwich" is the minimum version required for the
application to work correctly. This is because Android 4.0 introduces new
camera-related functions, among them an improved face detection function,
exposed from native code to the Dalvik runtime, which I selected and used to
implement the virtual mirror functionality.
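One detail of this API is worth noting here: android.hardware.Camera.Face reports face and eye positions in a normalized space from (-1000, -1000) at the top-left of the camera's field of view to (1000, 1000) at the bottom-right, so positions must be mapped to preview pixels before anything can be anchored to the face. A minimal sketch of that mapping (illustrative class name; preview mirroring and rotation, which a real application must also handle, are ignored):

```java
// Maps a coordinate from the Camera.Face space, which spans
// [-1000, 1000] on both axes, to pixel coordinates of a preview
// surface of size previewW x previewH.
public class FaceCoords {
    public static double[] toPreview(int x, int y, int previewW, int previewH) {
        double u = (x + 1000) / 2000.0 * previewW;
        double v = (y + 1000) / 2000.0 * previewH;
        return new double[]{u, v};
    }
}
```

In a virtual mirror, the detected eye positions mapped this way give the anchor points from which ear positions, and hence earring placement, can be estimated.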
3.2 Related works
3.2.1 Virtual Mirror
Mirrors are well known and widely used in computer graphics to enhance the
realism of virtual scenes [21]. The Magic Mirror is a user interface
technique that mimics a hand mirror. In addition to providing the optical
effect of a real mirror, several non-physical extensions have also been
proposed. As a metaphor, the Magic Mirror is an intuitive and easy-to-learn
interaction technique. It can be combined seamlessly with most navigation
techniques, and it is easy to use because the only task involved is similar
to the manipulation of an object [22]. A mirror is supposed to reflect
reality, to show us the reality that surrounds us from another point of
view. Alexandre François, Elaine Kang and Umberto Malesci [23] presented
their "handheld virtual mirror", where they introduced a setup consisting of
a camera mounted on a display monitor and a magnetic tracking system. It
records streaming video as input and shows on the display the video
reflected as if it were a real mirror. However, it is not an augmented
reality application, since no virtual objects are introduced into the scene.
A mirror can also mislead and distort reality instead of reproducing it
accurately. T. Darrell, G. Gordon, J. Woodfill and M. Harville [24] describe
a virtual mirror interface which can react to people using robust, real-time
face tracking. The display can directly combine a user's face with various
graphical effects, applied only to the face region of the image.
Among the many situations where a mirror is needed, there is driving. The
main purpose here is to provide the driver of a car, beside the information
from the traditional rear-view mirror, with additional information in a
mirror-like display about what is not directly visible. Sameer Pardhy, Craig
Shankwitz and Max Donath [25] report on the idea of extending the rear-view
mirror of a car with additional information from a DGPS (Differential Global
Positioning System), "an onboard geo-spatial database" and "radar or
inter-vehicle communication" to provide the driver with complete knowledge
about the position, distance and speed of other vehicles and objects close
to his or her own car. Donath, from the same group, filed a patent on this
topic and called it a "virtual mirror".
All of the mirrors mentioned refer to mixed or augmented reality
applications that use a real display to reflect the world like a mirror.
Some of them are capable of augmenting the reflected world with additional
information registered with real objects. My mirror application belongs to
this second category.
3.2.2 Other E-commerce virtual mirror projects
The mirror metaphor, combining reality with an augmentation of virtual
objects, has been used to improve various commercial applications: the
customer is able to view himself or herself in a stationary mirror-like
display that adds, for instance, clothes or accessories. The following
projects are just a brief selection of the existing E-commerce virtual
mirror applications and prototypes.
Probably the most famous virtual mirror E-commerce application is the Ray
Ban Virtual Mirror [26]. Developed by Ray Ban for desktop PCs, it can be
downloaded from the Ray Ban web site. Once started, it connects to the web
site to check whether there are new models and then opens the video stream
from the computer's webcam. The application first performs face detection by
making the user position his/her face inside a small ellipse on the display,
fitting the face shape and eye positions. It then tracks the user's face,
performs head pose estimation and overlays on the face the glasses that the
user wants to try. The application allows only one user at a time, and if it
"loses" the user, it compels him/her to move back into the ellipse and start
from the beginning; however, it also allows the user to try the glasses and
change the model while trying them.
Silhouette iMirror [27] is a free iPhone/iPad application that essentially
has every feature the Ray Ban Virtual Mirror has, but for mobile Apple
devices. Moreover, unlike the Ray Ban version, it does not require the user
to keep still inside an ellipse while face detection is performed, neither
at initialization time nor when the application "loses" the user. Among its
features, it allows the user to change the glasses model while trying them
on, to resize the model in case it is displayed too small or too big, and to
purchase the real glasses directly from the application (which the Ray Ban
Virtual Mirror does not allow).
P. Eisert, P. Fechteler, J. Rurainsky [28] presented a project for the real-
CHAPTER 3. "THE EARRING STORE": OBJECTIVES AND RELATED WORKS34
time visualization of customized sports shoes. Their application setup con-
sists in a green platform where the user has to stand, a camera and a dis-
play. With an initial segmentation they extract the shoes of the user from
the image if both shoes are visible. Under the assumption that the user
stands on the platform, they perform a rough estimation of the shoes’ pose,
which is improved by using 3D tracking and gradient-based 3D motion es-
timation. The shoes are composed of several 3D sub-objects (instead of
being a unique 3D model), which allows the user to customize his/her
shoes by changing some of them or their colour. By adding an invisi-
ble 3D model of the user's leg they obtain the integration effect previously
discussed in 2.3.1 Taxonomies.
Lu Wang, Ryan Villamil, Supun Samarasekera and Rakesh Kumar [29]
propose a Kinect-based application for on-line handbag shopping. The sys-
tem requires a Kinect sensor. To start, a user stands in the defined initial
calibration pose for a couple of seconds, so that the system can perform
image segmentation and edge-based background modeling to fully extract
the user from the background. Once the user is detected, he (or she) can
select virtual handbags on a screen with gestures. To check if there is an in-
tersection, the 3D handbag model is sampled with a set of 3D points which
are projected onto the image plane of the Kinect camera. If the depth values
of the 3D points are all smaller than the corresponding values in the depth
map generated by the Kinect, the handbag has no intersection with the
human body. The selected virtual handbag is initially displayed in one of
the user's hands. When the user raises an arm, the virtual handbag slides
to the elbow or shoulder depending on the slope of the forearm and upper
arm (the part between shoulder and elbow); when the user lowers the arm,
the handbag slides back to the hand. There are three stable positions for
the handbag: hand, elbow and shoulder.
3.2.3 Face detection and head pose estimation
Face detection, which is often mistaken for face recognition (recogniz-
ing a particular face as a certain person from a database), and head pose
estimation are two computer vision tasks: the former finds a face (either
in an image or in a video), the latter determines its orientation and posi-
tion in space. "The Earring Store" project does indeed need to perform
both of them in order to accurately arrange the earrings and accomplish
its function as a virtual mirror.
Face detection is usually performed as a binary pattern-classification
task: features are extracted from a given region of an image and a trained
classifier decides whether that particular region is a face or not. Often, a
window-sliding technique is employed: the classifier is applied to (usually
square or rectangular) portions of the image, at all locations and scales,
classifying each as either a face or a non-face.
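The window-sliding idea can be sketched as follows. The classifier here is a trivial placeholder standing in for a real trained classifier, and all names are illustrative:

```java
// Skeleton of window-sliding face detection; the classifier is a
// placeholder for a real trained binary classifier.
import java.util.ArrayList;
import java.util.List;

public class SlidingWindow {

    // Placeholder decision function: a real detector applies a trained
    // classifier to features extracted from the window.
    interface Classifier {
        boolean isFace(int[][] image, int x, int y, int size);
    }

    // Scan square windows of the given size at every location (a full
    // detector would also repeat this at several scales).
    public static List<int[]> detect(int[][] image, int size, Classifier c) {
        List<int[]> hits = new ArrayList<>();
        for (int y = 0; y + size <= image.length; y++) {
            for (int x = 0; x + size <= image[0].length; x++) {
                if (c.isFace(image, x, y, size)) {
                    hits.add(new int[]{x, y}); // window top-left corner
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[][] img = new int[4][4];
        img[1][1] = 255; // toy rule: a "face" is any window whose top-left pixel is bright
        List<int[]> hits = detect(img, 2, (im, x, y, s) -> im[y][x] > 128);
        System.out.println(hits.size()); // 1
    }
}
```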
The head pose estimation problem is a specific case of the pose estima-
tion problem, i.e., the problem of recovering the 3D pose of an object
from 2D images. Erik Murphy-Chutorian and Mohan Manubhai Trivedi
[30] provide a survey of the approaches to the head pose estimation prob-
lem and propose the following categories:
• Appearance Template Methods: compare a new image of a head to a
set of exemplars (each labelled with a discrete pose) in order to find
the most similar view.
• Detector Array Methods: train a series of head detectors each attuned
to a specific pose and assign a discrete pose to the detector with the
greatest support.
• Nonlinear Regression Methods: use nonlinear regression tools to de-
velop a functional mapping from the image or feature data to a head
pose measurement.
• Manifold Embedding Methods: seek low-dimensional manifolds that
model the continuous variation in head pose. New images can be em-
bedded into these manifolds and then used for embedded template
matching or regression.
• Flexible Models: fit a non-rigid model to the facial structure of each
individual in the image plane. Head pose is estimated from feature-
level comparisons or from the instantiation of the model parameters.
• Geometric Methods: use the location of features such as the eyes,
mouth, and nose tip to determine pose from their relative configura-
tion.
• Tracking Methods: recover the global pose change of the head from
the observed movement between video frames.
• Hybrid Methods: combine one or more of these aforementioned meth-
ods to overcome the limitations inherent in any single approach.
Such methods differ in computational time, accuracy and input require-
ments (e.g., calibrated camera, multiple cameras, etc.). For "The Earring
Store" project, the input is the video stream from a single uncalibrated
camera. My approach can be classified as a Hybrid Method (Geometric
plus Tracking) and will be described in the next chapter.
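As a minimal example of the geometric component of such a hybrid approach, the in-plane rotation (roll) of the head can be estimated from the two eye positions alone. This is only an illustrative sketch under my own simplifying assumptions, not the full estimator described in the next chapter:

```java
// Minimal geometric cue: estimate head roll from the two eye positions.
// This is only a sketch of the geometric idea, not the full estimator.
public class HeadRoll {

    // Roll angle in degrees: the slope of the segment joining the eyes.
    // 0 degrees means the eyes are level (upright head).
    public static double rollDegrees(double leftX, double leftY,
                                     double rightX, double rightY) {
        return Math.toDegrees(Math.atan2(rightY - leftY, rightX - leftX));
    }

    public static void main(String[] args) {
        System.out.println(rollDegrees(100, 200, 200, 200)); // 0.0  : level eyes
        System.out.println(rollDegrees(100, 200, 200, 300)); // 45.0 : tilted head
    }
}
```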
3.3 Motivations and Goals
Online shopping has become commonplace and more convenient than
ever, allowing shoppers to buy not only typical products with ease, but
also products which may be difficult to find in stores or may have a more
diverse selection online. The foremost limitation in online shopping, how-
ever, is apparel: clothing and accessories cannot be "dressed" by the user
before buying, which makes shopping for these things online impractical,
inconvenient, or costly.
Software can be used to address some of these limitations. While many
accessories do not require a mirror (i.e., there is a low degree of necessity
for modeling them on individuals before buying), clothing often requires
more than a mirror (by virtue of factors such as physical fit and stretch of
material). This issue has been engaged by some companies using a static
approach: the user had to take a snapshot of himself/herself and manually
place the object in the exact position, handling its orientation and possibly
its scaling. Later, companies moved instead to a more intuitive and auto-
matic approach: the Augmented Reality virtual mirror. By taking a contin-
uous video stream, the software first detects the body of the user, then
finds its pose and lastly overlays on the video the 3D model of the object
the user wants to try. This second approach simplifies and at the same
time stimulates user interaction. Users perceive the
3D model as a real object that follows each of their movements, having the
feeling that what they "are trying" is actually the store's object. Jun Park
and Woohun Lee state in their paper: "Since the middle
of 1990’s web-based E-Commerce markets have grown quickly. However,
Two-dimensional images and text in internet cannot provide enough infor-
mation of products to customers. The difference between the impressions
on the images and the actual products is due to the fundamental discrep-
ancy between the internet-based cyber world and the real environment. To
resolve the discrepancy, 3D virtual products can be provided ... but still 3D
virtual products are not in the same context as the users’ real environment.
To correctly resolve the discrepancy, the user’s real environments (user’s
office or home) and the virtual object (the product) should be seamlessly
mingled" [31].
What can be inferred from the previous statement is the importance in E-
commerce of integrating 3D models by means of Augmented Reality.
Android released a face detection functionality for developers with API
1, which was (in theory) able to provide the middle point between the
eyes, the eye distance and, more importantly, the pose of the head of the
detected faces. Unfortunately, there is a bug in that function: for any real
pose of the head, it returns 0 degrees for the X, Y and Z angles. While
that bug is widely known, no fix has been released. With the new 4.0 Ice
Cream Sandwich version (API 14), a new face detection function has been
released. It provides the developers with different parameters from the
previous one: for each face, a rectangle, the pixel coordinates of the left
eye, the right eye and the center of the mouth, and an ID. Using these
parameters it should be possible to implement a head pose estimator for
the detected face and increase the sense of realism of the virtual mirror
effect.
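For instance, from the eye coordinates alone one can already derive an anchor point and a scale factor for the earring model. The following is a small illustrative sketch; the reference distance is an arbitrary value of mine, not a parameter of the Android API:

```java
// From the eye coordinates returned by the new face detection, the
// mid-point between the eyes and the inter-eye distance can be derived;
// the distance gives a natural scale factor for the earring model.
// (The reference distance used below is an arbitrary illustrative value.)
public class FaceGeometry {

    public static double[] eyeMidpoint(double lx, double ly, double rx, double ry) {
        return new double[]{(lx + rx) / 2.0, (ly + ry) / 2.0};
    }

    public static double interEyeDistance(double lx, double ly, double rx, double ry) {
        return Math.hypot(rx - lx, ry - ly);
    }

    // Scale for the 3D model, relative to a reference inter-eye distance
    // fixed when the model was authored.
    public static double modelScale(double interEyeDistance, double referenceDistance) {
        return interEyeDistance / referenceDistance;
    }

    public static void main(String[] args) {
        double[] mid = eyeMidpoint(120, 240, 220, 240);
        double d = interEyeDistance(120, 240, 220, 240);
        System.out.println(mid[0] + "," + mid[1]); // 170.0,240.0
        System.out.println(modelScale(d, 100));    // 1.0
    }
}
```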
The purpose of this thesis is to provide an analysis of the user inter-
action with an E-commerce virtual mirror Android application prototype
based on the new Android face detection function, taking into account user
experience and software performance (focusing on the time consumption
factor) and an evaluation of this new face detection. Regarding user expe-
rience, an evaluation of the prototype will be performed over different
aspects of its usability, such as the navigation of the catalogue, the overview
of selected items and the use of the virtual mirror mode. Particular focus
will be given to the virtual mirror experience, not only from the usabil-
ity point of view but also analyzing its effectiveness and persuasiveness in
the process of convincing users to purchase items that are available on the
catalogue. Concerning software performance, as previously stated, I have
chosen to focus on speed. This choice has been driven by the fact that, as
Jakob Nielsen says: "Even a few seconds' delay is enough
to create an unpleasant user experience. Users are no longer in control, and
they’re consciously annoyed by having to wait for the computer. Thus, with
repeated short delays, users will give up unless they are extremely commit-
ted to completing the task. " [32].
In the speed evaluation I am going to analyze the communication time
between the web service and the client application, the rendering of 3D
models on the client application and the time needed to perform transfor-
mations (such as rotation, translation and scaling) of the previously men-
tioned 3D models. Evaluations will be performed both by making users
fill in a survey after testing the prototype and by taking analytical data
wherever possible (i.e., clocking the web service response time).
Such evaluations and evaluations’ analysis will be provided in chapter 6.
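Wherever analytical timing is taken, the measurements can be clocked along these lines; this is an illustrative sketch, not the actual instrumentation code of the prototype:

```java
// Sketch of how an operation (e.g. a web service call or a model
// transformation) can be clocked with System.nanoTime().
public class Stopwatch {

    // Runs the task and returns the elapsed wall-clock time in milliseconds.
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            double sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += Math.sqrt(i);
        });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```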
It is also important to remark that there are no virtual mirror applications
on the market (at least known to me) that include real-time face detection
and reality augmentation for Android; there are instead applications that
provide a simple mirror effect by accessing the camera and reproducing its
input stream on the display. However, virtual mirror applications that in-
clude face detection, pose estimation and reality augmentation are avail-
able for computers and iPhone/iPad devices (e.g., the Ray Ban virtual
mirror). There are no virtual mirror applications known to me that aug-
ment reality using earrings (most of the existing applications use glasses).
Evaluations will be performed in order to establish whether the user ex-
perience is satisfactory and to verify whether the (eventual) introduction
of augmented reality into an E-commerce (prototype) application can in-
crease the persuasion factor and stimulate purchasing.
Chapter 4
Designing the prototype of the
application
In this chapter the focus will be on the design of the prototype of the appli-
cation itself. A brief user analysis, the prototype application's requirements and
an architectural and software design overview will be provided.
4.1 Establishing requirements
While designing the prototype, I have decided to have the client appli-
cation separated from the catalog and 3D models. This is due to the fact
that I wanted to de-couple the application from the catalog and to reduce
the amount of memory that the application needs on the SD card of the
smartphone. This means that all the data concerning the catalog and its
items will be kept in a database and accessed through the web service.
The database of the catalog will contain, for each earring, an image, the
price, the name of the model, the material that composes the earring (e.g.,
gold, silver), the kind of closure, the stone (if any) and the 3D model
of the earring.
The client application will always be in portrait mode, regardless of the
orientation of the smartphone. This is because, unless told to avoid it, An-
droid restarts the activity from scratch whenever the smartphone changes
its orientation. Since almost every graphic component of the application
is created after downloading the data from the database, the application
would have to save the status of each element before changing the orien-
tation of the layout and then build every element again. This process is
discouraged on the official Android developers website:
"If restarting your activity requires that you recover large sets of data,
re-establish a network connection, or perform other intensive operations,
then a full restart due to a configuration change might be a slow user expe-
rience. Also, it might not be possible for you to completely restore your ac-
tivity state with the Bundle that the system saves for you with the ’onSave-
InstanceState()’ callback, it is not designed to carry large objects (such as
bitmaps) and the data within it must be serialized then deserialized, which
can consume a lot of memory and make the configuration change slow. In
such a situation, you can alleviate the burden of re-initializing your activ-
ity by retaining a stateful Object when your activity is restarted due to a
configuration change. " [33].
The virtual mirror part of the application is meant to be used by only
one person at a time. While it would be possible to remove this limit I
have imposed, I have decided not to, because having more than one per-
son would increase the computational cost of face detection and pose es-
timation and require more instances of the 3D model of the earrings,
which could lead to memory-related issues such as running out of memory.
Once the application starts, the catalogue will be in the lower part of
the screen, while above the catalogue there will be a box for the selected
item. The catalogue can be scrolled (horizontally) by simply touching the
screen. To select an item, it is necessary to click on it and keep the
finger down for at least 2 seconds. Once the item is selected, its image will
appear inside the "selected item" box and, if the user decides to explore
the data about that item, another click on the box is needed. This allows
the user to compare earrings and avoids exploring items they are not inter-
ested in. When viewing a single item of the catalogue, the information will
be arranged vertically and it will be possible to scroll it (vertically).
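The 2-second selection rule can be sketched as a simple threshold on the touch timestamps. This is an illustrative sketch; in the real application these timestamps come from Android touch events:

```java
// Sketch of the 2-second long-press selection rule described above:
// an item is selected only if the finger stays down for at least the
// threshold (timestamps in milliseconds, as delivered by touch events).
public class LongPressSelector {

    static final long THRESHOLD_MS = 2000;

    public static boolean isSelection(long downTimeMs, long upTimeMs) {
        return (upTimeMs - downTimeMs) >= THRESHOLD_MS;
    }

    public static void main(String[] args) {
        System.out.println(isSelection(0, 2500)); // true: held for 2.5 s
        System.out.println(isSelection(0, 500));  // false: a tap or scroll touch
    }
}
```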
Hardware requirements for the prototype are an Android-based smart-
phone to test the client version and a host for both the database and the
web service. The smartphone needs a front camera (because it would be
meaningless to attempt to reproduce the virtual mirror effect with the rear
one) and an internet connection in order to receive the catalogue and the
3D models from the web service.
Software requirements concern the Android version of the device, which
must be 4.0 Ice Cream Sandwich. This is due to the choice of using the
new Android face detection function that was released in that version.
Since this is a prototype and not a commissioned application, there will
be no information about the brand (either of the store or of the earrings),
nor store contacts (i.e., email, telephone number, Facebook page, Twitter
address), nor information about the store location (i.e., a Google Maps
address), nor opening hours. Moreover, no purchasing or login function-
alities will be implemented.
4.2 Users analysis
The prototype is meant for anyone who is interested in buying one or
more earrings and decides to check an online catalogue. The application
should be appealing and at the same time fluid, providing an entertaining
and interesting user experience. To this end, layouts will be kept as neat
as possible and, in case of incorrect actions from the user, directions will
be provided by the application. The target user of the application is either
male or female (most likely the majority of users will be female, since
fewer males wear earrings than females), has a variable age range (most
likely it will be used by young people, who are more acquainted with
technology) and possesses an Android smartphone running version 4.0 or
higher. The application itself does not expect any "selection" of its users;
everybody is free to use it.
4.3 Architectural design
The prototype will rely, as previously introduced, on a database for stor-
ing the data, a web service to acquire the data and a client application to
visualize it. The diagram below summarizes the architecture of the appli-
cation.
Figure 4.1: Architectural design Deployment Diagram
4.3.1 Web service adopted technologies
I have built the web service for my prototype application using WCF
.NET technology on top of REST and the JSON format. The choice of REST
over SOAP was led by the paper "SOAP-Based vs. RESTful Web Services:
A Case Study", written by Fatna Belqasmi, Jagdeep Singh, Suhib Younis
Bani Melhem and Roch H. Glitho [34], in which a SOAP vs. REST analysis
is performed by realizing a prototype of a conference manager for mobile
smartphones and evaluating it in terms of end-to-end time delay and net-
work load when executing different conferencing application operations.
The results highlight the faster performance of the REST-based architec-
ture over the SOAP-based one. Moreover, they show that "processing SOAP-
based Web service requests in a mobile environment can take 10 times
longer and consume eight times more memory than an equivalent REST-
ful Web service request". Hence my decision to realise the web service
upon a REST architecture instead of SOAP.
Concerning the response format of my web service, the choice was be-
tween the JSON and XML formats. In Tommi Aihkisalo and Tuomas
Paaso's paper "Latencies of Service Invocation and Processing of the REST
and SOAP Web Service Interfaces" [35], the authors compare the Web ser-
vice invocation latencies, and their causes, experienced by the client during
the service request-response round trip. The system utilized for the mea-
surements consisted of a suitably instrumented client and a server stack
implemented for sending and receiving messages containing arbitrary bi-
nary or text data content between the client and server. Both SOAP and
REST versions were implemented; in particular, the RESTful implementa-
tion was tested using the wire formats of JSON, XML and Google Proto-
stuff, while SOAP relied on SOAP-XML and SOAP XOP/MTOM for the
binary content. The analysis of the results clearly shows that, in all the
cases of the described scenario,
REST is the fastest. The experimental data regarding XML and JSON la-
tencies are quite similar: "the only benefit that the JSON solution is able
to achieve in real life applications is the possibility for the faster network
transmission of the marshalled request-response message objects due to the
tighter encoding with JSON". Since I’m trying to make this service as fast
as possible in order not to slow down the client application, I’ve selected
the JSON format as the response format of my web service.
For the development framework of the web service, the main options
were WCF .NET and J2EE.
Windows Communication Foundation (WCF) is a framework for Vi-
sual Studio for building service-oriented applications. Services are exposed
as service endpoints, which can be part of a continuously available service
hosted by IIS or can be services hosted in an application. Clients of
a service that request data from a service endpoint are considered end-
points themselves. The messages can be as simple as a single character or
word sent as XML, or as complex as a stream of binary data and can be
sent asynchronously from one service endpoint to another or to another
endpoint [36].
Figure 4.2: .Net architecture
J2EE, short for Java 2 Platform, Enterprise Edition, is a platform-independent,
Java-centric environment from Sun for developing, building and deploying
Web-based enterprise applications online. J2EE is deployed in a single
language (Java), although it does have support for other languages [37].
The latest version of the J2EE specification
has been augmented with the addition of several libraries to support Web
services. The two primary APIs are as follows:
• Java API for XML-Based RPC (JAX-RPC) is an API that enables de-
velopers to develop and deploy Web services.
• Java API for XML Registries (JAXR) provides a uniform and standard
API to access different kinds of XML registries.
• Several other APIs provide functionalities like sending and receiving
XML-based messages (JAXM), processing XML (JAXP), and binding
Java objects to XML documents (JAXB).
Figure 4.3: J2EE architecture
John Grundy, Zhong Wei, Radu Nicolescu and Yuhong Cai [38] present
a paper where they improve their own tool in order to investigate support
for thin-client architecture modelling and performance analysis. Their work
was performed over J2EE and ASP.NET web services, and tests were run
with three networked PCs, one each used to host the client (ACT tool), the
web server and components (JSPs/ASPs), and the database (SQL Server
2000). The client requests and database server tables were identical, the
middle-tier web components and servers being the difference. The
C#/ASP.NET version performed much faster than the JSP version in their
example; however, they used the Microsoft IIS web server, a commercial
performance-optimised platform, for the ASP.NET hosting, but an unopti-
mised J2EE SDK application server to host the JSP web components.
Based on these considerations, I’ve decided to build the web service using
the .NET WCF framework.
4.3.2 Data model and methods
The Database, implemented on Microsoft SQL Server 2008 R2, is com-
posed of a single table that contains the data that I have defined while es-
tablishing the requirements. The table structure is:
Figure 4.4: Database Modello table
Where:
ID [Primary Key, int, Not Null]: the primary key of the element.
Nome [nvarchar(50), Not Null]: the name of the earring.
Prezzo [decimal(10, 2), Not Null]: the price of the earring.
Descrizione [nvarchar(200), Not Null]: a brief description of the earring.
Materiali [nvarchar(50), Not Null]: the materials of the earring, excluding
the stone (if any).
Pietra [nvarchar(50), Not Null]: the kind of stone (if any) of the earring.
TipoChiusura [nvarchar(50), Not Null]: the kind of closure of the earring.
FilePath [nvarchar(200), Not Null]: the path to the xml file of the 3D model
of the earring.
Image [nvarchar(200), Not Null]: the path to the png file of the image of
the earring.
The database is accessed by the web service through the following ser-
vice endpoints:
GetCatalogo: This service endpoint retrieves all the information about
every earring except their 3D models. It is invoked before building the
scroll list of earrings, in order to obtain the image and the name of each
earring and to store the rest of the data for when the user decides to ac-
cess it. While most of the data is passed as a String, the ID is passed as
an integer and the price as a float; the image file is retrieved using its
path, read as a byte array and sent in that format.
The data is encoded in JSON format as follows:
{
"EarringData": [{
"descrizione":"This elegant earring is from the new collection",
"id":1,
"imagepng":[137,80,78,71,13,10,26,10,0, ...],
"materiali":"Yellow gold",
"nome":"golden heart",
"pietra":"zircone",
"prezzo":23.45,
"tipochiusura":"pressure"
},
{
...
}]
}
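On the client side, the "imagepng" array of numbers has to be turned back into a byte array before the image can be decoded. The sketch below shows this conversion together with a sanity check on the fixed PNG signature; the helper names are mine, and the real client uses a JSON library for the parsing itself:

```java
// The "imagepng" JSON array of numbers must be turned back into a byte[]
// before decoding the image. The first eight bytes of any PNG file are the
// fixed signature 137,80,78,71,13,10,26,10, usable as a sanity check.
public class PngBytes {

    public static byte[] toBytes(int[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i]; // 0..255 -> signed byte
        }
        return out;
    }

    public static boolean hasPngSignature(byte[] data) {
        int[] sig = {137, 80, 78, 71, 13, 10, 26, 10};
        if (data.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) return false; // mask back to 0..255
        }
        return true;
    }

    public static void main(String[] args) {
        int[] fromJson = {137, 80, 78, 71, 13, 10, 26, 10, 0};
        System.out.println(hasPngSignature(toBytes(fromJson))); // true
    }
}
```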
GetXMLModel/"param": This service endpoint retrieves the 3D model
of an earring. The parameter "param" indicates the ID of the earring whose
3D model is to be retrieved. This method was designed after deciding how
to import 3D models into Android. In particular, among the established
requirements I decided that I did not want to save the 3D model file itself
on the smartphone, so I settled on a simple method whose inner workings
I know, along with the parameters it retrieves from the model file: four
arrays, one for the colours (Red, Green, Blue, Alpha), one for the faces, one
for the normals to the faces, and one for the positions in space (X, Y, Z).
The data is encoded in JSON format as follows:
{
"Modello": {
"color":[0.909804,0.658824,0.364706,1,...],
"faces":[0,1,3,1,2,3,4,5, ...],
"id":2,
"normal":[0,0,-1,0,0,-1,0, ...],
"position":[0.480056,-0.131201,1.758275,0.459837,...]
}
}
4.4 3D modeling tools
Among the modelling tools, the most famous ones are Autodesk 3ds
Max and Blender.
Autodesk 3ds Max is proprietary software for modeling, animating
and rendering virtual elements. Concerning 3D modeling, the software is
primarily based around polygon modeling. It is possible to create a wide ar-
ray of primitives, including cubes, cones, pyramids and teapots, to serve as
bases for more complex models. 3ds Max supports subdivision surfaces as
well, which makes your models look smoother. 3ds Max comes with many
different editing tools for manipulating your models, among which there is
the Soft Selection tool, which allows you to grab vertex clouds and tug on
them without creating unwanted geometry in your mesh; everything stays
smooth. As an alternative to polygon and sculpting methods, designers can
also build NURBs models. This curve-based modeling approach is used for
creating smooth surfaces and is particularly useful in simulating mechani-
cal parts where accuracy is essential [39]. Furthermore, 3ds Max is capable
of photorealistic global illumination. This refers to a type of lighting that
provides heightened detail to parts of an image that fall outside the reach
of direct lights. To accomplish this, 3ds Max employs a simplified radiosity
algorithm that simulates bouncing light. This is a computationally expen-
sive operation, which makes it impractical for real-time light-rendering sce-
narios like video games, but it is ideal for rendering movies and broadcast
television segments.
Blender is an open source, cross-platform 3D software solution covering
modeling, animation, rendering and post-production through to interac-
tive creation and playback. It has a vast range of 3D modeling and shad-
ing features, among which: a range of 3D object types including polygon
meshes, NURBS surfaces, Bezier and B-spline curves, metaballs and vector
fonts (TrueType, PostScript, OpenType); mesh modeling based on vertex,
edge and/or face selection; material previews rendered by the main ren-
der engine; modifier-stack deformers such as Lattice, Curve, Armature or
Displace; smooth soft-selection editing tools for organic modeling; and
more. Concerning illumination and shaders, it allows the use of diffuse
shaders such as Lambert, Minnaert, Toon and Oren-Nayar, and specular
shaders such as WardIso, Toon, Blinn, Phong and CookTorr [40].
Both programs are equipped with a huge number of features and effects
for 3D modeling, textures and shaders. In this case, since I already had
some experience using Blender from my master course "Computer Graph-
ics and Applications", I’ve decided to use it to design the 3D models of the
earrings.
4.4.1 Importing the 3D models
Android includes support for 2D and 3D graphic development by means
of the Open Graphics Library (OpenGL), specifically, the OpenGL ES API.
OpenGL is a cross-platform graphics API that specifies a standard software
interface for 3D graphics processing hardware. OpenGL ES is a subset of
the OpenGL specification intended for embedded devices [41].
OpenGL is natively a C API, but the Android framework exposes OpenGL
through a Java API. Native OpenGL code can still be used by means of
the Native Development Kit (NDK).
At the moment, there are two available versions of OpenGL for Android:
OpenGL ES 1.0 and OpenGL ES 2.0. While the first version was released
together with the first version of Android, the second one was released
with the Android Froyo version (API 8). The major difference between
OpenGL ES 1.x and OpenGL ES 2.0 is the removal of the fixed pipeline,
which is replaced by a shader-based pipeline. The OpenGL ES 2.0 API does
not provide any formal functions for setting up lighting, or setting mate-
rial, or rasterization parameters. Instead, the programmer creates their own
'per vertex' and 'per fragment' programs which will run directly on the
graphics hardware. The OpenGL ES Shading Language is used to write
these ’shader’ programs; it is a subset of the OpenGL Shading Language.
Unlike desktop OpenGL 2.0, OpenGL ES 2.0 does not allow use of the fixed
function pipeline at all, so applications written for OpenGL ES 1.x are not
compatible with OpenGL ES 2.0 [42].
Having said this, the more powerful OpenGL ES 2.0 allows a more cus-
tomized handling of shaders and lights, which I have decided not to use.
This choice stems from the fact that the 3D models of the earrings are rel-
atively small, so such effects would not be fully appreciated, since most
users would not notice them. Moreover, I have avoided any additional ef-
fect or detail that could slow down the rendering; this also includes tex-
tures for the earrings (which, of course, would also slow down the down-
load of the model from the web service by increasing the file size, and
would increase both the parsing time and the space required). Since I had
already established that I am using the new Android face detection func-
tionality, which requires API 14 (and that means both OpenGL ES 1.0 and
OpenGL ES 2.0 are available), my choice of OpenGL ES 1.0 over ES 2.0
was based only on the previous considerations and on performance-related
requirements.
By using OpenGL ES 1.0, it is possible to create simple 3D objects by
just preparing four buffers: one for the vertices, one for the colors, one for
the normals and one for the faces:
gl.glVertexPointer(int size, int type, int stride, Buffer pointer)
gl.glColorPointer(int size, int type, int stride, Buffer pointer)
gl.glNormalPointer(int size, int type, int stride, Buffer pointer)
gl.glDrawElements(int mode, int count, int type, Buffer indices)
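These buffers must be direct, native-order NIO buffers. A minimal sketch of the preparation step follows; the array contents are placeholders for the data parsed from the model file:

```java
// OpenGL ES 1.0 expects the vertex, color and normal data in direct,
// native-order NIO buffers; the faces go in a ShortBuffer of indices.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.ShortBuffer;

public class GlBuffers {

    public static FloatBuffer toFloatBuffer(float[] data) {
        FloatBuffer fb = ByteBuffer.allocateDirect(data.length * 4) // 4 bytes per float
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        fb.put(data).position(0);
        return fb;
    }

    public static ShortBuffer toShortBuffer(short[] data) {
        ShortBuffer sb = ByteBuffer.allocateDirect(data.length * 2) // 2 bytes per short
                .order(ByteOrder.nativeOrder())
                .asShortBuffer();
        sb.put(data).position(0);
        return sb;
    }

    public static void main(String[] args) {
        // One placeholder triangle (three X,Y,Z vertices) and its face indices.
        FloatBuffer vertices = toFloatBuffer(new float[]{0f, 0f, 0f, 1f, 0f, 0f, 0f, 1f, 0f});
        ShortBuffer faces = toShortBuffer(new short[]{0, 1, 2});
        System.out.println(vertices.capacity() + " " + faces.capacity()); // 9 3
        // These buffers would then be handed to gl.glVertexPointer(3,
        // GL10.GL_FLOAT, 0, vertices) and gl.glDrawElements(GL10.GL_TRIANGLES,
        // 3, GL10.GL_UNSIGNED_SHORT, faces).
    }
}
```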
OGRE (Object-Oriented Graphics Rendering Engine) is a scene-oriented,
flexible 3D engine written in C++ designed to make it easier and more
intuitive for developers to produce applications utilising hardware-accelerated
3D graphics. OGRE is made available under the MIT License [43]. OGRE
has released Blender Exporter, a Blender plugin that allows exporting a
3D model into an xml format [44], and OgreMaxSceneExporter, its coun-
terpart developed for Autodesk 3ds Max. I have decided to use the Blender
Exporter plugin with Blender because I have found a 3D model loader
function based on it. In fact, the blog "Bay Nine Studios" provides a really
simple object loader function, written for Android and compatible with
OpenGL ES 1.0 [45], that parses such an .xml model file and uses the pre-
viously mentioned OpenGL ES 1.0 functions to render it.
My choice was led by the necessity of keeping the application performance
as fast as possible. Knowing how the parser of the 3D model
works, I have moved it onto the web service and reduced the computational
load on the client prototype application. There are, however, several
external libraries for importing 3D models on Android; the most famous are
min3D ( recently deprecated in favour of the new Rajawali project ) and libgdx
( which is actually a cross-platform game development library written in Java ).
The min3D framework is based on OpenGL ES and can be used to
develop 3D apps for Android. Its major feature is that you do not need
to be an OpenGL specialist: min3D is a lightweight 3D library/framework
for Android, written in Java on top of OpenGL ES and targeting compatibility
with Android v1.5/OpenGL ES 1.0 and higher. It tracks closely
with the OpenGL ES API, which makes it ideal for gaining an understanding
of the OpenGL ES API while providing the convenience of an object-oriented
class library [46]. Recently, the min3D project has been abandoned in favour of
the Rajawali project. "Rajawali is a 3D framework for Android built on top
of the OpenGL ES 2.0 API. Its main purpose is to make things easy and to
take away the hard work that’s involved in OpenGL programming" [47].
Rajawali allows users to import .obj model files and animations in .md2 and
.md5 format, and supports point and directional lights, customizable materials
( simple, diffuse, Phong, Gouraud, toon, bump map, environment cube map,
sphere map, masked, particle ), Bezier splines, Catmull-Rom splines, particles
and more.
Libgdx is a cross-platform game and visualization development framework.
It currently supports Windows, Linux, Mac OS X, Android, iOS and
HTML5 as target platforms. Libgdx allows you to write your code once and
deploy it to multiple platforms without modification [48]. It offers support
for OpenGL ES 2.0 on Android 2.0 and above through custom JNI bindings.
Libgdx allows developers to render through OpenGL ES 1.0,
1.1 and 2.0 on all platforms, and provides vertex arrays, vertex buffer objects,
textures, shaders, .OBJ and .MD5 model loaders and many other features.
Libgdx also includes physics libraries: Box2D, which is used for 2D
physics, and an experimental Bullet Physics wrapper which can be used
for 3D physics.
These libraries were also examined at application design and technology
selection time, but both were discarded: while they offer a large number of
effects and features, they would be underemployed here, and using them
would have required extracting the parsing of the .obj files from their inner
functions, moving it onto the web service and creating a custom function
that loads the 3D models from only the minimum parameters instead of
the 3D model file itself.
4.5 Software design
This is the UML class diagram that I have used for the prototype of the
application:
Figure 4.5: "The Earring Store" UML 2.0 class diagram
4.5.1 Class analysis and software choices
For space reasons, I had to cut down most of the methods and variables
from each class, so in this paragraph I am going to explain the most
significant ones and the purpose of each class that I have used.
• MainGallery: The main activity; it receives the data from the
classes ContactWebService and EarringJsonParse and builds up the
gallery.
• ContactWebService: This class’s purpose is to download the requested
content from the web service. It handles the different kinds of request
to the endpoints ( while the "GetCatalogue" endpoint does not require
any parameter, the "GetXMLModel" one does require the ID of the
model to be downloaded ) with one method and its overload. Such
methods return the JSON string acquired from the web service.
• EarringJsonParse: This class uses the Jackson library ( a library
for parsing JSON strings directly into objects, without having to
explore the tree structure "by hand" ). Its methods return either the
full catalogue or the 3D model of an earring, according to the request.
• EarringData: The data structure for the general earring data
acquired from the web service. Its variables are the id, name, kind
of closure, description, material, stone and price of the earring;
their names match the ones on the web service, so that the Jackson
library can perform the initialization automatically.
• EarringList: Extension of the EarringData class; it is required by the
Jackson library to perform the initialization.
• ThreeDModel: The data structure for the 3D model of an earring
acquired from the web service. Its variables are the id, colour,
faces, normals and position of the earring; their names match the
ones on the web service, so that the Jackson library can perform the
initialization automatically.
• ThreeDModelList: Extension of the ThreeDModel class; it is required
by the Jackson library to perform the initialization.
• DetailEarring: The second activity of the application; it builds the view
of the single earring that the user has selected.
• TryOnPhoto: This activity allows the user to take a picture of himself/herself
and manually place the earring model on the picture. It allows the user
to resize the model, add a second model ( there cannot be more than
two or fewer than one model at a time ), rotate the model, change the
focus from one model to the other ( if present ) and remove one model
( if there are two models ).
• AugmentedRealityEarring: This activity allows the user to try the
virtual mirror effect. It shows the image preview acquired from the
camera and the detected face ( one at most ) surrounded by a green
rectangle. The earrings’ 3D models are positioned on the sides of the
rectangle. It allows the user to increase or decrease the size of the 3D models
and their distance ( only horizontally ).
• Utils: This class contains the methods to determine whether the camera is
being accessed from the emulator or from a real device and to calculate
the best size of the camera preview, both for taking a photo and for
the mirror effect.
• GLEarringModel: This class builds the OpenGL buffers required for
the renderer to process and draw the 3D model of the earrings.
• GLEarringRenderer: This class renders the 3D models of the earrings
and handles every functionality that concerns them, such as the
translation around the display, the rotation, the scaling etc.
• MatrixGrabber, MatrixStack, MatrixTrackingGL: These three classes
are required in order to perform the gluUnProject function. OpenGL
ES 1.0 does not have any method to grab the modelview and projection
matrices as OpenGL ES 2.0 does. The gluUnProject function takes as input
the original x, y and z coordinates in the display coordinate system and the
current modelview and projection matrices, and returns the output vector
objX, objY, objZ, which contains the computed object coordinates in the
OpenGL coordinate system.
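The overload-based request handling of the ContactWebService class described above can be sketched as follows. Only the two endpoint names come from the text; the base URL, the method name and the id query parameter are illustrative assumptions, since the real endpoint addresses are not given.

```java
public class ContactWebServiceSketch {
    // Hypothetical base URL: the real web service address is not part of this document.
    private static final String BASE_URL = "http://example.com/earringservice/";

    // Parameterless endpoint: requests the whole catalogue.
    static String buildRequestUrl() {
        return BASE_URL + "GetCatalogue";
    }

    // Overload for the endpoint that needs the ID of the model to download.
    static String buildRequestUrl(int modelId) {
        return BASE_URL + "GetXMLModel?id=" + modelId;
    }

    public static void main(String[] args) {
        System.out.println(buildRequestUrl());  // ends with "GetCatalogue"
        System.out.println(buildRequestUrl(3)); // ends with "GetXMLModel?id=3"
        // The JSON string would then be fetched from these URLs (for example
        // with HttpURLConnection) and handed to EarringJsonParse.
    }
}
```

One method plus its overload, as in the class description, keeps the calling code identical for both endpoints while still allowing the model ID to be passed when needed.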
4.5.2 Head pose estimation issues
While the face detection function of Android 4.0 Ice Cream Sandwich works
fine, there are still some issues that affect part of the work, in particular
the head pose estimation. The official Android developers website specifies,
for the coordinates of the mouth, left and right eye: "This is
an optional field, may not be supported on all devices. If not supported, the
value will always be set to null. The optional fields are supported as a set.
Either they are all valid, or none of them are." [49]. So far,
I have tested the aforementioned functionality on the Samsung Galaxy Nexus,
the Samsung Galaxy S3 ( which is, at the moment, the newest and most
powerful Android-based device on the market ) and the LG-P700 smartphones.
None of those smartphones returns anything other than null when
asked for the features of the found face. Searching the web, I have not
been able to find any Android-based smartphone that has full compatibility
with the new face detection function that I have adopted in my project;
on the contrary, on the website Stack Overflow a user says that he/she has
tested that function on a Galaxy Nexus, Nexus 4, Nexus 7 and Nexus 10
and that none of those devices has full compatibility [50].
This means that, even though the face detection performs well, I cannot
get the data necessary for the estimation of the head’s pose. Since the
rectangle that surrounds the face does not change its shape according to the
face pose when the user turns the face ( it just shifts towards the
direction the face is now looking ), I have decided that, even if it is a
fundamental part of the virtual mirror effect, it was necessary to cut head
pose estimation out of the project. Unfortunately, the old Android FaceDetector
( API level 1 ) does not provide a sufficient amount of features to build a
head pose estimation algorithm on top of it.
My approach to the problem was inspired by A. Nikolaidis and I. Pitas
[51]. They designed a gaze direction algorithm based only on geometrical
features: "face symmetry properties can be exploited in order to determine
the angle between the plane defined by these three facial features and the
image plane". They suggest tracking the gaze of a person by knowing only
the position of his/her eyes and the middle point of the mouth ( which are
exactly the features that the Android face detection should have provided ).
In my approach, I intended to use as data the left eye, right eye and middle
point of the mouth obtained from the face detection. By supposing them
coplanar ( and, obviously, non-collinear ), I thought of building a triangle
in space.
Figure 4.6: Finding the face and the triangle made of the middle point of the mouth and the
eyes
Once the face is found on the image, the three points form a triangle that
they use to calculate the gaze direction. By generalizing their thesis ( and
introducing a certain degree of error ), I intended to use the same angular
approach they use to calculate the gaze direction in order to estimate the
user’s head pose. Android face detection should provide the coordinates
of the eyes as the average middle point of each eye.
Figure 4.7: Modeling the face: ABC triangle built on the face
Considering the triangle CDE denoted by the dashed lines, we can state
that:
cos Φ = |CD| / |DE| (equation 1)
And from the triangle ACD we know that:
|CD| = |AD| sin θ (equation 2)
|AC| = |AD| cos θ (equation 3)
Knowing that from the triangle ADE we get:
|AE|² = |AD|² + |DE|² (equation 4)
We also know that the triangle ABE is isosceles ( due to the symmetry of
the human face ), which gives us:
|AB| = |AE| ⇒ |AB|² = |AE|² (equation 5)
Ψ = 180° − 90° − Θ (equation 6)
ρ = 180° − 90° − Θ (equation 7)
By storing at initialization time the value of Ψ as Ψf and of ρ as ρf as
"frontal values", with Ψ = 0 when the face is 90° towards the right and ρ = 0
when the face is 90° towards the left, I can declare that the face is looking
towards the right when the Ψ value is greater than the one stored for the
frontal face, and apply the same concept ( just towards the left instead of
towards the right ) to ρ. By defining my right rotation range as [ Ψf , 90° ]
and my left one as [ ρf , 90° ], I can normalize each range into [ 0° , 90° ]
and calculate the rotation as:
Yaw = 90 · (ρ − ρf) / (90 − ρf), if ρf < ρ < 90 and Ψ < Ψf
Yaw = −90 · (Ψ − Ψf) / (90 − Ψf), if Ψf < Ψ < 90 and ρ < ρf
(equation 8)
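Equation 8 can be turned into a small helper. This is only a sketch: the method name is hypothetical, the sign convention follows equation 8 as written, and the denominator of the first branch is taken as (90 − ρf), assuming symmetry with the second branch.

```java
public class YawEstimator {
    // Normalizes the current angles into a yaw estimate, following equation 8.
    // psiF and rhoF are the "frontal values" stored at initialization time.
    static double yaw(double psi, double rho, double psiF, double rhoF) {
        if (rho > rhoF && rho < 90 && psi < psiF) {
            // Rotation in the rho direction: positive yaw, normalized into [0, 90].
            return 90.0 * (rho - rhoF) / (90.0 - rhoF);
        }
        if (psi > psiF && psi < 90 && rho < rhoF) {
            // Rotation in the psi direction: negative yaw.
            return -90.0 * (psi - psiF) / (90.0 - psiF);
        }
        return 0.0; // frontal face (neither branch applies)
    }

    public static void main(String[] args) {
        double psiF = 45, rhoF = 45; // example frontal values
        System.out.println(yaw(45, 45, psiF, rhoF));   // 0.0  (frontal)
        System.out.println(yaw(30, 67.5, psiF, rhoF)); // 45.0 (half-way through the range)
    }
}
```

With ρf = 45°, a measured ρ of 67.5° lies half-way between the frontal value and 90°, so the normalized yaw is 45°, as expected.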
The above-mentioned approach is used to calculate the yaw angle. For the
pitch angle, I intended to use the angle between the 2D line that passes
through the eyes and the X-axis. For the roll angle, I intended to use the
proportion between the distance between the eyes ( multiplied by the cosine
of the yaw angle ) and the distance between the middle point of the mouth
and its projection on the line that links the eyes, taking of course into
account the previous positions of the eyes and of the middle point of the
mouth in order to establish whether the rotation is upwards or downwards.
Concerning the position of the head, its centre can be located at the
intersection between CD and BF for the x and y coordinates; for the
z coordinate it is necessary to obtain a parametric function that estimates
the change in the size of the head according to its position in space.
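The two measurements above can be sketched with plain 2D geometry; the method names and the pixel coordinates are illustrative, not taken from the project code.

```java
public class PoseAngles {
    // Angle (in degrees) between the line through the two eyes and the X-axis,
    // used in the text as the pitch angle. (lx, ly) is the left eye, (rx, ry) the right.
    static double eyeLineAngle(double lx, double ly, double rx, double ry) {
        return Math.toDegrees(Math.atan2(ry - ly, rx - lx));
    }

    // Perpendicular distance from the mouth middle point (mx, my) to the eye line,
    // used (in proportion to the eye distance) for the roll angle.
    static double mouthToEyeLine(double lx, double ly, double rx, double ry,
                                 double mx, double my) {
        double dx = rx - lx, dy = ry - ly;
        double len = Math.hypot(dx, dy);
        // |cross product| / segment length = point-to-line distance.
        return Math.abs(dx * (my - ly) - dy * (mx - lx)) / len;
    }

    public static void main(String[] args) {
        // Level eyes at y = 0, mouth 50 pixels below their midpoint.
        System.out.println(eyeLineAngle(0, 0, 60, 0));            // 0.0
        System.out.println(mouthToEyeLine(0, 0, 60, 0, 30, 50));  // 50.0
        // A tilted head: the right eye 60 px right and 60 px down of the left one.
        System.out.println(eyeLineAngle(0, 0, 60, 60));           // 45.0
    }
}
```

Comparing the mouth-to-eye-line distance against the (yaw-corrected) eye distance then gives the proportion described above, with the change in sign between consecutive frames indicating whether the rotation is upwards or downwards.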
This method is conceived with the intent of being fast and depends only
on the ( few ) data I should have available ( but do not ). Physical
constraints on the motion of the face were supposed to be introduced in an
Extended Kalman Filter in order to improve the performance. Since,
as mentioned before, the new Android face detection function does
not provide the position of the points as it claims, I have not developed this
concept algorithm, due to the impossibility of performing real, meaningful
tests and improving the model.
Chapter 5
Prototype overview
In this chapter I am going to provide a description of the data generated for the
application, how the application works and screenshots of the application.
5.1 Generated data analysis
In order to test the prototype application, I have created four simple earring
3D models starting from four earring images. The original images
were:
Figure 5.1: Images used for the catalog’s earrings
Their sizes, in order from left to right, are 52.3 KB, 159 KB, 124 KB and 26.1 KB.
With the only exception of the last image ( created by me ), the other ones
have been downloaded from the web. I have selected simple earrings in order
to simplify their 3D models as much as possible.
The 3D models generated by Blender and rendered in the application
look like:
Figure 5.2: Generated earring 3D models
The sizes of the parsed .xml files of the 3D models are, left to right,
214 KB, 121 KB, 121 KB and 223 KB. I believe that the size of the 3D models
is quite big, especially taking into account that for none of them the material
was stored. While being simple, their similarity to the original images is
sufficient, especially taking into account the size at which they are rendered.
The size of the images and of the earrings was meant to be kept to a minimum,
since it influences both the download time and the rendering execution.
5.2 Activity flow
The following schema represents the activity flow of the application.
Figure 5.3: Prototype application Activity Flow
Each of the four activities ( Main gallery, Earring in detail, Try on photo,
Try on virtual mirror ) will be analyzed in the following paragraphs.
5.2.1 Main Gallery
The initial activity of the client application. It starts with a slideshow
( made of a sequence of four images ). The slideshow is used to mask the
fact that, as soon as the application starts, it begins downloading the
catalogue ( images and data ).
Figure 5.4: Maingallery activity start
The average download time is about 8 seconds, and the download is done from an
inner class that extends AsyncTask, which is the class used to handle
asynchronous tasks. Once the download finishes, the activity loads the downloaded
images and adds them to the HorizontalScrollView that contains the
elements of the catalogue, and the slideshow stops. Each image is LongClickable,
which means that if the user keeps the finger over the image for longer than
two seconds, it counts as a normal click and the selected earring’s image is
moved into the "current focused earring window". This has been done
so that simply scrolling the menu does not change the selected earring.
Once the user clicks on the selected image, the application
goes from the first to the second activity, which receives the image of the
selected earring and its data.
Figure 5.5: Maingallery activity operations
If the user decides to change the selected earring before seeing its details,
he/she just has to press for at least two seconds on another earring in
the catalogue. In case no earring has been selected and the user clicks on the
"current focused earring window", a message warns him/her that he/she
needs to select an earring before being able to proceed to the next activity.
5.2.2 Earring in detail
As soon as the activity starts, it receives as parameters the image and the
data of the selected earring, fills in the content, and then starts downloading
the 3D model of the earring and preparing it to be passed to either the
"Try on photo" or the "Try on virtual mirror" activity. Meanwhile, the buttons
are disabled until the download is finished.
Figure 5.6: Earring in detail activity overview
The average download and "preparation" time is 2 seconds, and the operation is
done from an inner class that extends AsyncTask. In order to go to the
next activity, the user has to click on the button of the desired activity.
5.2.3 Try on photo
The activity receives the earring’s 3D model and starts preparing it in the
background. Meanwhile, it starts previewing the front camera on the display.
The user can take a photo ( which is not saved ) and decide whether
to keep it or redo it. If the user decides to keep it, the 3D
model of the earring is rendered on the display. The user has three
transformation options ( translation, rotation and scaling ) and three operation
options ( add earring, remove earring, change focus ). While the transformation
options are self-explanatory, the operations have some constraints. In
order to add an earring, there must be only one earring on the display. In
order to remove an earring, there must be two earrings on the display. The
change focus operation switches the focus from one earring to the other; of
course, it only works with two earrings on the display.
Figure 5.7: Try on photo activity operations
Since keeping 3D models allocated is memory consuming, they are
never deleted or really removed: just an internal variable of the earring
( visibility ) is set to false, and the earring is not rendered until the variable
changes back to true.
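This visibility mechanism can be sketched as follows; the class and method names are hypothetical, not taken from the project code.

```java
import java.util.ArrayList;
import java.util.List;

public class EarringScene {
    // Minimal stand-in for the 3D model: only the visibility flag matters here.
    static class EarringModel {
        boolean visible = true;
    }

    private final List<EarringModel> models = new ArrayList<>();

    void add(EarringModel m) { models.add(m); }

    // "Removing" an earring only hides it: its buffers stay allocated,
    // so showing it again later costs nothing.
    void remove(EarringModel m) { m.visible = false; }

    void restore(EarringModel m) { m.visible = true; }

    // The renderer simply skips hidden models.
    int render() {
        int drawn = 0;
        for (EarringModel m : models) {
            if (m.visible) {
                drawn++; // here the real renderer would issue the GL draw calls
            }
        }
        return drawn;
    }

    public static void main(String[] args) {
        EarringScene scene = new EarringScene();
        EarringModel left = new EarringModel(), right = new EarringModel();
        scene.add(left);
        scene.add(right);
        System.out.println(scene.render()); // 2
        scene.remove(right);
        System.out.println(scene.render()); // 1 -- model kept in memory, just hidden
        scene.restore(right);
        System.out.println(scene.render()); // 2
    }
}
```

The trade-off is a little extra memory held for hidden models in exchange for avoiding repeated buffer allocation, which matters on a memory-constrained device.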
5.2.4 Try on virtual mirror
The activity receives the earring’s 3D model and starts preparing it in the
background. Meanwhile, it starts previewing the front camera on the
display. The face detection function starts as soon as the camera preview
does. There are four buttons at the bottom of the display: "Change the
earring size", "Change the distance" ( used to change the horizontal
distance between the earrings ), "Restore default size" and "Restore default
distance". The buttons can be used to improve the accuracy of the earring
location or to enlarge the earring in case it is not visible.
Figure 5.8: Try virtual mirror activity overview
It is important to remark that, even though the face detection function does
not provide the features necessary for head pose estimation, it has
proved fast and robust under different illumination conditions.
Figure 5.9: Android face detection performance on different illumination conditions
Since this is a mobile application, it is expected to work in different places,
each under different illumination conditions, rather than in a single room
with constant illumination, as happens for most desktop applications.
Chapter 6
Users evaluation
In this chapter I will evaluate the application. I am going to establish questions
that the users have to answer in order to provide an evaluation of the application
and to verify whether the goals of this thesis ( that is to say: increase
as much as possible the time performance of the application, verify the acceptance
of the mirror functionality based on the Android 4.0 face detection, and verify the
effectiveness of an E-commerce application with virtual mirror functionality ) have
been achieved.
6.1 Application operational time list
The following list contains the required amount of time for each activ-
ity:
Main gallery:
• Download the catalogue: ∼ 7.85 seconds
• Prepare the catalogue: no delay
• Update the "current focused earring window": no delay
• Prepare the data of the selected earring for the next activity: ∼0.13
seconds
• Start the "Try on photo" activity: no delay
Earring in detail:
• Download the 3D model: ∼1.73 seconds
• Prepare the downloaded 3D model to be passed ( added to the intent )
to the next activity ( either "Try on photo" or "Try on virtual mirror" ):
∼0.19 seconds
• Start the next activity: no delay
Try on photo:
• Take a picture: ∼0.57 seconds
• Confirm the picture: ∼0.34 seconds
• Go back to take another picture: ∼0.57 seconds
• Prepare and render the 3D models: ∼0.15 seconds
• Translation, Rotation, Scaling of the earring: no delay
• Add earring, Delete earring, Change focus operations: no delay
Try on virtual mirror:
• Start camera preview: ∼0.57 seconds
• Prepare and render the 3D models: ∼0.15 seconds
• Change earrings’ size or distance operations: no delay
• Restore earrings’ original size or distance: no delay
In the following paragraph I am going to provide users impressions of
the application’s time performance.
6.1.1 Time performance evaluation
Speed performance of the application was one of the main goals of this
thesis. If the application is reactive, the user is less likely to stop using it.
In order to evaluate the time performance of the application, users were asked
to fill in the following benchmark form:
A. On a scale from 1 ( very slow ) to 5 ( very fast ), how do you evaluate:
1. The required time to load the catalogue.
2. The required time to see the details of the earring.
3. The required time to be able to try the "try on photo" functionality.
4. The required time to be able to try the "try on virtual mirror" func-
tionality.
Users ( 40 testers ) answers were:
Question Answer = 1 Answer = 2 Answer = 3 Answer = 4 Answer = 5
#1 2 [5%] 8 [20%] 13 [32.5%] 14 [35%] 3 [7.5%]
#2 0 [0%] 0 [0%] 0 [0%] 3 [7.5%] 37 [92.5%]
#3 0 [0%] 0 [0%] 0 [0%] 5 [12.5%] 35 [87.5%]
#4 0 [0%] 0 [0%] 0 [0%] 5 [12.5%] 35 [87.5%]
Table 6.1: Answers to question A1, A2, A3 and A4
B. Do you think that the initial slideshow is useful to mask the catalogue
loading time? Yes/No
Figure 6.1: Answers to question B ( Yes: 92.5%, No: 7.5% )
6.2 Virtual mirror activity analysis
The android 4.0 face detection time performance is satisfactory. On the
"Samsung Galaxy Nexus" smartphone, the average frame rate ( with the 3D
models of the earrings in the display ) is ∼20 FPS while the required time
to find the face on the preview is ∼1.82 seconds. In order to evaluate the
virtual mirror functionality, users were asked to evaluate it by filling the
following form:
C. On a scale from 1 ( minimum ) to 5 ( maximum ), how do you evaluate:
1. The fluency of the virtual mirror activity.
2. The speed required to find the face over the display.
3. The accuracy of finding the face over the display.
4. The accuracy of the activity in positioning the models on the face.
5. The effectiveness of the additional commands ( change size, change
distance etc).
Users ( 40 testers ) answers were:
Question Answer = 1 Answer = 2 Answer = 3 Answer = 4 Answer = 5
#1 0 [0%] 0 [0%] 0 [0%] 2 [5%] 38 [95%]
#2 0 [0%] 0 [0%] 3 [7.5%] 33 [82.5%] 4 [10%]
#3 2 [5%] 8 [20%] 13 [32.5%] 14 [35%] 3 [7.5%]
#4 9 [22.5%] 19 [47.5%] 12 [30%] 0 [0%] 0 [0%]
#5 0 [0%] 0 [0%] 2 [5%] 7 [17.5%] 31 [77.5%]
Table 6.2: Answers to questions C1, C2, C3, C4 and C5
D. What do you think of the following virtual mirror activity issues:
1. Are the models not realistic enough ? Yes/No
2. Do the models follow accurately head’s motions ? Yes/No
3. Are the models too small ? Yes/No
Users ( 40 testers ) answers were:
Figure 6.2: Answers to, left to right, questions D1 ( Yes: 57.5%, No: 42.5% ),
D2 ( Yes: 15%, No: 85% ) and D3 ( Yes: 72.5%, No: 27.5% )
E. Do you think that increasing the accuracy of the earrings’ placement over
the head would increase the appreciation level of the activity? Yes/No
Figure 6.3: Answers to question E ( Yes: 100%, No: 0% )
1. If yes, do you think it would even if this meant ( considerably )
slowing down the fluency of the activity? Yes/No
Figure 6.4: Answers to question E1 ( Yes: 10%, No: 90% )
6.3 Application effectiveness
The application is an E-commerce one that includes augmented reality
virtual mirror activity in order to persuade the users to purchase an item
from the catalogue. I am going to now provide the benchmarks used to
evaluate if the "user persuasion" goal has been achieved or not.
F. Have you ever purchased clothes or accessories from a website or an
application? Yes/No
Users ( 40 testers ) answers were:
Figure 6.5: Answers to question F ( Yes: 67.5%, No: 32.5% )
1. If yes, how frequently? Once per: Day/Week/Month/Year/Other
Users ( 27 testers that voted "Yes" at question F ) answers were:
Figure 6.6: Answers to question F1 ( Day: 0%, Week: 14.8%, Month: 55.6%,
Year: 7.4%, Other: 22.2% )
2. If yes, do you usually purchase items from your desktop computer or
a mobile device ? Desktop / Smartphone or Tablet / Both
Users ( 27 testers that voted "Yes" at question F ) answers were:
Figure 6.7: Answers to question F2 ( Desktop: 62.9%, Smartphone or Tablet: 11.1%,
Both: 26% )
3. If yes, did ( at least ) one of the websites/applications you have purchased
from include a virtual mirror functionality to try the models? Yes/No
Users ( 27 testers that voted "Yes" at question F ) answers were:
Figure 6.8: Answers to question F3 ( Yes: 0%, No: 100% )
G. Regardless of the fact that you may not purchase items online and may
not like the earring models in the catalogue, do you think you would ever
buy an earring from the application? Yes/No
Users ( 40 testers ) answers were:
Figure 6.9: Answers to question G ( Yes: 77.5%, No: 22.5% )
1. If yes, which aspect of the application persuaded you? It’s simple
and neat / The virtual mirror functionality / The try on photo functionality
/ All of these
Users ( 31 testers that answered "Yes" ) answers were:
Figure 6.10: Answers to question G1 ( It’s simple and neat: 0%, The virtual
mirror functionality: 22.6%, The try on photo functionality: 6.4%, All of these: 71% )
2. If no, why? I do not like the items / The application doesn’t seem
trustworthy / Other
Users ( 9 testers that answered "No" ) answers were:
Figure 6.11: Answers to question G2 ( I do not like the items: 22.2%,
The application doesn’t seem trustworthy: 77.8%, Other: 0% )
H. Do you think the virtual mirror functionality improves the persuasiveness
of the application?
Users ( 40 testers ) answers were:
Figure 6.12: Answers to question H ( Yes: 95%, No: 5% )
1. If yes, why? It allows to try the model / It entertains / It adds prestige
to the application
Users ( 31 testers that answered "Yes" ) answers were:
Figure 6.13: Answers to question H1 ( It allows to try the model: 92.1%,
It entertains: 7.9%, It adds prestige to the application: 0% )
2. Would you like other websites/applications to include virtual
mirror functionality? Yes/No
Users ( 40 testers ) answers were:
Figure 6.14: Answers to question H2 ( Yes: 100%, No: 0% )
I. Are there other features ( either from other websites/applications or
still unavailable ) that you would like to add to the application? Yes/No
Users ( 40 testers ) answers were:
Figure 6.15: Answers to question I ( Yes: 77.5%, No: 22.5% )
1. If yes, which features?
All the 31 users that answered "Yes" to question "I", asked for the possibility
to change the earring from one model to another, dynamically, in both "try
on photo" and "try on virtual mirror".
6.4 Application evaluation
The application is an E-commerce application that includes augmented
reality functionality, based on the Android 4.0 face detection, in order to
increase its persuasiveness over users and encourage them to purchase items
from the catalogue. The users have been asked many questions about the
application’s time performance and the importance of the virtual mirror
function inside the application.
From the results of the questions, it is possible to establish the following.
From the time performance point of view, users are satisfied with the
application; their answers indicate that they are not likely to stop using the
application because of its speed. In particular, users appreciated the use of the
slideshow to mask the download time, since their attention focuses on the
slideshow rather than on the download time. This shows that the goal of
time performance has been achieved.
Concerning the virtual mirror functionality from a technical point of view,
users stated that it is fluent and enjoyable, with some non-negligible shortcomings
concerning its accuracy in case of a non-frontal face and the unrealistic
earring models. When proposed the trade-off between accuracy
and computational speed, users showed more appreciation for the current
status than for a more accurate but slower virtual mirror functionality. The
Android 4.0 face detection has proved fast and efficient; however, in order
to increase users’ appreciation of the virtual mirror functionality, it is
necessary to implement a new face detection function, both fast and
efficient, that can be used to build a head pose estimator on top
of it. The goal of building a virtual mirror application using the Android 4.0
face detection function is partially achieved.
Concerning the effectiveness of introducing the virtual mirror functionality in
an E-commerce application, users stated that it has been successful: users
considerably appreciated the possibility of having a preview of how the
item would look on them. The virtual mirror functionality is a feature
that most of the current E-commerce applications lack and that 95%
of the users appreciated. The choice of using Android-based smartphones as
the platform has proved, at least for the moment, unsatisfactory, since only
11% of the users purchase items online exclusively from a smartphone or
tablet ( and this figure does not take into account that a percentage of
users use non-Android-based devices ). The figure increases significantly
( 38% ) if we consider the number of users that perform some online
purchases from both computer and smartphone/tablet ( again, this figure does
not distinguish Android users from other mobile OS users ). Finally, the
application has successfully achieved the goal of increasing its own persuasion
level over users by means of the virtual mirror functionality: 77.5% of the
users would purchase an item from the application ( while only 67.5% of the
users had declared that they perform online purchases ).
Chapter 7
Conclusions and future works
In this last chapter, I am going to provide the conclusions of this thesis based on
the users’ answers, and the possible future work to continue this thesis.
7.1 Conclusions
Tests have proved that integrating virtual mirror functionality into E-commerce
applications increases the persuasiveness of the application.
Users are more likely to purchase the items that they can not only see in
detail, but also "try", getting a forecast of how that item would look on
them. The persuasion level, however, could be further increased by solving
the virtual mirror functionality issues: increasing the realism and the accuracy
of the 3D models, adding incrustation effects and using bigger objects
( like, for example, glasses, hats etc. ) instead of small objects like earrings.
The download time has been judged satisfactory and is efficiently
masked by the initial slideshow; users are unlikely to stop using the
application because it is slow. The goals of this thesis have been achieved
( although the virtual mirror functionality goal has only partially been
achieved ): adding virtual mirror functionality to an E-commerce application
increases its effectiveness and grabs users’ attention. Virtual mirror
functionality should be implemented in every E-commerce application in
order to improve its persuasion effect and increase both the visit rate and
the sale rate.
7.2 Future works
The Android 4.0 face detection API has proved efficient and fast. However,
it has also emerged that, at the moment, no smartphone can benefit from all
the features it claims to offer. Using computer vision libraries for Android
such as OpenCV's Android port or Qualcomm's FastCV, it is possible to develop
a customized, Android-compatible face detection functionality. Moreover,
since such a face detector would be self-made and customized, it could be
tuned to implement a head pose estimator, increasing the effectiveness of
the virtual mirror functionality.
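As an illustration of the kind of head pose estimator that could be layered
on top of a face detector, the following sketch derives rough roll and yaw
angles from the three landmarks that the Android 4.0 Camera.Face API already
exposes (left eye, right eye, mouth). The geometry is a deliberately crude
assumption of mine, not the implementation of this thesis, and the function
name is hypothetical:

```python
import math

def estimate_pose(left_eye, right_eye, mouth):
    """Rough head pose from the three (x, y) pixel landmarks returned
    by a face detector. Returns (roll, yaw) in degrees. This is a
    geometric approximation, not a full 3D pose solver."""
    # Roll: angle of the inter-ocular line with respect to the horizontal.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll = math.degrees(math.atan2(dy, dx))
    # Yaw: horizontal offset of the mouth from the eye midpoint,
    # normalised by the inter-ocular distance; 0 for a frontal face.
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_dist = math.hypot(dx, dy)
    yaw = math.degrees(math.atan((mouth[0] - mid_x) / eye_dist))
    return roll, yaw
```

A frontal face with level eyes and a centred mouth yields roll and yaw close
to zero; tilting the head or turning it sideways moves the corresponding
angle away from zero, which is enough to drive the rotation of a rendered
earring model.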
The computational speed of the application has proved satisfactory. However,
with a larger catalogue, the download time is likely to grow beyond the
current ∼7.85 seconds. Using smaller, more heavily compressed images is
likely to improve performance, but not to solve the problem entirely; further
research on the transmission of data over the web could improve it as well.
The application occupies 3.15 MB on the user's smartphone, which is
considerably less than many other E-commerce applications. It could therefore
be worth considering the tradeoff between download time and application
storage space: by saving the catalogue on the smartphone, downloads would be
drastically reduced to catalogue updates only. Concerning the 3D models,
their realism should be increased: the models should be optimized while
being designed in the 3D modeling tool, and the OpenGL ES 2.0 libraries
should be adopted in the client application (instead of the current OpenGL
ES 1.0) to improve the rendering quality. Bigger models should be used as
well: 72.5% of the users declared that the models are too small, so changing
the catalogue to bigger items (such as hats or glasses) should increase
usability. Moreover, bigger objects require different techniques to create
the incrustation effect of the models and to increase the realism of the
application.
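The catalogue-caching tradeoff discussed above amounts to a simple version
check: the client keeps a local copy of the catalogue and re-downloads it
only when the server reports a newer version. This is a minimal sketch under
assumptions of mine (a JSON catalogue carrying a version field, and a
hypothetical `fetch` callable standing in for the actual web-service call):

```python
import json
from pathlib import Path

def load_catalogue(cache_dir, server_version, fetch):
    """Return the catalogue, downloading it only when the server
    reports a version newer than the locally cached copy.
    `server_version` would come from a lightweight version-check
    request; `fetch` downloads the full catalogue as a dict."""
    cache = Path(cache_dir) / "catalogue.json"
    if cache.exists():
        data = json.loads(cache.read_text())
        if data.get("version") == server_version:
            return data          # cache hit: no download needed
    data = fetch()               # cache miss or stale: full download
    data["version"] = server_version
    cache.write_text(json.dumps(data))
    return data
```

With this scheme the ∼7.85-second download is paid only on first launch and
after a catalogue update, at the cost of storing the catalogue alongside the
3.15 MB application.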