Papadakos PhD 2013

UNIVERSITY OF CRETE DEPARTMENT OF COMPUTER SCIENCE FACULTY OF SCIENCES AND ENGINEERING Interactive Exploration of Multi-Dimensional Information Spaces with Preference Support by Panagiotis Papadakos PhD Dissertation Presented in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Heraklion, November 2013


UNIVERSITY OF CRETE

DEPARTMENT OF COMPUTER SCIENCE

FACULTY OF SCIENCES AND ENGINEERING

Interactive Exploration of Multi-Dimensional Information Spaces

with Preference Support

by

Panagiotis Papadakos

PhD Dissertation

Presented

in Partial Fulfillment

of the Requirements

for the Degree of

Doctor of Philosophy

Heraklion, November 2013

UNIVERSITY OF CRETE

DEPARTMENT OF COMPUTER SCIENCE

Interactive Exploration of Multi-Dimensional Information Spaces with Preference Support

PhD Dissertation Presented

by Panagiotis Papadakos

in Partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

APPROVED BY:

Author: Papadakos Panagiotis

Supervisor: Tzitzikas Yannis, Assistant Professor, University of Crete

Committee Member: Plexousakis Dimitris, Professor, University of Crete

Committee Member: Savidis Anthony, Professor, University of Crete

Committee Member: Spyratos Nicolas, Professor Emeritus, University of Paris-South

Committee Member: Vassiliadis Panos, Assistant Professor, University of Ioannina

Committee Member: Rauber Andreas, Associate Professor, Vienna University of Technology

Committee Member: Paltoglou Georgios, Senior Lecturer, University of Wolverhampton

Department Chairman: Trahanias Panos, Professor, University of Crete

Heraklion, November 2013

To the sacred

reflexive, symmetric and transitive

relation of

a student and a teacher

“The only principle that does not inhibit progress is:

anything goes.”

– Paul Karl Feyerabend

Against Method (1976)

The drawing on the previous page sketches the following:

i) Preferences are part of the cognitive process of decision making.

ii) This dissertation takes advantage of multi-dimensional hierarchies.

iii) The process that any PhD student has to face: starting from a few unrealistic aims, going through the gradual immersion and disorientation in the ocean of the available knowledge (a difficult and frustrating situation where the help of the advisor is appreciated), to the final gathering and integration of the contributions (see also the respective drawing on p. 175).

Image taken from the beamer XeLaTeX template available from https://github.com/drbunsen/drbunsen-beamer

Acknowledgments

The following pages cannot convey my feelings and the experiences that I gained during all these years of my Doctoral Dissertation odyssey. The blank space was filled by black ink in just a few seconds. Only a faint odour recalls the process of their imprinting. But my ‘imprinting’ was a long process. Different ‘typesetters’ wrote their words with different metal sorts in different places. Their printings have affected my scientific, artistic and human nature, and I owe them my present. Those are the people I would like to thank.

The ‘End’ of this work could not have been typed without the undivided and unconditional support of my supervisor, Assistant Professor Yannis Tzitzikas. Through all these years his academic advice and directions were always on the ‘spot’. I am also grateful to him for his constant mentorship and for believing in me when I had lost my confidence. Although I was his first PhD student, he managed to accommodate the particularities of my personality and to stimulate my interest and enthusiasm. More importantly, just as blinkers keep horses from seeing what nature meant them to see, which is just about everything, I was taught to try to remove my mental blinkers.

I want to deeply thank Professor Dimitris Plexousakis, head of the Information Systems Laboratory (ISL), for the time he devoted to me all these years. He has been a critic and at the same time a supportive advisor. His insights have been truly inspiring and crucial. As the head of the lab he created a highly creative and inspiring environment for me.

I would like to express my sincere appreciation to the third member of my advisory committee, Associate Professor Anthony Savidis, who was my supervisor during my MSc “3 Dimensional CRC” voyage in the Human Computer Interaction (HCI) lab. Although my PhD thesis moved away from the initial plans into territories unknown to him, he managed to understand my work and provide guidance and comments.

Furthermore, I am indebted to the other members of my examination committee, Professor Emeritus Nicolas Spyratos, Assistant Professor Panos Vassiliadis, Associate Professor Andreas Rauber and Senior Lecturer Georgios Paltoglou, for their constructive comments and suggestions.

I was fortunate to meet Irini Fundulaki and Kostas Stefanidis, two researchers who helped me a lot to gain self-confidence. Irini motivated a number of interesting discussions that helped me understand my work more deeply. Kostas is the motivating example of a young, smart, capable and passionate researcher. I wish him all the best in his career.

Moreover, I would like to acknowledge the support of the Institute of Computer Science of the Foundation of Research and Technology (FORTH-ICS), and especially the ISL, both financially and for the facilities (the lights of the laboratory were sometimes kept on until early morning). It is a nice place to be, with exceptional people who elicit inspiring discussions.

This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: “Herakleitus II. Investing in knowledge society through the European Social Fund”. Despite the above formal words that I have to write, this financial support has been really important for me, especially during this financial crisis period.

Finally, I want to thank the following persons with whom I spent a lot of time all these years: Nikos Tsagkarakis for all the time that we spent together, our discussions, the summits we reached, for cultivating ‘our’ vineyard and drinking the raki ‘spirit’ we produced, Anna-Maria Papadaki for being an ‘earthy’ human being, Aristea Papadimitriou for her gaze and our philosophical discussions, Georgia Troullinou for taking care of me when I was for a second time an ‘infant’, Michalis Papadakis for his ‘amanedes’, Dimitris Robotis for cooking on Sundays, Andreas Sfakianakis for being bald, Despoina Pavlidi for the house in Panagia, Sofia Skandali for our trips, Christina Lantzaki for our interesting discussions on graphs, Nikos Manolis and Maria Psaraki for the times in the ‘Lefka’ basements, “Aksas” for not listening to his name Nikos, my colleagues at FORTH Patkos Theodore, Yannis Marketakis, Pavlos Fafalios, Nikos Armenatzoglou, Yannis Kitsos, Dimitris Andreou, Stella Kopidaki, and Yannis Kargakis, Corina Doerr for the nice logo of Hippalus and Ionas for the name Hippalus, Yannis Roussakis for ping-pong, my neighbours in the lab, George Baryannis and Chrysostomos Zeginis, as well as Ioannis Chrysakis, Dimitra Zografistou, Roula Avgoustaki, Lida Charami, Athina Kritsotaki, Irini Maravellia and Manos Papadakis for their patience, Maria Moutsaki for ‘scanning’ and Dimitris Aggelakis for ‘windows’, George Konstantinidis for our friendship before he left Greece, and Dimitra Makri for her understanding.

This work is a result of the constant support of my parents, Stavros and Maria, and my two sisters, Stavroula and Katerina. They always believed in me and supported me in every possible way.

Abstract

Users access large amounts of information resources (documents or data) mainly through search functions, where they type a few words and the system (web search engine, query engine) returns a linear list of hits. While this is often satisfactory for focalized search, it does not provide enough support for recall-oriented (exploratory) information needs, which constitute the majority according to various user studies.

The interaction of Faceted and Dynamic Taxonomies (FDT) is a highly prevalent model for exploratory search, which allows users to get an overview of the information space (e.g. search results) and offers them various groupings of the results (based on their attributes, metadata, or other dynamically mined information). These groupings enable users to restrict their focus gradually and in a simple way (through clicks, i.e. without having to formulate queries), enabling them to locate resources that would be difficult to locate otherwise (especially the low ranked ones).

The enrichment of search mechanisms with preferences could prove useful for recall-oriented information needs. However, the current approaches for preference-based access (mainly from the area of databases) seem to ignore the fact that users should be acquainted with the information space and the available choices in order to describe their preferences effectively.

In this dissertation we extend the interaction model of FDT with preference actions that allow users to express their preferences interactively, gradually, and in a simple way.

Initially, we introduce a preference framework appropriate for information spaces comprising resources described by attributes whose values can be hierarchically valued and/or multi-valued. We define the language, its semantics and the required algorithms. The framework supports preference inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioritization, Pareto and their combination).

Subsequently, we enrich the FDT model with preference actions and we propose logical optimizations and methods for exploiting the intrinsic characteristics of the FDT-based interaction, aiming at making it applicable to large amounts of information. Then, we present the design and the implementation of the web-based system Hippalus, which realizes the extended interaction model.

Regarding user benefits, we first theoretically analyze user gain in terms of the number and difficulty of choices, and then we describe and analyze three user-based evaluations that we have conducted.

The first investigates the degree of effectiveness of preferences (and the effort to express them) when users are not aware of the available choices. The results showed that only 20% of the users managed to express effective preferences without knowing the available choices.

The second comparatively evaluates FDT and other exploratory models. The results showed that the majority of users preferred FDT, were more satisfied by FDT and achieved higher rates of task completion with FDT.

The last one concerns the evaluation of the preference-enriched FDT as realized by Hippalus. The results were impressive. Even on a very small dataset, with the preference-enriched FDT all users successfully completed all tasks in 1/3 of the time and with 1/3 of the actions in comparison to the plain FDT. Moreover, all (100%) of the users (both plain and expert) preferred the preference-enriched interface.

Keywords: Preferences, Exploratory Search, Interactive Information Retrieval, Decision Making

Supervisor: Tzitzikas Yannis

Assistant Professor

Computer Science Department

University of Crete

Abstract (translated from Greek)

Users usually access large volumes of information resources (data or documents) through search functions, where they traditionally type a few keywords and the search system (e.g. a search engine or a query evaluation system) returns a linear list of hits. Although this is satisfactory for the needs of focalized search, responses of this type do not provide sufficient support for exploratory (recall-oriented) needs, which, according to various studies, are the majority.

A by now widespread model of exploratory search is interaction through Faceted and Dynamic Taxonomies (FDT). This model allows users to get an overview of the information space, e.g. the results of a search, by offering them various groupings of the results (based on their attributes, their metadata, or other dynamically mined information). These groupings allow users to restrict their focus gradually and in a simple way (simple clicks), i.e. without having to formulate queries, and eventually to find resources that would be difficult to find in the linear result list because of their low ranking.

Enriching search mechanisms with preferences could prove particularly useful for exploratory (recall-oriented) needs. However, the current approaches to preference-based information access (which come mainly from the area of databases) ignore the fact that users must be familiar with the information space and the available choices in order to describe their preferences effectively.

In this dissertation we extend the FDT interaction model with actions that allow users to express their preferences interactively, gradually, and in a simple way.

Initially we introduce a preference model suitable for information spaces consisting of resources described by attributes whose values may be hierarchically organized and/or multi-valued. We define the language, its semantics and the related algorithms. The model supports preference inheritance in the hierarchies and automatic conflict resolution, as well as preference composition operators (prioritization, Pareto and their combination).

Subsequently we enrich the FDT model with preference actions and propose various optimizations and ways of exploiting the intrinsic characteristics of FDT to make the model applicable to large volumes of information. We then present the design and implementation of the web-based system Hippalus, which realizes the extended interaction model.

Regarding the benefit for the user, we first analyze the benefits theoretically in terms of the number of choices and the difficulty of the decisions the user has to make, and then we describe and analyze the results of three user evaluations.

The first investigates the degree of effectiveness of preferences (and the effort of expressing them) when the user has no knowledge of the available choices. The results showed that only 20% of the users can express effective preferences without knowledge of the available choices.

The second evaluates the effectiveness of FDT against other exploratory models. The results showed that FDT was preferred by the majority of users, offered greater satisfaction and led to higher task completion rates.

The third concerns the preference-extended FDT model, and the evaluation was carried out using the Hippalus system. The results were impressive. Even on very small collections, using the preference-enabled interface, all users successfully completed all tasks in 1/3 of the time (!) and with one third of the actions compared to the plain FDT. Moreover, 100% of the users, both plain and expert, preferred the preference-enriched interface.

Keywords: Preferences, Exploratory Search, Interactive Information Retrieval, Decision Making

Supervisor: Tzitzikas Yannis

Assistant Professor

Computer Science Department

University of Crete

Contents

Acknowledgments

Abstract

Extended Abstract (in Greek)

Table of Contents

List of Figures

List of Tables

1 Introduction

1.1 Objective

1.2 Context, Approach and Research Questions

1.3 Contribution

1.4 Produced Publications

1.5 Thesis Outline

2 Background and Related Work

2.1 Context: Exploration through FDT

2.2 Preference Management in General

2.2.1 Various Perspectives of Preference Management

2.3 Faceted and Dynamic Taxonomies (FDT) and Preferences: Motivation

2.4 The Database World

2.5 IR and Preferences

2.6 FDT and Preferences: Past and Related Works

2.7 Motivation and Running Example

3 A Preference Framework for Multidimensional Information Spaces

3.1 Syntax of the Language

3.2 The Domain of Semantics

3.3 Syntax to Semantics

3.3.1 Flat Single-Valued Attributes

3.3.2 Set-Valued Attributes

3.3.3 Best/Worst Preferences over Hierarchically Organized Values

3.3.4 Relative Preferences over Hierarchically Organized Values

3.3.5 Preferences over Hierarchical Set-Valued Attributes

3.4 Multi-Facet Preferences

3.4.1 Prioritized Composition

3.4.2 Pareto Composition and Best Matches Only (BMO)-set

3.4.3 Combination of Priority and Pareto Compositions

3.5 A Complete Example

4 Complexity and Optimizations

4.1 Computational Complexity

4.2 Optimizations for Deriving the Preference-based Order

4.2.1 An Algorithm based on the Focal Object Set

4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements

4.3 Optimizations for Multi-Facet Preferences

4.3.1 Prioritized Composition

4.3.2 Pareto Composition

4.3.3 Combination of Priority and Pareto Compositions

5 Applicability and the System Hippalus

5.1 Application in Searching

5.1.1 Case: Web Searching

5.1.2 Case: Relational Databases

5.1.3 Case: RDF Bases

5.2 Hippalus: A Preference Enriched Faceted Exploratory System

5.2.1 Software Architecture

5.2.2 Visualization and User Interface

5.2.3 Interaction Example

6 Evaluation

6.1 Evaluation Approaches & Metrics

6.1.1 Metrics for Exploratory Search

6.1.2 Metrics Related to the Proposed Interaction Scheme

6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT

6.3 DiFEPreKO Hypothesis

6.3.1 Analytical Comparison

6.3.2 User Study

6.4 Evaluation of Various Exploration Approaches

6.5 Evaluation of Hippalus System

6.6 Evaluation Conclusion

7 Conclusion and Future Research

7.1 Synopsis of Contributions

7.2 Directions for Future Work and Research

Appendices

A Complete Syntax of Preference Language

B Binary Relations

C Acronyms

List of Figures

2.1 Dynamic Taxonomies

2.2 Finding a Hotel in the Island of Symi (FDT over booking.com)

2.3 Checking Olympus Cameras (FDT over eBay)

2.4 FDT-based GUI of the Mitos WSE

2.5 Distinctions of Preference Management Approaches

2.6 SciNet Prototype

2.7 Example Taxonomies

3.1 Hasse Diagram of Preference Relation Over E (E, R≻)

3.2 Example for Flat Single-Valued Attributes

3.3 Example for a DAG

3.4 Example for Flat Multi-Valued Attributes

3.5 Example of Preferences Without Exploiting Hierarchies

3.6 Hasse Diagram of Actions Refinement

3.7 Taxonomy of Manufacturers

3.8 Hasse Diagram of Scope-Based Ordering of Preference Actions

3.9 Examples of Conflicts

3.10 Relative Inherited Preferences and Conflicts Examples

3.11 Examples of Cycles of the Form e ≺ e′ ≺ e

3.12 Hasse Diagram of the Relation R for the Manufacturer Attribute

3.13 Scope Based Ordering of Actions (Left for Best/Worst Actions, Right for Relative Preference Actions): Complete Example

3.14 Hasse Diagram for the Relation Rbw: Complete Example

3.15 Hasse Diagram for the Relation R≻: Complete Example

3.16 Hasse Diagram for the Relation R: Complete Example

3.17 Hasse Diagram for Ordering Multi-Valued Attributes According to MoreWins Rule: Complete Example

3.18 Hasse Diagram for Ordering Multi-Valued Attributes According to MoreGoodLessBad Rule: Complete Example

5.1 Processes of Web Searching and Exploratory Web Searching

5.2 Process of Exploratory Web Searching Enhanced with Preference Actions

5.3 Mitos GUI for Exploratory Web Searching

5.4 Facets and Zoom-Points of Running Example

5.5 Example of RDF/S

5.6 System Architecture

5.7 The RDF Knowledge Base

5.8 Hippalus: The Main Page of Hippalus

5.9 Hippalus: Value Expansion - Object Restriction

5.10 Hippalus: Expression of Relative Preference Korean ≻ European

5.11 Hippalus: (a) Expressing Preferences, (b) Object Restrictions after Preference Expression

5.12 Hippalus: Composition of Preference Actions. Manufacturer Prioritized to Price

5.13 Hippalus: Composition of Preference Actions. Price Prioritized to Manufacturer

5.14 Hippalus: Composition of Preference Actions. Default Combination Mode

5.15 Hippalus: Restricted Focus with Preferences Applied

6.1 Available IR Metrics

6.2 Evaluation Step B: Users Select a Car from the List (1st page)

6.3 Evaluation Step B: Users Select a Car from the List (2nd page)

6.4 Probabilities and Distribution Function of the Binomial Distribution

6.5 Comparative Evaluation Process

6.6 Plurality and Borda Results for (a) Ease of Use, (b) Usefulness, (c) Preference and (d) Satisfaction

6.7 Average Values in Last Step of Each Task. (a) for Timings (T) and Actions (A), while (b) Depicts the Values for Recall (R), Precision (P) and Average Precision (AP)

List of Tables

2.1 Basic Notions and Notations

3.1 Scopes (Direct and Under Inheritance)

3.2 Scopes: Example for Best/Worst Preferences

3.3 Scopes: Example for Relative Preferences

3.4 Complete Example: Scopes and Active Scopes

3.5 Tuples in Database: Complete Example

4.1 PrefOrderOpt Changes for Capturing Relative Preferences Over Hierarchically Organized Values

4.2 Complexity for Non-Optimized and Optimized Alg. PrefOrder and PrefOrderOpt

6.1 Choices and Number of Clicks

6.2 Example of Hypothesis Evaluation Results

6.3 Results of the Hypothesis Evaluation

6.4 Percentages of the 30 Users that Expressed a Preference Over a Valid Attribute

6.5 Graeco-Latin Square Design

6.6 Plain and Expert Users Average, Max and Min Timings and User Actions for each Task for both UIs per each User Group

6.7 Plain and Expert Users Average, Max and Min Timings and User Actions per each Task and All Tasks for both UIs

6.8 Plain and Expert Users Recall, Precision and Average Precision Metrics per each and all Tasks for both UIs

Chapter 1

Introduction

Contents

1.1 Objective

1.2 Context, Approach and Research Questions

1.3 Contribution

1.4 Produced Publications

1.5 Thesis Outline

1.1 Objective

In one sentence, the general objective of this thesis is to offer users a flexible and effective method for accessing large amounts of data that supports recall-oriented information needs and decision making.

1.2 Context, Approach and Research Questions

Users access large amounts of information resources (documents or data) mainly through search functions, where the user types a few words and the system (web search engine, query engine) returns a linear list of hits. While this is often satisfactory for focalized search, where the user knows exactly what they are looking for and can be satisfied by a single hit, it does not provide enough support for recall-oriented (exploratory) information needs, which are the majority according to various user studies (Rose and Levinson (2004); Crawford (2006)). Below we briefly describe the “world of unstructured data” and the “world of structured data”.

Information Retrieval (IR) is the area of study concerned with the processes by which user queries to information systems are matched against stored objects (in principle full-text documents), which are finally returned to the user. Mainly, it is a system-based approach that does not take into consideration the user, except during query formulation. However, researchers have recently started trying to understand the role of users in IR, since there is a belief that we cannot design effective IR systems without knowing how users interact with them (Kelly (2009)). This has led to the development of Interactive Information Retrieval (IIR), where users are studied along with their interactions with systems and information.

Traditional IR abstracts human interactions and experiences out of the evaluation of a retrieval system, and as a result leads to suboptimal IR systems. The interest of the community in a TREC-style evaluation framework for studying interaction and users led to three Tracks: the Interactive Track (TRECs 3-11), the HARD Track (TRECs 12-14) and ciQA (TRECs 15-16)¹. Each of them experimented with different evaluation frameworks, but none of them was successful in establishing a generic evaluation framework (Kelly (2009)).

On the other hand, recent applications must cope with a wide range of data, which can be unstructured (full-text documents), semi-structured (XML) or structured (databases, linked data). Furthermore, a plethora of new tasks, quite different from the classical query evaluation task, are being performed: from data mining algorithms and machine learning to collaborative recommendation and filtering. As a result, there is an interest in the integration of IR and databases, as in Papadakos et al. (2008a); Li et al. (2011), and in the exploitation of available techniques from both scientific areas in a user-friendly way.

In the world of structured information (e.g. databases, the Semantic Web) users are offered powerful and expressive languages to query the underlying information. However, such query languages are not directly utilized by end users, since the formulation of queries is a laborious and difficult task for them (Reisner (1981)). Consequently, efforts for exploiting such languages in user-friendly general-purpose models of exploration/navigation have come up (e.g. Chakrabarti et al. (2004); Oren et al. (2006); Mäkelä et al. (2006); Hildebrand et al. (2006); Becker and Bizer (2009); Le Phuoc et al. (2010); Ferré and Hermann (2012)). For instance, the interaction scheme of Faceted and Dynamic Taxonomies (FDT) (Sacco and Tzitzikas (2009)) allows users to explore the information space and is suitable for recall-oriented tasks, such as the ones found in Exploratory Search (ES)² and decision making environments. By using the FDT interaction scheme, users can restrict their focus (object set) without having to formulate queries, but through a small set of actions (zoom in/out/side). Each action corresponds to a query (formulated on-the-fly) which can be enacted by a simple click. This approach can be applied over the results of an IR system (simple user query with relative terms), a relational database (by using available query languages like SQL) or data published under Semantic Web technologies like the RDF/S or OWL data models (querying them using SPARQL, SQWRL, etc.).

¹ http://trec.nist.gov/tracks.html

In this dissertation we investigate how we can extend these actions with preference management. Such an extension can further ease the interaction and speed up the restriction of the focus to those parts of the information space that the user is interested in. Such actions can be especially beneficial for mobile devices and User Interfaces (UIs) with limited screen real estate (i.e. smartphones and tablet computers), which need special interfaces (such as the one proposed by Neumann and Schmeier (2012)). To this end, we extend the FDT interaction with user-specified preferences. In other words, FDT allows constructing queries by simple navigation/exploration actions, and this work extends this set of actions in order to offer preference-compliant exploration.

Works on preference management over structured data (e.g. Kießling (2002); Chomicki (2003); Andreka et al. (2002); Kießling and Kostler (2002)) require that either the user formulates complex preference queries, or the application developer develops specialized interfaces which internally construct such queries. Moreover, and more importantly, to formulate an effective preference specification the user should already be acquainted with the information space and the available choices, which could be unknown, as in the case of web databases (Stefanidis et al. (2011a)). In addition, in this work we formulated the hypothesis that, without knowing the available choices, the declarative expression of preferences is a tiresome and time-consuming process, and we verified its validity through a user study.

² The ES initiative aims to provide users with better tools for advanced information seeking tasks such as learning, investigation and analysis, according to Marchionini (2006).


The above observations justify the need for flexible and universal (i.e. general-purpose) access methods that offer exploration services and real-time preference elicitation. The requirements for such exploratory environments include:

a) generality: they should be capable of capturing a wide range of information spaces and user information needs,

b) expressiveness: it should be possible for the user to interactively specify complex preference structures, and

c) usability: the users should be able to use and understand the interaction immediately, and the resulting interaction should be effective and desired by the users.

As a result, the main research questions that arise from the above are:

• How can we gradually and flexibly specify preferences over information spaces that might be hierarchically organized and might support multi-valued attributes, and what should their semantics be?

• How can we tackle the algorithmic perspective so that the proposed preference-extended FDT interaction can be applied over large information bases?

• How does the preference-extended FDT interaction affect the user effort and other metrics during exploratory tasks?

1.3 Contribution

The key points and contributions of this thesis are:

• It introduces an interaction model for preference elicitation during FDT exploration. Most works on preference management focus only on the order of objects, while this work also focuses on the other aspects of the FDT interaction scheme (facets/zoom-points), i.e. on the order of the “queries” the user can select for changing his focus.

• At first we introduce a preference framework appropriate for information spaces comprising resources described by attributes whose values can be hierarchically organized and/or multi-valued. We define the language, its semantics and the required algorithms. The framework supports preference inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioritization, Pareto and their combination).

• It elaborates on the system and algorithmic perspective of the proposed approach, and introduces methods that allow applying the approach over large information spaces.

• Subsequently, we present the design and the implementation of the web-based system Hippalus, which realizes the extended interaction model.

• Regarding the benefits for the users, at first we analyze theoretically the user gain in terms of number and difficulty of choices.

• Then we describe and analyze three user-based evaluations that we have conducted. The first investigates the degree of effectiveness of preferences (and the effort to express them) when the user is not aware of the available choices. The results show that only 20% of the users managed to express effective preferences without knowing the available choices. The second comparatively evaluates FDT and other exploratory models. The results show that the majority of users preferred FDT, were more satisfied by FDT, and achieved higher rates of task completion with FDT. Finally, the last one evaluates the preference-enriched FDT as realized by Hippalus. The results were impressive. Even on a very small dataset, with the preference-enriched FDT all users successfully completed all tasks in 1/3 of the time and with 1/3 of the actions in comparison to the plain FDT. Moreover, all (100%) of the users (either plain or expert) preferred the preference-enriched interface.

To the best of our knowledge this is the first work that supports the above.

1.4 Produced Publications

The research activity related to this thesis has so far produced 2 journal papers, 3 conference papers, 1 workshop paper and 1 demo paper, along with 2 technical reports, which are briefly described below:

Related to the application of FDT over a Web Search Engine (WSE)

• The DEXA’08 Workshops paper Tzitzikas et al. (2008) describes FleXplorer, a framework for FDT that can manage millions of objects in real-time and is used by the Mitos WSE³.

• The idea of combining the interaction scheme of FDT and on-line clustering algorithms was described in the conference paper Papadakos et al. (2009a), presented at ECDL’09, and also in HDMS’09 (Papadakos et al. (2009b)).

• The ECDL’09 Doctoral Consortium workshop paper Papadakos (2009) describes the initial vision of this dissertation.

• The WISE’09 conference paper Kopidaki et al. (2009) describes a snippet-based clustering algorithm named NM-STC, which is used by the previous work.

• The KAIS’12 journal paper Papadakos et al. (2012a) proposes the exploitation of both static and dynamic metadata for the FDT interaction scheme, studies an incremental way of speeding up the exploration process of this approach and provides the results of an experimental user study over Mitos.

Extension of FDT with preferences

• The FI’12 journal paper Tzitzikas and Papadakos (2013) motivates the need for real-time preference elicitation and introduces a language for enriching the interaction scheme of FDT with preference elicitation and preference-based interaction. Key aspects of the proposed approach include the support of hierarchically organized values, the support of set-valued attributes, and the incremental preference specification mode, with the scope-based method for resolving conflicts. The proposed algorithms take advantage of the rapid reduction of the information space through the use of FDT, and are independent of the size of the information base.

• A demo paper showcasing an implementation of the above functionality over an RDF exploratory system is described in Papadakos et al. (2012b).

Related to IR indexing and querying

• The technical report Papadakos et al. (2008b), published in CORR’08, describes Mitos, a DBMS-based WSE that provides the FDT interaction scheme.

• The DBMS-index of Mitos is discussed in the PCI’08 conference paper Papadakos et al. (2008a), where different database representations are discussed and compared.

• An extension of the previous work with one additional representation and experimental results is provided in the technical report Papadakos et al. (2009c), published in CORR’09.

³ Under development by the Department of Computer Science of the University of Crete and FORTH-ICS (http://groogle.csd.uoc.gr:8080/mitos/).

Submitted and under review

• An article based on the hypothesis user study described in Section 6.3 has been submitted to the International Journal of Information Technology & Decision Making, with the title “Comparing the Effectiveness of Intentional Preferences versus Preferences over Specific Choices: A User Study”.

• A paper that describes in detail the Hippalus system and discusses the results of the evaluation in Section 6.5 has been submitted for review to the 1st International Workshop on Exploratory Search in Databases and the Web (ExploreDB 2014), with the title “Hippalus: Preference-enriched Faceted Exploration”.

1.5 Thesis Outline

The rest of this thesis is organized as follows. Chapter 2 provides the required background information on FDT and preference management, and discusses related work. Chapter 3 defines the syntax and semantics of a preference specification language for multi-dimensional hierarchical information spaces. Chapter 4 elaborates on the algorithmic perspective of the proposed approach and introduces a number of optimizations which are crucial for the applicability of the framework. Chapter 5 examines implementation and application issues of the approach. Chapter 6 discusses user effort, checks the validity of the hypothesis of this thesis and examines the results of a number of user-based evaluations. Finally, Chapter 7 concludes the thesis and identifies issues that are worth further work and research.


Chapter 2

Background and Related Work

Contents

2.1 Context: Exploration through FDT

2.2 Preference Management in General

2.2.1 Various Perspectives of Preference Management

2.3 FDT and Preferences: Motivation

2.4 The Database World

2.5 IR and Preferences

2.6 FDT and Preferences: Past and Related Works

2.7 Motivation and Running Example

In this chapter we discuss the background and the related work regarding FDT and preferences. Specifically, Section 2.1 reviews and provides notions and notations for FDT. Regarding preferences and personalization, Section 2.2 reviews preference management in general. In Section 2.3 we motivate the enrichment of FDT with preference actions. Sections 2.4 and 2.5 discuss preferences from the perspective of the database and IR worlds, respectively. Finally, Section 2.6 discusses related work in the context of decision making and preferences over FDT, while Section 2.7 provides the motivating example of this thesis.


2.1 Context: Exploration through FDT

Most Database (DB) and IR applications, as well as most Web Search Engines (WSEs), are very effective for focalized search, i.e. they make the assumption that users can accurately describe their information need using a query which is usually a small sequence of words. However, as several user studies have shown, a high percentage of search tasks are exploratory (Crawford (2006); Rose and Levinson (2004)): the user does not accurately know his information need (e.g. in WSEs users provide on average 2.4 words, as described in Inan (2006)) and cannot be satisfied by a single ‘hit’. The information needs emerge as users iteratively seek, learn and reflect on the gathered results during the session (Byström and Järvelin (1995); Chowdhury et al. (2011)). As a result, focalized search very commonly leads to inadequate interactions and poor results. The available UIs in most cases do not aid the user in query formulation, and do not provide any exploration services. The returned answers are simple ranked lists of results, with no organization. For this reason users typically reformulate their initial query, inspect the top elements of the returned answer, and so on.

One approach to this problem is results clustering (Hearst and Pedersen (1996); Zamir and Etzioni (1998); Kopidaki et al. (2009)), which provides an overview of the search results. A survey of web clustering engines is provided in Carpineto et al. (2009). Results clustering aims at grouping the results into topics, called clusters, with predictive names (labels), helping the user to quickly locate documents that he otherwise would practically not find, especially if these documents are low ranked (and thus not in the first result pages). The snippets, the cluster labels and their structure are one instance of what we call dynamically-mined metadata. We use this term to refer to metadata which are query-dependent, i.e. they cannot be extracted/mined a priori. The problem with clustering is that such metadata are usually mined only from the top-K part of a query answer, because it would be unacceptably expensive (for real-time interaction) to apply these tasks on large quantities of data. In addition, the lack of predictability, the fact that a number of algorithms create clusters with common results, the difficulty of labeling the groups (at least for non snippet-based algorithms) and the counterintuitiveness of cluster sub-hierarchies make the explicit use of clustering difficult for the users (Hearst (2006)).

Other approaches to exploratory search (Meij et al. (2009); Shokouhi and Radinsky (2012); Fafalios et al. (2012b); Shokouhi (2013)) include completions, either of a single term the user is typing or of the entire query, and auto-suggestions representing the full user query intent. Such completions also include result completions (i.e. the user is presented with a number of results according to the keywords he has typed). Such approaches have been used for a while by mainstream search engines like Google¹, Freebase², commercial sites like eBay, or Evi³, which is an answer engine.

On the other hand, modern environments should guide users in exploring the information space and in expressing their information needs in a progressive manner. Systems supporting FDT offer a simple, efficient and effective way for explorative tasks and are discussed in Sacco and Tzitzikas (2009). Dynamic taxonomies (faceted or not) are an interaction framework based on a multi-dimensional classification of (possibly heterogeneous) data objects, allowing users to browse and explore the information space in a guided, yet unconstrained way through a simple visual interface. Features of this framework include: (a) display of current results in multiple categorization schemes (called facets, or just attributes), (b) display of categories (i.e. attribute values) leading to non-empty results only, (c) display of the count information of the indexed objects of each category (i.e. the number of results the user will get by selecting that category), and (d) gradual refinement of the user's focus, i.e. it is a session-based interaction paradigm, in contrast to the stateless query-and-response dialog of current WSEs.

Figure 2.1: Dynamic Taxonomies

An example of the idea of dynamic taxonomies assuming only one facet is shown in Figure 2.1. Figure 2.1(a) shows a taxonomy comprising 10 terms (European, Italian, Spanish, German, Fiat, Lancia, Seat⁴, VW, BMW, Audi) and 8 indexed objects (1-8). Figure 2.1(b) shows the dynamic taxonomy if we restrict our focus to the objects {4,5,6}. Notice that it comprises only 6 terms, those that lead to objects in {4,5,6}. Figure 2.1(c) sketches user interaction, based on the restriction shown in Figure 2.1(b) (e.g. at the left side bar). Notice the count number next to each term.

¹ http://www.google.com

² http://www.freebase.com/

³ http://www.evi.com/

⁴ Since Seat was bought by VW we assume that cars built by Seat are both Spanish and German.
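The restriction of a taxonomy to a focus set, with counts, can be sketched in a few lines of Python. Figure 2.1 is not reproduced in this text, so the assignment of objects 4-8 to terms below is a hypothetical one, chosen only to be consistent with what the section states (10 terms, 8 objects, Lancia indexing {2, 3}, Seat being both Spanish and German, and 6 terms surviving the restriction to {4,5,6}). The names PARENTS, DIRECT and restrict are illustrative, not part of any system described here.

```python
# Direct broader terms of each term (Seat has two parents, see the footnote on Seat).
PARENTS = {
    "European": [],
    "Italian": ["European"], "Spanish": ["European"], "German": ["European"],
    "Fiat": ["Italian"], "Lancia": ["Italian"],
    "Seat": ["Spanish", "German"],
    "VW": ["German"], "BMW": ["German"], "Audi": ["German"],
}

# Objects indexed directly under each term. Only Lancia -> {2, 3} is stated in
# the text; everything else is an ASSUMED assignment for illustration.
DIRECT = {
    "Fiat": {1}, "Lancia": {2, 3}, "Seat": {4},
    "VW": {5}, "BMW": {6}, "Audi": {7, 8},
}

def broaders_or_self(term):
    """Return the term together with all of its (transitive) broader terms."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def restrict(focus):
    """Map each term that leads to an object of `focus` to its count in `focus`."""
    reached = {}
    for term, objects in DIRECT.items():
        hits = objects & focus
        if hits:
            for t in broaders_or_self(term):
                reached.setdefault(t, set()).update(hits)
    return {t: len(objs) for t, objs in reached.items()}

counts = restrict({4, 5, 6})
for term in sorted(counts):
    print(term, counts[term])
```

Under the assumed assignment, the restriction to {4,5,6} keeps six terms (European, German, Spanish, Seat, VW, BMW), and the numbers printed next to them play the role of the counts shown in the side bar.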

User Interaction. The user explores or navigates the information space by setting and changing his focus. The notion of focus can be intensional or extensional. Specifically, any conjunction of terms (or any boolean expression of terms in general) is a possible focus. In this case we can say that the focus is defined intensionally. For example, the initial focus can be empty, or the top term of a facet. However, the user can also start from an arbitrary set of objects, and this is the common case in the context of a WSE. In that case we can say that the focus is defined extensionally. Specifically, if A is the result of a free text query q (or if A is a set of tuples returned by an SQL query q), then the interaction is based on the restriction of the faceted taxonomy on A (Figure 2.1(b) shows the restriction of a taxonomy on the objects {4,5,6}). At any point during the interaction, we compute and provide to the user the immediate zoom-in/out/side points along with count information (as shown in Figure 2.1(c)). When the user selects one of these points, the selected term is added to the focus (corresponding to a more refined query), and so on.
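The session-based character of this interaction can be illustrated with a minimal sketch. It assumes two flat (non-hierarchical) facets and a handful of made-up objects; OBJECTS, zoom_points and zoom_in are hypothetical names introduced only for this example.

```python
from collections import Counter

# Hypothetical objects described by two flat facets (illustrative data only).
OBJECTS = {
    1: {"origin": "Italian", "fuel": "petrol"},
    2: {"origin": "Italian", "fuel": "diesel"},
    3: {"origin": "German", "fuel": "diesel"},
    4: {"origin": "German", "fuel": "petrol"},
    5: {"origin": "German", "fuel": "diesel"},
}

def zoom_points(focus):
    """For each facet, list the values occurring in `focus` with their counts."""
    points = {}
    for oid in focus:
        for facet, value in OBJECTS[oid].items():
            points.setdefault(facet, Counter())[value] += 1
    return points

def zoom_in(focus, facet, value):
    """One click: keep only the objects of `focus` carrying the selected value."""
    return {oid for oid in focus if OBJECTS[oid][facet] == value}

focus = set(OBJECTS)                        # extensional focus, e.g. a query answer
focus = zoom_in(focus, "origin", "German")  # first click
print(sorted(focus), dict(zoom_points(focus)["fuel"]))
focus = zoom_in(focus, "fuel", "diesel")    # second click refines further
print(sorted(focus))
```

Each click plays the role of a query formulated on the fly, and only values with a non-empty result are offered at the next step, so the user cannot navigate into an empty answer.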

Notions and Notations. Table 2.1 defines formally and introduces notations for terms, terminologies, taxonomies, faceted taxonomies, interpretations, descriptions and materialized faceted taxonomies, that will be used in the sequel. The upper part of the table describes taxonomies. The middle part of the table describes materialized faceted taxonomies, which are actually the kind of information sources that we consider. In brief, Obj is a set of objects (e.g. the set of all documents indexed by a WSE), each described with respect to one or more aspects (facets), where the description of an object with respect to one facet consists of assigning to the object one or more terms from the taxonomy that corresponds to that facet. $I$ is the interpretation function, while $\bar{I}$ takes into account the semantics (i.e. the subsumption relation). For example, assuming the taxonomy of Figure 2.1(a), we have $I(Lancia) = \{2, 3\}$ and $I(Italian) = \emptyset$, while $\bar{I}(Lancia) = \{2, 3\}$ and $\bar{I}(Italian) = \{1, 2, 3\}$. The lower part of the table describes the FDT-interaction. For example, Figure 2.1(b) depicts the restriction over the set $A = \{4, 5, 6\}$, and the reduced terminology $T_A$ is the set of shown terms.
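The difference between the stored interpretation and the model interpretation (the barred I) can be made concrete with a short sketch over the Italian branch of the example taxonomy. The values for Lancia and Italian are those given in the text; I(Fiat) = {1} is inferred from them under the assumption that Fiat and Lancia are the only narrower terms of Italian. The function names model and description are illustrative.

```python
# Fragment of the example taxonomy: term -> direct narrower terms.
NARROWER = {
    "European": ["Italian"],
    "Italian": ["Fiat", "Lancia"],
    "Fiat": [],
    "Lancia": [],
}

# Stored interpretation I: objects indexed directly under each term.
# I(Fiat) = {1} is an inference from the values quoted in the text.
I = {"European": set(), "Italian": set(), "Fiat": {1}, "Lancia": {2, 3}}

def model(term):
    """Model interpretation: objects of the term and of all its narrower terms."""
    result = set(I[term])
    for child in NARROWER[term]:
        result |= model(child)
    return result

def description(obj, interpretation):
    """Terms whose interpretation (given as a function) contains the object."""
    return {t for t in NARROWER if obj in interpretation(t)}

print(I["Italian"], model("Italian"))
print(description(2, lambda t: I[t]), description(2, model))
```

With respect to the stored interpretation, object 2 is described only by Lancia; with respect to the model it is also described by the broader terms Italian and European, which is what allows those terms to appear as zoom points.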

Scalability. Regarding scalability, we should mention that FDT can provide real-time exploration services for millions of objects using techniques like those proposed in Yee et al. (2003); Sacco (2006a); Ben-Yitzhak et al. (2008). Thorough experimental results over FleXplorer are given in Tzitzikas et al. (2008).


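TAXONOMY

| Name | Notation | Definition |
|---|---|---|
| terminology | $T$ | a set of terms (can capture categorical/numeric values) |
| subsumption | $\leq$ | a partial order (reflexive, transitive and antisymmetric) |
| taxonomy | $(T, \leq)$ | $T$ is a terminology, $\leq$ a subsumption relation over $T$ |
| broaders of $t$ | $B^{+}(t)$ | $\{\, t' \mid t < t' \,\}$ |
| narrowers of $t$ | $N^{+}(t)$ | $\{\, t' \mid t' < t \,\}$ |
| direct broaders of $t$ | $B(t)$ | $\mathrm{minimal}_{<}(B^{+}(t))$ |
| direct narrowers of $t$ | $N(t)$ | $\mathrm{maximal}_{<}(N^{+}(t))$ |
| top element | $\top_i$ | $\top_i = \mathrm{maximal}_{\leq}(T_i)$ |

MATERIALIZED FACETED TAXONOMIES

| Name | Notation | Definition |
|---|---|---|
| faceted taxonomy | $F = \{F_1, ..., F_k\}$ | $F_i = (T_i, \leq_i)$, for $i = 1, ..., k$, and all $T_i$ are disjoint |
| object domain | $Obj$ | any denumerable set of objects |
| interpretation of $T$ | $I$ | any function $I : T \to \mathcal{P}(Obj)$ |
| materialized faceted taxonomy | $(F, I)$ | $F$ is a faceted taxonomy $\{F_1, ..., F_k\}$ and $I$ is an interpretation of $T = \bigcup_{i=1,k} T_i$ |
| ordering over interpretations | $I \sqsubseteq I'$ | $I(t) \subseteq I'(t)$ for each $t \in T$ |
| model of $(T, \leq)$ induced by $I$ | $\bar{I}$ | $\bar{I}(t) = \bigcup \{\, I(t') \mid t' \leq t \,\}$ |
| description of $o$ wrt $I$ | $D_I(o)$ | $D_I(o) = \{\, t \in T \mid o \in I(t) \,\}$ |
| description of $o$ wrt $\bar{I}$ | $\bar{D}_I(o) \equiv D_{\bar{I}}(o)$ | $\{\, t \in T \mid o \in \bar{I}(t) \,\} = \bigcup_{t \in D_I(o)} (\{t\} \cup B^{+}(t))$ |

FDT-INTERACTION: BASIC NOTIONS AND NOTATIONS

| Name | Notation | Definition |
|---|---|---|
| intensionally specified focus | $ctx$ | any subset of $T$ such that $ctx = \mathrm{minimal}(ctx)$ |
| projection on $F_i$ | $ctx_i$ | $ctx \cap T_i$ |

Kinds of zoom points w.r.t. a facet $i$ while being at $ctx$

| Name | Notation | Definition |
|---|---|---|
| zoom points | $AZ_i(ctx)$ | $\{\, t \in T_i \mid \bar{I}(ctx) \cap \bar{I}(t) \neq \emptyset \,\}$ |
| zoom-in points | $Z^{+}_i(ctx)$ | $AZ_i(ctx) \cap N^{+}(ctx_i)$ |
| immediate zoom-in points | $Z_i(ctx)$ | $\mathrm{maximal}(Z^{+}_i(ctx)) = AZ_i(ctx) \cap N(ctx_i)$ |
| zoom-side points | $ZR^{+}_i(ctx)$ | $AZ_i(ctx) \setminus \{ctx_i \cup N^{+}(ctx_i) \cup B^{+}(ctx_i)\}$ |
| immediate zoom-side points | $ZR_i(ctx)$ | $\mathrm{maximal}(ZR^{+}_i(ctx))$ |

Restriction over an object set $A \subseteq Obj$ (i.e. extensional focus)

| Name | Notation | Definition |
|---|---|---|
| reduced interpretation | $I_A$ | $I_A(t) = I(t) \cap A$ |
| reduced terminology | $T_A$ | $\{\, t \in T \mid \bar{I}_A(t) \neq \emptyset \,\} = \{\, t \in T \mid \bar{I}(t) \cap A \neq \emptyset \,\} = \bigcup_{o \in A} B^{+}(D_I(o))$ |

Table 2.1: Basic Notions and Notations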
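As a rough illustration of the zoom-point notions of Table 2.1, the following sketch computes them for a single facet. It reads the zoom-point condition as a non-empty intersection between the objects of the focus and the model interpretation of a term, in line with feature (b) of the framework (only categories leading to non-empty results are shown). The taxonomy is the car example of this section, while the object assignment in DIRECT is hypothetical, as are all function names.

```python
# Direct broader terms of each term in the single-facet car taxonomy.
PARENTS = {
    "European": [],
    "Italian": ["European"], "Spanish": ["European"], "German": ["European"],
    "Fiat": ["Italian"], "Lancia": ["Italian"],
    "Seat": ["Spanish", "German"],
    "VW": ["German"], "BMW": ["German"], "Audi": ["German"],
}
# ASSUMED stored interpretation (only Lancia -> {2, 3} comes from the text).
DIRECT = {"Fiat": {1}, "Lancia": {2, 3}, "Seat": {4},
          "VW": {5}, "BMW": {6}, "Audi": {7, 8}}
TERMS = set(PARENTS)

def narrowers(term):
    """N+(t): all strictly narrower terms of `term`."""
    children = {t for t, ps in PARENTS.items() if term in ps}
    result = set(children)
    for child in children:
        result |= narrowers(child)
    return result

def broaders(term):
    """B+(t): all strictly broader terms of `term`."""
    result = set(PARENTS[term])
    for parent in PARENTS[term]:
        result |= broaders(parent)
    return result

def model(term):
    """Model interpretation: objects of `term` and of all its narrowers."""
    objs = set(DIRECT.get(term, set()))
    for n in narrowers(term):
        objs |= DIRECT.get(n, set())
    return objs

def focus_objects(ctx):
    """Objects of a conjunctive focus (all objects if the focus is empty)."""
    objs = set().union(*DIRECT.values())
    for t in ctx:
        objs &= model(t)
    return objs

def zoom_points(ctx):
    """Zoom points, zoom-in points and zoom-side points for the focus `ctx`."""
    objs = focus_objects(ctx)
    az = {t for t in TERMS if model(t) & objs}
    n_plus = set().union(*(narrowers(t) for t in ctx))
    b_plus = set().union(*(broaders(t) for t in ctx))
    direct = {t for t in TERMS if set(PARENTS[t]) & ctx}
    return {"AZ": az, "Z+": az & n_plus, "Z": az & direct,
            "ZR+": az - (ctx | n_plus | b_plus)}

zp = zoom_points({"German"})
print(sorted(zp["Z"]), sorted(zp["ZR+"]))
```

With the focus on German, the immediate zoom-in points are the four German brands, and Spanish appears as a zoom-side point because, under the stated assumption about Seat, some German cars are also Spanish.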


Figure 2.2: Finding a Hotel in the Island of Symi (FDT over booking.com)

As expected, the computation of zoom-in points with count information is more expensive than without: in 1 sec we can compute the zoom-in points of around 240,000 results (i.e. |A| = 240,000) with count information, while without count information we can compute the zoom-in points of around 540,000 results.

Applications. Examples of applications of faceted metadata-search include: e-commerce (e.g. eBay, shown in Figure 2.3, or Amazon⁵), library and bibliographic portals (e.g. DBLP, ACM Digital Library), booking applications (e.g. booking.com⁶, as shown in Figure 2.2), museum portals (e.g. Hyvönen et al. (2005) and Europeana⁷), mobile phone browsers (e.g. Karlson et al. (2006)), specialized search engines and portals (e.g. Mäkelä et al. (2005); Yee et al. (2003)), Semantic Web (e.g. Hildebrand et al. (2006); Mäkelä et al. (2006)), general purpose WSEs (e.g. Mitos, Papadakos et al. (2009a)) and collaborative environments (e.g. mSpace, Schraefel et al. (2003)). Moreover, as shown in Papadakos et al. (2012a), this interaction scheme can act complementarily to the query-and-response dialogue of current WSEs, along with available dynamic metadata mined through clustering techniques (Kopidaki et al. (2009)) or entity mining (Fafalios et al. (2012a, 2013); Kitsos et al. (2013); Fafalios and Tzitzikas (2013)).

Figure 2.3: Checking Olympus Cameras (FDT over eBay)

⁵ http://www.amazon.com

⁶ http://www.booking.com

As an application example, Figure 2.4 shows a screenshot of a WSE that supports FDT exploration. Specifically, it shows the screen after the user submitted the query java. In that screenshot, 4 different facets are shown, each corresponding to one metadata attribute (at the left bar). The values (zoom-points or terms) of two of these facets (By date and By clustering) are hierarchically organized, while the values of the other two facets (By filetype and By domain) are flat (no hierarchical organization). The results of the current focus appear at the right frame. For more on this application the reader can refer to Papadakos et al. (2012a), while the real-time snippet-based results clustering algorithm employed is described in Kopidaki et al. (2009).

Figure 2.4: FDT-based GUI of the Mitos WSE

⁷ http://www.europeana.eu


2.2 Preference Management in General

Preferences are a central part of our everyday lives and lead human decision making. Commonly, preferences are not hard constraints but wishes, simple or complicated ones (covering one or more aspects), which might or might not be satisfied. Such wishes might be independent, or might affect each other, even in conflicting ways (Stefanidis et al. (2011a)).

Preferences are a multi-disciplinary topic and have been studied in a number of fields. Such fields include Philosophy (Hansson (2001)), social sciences like Psychology (Scherer (2005)) and Economics (Fishburn (1999)), and Decision Making (Lichtenstein and Slovic (2006)). Furthermore, they have been thoroughly studied in a number of Computer Science areas, specifically in Artificial Intelligence (AI) (Wellman and Doyle (1991)), Human Computer Interaction (HCI) (Linden et al. (1997)), and especially in Information Systems (ISs), such as databases (Kießling (2002); Chomicki (2003)), XML (Kießling et al. (2001)), and OLAP (Golfarelli et al. (2011)).

A survey on representation, composition and application of preferences in DBs is given in Stefanidis et al. (2011a), while a survey of major questions and approaches for preference handling in applications such as recommender systems, personal assistant agents and personalized user interfaces is given in Peintner et al. (2008). Pu and Chen (2008) propose guidelines and report examples for product search and recommender systems. Figure 2.5 shows some distinctions of preference management approaches from various perspectives, which are discussed below.

2.2.1 Various Perspectives of Preference Management

We can identify the following perspectives of preference management⁸:

Subject of Personalization. In general, a user can express preferences regarding the informational contents of an application, the visualization of the contents, the services that the user has access to at any time, or the interaction between the user and the application.

⁸ This categorization is based on the survey of Stefanidis et al. (2011a).

Preference Formulation. Preferences can be defined either using a qualitative approach, as in Kießling (2002); Chomicki (2003) and Georgiadis et al. (2008), or a quantitative approach, as in Agrawal and Wimmers (2000); Balke and Güntzer (2004) and Koutrika and Ioannidis (2005). According to the qualitative approach, preferences are described directly, using a preference relation $\succ_{Pref}$ (i.e. $x \succ_{Pref} y$). Preference relations may be specified using logical formulas (Chomicki (2003)), or by using special preference constructors (Kießling (2002)). In the quantitative approach, preferences are described indirectly by defining scoring functions (i.e. $Score(x) > Score(y)$). Scores may be assigned through preference functions (Agrawal and Wimmers (2000)) or through degrees of interest under specific satisfied conditions (Koutrika and Ioannidis (2004)). The qualitative approach is more powerful and expressive than the quantitative approach, since not every preference can be modeled using scoring functions, according to Chomicki (2003) and Fishburn (1970). There are also approaches that support a mixture of qualitative and quantitative preferences (Rossi et al. (2008)). This can be done by putting together a CP-net (Conditional Preference Network)⁹ and a set of constraints.
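The contrast between the two formulations can be sketched with a toy example. The hotels, the Pareto-style preference relation and the weights of the scoring function below are all hypothetical and serve only to illustrate the distinction; they are not taken from any of the cited works.

```python
# Hypothetical hotels: (name, price per night, stars). Illustrative data only.
HOTELS = [("A", 60, 2), ("B", 90, 4), ("C", 95, 3), ("D", 60, 3)]

def prefers(x, y):
    """Qualitative preference: x is better than y if it is at least as good on
    both attributes (cheaper, more stars) and strictly better on one of them."""
    at_least_as_good = x[1] <= y[1] and x[2] >= y[2]
    strictly_better = x[1] < y[1] or x[2] > y[2]
    return at_least_as_good and strictly_better

def best(items):
    """Maximal elements: items to which no other item is preferred."""
    return [x for x in items if not any(prefers(y, x) for y in items)]

def score(x):
    """Quantitative preference: an ASSUMED weighted scoring function."""
    return 20 * x[2] - x[1]

best_names = [h[0] for h in best(HOTELS)]
ranked = [h[0] for h in sorted(HOTELS, key=score, reverse=True)]
print(best_names, ranked)
```

Under the qualitative relation, hotels B and D are both maximal and incomparable to each other, whereas any scoring function necessarily makes every pair of hotels comparable (here D ends up ranked above B). This is one way to see why a preference relation can express orderings that no single score can.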

Certainty of Preference. The above approaches can be further specialized depending on whether the expressed preferences are crisp or fuzzy (uncertain). Uncertainty expresses the level of confidence that a particular preference holds and can be modeled by using fuzzy set theory. Barrett and Salles (2006) review the literature on fuzzy preferences.

Sources of Preference. Preferences can be specified explicitly by the users (either through a query language that supports preferences (Kießling and Kostler (2002); Levandoski et al. (2010)), or through the mediation of an application that produces such queries (Kießling et al. (2011b))), or implicitly, by silently tracking user actions and monitoring user behaviour. The latter category includes works like Gadanho and Lhuillier (2007), Kelly and Teevan (2003) and Pound et al. (2011). In addition, preferences can be inferred based on the assumption that similar people like similar things. Such works include collaborative filtering systems (Schafer et al. (2001) and Rashid et al. (2002)). Machine learning has also been applied for learning preference value functions. For example, desJardins et al. (2006) and Wagstaff et al. (2010) present methods for learning preferences over sets of items, by taking as input a collection of positive examples.

Subject Information Space. Another criterion is the structure of the underlying information space: unstructured information (e.g. text), relational spaces, multi-dimensional spaces with hierarchically organized attribute domains, support for multi-valued attributes, etc.

Context. Preferences can hold unconditionally, in which case they are called context-free. On the other hand, contextual or conditional preferences hold only when specific conditions are met. Furthermore, contextual preferences can be divided into internal, when the context can be defined from information available in the data over which preferences are expressed, and external when it cannot. Computing context (e.g. network connectivity), user context (e.g. profile), physical context (e.g. temperature) and time are common types of external context (Chen and Kotz (2000)). A model for the propagation of user preferences through contexts is described in Ciaccia and Torlone (2011), while a model for expressing contextual preferences is described in Stefanidis et al. (2011b).

9 A CP-net is a directed graph G over attributes V, whose nodes are annotated with conditional preference tables for each attribute (Boutilier et al. (2004)), and uses conditional ceteris paribus (all else being equal) semantics.

Figure 2.5: Distinctions of Preference Management Approaches

Elasticity. Preferences can be exact or elastic. Exact preferences can either be satisfied or not, while elastic

should be satisfied as closely as possible. For example, Kießling (2002) captures elastic preferences using

the AROUND preference construct and distance functions.
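
As a rough illustration of an elastic preference, the sketch below ranks values by their distance from a desired target. It only mimics the idea of an AROUND-style preference in Python; it is not Preference SQL syntax, and the function name and the sample prices are assumptions made for this example.

```python
def around(target):
    """Return a distance function: a smaller distance means a more preferred value."""
    return lambda value: abs(value - target)

# Hypothetical prices; the user would like something around 16000.
prices = [9000, 15000, 21000, 50000]
distance = around(16000)
ranked = sorted(prices, key=distance)
```

An exact preference would accept only the value 16000 and reject everything else, while the elastic version still orders all candidates by how close they come to it.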

Complexity. Complexity describes the degree of detail and how specific a preference is. Generic or sim-

ple preferences express preferences over a single attribute of the entities of interest while a compound

preference combines a number of simple preferences.

Completeness. The description of user preferences usually is incomplete, since it is inconvenient for

users to express preferences over all pairs of objects in the domain of interest (Stefanidis et al. (2011a)).

In such cases, the lack of a preference relation between two objects can be interpreted as equal preference (i.e. the objects are equivalent), as incomparability (i.e. the objects cannot be compared), or as a gap in our preference knowledge, which can be avoided by defining a preorder that extends the given partial order (Ross (2007)).

Semantics. Preferences can use two different semantics: ceteris paribus semantics and totalitarian semantics. Ceteris paribus is Latin for “all else being equal”. For example, the preference “I prefer a square table over a round table” holds only when all other attributes, like size, wood, etc., are the same. On the other hand, totalitarian semantics means that if “I prefer an object o1 over o2” for a specific attribute, then I do not prefer o2 over o1 for any other attribute (Pareto semantics).

Stability. Furthermore, it is difficult to assume that user preferences are stable, so frameworks that capture preferences should not assume that they are fixed. Users change their preferences even while inspecting available choices. For example Doyle (2004) shows how easily preferences change over time.

Chomicki (2007) proposes an incremental preference revision framework, where the revised preference

relation is produced by composing the original preference relations with another preference relation, by

using preference composition methods like union, prioritized and Pareto composition. Elaborating even

more, in this thesis we show that expressing user preferences is time-consuming and results in incomplete preferences when the user does not have the ability to view and explore the existing choices.

We have named this hypothesis the Difficulty of Formulating Effective Preferences without Knowing the

Options (DiFEPreKO) hypothesis which we evaluate in Section 6.3, through a user study.

Granularity. Preferences can be expressed at different levels of granularity. For example in databases,

preferences can be expressed over individual tuples, sets of tuples (i.e. where preferences do not depend

only on individual tuple values but also on properties of groups of tuples like in Brafman et al. (2006);

Binshtok et al. (2007) and Zhang and Chomicki (2011)), attributes (Georgiadis et al. (2008)), relationships (i.e. preferences expressed over relationships between two types of entities) (Koutrika and Ioannidis (2004)), relations (i.e. preferences expressed on classes of entities) and finally facts (i.e. preferences on the space of hierarchy attributes) (Golfarelli et al. (2011)). In the FDT world, preferences can be expressed over objects, zoom-points and facets. Most of the available works focus only on objects. Recently, there have been works that also affect facets, like Dash et al. (2008), Wagner et al. (2011) and Pound et al. (2011), which will be presented later in Section 2.6.

2.3 FDT and Preferences: Motivation

One main thesis of this work is that effective preference specification presupposes knowledge of the in-

formation space and of the available choices. FDT-based interaction can aid users in getting acquainted

with the information space and the available choices. Therefore FDT can aid preference elicitation even

if instead of the preference actions proposed in this proposal, the other well known approaches (e.g.

Preference SQL described in Kießling and Kostler (2002) and Kießling et al. (2011a) or FlexPref described in

Levandoski et al. (2010)) are employed for expressing user preferences and/or deriving the correspond-

ing object ranking. The computation and display of zoom points reduces the need for specifying complex

preference profiles and users can explore the available choices (or the most preferred) non linearly. For

instance, by clicking on the zoom points the user can inspect the available choices based on the desired

values. Without effective exploration services the user is obliged to explore linearly blocks of objects

and the derivation of small blocks (equivalently many blocks) requires rich preference specifications

which are cumbersome to acquire. Consider a set of attributes and suppose the user selects one zoom point of the first attribute. The FDT approach will then show only those values of the remaining attributes that are “active”, i.e. that still lead to a non-empty result. Such browsing can aid users in identifying, for each attribute, those values for which it is worth specifying a complex value tradeoff (e.g. by using a quantitative approach).
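
The computation of active values can be sketched as follows. This is a simplified illustration over flat, single-valued attributes; the objects and the function name are hypothetical, and hierarchies or set-valued attributes are not handled.

```python
from collections import Counter

# Hypothetical objects with flat, single-valued attributes.
cars = [
    {"Manufacturer": "Porsche", "Fuel": "Petrol", "Color": "Black"},
    {"Manufacturer": "Alfa Romeo", "Fuel": "Petrol", "Color": "Red"},
    {"Manufacturer": "Toyota", "Fuel": "Hybrid", "Color": "Red"},
]

def active_zoom_points(objects, selection):
    """Count, per remaining facet, the values that still lead to a non-empty result."""
    focus = [o for o in objects if all(o[f] == v for f, v in selection.items())]
    counts = {}
    for obj in focus:
        for facet, value in obj.items():
            if facet not in selection:
                counts.setdefault(facet, Counter())[value] += 1
    return counts

# The user clicks the zoom point Color = Red.
zoom = active_zoom_points(cars, {"Color": "Red"})
```

After the selection, Porsche no longer appears among the Manufacturer values, so the user is never offered a choice that would lead to an empty result.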

On a multi-dimensional space where user preferences for each dimension have been specified, the

efficient set (else called skyline, or Pareto optimal set) is indeed very useful if the user is interested in one

hit (e.g. one car to buy, one hotel to book). In the FDT approach, and with the actions that we propose (specifically with term-scoped actions), the most preferred values of each dimension are shown as zoom points in decreasing order of preference. We know that all objects that have these values (i.e. those objects that have at least one of the most preferred values of an attribute) are certainly part of the skyline. So the preference-extended FDT interaction inherently provides partial skyline support. However, to compute the entire skyline we need to apply a skyline algorithm (e.g. Kossmann et al. (2002); Papadias et al. (2005)), so skyline computation can be considered a helpful complementary service. The computed skyline can then be explored using the FDT method.
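
For illustration, a naive quadratic skyline computation is sketched below. It is not one of the cited algorithms, which are considerably more efficient; the data and the convention that smaller values are better are assumptions of this example.

```python
def dominates(a, b):
    """a dominates b if it is at least as good everywhere and strictly better somewhere.
    Assumption: tuples hold numeric criteria where smaller is better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (price, mileage) pairs.
cars = [(50000, 54000), (15000, 76000), (20000, 80000), (30000, 60000)]
result = skyline(cars)
```

In this example the cheapest car and the car with the lowest mileage are in the skyline, as is the trade-off (30000, 60000), which is best on neither dimension and is therefore found only by the full computation.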

2.4 The Database World

For applying user preferences over relational data, many different methods have been proposed in the literature. The most widely used are skylines (i.e. return the objects in a database that are not dominated by any other object) (Börzsönyi et al. (2001); Kossmann et al. (2002); Chomicki et al. (2003); Papadias et al. (2005)) and top-K (i.e. score each object using a monotonic ranking function and return the K objects with the highest scores) (Chaudhuri and Gravano (1999); Chang and Hwang (2002); Ilyas et al. (2004a,b)). Other methods include k-dominance (i.e. consider only k-dimensional subspaces for dominance) (Chan et al. (2006a)), k-frequency (i.e. rank each object based on how often it is returned in the skyline when different numbers of dimensions are considered) (Chan et al. (2006b)), top-k dominance (i.e. rank objects based on how many other objects they dominate and return the k objects with the highest score) (Yiu and Mamoulis (2007)), k-representative dominance (i.e. select k skyline points so that the number of points dominated by at least one of these k skyline points is maximized) (Lin et al. (2007)), hybrid multi-objective methods (i.e. compute sets of objects that are non-dominated with respect to a set of monotonic objective functions) (Balke and Güntzer (2004)), ranked skylines (i.e. adapt to user-specific information needs and identify the skyline results for a user-specified retrieval size k) (Lee et al. (2009)), distance-based dominance (i.e. a definition of representative skyline that minimizes the distance between a non-representative skyline point and its nearest representative) (Tao et al. (2009)), and lastly ϵ-skylines (i.e. the number of skyline results can be increased or decreased, a built-in rank is provided for all objects, and weights can be assigned to different dimensions) (Xia et al. (2008)). Finally, user satisfaction can be further improved by increasing the diversity of the results, as in desJardins and Wagstaff (2005) and Vee et al. (2009).
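
As an example of how such methods differ from a plain skyline, the following sketch scores each object by the number of objects it dominates and keeps the k best, in the spirit of top-k dominance. It is a naive quadratic illustration with hypothetical data, not the algorithm of the cited work.

```python
def dominates(a, b):
    """Assumption: smaller is better on every criterion."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def top_k_dominating(points, k):
    """Score each point by how many others it dominates and return the k best."""
    scored = [(sum(dominates(p, q) for q in points), p) for p in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical hotels described by (price, distance to the beach).
hotels = [(100, 3), (80, 5), (120, 4), (90, 2), (130, 6)]
best = top_k_dominating(hotels, 2)
```

Unlike the skyline, whose size cannot be controlled, this ranking always returns exactly k objects (when at least k exist) together with a score that explains their position.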

2.5 IR and Preferences

Preference management and personalization in IR has been approached from various perspectives. The

initial step for personalizing IR systems was query reformulation through explicit relevance feedback

(Rocchio (1971); Choi et al. (2001); Bot and Wu (2004)), or pseudo-relevance feedback (Kelly and Belkin

(2001); Kelly and Teevan (2003)), which is implicit feedback inferred from user behavior (i.e. selection

of a document, time the document is open, etc). The approaches for personalization and information

filtering can be roughly classified into two categories: content-based filtering and collaborative filtering.

In the first approach, the documents are monitored and the system pushes to the user those that best match the user profile. The user can provide explicit relevance feedback, updating his profile using

different retrieval models, like Boolean, VSM, probabilistic models (Robertson and Jones (1976); Yu et al.

(2004); Zigoris and Zhang (2006); Zhang and Koren (2007)), retrieval models that rank objects based on

user-defined reference points (Korfhage (1997)), inference networks (Callan (1996)), language models

(Croft and Lafferty (2003)), user feedback to improve preference learning (Cohen et al. (1999)) and machine learning algorithms for learning ranking functions (Lewis (2001); Yang et al. (2005); Shawe-Taylor et al. (2002); Burges et al. (2005); Zhai and Lafferty (2006); Zha et al. (2006); Liu (2011)). In the latter approach, the system takes advantage of the profiles and preferences of other similar users, in addition to document content. Memory-based (utilize the entire user-item database to generate a prediction) and

model-based (provide item recommendation by first developing a model of user ratings) approaches

have been proposed (Breese et al. (1998); Delgado and Ishii (1999); Herlocker et al. (1999); Hofmann and

Puzicha (1999); Jin et al. (2004); Konstan et al. (1997); White et al. (2010)). Other approaches (Basu et al.

(1998); Melville et al. (2001); Wang et al. (2006); Pitkow et al. (2002)) try to combine both techniques, to

provide an effective recommendation system.
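
To make the notion of query reformulation through explicit relevance feedback more concrete, a minimal sketch of the classic Rocchio update over sparse term vectors is given below. The weights alpha, beta and gamma, the helper names and the toy vectors are assumptions chosen for the example; practical systems usually also clip negative weights, which is omitted here.

```python
from collections import defaultdict

def centroid(docs):
    """Average term-weight vector of a list of documents (dicts term -> weight)."""
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {t: w / len(docs) for t, w in total.items()} if docs else {}

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.5, gamma=0.25):
    """Move the query towards relevant documents and away from non-relevant ones."""
    pos, neg = centroid(relevant), centroid(non_relevant)
    terms = set(query) | set(pos) | set(neg)
    return {
        t: alpha * query.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        for t in terms
    }

new_query = rocchio(
    {"car": 1.0},
    relevant=[{"car": 1.0, "hybrid": 1.0}, {"hybrid": 1.0}],
    non_relevant=[{"jeep": 1.0}],
)
```

The reformulated query gains the term hybrid from the documents marked as relevant and assigns a negative weight to jeep.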

A very recent and interesting approach is described in Ruotsalo et al. (2013a). This work presents the

design and study of interactive user modeling, where the user model’s features are keywords, and aims to

support exploratory tasks. Specifically this work allows the users to perceive the state of a user model

at all times and provide feedback that directly rewards and penalizes. In addition the users can continu-

ously tune the system’s belief about their information needs. Feedback is provided by drag-&-dropping

keywords from available documents into the exploratory view. Keywords near the center of the exploratory view are more important than keywords near the edges. Figure 2.6 shows a snapshot of SciNet, a prototype implementing the above functionality. The results show that interactive user modeling can help users find relevant, novel and diverse results more effectively, without compromising task execution time. In Ruotsalo et al. (2013b), the same authors introduce interactive intent modeling, where the user directs exploratory search by providing feedback on estimates of search intents. Estimates are visualized in an Intent Radar, where relevant intents are close to the center and similar intents have similar angles.

Figure 2.6: SciNet Prototype

Such approaches, with the exception of Ruotsalo et al. (2013a), which also affects keywords, affect only object ranking and do not exploit available metadata (which could be mined statically or dynamically as proposed in Papadakos et al. (2012a); Kitsos et al. (2013)). With respect to our proposal they can be considered complementary techniques that are based solely on the textual content of the objects. In addition, the proposed model can incorporate IR-like rankings by exploiting a Relevance facet, which corresponds to the score returned by the WSE. Furthermore, they do not engage users (again with the exception of Ruotsalo et al. (2013a)) in using the available personalization techniques during the search process.

2.6 FDT and Preferences: Past and Related Works

Supporting personalization in FDT is not well studied. Most FDT systems, like Flamenco (Hearst et al.

(2002)), output facets and zoom-points in lexicographical order. An alternative is to order facets and

zoom-points based on the number of indexed documents as in Oren et al. (2006). Some other systems

like eBay Express10 (now merged into the main eBay portal) present only a manually chosen subset of

facets to the users, and the zoom-points are again ranked based on the number of indexed documents.

Manually selecting and maintaining a number of preferred facets can be time consuming, especially for

systems that support a great number of facets and zoom-points. In addition in systems like eBay or

Amazon, users are able to order the available objects according to simple object ordering operations

over one specific attribute (e.g. order objects according to Price, or Price + Shipping, or Duration

of auction in ascending or descending order).

Set-Cover Ranking

One of the first approaches for facet ranking is described in Dakka et al. (2005). This work aims at providing automatic and scalable methods for the creation of multifaceted interfaces. In addition, it provides methods for selecting the best portions of the generated hierarchies (considering the limitations of screen space). In particular, the authors introduce two approaches for facet ranking. The first, set-cover ranking, tries to maximize the number of indexed objects that are accessible from the top-k facets. The second, merit-based ranking, takes into consideration the structural properties of the sub-hierarchies under the selected facets (i.e. the structure of zoom-points), and ranks higher the facets that enable users to access their contents with the smallest cost on average.
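
The intuition behind set-cover ranking can be sketched with the standard greedy approximation for maximum coverage. This is only an illustration of the objective described above, not necessarily the procedure used by Dakka et al. (2005); the facets and object ids are hypothetical.

```python
def greedy_set_cover(facets, k):
    """Pick up to k facets, each time taking the one that adds the most uncovered objects."""
    covered, chosen = set(), []
    remaining = dict(facets)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda f: len(remaining[f] - covered))
        if not remaining[best] - covered:
            break  # no remaining facet adds anything new
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered

# Hypothetical facets mapped to the ids of the objects they index.
facets = {
    "Manufacturer": {1, 2, 3, 4},
    "Fuel": {3, 4, 5},
    "Color": {1, 2},
    "Location": {5, 6},
}
chosen, covered = greedy_set_cover(facets, 2)
```

Note that the second pick is Location rather than Fuel: although Fuel indexes more objects, most of them are already reachable through Manufacturer.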

Interestingness Ranking

Another approach described in Dash et al. (2008), tries to select the list of facets that will be dis-

played to the user following a query, a problem called facet selection problem. In this method the notion

of interestingness is incorporated into the ranking. Each facet is measured based on how surprising it

is, by aggregating the “interestingness” of its values given a certain expectation. They define three different ways for setting the expectations. The first is the natural one, where the users assume a natural distribution in the data-set (i.e. documents uniformly distributed along each facet, or facets that are independent). The second is the navigational one, where they assume that the user is already familiar with the repository and the expectation is set based on how the user navigates the results. Finally, there is an ad-hoc way, where the user sets the expectation based on an arbitrary query. However, in this approach, users cannot explicitly define their preferences over facets and zoom-points and cannot affect the ordering of the objects.

10 http://www.ebay.com

Collaborative Approaches

A collaborative filtering method with explicit user ratings to design a personalized FDT system is pro-

posed in Koren et al. (2008), where several algorithms are proposed and evaluated. They propose a gen-

eral probabilistic framework to build faceted document models and user relevance models. Users express

a preference for retrieved documents, and facet-value pairs are ranked according to their probability

of being included in a document relevant to the user. Their objective is to minimize user cost, which

is defined as the time needed for reaching an item of interest. The time is an aggregation of the times

for reading facet headings, browsing facet hierarchies and correcting browsing mistakes. Moreover, the

authors provide an evaluation methodology for personalized faceted search research, in order to com-

plement user studies by being cheap, repeatable, and controllable. In contrast to our work, this work

does not allow the user to express any facet and zoom-point preferences. Furthermore, it assumes that

each user is searching for exactly one document, and that the user has perfect knowledge of the target

document. This can be the case only for focalized search, but not for exploratory search, which is our

point of interest.

A number of collaborative approaches for the personalization of faceted search and visual graph nav-

igation in Semantic Web data, by content filtering based on (manually or automatically) created ontolo-

gies are proposed in Tvarožek (2006); Tvarožek and Bieliková (2007a,c,d,b); Tvarožek et al. (2008). These

approaches take advantage of metadata stored in an ontology to create new facet descriptions at runtime. The set of available facets and restrictions adapts to the in-session user behavior and to the long-term characteristics of the user and of other users stored in the user model. According to these approaches, relevance

to users is measured by calculating the distance between values in the hierarchical ontology. In addi-

tion they annotate search results to improve user orientation and guidance. Again, it is a collaborative

approach and there is no support for explicit preferences.

Minimum Effort and Cost Approaches

Minimum-effort driven navigational techniques for enterprise databases and warehouses are de-

scribed in Roy et al. (2008). At each step of the navigation, the system asks the user one or more questions

regarding different facets. Then according to the user response, it dynamically fetches the next most

promising set of facets. For example, in a cars database, a very simple faceted search interface is one where the user is prompted with an attribute (e.g. Manufacturer), to which he responds with a desired value (e.g. Honda), after which the next appropriate attribute (e.g. Model) is suggested, to which he responds with a desired value (e.g. Accord). The proposed approach is based on building minimal cost decision trees, which is an NP-Complete problem. As a result, they use a simple approximation algorithm. This algorithm selects facets based on their ability to rapidly drill down to the most promising tuples as well as the user’s ability to provide desired values for them. In addition, in Roy and Das (2009) the same authors investigate

opportunities to improve the performance of minimum effort driven faceted search techniques. The

main idea is motivated by the early stopping techniques used in the TA-family of algorithms for top-K

computations. In comparison to the proposed approaches in this thesis, this work does not allow users

to express preferences, and it only concerns which facets will be displayed.

In the same manner but for zoom-points, Kashyap et al. (2010) propose a cost-based system for

faceted navigation, named FACeTOR. The user is presented with a subset of all possible facet conditions

(zoom-points), which are selected based on a probabilistic cost model of user navigation. This approach

guarantees that the overall navigation cost is minimized and every result is guaranteed to be reachable

by a facet condition. Since the selection of the optimal facet conditions is NP-Hard, they present two intuitive heuristics. The first is inspired by an approximation algorithm for the weighted set cover problem and attempts to find a relatively small set of suggestions that have a high probability of being recognized by users. The second heuristic greedily selects each facet condition assuming that all future suggestions

have identical properties. This automatic approach only concerns zoom-points and does not allow users

to express preferences.

Semantically Enriching Tweets

Abel et al. (2011) present an adaptive and personalized faceted search engine for Twitter. They pro-

pose strategies for inferring facets and facet-values (entities and topics) from tweets and related external

Web resources, by semantically enriching tweets. Given the semantically enriched tweets, they propose

user and context modeling strategies that identify (current) interests of a given Twitter user and allow

for contextualizing the demands of this user. As a result they propose faceted search strategies for con-

tent exploration on Twitter and methods that adapt to the interests and context of a user, by ranking

the facets and facet-values. Finally, they present an evaluation environment based on simulated users

to evaluate different strategies in this adaptive faceted search engine on Twitter. All the above functionality is offered automatically, and as a result the user cannot explicitly express his preferences over facets, values and objects or define his context.

Log Based Utility

Pound et al. (2011) model the user faceted-search behavior using the intersection of web query-logs

with existing structured data, in order to capture facet and facet-value utility for a specific query. They

present an automated, scalable solution that elicits user preferences on attributes and values. They propose different disambiguation techniques ranging from simple keyword matching to more sophisticated

probabilistic models (based on clustering, logs or clicks) for mapping keywords to different possible

attributes. Furthermore, they present a variety of techniques that deal with disambiguating amongst

different overlapping attribute-value pairs per query (table or context dependent value selection). In

addition they discuss how to use signals from the data, like entropy and sparseness to discover which

attributes make better facets. As a result, facets and values are ordered according to available log infor-

mation and users are not allowed to explicitly express their preferences for their specific information

need.
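
The two data signals mentioned above can be computed as in the following sketch. How Pound et al. (2011) define and combine them is not detailed here, so the definitions below (Shannon entropy of the value distribution, and sparseness as the fraction of objects with no value) and the sample data are assumptions.

```python
import math
from collections import Counter

def facet_entropy(values):
    """Shannon entropy (in bits) of a facet's value distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def sparseness(values):
    """Fraction of objects that have no value (None) for the facet."""
    return sum(v is None for v in values) / len(values)

uniform = facet_entropy(["Red", "Black", "Blue", "White"])
skewed = facet_entropy(["Red", "Red", "Red", "Black"])
missing = sparseness(["Red", None, None, "Black"])
```

A facet whose values split the objects evenly has a higher entropy than one dominated by a single value, and a facet that is empty for half of the objects has a sparseness of 0.5.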

Intuition Based Ranking

All of the above approaches assume a precise information need. That is, relevance, interestingness,

and user costs (for fulfilling an information need) have been employed for measuring facet importance.

On the other hand, Wagner et al. (2011) provide a browsing-oriented approach (i.e. the user has a fuzzy

information need and slowly explores an unknown collection of items) for facet ranking. They use an

aggregation function over different intuitions and metrics for facet ranking. In particular they prefer

facets that allow users to modify the result set via small and uniform facet operations. In addition they

group facets and their values by using a divisive hierarchical clustering algorithm, leading to an Extended Facet Tree. Finally, they provide a task-based evaluation of their system regarding effectiveness and efficiency. Compared to our approach, this approach tries to rank facets and facet values according to the characteristics of the space of facets and facet values, but does not provide explicit user preferences or ranking of objects according to preference. On the other hand, this is the first method that

targets exploratory search and shows that ranking of facet and facet values can be effective for explora-

tory information needs.

Preference Search

Finally, Kießling et al. (2011b) propose the substitution of Faceted Search, which they consider as a

tedious and time consuming trial and error process, with Preference Search. Preference Search replaces

lengthy user sessions by one single user request, where the user completes a search mask. The user

input is then automatically compiled into one single Preference SQL query. This query is afterwards

augmented in a context-sensitive and user-adaptive fashion by a recommender component using sensors and friends’ recommendations from a social network. The system then presents the BMO (Best Matches Only) objects to the user. Excluding the recommender system, the above functionality can be easily implemented using our proposed method, by letting the user express his preferences over the related facets. The system could then return the top objects for each facet for which the user has defined a preference (i.e. the Pareto optimal set). Furthermore, our method is more expressive, since it allows preferences over attributes with

hierarchically organized values which are possibly set-valued. The support of hierarchies can make the

expression of preferences less time consuming, more intuitive and with less cognitive load. One further

note is that Kießling et al. (2011b) assume that Faceted Search can return empty results, which is not true

in our case, since only categories that lead to non-empty results are displayed. Specifically, our hypothesis is that FDT can aid exploratory search by letting the user progressively express his information needs. In addition, since preferences are incomplete and, most importantly, change over time, the proposed Preference Search method, with its single user request, can be successful for focalized search but not for exploratory environments.

2.7 Motivation and Running Example

Let us first motivate the benefits of FDT for decision making over our running example. Consider an in-

ternational dealer of used cars and suppose that the available cars are stored in a relational table of the

form: Car(id, Manufacturer, Model, Category, Price, Color, Power, Volume, Year, Mileage, Fuel,

Location, Comment, Accessories). An instance of the table is shown below:

Id   Manufacturer  Model        Category  Price  Color  Power  Volume  Year  Mileage  Fuel    Location   Comment    Accessories
o1   Porsche       Carrera 911  Cabrio    50000  Black  350    3600    2005  54000    Petrol  Cefalonia  Uncrashed  {ABS,AT}
o2   Alfa Romeo    164          Sedan     15000  Red    180    3000    1995  76000    Petrol  Heraklion  Crashed    {ABS}
...  ...           ...          ...       ...    ...    ...    ...     ...   ...      ...     ...        ...        ...

In addition there are three taxonomies that have been designed in order to provide a hierarchical organization for the values of the attributes Manufacturer, Fuel, and Location. The leaves of these taxonomies form the domains of the corresponding attributes, which are recorded in the tuples of the relational table11. Specifically, assume the taxonomies shown in Figure 2.7.

Figure 2.7: Example Taxonomies

Example 1 Assume that somebody, call him James, wants to change his car. He is interested in a family car, although he preferred sport cars when he was younger. His wife prefers Jeeps but he is reluctant due to the extra parking space required and because the garage of his home is somewhat small. He believes that Japanese and German cars are more reliable than French or Korean cars. He likes the fact that Hybrid cars consume less, are more ecological and that the annual taxes are lower for such cars. James lives in the city of Heraklion, so cars owned by persons that do not live on the island of Crete (where Heraklion is located) are less preferred for him (due to the traveling time and cost) unless the case is exceptional. In addition he cannot afford an expensive car. Ideally he would like a Porsche with four doors (e.g. Porsche Panamera) and enough space for luggage, hybrid with consumption less than 6lt/100Km, bigger than Panamera (to satisfy his wife) but smaller than Cayenne, with less than 10 thousand kilometers, on sale by his favorite neighbor and at a very good price (e.g. less than 30K Euros), but this is a utopian desire. James aims at buying one car, but it is probable that he would buy a “Porsche Carrera 4s” if available at a very good price, and another decent but inexpensive family car to satisfy the remaining requirements.

11 Our model also allows tuples that contain values which are not necessarily leaves of the corresponding taxonomy.

Although lengthy, the above description is by no means complete. There are a lot of other aspects

that would determine James’ final decision (years of guarantee, grip, airbags, Euro NCAP stars, color,

trunk, GPS, CD player, trip calculator, sunroof, etc). What we want to stress with this example is that the

specification of preferences is a laborious, cumbersome and time consuming task, and that the resulting

descriptions are in most cases incomplete. Pragmatically, decision making is based on complex trade-

offs that involve several (certain or uncertain) attributes as well as user’s attitude towards risk (Keeney

and Raiffa (1976)). Moreover preferences are not stable over time.

We believe that it is beneficial to provide users with an interaction method in which the preference

specification cost is paid gradually and depends on the available choices. For example, why spend time expressing complex tradeoffs between Porsche models with 4 doors versus those with 2 doors if no Porsche is on sale? Therefore an effective interaction that shows users the available choices is important for reducing the preference specification cost and for speeding up decision making12.

In brief, the proposed preference specification actions affect the presentation order of:

• facets, i.e. the order by which facets (i.e. criteria, attributes) appear,

• terms, i.e. the order by which the zoom-in/side points (i.e. criteria values, attribute values) appear (which can be hierarchically organized and/or set-valued), and

• objects (of the focus), i.e. the order by which the objects (i.e. choices) appear.

Now suppose a user who (a) likes European cars, (b) does not like Italian cars, (c) likes Ferrari, and (d) prefers low prices. According to the framework that we propose, the user can express the above preferences straightforwardly, i.e. without having to refer to particular European countries or Italian manufacturers for expressing (a) and (b), thanks to the hierarchically organized values and preference inheritance. Furthermore, he does not have to express all the above in one shot. He can provide them gradually and in any order, say (b)-(a)-(d)-(c), and there is no need to define priorities for resolving the conflicts (e.g. the fact that he likes Ferrari but he does not like Italian cars). The priority will be deduced automatically by a scope-based conflict resolution rule. For instance, the scope of (b), i.e. Italian cars, is contained in the scope of (a), which is the set of European cars, so (b) prevails on Italian cars. Analogously (c) prevails on Ferrari cars (despite the fact that Ferrari is Italian).

12 See also Section 6.3 for a user-based evaluation of the DiFEPreKO hypothesis.


Moreover the user can express richer statements like (e) I prefer Asian to European cars,

and (f) I prefer Italian to Korean cars. From these two statements we can deduce that the user prefers

Fiat to Kia, and prefers Toyota to Peugeot. The above are examples of just some of the functionalities

offered by the proposed approach.

With respect to the characteristics described earlier in Section 2.2.1, this work focuses on multi-dimensional spaces with hierarchically organized attribute domains, and explicitly-specified and crisp qualitative user preferences. We assume that these preferences hold unconditionally (i.e. they are context-free), are exact (although we provide support for distance functions), and are simple (we assume that preference inheritance is not a compound preference, and we also provide prioritized and Pareto composition).

We also focus on the preference elicitation process. Preference elicitation refers to the problem of

developing a decision support system capable of generating recommendations to a user, thus assisting

him in decision making. It is important for such a system to model user’s preferences accurately, find

hidden preferences and avoid redundancy. A survey of preference elicitation methods is given in Chen

and Pu (2004) while a survey of preference elicitation from a computer scientist’s perspective is given in

Braziunas (2006). Most of the above methods focus on the quantitative approach, i.e. on the elicitation

of multi-criteria value functions. In this thesis we use the term real-time preference elicitation because according to our approach: (a) the system requires the user to express his preferences only for those facets/values that are involved in the available (and restricted) set of choices (i.e. not for the entire value space), and (b) we exploit the hierarchical organization of terms for reducing the number of preferences that have to be explicitly specified.

To conclude, and to the best of our knowledge, this is the first work that proposes an incremental preference elicitation mode which allows the user to define the desired preference structure gradually and flexibly, over attributes with hierarchically organized and possibly set-valued values, and which employs a scope-based conflict resolution rule.


Chapter 3

A Preference Framework for Multidimensional Information Spaces (Syntax, Semantics and Algorithms)

Contents

3.1 Syntax of the Language
3.2 The Domain of Semantics
3.3 Syntax to Semantics
    3.3.1 Flat Single-Valued Attributes
    3.3.2 Set-Valued Attributes
    3.3.3 Best/Worst Preferences over Hierarchically Organized Values
    3.3.4 Relative Preferences over Hierarchically Organized Values
    3.3.5 Preferences over Hierarchical Set-Valued Attributes
3.4 Multi-Facet Preferences
    3.4.1 Prioritized Composition
    3.4.2 Pareto Composition and BMO-set
    3.4.3 Combination of Priority and Pareto Compositions
3.5 A Complete Example


In this chapter we extend the interaction of FDT with user actions for preference specification /

elicitation. Specifically, we introduce a preference framework appropriate for information spaces com-

prising resources described by attributes whose values can be hierarchically valued and/or multi-valued.

We define the language, its semantics and the required algorithms. The framework supports preference

inheritance in the hierarchies, automatic conflict resolution, as well as preference composition (prioritization,

Pareto and their combination).

The chapter is organized as follows. Section 3.1 introduces the syntax for preference actions, Section 3.2 describes the domain of the semantics, and Section 3.3 defines the mapping from syntax to semantics for flat, hierarchical, single-valued and set-valued attributes. Finally, Section 3.4 describes the composition (prioritized and Pareto) of preference actions over multiple facets.

3.1 Syntax of the Language

Here we introduce a language consisting of statements that we call preference actions, which can be easily

enacted by simple input user actions (i.e. mouse selections). We consider an information space as the

one described in Section 2.7.

Specifically, each action has a scopeType and a spec, which consists of an anchor and a rankSpec. In more detail, the scopeType (either facets order, terms order, or objects order) determines which kind of elements it affects (facets, terms, or objects). Furthermore, each action is “anchored” (anchor) to one element which can be a facet, a term or even an object1. This anchor allows enacting the preference actions through the GUI straightforwardly, as we will see in Section 5.2.2 (i.e. if the user right-clicks on an element e, a pop-up window will show and allow the user to select the desired preference action, where the selected action will be anchored to e). Finally, each action is associated to a rankSpec (rank description) which can be lexicographic (for ordering strings lexicographically), count (for ordering elements based on the number of objects that are classified to them), value (for ordering numerically-valued facets) and indexedBy (for ordering objects according to the number of facets each object is indexed by)2. The language also defines actions for supporting best / worst (i.e. short-cuts to express

1 Such actions would be interesting for example in expressing positive or negative preference over a specific object.

2 In addition we could also support any other method suggested in the bibliography for automatically ranking facets and facet-values as discussed in Section 2.6.


preferred / non-preferred according to specific policies) elements (later on, we extend the language to also capture relative preferences, Prioritized and Pareto composition, intervals, etc.). Syntactically, preference actions are defined through the following grammar in a BNF variation:

⟨stmt⟩ ::= ⟨scopeType⟩⟨spec⟩

⟨scopeType⟩ ::= facets order : | terms order : | objects order :

⟨spec⟩ ::= ⟨anchor⟩⟨rankSpec⟩

⟨anchor⟩ ::= facet ⟨Fi⟩

| term ⟨tj⟩

| object ⟨ok⟩

| ϵ // the empty string

⟨rankSpec⟩ ::= {lexicographic | count | value | indexedBy} {min|max}

| best | worst

| use scoreFunction ⟨score()⟩ {min|max}

In the above grammar Fi, tj and ok, denote names that match a facet, a term or object respectively,

while score is the name of a real-valued function provided by the user or the application programmer

(e.g. around operator for proximity search, which can be the edit distance for categorical values, or

absolute value of distance for numerical values). Some examples follow:

(1) facets order: count max

(2) facets order: facet Manufacturer best

(3) terms order: facet Year value max

(4) objects order: term Location.Cefalonia best

(5) objects order: facet Relevance value max

(6) objects order: use scoreFunction Relevance * dist(Price,20K) max

Before explaining their semantics formally, let us first describe them informally. The 1st action specifies that the facets are presented in decreasing order with respect to their count information (i.e. max counts are preferred). The 2nd places the facet Manufacturer at the top of the facets list. The 3rd specifies that the terms of the facet Year are presented in decreasing order. The 4th places all objects classified (directly or indirectly) under the term Cefalonia at the top of the object ordering.

Now suppose a user who starts the car seeking process by formulating a free text query which the

WSE evaluates over the attribute comment of the database. In this case the user would like to see the

objects in decreasing order with respect to their relevance. The 5th action captures this requirement

where the facet called Relevance corresponds to the score returned by the WSE. Finally, the 6th action

orders the objects based on a function over the relevance score and distance from a given price.
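As an illustration only, the core grammar given above could be represented and parsed roughly as in the following Python sketch. The class PreferenceAction, the function parse_action, and the choice to keep a score function as an uninterpreted string prefixed with "score:" are assumptions made for this example; they are not taken from the implementation of the framework.

```python
import re
from dataclasses import dataclass
from typing import Optional

SCOPE_TYPES = ("facets order", "terms order", "objects order")
RANKERS = ("lexicographic", "count", "value", "indexedBy")


@dataclass(frozen=True)
class PreferenceAction:
    scope_type: str              # which kind of element gets ordered
    anchor_kind: Optional[str]   # "facet", "term", "object", or None (empty anchor)
    anchor_name: Optional[str]
    rank_spec: str               # e.g. "count", "best", "score:<expression>"
    direction: Optional[str]     # "min" / "max", or None for best / worst


def parse_action(text: str) -> PreferenceAction:
    """Parse one statement of the core grammar (hypothetical helper)."""
    scope, sep, rest = text.partition(":")
    scope, rest = scope.strip(), rest.strip()
    if not sep or scope not in SCOPE_TYPES:
        raise ValueError(f"unknown scopeType in {text!r}")
    kind = name = None
    anchored = re.match(r"(facet|term|object)\s+(\S+)\s+(.*)", rest)
    if anchored:
        kind, name, rest = anchored.groups()
    if rest in ("best", "worst"):
        return PreferenceAction(scope, kind, name, rest, None)
    # Score expressions may contain spaces, so match them as a whole.
    scored = re.fullmatch(r"use scoreFunction\s+(.+)\s+(min|max)", rest)
    if scored:
        return PreferenceAction(scope, kind, name,
                                "score:" + scored.group(1), scored.group(2))
    parts = rest.split()
    if len(parts) == 2 and parts[0] in RANKERS and parts[1] in ("min", "max"):
        return PreferenceAction(scope, kind, name, parts[0], parts[1])
    raise ValueError(f"cannot parse rankSpec in {text!r}")


examples = [
    "facets order: count max",
    "facets order: facet Manufacturer best",
    "terms order: facet Year value max",
    "objects order: term Location.Cefalonia best",
    "objects order: facet Relevance value max",
    "objects order: use scoreFunction Relevance * dist(Price,20K) max",
]
parsed = [parse_action(e) for e in examples]
```

The six strings are the examples (1) to (6) listed above; the sketch only checks their shape and does not evaluate any ordering.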

We now extend the syntax to support relative preferences over facets and terms, as shown before:

⟨stmt⟩ | facets order : prefer facet ⟨Fi⟩ to ⟨Fj⟩

⟨stmt⟩ | terms order : prefer term ⟨ti⟩ to ⟨tj⟩

⟨stmt⟩ | objects order : prefer term ⟨ti⟩ to ⟨tj⟩

Regarding object ranking, we extend the syntax to allow composition of preferences, which synthesizes two or more actions with complex preference constructors over the different facets. Such actions include Pareto, Pareto Optimal (i.e. same ordering as the skyline), Priority and Combinational composition. The syntax of such actions is given below:

⟨stmt⟩ | objects order : Pareto ⟨setOfFacets⟩

⟨stmt⟩ | objects order : ParetoOptimal ⟨setOfFacets⟩

⟨stmt⟩ | objects order : Priority ⟨orderedSetOfFacets⟩

⟨stmt⟩ | objects order : Combinational ⟨bucketOrderedSetOfFacets⟩

Below we introduce some possible extensions, although we do not focus on them. For example, the compositions described above presuppose a number of object scoped preference actions over each facet that participates in the composition. On the other hand, in the skyline3 operator of SQL, for each attribute participating in the skyline, a single preference is expressed along with the operator (i.e. SELECT * FROM Cars SKYLINE OF price MIN, consumption MIN).

3 In brief, the skylines as in Papadias et al. (2005) are the maximal (w.r.t. preference) elements, i.e. those which are not dominated by others. This set is also called efficient set, or Pareto optimal set.

⟨specList⟩ ::= ⟨Fi⟩ {LOW | HIGH} | ⟨specList⟩

⟨stmt⟩ | objects order : skylineOf ⟨specList⟩

Furthermore, we can extend the syntax to support interval-anchored actions and named actions (which ease the formulation of more complex preferences):

⟨anchor⟩ | term interval [ ⟨ti⟩⟨tj⟩]

⟨namedStmt⟩ ::= NamedAction ⟨String⟩ : ⟨stmt⟩

Notice that for interval-anchored actions, the two endpoints of the interval must be numerical values of the same facet Ff (i.e. ti, tj ∈ Tf) such that ti ≤ tj. Then with term interval [ ⟨ti⟩⟨tj⟩ ] we denote all tk ∈ Tf such that ti ≤ tk ≤ tj. In this case, we use as anchor of a preference action all available values between ti and tj. Such actions can be used as shortcuts and can be easily defined through simple menus.

The complete syntax of the language is given in Appendix A.

3.2 The Domain of Semantics

In general, a preference over a set of elements E can be expressed as a binary relation over the elements of E. In the described approach, we do not assume that preference relationships are transitive. So we hereafter assume that a preference relation is the binary relation (E, ≻) (sometimes we will also use its dual relation, denoted by ≺). The proposed approach can also be used if we consider transitivity over the preference relationships (as in Kießling (2002), i.e. a preference relation is a strict partial order (E, ≻)), except for set-valued attributes, since the MoreWins-Rule described later in Def. 3 is not transitive.

The actions specified by the syntax allow structuring (ordering) the materialized faceted taxonomy

according to the preferences. Independent of how many actions have been issued and what their seman-

tics are, the defined preference at each point in time, comprises k+2 preference relations. Specifically:


• One over the facets: ({F1, . . . , Fk},≻F ),

• k preference relations, one for each facet Fi (of the form (Ti,≻i)), and

• one preference relation for the objects (A,≻Obj).

Let B be the set of actions the user has issued. We can partition this set into k + 2 subsets (where k is the number of facets) as follows: BF holds the user actions for facets, BTi holds the user actions for the terms of each facet Fi, and BObj holds the user actions regarding the objects’ preferences. So $B = B_F \cup (\bigcup_i B_{T_i}) \cup B_{Obj}$. As each of these sets can contain more than one action, we have to specify how the corresponding preference relation is defined, e.g. how the actions in BTi define the preference relation (Ti, ≻i).

Let us now introduce some required notions about preference relations. Consider a set E = {Porsche, Ferrari, Fiat} and a preference relation R≻ over E consisting of one relationship, specifically R≻ = {Porsche ≻ Ferrari}. We shall use dom(R≻) to denote the elements of E that participate in R≻, here dom(R≻) = {Porsche, Ferrari}, and call inactive the elements of E which are not members of dom(R≻), in our case Fiat. Given a preference relation R≻, with R≺ we will denote its dual order. Commonly, preference relations are illustrated using Hasse diagrams. In our case (E, R≻) can be illustrated as shown in Figure 3.1. Given a preference relation R≻ and two objects o1, o2, with o1 ≻ o2 we will denote that o1 is preferred to o2, and with o1 ≺ o2 the reverse.

Figure 3.1: Hasse Diagram of Preference Relation Over E (E,R≻)

Definition 1 (Valid Preference) We consider a preference relation R≻ over a set of elements E to be

valid, if it is acyclic. □
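A minimal sketch of how the validity condition of Definition 1 could be checked is given below. It assumes that a preference relation is encoded as a set of (better, worse) pairs; the function name is_valid_preference is hypothetical, and the check simply peels off elements that nothing is preferred to until no such element remains.

```python
from collections import defaultdict


def is_valid_preference(relation):
    """Return True if the (better, worse) pairs contain no cycle."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for better, worse in relation:
        nodes.update((better, worse))
        if worse not in successors[better]:
            successors[better].add(worse)
            indegree[worse] += 1
    # Repeatedly remove elements that no remaining element is preferred to.
    frontier = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                frontier.append(nxt)
    # Elements on a cycle are never removed.
    return removed == len(nodes)


acyclic = {("Porsche", "Ferrari")}
cyclic = {("Ferrari", "Fiat"), ("Fiat", "Ferrari")}
```

Here is_valid_preference(acyclic) holds while is_valid_preference(cyclic) does not, since the second relation states that each of the two elements is preferred to the other.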

Given a set of objects Obj, a bucket order B on Obj (with |Obj| items) is the total order L defined over the |B| sets B1, ..., B|B|, where the |B| buckets are a partition4 of Obj. For any two items oi and oj in Obj, if they are in the same bucket, there is no preference precedence between oi and oj, and these two items are said to be “tied”. If item oi belongs to Bk and item oj belongs to Bl, we say that oi is preferred to oj if and only if Bk precedes Bl according to the total order L. A total order on Obj can be viewed as a special case of a bucket order such that every bucket consists of only one item.

4 All blocks are pairwise disjoint.

Definition 2 We say that a linear or bucket order L over E respects a binary relation R over E, if R ⊆ L.
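The condition of Definition 2 can be sketched in a few lines, under the assumption that a bucket order is encoded as a list of disjoint sets (most preferred bucket first) and a relation as a set of (better, worse) pairs. The name respects is hypothetical.

```python
def respects(buckets, relation):
    """True if every pair (a, b) of the relation has a in an earlier bucket than b."""
    position = {item: i for i, bucket in enumerate(buckets) for item in bucket}
    return all(
        a in position and b in position and position[a] < position[b]
        for a, b in relation
    )


order = [{"Porsche"}, {"Ferrari"}, {"Fiat", "Lancia"}]
relation = {("Porsche", "Ferrari"), ("Ferrari", "Fiat")}
```

With these values respects(order, relation) holds; adding the pair ("Fiat", "Lancia") would make it fail, because the two elements are tied in the same bucket.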

3.3 Syntax to Semantics

To describe formally the semantics of the syntax we have to define what the various keywords of the

syntax, like count, mean precisely.

Initially, note that the semantics of lexicographic, count, value, and indexedBy are straightforward, and each defines a linear or bucket order. The same is true also for use scoreFunction. Note however that count is not applicable to objects, while indexedBy is only valid for objects. For a term t, t.count is the number of objects in A indexed by term t or by a narrower term of t. For a facet Fi, Fi.count is the number of the elements in A which are indexed by terms of Fi. For example, consider the example of Figure 2.1(a) where we have only one facet, say Manufacturer. At that point we have Manufacturer.count = 8 while for the term Italian we have Italian.count = 3. In the restriction on the set A = {4, 5, 6} that is shown in Figure 2.1(b), we have Manufacturer.count = 3 while Italian is not shown, since Italian.count = 0. Formally, and using the notations of Table 2.1, we have t.count = |I(t) ∩ A| and Fi.count = |J(Fi) ∩ A| where J(Fi) = {o ∈ Obj | D(o) ∩ Ti ≠ ∅}. The semantics of best/worst(ei) and prefer ei to ej actions are defined in an aggregated way (i.e. not in isolation) and are clarified next.
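To make the two count formulas concrete, the following sketch computes them over a small taxonomy. The taxonomy BROADER and the object descriptions D are hypothetical data invented for this illustration (they are not the contents of Figure 2.1), chosen only so that the resulting counts mirror the numbers quoted above.

```python
# Hypothetical data: child term -> broader term, and object id -> D(o).
BROADER = {"Italian": "European", "German": "European",
           "Ferrari": "Italian", "Fiat": "Italian",
           "BMW": "German", "Audi": "German"}
TERMS = set(BROADER) | set(BROADER.values())   # the terms of the single facet
D = {1: {"Ferrari"}, 2: {"Fiat"}, 3: {"Fiat"}, 4: {"BMW"},
     5: {"BMW"}, 6: {"Audi"}, 7: {"Audi"}, 8: {"German"}}


def narrower_or_equal(term):
    """The term itself plus all of its (transitively) narrower terms."""
    result, frontier = {term}, [term]
    while frontier:
        current = frontier.pop()
        for child, parent in BROADER.items():
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


def term_count(term, focus):
    """t.count = |I(t) ∩ A|: objects of the focus indexed by t or a narrower term."""
    family = narrower_or_equal(term)
    return len({o for o in focus if D[o] & family})


def facet_count(facet_terms, focus):
    """Fi.count = |J(Fi) ∩ A|: objects of the focus indexed by some term of the facet."""
    return len({o for o in focus if D[o] & facet_terms})


everything = set(D)
restricted = {4, 5, 6}
```

On this toy data the facet count is 8 and the count of Italian is 3 over the full set, while over the restricted focus {4, 5, 6} they become 3 and 0 respectively.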

3.3.1 Flat Single-Valued Attributes

We will now define the semantics of actions that express qualitative preferences, i.e. actions of the form

best(ei), worst(ei), and prefer ei to ej , starting from the case of single-valued and flat attributes.

Let B (resp. W) be the elements of E on which a best (resp. worst) action has been defined. Let R≻ be the relative preferences (of the form ei ≻ ej) over E provided by the user. We shall now introduce an algorithm, Alg. Apply, which takes as input these sets and derives one linear or bucket order. The algorithm also takes a parameter Policy which determines the ordering of the inactive elements (this will be explained later on).

Algorithm 1 Apply(E, B, W, R≻, Policy)
Input: the set of elements E, the set of best elements B over E, the set of worst elements W over E, a set of relative relationships R≻ over E, and Policy for inactive elements
Output: a bucket order L over E that respects R
1: Rbw ← {(b, w) | b ∈ B, w ∈ W} // each best is preferred to each worst
2: R ← Rbw ∪ R≻ // add relative prefs
3: L ← SourceRemoval(R) // produce blocks with boundaries
4: I ← E \ (B ∪ W ∪ dom(R≻)) // I contains the inactive elements
5: L′ ← addInactiveElements(L, I, Policy)
6: return L′

Algorithm 2 SourceRemoval(R)
Input: a binary relation R over E
Output: a bucket order L over E that respects R
1: L ← ⟨⟩
2: repeat
3:   S ← maximal≻(R)
4:   R ← R \ {(x, y) ∈ R | x ∈ S} // Remove maximal
5:   L ← L.append(S) // Append a bucket to L
6: until S = ∅
7: return L

At first the algorithm constructs a graph by connecting each best to each worst element ((b, w) means b ≻ w). So best/worst are interpreted as “each best is preferred to each worst”. Then it adds to the graph the relationships in R≻. Furthermore, we should note here that the parameters B and W actually define a set of relationships (Rbw at line 1 of the algorithm), so they could have been expressed directly through the R≻ parameter; however, we keep them separate as they constitute an easily enacted (for the user) shorthand. In order to create Rbw this algorithm assumes that |B| ≥ 1 and |W| ≥ 1. If this is not the case, we can use different policies.

Although a linear or bucket order could be produced by traversing the graph in a breadth first search (BFS) manner (where the first block will contain the more preferred elements, the second the next more preferred, etc), if the transitive reduction is a DAG (Directed Acyclic Graph, i.e. not a tree), then BFS could yield wrong results (i.e. the produced linear or bucket order would not respect the condition of Definition 2). This will be made clear in a following example. Using instead of BFS a topological sorting algorithm, which yields a linear ordering of the nodes of a DAG such that each node comes before all nodes to which it has outbound edges, e.g. Alg. SourceRemoval as shown above, we can always get a linear or bucket order that respects R. In particular, Alg. SourceRemoval is based on the source removal algorithm described in Kahn (1962), satisfying the condition that all removed maximal nodes are inserted in the same bucket. Initially, it finds all the maximal elements of R, moves them into a bucket, and continues with the maximal elements of their children, and so on.
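The two algorithms can be sketched in Python as follows. This is an illustrative reading of Alg. Apply and Alg. SourceRemoval, not the implementation used in this work: relations are assumed to be sets of (better, worse) pairs, buckets are sets, and the names source_removal and apply_preferences are hypothetical. Unlike the pseudocode, the sketch tracks the remaining nodes explicitly, so that the minimal elements are also emitted as a final bucket, and it raises an error when the relation is cyclic. As in the pseudocode, it assumes that B and W are both non-empty whenever either is used.

```python
def source_removal(relation):
    """Layered topological sort: each round removes all current maximal elements."""
    remaining = set(relation)
    nodes = {x for pair in remaining for x in pair}
    buckets = []
    while nodes:
        dominated = {worse for _, worse in remaining}
        sources = nodes - dominated            # elements nothing is preferred to
        if not sources:
            raise ValueError("the preference relation contains a cycle")
        buckets.append(sources)
        nodes -= sources
        remaining = {(a, b) for a, b in remaining if a not in sources}
    return buckets


def apply_preferences(elements, best, worst, relative, policy="minimal"):
    """Sketch of Alg. Apply; policy places the inactive elements first or last."""
    r_bw = {(b, w) for b in best for w in worst}       # each best beats each worst
    relation = r_bw | set(relative)
    buckets = source_removal(relation)
    active = set(best) | set(worst) | {x for pair in relative for x in pair}
    inactive = set(elements) - active
    if inactive:
        if policy == "maximal":
            buckets = [inactive] + buckets
        else:
            buckets = buckets + [inactive]
    return buckets


cars = {"Porsche", "Ferrari", "Fiat", "Lancia", "Toyota"}
result = apply_preferences(
    cars,
    best={"Ferrari"},
    worst={"Fiat", "Lancia"},
    relative={("Porsche", "Ferrari"), ("Porsche", "Fiat")},
)
```

In this usage Toyota takes part in no action, so with the default policy it ends up alone in the last bucket, after Porsche, Ferrari and the tied pair Fiat and Lancia.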

Figure 3.2: Example for Flat Single-Valued Attributes

To give an example, let B = {Ferrari}, W = {Fiat, Lancia} and R≻ = {Porsche ≻ Ferrari, Porsche ≻ Fiat}. Figure 3.2 shows at the left the diagram of R, and at the right the result of topological sorting (as derived by line 3 of Apply), i.e. L = ⟨Porsche, Ferrari, {Fiat, Lancia}⟩, meaning that the bucket order consists of three blocks (the first two are singletons).

Figure 3.3: Example for a DAG

Consider another example, where R≻ = {Porsche ≻ Ferrari, Ferrari ≻ Lancia, Lancia ≻ Fiat, Porsche ≻ Fiat}. Figure 3.3 shows the resulting total order (i.e. L = ⟨Porsche, Ferrari, Lancia, Fiat⟩) derived by topological sorting. If the final order was derived using BFS, the bucket order would be LBFS = ⟨Porsche, {Ferrari, Fiat}, Lancia⟩, although Fiat is the least preferred car. As a result R ⊈ LBFS (i.e. LBFS does not respect R).
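The failure of BFS on this example can be reproduced with a short sketch. The functions bfs_buckets and violations are hypothetical helpers written for this illustration; BFS here starts from the elements that nothing is preferred to and places every element at the level where it is first reached.

```python
def bfs_buckets(relation):
    """Group elements by BFS level, starting from the maximal elements."""
    nodes = {x for pair in relation for x in pair}
    dominated = {worse for _, worse in relation}
    frontier = nodes - dominated
    seen = set(frontier)
    buckets = []
    while frontier:
        buckets.append(frontier)
        # An element is placed at the first level at which it is reached.
        nxt = {b for a, b in relation if a in frontier and b not in seen}
        seen |= nxt
        frontier = nxt
    return buckets


def violations(buckets, relation):
    """Pairs (a, b) of the relation for which a is not in an earlier bucket than b."""
    position = {x: i for i, bucket in enumerate(buckets) for x in bucket}
    return {(a, b) for a, b in relation if not position[a] < position[b]}


relation = {("Porsche", "Ferrari"), ("Ferrari", "Lancia"),
            ("Lancia", "Fiat"), ("Porsche", "Fiat")}
l_bfs = bfs_buckets(relation)
broken = violations(l_bfs, relation)
```

Fiat is reached directly from Porsche and is therefore placed in the second bucket, before Lancia, so the pair (Lancia, Fiat) is the one reported in broken.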

Regarding inactive elements (elements not participating in any action), they can be considered as maximal or minimal elements according to the application needs (controlled by parameter Policy of Alg. Apply). For example consider a facet Fi with values from a set Ti, and a number of actions that define the sets Bi, Wi, R≻i. By using E = Ti and calling Alg. Apply, in line 4 we compute the set of inactive elements I = E \ (Bi ∪ Wi ∪ dom(R≻i)) (where dom(R≻i) is the set of elements of E that participate in R≻i). Now by using the command addInactiveElements (line 5 of Alg. Apply) and passing as parameters the bucket order L, the inactive elements I and the policy based on the application needs, which can be maximal (resp. minimal), we put the inactive elements at the beginning (resp. end) of L as a new block, obtaining L′.

As a final note, our approach assumes totalitarian semantics regarding the attributes that do not

participate in any preference action. For example, if Ferrari ≻ Fiat, then any car manufactured by

Ferrari is preferred to any car manufactured by Fiat. In the opposite case, (i.e. if our approach sup-

ported the ceteris paribus semantics), a Ferrari would be preferred to a Fiat car, provided that these

cars agreed regarding preference on the values of all other attributes.

3.3.2 Set-Valued Attributes

Multi-valued attributes appear in several cases (social tags, clustering, etc). In our running example

suppose that the attribute accessories of the table Car is multi-valued, taking values like ABS, ESP

(Electronic Stability Program), AT (Auto-Transmission), DVD, etc.

Definition 3 (Induced Preference over Sets: MoreWins-Rule)

If s, s′ are two subsets of E, with wins(s, s′) we will denote the number of “times” s beats s′ according to ≻. Formally:

wins(s, s′) = |{(e, e′) | e ∈ s, e′ ∈ s′, e ≻ e′}|

Any subset S of the powerset of E (i.e. S ⊆ P(E)) can be ordered according to a preference relation that we will denote by ≻{}, defined by the following rule:

s ≻{} s′ iff wins(s, s′) > wins(s′, s)
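Definition 3 translates almost literally into code. The sketch below assumes that the element-level relation ≻ is given as a set of (better, worse) pairs; the function names wins and more_wins are chosen for this illustration, and the elements a, b, c are abstract placeholders.

```python
def wins(s, s_prime, relation):
    """Number of pairs (e, e') with e in s, e' in s_prime and e preferred to e'."""
    return sum(1 for e in s for e2 in s_prime if (e, e2) in relation)


def more_wins(s, s_prime, relation):
    """The MoreWins-Rule: s is preferred to s_prime iff it wins more often."""
    return wins(s, s_prime, relation) > wins(s_prime, s, relation)


# A relation that is not transitively closed: a > b and b > c, but no a > c.
relation = {("a", "b"), ("b", "c")}
a_beats_b = more_wins({"a"}, {"b"}, relation)
b_beats_c = more_wins({"b"}, {"c"}, relation)
a_beats_c = more_wins({"a"}, {"c"}, relation)
```

On this relation {a} is preferred to {b} and {b} to {c}, yet {a} is not preferred to {c}, which gives one simple instance in which the induced relation over sets is not transitive.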


As an example consider a set T = {ABS, ESP, AT, DVD} and three statements which define ABS as best, ESP as worst and that ABS ≻ AT. Now consider the following family of sets: S = {{ABS}, {ESP}, {ABS, ESP}, {AT, ABS}, {AT, ESP}, {DVD, ESP}}. The wins(s, s′)/wins(s′, s) values of all pairs of sets from the above family are shown in the next table (the last column shows the number of clear winnings - not ties).

By using Def. 3 (i.e. ≻{}) and then applying topological sorting we get the following bucket order

⟨{ABS}, {AT,ABS}, {ABS,ESP}, {{AT,ESP}, {DVD,ESP}}, {ESP}⟩, as shown in Fig. 3.4.

w(s, s′)/w(s′, s) | {ABS} | {ESP} | {ABS,ESP} | {AT,ABS} | {AT,ESP} | {DVD,ESP} | all
{ABS} | 0/0 | 1/0 | 1/0 | 1/0 | 2/0 | 2/0 | 5/0
{ESP} | 0/1 | 0/0 | 0/1 | 0/2 | 0/1 | 0/1 | 0/5
{ABS,ESP} | 0/1 | 1/0 | 1/1 | 1/2 | 2/1 | 2/1 | 3/2
{AT,ABS} | 0/1 | 2/0 | 2/1 | 1/1 | 3/0 | 3/0 | 4/1
{AT,ESP} | 0/2 | 1/0 | 1/2 | 0/3 | 1/1 | 1/1 | 1/3
{DVD,ESP} | 0/2 | 1/0 | 1/2 | 0/3 | 1/1 | 1/1 | 1/3

Figure 3.4: Example for Flat Multi-Valued Attributes

Now suppose that both ABS and ESP are defined as best elements, and that both AT and DVD are defined as worst. In that case it holds:

wins({ABS}, {ABS, ESP}) = wins({ABS, ESP}, {ABS}) = 0

wins({AT}, {AT, DVD}) = wins({AT, DVD}, {AT}) = 0

This means that with wins we get 0 whenever sets with best only elements are compared, and whenever sets with worst only elements are compared. If we would like to break such ties we could adopt a MoreGoodLessBad-rule (the more best elements the better and the fewer worst elements the better). To define it formally, we first have to introduce some notations. Given an element e we use sup(e) to denote the number of elements that e dominates, minus 1. Formally, sup(e) = |{e′ ∈ E | e ≻ e′}| − 1. Notice that each worst element takes a negative value. Given a set of values s we define the support of s, denoted by Support(s), by summing up the support of its terms, i.e. $Support(s) = \sum_{e \in s} sup(e)$. Note that since a worst value takes −1 we can discriminate an s having one worst term from an s′ having 10 worst terms (Support(s) = −1, while Support(s′) = −10). We can now proceed and define:

Definition 4 (Breaking ties: MoreGoodLessBad-rule)

If wins(s, s′) = wins(s′, s) and Support(s) > Support(s′) then s ≻{} s′. □

In our case: Support({ABS, ESP}) = 2 > Support({ABS}) = 1 > Support({AT}) = −1 > Support({AT, DVD}) = −2, and the induced ordering, i.e. ⟨{ABS, ESP}, {ABS}, {AT}, {AT, DVD}⟩, satisfies the MoreGoodLessBad-rule.
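The support values of this example can be recomputed with the following sketch, which combines the MoreWins-Rule with the tie-breaking rule of Definition 4. It assumes that ≻ over the accessories consists exactly of the best/worst pairs (each of ABS, ESP is preferred to each of AT, DVD); the function names are hypothetical.

```python
def wins(s, s_prime, relation):
    return sum(1 for e in s for e2 in s_prime if (e, e2) in relation)


def sup(e, universe, relation):
    """Number of elements that e dominates, minus 1."""
    return sum(1 for other in universe if (e, other) in relation) - 1


def support(s, universe, relation):
    return sum(sup(e, universe, relation) for e in s)


def prefers(s, s_prime, universe, relation):
    """MoreWins-Rule, with ties broken by the MoreGoodLessBad-rule."""
    w, w_rev = wins(s, s_prime, relation), wins(s_prime, s, relation)
    if w != w_rev:
        return w > w_rev
    return support(s, universe, relation) > support(s_prime, universe, relation)


E = {"ABS", "ESP", "AT", "DVD"}
best, worst = {"ABS", "ESP"}, {"AT", "DVD"}
R = {(b, w) for b in best for w in worst}

family = [frozenset({"AT", "DVD"}), frozenset({"ABS"}),
          frozenset({"ABS", "ESP"}), frozenset({"AT"})]
ranked = sorted(family, key=lambda s: support(s, E, R), reverse=True)
```

Sorting the four sets by decreasing support yields the ordering stated above, and prefers({ABS, ESP}, {ABS}) holds only thanks to the tie-break, since both wins values are 0.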

To conclude, in case we have preferences over atomic values but the information space has set-valued attributes, then it is enough to use Alg. Apply with a small modification. Initially, we follow the first two steps of Alg. Apply, in order to compute the relation ≻ of the atomic values. We should stress here that to correctly compute wins we have to take into account the transitive closure of the preference relation. For example, if a ≻ b and b ≻ c and we want to compute wins({a, e}, {c, e}) we should consider that a ≻ c. In other words, we should anticipate the topological sorting of Apply over individual values before computing wins over sets. Then we compute the wins (and the Support to break ties), to define ≻{}. Afterwards we continue with the next steps of Alg. Apply, i.e. topological sorting and so on, eventually yielding the final bucket order of the sets. The steps are given in more detail in Alg. 3.
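The effect of the transitive closure on wins can be checked on the small example with a, b, c and e mentioned above. The sketch assumes a relation encoded as a set of (better, worse) pairs; transitive_closure is a naive fixpoint computation written for illustration, not an optimized algorithm.

```python
def transitive_closure(relation):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(relation)
    while True:
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def wins(s, s_prime, relation):
    return sum(1 for e in s for e2 in s_prime if (e, e2) in relation)


base = {("a", "b"), ("b", "c")}
closed = transitive_closure(base)
without_closure = wins({"a", "e"}, {"c", "e"}, base)
with_closure = wins({"a", "e"}, {"c", "e"}, closed)
```

Without the closure the two sets tie with zero wins each; with the closure the induced pair (a, c) gives the first set one win.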

3.3.3 Best/Worst Preferences over Hierarchically Organized Values

Algorithm 3 ApplyOverFamiliesOfSets(E, B, W, R≻, Policy)
Input: the set of elements E (here each element of E is a set), the set of best elements B, the set of worst elements W, a set of relative relationships R≻, and Policy for inactive elements
Output: a bucket order L over E
1: Rbw ← {(b, w) | b ∈ B, w ∈ W}
2: R ← Rbw ∪ R≻
3: R ← Closure_transitivity(R) // Addition of the transitively induced links
4: for each e, e′ ∈ E, s.t. e ≠ e′ do
5:   if wins(e, e′) > wins(e′, e) then
6:     set e ≻{} e′
7:   else if wins(e, e′) < wins(e′, e) then
8:     set e′ ≻{} e
9:   else if wins(e, e′) = wins(e′, e) then
10:    resolve the tie by computing the support(e) and support(e′)
11: L ← SourceRemoval(≻{})
12: I ← E \ dom(≻{}) // I is the set of inactive elements
13: L′ ← addInactiveElements(L, I, Policy)
14: return L′

So far we have considered single-valued and set-valued attributes over flat (non hierarchically organized) value domains. Let us now consider hierarchically organized values. As an example, if the user is interested in “Italian” cars and marks them as “best” then it is reasonable to apply “best” also to its narrower terms, i.e. to Ferrari, Fiat, etc. It is not hard to see that the approach described in the previous section is not adequate for terms which are not leaves. For example suppose the following set of actions (using an informal syntax): B = {Best(European), Worst(Italian), Best(Ferrari)}, which define the sets B = {European, Ferrari}, W = {Italian}. If we apply Alg. Apply without taking into account the taxonomy we would get the bucket order shown in Figure 3.5 which does not make much sense, nor helps us to derive the intended ordering of cars.

Figure 3.5: Example of Preferences Without Exploiting Hierarchies

It follows that without proper exploitation of the subsumption relation, the user would have to issue

a high number of actions, all anchored to leaf terms. To tackle this problem, below we introduce a form

of preference inheritance where preferences are “inherited” to the narrower terms. Let b be an action in

B. We shall use scope(b) to denote the scope of the action b, which is the set of elements (either facets,

terms, or objects) that are affected by this action. To capture inheritance we will redefine the scope of

actions which are anchored to terms of a taxonomy.


Definition 5 (Scope and Inheritance) Let b be an action b = ⟨e, rs⟩ where e is its anchor and rs the other part of the action. The scope of b is defined as:

$scope(b) = scope(\langle e, rs \rangle) = \bigcup_{e' \in N^*(e)} scope(\langle e', rs \rangle)$

where N∗(e) stands for e and the narrower elements of e, formally N∗(e) = {e} ∪ N+(e) = {e′ | e′ ≤ e}.

In other words, the scope of b is the union of the scopes of the actions obtained by replacing the

anchor e with a narrower term of e. Table 3.1 defines exactly the scope for each action, while the scopes

of our example according to Def. 5, are shown in the first two columns of Table 3.2.

Table 3.1: Scopes (Direct and Under Inheritance)

scopeType | anchor | (D)irect scope | (I)nherited scope
terms order | facet Fi | Ti | Ti
terms order | term tj | {tj} | N∗(tj)
objects order | term tj | I(tj) | I(tj)

Table 3.2: Scopes: Example for Best/Worst Preferences

action | scope | active scope
b1: Best(Europe) | {European, German, Audi, BMW, Porsche, French, Citroen, Peugeot, Italian, Lancia, Ferrari, Fiat, Lamborghini} | scope(b1) \ scope(b2)
b2: Worst(Italian) | {Italian, Lancia, Ferrari, Fiat, Lamborghini} | scope(b2) \ scope(b3)
b3: Best(Ferrari) | {Ferrari} | scope(b3)

Note that the set of actions B = {b1, b2, b3} defines a valid preference, i.e. no cycles are formed (recall Def. 1). However, if we “unfold” each b ∈ B, based on its scope, then we will get a B′ that does not define a valid preference, e.g. Ferrari will be both best and worst and this forms a cycle. To tackle this problem, and to provide an intuitive interpretation of the user’s actions, we will introduce what we call active scope, after first introducing some required definitions.

Definition 6 We say that an action b is equally or more refined than an action b′, denoted by b ⊑ b′, if

scope(b) ⊆ scope(b′). □


In this way a preorder (reflexive and transitive) relation over B, denoted by (B,⊑), is defined. In the case of our example, the Hasse diagram of ⊑ is shown in Figure 3.6.

Figure 3.6: Hasse Diagram of Actions Refinement

The objective is to use (B,⊑) for resolving the conflicts incurred due to inheritance. This can be

done by assuming that more specific preferences prevail over less specific ones (specificity). Particularly,

we introduce the following rule:

Definition 7 (Scope-based Dominance Rule)

If A ⊆ scope(b) ⊆ scope(b′) then b′ is dominated by b on A, and thus action b′ should not determine the ordering of A. □

We can now define the active scope of each action, by excluding from its scope the scopes of its direct children with respect to ⊑. Specifically, we can define active scope as:

Definition 8 (Active Scope)

If C(b) denotes the direct children of b with respect to ⊑, then the active scope of b, denoted by aScope(b), is defined as:

aScope(b) = scope(b) \ (⋃_{b′ ∈ C(b)} scope(b′)) □

In our example, the active scopes are shown in Table 3.2. From these we obtain B = aScope(b1) ∪ aScope(b3) and W = aScope(b2), which define a valid preference. Specifically, Alg. Apply will return (assuming inactive elements go at the end) the following bucket order:

⟨ {European, Ferrari, German, Audi, BMW, Porsche, French, Citroen, Peugeot},

{Lancia, Fiat, Lamborghini},

{Asian, Japanese, Toyota, Korean, Kia, American, U.S.A., Chrysler} ⟩
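The computation of Defs. 5, 6 and 8 for this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the taxonomy fragment below is an assumption inferred from the scopes listed in Table 3.2, and all names (PARENT, narrower_or_equal, direct_children, active_scope) are hypothetical.

```python
# Child -> parent links of an ASSUMED tree-structured taxonomy fragment,
# inferred from the scopes of Table 3.2 (hypothetical encoding).
PARENT = {
    "German": "European", "French": "European", "Italian": "European",
    "Audi": "German", "BMW": "German", "Porsche": "German",
    "Citroen": "French", "Peugeot": "French",
    "Lancia": "Italian", "Ferrari": "Italian", "Fiat": "Italian",
    "Lamborghini": "Italian",
}


def narrower_or_equal(term):
    """N*(term): the term itself plus every narrower term."""
    result = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in PARENT.items():
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


# Term-scoped actions as (kind, anchor); their scope is N*(anchor) (Def. 5).
ACTIONS = {
    "b1": ("Best", "European"),
    "b2": ("Worst", "Italian"),
    "b3": ("Best", "Ferrari"),
}
SCOPE = {name: narrower_or_equal(anchor) for name, (_, anchor) in ACTIONS.items()}


def direct_children(b):
    """C(b): strictly more refined actions with no action in between (Def. 6)."""
    below = [c for c in ACTIONS if c != b and SCOPE[c] < SCOPE[b]]
    return [c for c in below
            if not any(SCOPE[c] < SCOPE[m] for m in below if m != c)]


def active_scope(b):
    """aScope(b): scope(b) minus the scopes of the direct children of b (Def. 8)."""
    covered = set()
    for child in direct_children(b):
        covered |= SCOPE[child]
    return SCOPE[b] - covered


ACTIVE = {b: active_scope(b) for b in ACTIONS}
BEST = set().union(*(ACTIVE[b] for b, (kind, _) in ACTIONS.items() if kind == "Best"))
WORST = set().union(*(ACTIVE[b] for b, (kind, _) in ACTIONS.items() if kind == "Worst"))
```

Under these assumptions BEST evaluates to the nine terms of the first bucket above, and BEST and WORST are disjoint, so the expanded actions define a valid preference.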

Now, consider the same set of actions B but suppose that they are object-scoped instead of term-scoped, and assume that the table Cars contains the following tuples:


Id Manuf ...

P Porsche

L Lancia

Fi Fiat

Fe Ferrari

T Toyota

The (plain and active) scopes in this case are:

action scope active scope

b1: Best(Europe) {P, L, Fi, Fe} {P}

b2: Worst(Italian) {L, Fi, Fe} {L, Fi}

b3: Best(Ferrari) {Fe} {Fe}

The sets B and W of the active scopes are: B = {P, Fe} and W = {L, Fi}. With these parameters Alg. Apply will yield the ordering: ⟨{P, Fe}, {L, Fi}, {T}⟩.

The algorithm that supports inherited preferences and scope-based resolution of conflicts is Alg. PrefOrder. It starts by computing the scope of each action b ∈ B (line 2) using Def. 5, in order to compute the preorder relation (B,⊑) (line 3). Afterwards, it computes the active scopes using Def. 8 (line 5), and expands the original set of actions B to a new set of actions B′, by including the new actions computed from the active scopes (line 6). Then, it parses the new set of actions B′ in order to get B, W and R≻ (line 8). Finally, it calls Alg. Apply (line 9).

Algorithm 4 PrefOrder(E, B, Policy)
Input: the set of elements E, the set of actions B, and Policy for inactive elements
Output: a bucket order L over E
1: // Part (i): Computation of (B,⊑)
2: Compute the scopes of the actions in B
3: Form (B,⊑)
4: // Part (ii): Efficient Computation of Act. Scopes
5: Use (B,⊑) to compute the active scopes of the actions in B
6: Use the active scopes to expand the set B to a set B′
7: // Part (iii): Derivation of the final bucket order
8: (B, W, R≻) ← Parse(B′)
9: return Apply(E, B, W, R≻, Policy) // call to Alg. 1
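To make the three parts concrete, the following Python sketch replays the object-scoped example of the Cars table. It is a sketch under stated assumptions: the NARROWER sets encode an assumed taxonomy fragment, the function pref_order is a hypothetical name, and the last step is a simplified stand-in for Alg. Apply (Best block, then Worst block, inactive elements at the end) that ignores relative preferences.

```python
# Hypothetical encoding of the Cars table: object id -> manufacturer.
CARS = {"P": "Porsche", "L": "Lancia", "Fi": "Fiat", "Fe": "Ferrari", "T": "Toyota"}

# N*(anchor) for the three anchors, restricted to an ASSUMED taxonomy fragment.
NARROWER = {
    "European": {"European", "Italian", "Porsche", "Lancia", "Fiat", "Ferrari"},
    "Italian": {"Italian", "Lancia", "Fiat", "Ferrari"},
    "Ferrari": {"Ferrari"},
}

ACTIONS = [("Best", "European"), ("Worst", "Italian"), ("Best", "Ferrari")]


def pref_order(objects, actions):
    # Part (i): object scope of an action = objects whose value is in N*(anchor).
    scopes = [{o for o, value in objects.items() if value in NARROWER[anchor]}
              for _, anchor in actions]
    # Part (ii): active scope = scope minus the scopes of strictly more refined
    # actions (same result as subtracting only the direct children).
    active = []
    for i, scope in enumerate(scopes):
        refined = [s for j, s in enumerate(scopes) if j != i and s < scope]
        active.append(scope - set().union(*refined))
    # Part (iii): collect the Best and Worst elements; the rest are inactive.
    best = set().union(*(a for a, (kind, _) in zip(active, actions) if kind == "Best"))
    worst = set().union(*(a for a, (kind, _) in zip(active, actions) if kind == "Worst"))
    if best & worst:
        raise ValueError("the expanded actions do not define a valid preference")
    inactive = set(objects) - best - worst
    # Simplified stand-in for Alg. Apply: Best, then Worst, inactive elements last.
    return [bucket for bucket in (best, worst, inactive) if bucket]
```

Calling pref_order(CARS, ACTIONS) yields the three buckets {P, Fe}, {L, Fi} and {T}, in this order, which agrees with the ordering given for the object-scoped example.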

Let us discuss now a number of propositions.


Prop. 1 If B ∩W = ∅ and (T,≤) is a tree, then in the expanded (through active scopes) actions, a term

cannot be both Best and Worst. ⋄

Proof:

Since (T,≤) is a tree, for each term t there is exactly one path from t to the root of the tree. The term t will be in the active scope of the closest action, i.e. in the active scope of an action anchored on t, or on its father, or on the father of its father, and so on. Therefore it can only be in the active scope of an action anchored to the closest anchored term on this path. Since B ∩ W = ∅, that anchor can be either in B or in W (not both), therefore t cannot be both Best and Worst.

This means that the inheritance of preferences over tree-structured facets cannot create any ambiguity. However, if (T,≤) is a DAG (Directed Acyclic Graph), then Prop. 1 does not always hold: e.g. consider a term having two direct fathers, one defined as best and the other as worst. Such actions do not define a valid preference, and below we show how we can detect such cases. Let:

effAnchors(t) = minimal{ t′ | t ≤ t′ and t′ is anchor of one preference action}

Prop. 2 If B ∩W = ∅, then there is not any ambiguity about a term t iff the actions in effAnchors(t)

are all either Best or Worst. ⋄

Proof:

It is a straightforward consequence of the definitions that a term t will be in the active scopes of the actions anchored in the terms that belong to the set effAnchors(t) = minimal{ t′ | t ≤ t′ and t′ is anchor of one preference action}. If all such actions are Best (resp. Worst) statements, then t will be Best (resp. Worst) in the expanded statements. If however some of these actions are Best and some are Worst, then (since t will be in the active scopes of all of them) t will be both Best and Worst, and thus the expansion will create ambiguities (and hence an invalid preference).

Note that Prop. 1 is a special case of Prop. 2, since in trees for each term t it holds:

|effAnchors(t)| ≤ 1


Algorithmically we can check whether the actions defined over a DAG-structured facet create an ambiguity by checking the condition of Prop. 2 only for those terms which have more than one direct father.
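As an illustration of this check, the sketch below computes effAnchors(t) and tests the condition of Prop. 2 on a small DAG. The taxonomy (FATHERS), its terms and the function names are hypothetical and serve only this example; actions are given as a dictionary from anchor term to "Best" or "Worst", which presupposes B ∩ W = ∅.

```python
# Hypothetical DAG taxonomy: each term maps to its direct fathers (broader terms).
FATHERS = {
    "Hybrid": ["Electric", "Petrol"],
    "Electric": ["Vehicle"],
    "Petrol": ["Vehicle"],
    "Vehicle": [],
}


def broader_or_equal(term):
    """All t' with term <= t' (the term itself and every ancestor)."""
    seen = {term}
    stack = [term]
    while stack:
        for father in FATHERS[stack.pop()]:
            if father not in seen:
                seen.add(father)
                stack.append(father)
    return seen


def eff_anchors(term, actions):
    """Minimal anchored terms among the broader-or-equal terms of `term`."""
    candidates = broader_or_equal(term) & set(actions)
    # c is not minimal if another candidate o lies below it (o <= c).
    return {c for c in candidates
            if not any(o != c and c in broader_or_equal(o) for o in candidates)}


def is_ambiguous(term, actions):
    """Prop. 2: ambiguity iff the effective anchors carry both Best and Worst."""
    kinds = {actions[anchor] for anchor in eff_anchors(term, actions)}
    return kinds == {"Best", "Worst"}


def ambiguous_terms(actions):
    """Check only the terms that have more than one direct father."""
    return [t for t, fathers in FATHERS.items()
            if len(fathers) > 1 and is_ambiguous(t, actions)]
```

For instance, with the actions {"Electric": "Best", "Petrol": "Worst"} the term Hybrid is reported as ambiguous, whereas with {"Vehicle": "Worst", "Electric": "Best"} it is not, since Electric is then its only effective anchor.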

Prop. 3 Alg. PrefOrder respects the scope-based dominance rule (Def. 7). ⋄

Proof:

Suppose the opposite, i.e. suppose that ∃ b, b′ and A ⊆ Obj s.t. A ⊆ scope(b) ⊆ scope(b′)

and that PrefOrder orders the elements of A on the basis of action b′. This cannot be true,

since according to Def. 8, the active scope of b′ will not contain A. Notice that although in

the definition of active scopes (Def. 8) only the direct children are used, the scope (defined as

in Def. 5) is based on N∗(e) so it takes into account all children wrt ≤. For this reason it is

enough at Def. 8 (and actually more efficient at implementation level) to consider only the

direct children.

3.3.4 Relative Preferences over Hierarchically Organized Values

To complete the expressive power of the proposed actions, here we study the case of relative (qualitative) preferences over hierarchically organized values. Specifically, our objective is to support sets of preferences of the form:

(b1): Asian ≻ European

(b2): European ≻ Kia

(b3): BMW ≻ Asian

(b4): Kia ≻ Fiat

(b5): Toyota ≻ Kia

whose semantics take into account inheritance, and conflicts are resolved in an intuitive manner. To

this end we will define the scope and the expansion of such preferences.

Definition 9 (Scope of Relative Preferences)

The scope of a preference relationship ei ≻ ej , denoted by scope(ei ≻ ej), is defined as:

scope(ei ≻ ej) = (N∗(ei)×N∗(ej)) ∪ (N∗(ej)×N∗(ei))


Definition 10 (Expansion of Relative Preferences)

The expansion of a preference relationship ei ≻ ej , denoted by expansion(ei ≻ ej), is defined as:

expansion(ei ≻ ej) = {e′i ≻ e′j | e′i ∈ N∗(ei), e′j ∈ N∗(ej)}

This means that expansion(ei ≻ ej) actually “unfolds” the preference relationship ei ≻ ej on the basis of the subsumption relationships, while scope(ei ≻ ej) does not contain any preference relationship (it is used for resolving conflicts as we shall see below). The scope-based ordering of such actions is defined as before (Def. 6), i.e. b ⊑ b′ iff scope(b) ⊆ scope(b′). We can now define the active scope of a preference ei ≻ ej by excluding from its expansion all relationships e′i ≻ e′j such that (e′i, e′j) belongs to the scope of a child (w.r.t. ⊑) action.

Definition 11 (Active Scope of Relative Preferences)

The active scope of a preference action b, in the context of a set of preference actions B is defined as:

aScope(b) = {ei ≻ ej ∈ expansion(b) | ∄ b′ ∈ B s.t. b′ ≠ b, b′ ⊑ b and (ei, ej) ∈ scope(b′)}

which is equivalent to:

aScope(b) = expansion(b) \ (⋃_{b′ ⊑ b, b′ ≠ b} {ei ≻ ej | (ei, ej) ∈ scope(b′)})

Assume that the taxonomy of manufacturers has the form shown in Figure 3.7.

Figure 3.7: Taxonomy of Manufacturers


Then the scope-based ordering of preferences b1-b5 is that shown in Figure 3.8, while Table 3.3 shows

the scopes, expansion and active scopes of the actions.

Figure 3.8: Hasse Diagram of Scope-Based Ordering of Preference Actions

Table 3.3: Scopes: Example for Relative Preferences

preference: b1: Asian ≻ European
  expansion: Asian ≻ European, Asian ≻ BMW, Asian ≻ Fiat, Kia ≻ European, Kia ≻ BMW, Kia ≻ Fiat, Toyota ≻ European, Toyota ≻ BMW, Toyota ≻ Fiat, Lexus ≻ European, Lexus ≻ BMW, Lexus ≻ Fiat
  active scope: Asian ≻ European, Asian ≻ Fiat, Toyota ≻ European, Toyota ≻ Fiat, Lexus ≻ European, Lexus ≻ Fiat
preference: b2: European ≻ Kia
  expansion: European ≻ Kia, BMW ≻ Kia, Fiat ≻ Kia
  active scope: European ≻ Kia, BMW ≻ Kia
preference: b3: BMW ≻ Asian
  expansion: BMW ≻ Asian, BMW ≻ Kia, BMW ≻ Toyota, BMW ≻ Lexus
  active scope: BMW ≻ Asian, BMW ≻ Kia, BMW ≻ Toyota, BMW ≻ Lexus
preference: b4: Kia ≻ Fiat
  expansion: Kia ≻ Fiat
  active scope: Kia ≻ Fiat
preference: b5: Toyota ≻ Kia
  expansion: Toyota ≻ Kia
  active scope: Toyota ≻ Kia
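The entries of Table 3.3 can be recomputed from Defs. 9, 10 and 11 with a short Python sketch. It is given only as an illustration: the two-level taxonomy in CHILDREN is an assumption inferred from the expansions listed in the table (Figure 3.7 is not reproduced here), a pair (x, y) stands for x ≻ y, and refinement is taken strictly with respect to the action itself (b′ ≠ b).

```python
from itertools import product

# ASSUMED taxonomy, inferred from the expansions listed in Table 3.3.
CHILDREN = {"Asian": ["Kia", "Toyota", "Lexus"], "European": ["BMW", "Fiat"]}

# Relative preferences as (better, worse) pairs.
PREFS = {
    "b1": ("Asian", "European"),
    "b2": ("European", "Kia"),
    "b3": ("BMW", "Asian"),
    "b4": ("Kia", "Fiat"),
    "b5": ("Toyota", "Kia"),
}


def n_star(term):
    """N*(term): the term and all its narrower terms."""
    result = {term}
    for child in CHILDREN.get(term, []):
        result |= n_star(child)
    return result


def expansion(pref):
    """Def. 10: unfold better > worse over the narrower terms of both sides."""
    better, worse = pref
    return set(product(n_star(better), n_star(worse)))


def scope(pref):
    """Def. 9: the expanded pairs in both directions."""
    pairs = expansion(pref)
    return pairs | {(y, x) for x, y in pairs}


def active_scope(name):
    """Def. 11: drop expanded pairs covered by the scope of a more refined action."""
    own = scope(PREFS[name])
    refined = [scope(p) for other, p in PREFS.items()
               if other != name and scope(p) <= own]
    return {pair for pair in expansion(PREFS[name])
            if not any(pair in s for s in refined)}
```

With this encoding, active_scope("b1") contains exactly the six relationships of the first row of the table, and active_scope("b2") drops Fiat ≻ Kia because that pair lies in the scope of the more refined action b4.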

As in the case of Best/Worst preferences (and Prop. 1 and 2), here we have to examine whether the

expansion of relative preferences creates ambiguities (conflicts), apart from those which are resolved

by the scope-based rule, and how we can identify such cases.

Let B be a set of relative preference actions, which define a valid preference relation R≻ over a (T,≤). We will examine whether a preference relationship between two terms e and e′ of T (either e ≻ e′ or e′ ≻ e) can be in the active scope of more than one action in B. If this holds, then both e ≻ e′ and e′ ≻ e could belong to the expanded (through the active scopes) preference relation, and thus that preference relation would be invalid.

Let us make the hypothesis that a relationship e ≻ e′ belongs to the active scope of two actions bi and bj such that bi ≠ bj. Suppose that bi : ti ≻ ti′ and bj : tj ≻ tj′. Certainly e ≻ e′ should belong to the expansions of both bi and bj. Membership to the expansion of bi means: e ≤ ti and e′ ≤ ti′. Membership to the expansion of bj means: e ≤ tj and e′ ≤ tj′. We can identify the following cases:

(i) if ti ≤ tj and ti′ ≤ tj′ then it holds bi ⊑ bj and thus e ≻ e′ can belong to the active scope of bi

only (and not of bj).

(ii) if tj ≤ ti and tj′ ≤ ti′ then it holds bj ⊑ bi and thus e ≻ e′ can belong to the active scope of bj

only (and not of bi).

(iii) if ti ≤ tj and tj′ ≤ ti′ , or tj ≤ ti and ti′ ≤ tj′ then neither bi ⊑ bj nor bj ⊑ bi holds. This means

that in such cases it could belong to the active scopes of both. An example is shown at Figure 3.9

(left).

(iv) If ti||tj and/or tj′||ti′, again neither bi ⊑ bj nor bj ⊑ bi holds, meaning that e ≻ e′ would belong to the active scopes of both. Note that the case ti||tj and tj′||ti′ can occur in DAGs, and an example is shown at Figure 3.9 (right). For the case of trees we cannot have ti||tj, since we know that e ≤ ti and e ≤ tj (in a tree these three relationships cannot hold together). For the same reason, in trees tj′||ti′ cannot hold.

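[Figure 3.9 contains two example taxonomies; its legend distinguishes e ≤ e′ links from e > e′ links. One panel, over the terms A, P, B, R, I and J: I > J due to A > B, and I < J due to R > P. The other panel, over the terms European, German, BMW, Asian, Korean and KIA: BMW > KIA due to German > Asian, and BMW < KIA due to Korean > European.]

Figure 3.9: Examples of Conflicts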

Cases (iii) and (iv) indicate the situations in which a conflict can occur. Note that case (iii) can occur both in trees and DAGs, while case (iv) only in DAGs.


It follows from the above that we need methods for detecting the cases where inheritance causes

invalidities. One method to do so, is to compute the expansion and then check for cycles. This means

that a classical cycle detection algorithm (e.g. topological sort) is enough for detecting such cases.

We could also avoid the expansion step in some cases. Below we elaborate on a method that could be

applied for the case of tree-structured taxonomies. To begin with, let Re denote the expanded (through

the notion of active scopes) preference relation of R≻ (obviously, R≻ ⊆ Re).

Prop. 4 (Relative Inherited Preferences and Conflicts)

For tree-structured taxonomies, the expansion through active scopes of a valid preference relation R≻ (yielding a preference relation Re) can create a conflict iff (if and only if) there are two actions in R≻ (not necessarily different) of the form a ≺ b and c ≺ d such that either:

(i) a ≤ d and c ≤ b hold, or

(ii) b ≤ c and d ≤ a, hold.

If these actions are the same, meaning that a = c and d = b, the formulation of the proposition becomes:

Re has a conflict iff there is an action a ≺ b and either a ≤ b or b ≤ a. ⋄

Proof:

(Direction: If the conditions of the proposition hold then Re has a conflict)

As we can see from Figure 3.10 (i), if the conditions of the proposition hold, then Re contains a conflict (either between a and c, or between d and b). Regarding the special case (where the two actions are the same), note that if b ≤ a then we get the cycle b ≺ b (see Figure 3.10 (ii-left)). If a ≤ b then we get the cycle a ≺ a (see Figure 3.10 (ii-middle)). Note that non-trivial cycles (i.e. not self-cycles) can also occur, e.g. if c ≤ b ≤ a, with the expansion we will get c ≺ b and b ≺ c (see Figure 3.10 (ii-right)).

(Direction: if Re has a conflict then the conditions of the proposition hold)

Trivial Cycle

Suppose that Re has a trivial cycle of the form a ≻ a. Since this relationship cannot belong to R≻ (which is acyclic by assumption), it should be the result of an inherited action. Therefore a should have a superclass, say sp, for which there is an action sp ≻ sb, and for this action to be inheritable to a it should also hold that a ≤ sb. Therefore it should hold a ≤ sb and a ≤ sp. However, since ≤ is a tree, sb and sp cannot be incomparable (i.e. it cannot be sb||sp), therefore it should either be a ≤ sb ≤ sp or a ≤ sp ≤ sb. We have reached the conclusion that there exists an action sp ≻ sb and either sb ≤ sp or sp ≤ sb. This is exactly what the proposition states.

Figure 3.10: Relative Inherited Preferences and Conflicts Examples (panels (i), (ii) and (iii))

Cycle of the form e ≺ e′ ≺ e

A relationship e ≺ e′ can belong to Re either because it belongs to R≻, or due to an action

a ≺ b to whose active scope the relationship e ≺ e′ belongs. In the latter case it should be

e ≤ a and e′ ≤ b.

Analogously, a relationship e′ ≺ e can belong to Re either because it belongs to R≻, or due to an action c ≺ d to whose active scope the relationship e′ ≺ e belongs. In the latter case, it should be e′ ≤ c and e ≤ d (illustrated at Figure 3.10 (iii)).

However, since ≤ is a tree it cannot be a||d nor b||c. Therefore we can have one of the

following four cases (also illustrated at Figure 3.11).

(i) a ≤ d and b ≤ c

(ii) a ≤ d and c ≤ b

(iii) d ≤ a and b ≤ c

(iv) d ≤ a and c ≤ b

We cannot be in case (i) because in that case e ≻ e′ would not be in the active scope of c ≺ d (that would contradict one of our hypotheses). Similarly, we cannot be in case (iv) because in that case e ≺ e′ would not be in the active scope of a ≺ b. So only (ii) and (iii) can hold. Notice that these are exactly the conditions that the proposition states.

Figure 3.11: Examples of Cycles of the Form e ≺ e′ ≺ e (panels (i) to (iv))

Based on the above proposition, below we describe an algorithmic method for identifying such problems without having to expand R≻, i.e. without having to compute Re. For each pair of statements (i.e. for each pair of relationships in R≻) we check whether the condition of Proposition 4 holds. This means that we need to check the proposition |R≻|(|R≻|−1)/2 times. To check the proposition once, we have to check whether four ≤ relationships hold. If the transitive closure of ≤ is stored then this can be checked fast (one scan, or even faster if indexes exist). If the transitively induced ≤ relationships are not stored, then we can check whether t ≤ t′ by applying the reachability algorithm with cost analogous to the average depth of the taxonomy. If however the taxonomy has been labeled (e.g. using Agrawal et al. (1989)), then we can check whether t ≤ t′ in O(1).
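The pairwise test can be sketched as follows in Python. This is a minimal sketch under assumptions: the tree taxonomy (PARENT) reuses the terms of one of the examples of Figure 3.9, t ≤ t′ is decided by walking up the parent links (the reachability variant, not a labeling scheme), and each action is given as a (better, worse) pair that is read as worse ≺ better. The sketch also tests every action against itself, to cover the special case of Prop. 4 where the two actions coincide.

```python
# ASSUMED tree taxonomy (child -> parent), using the terms of Figure 3.9.
PARENT = {"German": "European", "BMW": "German", "Korean": "Asian", "Kia": "Korean"}


def leq(t, u):
    """t <= u: t equals u or is narrower than u (walk up the tree)."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def conflicting_pairs(prefs):
    """Return the pairs of actions that satisfy condition (i) or (ii) of Prop. 4.

    prefs is a list of (better, worse) pairs; (b, a) is read as a ≺ b.
    """
    found = []
    for i, (b, a) in enumerate(prefs):       # action a ≺ b
        for (d, c) in prefs[i:]:             # action c ≺ d (may be the same action)
            condition_i = leq(a, d) and leq(c, b)
            condition_ii = leq(b, c) and leq(d, a)
            if condition_i or condition_ii:
                found.append(((b, a), (d, c)))
    return found
```

For the actions German ≻ Asian and Korean ≻ European the function reports one conflicting pair (condition (ii) holds), which corresponds to the BMW/KIA conflict of Figure 3.9; for the single action BMW ≻ Kia it reports none.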

At application level, the detected invalidities can be managed in various ways. For instance, we can

inform the user and ask him to revise his preferences or to resolve the ambiguity. Alternatively one

could consider the preference invalid and thus ignore it, or “cut” the inheritance at some points (e.g. at

the points of conflicts), or employ other conflict resolution rules (e.g. the closer in ≤ hierarchy prevails,

or the more recent action prevails, etc). All these are application-specific issues that go beyond the focus

of this thesis.

Obviously (B,⊑) contains relationships only between preferences of the same kind (i.e. Best/Worst or Relative). Therefore, in the first step of the algorithm, where we compute (B,⊑), we first calculate the actions’ refinement preorder for Best/Worst preference actions, then for Relative preference actions, and finally we return the union of these relationships.

Returning to the preference-based order and the actions b1-b5 given at the beginning of this section, we can apply Alg. PrefOrder as it is (assuming the scope defined as in this section). Specifically, to produce the induced ordering we have to pass to Alg. Apply, through R≻, all active scopes of the actions in B. Figure 3.12 shows the transitive reduction (i.e. the Hasse Diagram) of the relation R≻ for the preferences over the Manufacturer attribute. The bucket order derived by Alg. PrefOrder in our example is:

⟨{BMW}, {Asian, Toyota, Lexus}, {European}, {Kia}, {Fiat}⟩

and its restriction on the leaves of the taxonomy is:

⟨{BMW}, {Toyota, Lexus}, {Kia}, {Fiat}⟩

which captures the intuition.

Figure 3.12: Hasse Diagram of the Relation R for the Manufacturer Attribute
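The bucket order of this example can be reproduced by repeatedly removing the sources of the relation formed by the active scopes of Table 3.3. The Python sketch below illustrates this; it is a plain layered source removal written for this example, in the spirit of the SourceRemoval step named in Alg. 5, and is not claimed to be the exact procedure of Alg. Apply. The function name and the encoding of the relation as (better, worse) pairs are assumptions.

```python
# Union of the active scopes of b1-b5 (Table 3.3), as (better, worse) pairs.
R = {
    ("Asian", "European"), ("Asian", "Fiat"),
    ("Toyota", "European"), ("Toyota", "Fiat"),
    ("Lexus", "European"), ("Lexus", "Fiat"),
    ("European", "Kia"), ("BMW", "Kia"),
    ("BMW", "Asian"), ("BMW", "Toyota"), ("BMW", "Lexus"),
    ("Kia", "Fiat"), ("Toyota", "Kia"),
}


def source_removal(pairs):
    """Layer a strict preference relation: each bucket holds the elements
    that no remaining element is preferred to."""
    remaining = {term for pair in pairs for term in pair}
    buckets = []
    while remaining:
        sources = {x for x in remaining
                   if not any(worse == x and better in remaining
                              for better, worse in pairs)}
        if not sources:
            raise ValueError("cycle detected: the relation is not a valid preference")
        buckets.append(sources)
        remaining -= sources
    return buckets


BUCKETS = source_removal(R)
LEAVES = {"BMW", "Fiat", "Kia", "Toyota", "Lexus"}
# Restriction on the leaves: intersect each bucket and drop the empty ones.
LEAF_BUCKETS = [bucket & LEAVES for bucket in BUCKETS if bucket & LEAVES]
```

BUCKETS and LEAF_BUCKETS then coincide with the two bucket orders given above. The same loop doubles as a cycle detector: if no source exists while elements remain, the expanded relation is invalid.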

3.3.5 Preferences over Hierarchical Set-Valued Attributes

In case we have set-valued attributes over hierarchically organized value domains, we can again exploit

inheritance to order the sets. In particular, consider the scope and active scope as defined earlier, in a way

that captures relative preferences. We can apply Alg. PrefOrder up to line 8 (i.e. just before calling the

algorithm Apply), and then apply the algorithm described in Section 3.3.2 (based on the relation ≻{}),

to derive the final bucket order. The steps are sketched in more detail in Alg. 5.

3.4 Multi-Facet Preferences

Here we describe the case where we have actions that concern more than one facet. The user can define separately a preference for each facet (using one or more actions) and then compose them using the Priority or Pareto operators (Pareto Optimal is a subcase of Pareto), or a composition of these operators.


Algorithm 5 PrefOrderSetValued(E, B, Policy)
Input: the set of elements E (E is a family of sets), the set of actions B, and Policy for inactive elements
Output: a bucket order L′ over E
1: // As in Alg. 4:
2: Compute the scopes of the actions in B and form (B,⊑)
3: Use (B,⊑) to compute the active scopes of the actions in B
4: Use the active scopes to expand the set B to a set B′
5: (B, W, R≻) ← Parse(B′)
6: // As in Alg. 3:
7: Rbw ← {(b, w) | b ∈ B, w ∈ W}
8: R ← Rbw ∪ R≻
9: R ← Closure_transitivity(R) // Addition of the transitively induced links
10: Compute ≻{} based on wins and support as in Alg. 3
11: L ← SourceRemoval(≻{})
12: I ← E \ dom(≻{}) // I is the set of inactive elements
13: L′ ← addInactiveElements(L, I, Policy)
14: return L′

3.4.1 Prioritized Composition

Prioritized composition (Kießling (2002)) of two preference relations P1 and P2, denoted by P1 ▷ P2,

meaning that P1 has more priority than P2, is defined as:

x ≻_{P1▷P2} y iff x1 ≻_{P1} y1 ∨ (x1 = y1 ∧ x2 ≻_{P2} y2)

Let Bi and Bj be two sets of object-scoped actions. Suppose the user has defined Bi ▷ Bj, and let A be the current object set (the focus). The ordering of A with respect to Bi ▷ Bj is derived by ordering each block defined by the preference Bi, using the preferences in Bj. The exact steps are given in Alg. MFPriority. At Step 1 we derive the blocks defined by the preference Bi. At Step 2 we order the elements of each block derived from the first step, using the actions in Bj. Finally, at Step 3 we just put these blocks in the order specified by Step 1.

Let us now denote with o1 ∼ o2 that two objects are indifferent based on the relation R≻, i.e. that neither o1 ≻ o2 nor o2 ≻ o1 holds. A refinement of the indifference relation associated to a preference relation R≻ is to consider objects o1, o2 as equivalent, o1 ≈ o2 (another symbol used for equivalence in the bibliography is ≡), if o1 ∼ o2 and for all o ∈ Obj such that o1 ≺ o or o ≺ o1, it is o2 ≺ o or o ≺ o2 respectively, and vice versa. If o1 ∼ o2 and o1 ≉ o2 we say that objects o1 and o2 are incomparable (Ciaccia and Torlone (2011)).

Algorithm 6 MFPriority(A, Bi, Bj)
Input: the objects of current focus A, the actions Bi for facet Fi, and the actions Bj for facet Fj
Output: a bucket order L of A corresponding to Bi ▷ Bj
1: We call the Alg. PrefOrder(A, Bi) and let L = ⟨A1, ..., AM⟩ be the produced bucket order, where M is the number of blocks returned.
2: For each block Am of L (1 ≤ m ≤ M) where |Am| > 1, we call PrefOrder(Am, Bj), returning a bucket order Lm = ⟨Am1, ..., Amz⟩.
3: We replace each block Am of L with its bucket order Lm and this yields the final bucket order L = ⟨L1, ..., LM⟩.

It is easy to see that the produced bucket order interprets prioritized composition (▷) as:

x ≻_{P1▷P2} y iff x1 ≻_{P1} y1 ∨ (x1 ∼_{P1} y1 ∧ x2 ≻_{P2} y2)

where x1 ∼_{P1} y1 means that x1 and y1 are in the same block in the bucket order produced by P1. This

means that the relative ordering of the blocks defined by P1 is preserved, and this policy is aligned with

what the user expects to see. This is the prioritized composition described in Chomicki (2003), which is

referred to as triangle composition in Ross (2007).

A refinement of the above is to use equivalence (≈) instead of indifference (∼) (see Section 3.2). This refinement can be made since, in our case, if o1 ∼ o2 then for all o ∈ Obj such that o1 ≺ o or o ≺ o1, it is o2 ≺ o or o ≺ o2 respectively, and vice versa (our algorithms provide a bucket order for all elements, since they also consider inactive elements). As a result, the produced bucket order interprets prioritized composition (▷) as follows:

x ≻_{P1▷P2} y iff x1 ≻_{P1} y1 ∨ (x1 ≈_{P1} y1 ∧ x2 ≻_{P2} y2)

The above algorithm can be straightforwardly generalized to more than two facets. For example

assume that the user has defined:

B_Loc ▷ B_Manuf ▷ B_Price

Moreover, assume the actions B_Loc = {Best(Crete), Worst(Chania)}, B_Manuf = {Best(European), Worst(Italian), Best(Ferrari)} and B_Price = {price min}, and suppose that the current focus A consists of the following tuples:


Id Location Manuf. Price ...

L Heraklion Lancia 10 ...

B Chania BMW 20 ...

A1 Athens Audi 20 ...

A2 Athens Audi 21 ...

F1 Heraklion Ferrari 100 ...

F2 Rethymno Ferrari 80 ...

The constituent and final bucket orders are shown below (for the composed preferences we use nesting to make clear how each block was derived):

L_{B_Loc} = ⟨{L, F1, F2}, {B}, {A1, A2}⟩

L_{B_Manuf} = ⟨{B, A1, A2, F1, F2}, {L}⟩

L_{B_Price} = ⟨{L}, {B, A1}, {A2}, {F2}, {F1}⟩

L_{B_Loc ▷ B_Manuf} = ⟨⟨{F1, F2}, {L}⟩, {B}, {A1, A2}⟩

L_{B_Loc ▷ B_Manuf ▷ B_Price} = ⟨⟨⟨{F2}, {F1}⟩, {L}⟩, {B}, ⟨{A1}, {A2}⟩⟩ = ⟨F2, F1, L, B, A1, A2⟩
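The nested refinement of blocks can be sketched in Python as a lexicographic sort on block positions. This is only a sketch under an assumption: instead of calling PrefOrder again on each block, as Alg. MFPriority does, it restricts bucket orders that were computed once over the whole focus, which gives the same blocks in this example. The names rank_of and prioritized are hypothetical.

```python
# Constituent bucket orders of the example, one per facet.
L_LOC = [{"L", "F1", "F2"}, {"B"}, {"A1", "A2"}]
L_MANUF = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
L_PRICE = [{"L"}, {"B", "A1"}, {"A2"}, {"F2"}, {"F1"}]


def rank_of(bucket_order):
    """Map each object to the position of its block (0 = most preferred)."""
    return {obj: i for i, bucket in enumerate(bucket_order) for obj in bucket}


def prioritized(*bucket_orders):
    """Prioritized composition of bucket orders, highest priority first:
    objects are compared on the first facet, ties are broken by the next one."""
    ranks = [rank_of(order) for order in bucket_orders]
    objects = set(ranks[0])
    keys = {obj: tuple(rank[obj] for rank in ranks) for obj in objects}
    # Objects with the same tuple of block positions share a bucket.
    return [{obj for obj in objects if keys[obj] == key}
            for key in sorted(set(keys.values()))]
```

Here prioritized(L_LOC, L_MANUF) returns the blocks {F1, F2}, {L}, {B}, {A1, A2}, and adding L_PRICE as a third argument returns the six singleton blocks in the order F2, F1, L, B, A1, A2, as in the example.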

Note that the above specified prioritized composition method (and algorithm) does not adopt the

ceteris paribus semantics, since it does not require equality of values. Higher priority implies preference

over all other attributes, and therefore it adopts the totalitarian semantics. Totalitarian semantics are too

strong and can lead to cyclic preferences when several comparative preference statements are dealt with

(Neves and Kaci (2010)). In our framework we always compose facets either by Priority or Pareto compo-

sition or a combination of them to avoid this kind of cycles. We can even assume a default behaviour of

automatic facet priority driftage, based on the interaction of the user with the facets. The second prioritized facet assumes totalitarian semantics for each block of the bucket order returned by ordering the

elements based on the most prioritized facet, the third prioritized facet assumes totalitarian semantics

per sub-block of the previous bucket order, etc. For example in the previous case, since the Location facet

is prioritized over the Manufacturer facet, and Heraklion ≻ Chania, the Lancia will be preferred to the

BMW, although Italian cars are not preferred over other European cars.


3.4.2 Pareto Composition and BMO-set

The Pareto composition (Kießling (2002)) assumes that the preferences expressed over different facets are

equally important. Typically, the Pareto composition of two preference relations P1 and P2, denoted by P1 ⊗ P2, is defined as:

x ≻_{P1⊗P2} y iff (x1 ≻_{P1} y1 ∧ (x2 = y2 ∨ x2 ≻_{P2} y2)) ∨ (x2 ≻_{P2} y2 ∧ (x1 = y1 ∨ x1 ≻_{P1} y1))

The winnow operator (Chomicki (2003)), also called Pareto optimal or Best operator (Torlone and Ciaccia (2002)), selects the maximal elements of the preference order defined using the Pareto composition (i.e. the BMO-set). There are many algorithms for the winnow operator, like BNL described in Börzsönyi et al. (2001) or SFS described in Chomicki et al. (2003). The winnow operator is also implicit in skyline queries, which support only LOWEST and HIGHEST preferences based on the Preference Algebra described in Kießling (2002). Methods for calculating skylines over partially ordered data have also started to emerge, as in Zhang et al. (2010).

Let Bi and Bj be two sets of object-scoped actions. Suppose the user has defined Bi ⊗ Bj, and let A be the current object set (the focus). Let us denote with A_BMO the BMO-set of the focus A. The exact steps for computing the Pareto composition are given in Alg. MFPareto.

Algorithm 7 MFPareto(A, Bi, Bj)
Input: the objects of current focus A, the actions Bi for facet Fi, and the actions Bj for facet Fj
Output: a bucket order L of A corresponding to Bi ⊗ Bj
1: We call the Alg. PrefOrder(A, Bi) and Alg. PrefOrder(A, Bj) for facets Fi and Fj, and let Li = ⟨Ai1, ..., Aim⟩ be the produced bucket order for facet Fi and Lj = ⟨Aj1, ..., Ajn⟩ for facet Fj, where m and n are the numbers of blocks returned for each facet resp.
2: while the bucket orders Li and Lj are not empty do
3:   Get the maximal elements of each bucket order, i.e. Aimax and Ajmax
4:   Check which objects in Aimax and Ajmax are not dominated by other objects in the bucket orders Lj and Li respectively. These objects belong to the current BMO-set A_BMOcurrent
5:   Append A_BMOcurrent to the returned bucket order L
6:   Remove the objects in A_BMOcurrent from the bucket orders Li and Lj
7: return bucket order L

Initially, we derive the bucket orders defined by the preference actions Bi and Bj. Then we get the maximal elements from each bucket order (the objects of the current BMO-set are included in them) and test which objects are not dominated by others (by checking the bucket order of the other preference action). These objects form the current BMO-set, and are removed from the two bucket orders. Then we continue computing the next BMO-set of the remaining objects. Notice that if we are interested only in the Pareto optimal set, i.e. the winnow operator, we need to find the BMO-set only once.

One can easily see that the produced bucket order interprets Pareto composition (⊗) as:

x ≻_{P1⊗P2} y iff (x1 ≻_{P1} y1 ∧ (x2 ∼_{P2} y2 ∨ x2 ≻_{P2} y2)) ∨ (x2 ≻_{P2} y2 ∧ (x1 ∼_{P1} y1 ∨ x1 ≻_{P1} y1))

where x1 ∼_{P1} y1 means that x1 and y1 are in the same block in the bucket order produced by P1. Again, in our case indifference means equivalence, so the finally produced bucket order is interpreted as:

x ≻_{P1⊗P2} y iff (x1 ≻_{P1} y1 ∧ (x2 ≈_{P2} y2 ∨ x2 ≻_{P2} y2)) ∨ (x2 ≻_{P2} y2 ∧ (x1 ≈_{P1} y1 ∨ x1 ≻_{P1} y1))

The above algorithm can be straightforwardly generalized to more than two facets. For example,

assume that the user has defined:

B_Loc ⊗ B_Manuf ⊗ B_Price

Moreover, assume the actions B_Loc = {Best(Crete), Worst(Chania)}, B_Manuf = {Best(European), Worst(Italian), Best(Ferrari)} and B_Price = {price min}, and suppose that the current focus A consists of the following tuples:

Id Location Manuf. Price ...

L Heraklion Lancia 10 ...

B Chania BMW 20 ...

A1 Athens Audi 20 ...

A2 Athens Audi 21 ...

F1 Heraklion Ferrari 100 ...

F2 Rethymno Ferrari 80 ...

The constituent bucket orders are shown below:

LBLoc= ⟨{L,F1, F2}, {B}, {A1, A2}⟩

LBManuf= ⟨{B,A1, A2, F1, F2}, {L}⟩

LBPrice= ⟨{L}, {B,A1}, {A2}, {F2}, {F1}⟩

From the above we can see that A1 dominates A2, since A1 is less expensive and has the same preference regarding Location and Manufacturer. In addition, A1 is dominated by B, since they have the same price and the manufacturer is equally preferred, but B is located in Chania, which is preferred over the inactive Athens. Finally, F1 is dominated by F2, since F2 is less expensive. Then, the final bucket order returned by the algorithm is:

L_{B_Loc ⊗ B_Manuf ⊗ B_Price} = ⟨{L, F2}, {F1, B}, {A1}, {A2}⟩

The Pareto optimal set (i.e. the result of the winnow operator or skyline operator), is ABMO =

{L,F2}, which is the maximal element of the above bucket order.
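The repeated extraction of BMO-sets can be sketched in Python as follows. The sketch makes two assumptions that are not stated in the text: the dominance test is a brute-force check over all remaining objects (rather than only the top blocks examined in Alg. MFPareto), and the three facets are combined by applying the two-facet step from left to right, i.e. as (B_Loc ⊗ B_Manuf) ⊗ B_Price. With this nesting the sketch reproduces the bucket order of the example. The names rank_of and pareto are hypothetical.

```python
# Constituent bucket orders of the example, one per facet.
L_LOC = [{"L", "F1", "F2"}, {"B"}, {"A1", "A2"}]
L_MANUF = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
L_PRICE = [{"L"}, {"B", "A1"}, {"A2"}, {"F2"}, {"F1"}]


def rank_of(bucket_order):
    """Map each object to the position of its block (0 = most preferred)."""
    return {obj: i for i, bucket in enumerate(bucket_order) for obj in bucket}


def pareto(first, second):
    """Pareto composition of two bucket orders by repeated BMO-set extraction."""
    r1, r2 = rank_of(first), rank_of(second)

    def dominates(x, y):
        # x is at least as good as y on both facets and strictly better on one.
        return (r1[x] <= r1[y] and r2[x] <= r2[y]
                and (r1[x] < r1[y] or r2[x] < r2[y]))

    remaining = set(r1)
    result = []
    while remaining:
        # Current BMO-set: the remaining objects that nobody dominates.
        bmo = {y for y in remaining
               if not any(dominates(x, y) for x in remaining)}
        result.append(bmo)
        remaining -= bmo
    return result


# ASSUMED left-to-right nesting of the three-facet composition.
RESULT = pareto(pareto(L_LOC, L_MANUF), L_PRICE)
```

RESULT consists of the blocks {L, F2}, {F1, B}, {A1} and {A2}, and its first block is the Pareto optimal set. If only the winnow operator is needed, the loop can stop after the first iteration.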

Pareto composition assumes ceteris paribus semantics. Recall that ceteris paribus semantics means that if o1 ≻ o2 for a specific attribute, then o1 is preferred to o2 provided that for all the other attributes o1 and o2 are “equal” (i.e. the objects are equivalent). Furthermore, we expand the ceteris paribus semantics by accepting that if o1 ≻ o2 for a specific attribute attr1, o1 and o2 are at least “equal” for all other attributes. So we also accept o1 to be preferred to o2 for another attribute attrN instead of being “equal”. Here we assume that “all else equal” is captured by the o1 ≈ o2 (equivalence) operator (i.e. o1 and o2 are in the same bucket for a specific attribute).

For example, in the previous case A1 is preferred to A2 since it is less expensive and all the other attributes are the same. Furthermore, L is preferred to F2, since Heraklion ≻ Rethymnon, and L is less expensive than F2.

3.4.3 Combination of Priority and Pareto Compositions

In addition, we can provide combinations of the Priority and Pareto compositions. For example, assume that L1 is the bucket order returned by Alg. MFPriority (i.e. we have a composition of type P11 ▷ P12 ▷ ... ▷ P1k) and that L2 is the bucket order returned by Alg. MFPareto (i.e. a composition of type P21 ⊗ P22 ⊗ ... ⊗ P2l). Then, we can combine the previous bucket orders using either Priority or Pareto composition, by calling Alg. MFPriority or MFPareto respectively (combining also their semantics). In this case, instead of calculating the bucket orders (the first step of each algorithm), we can pass the already computed bucket orders L1 and L2 to the appropriate algorithm. There are works, like the one described in Neves and Kaci (2010), that provide a combination of Priority and ceteris paribus semantics.


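In this way we can calculate Priority compositions of the type:

$(P_{11} \otimes P_{12} \otimes \dots \otimes P_{1k}) \triangleright (P_{21} \triangleright P_{22} \triangleright \dots \triangleright P_{2l})$

$(P_{11} \triangleright P_{12} \triangleright \dots \triangleright P_{1k}) \triangleright (P_{21} \otimes P_{22} \otimes \dots \otimes P_{2l})$

$(P_{11} \otimes P_{12} \otimes \dots \otimes P_{1k}) \triangleright (P_{21} \otimes P_{22} \otimes \dots \otimes P_{2l})$

Respectively, we can calculate Pareto compositions of the type:

$(P_{11} \triangleright P_{12} \triangleright \dots \triangleright P_{1k}) \otimes (P_{21} \triangleright P_{22} \triangleright \dots \triangleright P_{2l})$

$(P_{11} \triangleright P_{12} \triangleright \dots \triangleright P_{1k}) \otimes (P_{21} \otimes P_{22} \otimes \dots \otimes P_{2l})$

$(P_{11} \otimes P_{12} \otimes \dots \otimes P_{1k}) \otimes (P_{21} \triangleright P_{22} \triangleright \dots \triangleright P_{2l})$

Compositions of the type:

$(P_{11} \triangleright P_{12} \triangleright \dots \triangleright P_{1k}) \triangleright (P_{21} \triangleright P_{22} \triangleright \dots \triangleright P_{2l})$

$(P_{11} \otimes P_{12} \otimes \dots \otimes P_{1k}) \otimes (P_{21} \otimes P_{22} \otimes \dots \otimes P_{2l})$

would return results equivalent to those of the compositions:

$(P_{11} \triangleright P_{12} \triangleright \dots \triangleright P_{1k} \triangleright P_{21} \triangleright P_{22} \triangleright \dots \triangleright P_{2l})$

$(P_{11} \otimes P_{12} \otimes \dots \otimes P_{1k} \otimes P_{21} \otimes P_{22} \otimes \dots \otimes P_{2l})$

respectively, since according to Kießling (2002) Priority and Pareto compositions are associative.

As an example, assume that the user has defined: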

$(B_{Loc} \triangleright B_{Manuf}) \otimes B_{Price}$

Moreover, assume the actions $B_{Loc} = \{Best(Crete), Worst(Chania)\}$, $B_{Manuf} = \{Best(European), Worst(Italian), Best(Ferrari)\}$ and $B_{Price} = \{price\ min\}$. Finally, suppose that the current focus $A$ again consists of the following tuples:


Id   Location    Manuf.    Price   ...
L    Heraklion   Lancia    10      ...
B    Chania      BMW       20      ...
A1   Athens      Audi      20      ...
A2   Athens      Audi      21      ...
F1   Heraklion   Ferrari   100     ...
F2   Rethymno    Ferrari   80      ...

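The constituent and final bucket orders are shown below.

$L_{B_{Loc}} = \langle \{L, F1, F2\}, \{B\}, \{A1, A2\} \rangle$

$L_{B_{Manuf}} = \langle \{B, A1, A2, F1, F2\}, \{L\} \rangle$

$L_{B_{Price}} = \langle \{L\}, \{B, A1\}, \{A2\}, \{F2\}, \{F1\} \rangle$

$L_{B_{Loc} \triangleright B_{Manuf}} = \langle \langle \{F1, F2\}, \{L\} \rangle, \{B\}, \{A1, A2\} \rangle$

$L_{(B_{Loc} \triangleright B_{Manuf}) \otimes B_{Price}} = \langle \{L, F2\}, \{F1, B\}, \{A1\}, \{A2\} \rangle$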

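If the user had instead defined the opposite combination:

$(B_{Loc} \otimes B_{Manuf}) \triangleright B_{Price}$

then the constituent and final bucket orders would be:

$L_{B_{Loc} \otimes B_{Manuf}} = \langle \{F1, F2\}, \{B\}, \{A2, A1\}, \{L\} \rangle$

$L_{B_{Price}} = \langle \{L\}, \{B, A1\}, \{A2\}, \{F2\}, \{F1\} \rangle$

$L_{(B_{Loc} \otimes B_{Manuf}) \triangleright B_{Price}} = \langle \{F2\}, \{F1\}, \{B\}, \{A1\}, \{A2\}, \{L\} \rangle$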
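The effect of a prioritized composition on two already computed bucket orders can be sketched as a refinement: each bucket of the higher-priority order is split according to the lower-priority order. This is a minimal sketch under that assumption, not Alg. MFPriority itself (which calls Alg. PrefOrder per block); the name `prioritize` is hypothetical and the result is returned as a flat list of buckets rather than nested ones.

```python
from typing import Dict, Hashable, List, Set

BucketOrder = List[Set[Hashable]]


def prioritize(primary: BucketOrder, secondary: BucketOrder) -> BucketOrder:
    """Refine every bucket of `primary` by the position of its objects in `secondary`."""
    rank: Dict[Hashable, int] = {
        obj: i for i, bucket in enumerate(secondary) for obj in bucket
    }
    result: BucketOrder = []
    for bucket in primary:
        groups: Dict[int, Set[Hashable]] = {}
        for obj in bucket:
            # Objects absent from `secondary` are placed last within their bucket.
            groups.setdefault(rank.get(obj, len(secondary)), set()).add(obj)
        result.extend(groups[r] for r in sorted(groups))
    return result


# Bucket orders copied from the two examples above.
l_loc = [{"L", "F1", "F2"}, {"B"}, {"A1", "A2"}]
l_manuf = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
l_price = [{"L"}, {"B", "A1"}, {"A2"}, {"F2"}, {"F1"}]
l_loc_pareto_manuf = [{"F1", "F2"}, {"B"}, {"A2", "A1"}, {"L"}]
```

With these inputs, `prioritize(l_loc, l_manuf)` yields the buckets {F1, F2}, {L}, {B}, {A1, A2}, and `prioritize(l_loc_pareto_manuf, l_price)` yields {F2}, {F1}, {B}, {A1}, {A2}, {L}, in agreement with the bucket orders listed in the examples.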

3.5 A Complete Example

This section provides a complete example that clarifies the semantics of preference statements. Consider the following set of preference actions:

b1: Best(European)

b2: Worst(Italian)

b3: Porsche ≻ Ferrari

b4: Fiat ≻ Korean

b5: Japanese ≻ French

The scope-based ordering of the actions is shown in Fig. 3.13, where the left diagram concerns the best/worst actions, while the right one concerns the relative preference actions. The scopes and active scopes of the actions are shown in Table 3.4.

Figure 3.13: Scope Based Ordering of Actions (Left for Best/Worst Actions, Right for Relative Preference Actions): Complete Example

It follows that Alg. Apply will be called with the following parameters:

Param   Param value
B       European, German, Audi, BMW, Porsche, French, Citroen, Peugeot
W       Italian, Lancia, Ferrari, Fiat, Lamborghini
R≻      Porsche ≻ Ferrari, Fiat ≻ Korean, Fiat ≻ Kia, Japanese ≻ French, Japanese ≻ Citroen, Japanese ≻ Peugeot, Toyota ≻ French, Toyota ≻ Citroen, Toyota ≻ Peugeot, Lexus ≻ French, Lexus ≻ Citroen, Lexus ≻ Peugeot

The diagram of $R_{bw}$ is shown in Figure 3.14, while the diagram of $R_{\succ}$ is shown in Figure 3.15. The final diagram of $R$ is shown in Figure 3.16. For reasons of space, names are abbreviated.

The returned bucket order, assuming all these actions are term-scoped, is:

$\langle \{E, G, A, B, Po, J, T, Lx\}, \{Fr, C, Pe\}, \{I, Lmb, Fe, Fi, La\}, \{Ko, Ki\} \rangle$

The bucket order over the leaves of the taxonomy (i.e. car manufacturers) is:

$\langle \{A, B, Po, T, Lx\}, \{C, Pe\}, \{Lmb, Fe, Fi, La\}, \{Ki\} \rangle$

Table 3.4: Complete Example: Scopes and Active Scopes

b1:
  scope / expansion: European, German, Audi, BMW, Porsche, French, Citroen, Peugeot, Italian, Lancia, Ferrari, Fiat, Lamborghini
  active scope: European, German, Audi, BMW, Porsche, French, Citroen, Peugeot
b2:
  scope / expansion: Italian, Lancia, Ferrari, Fiat, Lamborghini
  active scope: Italian, Lancia, Ferrari, Fiat, Lamborghini
b3:
  scope / expansion: Porsche ≻ Ferrari
  active scope: Porsche ≻ Ferrari
b4:
  scope / expansion: Fiat ≻ Korean, Fiat ≻ Kia
  active scope: Fiat ≻ Korean, Fiat ≻ Kia
b5:
  scope / expansion: Japanese ≻ French, Japanese ≻ Citroen, Japanese ≻ Peugeot, Toyota ≻ French, Toyota ≻ Citroen, Toyota ≻ Peugeot, Lexus ≻ French, Lexus ≻ Citroen, Lexus ≻ Peugeot
  active scope: Japanese ≻ French, Japanese ≻ Citroen, Japanese ≻ Peugeot, Toyota ≻ French, Toyota ≻ Citroen, Toyota ≻ Peugeot, Lexus ≻ French, Lexus ≻ Citroen, Lexus ≻ Peugeot

Figure 3.14: Hasse Diagram for the Relation $R_{bw}$: Complete Example

Figure 3.15: Hasse Diagram for the Relation $R_{\succ}$: Complete Example

Now suppose an object-relational database (i.e. a database that supports multi-valued attributes) containing the tuples shown in Table 3.5.


Figure 3.16: Hasse Diagram for the Relation R: Complete Example

Table 3.5: Tuples in Database: Complete Example

Id   Manufacturer   Price   Accessories
C    Citroen        10      {DVD}
B    BMW            20      {ABS, AT}
A1   Audi           20      {ABS, MT, DVD}
A2   Audi           21      {ABS}
F1   Ferrari        100     {ESP, MT}
F2   Ferrari        80      {ESP, ABS, MT}
P    Porsche        150     {ESP}
F3   Fiat           5       {}
K    Kia            12      {DVD}
T    Toyota         20      {ABS, AT, ESP, DVD}

Then, if we apply the manufacturer ordering to the specific objects in the table, we get:

$L_{B_{Manuf.}} = \langle \{B, A1, A2, P1, T\}, \{C\}, \{F1, F2, F3\}, \{K\} \rangle$

Now consider the following three preference actions over the attribute Accessories:


b1: Best(ABS)

b2: Worst(DVD)

b3: AT ≻ MT

These actions define the following preference relation $R$:

    ABS    AT
     |     |
    DVD    MT

Suppose that we have to order the values that appear in the attribute Accessories of the tuples in Table 3.5 according to preference, i.e. we want to order the set:

$\{ \{\}, \{DVD\}, \{ABS\}, \{ESP\}, \{ABS, AT\}, \{ESP, MT\}, \{ABS, MT, DVD\}, \{ESP, ABS, MT\}, \{ABS, AT, ESP, DVD\} \}$

w(s, s′)/w(s′, s)   {}    {ABS}   {DVD}   {ESP}   {ABS,AT}   {ESP,MT}   {ABS,MT,DVD}   {ESP,ABS,MT}   {ABS,AT,ESP,DVD}   all
{}                  0/0   0/0     0/0     0/0     0/0        0/0        0/0            0/0            0/0                0/0
{ABS}               0/0   0/0     1/0     0/0     0/0        0/0        1/0            0/0            1/0                3/0
{DVD}               0/0   0/1     0/0     0/0     0/1        0/0        0/1            0/1            0/1                0/5
{ESP}               0/0   0/0     0/0     0/0     0/0        0/0        0/0            0/0            0/0                0/0
{ABS,AT}            0/0   0/0     1/0     0/0     0/0        1/0        2/0            1/0            1/0                5/0
{ESP,MT}            0/0   0/0     0/0     0/0     0/1        0/0        0/0            0/0            0/1                0/2
{ABS,MT,DVD}        0/0   0/1     1/0     0/0     0/2        0/0        1/1            0/1            1/2                1/4
{ESP,ABS,MT}        0/0   0/0     1/0     0/0     0/1        0/0        1/0            0/0            2/1                4/2
{ABS,AT,ESP,DVD}    0/0   0/1     1/0     0/0     0/1        1/0        2/1            1/2            1/1                3/3
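A single cell of such a table can be sketched as a pair count. This is a minimal sketch under an assumed reading of the wins function, namely that w(s, s′) counts the pairs (a, b) with a ∈ s, b ∈ s′ and a preferred to b; Def. 3 is not reproduced here, so this reading, as well as the names `PREFERS` and `wins`, are assumptions of the sketch.

```python
from typing import AbstractSet, Set, Tuple

# Term-level preferences induced by Best(ABS), Worst(DVD) and AT ≻ MT,
# as drawn in the diagram of R above.
PREFERS: Set[Tuple[str, str]] = {("ABS", "DVD"), ("AT", "MT")}


def wins(s: AbstractSet[str], t: AbstractSet[str]) -> int:
    """Count the pairs (a, b), a in s and b in t, such that a is preferred to b.

    The double loop over both sets is what makes one call quadratic
    in the size of the compared sets.
    """
    return sum((a, b) in PREFERS for a in s for b in t)
```

For instance, comparing {ABS, AT} with {ABS, MT, DVD} gives 2 wins against 0, and comparing {ABS, MT, DVD} with {ABS, AT, ESP, DVD} gives 1 win against 2.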

The ordering of these values according to the MoreWins rule (i.e. Def. 3 in Section 3.3.2) is shown in the Hasse diagram of Figure 3.17. We can resolve the ties by using the MoreGoodLessBad rule (i.e. Def. 4). Specifically, $Support(\{\}) = -1$, $Support(\{ABS\}) = 0$, $Support(\{DVD\}) = -1$, $Support(\{ESP\}) = -1$, $Support(\{ABS, AT\}) = 0$, $Support(\{ESP, MT\}) = -2$, $Support(\{ABS, MT, DVD\}) = -2$, $Support(\{ESP, ABS, MT\}) = -2$, and finally $Support(\{ABS, AT, ESP, DVD\}) = -2$.

As a result, for the empty set $\{\}$ we have $\{\} \succ \{ESP, MT\}$, $\{\} \succ \{ABS, MT, DVD\}$, $\{\} \succ \{ESP, ABS, MT\}$ and $\{\} \succ \{ABS, AT, ESP, DVD\}$. For $\{ABS\}$ we have $\{ABS\} \succ \{\}$, $\{ABS\} \succ \{ESP\}$, $\{ABS\} \succ \{ESP, MT\}$ and $\{ABS\} \succ \{ESP, ABS, MT\}$, while for $\{DVD\}$ we have $\{DVD\} \succ \{ESP, MT\}$. Finally, regarding $\{ABS, AT\}$ we have $\{ABS, AT\} \succ \{ESP\}$, while for $\{ESP\}$ we have $\{ESP\} \succ \{ESP, MT\}$, $\{ESP\} \succ \{ABS, MT, DVD\}$, $\{ESP\} \succ \{ESP, ABS, MT\}$ and finally $\{ESP\} \succ \{ABS, AT, ESP, DVD\}$.

Figure 3.17: Hasse Diagram for Ordering Multi-Valued Attributes According to MoreWins Rule: Complete Example

The ordering of these values after resolving the ties with the MoreGoodLessBad rule (i.e. Def. 4) is shown in the Hasse diagram of Figure 3.18.

Figure 3.18: Hasse Diagram for Ordering Multi-Valued Attributes According to MoreGoodLessBad Rule: Complete Example


After running topological sorting we get the following final bucket order over the sets:

$\langle \{\{ABS\}, \{ABS, AT\}\}, \{\{ESP\}, \{\}\}, \{\{ABS, AT, ESP, DVD\}, \{ESP, ABS, MT\}\}, \{\{ABS, MT, DVD\}\}, \{\{DVD\}\}, \{\{ESP, MT\}\} \rangle$

If we consider the tuples of Table 3.5 and assume that the expressed preference actions are object-scoped, then the final bucket order is:

$L_{B_{Access.}} = \langle \{A2, B\}, \{P1, F3\}, \{T, F2\}, \{A1\}, \{K, C\}, \{F1\} \rangle$

Suppose that we also want cars to be sorted according to their price in ascending order, i.e. the order of the cars in Table 3.5 is

$L_{B_{Price}} = \langle \{F3\}, \{C\}, \{K\}, \{T, B, A1\}, \{A2\}, \{F2\}, \{F1\}, \{P1\} \rangle$

Now consider the composition $(B_{Manufacturer} \otimes B_{Price}) \triangleright B_{Accessories}$. According to the previous bucket orders we have:

$L_{B_{Manuf.} \otimes B_{Price}} = \langle \{F3, B, A1, T\}, \{C, A2\}, \{K, P1\}, \{F2\}, \{F1\} \rangle$

and

$L_{(B_{Manuf.} \otimes B_{Price}) \triangleright B_{Accessories}} = \langle \{\{B\}, \{F3\}, \{T\}, \{A1\}\}, \{\{A2\}, \{C\}\}, \{\{P1\}, \{K\}\}, \{F2\}, \{F1\} \rangle$


Chapter 4

Complexity and Optimizations

Contents

4.1 Computational Complexity
4.2 Optimizations for Deriving the Preference-based Order
    4.2.1 An Algorithm based on the Focal Object Set
    4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements
4.3 Optimizations for Multi-Facet Preferences
    4.3.1 Prioritized Composition
    4.3.2 Pareto Composition
    4.3.3 Combination of Priority and Pareto Compositions

First (Section 4.1) we discuss the computational complexity of the algorithms presented in the previous sections. Then, in Section 4.2, we introduce more efficient algorithms for object ordering, while in Section 4.2.2 we focus on algorithms for set-valued facets, which can also be used for evaluating the top-K elements of the object order. Finally, in Section 4.3 we discuss some optimizations for multi-facet preferences.


4.1 Computational Complexity

Alg. 1 (Apply)

In the worst case all elements of $E$ are involved and the most expensive task is that of topological sorting. Topological sorting is in $O(|E| + |R|)$; since $|R| \le |E|^2$, w.r.t. $E$ we can say that it is in $O(|E|^2)$. If the actions are object-scoped, i.e. $E$ corresponds to $Obj$, then the complexity of Apply is in $O(|Obj|^2)$.
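The topological-sorting step that dominates this cost can be sketched as a layered source removal. This is a minimal sketch, not Alg. Apply: it ignores the Policy for inactive elements, assumes that every pair of the relation mentions only members of the element set, and the name `source_removal` is chosen here for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def source_removal(
    elements: Iterable[Hashable],
    prefers: Iterable[Tuple[Hashable, Hashable]],
) -> List[Set[Hashable]]:
    """Layer an acyclic relation into buckets in O(|E| + |R|) time.

    A pair (a, b) means "a is preferred to b". Each bucket holds the elements
    that have no remaining preferred-over predecessor.
    """
    elems = list(elements)
    successors: Dict[Hashable, List[Hashable]] = {e: [] for e in elems}
    indegree: Dict[Hashable, int] = {e: 0 for e in elems}
    for a, b in set(prefers):
        successors[a].append(b)
        indegree[b] += 1

    layer = {e for e in elems if indegree[e] == 0}
    buckets: List[Set[Hashable]] = []
    placed = 0
    while layer:
        buckets.append(layer)
        placed += len(layer)
        next_layer: Set[Hashable] = set()
        for a in layer:
            for b in successors[a]:
                indegree[b] -= 1
                if indegree[b] == 0:
                    next_layer.add(b)
        layer = next_layer
    if placed != len(indegree):
        raise ValueError("the preference relation contains a cycle")
    return buckets
```

Every element and every pair is visited a constant number of times, which gives the $O(|E| + |R|)$ bound; the quadratic bound follows because a relation over $E$ has at most $|E|^2$ pairs.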

Alg. 3 (ApplyOverFamiliesOfSets)

Suppose $E$ is a family of sets of terms over $T_i$. The computation of the closure at line (3) is in $O(|T_i|^3)$. Then we have to perform $|E|^2$ computations of wins (one for each pair of elements of $E$). Since computing $wins(s, s')$ requires $O(|s|^2)$ steps, the cost for computing all wins is in $O(|E|^2 \, avgSetSize^2)$, where $avgSetSize$ is the average size of the sets in $E$. Note that for some of the pairs we may have to compute the support of the involved sets. Since the cost of computing the support of one atomic element is $|T_i|$, the computation of $Support(s)$ is in $O(|s||E|)$. Altogether, the computation of wins and support is in $O(|T_i|^3 + |E|^2 \, avgSetSize^2)$.

As regards the size of $E$, for a facet with $|T_i|$ values we can have at most $2^{|T_i|}$ sets, therefore $|E| \le 2^{|T_i|}$. However, $|E|$ cannot be bigger than $|A|$, therefore we can write $|E| \le \min(2^{|T_i|}, |A|)$.

Alg. 4 (PrefOrder)

Let us now elaborate on the computational complexity of PrefOrder, and suppose that all actions in $B$ are object-scoped, i.e. $E$ corresponds to $Obj$. Line 2 requires computing the scopes of all actions in $B$. The computation of the scope of an action depends on $|Obj|$, and the size of the scope can be $|Obj|$, i.e. it is in $O(|B| \cdot |Obj|)$. Line 3 requires $|B|^2$ comparisons of sets, where each set can be $|Obj|$ in size, i.e. it is in $O(|B|^2 |Obj|)$. Line 5 requires computing the active scopes and this depends on $|B|$ and $|Obj|$, i.e. it is in $O(|B||Obj|)$. Line 6 requires first computing the parameters $B$, $W$ and then running Alg. Apply. The cost of the latter is in $O(|Obj|^2)$, as discussed earlier. It follows that the overall cost of Alg. PrefOrder is in $O(|Obj|(|Obj| + |B|^2))$.


Alg. 6 (MFPriority)

Consider the algorithm MFPriority of Section 3.4.1. Assume that we have $k$ facets, i.e. we have to order the elements according to a prioritized composition of actions over each facet $(B_1, B_2, \dots, B_k)$. Let us first describe the complexity for only two facets. In that case we have to apply Alg. PrefOrder with cost $O(|Obj|(|Obj| + |B_i|^2))$, and then, for each block of the produced bucket order, call Alg. PrefOrder with the actions $B_j$. The cost of each such call is in $O(|Obj|(|Obj| + |B_j|^2))$. Overall we can say that the cost is $O(|Obj|(|Obj| + |B|^2))$, where $|B| = |B_1| + \dots + |B_k|$. The cost of the algorithm for $k$ facets is then in $O(|Obj|(k|Obj| + |B|^2))$.

Alg. 7 (MFPareto)

Consider the algorithm MFPareto of Section 3.4.2. Assume that we have $k$ facets, i.e. we have to order the elements according to a Pareto composition of actions over each facet $(B_1, B_2, \dots, B_k)$. Let us first describe the complexity for only two facets. In that case we have to apply Alg. PrefOrder twice, with cost $O(|Obj|(|Obj| + |B_i|^2) + |Obj|(|Obj| + |B_j|^2))$. In the worst case, the set of objects in the two maximal buckets can be the whole set of objects (i.e. of size $|Obj|$). Then we have to check, for each object in the maximal blocks of the two returned bucket orders, whether it is dominated on either of the two criteria (as described by the preference actions ordering the objects). This can be done by running existing skyline algorithms like BNL (Börzsönyi et al. (2001)), which has a cost of $O(|Obj|^2)$. In the worst case (i.e. only one element is not dominated in each run) we have to repeat this for $|Obj|$ objects, i.e. the cost for finding the Pareto order is in $O(|Obj|^3)$. Overall we can say that the cost is $O(|Obj|(|Obj|^2 + |B|^2))$. The cost of the algorithm for $k$ facets is then in $O(|Obj|(k|Obj|^2 + |B|^2))$, where $|B| = |B_1| + \dots + |B_k|$. If we only calculate the Pareto optimal set (i.e. the skyline), then the cost is in $O(|Obj|(k|Obj| + |B|^2))$.
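The cubic worst case can be seen in a generic iterated-skyline sketch: one quadratic skyline pass per layer, repeated up to $|Obj|$ times. The code below is a simplified in-memory variant in the style of BNL and is not Alg. MFPareto, whose exact steps are given in Section 3.4.2 and may treat the maximal buckets differently. The names `skyline` and `pareto_layers` are hypothetical, and the rank pairs are bucket positions read off the $(B_{Loc} \triangleright B_{Manuf}) \otimes B_{Price}$ example.

```python
from typing import Dict, List, Set, Tuple

Ranks = Dict[str, Tuple[int, ...]]


def dominates(p: Tuple[int, ...], q: Tuple[int, ...]) -> bool:
    """p dominates q: no worse on every criterion and different on at least one."""
    return all(a <= b for a, b in zip(p, q)) and p != q


def skyline(ranks: Ranks) -> Set[str]:
    """Single pass that keeps a window of so-far undominated objects: O(n^2)."""
    window: List[str] = []
    for obj in ranks:
        if any(dominates(ranks[w], ranks[obj]) for w in window):
            continue  # obj is dominated, drop it
        window = [w for w in window if not dominates(ranks[obj], ranks[w])]
        window.append(obj)
    return set(window)


def pareto_layers(ranks: Ranks) -> List[Set[str]]:
    """Peel skylines repeatedly; with one survivor per pass this needs n passes."""
    remaining = dict(ranks)
    layers: List[Set[str]] = []
    while remaining:
        top = skyline(remaining)
        layers.append(top)
        for obj in top:
            del remaining[obj]
    return layers


# (position in the Loc ▷ Manuf order, position in the Price order), 0 = best.
example: Ranks = {
    "L": (1, 0), "B": (2, 1), "A1": (3, 1),
    "A2": (3, 2), "F1": (0, 4), "F2": (0, 3),
}
```

On these rank pairs the sketch produces the layers {L, F2}, {F1, B}, {A1}, {A2}. If only the first layer is needed, a single call to `skyline` suffices, which is why computing just the Pareto optimal set is cheaper than computing the full order.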

Combination of Pareto and Priority

Regarding the combination of Pareto and Priority as described in Section 3.4.3, the complexity will in the worst case be in $O(|Obj|(k|Obj|^2 + |B|^2))$, which is the complexity of the Pareto composition (i.e. the most expensive composition).


4.2 Optimizations for Deriving the Preference-based Order

Facet and Zoom Point Ordering.

Since the set of facets $F = \{F_1, \dots, F_k\}$ is usually small, the computation of $\succ_F$ is not expected to be expensive and we can use the proposed algorithms straightforwardly. The same is true for ordering the zoom points of each facet (as $|T_i|$ is usually small). Also note that we do not have to order the entire $T_i$ but only the “active” terms (i.e. the zoom points $Z_i(ctx)$ and $ZR_i(ctx)$ as defined in Table 2.1), which are subsets of $T_i$.

Object Ordering.

Let us now focus on object ordering. If $|Obj|$ (and thus every $|A|$) is small, then we could again apply the proposed algorithm straightforwardly. If $|Obj|$ is high then $|A|$ can be high too.

In such cases we propose exploiting the benefits of adopting the FDT approach, i.e. the fast convergence to small result sets with a few clicks. Convergence is discussed in detail (and quantified) in Section 6.2. This means that an acceptable and feasible policy is to order the set $A$ according to preference only if $|A|$ is below a given threshold (say a few hundred). For these reasons, below we present an algorithm, Alg. PrefOrderOpt, which is an optimized version of Alg. PrefOrder and whose complexity does not depend on $|Obj|$, but only on $|A|$ and $|B|$, so it can be applied to large information bases. We could call this algorithm focus-based.

An alternative algorithm, which can be beneficial in cases where $A$ is large, is given in Section 4.2.2.

4.2.1 An Algorithm based on the Focal Object Set

Alg. PrefOrderOpt takes as input the set $A$ to be ranked, which we can assume is not big (due to the fast convergence of FDT). First we present some auxiliary functions and the main idea (ignoring the case of relative preferences and set-valued attributes), and then the full algorithm.

We start from the observation that if we have a function that checks whether $b_1 \sqsubseteq b_2$ holds, where $b_1$ and $b_2$ are actions, then we can form the relation $\sqsubseteq$. An algorithm that implements such a function, denoted by CheckSubScopeOf($b_1$, $b_2$), is given below. The key point is that we can decide whether $b_1 \sqsubseteq b_2$ holds without having to compute the scopes of these actions. Instead we can base our approach on the definition of action scopes (Table 3.1). Specifically, if the anchor of $b_1$ is not empty while that of $b_2$ is empty (e.g. order all facets lexicographically), then it returns True. The remaining cases follow the general rule that terms are more refined than facets. In the case of two term-anchored actions whose terms are $\le$-related, the actions are $\sqsubseteq$-related too (see line 6). If, furthermore, labeling is used (e.g. Agrawal et al. (1989)), which is a good choice in such applications, then the cost of this function is always in $O(1)$.

1: function CheckSubScopeOf(b1, b2): Boolean
2:   if (b1.anchor ≠ ϵ) ∧ (b2.anchor = ϵ) then
3:     return True
4:   if (b1.anchor = ⟨ti⟩) ∧ (b2.anchor = ⟨Fj⟩) then
5:     return True
6:   if (b1.anchor = ⟨ti⟩) ∧ (b2.anchor = ⟨tj⟩) ∧ (ti ≤ tj) then
7:     return True
8:   return False
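The constant-time check of a condition such as ti ≤ tj can be sketched with interval labels. This is a simplified sketch for a tree-shaped taxonomy only, in the spirit of interval-based labeling; it does not reproduce the scheme of Agrawal et al. (1989), and the function names and the small taxonomy fragment are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def label_taxonomy(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Give every term an interval [start, end] that covers exactly its subtree."""
    labels: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(term: str) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(term, []):
            visit(child)
        labels[term] = (start, counter - 1)

    visit(root)
    return labels


def is_narrower_or_equal(t1: str, t2: str, labels: Dict[str, Tuple[int, int]]) -> bool:
    """t1 ≤ t2 holds iff the interval of t1 lies inside the interval of t2: O(1)."""
    s1, e1 = labels[t1]
    s2, e2 = labels[t2]
    return s2 <= s1 and e1 <= e2


# Hypothetical fragment of a manufacturer taxonomy.
TAXONOMY = {
    "Manufacturer": ["European", "Japanese"],
    "European": ["German", "Italian"],
    "German": ["Audi", "BMW"],
    "Italian": ["Ferrari", "Fiat"],
    "Japanese": ["Toyota"],
}
LABELS = label_taxonomy(TAXONOMY, "Manufacturer")
```

The labels are computed once per taxonomy; afterwards each subsumption test costs two integer comparisons, regardless of the depth of the hierarchy.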

For defining the intended algorithm we also need a boolean function IsInScope($o$, $b$) that returns True if $o$ belongs to the scope of $b$. This function can be implemented as follows.

1: function IsInScope(o, b): Boolean
2:   if b.scopeType = “object order” then
3:     if b.anchor = “facet Fi” then
4:       if D(o) ∩ Ti ≠ ∅ then return True
5:       else return False
6:     else if b.anchor = “term tj” then
7:       if tj ∈ D(o) then return True
8:       else return False
9:   return False

The main cost of IsInScope($o$, $b$) is the cost required to check whether a term is narrower than another (line 7 requires checking if $t_j$ is broader than a term assigned to $o$, i.e. if $t_j \ge t'_j$ where $t'_j \in D(o)$), so its cost is $O(|R_{\le}|)$, where $|R_{\le}|$ denotes the number of relationships of the taxonomic relation. If labeling is used (e.g. Agrawal et al. (1989)) then this cost is $O(1)$. Assume now that the average number of terms that are directly assigned to an object $o$ is denoted by $avgD_{\le}$. Then the final cost of IsInScope($o$, $b$) is in $O(avgD_{\le})$.

We can now present the optimized version of Alg. PrefOrder, which is Alg. PrefOrderOpt shown below. It takes as input an object set $A$ and a set of actions $B$ (the latter is one of the $k+2$ sets of actions). Part (1) includes the optimized version of lines (2-3) of PrefOrder, and Part (2) includes the optimized version of lines (5-6) of PrefOrder. We can see that the algorithm never computes the scope of any action, and this is the key point for applying it to large information bases (in the sense that its computational complexity does not depend on $|Obj|$). Instead, it checks whether elements of $E$ (recall that $E$ has been reduced through clicks) belong to the scopes of actions.

Algorithm 8 PrefOrderOpt(E, B, Policy)
Input: the set of elements E, the set of actions B, and Policy for inactive elements
Output: a bucket order over E
 1: /** Part (1): Computation of (B, ⊑) */
 2: Visited ← ∅
 3: R⊑ ← ∅   // R⊑ corresponds to ⊑
 4: for each b ∈ B do
 5:   for each b′ ∈ B \ Visited do
 6:     if CheckSubScopeOf(b, b′) then
 7:       R⊑ ← R⊑ ∪ {(b ⊑ b′)}
 8:     else if CheckSubScopeOf(b′, b) then
 9:       R⊑ ← R⊑ ∪ {(b′ ⊑ b)}
10:     Visited ← Visited ∪ {b}
11:   endfor
12: endfor
13:
14: /** Part (2): Efficient Computation of Act. Scopes */
15: for each b ∈ B do
16:   C(b) ← direct children of b wrt R⊑
17:   ActiveScope[b] ← {e ∈ E | IsInScope(e, b) ∧
18:       (∀c ∈ C(b) it holds IsInScope(e, c) = False)}
19: endfor
20:
21: Use the active scopes to expand the set B to a set B′
22: /** Part (3): Derivation of the final bucket order */
23: (B, W, R≻) ← Parse(B′)
24: return Apply(E, B, W, R≻, Policy)   // call to Alg. 1
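Part (2) of the algorithm can be sketched as follows: an element is in the active scope of an action if it is in its scope but in the scope of none of its direct children. This is a minimal sketch in which the scope test is passed in as a function; the helper names, the action identifiers and the small broader-terms map are assumptions for illustration and do not come from the thesis.

```python
from typing import Callable, Dict, Iterable, List, Set


def active_scopes(
    elements: Iterable[str],
    actions: Iterable[str],
    children: Dict[str, List[str]],
    in_scope: Callable[[str, str], bool],
) -> Dict[str, Set[str]]:
    """Compute the active scope of every action without materialising any scope."""
    elems = list(elements)
    scopes: Dict[str, Set[str]] = {}
    for b in actions:
        kids = children.get(b, [])
        scopes[b] = {
            e for e in elems
            if in_scope(e, b) and not any(in_scope(e, c) for c in kids)
        }
    return scopes


# Hypothetical data: b1 = Best(European), b2 = Worst(Italian), with b2 ⊑ b1.
ANCHOR = {"b1": "European", "b2": "Italian"}
CHILDREN = {"b1": ["b2"]}
MANUF = {"B": "BMW", "A1": "Audi", "F1": "Ferrari", "F3": "Fiat", "T": "Toyota"}
BROADER = {
    "BMW": {"BMW", "German", "European"},
    "Audi": {"Audi", "German", "European"},
    "Ferrari": {"Ferrari", "Italian", "European"},
    "Fiat": {"Fiat", "Italian", "European"},
    "Toyota": {"Toyota", "Japanese"},
}


def in_scope(obj: str, action: str) -> bool:
    """An object is in scope if the anchor term is its term or one of its broader terms."""
    return ANCHOR[action] in BROADER[MANUF[obj]]
```

With this data the active scope of b1 contains only B and A1, because F1 and F3 are captured by the more refined action b2, while T belongs to neither scope. Only the elements of the focus are ever tested, so the work does not depend on $|Obj|$.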

Regarding its complexity, suppose that the taxonomy of each facet is labeled. The cost of the first part of the algorithm is in $O(|B|^2)$. Note that as long as the user does not submit a new action, $(B, \sqsubseteq)$ can be preserved and reused when the user changes his focus (so $O(|B|^2)$ is paid once). The second part of the algorithm has $|B|$ iterations. Assuming labeling, the cost of each iteration is $avgD_{\le} \cdot |A| \cdot (1 + avgC_{\sqsubseteq})$, where $avgC_{\sqsubseteq}$ is the average number of direct children of an action w.r.t. $\sqsubseteq$. It follows that the cost of the second part is

$avgD_{\le} \cdot |B| \cdot (|A| \cdot (1 + avgC_{\sqsubseteq})) = avgD_{\le} \cdot |B| \cdot |A| + avgD_{\le} \cdot |A| \cdot (|B| \cdot avgC_{\sqsubseteq}) = avgD_{\le} \cdot |A| \cdot (|B| + |\sqsubseteq|)$

It holds that $|\sqsubseteq| \le |B|^2$, and as a result the cost of the second part is in $O(avgD_{\le} \cdot |A| \cdot |B|^2)$. The last part of the algorithm has the cost of Alg. Apply, which in our context is expressed as $O(|A|^2)$.

Changes for Capturing also Relative Preferences

The optimized algorithm Alg. PrefOrderOpt can easily be adapted to also handle relative preferences over hierarchically organized values (as defined in Section 3.3.4). Specifically, we just have to make the changes shown in Table 4.1.

Table 4.1: PrefOrderOpt Changes for Capturing Relative Preferences Over Hierarchically Organized Values

CheckSubScopeOf, line 6:
  Let $b_1.anchor = (e_i, e_j)$ and $b_2.anchor = (e'_i, e'_j)$. We have to write:
  $((e_i \le e'_i) \wedge (e_j \le e'_j)) \vee ((e_j \le e'_i) \wedge (e_i \le e'_j))$

Alg. PrefOrderOpt, lines (17-18):
  $ActiveScope[b] \leftarrow \{(e, e') \in E \times E \mid IsInScope(e, e', b) \wedge (\forall c \in C(b)$ it holds $IsInScope(e, e', c) = False)\}$

IsInScope($o$, $o'$, $b$):
  Let $b.anchor = (t_i, t_j)$. We have to write:
  $((t_i \in D(o)) \wedge (t_j \in D(o'))) \vee ((t_j \in D(o)) \wedge (t_i \in D(o')))$

Table 4.2: Complexity for Non-Optimized and Optimized Alg. PrefOrder and PrefOrderOpt

Part     Alg. PrefOrder               Alg. PrefOrderOpt                       Alg. PrefOrderOpt Relative
Part 1   $O(|Obj|(|Obj| + |B|^2))$    $O(|B|^2)$                              $O(|B|^2)$
Part 2   $O(|Obj||B|)$                $O(|A||B|^2 \, avgD_{\le})$             $O(|A|^2|B|^2 \, avgD_{\le})$
Part 3   $O(|Obj|^2)$                 $O(|A|^2)$                              $O(|A|^2)$
Total    $O(|Obj|(|Obj| + |B|^2))$    $O(|A|(|A| + |B|^2 \, avgD_{\le}))$     $O(|A|^2|B|^2 \, avgD_{\le})$


Regarding complexity, if labeling is available, then CheckSubScopeOf remains in $O(1)$ and as a result part one remains in $O(|B|^2)$. The function IsInScope($o$, $o'$, $b$) requires 4 checks of the form $t_j \in D(o')$. Again, if $avgD_{\le}$ denotes the average number of terms that are directly assigned to an object $o \in Obj$, then these checks cost $O(avgD_{\le})$ time. In the revised lines 17-18 of Alg. PrefOrderOpt the cost of each iteration is higher (in place of $|A|$ we now have $|A|^2$). Therefore the cost of the second part of the algorithm is now in $O(|A|^2 |B|^2 \, avgD_{\le})$, as also listed in Table 4.2.

Synopsis. Table 4.2 summarizes the complexities for the 3 different parts of the algorithm, for the non-

optimized and optimized version of the algorithm. The key point is that the complexity of the optimized

algorithm is independent of |Obj|.

4.2.2 Optimizations for Capturing Set-Valued Attributes and Top-K Requirements

Here we provide an optimized algorithm for ordering a set of objects $A$ for the case where (i) we have relative preferences over a facet whose values are hierarchically organized, and (ii) the object descriptions according to that facet are set-valued. We describe this case separately because IsInScope was defined without considering set-valued attributes (note, however, that a plain vanilla algorithm was given in Sect. 3.3.5).

Let $F_i$ be the facet whose terms are hierarchically organized and suppose that the object descriptions are set-valued at that facet. We start by assuming that the relation $\succ_{\{\}}$ over sets of terms of facet $F_i$ has been computed; this includes inheritance resolution (computation of active scopes), and computation of the wins and Support if needed (as we have described in Section 3.3.2). Now the idea of the algorithm is the following:

(a) for the objects in $A$ collect their descriptions w.r.t. $F_i$ (let $Z$ be this set),

(b) compute the restriction of $\succ_{\{\}}$ on $Z$,

(c) apply topological sorting on $Z$ based on $\succ_{\{\}}$, and

(d) from the blocks of $Z$ derive the blocks of the objects.

The exact steps of the algorithm are given in Alg. 9.

Algorithm 9 PrefOrderSetValuedOpt(A, ≻{})
Input: A, an order ≻{} over a set-valued attribute with values in Fi.
Output: Ordering of A w.r.t. ≻{}
 1: Z = {Di(o) | o ∈ A}
 2: R = ≻{}|Z   // restriction of ≻{} on Z
 3: L ← SourceRemoval(R)
 4: OL ← ∅
 5: for each block b of L do
 6:   for each term set s in b do
 7:     ob = I(s) ∩ A   // ob is the corresponding object block
 8:     append ob to OL
 9:   append to OL a block separator
10: return OL

In line (1) we compute $Z$, which is the family of sets of terms of $F_i$ that occur in $A$. As stated earlier, we assume that the relation $\succ_{\{\}}$ over all values that occur in $F_i$ has been defined (as in lines 1-10 of Alg. 3). Line (2) then sets $R$ to be the restriction of $\succ_{\{\}}$ on $Z$. Subsequently we apply topological sorting and obtain $L$. Next, we consume $L$ starting from the first block. Note that a block can contain one or more term sets. For each such set $s$ we scan $A$ and let $ob$ be the objects that have this value. The elements of $ob$ are appended to $OL$, which is the order of objects. This continues until all blocks of $L$ have been consumed.
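The loop that turns an order over term sets into an order over objects can be sketched as follows. This is a simplified sketch, not Alg. 9 itself: it assumes that a bucket order over the term sets is already available (instead of restricting the relation and sorting it), it indexes the objects of the focus by their description in one scan, and the function name is hypothetical. The data are the Accessories values of Table 3.5 and the bucket order over sets derived in Section 3.5.

```python
from typing import Dict, FrozenSet, List, Set


def fs(*terms: str) -> FrozenSet[str]:
    """Shorthand for building a hashable term set."""
    return frozenset(terms)


def order_by_set_values(
    descriptions: Dict[str, FrozenSet[str]],
    set_order: List[List[FrozenSet[str]]],
) -> List[Set[str]]:
    """Derive object blocks from blocks of term sets."""
    by_value: Dict[FrozenSet[str], Set[str]] = {}
    for obj, value in descriptions.items():  # one scan of the focus
        by_value.setdefault(value, set()).add(obj)

    object_order: List[Set[str]] = []
    for block in set_order:
        objects: Set[str] = set()
        for term_set in block:
            objects |= by_value.get(term_set, set())
        if objects:  # term sets that do not occur in the focus add nothing
            object_order.append(objects)
    return object_order


ACCESSORIES = {
    "C": fs("DVD"), "B": fs("ABS", "AT"), "A1": fs("ABS", "MT", "DVD"),
    "A2": fs("ABS"), "F1": fs("ESP", "MT"), "F2": fs("ESP", "ABS", "MT"),
    "P": fs("ESP"), "F3": fs(), "K": fs("DVD"), "T": fs("ABS", "AT", "ESP", "DVD"),
}
SET_ORDER = [
    [fs("ABS"), fs("ABS", "AT")],
    [fs("ESP"), fs()],
    [fs("ABS", "AT", "ESP", "DVD"), fs("ESP", "ABS", "MT")],
    [fs("ABS", "MT", "DVD")],
    [fs("DVD")],
    [fs("ESP", "MT")],
]
```

On this data the sketch returns the blocks {A2, B}, {P, F3}, {T, F2}, {A1}, {K, C}, {F1}, using the Id P of Table 3.5 for the Porsche tuple.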

To compute $\succ_{\{\}}$ we can follow lines (1)-(10) of Alg. 3 and, according to Section 4.1, the complexity for this is in $O(|T_i|^3 + |E|^2 \, avgSetSize^2)$. As regards the size of $E$, for a facet with $|T_i|$ values we can have at most $2^{|T_i|}$ sets (i.e. the size of $P(T_i)$), therefore $|E| \le 2^{|T_i|}$. However, $|E|$ cannot be bigger than $|A|$, therefore we can write $|E| \le \min(2^{|T_i|}, |A|)$. One policy is to compute $\succ_{\{\}}|_Z$ when needed. Another is to compute $\succ_{\{\}}$ over all distinct sets that occur for that facet (and to update it each time the user issues a preference action that concerns that order), to avoid recomputing it while the user restricts the set $A$. At run-time we just have to take its restriction on $Z$. We favor this policy in the given algorithm.

Generalization and Top-K Algorithm

Note that Alg. 9 essentially corresponds to the following approach: first order the terms and from their ordering derive the object ordering. Now suppose that we are not in the context of a set-valued attribute. If instead of passing the parameter $\succ_{\{\}}$ we pass an ordering over the values of $T_i$, then the question that arises is whether we could use this algorithm, instead of Alg. 8, to produce the object order, and in what cases that algorithm would be beneficial.

Let us approach this question from the computational complexity perspective. Suppose the case of relative preferences over a $T_i$. Instead of $\succ_{\{\}}$, we have to pass as parameter the ordering over the values of $T_i$. To compute this ordering we can use Alg. 8 where, instead of having to order the objects in $A$, we order the terms of $T_i$. In this case, and according to Table 4.2, the cost of this step is in $O(|T_i|^2 |B|^2 \, avgD_{\le})$. Note that it is not necessary to compute $Z$ or the restriction of the preference relation on $Z$ (lines 1-2 of Alg. 9), in the sense that the final answer will be correct in any case due to the intersections with $A$ at line 7. The computation of $Z$ can be beneficial if $Z$ is much smaller than $T_i$ (in that case we will have to compute $I(s)$ for fewer sets $s$). Also note that the way $A$ is defined can be exploited for further optimizations. For instance, if $A$ has been defined intensionally (by one query), then we may already know the set $Z$ without having to scan the set $A$.

Line (3) requires topological sorting, whose cost is in $O(|T_i|^2)$. The subsequent loop will have at most $|T_i|$ iterations (in particular $|Z|$), and the cost of each iteration is that of the operation $I(s) \cap A$. The operation $I(s) \cap A$ can be implemented in various ways, based on the sizes of the operands (and the data structures that are in place). E.g. if $A$ is small it is better to scan $A$ to select those objects whose $D_i$ description equals $s$, and in this case the cost of an operation $I(s) \cap A$ is in $O(|A| \cdot avgD_{\le})$. On the other hand, if $A$ is big and $I(s)$ is small, then it is better to compute and scan $I(s)$ and then delete from this set those elements which are not in $A$. In this case the cost of an operation $I(s) \cap A$, if we assume direct access to the elements of $I(s)$ and the ability to perform binary search for lookups in $A$, is $O(|I(s)| \log |A|)$. It follows that the cost of the loop is in $O(|T_i| \cdot \min(|A| \cdot avgD_{\le}, |I(s)| \cdot \log |A|))$.
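The two ways of evaluating $I(s) \cap A$ can be sketched side by side. This is a minimal sketch under the stated assumptions (descriptions available per object, and $A$ kept as a sorted list so that binary search applies); the function names and the extra object id `X9` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, FrozenSet, Iterable, List


def intersect_by_scanning(
    focus: Iterable[str],
    description: Dict[str, FrozenSet[str]],
    term_set: FrozenSet[str],
) -> List[str]:
    """Scan A and keep the objects described exactly by the term set (A small)."""
    return [obj for obj in focus if description[obj] == term_set]


def intersect_by_lookup(extension: Iterable[str], sorted_focus: List[str]) -> List[str]:
    """Scan I(s) and binary-search each of its objects in the sorted A (A big, I(s) small)."""
    kept = []
    for obj in extension:
        pos = bisect_left(sorted_focus, obj)
        if pos < len(sorted_focus) and sorted_focus[pos] == obj:
            kept.append(obj)
    return kept


FOCUS = ["A1", "A2", "B", "C", "K"]
DESCRIPTION = {
    "A1": frozenset({"ABS", "MT", "DVD"}),
    "A2": frozenset({"ABS"}),
    "B": frozenset({"ABS", "AT"}),
    "C": frozenset({"DVD"}),
    "K": frozenset({"DVD"}),
}
# Extension of the term set {DVD} in the whole base; X9 lies outside the focus.
EXTENSION_DVD = ["C", "K", "X9"]
```

Both functions return C and K for the term set {DVD}; the first touches every object of the focus, while the second touches only the three objects of the extension and performs one logarithmic lookup for each.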

Overall, the cost of Alg. 9 for single-valued facets, including the cost for computing the preference relation to be passed to this algorithm, is in $O(|T_i|^2 |B|^2 + |T_i| \cdot \min(|A| \cdot avgD_{\le}, |I(s)| \cdot \log |A|))$. Recall that the cost of Alg. 8 (according to Table 4.2) is in $O(|A|^2 \cdot |B|^2 \cdot avgD_{\le})$.

One benefit of Alg. 9 is that it can be more efficient than Alg. 8 if $A$ is large and $T_i$ is small. This is also evident from their complexities: Alg. 9 will have the cost $O(|T_i|^2 |B|^2 + |T_i| |I(s)| \log |A|)$, while Alg. 8 will have the cost $O(|A|^2 \cdot |B|^2 \cdot avgD_{\le})$. Note that cases where $|A|$ can be very large may occur at application level. For instance, consider the case of a user who has expressed a number of object-scoped actions and who, instead of restricting $A$, would like to directly get the most preferred objects. The user wants to bypass the information thinning process, probably because he believes that his preference actions are enough for bringing the most desired objects to the top positions of the returned answer. It is not hard to see that in this scenario both the plain algorithm (Alg. 4) and Alg. 8 are prohibitively expensive. Alg. 9 will be more efficient, but it will still order the entire $Obj$. In our opinion, the assumption that the user has expressed a detailed and complete description of his preferences is not very realistic (recall the discussion at the end of Section 2 and the DiFEPreKO Hypothesis that will be discussed in Section 6.3); nevertheless, if we want to support such scenarios, a possible direction is to devise an appropriate top-k algorithm. Top-k algorithms for preference-aware queries have been proposed (e.g. Georgiadis et al. (2008); Stefanidis et al. (2010); Spyratos et al. (2011)); however, they are appropriate for plain relational sources, meaning that hierarchically organized values or set-valued attributes are not supported. Notice, however, that Alg. 9 can be slightly changed to become a top-K algorithm. Specifically, we consume blocks of $L$ until $OL$ has reached $K$ objects. With this we complete the discussion of the main cases where the adoption of Alg. 9 is beneficial.
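The top-K variant described above can be sketched as an early-stopping loop over object blocks. This is a minimal sketch in which whole blocks are returned, so the answer may hold more than K objects when the last block contains ties; the function name `top_k_blocks` is hypothetical.

```python
from typing import Iterable, List, Set


def top_k_blocks(object_blocks: Iterable[Set[str]], k: int) -> List[Set[str]]:
    """Consume blocks in order and stop as soon as at least k objects are collected."""
    result: List[Set[str]] = []
    collected = 0
    for block in object_blocks:
        if collected >= k:
            break  # the remaining blocks are never materialised
        result.append(block)
        collected += len(block)
    return result
```

Because the input is consumed lazily, the blocks can come from a generator that evaluates $I(s) \cap A$ on demand, so the operations for the less preferred term sets are skipped altogether. For example, with the blocks {A2, B}, {P, F3}, {T, F2} and K = 3 the first two blocks are returned.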

On the other hand, Alg. 8 can be faster than Alg. 9 if $T_i$ is large in comparison to $A$ (e.g. suppose $T_i$ is a thesaurus and $A$ is a set of a few tens of objects). Also note that another merit of Alg. 8 is that it can be straightforwardly extended to accommodate object-anchored preference actions, or other multi-facet preferences, due to its “scope-based” approach.

4.3 Optimizations for Multi-Facet Preferences

4.3.1 Prioritized Composition

According to Section 4.1, the cost of MFPriority (presented in Section 3.4) is in O(|Obj|(k|Obj| + |B|^2)) for k facets. Let us now suppose that in algorithm MFPriority we use PrefOrderOpt instead of PrefOrder. The cost of MFPriority in that case is in O(|A|(k|A| + |B|^2)). Analogously, one could adopt Alg. 9 and calculate accordingly the complexity of MFPriority.

Now we will introduce an alternative approach for supporting what we call efficient priority-driftage. We refer to the scenario where the user changes priorities with one click, and we want the new ordering of objects to appear instantly. To begin with, as long as the user does not submit an action, each (B_i, ⊑) can be kept stored and reused. Now suppose the user is inspecting an answer set A and he changes facets (i.e. he clicks on one facet) just for changing the priorities. Specifically, suppose the user has already specified B_i ▷ B_j, meaning that both L_{B_i} and L_{B_i ▷ B_j} have already been computed (according to MFPriority). Suppose that the user now clicks on F_j just for changing the prioritized multi-facet preference to B_j ▷ B_i. According to the approach presented in Section 3.4, the application of MFPriority

will first compute L_{B_j} and finally it will produce L_{B_j ▷ B_i}. The key idea of the alternative algorithm is that we can avoid calling Alg. PrefOrder for each block of L_{B_j} at Step 2 of Alg. MFPriority. Specifically, we will show that from L_{B_i} and L_{B_j} we can compute L_{B_j ▷ B_i}. It can be easily proved that the first blocks of L_{B_j ▷ B_i} are the restriction of L_{B_i} to the objects of the first block, say A_{j1}, of L_{B_j}. This means that the first blocks of L_{B_j ▷ B_i} can be obtained by scanning L_{B_i} once and deleting (skipping) each object encountered that does not belong to A_{j1}.

In the example of Section 3.4, the first two blocks of L_{B_Loc ▷ B_Manuf} (i.e. the blocks {F1, F2}, {L}) can be obtained by replacing the first block of L_{B_Loc} (i.e. {L, F1, F2}) by what is left after scanning L_{B_Manuf} and ignoring the elements that do not belong to {L, F1, F2}. Since L_{B_Manuf} = ⟨{B, A1, A2, F1, F2}, {L}⟩ we will get ⟨{F1, F2}, {L}⟩.
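The scan-and-skip step of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the thesis: bucket orders are modelled as lists of sets, the function names restrict and prioritized are hypothetical, and the second block of the location order is invented for the example (the text gives only its first block).

```python
def restrict(bucket_order, keep):
    """Scan a bucket order once, skipping objects not in `keep`.

    Blocks that become empty are dropped.
    """
    result = []
    for block in bucket_order:
        kept = block & keep
        if kept:
            result.append(kept)
    return result


def prioritized(high, low):
    """Bucket order of `high ▷ low`, built only from the two stored orders."""
    result = []
    for block in high:
        # Each block of the higher-priority order is refined by the lower one.
        result.extend(restrict(low, block))
    return result


# Bucket order for the manufacturer preferences, as given in the text.
l_manuf = [{"B", "A1", "A2", "F1", "F2"}, {"L"}]
# First block of the location order, as given in the text.
first_block_loc = {"L", "F1", "F2"}
first_blocks = restrict(l_manuf, first_block_loc)

# Hypothetical full location order (second block is an assumption).
l_loc = [{"L", "F1", "F2"}, {"B", "A1", "A2"}]
l_loc_then_manuf = prioritized(l_loc, l_manuf)
```

Here first_blocks reproduces the two blocks derived in the example. Because Python sets are hash-based, each membership test in this sketch takes expected constant time, whereas the cost analysis in the text assumes a lookup that may inspect up to |A| elements.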

The cost of this approach, assuming two facets, is in O(|A|^2), since we have to scan |A| elements and for each one of them perform a lookup in a set that consists of at most |A| elements. If we have k facets then the cost is in O((k − 1) · |A|^2). Notice that this cost is independent of the number of user preference actions B_i for each facet, assuming that the user does not submit new actions. However, this approach requires keeping L_{B_1}, ..., L_{B_k} in memory. Each has at most |A| objects (according to the suggested scenario), therefore the main memory cost is k · |A|, where k is the number of facets.

To summarize, an alternative policy to Alg. MFPriority is to compute and store the bucket order L_{B_i} for each 1 ≤ i ≤ k. Then any prioritized composition of these sets of actions can be obtained by the method just described. The cost of priority driftage in this case does not depend on the number of preference actions, but requires hosting k|A| objects in main memory.

Top-K Prioritized Composition. Now suppose that the user wants (or the user's screen has room for) only the top-P hits, where P is a positive integer. We can exploit this constraint to speed up the process. In particular, from the bucket order of the first-in-priority B_i, we can get the minimum number of blocks whose cardinalities sum to at least P (if this is possible, i.e. if P ≤ |A|). For instance, if P = 4 in our example then we will get only the first 2 blocks of L_{B_Loc}.
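Selecting the minimum number of leading blocks can be sketched as follows. This is a hedged illustration: the function name top_p_prefix is hypothetical, bucket orders are again modelled as lists of sets, and every block of the location order after the first one is an assumption made only so that the example runs.

```python
def top_p_prefix(bucket_order, p):
    """Return the shortest prefix of blocks whose sizes sum to at least p.

    `p` is assumed to be a positive integer. If the whole order holds fewer
    than p objects, all blocks are returned.
    """
    prefix, total = [], 0
    for block in bucket_order:
        prefix.append(block)
        total += len(block)
        if total >= p:
            break  # stop consuming blocks once p objects are covered
    return prefix


# First block as in the running example; the remaining blocks are assumptions.
l_loc = [{"L", "F1", "F2"}, {"B"}, {"A1", "A2"}]
top4 = top_p_prefix(l_loc, 4)
```

Since the loop stops as soon as P objects are covered, the argument may also be a lazily produced sequence of blocks, so later blocks never need to be computed.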

4.3.2 Pareto Composition

According to Section 4.1, the cost of MFPriority (presented in Section 3.4) is in O(|Obj|(k|Obj| + |B|^2)) for k facets. Let us now suppose that in algorithm MFPareto we use PrefOrderOpt instead of PrefOrder.

The cost of MFPareto in that case is in O(|A|(k|A|^2 + |B|^2)). Analogously, one could adopt Alg. 9 and calculate accordingly the complexity of MFPareto.

4.3.3 Combination of Priority and Pareto Compositions

Using the optimizations described in Sections 4.3.1 and 4.3.2, the cost of the combination of the two algorithms is in O(|A|(k|A|^2 + |B|^2)).

Chapter 5

Applicability and the System Hippalus

Contents

5.1 Application in Searching
5.1.1 Case: Web Searching
5.1.2 Case: Relational Databases
5.1.3 Case: RDF Bases
5.2 Hippalus: A Preference Enriched Faceted Exploratory System
5.2.1 Software Architecture
5.2.2 Visualization and User Interface
5.2.3 Interaction Example

The objective of this chapter is to elaborate on the feasibility of the proposed interaction and preference framework over different application domains. Furthermore, it presents the design and implementation of Hippalus, a prototype system that realises the preference-enriched FDT. In more detail, Section 5.1 elaborates on the applicability of the proposed approach over FDT-based WSEs, relational databases and RDF/S sources. Finally, Section 5.2 describes the Hippalus system.

5.1 Application in Searching

Searching is a process that can be applied over a number of different application domains. Here we

elaborate on the feasibility of the proposed interaction and preference framework over WSEs, relational

databases and RDF/S respectively.

Figure 5.1: Processes of Web Searching and Exploratory Web Searching

The left part of Figure 5.1 shows (with a traditional WSE in mind) how search is performed today.

On the other hand, the right part of Figure 5.1 and Figure 5.2 showcase the proposed approaches for

exploratory and preference based searching. The same processes can be applied over the relational

databases and the Semantic Web domains, by submitting instead of a free text query the appropriate

SQL or SPARQL queries and applying FDT and the proposed preference framework over the results.

Figure 5.2: Process of Exploratory Web Searching Enhanced with Preference Actions

5.1.1 Case: Web Searching

One application domain of the proposed approach is that of Web searching. Commonly, the various static

metadata that are available to a search engine (e.g. domain, language, date, filetype, etc.) are exploited

only through the advanced (form-based) search facilities that some WSEs offer (and users rarely use). An

approach that exploits such metadata by adopting the interaction scheme of FDT exploration was first proposed and analyzed in Papadakos et al. (2009a). The proposed process for exploratory web searching is sketched in the right column of Figure 5.1. Specifically, the process consists of the following steps:

• The user submits a free text query which he assumes corresponds to his specific information need
• The system computes a ranked set of pages, documents, items
• Available static metadata are then loaded to the system for these items (i.e. date, language, filetype, domain, etc.)
• Available small top-K excerpts (i.e. snippets), which can be produced in real-time, are then computed
• Based on the previously computed top-K snippets, we can mine dynamic metadata, by using a clustering or entity mining algorithm
• Then the FDT interaction scheme is applied, by calculating for each facet the corresponding values and count numbers (i.e. static and dynamic metadata and their values)
• The system visualizes the available information regarding the facets, terms and objects
• The user can explore the information space by restricting his focus
• The user can explore the next top-K results, by dynamically mining metadata from the next top-K snippets
• Finally, if the user is not satisfied with the results returned by the initial query, he submits a new query
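The facet-counting and focus-restriction steps of this process can be sketched as follows. The sketch is only illustrative: the document records, the facet names and the helper names facet_counts and restrict_focus are hypothetical and are not taken from Mitos.

```python
from collections import Counter

# Hypothetical answer set with static metadata for each item.
docs = [
    {"id": "d1", "language": "en", "filetype": "pdf"},
    {"id": "d2", "language": "en", "filetype": "html"},
    {"id": "d3", "language": "el", "filetype": "html"},
]
FACETS = ("language", "filetype")


def facet_counts(objects):
    """For each facet, count how many objects carry each value."""
    return {facet: Counter(obj[facet] for obj in objects) for facet in FACETS}


def restrict_focus(objects, facet, value):
    """Keep only the objects whose `facet` has the clicked `value`."""
    return [obj for obj in objects if obj[facet] == value]


counts = facet_counts(docs)                      # counts for the initial answer
focus = restrict_focus(docs, "filetype", "html")  # the user clicks on "html"
focus_counts = facet_counts(focus)               # counts are refreshed
```

After each restriction the counts are recomputed over the new focus, which is why only the values that still occur in the focus remain visible to the user.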

The previous process can be enriched with preference expression over the facets, zoom-points and objects. Figure 5.2 depicts the process in more detail. The difference from the previous process is that now the user can express preferences over the visualized facets, zoom-points and objects. Then the system computes and presents to the user the most preferred facets, zoom-points and objects, according to the expressed user preferences.

Since the first two steps of the above process correspond to the query-and-answer interaction that current WSEs offer (which is stateless), what we propose essentially extends and completes the current interaction.

Note that the FDT interaction scheme has already been implemented over the Mitos WSE¹. Figure 5.3 shows the GUI of that engine. This figure shows 5 facets and their values and counts for the user-submitted query “library”. Specifically, one facet is dynamically mined (i.e. By clustering) while the remaining 4 facets are based on static metadata (By domain, By date, By filetype, By language). To the best of our knowledge, there were no other WSEs that offered the same kind of information/interaction at that time. A complete presentation that also includes an incremental algorithm for speeding up the interaction and the results of a user study is available in Papadakos et al. (2012a).

[Figure 5.3 annotations: facets based on static metadata; facet based on dynamic metadata extracted from the top-k resources; A (objects of focus); facets; zoom points]

Figure 5.3: Mitos GUI for Exploratory Web Searching

Regarding preferences, the default operation mode of Mitos (and of most FDT search engines) is captured by the following actions:

facets order: lexicographic min
terms order: count max
objects order: Relevance value max

¹ Under development by the Department of Computer Science of the University of Crete and FORTH-ICS (http://groogle.csd.uoc.gr:8080/mitos/).

This indicates that the language presented in Section 3 can capture the default behaviour of various

systems. In order to extend the exploratory web searching process with preferences we have to add the

two additional steps of the process depicted in Figure 5.2. This functionality can be provided in the same

way as described later on in Section 5.2.2.
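The three default orderings listed above can be mimicked with ordinary sorting. The following is a minimal sketch under assumptions: the facet counts and relevance scores are invented, and "lexicographic min" is read here as ascending alphabetical order.

```python
# Hypothetical facet counts and relevance scores for a small answer.
facet_counts = {
    "By language": {"en": 40, "el": 12},
    "By date": {"2012": 7, "2013": 45},
}
results = [("doc-a", 0.31), ("doc-b", 0.87), ("doc-c", 0.55)]

# facets order: lexicographic min (ascending by facet name)
facets_in_order = sorted(facet_counts)

# terms order: count max (largest count first within each facet)
terms_in_order = {
    facet: sorted(counts, key=counts.get, reverse=True)
    for facet, counts in facet_counts.items()
}

# objects order: Relevance value max (most relevant first)
objects_in_order = [
    doc for doc, _ in sorted(results, key=lambda pair: pair[1], reverse=True)
]
```

Each default is thus a single ordering criterion over facets, terms or objects, which is exactly the kind of statement the preference language is meant to express.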

5.1.2 Case: Relational Databases

The interaction scheme of FDT can also be applied over relational databases (i.e. over a single table or over the results of a query defined by using the query language SQL). If we want to explore data that are not stored in one table, then we can exploit the view mechanism that relational databases provide: a view is actually a named SQL query, over which other queries can be formulated as if it were a table of the database. Specifically, we can define a view comprising attributes from different relations (tables), and its definition may include joins and various other descriptions. Subsequently, we can apply the FDT interaction scheme over the contents of this view (i.e. over its answer), by assuming that each attribute of the view is a facet, and that the set of distinct values that appear in that attribute corresponds to the terms of this facet. The tuples in the answer of that view are the objects. This is the “straightforward” approach to applying FDTs and preference-based browsing over relational sources.
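The view-based approach can be illustrated with SQLite from the Python standard library. This is a sketch under assumptions, not part of the thesis: the tables car and maker, the view car_view and the sample rows are hypothetical, and the id column is treated as the object identifier rather than as a facet.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (id TEXT PRIMARY KEY, manufacturer TEXT, color TEXT);
CREATE TABLE maker (name TEXT PRIMARY KEY, origin TEXT);
INSERT INTO car VALUES ('o1', 'VW', 'Silver'), ('o2', 'Ferrari', 'Red'),
                       ('o3', 'Fiat', 'Yellow');
INSERT INTO maker VALUES ('VW', 'Germany'), ('Ferrari', 'Italy'),
                         ('Fiat', 'Italy');
-- The view joins two tables; FDT is then applied over its answer.
CREATE VIEW car_view AS
  SELECT c.id AS id, c.manufacturer AS manufacturer,
         c.color AS color, m.origin AS origin
  FROM car c JOIN maker m ON m.name = c.manufacturer;
""")

cursor = conn.execute("SELECT * FROM car_view")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()  # each tuple of the view is one object

# Every attribute except the identifier becomes a facet; its distinct
# values are the terms, each with its count.
facets = {
    column: Counter(row[i] for row in rows)
    for i, column in enumerate(columns)
    if column != "id"
}
```

The resulting dictionary holds, for each attribute of the view, its distinct values together with their counts, which is the information an FDT interface needs in order to display facets and zoom-points.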

Let us now compare this method with Preference SQL at a syntactical level, i.e. the usability of our method in comparison to using Preference SQL directly. Consider the following table Car:

Id   Manufacturer   Color
o1   VW             Silver
o2   Ferrari        Red
o3   Fiat           Yellow
o4   BMW            Silver
o5   Kia            Silver
o6   Lexus          Silver
o7   Toyota         Silver
o8   Kia            Silver
o9   Fiat           Red

Consider now a user that wants to buy a car and assume that he prefers European cars to any other cars. Also he likes Lexus equally to European cars. Finally, Fiat and Kia are his least preferred brands. The above preferences can easily be expressed using our preference language. Specifically, they can be expressed as preference actions that are anchored to terms, with objects as their scopeType. Such a preference could also be expressed using Preference SQL (Kießling et al. (2011a)). Preference SQL returns the BMO set, i.e. the Pareto optimal set. The query could have the following format, assuming that the user knows all the distinct manufacturers that can be stored in the database:

SELECT * FROM CAR PREFERRING
  Manufacturer EXPLICIT ('Kia' < 'Toyota',
    'Kia' < 'Ferrari', 'Kia' < 'Lancia', 'Kia' < 'Citroen',
    'Kia' < 'Peugeot', 'Kia' < 'BMW', 'Kia' < 'VW',
    'Kia' < 'Lexus',
    'Fiat' < 'Toyota', 'Fiat' < 'Honda', 'Fiat' < 'Ferrari',
    'Fiat' < 'Lancia', 'Fiat' < 'Citroen', 'Fiat' < 'Peugeot',
    'Fiat' < 'BMW', 'Fiat' < 'VW', 'Fiat' < 'Lexus',
    'Toyota' < 'Ferrari', 'Toyota' < 'Lancia', 'Toyota' < 'Citroen',
    'Toyota' < 'Peugeot', 'Toyota' < 'BMW', 'Toyota' < 'VW',
    'Toyota' < 'Lexus')

In this query the user explicitly defines the preference relation over all available manufacturers. An

alternative and simpler query is to provide a layered form of the user preferences:

SELECT * FROM CAR PREFERRING
  Manufacturer LAYERED(('Ferrari', 'Lancia', 'Peugeot',
    'Citroen', 'BMW', 'VW', 'Lexus'),
    ('Toyota'), ('Kia', 'Fiat'))

These queries will return the following bucket order ⟨{o1, o2, o4, o6}⟩. The first query is too com-

plex for a plain user and presupposes knowledge of the schema and the values stored in the database.

The second query is simpler, but again presupposes that the user is able to construct the appropriate

layers of the preference relation and that he knows the available stored values and database schema.

Furthermore, such a query must be given in one shot. If the user changes his preferences over time or

by exploring the available objects, then he must submit a reformulated query.
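To make the result of the layered query concrete, the following sketch evaluates the same three layers over the Car table in plain Python. It is an illustration only, not Preference SQL: the function name layered_bucket_order is hypothetical, and unlike the BMO semantics, which return only the best block, the sketch lists every block.

```python
# The Car table from the text: object id -> manufacturer.
cars = {
    "o1": "VW", "o2": "Ferrari", "o3": "Fiat", "o4": "BMW", "o5": "Kia",
    "o6": "Lexus", "o7": "Toyota", "o8": "Kia", "o9": "Fiat",
}

# The layers of the LAYERED query, from most to least preferred.
layers = [
    {"Ferrari", "Lancia", "Peugeot", "Citroen", "BMW", "VW", "Lexus"},
    {"Toyota"},
    {"Kia", "Fiat"},
]


def layered_bucket_order(objects, layers):
    """Group objects by the layer of their manufacturer, best layer first."""
    blocks = [
        sorted(obj for obj, maker in objects.items() if maker in layer)
        for layer in layers
    ]
    return [block for block in blocks if block]  # drop empty layers


order = layered_bucket_order(cars, layers)
best_matches = order[0]  # corresponds to the BMO set
```

The first block is the set {o1, o2, o4, o6} reported in the text; the remaining blocks are what an interactive interface would show below it as the user scrolls.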

If the user was exploring this table using the interaction scheme of FDT he would get the facets and zoom-points shown in Figure 5.4. Subsequently, he would be able to express his preferences interactively (by clicking on values and selecting the desired action). Furthermore, he would express his preferences gradually, until the point that he gets a list of results that satisfies him.

Figure 5.4: Facets and Zoom-Points of Running Example

5.1.3 Case: RDF Bases

Recently, the amount of data published on the public Semantic Web has exploded, especially in the form of Linked Data² (Bizer et al. (2009)). Specifically, by September 2011 available datasets had grown to 31 billion RDF triples, interlinked by around 504 million RDF links³. The interaction scheme of FDT can also be applied over the Semantic Web and there are already several browsers that provide FDT over RDF. Examples include /facet (Hildebrand et al. (2006)), Ontogator (Mäkelä et al. (2006)) and BrowseRDF (Oren et al. (2006)).

RDF/S sources actually follow a structurally object-oriented model. The structuring of information

assumed by FDTs is simpler: objects described by attributes whose values may be hierarchically orga-

nized, meaning that associations between objects of the same or different types are not assumed. This

implies that for applying FDT over RDF/S sources one has to decide the part (in its original native form

or transformed) of the RDF/S source that should be explored according to FDT.

One way to do this is to follow the approach described for relational sources. Specifically, we can specify the desired part (or the desired transformation) by a SPARQL query. Then we can apply FDT over the results of this query. Since the structure of the results is actually a relational table, we can apply FDT exploration as in relational tables.

² Wikipedia defines Linked Data as a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.

³ http://en.wikipedia.org/wiki/Linked_data

An alternative way, which does not use a query but instead specifies the part of the source that should be explored, is described below. Moreover, this method can exploit the subClassOf relationships. Specifically, subClassOf relationships are treated as hierarchically organized values. In this case, the objects of interest (i.e. the set Obj) can be defined by selecting one class of the source: all direct and indirect instances of this class constitute the set Obj. For instance, assuming the case of Fig. 5.5, we can define that the objects of interest are the instances of the class Vehicle. As facets we can consider the properties that start from or point to the above class. Moreover, the class hierarchies (of Vehicle, Location, Manufacturer) are exploited. Specifically, in this example it is like having three facets: type, whose values are the hierarchy of Vehicle; madeBy, whose values are the instances of Manufacturer, organized hierarchically through the subclasses of that class; and locatedIn, whose values are the instances of Location, organized hierarchically through the subclasses of that class.
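Collecting the direct and indirect instances of a selected class can be sketched without any RDF library. Everything in the example is an assumption made for illustration: the class names below Vehicle, the instance identifiers, and the simplification that each class has at most one superclass.

```python
# Hypothetical schema: child class -> parent class (subClassOf).
sub_class_of = {"Car": "Vehicle", "Van": "Vehicle", "Jeep": "Car"}
# Hypothetical instances: object -> its direct class (typeOf).
type_of = {"v1": "Jeep", "v2": "Van", "v3": "Car", "m1": "Manufacturer"}


def superclasses(cls):
    """Return `cls` followed by all its ancestors, nearest first."""
    chain = [cls]
    while chain[-1] in sub_class_of:
        chain.append(sub_class_of[chain[-1]])
    return chain


def instances_of(cls):
    """All direct and indirect instances of `cls`."""
    return {obj for obj, direct in type_of.items() if cls in superclasses(direct)}


objects_of_interest = instances_of("Vehicle")  # plays the role of Obj
```

The same ancestor chain also gives the hierarchical values of a facet such as type: an object typed as Jeep is counted under Jeep, Car and Vehicle.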

More expressive exploration models which exploit the full structuring of RDF/S sources (even its

fuzzy extensions (Manolis and Tzitzikas (2011))), go beyond the scope of this work.

Figure 5.5: Example of RDF/S

5.2 Hippalus: A Preference Enriched Faceted Exploratory System

To demonstrate the feasibility of our approach and to identify possible difficulties or other issues related to implementation and application, we have designed and implemented a proof-of-concept prototype, named Hippalus. The logo of Hippalus is an ancient Greek boat with the ≻ symbol of preferences as its sail⁴. This system was used for the user study described in Section 6.5.

5.2.1 Software Architecture

Instead of starting from scratch, we have decided to design and build Hippalus over RDF/S sources and RDF/S managing software. Specifically, we have implemented the proposed preference framework over a prototype for browsing and exploring RDF sources⁵ based on the model described in Manolis and Tzitzikas (2011), apart from the aspect of fuzziness. Hippalus uses Jena⁶, which is a Java framework for building Semantic Web applications. The architecture of the system and its components is given in Figure 5.6.

The user submits his preferences through HTML5 context menus, which are then translated to statements of the preference language described in Section 3.1. These statements are then sent to the servlet-based server through HTTP requests. The server checks the validity of the received requests and analyzes them using a parser of the preference language described in Chapter 3. If the action is valid, it is passed as input to the appropriate preference algorithm (as described in Chapters 3 and 4). To query the underlying RDF information base, we use Jena through a Data Manager component, which abstracts the details of this particular component. Finally, the computed preference relation, and therefore the preference bucket order, is sent to the State component, which in turn updates the UI through an HTTP response.

⁴ Hippalus was a Greek navigator and merchant who probably lived in the 1st century BCE. He is credited with having discovered the direct route from the Red Sea to India over the Indian Ocean by plotting the scheme of the sea and the correct location of the trade ports along the Indian coast, and by taking advantage of the monsoon wind.

⁵ The information base that feeds Hippalus is represented in RDF/S, using a schema adequate for representing objects described according to dimensions with hierarchically organized values.

⁶ http://jena.apache.org/

Figure 5.6: System Architecture

[Figure 5.7 content: the classes Manufacturer, Drive_System, Vehicle_Type and Transmission, with sample values such as European, American, U.K., U.S.A., Aston_Martin, Jeep, Car, Truck, Manual, Semi-automatic, 2-Wheel_Drive, All-Wheel_Drive and "2-Wheel_Drive, Rear"]

Figure 5.7: The RDF Knowledge Base

5.2.2 Visualization and User Interface

Regarding visualization and FDT, one has to decide where and how to visualize: the focus (current object set), the facets, the zoom points (and their count information), the intensional description of the current state, and finally the information related to preferences. These are the main decisions.

The most widely adopted approach or policy (evidenced by the UI design of global systems like booking.com) is to use a left bar for the facets and the corresponding zoom points, the right area for the scrollable list of objects in the focus, and a small top area for the description of the current state. For each of these elements, various visual elements can be used. A thorough description is available in Chapter 4 of the book by Sacco and Tzitzikas (2009).

In our case we have to decide where to show the preference-related information and actions, since

this has not been supported by any system so far. Regarding preference actions, one approach is to

provide the preference-related action through right-click activated pop-up menus. This policy does

not require allocating permanent screen space for these actions. However the user should be aware

that these options exist. The design of the preference actions, includes actions that are anchored to

one element, and this makes the right click activated actions straightforward. However, the proposed

preference based framework also supports actions that concern two elements (i.e. relative preferences

like German ≻ Italian).

Regarding the way the description of the current state is shown to the user, the user should be able to view not only the intensional description of his current state, but also the accumulated preferences that he has formulated. Finally, the user should be able to store and load his preferences, since exploration is a time depth process.

Based on the above requirements, we have designed a Web application that offers exploration services for a set of objects described using several dimensions, where all this information is represented in RDF. In this case we map facets and terms from FDT to classes and subclasses respectively. The preference actions are offered through HTML5 context menus⁷ and AJAX, which are enacted by right clicking in the browser window. The user is able to order classes, subclasses and objects using best, worst, prefer to actions (i.e. relative preferences), around to actions (over a specific value), or actions that order them lexicographically, or based on their values or count values. Furthermore, he can compose object-related preferences, using Priority, Pareto, Pareto Optimal, and Combination⁸ compositions, by selecting the appropriate composition mode and selecting classes through the classes’ context menus. The default composition is Combination. Regarding objects, since their number can be very large, the user is able to define a threshold, so that preferences are applied only when the number of objects is reduced under this threshold⁹. Options and parameters regarding the system functionality can be set through a drop-down menu (i.e. simple or full support of preference menus, threshold, evaluation parameters, load-save etc.).

Figure 5.8: Hippalus: The Main Page of Hippalus. (a) Shows the Area where Facets and Terms are Displayed, (b) the Ranked Objects Area, (c) the Preference Actions History and Composition Tool, (d) ‘Interesting Objects’ Tool (i.e. Like a Shopping Cart) and (e) the Object Restriction History

⁷ Available only in Firefox 8 and later.

⁸ Order according to priorities if defined. The rest of the actions use Pareto composition and are the least prioritized.

⁹ The user can reduce the number of objects (simple menus support only actions affecting objects), by navigating over the classes, subclasses, and objects and restricting his focus.

5.2.3 Interaction Example

For demonstration purposes as well as for the needs of the user study (described in detail in Section 6.5) we have constructed an information base about cars. Each car is described using classes like Manufacturer and Drive_System, which are hierarchically organized, while the rest, like Vehicle_Type, are flat (as shown in Fig. 5.7). In this figure, continuous arrows denote subClassOf relationships while dashed arrows denote typeOf relationships. The information base contains 50 cars, indexed under 23 classes and 85 subclasses.

Here we describe a more complete scenario demonstrating how hard and soft constraints can be specified by the user, in an easy and gradual manner. It also aims at making clear the merits of the underlying preference framework (preference inheritance and scope-based conflict resolution). A video showcasing this scenario is available online¹⁰.

Figure 5.8 shows the main page of Hippalus over the collection of 50 cars. Specifically, part (a)

shows the attributes, their values (which can be hierarchically organized), accompanied by the number

of their occurrences, where the user can restrict his focus or express preferences anchored to them.

Part (b) depicts the objects area, which is ranked according to preference, part (c) shows the preference

actions history and composition tool, part (d) displays the ‘Interesting Objects’ tool (i.e. like a shopping

cart) and finally, part (e) the object restriction history.

Figure 5.9 shows that one can expand broad values, like Asian (from the attribute Manufacturer), and that by clicking on the value Korean the focus is restricted to three Korean cars. Notice that the left bar has been updated, i.e. only the values that appear in the restricted set are presented (all attributes have counts up to 3). With additional clicks the user can further reduce the focus, e.g. from the attribute Fuel Type we can see that one of the cars consumes Diesel and two cars Gasoline. By clicking on Gasoline we see these two cars, and by hovering the mouse over one of them the user gets its “Object Card” showing all attributes of that car. In the bottom right frame the user can see the history of his clicks and can undo any click.

Preferences are activated through right-click menus. Suppose we cancel all clicks and assume that we want to express that we prefer Korean cars to European ones. This means that we do not want to see only Korean cars; we just want to get them ranked higher than European ones. This is shown in Figure 5.11.a (top), where we see that the user is now getting a linear list of blocks of equally preferred objects: the first contains Korean cars and the next one European cars (thanks to inheritance the user does not have to say anything about German, Italian, French, etc.).

Figure 5.9: Hippalus: Value Expansion - Object Restriction

Figure 5.10: Hippalus: Expression of Relative Preference Korean ≻ European

¹⁰ http://www.youtube.com/watch?v=Cah-z7KmlXc

It is important that preferences can be expressed incrementally, and at any point during the interaction, e.g. suppose that the user also prefers prices around 12,000. He can use the action around 12090 as shown in Figure 5.11.a (bottom). We can see that the object order now becomes more refined (the figure shows 14 blocks). Notice that the first block contains one Korean (Hyundai) and one Fiat. This happens because both of his preference actions have the same priority (and the Fiat is closer to 12090). If the user wants to give higher priority to one preference he can use the right frame dedicated to this. Figure 5.11(b) shows the object order obtained after expressing that the preferences on manufacturers have higher priority than the preferences over prices.
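The difference between equal-priority and prioritized composition in this scenario can be sketched as follows. The sketch is a simplified illustration and not the algorithms of Chapters 3 and 4: the four cars and their prices are invented, each preference is reduced to a numeric rank (lower is better), blocks are obtained by repeatedly removing the non-dominated objects, and all function names are hypothetical.

```python
from itertools import groupby

# Hypothetical cars: name -> (origin, price).
cars = {
    "Hyundai": ("Korean", 12500),
    "Kia": ("Korean", 15000),
    "Fiat": ("European", 12090),
    "BMW": ("European", 30000),
}
ORIGIN_RANK = {"Korean": 0, "European": 1}  # Korean ≻ European
TARGET_PRICE = 12090                        # price around 12090


def score(name):
    """(manufacturer rank, distance from the target price); lower is better."""
    origin, price = cars[name]
    return (ORIGIN_RANK[origin], abs(price - TARGET_PRICE))


def dominates(a, b):
    """a is at least as good as b on both preferences and differs from it."""
    sa, sb = score(a), score(b)
    return all(x <= y for x, y in zip(sa, sb)) and sa != sb


def pareto_blocks(names):
    """Equal priority: repeatedly peel off the objects nobody dominates."""
    remaining, blocks = set(names), []
    while remaining:
        best = {a for a in remaining
                if not any(dominates(b, a) for b in remaining)}
        blocks.append(sorted(best))
        remaining -= best
    return blocks


def priority_blocks(names):
    """Manufacturer first, price second: a lexicographic order on scores."""
    ordered = sorted(names, key=score)
    return [sorted(group) for _, group in groupby(ordered, key=score)]


equal_priority = pareto_blocks(cars)
manufacturer_first = priority_blocks(cars)
```

With equal priority the first block holds both the Hyundai and the Fiat, since neither is better on both preferences; once manufacturer is given priority, the Korean cars move ahead of the Fiat regardless of price.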

At any time the user can click on a value from a facet to restrict the current focus, which is now a

preference-based list of cars. For instance, if the user wants to see only cars having two doors, he can

click on 2 in the attribute Doors. We can see that now he gets only 8 cars, which are ranked according

to his preferences so far. The user could cancel this extra restriction from the object restriction history.

In general the user can combine object restriction (or relaxation) actions and preference actions in any order. Figure 5.15 shows the ranked list of objects, after restricting our focus only to cars that have 2 doors. The two previous preference actions (i.e. Korean ≻ European and price around 12090) are used for the final preference ranking of objects.

The composition of preference actions is shown in Figure 5.12. Specifically, we have created two priority levels, by pressing the ‘Add Priority Level’ button. Then we defined the desired priority order, by dragging and dropping facets to the appropriate priority levels. As the user changes the priority levels, for example by placing Manufacturer in Level 1 priorities and Price in Level 2 priorities, the system calculates on the fly the new order of objects. Notice how refined the ranking of objects is, because of the second preference action that orders the cars around the price 12090. If we reverse the priorities, the order of objects changes (Figure 5.13). The default composition mode is the Combination mode. This mode, shown in Figure 5.14, behaves like Pareto if no priorities are defined.

Figure 5.11: Hippalus: (a) Expressing Preferences, (b) Object Restrictions after Preference Expression

Figure 5.12: Hippalus: Composition of Preference Actions. Manufacturer Prioritized to Price

Figure 5.13: Hippalus: Composition of Preference Actions. Price Prioritized to Manufacturer

Figure 5.14: Hippalus: Composition of Preference Actions. Default Combination Mode

Figure 5.15: Hippalus: Restricted Focus with Preferences Applied

Chapter 6

Evaluation

Contents

6.1 Evaluation Approaches & Metrics
6.1.1 Metrics for Exploratory Search
6.1.2 Metrics Related to the Proposed Interaction Scheme
6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT
6.3 DiFEPreKO Hypothesis
6.3.1 Analytical Comparison
6.3.2 User Study
6.4 Evaluation of Various Exploration Approaches
6.5 Evaluation of Hippalus System
6.6 Evaluation Conclusion

The objective of this chapter is to elaborate on how the proposed preference-based interaction scheme can be evaluated. Specifically, Section 6.1 reviews the related work, identifies the metrics and evaluation approaches that have been proposed or used and are related to our interaction scheme, and proposes new metrics for decision making. Subsequently, Section 6.2 studies theoretically the convergence of FDT-based UIs and the required user effort with and without preference actions. Afterwards, Section 6.3 introduces, and evaluates through a simple experiment, a hypothesis stating that without the ability to explore the existing choices, the expression of preferences can be time-consuming and result in incomplete preferences. Furthermore, Sections 6.4 and 6.5 discuss two user-based evaluations. The first shows the effectiveness of FDT for exploratory tasks, while the second evaluates the proposed preference-based scheme over Hippalus and discusses the results. Finally, Section 6.6 concludes the chapter.

6.1 Evaluation Approaches & Metrics

Here we discuss a number of exploratory search metrics and we identify metrics that are relevant to our

proposed preference-based interaction scheme. Furthermore, we study theoretically the convergence

of FDT-based UIs and the required user effort with or without preference actions.

6.1.1 Metrics for Exploratory Search

One characteristic of any ES approach is that it is session-based. By session-based we refer to a dialogue between the user and the system in which the response of the system (e.g. answer, branch shown) depends not only on the current user request (e.g. query, click) but also on the user's previous requests and the session history in general. Furthermore, according to Marchionini (2006), ES is recall-oriented. As a result, standard single-query metrics like the traditional Precision and Recall metrics, or Instance Recall (Over (1997)), which allows multiple queries per session (rewarding the number of distinct relevant answers identified in a session of a given length), are inadequate for evaluating session-based information tasks.

The evaluation of systems offering session-based IIR is difficult because of the complexity of data relationships, the diversity of displayed data, and the interactive nature of exploratory search, along with the perceptual and cognitive abilities offered. Such systems rely heavily on users’ ability to identify and act on exploration opportunities, as described in White et al. (2007).

Discussions of available methods and metrics for evaluating experimental UIs for web searching are provided in the works of Käki and Aula (2008) and Kelly et al. (2009). Kanoulas et al. (2011b) give an overview of data collections and metrics for the evaluation of session-based IR¹. Finally, a recent survey regarding the evaluation of web retrieval effectiveness is provided in Carterette et al. (2012). According to it, we can group evaluation methods and metrics into three categories. The first includes traditional metrics, which do not make any assumption regarding the user. The second uses simple user models, and the third uses advanced user models. Figure 6.1 shows the different groups of metrics, which are described below.

¹ The 2012 session can be visited at http://ir.cis.udel.edu/sessions/index.html

First Group of Metrics: No User Model

The first group of metrics assumes binary relevance (i.e. a document is either relevant or not) and is based on sets of documents and not ranked lists. Specifically, it includes the traditional metrics of Precision and Recall from the Cranfield studies and their combinations, like F-measure and Average Precision. Average Precision is the most widely used metric in IR.

Second Group of Metrics: Simple User Model

The second group includes metrics that make simple assumptions about user behaviour. For example, the Expected Search Length metric described in Cooper (1968) assumes that the user walks down a ranked list of documents and observes every document until a stopping point, which is the point where the information need is satisfied. This metric uses a cost function for each visited document, based on the relevance of the document. In addition, Robertson (2008) demonstrates that Precision and Recall can be redefined using the above user model, by defining the utility of each document. In this case, Precision becomes a measure of utility and Average Precision becomes an expectation of utility over a number of browsing decisions. Furthermore, Robertson et al. (2010) proposed the Graded Average Precision (GAP), a measure that redefines Average Precision by taking into consideration different relevance grades. Specifically, this metric assumes that the user regards as relevant those documents that have a relevance value over a specific threshold.

The Rank Biased Precision (RBP) measure, described in Moffat and Zobel (2008), tries to incorporate the user’s persistence in examining a certain number of documents in the results list (e.g. the user looks only at the first result, or at the top ten results). However, this measure does not take into account the quality of the answer. Another related metric is the Discounted Cumulated Gain (DCG) and its variations. The best known is the normalized Discounted Cumulative Gain (nDCG), described in Järvelin and Kekäläinen (2002), which uses a graded relevance scale of documents and measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated from the top of the list to the bottom, with the gain of each result discounted at lower ranks. The result is normalized by dividing by the DCG of the ideal ranked answer set. Schuth and Marx (2011) suggest an adaptation of nDCG for FDT-based information systems. Specifically, they are interested in which facet-value pairs will be presented to the user. They also propose nrDCG, which is a recursive version of DCG.

Figure 6.1: Available IR Metrics

In the same manner, Expected Reciprocal Rank (ERR), described in Chapelle et al. (2009), tries to overcome the problem of DCG and RBP that a document in a specific position always contributes the same gain, by taking into account the quality of the response of the system. It is a popular measure for tasks that return a single relevant document and is based on the cascade user model. This model assumes a user who accumulates utility by stepping down the ranked list and decides whether to continue browsing based on the accumulated utility. Yilmaz et al. (2010) propose the EBU metric, a metric similar to ERR, which uses the same cascade user model but in addition takes into consideration the effect of document snippets.

Third Group of Metrics: Advanced User Model

The third category includes metrics that use more advanced user models. We can consider two different subfamilies for these metrics. The first includes metrics that take into consideration novelty and diversity. Examples include subtopic generalizations of recall and precision as described in Zhai et al. (2003), where the user gets utility from each different topic that was retrieved. In addition, the intent-aware family of measures described in Agrawal et al. (2009) assumes that there is a probability distribution over subtopics. In Clarke et al. (2008), the α-nDCG metric takes duplicate text into account by penalizing it. Finally, Chapelle et al. (2011) describe an intent-aware ERR, computed as a weighted average of ERR over intents.

The second subfamily includes the metrics that are session-based. This subfamily includes nsDCG, a variant of nDCG for sessions, which is described in Järvelin et al. (2008) and incorporates a cost for each query reformulation. Furthermore, the work described in Yang and Lad (2009) proposes a theoretical probabilistic framework that takes into consideration the user interactions over multi-session ordered lists, in order to evaluate and optimize information distillation². The associated user model is a user who steps down a list until a point where he reformulates his query and begins again from the new ranked list. Finally, recent work of Kanoulas et al. (2011a) generalizes the traditional measures of IR, such as Precision, Recall and Average Precision, to a session. These metrics assume that a user steps down a ranked list until a point where he either reformulates his query or abandons the search.

² Information Distillation is an emerging area of research, which focuses on the effective combination of ad-hoc IR, novelty detection and adaptive filtering.

Other Evaluation Approaches

In addition to all the above, the work of Kules et al. (2009) examined the interaction with a faceted online library catalogue and found that facets are very important in exploratory processes. Azzopardi (2009) represents the usage of an ES as a stream of documents and studies the performance of such systems based on time and usage. Kules and Capra (2008) discuss ways to create exploratory tasks for faceted search UIs. Wilson and Schraefel (2007) propose a method for evaluating exploratory search by blending IR frameworks with HCI design. Works that use statistical methods such as factor analysis (FA) and structural equation models (SEM) in order to examine the interrelationships between multiple evaluation criteria are described in Toms et al. (2005) and O’Brien et al. (2008) respectively. Finally, Carterette et al. (2011) simulate user behaviour by using ‘click’ data and a Bayes procedure.

6.1.2 Metrics Related to the Proposed Interaction Scheme

Regarding the evaluation of our preference-based interaction scheme, we will consider both non-session-based and session-based metrics, which can be measured at each step of the interaction. We include both kinds so that we can determine how each user action affects the result set and the user task, respectively.

Non-session based metrics

The following non-session based metrics could be beneficial for the evaluation of our approach:

Average Precision. It is one of the most commonly used metrics, since it takes into consideration

both precision and recall. It is calculated by the following formula:

$$AP = \sum_{i=1}^{n} p(i)\,\delta r(i) \tag{6.1}$$

where p(i) is the precision at position i of the search results, δr(i) is the difference in recall from the document in position i − 1 to the document in position i, and n is the number of objects in the result set.
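As an illustration of Equation 6.1, the following minimal Python sketch computes Average Precision for a single ranked list with binary relevance judgements. The function name `average_precision` and the list-of-booleans input are assumptions made for this example; unless a total is supplied, recall is measured against the relevant documents that appear in the list.

```python
from typing import Optional, Sequence


def average_precision(relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """AP = sum over positions i of p(i) * delta_r(i) (Equation 6.1)."""
    if total_relevant is None:
        # Assumption: every relevant document appears somewhere in the list.
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    ap = 0.0
    for i, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_at_i = hits / i
            # Recall changes only at relevant positions, by 1 / total_relevant.
            delta_recall = 1 / total_relevant
            ap += precision_at_i * delta_recall
    return ap


ranked = [True, False, True]  # hypothetical judgements for a three-item answer
score = average_precision(ranked)
```

Positions holding non-relevant documents contribute nothing, since recall does not change there.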

normalized Discounted Cumulative Gain - nDCG. Discounted Cumulative Gain is a metric described in Järvelin and Kekäläinen (2002), which promotes systems that return relevant documents near the top of the answer set and penalizes systems that return relevant documents at the bottom of the answer set. It is calculated by the following formula for position k:

$$DCG_k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)} \tag{6.2}$$

where $rel_i$ is the relevance of document i and $rel_i \in [0, 1]$. The normalized DCG, i.e. nDCG, at position r is calculated by dividing $DCG_r$ by the $IDCG_r$ value, which is the ideal $DCG_r$ value (i.e. the value obtained when the documents are returned in the optimal order). Specifically,

$$nDCG_r = \frac{DCG_r}{IDCG_r} \tag{6.3}$$
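Equations 6.2 and 6.3 can be sketched in a few lines of Python. The function names `dcg` and `ndcg` are assumptions for this example, and the ideal ordering is approximated by sorting the relevance grades of the returned list, which presumes that all judged documents are present in it.

```python
import math
from typing import Sequence


def dcg(rels: Sequence[float], k: int) -> float:
    """DCG_k = sum_{i=1..k} (2^rel_i - 1) / log2(i + 1) (Equation 6.2)."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))


def ndcg(rels: Sequence[float], k: int) -> float:
    """nDCG_k = DCG_k / IDCG_k (Equation 6.3)."""
    # Assumption: the ideal list is the same set of grades in descending order.
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgements in [0, 1], listed in the returned order.
value = ndcg([0.0, 1.0, 0.5], k=3)
```

A list that is already in the ideal order yields a value of 1, and any other order yields a smaller value.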

normalized Discounted Cumulative Gain - nDCG for FDT. An adaptation of nDCG for FDT-based

information systems by taking into consideration the facet-value pairs, is described in detail in Schuth

and Marx (2011). This metric focuses on two aspects: (a) prefer facet-values that would return a lot

of relevant documents high in the return list and (b) prefer facet-values that would return relevant

documents we have not seen by earlier facet-values.

normalized recursive Discounted Cumulative Gain - nrDCG for FDT. A recursive version of nDCG for FDT is also proposed in Schuth and Marx (2011). Such a metric could be very useful for suggesting the top-K most valuable facet-values to the user when the display area is limited (e.g. on mobile devices). Furthermore, these metrics could provide the default ordering of facets and their values (in addition to the lexicographic, value and count based orderings).

normalized Expected Reciprocal Rank - nERR. This is a metric that takes into consideration the

usability of the documents in the answer set and is described in detail in Chapelle et al. (2009). This

metric is calculated by the following equation:

$$ERR = \sum_{r=1}^{n} \frac{1}{r}\, P(\text{user stops at position } r) \tag{6.4}$$

where P(user stops at position r) is the probability that the user stops searching after the document in position r. This probability is calculated by the following equation:

$$P(\text{user stops at position } r) = \prod_{i=1}^{r-1} \bigl(1 - R(rel_i)\bigr)\, R(rel_r) \tag{6.5}$$

where R(reli) is the probability that document i satisfies the user. In more detail R(rel) is calculated

by the following equation:

$$R(rel) = \frac{2^{rel} - 1}{2^{\text{max rel}}} \tag{6.6}$$

where max rel is the maximum relevance score. While there is no justification for using this formula

(like in the gain function of DCG), values could be inferred from logged user data.

ERR can be normalized (nERR), by dividing with the maximum ERR for a specific query.
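Equations 6.4 to 6.6 translate directly into the cascade computation sketched below. The names `satisfaction`, `err` and `nerr` are assumptions for this example, and the maximum ERR used for normalization is taken to be the ERR of the same relevance grades sorted in descending order.

```python
from typing import Sequence


def satisfaction(rel: int, max_rel: int) -> float:
    """R(rel) = (2^rel - 1) / 2^max_rel (Equation 6.6)."""
    return (2 ** rel - 1) / 2 ** max_rel


def err(rels: Sequence[int], max_rel: int) -> float:
    """ERR = sum_r (1/r) * P(user stops at position r) (Equations 6.4, 6.5)."""
    score = 0.0
    p_reach = 1.0  # probability that the user was not satisfied before rank r
    for r, rel in enumerate(rels, start=1):
        p_satisfied = satisfaction(rel, max_rel)
        score += p_reach * p_satisfied / r
        p_reach *= 1 - p_satisfied
    return score


def nerr(rels: Sequence[int], max_rel: int) -> float:
    """Normalize by the ERR of the ideal ordering (assumed: grades sorted descending)."""
    best = err(sorted(rels, reverse=True), max_rel)
    return err(rels, max_rel) / best if best > 0 else 0.0


# Hypothetical grades on a 0..2 scale, listed in the returned order.
example = nerr([0, 2, 1], max_rel=2)
```

The running product `p_reach` is the product term of Equation 6.5, updated one rank at a time instead of being recomputed for every r.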

Session-based Metrics

Session-based metrics include:

Session-based Precision, Recall and Average Precision. These metrics extend the classic preci-

sion, recall and average precision metrics for sessions. They are described in detail in Kanoulas et al.

(2011a).

normalized session Discounted Cumulative Gain - nsDCG. Järvelin et al. (2008) extend DCG and nDCG to a session. This specific metric also takes into consideration the number of queries. The bigger the number of queries, or the number of interactions in the case of exploratory systems, the smaller the value of the metric. Specifically, nsDCG is calculated by:

$$nsDCG(q) = \frac{(1 + \log_{bq} q)^{-1} * DCG}{IDCG} \tag{6.7}$$

where q is the number of the query or user interaction and 1 < bq < 1000.
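A minimal sketch of Equation 6.7 follows. The function name `nsdcg` and the default base `bq = 4.0` are assumptions chosen for illustration only; the DCG and IDCG values are passed in as already computed numbers.

```python
import math


def nsdcg(q: int, dcg_value: float, idcg_value: float, bq: float = 4.0) -> float:
    """nsDCG(q) = (1 + log_bq(q))^-1 * DCG / IDCG (Equation 6.7)."""
    if q < 1:
        raise ValueError("q counts queries or interactions and starts at 1")
    if not 1 < bq < 1000:
        raise ValueError("bq must satisfy 1 < bq < 1000")
    if idcg_value <= 0:
        return 0.0
    session_discount = 1 / (1 + math.log(q, bq))
    return session_discount * dcg_value / idcg_value


# Hypothetical values: the same result quality reached at interaction 1 and 4.
first = nsdcg(1, dcg_value=2.0, idcg_value=4.0)
fourth = nsdcg(4, dcg_value=2.0, idcg_value=4.0)
```

The first interaction is not discounted, because the logarithm of 1 is 0, while later interactions are discounted more heavily the smaller bq is.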

6.2 Theoretical Analysis of the Number of User Decisions and Effort in FDT

Here we try to measure the number of choices that a user has to make in order to reach (through exploratory browsing) the desired object, assuming that all objects are described by one or more hierarchically organized attributes. Specifically, we theoretically discuss the convergence of FDT-based UIs and the required user effort with and without preference actions.

Convergence of FDT Exploration

The algorithm presented in Section 4.2 (Alg. PrefOrderOpt) is based on the assumption that the focus A can be reduced very fast in an FDT-based interaction. In this section we report an analysis that justifies this claim.

Consider one taxonomy having the form of a complete and balanced tree of depth d and degree b.

Let n be the number of objects in the information base (which are indexed by that tree). In that case

b ∗ d is the number of choices a user has to see in order to reach (select) a particular leaf (i.e. the number

of terms whose label the user has to read if he starts from the root of the tree), and d is the number of

decisions (i.e. clicks) he has to make. If we want each object to have a distinct description (assuming that

each object is classified to one leaf of the taxonomy), then this means that:

$$b * d = b * \log_b n = \sqrt[d]{n} * d \tag{6.8}$$

The real-valued degree b that minimizes the product b ∗ d is Euler’s number e, so let us assume that b = 3, the closest integer, is the most beneficial degree. If each leaf should index 10 objects, then for $n = 10^{11}$ objects we need $10^{10}$ descriptions. Assuming b = 3 we get $b * d = 3 * \log_3 10^{10} \cong 3 * 21 = 63$ choices, and $d \cong 21$ clicks.
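The claim about the most beneficial degree can be checked numerically. The sketch below, with the assumed helper name `choices_single_taxonomy`, evaluates the product b ∗ d = b ∗ log_b(n) for several integer degrees over a complete and balanced taxonomy; the number of descriptions is the one used in the text.

```python
import math


def choices_single_taxonomy(b: float, n_descriptions: float) -> float:
    """Displayed choices b * d, with depth d = log_b(n_descriptions)."""
    return b * math.log(n_descriptions, b)


n_descriptions = 1e10  # leaf descriptions, as in the example above
costs = {b: choices_single_taxonomy(b, n_descriptions) for b in range(2, 11)}
best_degree = min(costs, key=costs.get)
depth_for_best = math.log(n_descriptions, best_degree)
```

Because b ∗ log_b(n) equals (b / ln b) ∗ ln n, the ranking of the degrees does not depend on n; only the absolute number of choices does.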

Now suppose that we have k facets $T_1, \ldots, T_k$. Finding the desired description requires selecting one leaf from each $T_i$. As there are k facets, and we must select one leaf from each of them, the overall displayed choices are obtained by multiplying by k. Since we have k facets, we can obtain the n distinct descriptions if each facet has $\sqrt[k]{n}$ leaves (since their cartesian product yields n distinct descriptions). In this case, the depth d of a facet equals $\log_b \sqrt[k]{n}$, and the degree b is $\sqrt[d]{\sqrt[k]{n}} = \sqrt[d * k]{n}$. It follows that the overall displayed choices are:

$$b * d * k = b * \log_b(\sqrt[k]{n}) * k = b * \log_b(n^{1/k}) * k = b * \log_b(n^{k/k}) = b * \log_b n \quad \text{(independent of } d\text{)}$$

$$b * d * k = \sqrt[d * k]{n} * d * k \quad \text{(independent of } b\text{)}$$

Page 142: Papadakos PhD 2013

116 Chapter 6. Evaluation

and the number of clicks required is k∗d. Some indicative values of these parameters are shown in Table

6.1. From the last row we can see, that in order to select the desired 10 objects from a peta-sized (∼ 1015)

information base, the user has to make 30 clicks.

Table 6.1: Choices and Number of Clicks

n/10                            k    b   d   Num. of Choices (b ∗ d ∗ k)   Num. of Clicks (k ∗ d)
531,441 (∼ 10^6)                3    3   4   36                            12
3,486,784,401 (∼ 3.5 × 10^9)    5    3   4   60                            20
∼ 10^15                         10   3   3   90                            30
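The rows of Table 6.1 follow from the formulas above: k facets of degree b and depth d yield b^(d∗k) distinct descriptions, b ∗ d ∗ k displayed choices and k ∗ d clicks. The short sketch below recomputes them; the function name `faceted_cost` and the dictionary layout are assumptions for this example.

```python
from typing import Dict


def faceted_cost(b: int, d: int, k: int) -> Dict[str, int]:
    """Descriptions, displayed choices and clicks for k balanced facets."""
    return {
        "descriptions": b ** (d * k),  # each facet has b^d leaves
        "choices": b * d * k,          # terms the user has to read
        "clicks": k * d,               # decisions the user has to make
    }


# (k, b, d) triples taken from the three rows of Table 6.1.
rows = [faceted_cost(b=b, d=d, k=k) for k, b, d in [(3, 3, 4), (5, 3, 4), (10, 3, 3)]]
```

The first two rows reproduce the exact description counts listed in the table, 531,441 and 3,486,784,401.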

In the previous analysis we have considered plain faceted taxonomies, not dynamic ones. According

to FDT, during the interaction process the only displayed terms are those whose addition to the current

selection yields a conjunction having a non empty extension. So, although the number of clicks will not

be reduced, the number of choices (i.e. the number of terms the user has to read) will be less, since each

displayed term will not have all of its b children active.

From this small example we can appreciate the potential of FDT for rapidly reducing very large information spaces. We should also mention that the analysis in Sacco (2006b) shows that 3 zoom operations on leaf terms are sufficient to reduce an information base of $10^7$ objects described by a taxonomy with $10^3$ terms to an average of 10 objects.

Plain FDT versus FDT with Preferences w.r.t. User Effort

Note that term-scoped preferences (i.e. those that order the terms of a facet according to the user’s preferences) make the aforementioned choices less laborious, since the more desired options are shown first. Specifically, if we assume that a preference relation has been defined for each facet, and that the most preferred choice from each facet is prompted first (and is unique), then the cost of the required decisions is not b ∗ d ∗ k but 1 ∗ d ∗ k, since the user just clicks on the first choice without having to look at the remaining choices.

Returning to the context of the car selection use case, if we assume that each of the 7 billion people living on this planet sells one car, then for $n = 10^9$ objects we need $10^8$ descriptions if we want to reach a block comprising 10 cars. Assuming k = 10 and degree b = 3, we get that $b * d * k = b * \log_3 10^8 \cong 3 * 17 = 51$ choices have to be displayed (using plain faceted taxonomies), and certainly fewer than 51 using dynamic taxonomies. If we assume that preferences have been defined for each of the k = 10 facets, then the choices are reduced to 17, which is equal to the number of required clicks.

6.3 Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO) Hypothesis Evaluation Through a User Study

In this section we introduce the Difficulty of Formulating Effective Preferences without Knowing the

Options (DiFEPreKO) hypothesis:

Hypothesis: Without the ability to view and explore the existing choices, the expression of preferences is time-consuming and in most cases results in incomplete preferences (i.e. preferences that are not sufficient for selecting the most desired option from a particular set of choices).

Initially, we provide an analytical comparison between extensional and intentional preferences re-

garding effort, completeness and correctness. Afterwards, we describe the conducted user study for

evaluating the DiFEPreKO hypothesis. We present the results and conduct a statistical significance test

to check the randomness of our results.

6.3.1 Analytical Comparison

Let $A_1, \ldots, A_k$ be the k attributes that are used for describing the choices, and let $dom(A_i)$ denote the set of values that $A_i$ can take (for each i = 1..k). Let V be the cartesian product of the domains of the attributes, i.e. $V = dom(A_1) \times \ldots \times dom(A_k)$.

We can consider that a complete (over V) intentionally specified preference aims at defining a linear order of the elements of V. Let ip denote an intentionally specified preference and let $\succ_{ip}$ denote the linear order of V that ip defines.

Now consider a specific set of choices S ($S \subseteq V$). We can consider that a complete extensional preference over a set of choices S aims at defining a linear order of the elements of S. Let ep denote the preference specification and let $\succ_{ep}$ denote the linear order of S that ep defines.

Completeness and Correctness of Intentionally Specified Preferences

Suppose that we have a set S, that the user has defined an order $\succ_{ep}$, and that we consider $\succ_{ep}$ correct. We could say that an ip is correct and complete with respect to $\succ_{ep}$ if the restriction of $\succ_{ip}$ on S is equal to $\succ_{ep}$.

However, note that in decision tasks humans mainly have to select the most desired element (the hotel to book, the car to buy, the place for holidays), and not to order the entire list of available options. Therefore, in our user study (described below), we will consider and compare only the first and the second most preferred elements, i.e. only the two most preferred elements according to $\succ_{ep}$ and $\succ_{ip}$.
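The correctness condition above can be sketched as follows, representing each linear order as a Python list with the most preferred element first. The function names and the single-letter choices are assumptions made purely for illustration.

```python
from typing import Hashable, List, Sequence


def restrict(order_over_v: Sequence[Hashable], s: Sequence[Hashable]) -> List[Hashable]:
    """Restriction of a linear order over V to the elements of S."""
    members = set(s)
    return [x for x in order_over_v if x in members]


def is_correct_and_complete(order_ip: Sequence[Hashable],
                            order_ep: Sequence[Hashable]) -> bool:
    """ip is correct and complete w.r.t. ep if its restriction on S equals ep."""
    return restrict(order_ip, order_ep) == list(order_ep)


def agree_on_top(order_ip: Sequence[Hashable],
                 order_ep: Sequence[Hashable], top: int = 2) -> bool:
    """Weaker check used in the user study: compare only the top elements."""
    return restrict(order_ip, order_ep)[:top] == list(order_ep)[:top]


order_ip = ["a", "b", "c", "d", "e", "f"]  # hypothetical linear order over V
order_ep = ["b", "d", "f", "e"]            # hypothetical linear order over S
full_match = is_correct_and_complete(order_ip, order_ep)
top_match = agree_on_top(order_ip, order_ep)
```

In this hypothetical case the two orders disagree on the full list but agree on the two most preferred elements, which is the situation the weaker check is meant to accept.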

Effort Required for Expressing Complete Preferences

We could quantify, in a very rough way, the effort required for expressing complete intentional preferences with the amount |V|. Similarly, we could quantify, in a very rough way, the effort required for expressing extensional preferences with the amount |S|.

For instance, if we have only one attribute that takes two values, and only two objects, then |V| = |S| = 2, meaning that it is equally laborious to express preferences intentionally or extensionally. If on the other hand we have 10 attributes, each having 10 possible values, and two objects, then |S| = 2 while $|V| = 10^{10}$, indicating that it is much more laborious to express preferences intentionally than extensionally in this case.

The costs specified above do not aim at being accurate; they aim to capture the main point. One could easily refine the costs according to various aspects, for instance according to the type of the attribute values. Specifically, for an attribute $A_i$ we can define $Cost(A_i) = |dom(A_i)|$ if it is categorical; otherwise (i.e. if the domain is arithmetic) we can define $Cost(A_i) = 1$. The latter holds because for arithmetic attributes (e.g. horsePower, fuelConsumption, price) the user commonly just has to express whether he prefers the highest values, the lowest values, or those around a specific value, hence he does not have to inspect the available values. In contrast, for categorical attributes (e.g. bodyType, brand, color), the user has to express his preference on the specific values of the attributes. Based on the above perspective, the cost for specifying complete intentional preferences could then be defined as $Cost(A) = Cost(A_1) * \ldots * Cost(A_k)$ (note that $Cost(A) \leq |V|$).
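The rough cost model above can be made concrete with a small sketch. The `Attribute` type, the function names and the domain sizes below are hypothetical and serve only to show how |V| and the refined Cost(A) differ.

```python
from math import prod
from typing import NamedTuple, Sequence


class Attribute(NamedTuple):
    name: str
    domain_size: int    # |dom(A_i)|
    categorical: bool   # False means an arithmetic domain


def size_of_v(attributes: Sequence[Attribute]) -> int:
    """|V|: size of the cartesian product of all attribute domains."""
    return prod(a.domain_size for a in attributes)


def cost_intentional(attributes: Sequence[Attribute]) -> int:
    """Cost(A): categorical attributes cost |dom(A_i)|, arithmetic ones cost 1."""
    return prod(a.domain_size if a.categorical else 1 for a in attributes)


# Hypothetical car attributes with made-up domain sizes.
cars = [
    Attribute("brand", 10, True),
    Attribute("color", 5, True),
    Attribute("price", 1000, False),
]
v_size = size_of_v(cars)
refined_cost = cost_intentional(cars)
```

Under these assumed sizes the arithmetic attribute dominates |V| but contributes nothing to the refined cost, which is why Cost(A) never exceeds |V|.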


6.3.2 User Study

Thirty persons participated in this user study, 18 male and 12 female, from 7 countries and aged between 22 and 75 years. All of the participants had at least secondary education, while most of them had an MSc or PhD degree³. The experiment had two steps, Step 1 and Step 2.

Step 1

In the first step, all participants were asked to express their preferences, according to the following:

Suppose that you have (it is obligatory) to change your car. You have to select and buy a new one, which you

will use for the next 5 years, and of course you will have to pay it. Please express your preferences on paper. This

paper will be handed to a different person who has at his disposal a limited collection of available cars. This person

will select one for you based on the available cars and the preferences that you expressed.

You have 30 minutes at most to express your preferences. You are free to express them in any form you like,

e.g. in natural language text (e.g. I prefer a car with an engine volume between 1200 and 1400 cc), by providing

an ordering of the firms according to your preference (e.g. Japanese, European or BMW, Audi), by specifying the

preferred (ideal) price, etc. Other characteristics could include year, body type, engine volume, power, max speed,

acceleration, fuel consumption, weight, fuel type, price, trunk, etc.

Please measure how much time you spent on this exercise and give us the paper.

Step 2

Immediately after completing the first step, participants continued with the second step, so that their preferences would not change in the meantime. In this step, users were given a list of cars and were asked to identify which car they would ideally buy. In total, the list consisted of 50 cars and is shown in Fig. 6.2 and Fig. 6.3. Again, users were asked to measure how much time they spent on finding the ideal car.

Results

Subsequently, we checked whether the paper-written preferences of Step 1 would allow someone to obtain the car selected in Step 2. If the answer is “YES”, then the preference expression on paper was complete and sufficient for selecting the ideal car. If the answer is “NO”, then the conclusion is that the user did not manage to express his preferences sufficiently to get the most desired car from the small list of available cars. In addition, we compared the times users spent in both steps.

³ Nine of the persons that participated in this evaluation were participants of the First MUMIA Training Summer School “Building Next Generation Search Systems” (http://www.mumia-network.eu/index.php/training-school-2012), Olympiada, Chalkidiki, Greece.

Figure 6.2: Evaluation Step B: Users Select a Car from the List (1st page)

Figure 6.3: Evaluation Step B: Users Select a Car from the List (2nd page)

In order to check the results, a broker was given the forms of Step 1, with the expressed preferences of each of the participants. Subsequently, he was asked to select the ideal and the second ideal car from the list of Step 2, based on the participants’ preferences.

Users’ preferences were divided into two categories: Specific and General. Specific preferences are preferences that use specific values of the attribute domains (e.g. ’I prefer red to yellow cars’ or ’I want a car with a displacement between 1200cc and 1400cc’). General preferences are preferences that do not use specific values (e.g. ’I want a cheap car’ or ’I want a car that does not pollute the environment’).

The broker used a number of criteria in order to select the most ideal car for a specific user, based

on the user’s preferences expressed in Step 1. The following criteria, ordered according to significance,

were used for ranking the objects of our collection of cars:

1. Specific Preferences Criterion (SPC) Initially we only consider the Specific preferences applicable

to our collection. If the expressed preferences are prioritized (i.e. ’Firstly I prefer a car that costs

less than 10000 Euros, secondly a car with a displacement between 1200cc and 1400cc, etc.’), then

cars are ranked according to the most prioritized preference, then according to the second most

prioritized preference, etc. This is the Prioritized composition we discussed in Section 3.4.1. In the

case that the user did not provide any priority order, preferences were considered equal. When a

number of preferences have the same priority, then Pareto composition is used as it was described

in Section 3.4.2. Lastly, in case of ties, the final bucket order is derived by ordering the cars of each

bucket according to the number of wins per preference, like the rules described in Section 3.3.2.

2. General Preferences Criterion (GPC) If there are still ties when deriving the ideal and second ideal car from the ordering of cars created by the previous step, we take advantage of the General preferences, provided they can be applied to our collection. Specifically, the broker transformed each General preference to an ordering of cars. For example, the preference ’I want a cheap car’ means ordering the cars of each bucket according to their price. Again, the same criteria (Prioritized composition, Pareto composition and the wins rule) were used to derive the ideal and the second ideal car.

3. Broker Assumption Criterion (BAC) Finally, in the very few cases of a second tie, the broker was free to use his own assumptions, such as preferring the cheaper car most of the time, or relying on his own opinion about manufacturers’ reliability. These assumptions were used in our evaluation and were sufficient for the small number of cases in which we had a second tie (i.e. a tie in the General preferences).

An indicative example of the results table is shown in Table 6.2. In the first columns, the table stores

information regarding the user. The Step 1 column, holds information for Step 1, specifically the number

of Specific preferences (SP ), the number of General preferences (GP ), and the total number of prefer-

ences (TP ), which are 3, 8 and 11 respectively in our case. Furthermore, in this specific example, the

user spent 20 minutes to express his preferences (Time). Step 2 column holds information regarding the

second step of the evaluation. Specifically, according to our example, the user selected from the list of

cars the car with id 46 (CID), after searching the list of cars for only 3 min. (Time).

Additionally, there is a grade for this specific car. This grade indicates the priority of the Specific

preferences expressed by the user and is based on the number of Specific preferences this car satisfies.

Specifically, the preference grade PGiz of a preference Pi for car cz can take the following values:

• ✓ (i.e. the preference Pi is satisfied for this car)

• − (i.e. the preference Pi is not satisfied for this car)

• ◦ (i.e. the preference Pi is not applicable for this car). These are the inactive elements, which in this case we have considered as worse than elements that satisfy this preference, but better than elements that do not satisfy it.

• a number, if this car satisfies the corresponding value from an ordered set of preference values (i.e. 1 for the first, 2 for the second, etc.). For example, if Fiat ≻ Audi and Audi ≻ Mercedes, then a car made by Fiat would have a value of 1 for this preference, a car made by Audi a value of 2, etc. The cars made by other manufacturers would have a value of 0; these are inactive elements and are considered the worst in this case.

When there was no preference priority, all preferences were considered equal, i.e. the grades would be {PG1z, ..., PGnz} for a specific car cz. On the other hand, if P1 is prioritized over P2, which is prioritized over all the other preferences for car cz, then the grade would be {{PG1z}, {PG2z}, {PG3z, ..., PGnz}}. In our example, P1 is prioritized over P2 and P3, which have equal priority. Notice that for space reasons, we provide only grades regarding the SPC criterion.

User
User Information                   Step 1               Step 2
Id  Age  Gen.  Educ.  Country      SP  GP  TP  Time     CID  Grade          Time
16  23   F     MSc    Greece       3   8   11  20 m.    46   {{0}, {−✓}}    3 m.

Broker
Ideal                  2nd Ideal              1st vs 2nd
CID  Grade             CID  Grade             wins in
43   {{1}, {✓✓}}       28   {{2}, {✓◦}}       SPC

Results
Br vs Us   General
wins in    NIIP  IIPR  NUP  UPR     S1  S2
SPC        0     0%    5    45.4%   −   −

Table 6.2: Example of Hypothesis Evaluation Results

The next columns concern the broker, and specifically the first and second ideal cars that he proposes. Again we store the car ids (CID) and their corresponding grades. The next column (1st vs 2nd) describes the step of the broker's criteria process in which the ideal car was preferred over the 2nd ideal car, and takes a value from {SPC, GPC, BAC}.

Finally, the last seven columns give us an overview of the results. The first column, Br vs Us, describes the step of the broker's criteria process in which his ideal car was preferred over the car selected by the user, and again takes a value from {SPC, GPC, BAC}. The next column, Number of Intentional Inconsistent Preferences (NIIP), holds the number of preferences expressed by the user in Step 1 that were intentionally overridden when the user selected the ideal car from the collection (e.g. choosing a car with a displacement of 1500cc after expressing a preference for an engine smaller than 1400cc). Intentional Inconsistent Preferences Percentage (IIPP, labelled IIPR in the tables) holds the percentage of overridden preferences over the number of Specific preferences. Number of Unused Preferences (NUP) holds the number of preferences that were not used because they were not applicable in our collection, and Unused Preferences Percentage (UPP, labelled UPR in the tables) holds the percentage of unused preferences over the total number of preferences that the user expressed. Finally, S1 marks whether the user selected the same car as the ideal car proposed by the broker and S2 marks whether the user selected the same car as the second ideal car proposed by the broker.
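
As a minimal sketch of how the two percentage columns relate to the counts, the following Python snippet recomputes them for the example row of Table 6.2. The helper name `percentage` and the variable names are illustrative assumptions and are not part of the evaluation tooling; the denominators follow the definitions above (Specific preferences for the overridden ones, all expressed preferences for the unused ones).

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage; 0.0 when the denominator is zero."""
    return 100.0 * part / whole if whole else 0.0


# Example row of Table 6.2: SP = 3, TP = 11, NIIP = 0, NUP = 5.
sp, tp, niip, nup = 3, 11, 0, 5

iipp = percentage(niip, sp)  # overridden preferences over Specific preferences
upp = percentage(nup, tp)    # unused preferences over all expressed preferences

print(f"IIPP = {iipp:.2f}%, UPP = {upp:.2f}%")
```

For this row 5/11 is roughly 45.45%, which the table appears to report truncated to one decimal (45.4%).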

The results of the evaluation are shown in Table 6.3. Only 6 out of the 30 participants (20%) were able to select in Step 2 the car that was ideal according to the preferences they expressed in Step 1. This supports our initial hypothesis that expressing preferences without exploring the existing choices results in incomplete preferences. In addition, when the user did not select the ideal car according to his preferences (the remaining 24 cases), the broker's ideal car beat the user-selected car, according to the preferences expressed in Step 1, in 79.16% of the cases during the Specific preference criteria phase (SPC), in 16.67% during the General preference criteria phase (GPC), and in only 4.16% during the broker's assumption phase (BAC).

If we also take into consideration the 2nd ideal car proposed by the broker, then the number of successful participants rises to 10 (33.3%). Notice though that 75% of these 2nd ideal cars lost to the ideal car during the Specific preference criteria phase (SPC). This means that the ideal car is clearly preferred to the 2nd ideal one, according to the Specific preferences expressed by the user. The remaining 25% of these cars lost during the broker's assumption phase (BAC). Regarding the 1st vs 2nd column, we can see that the broker was able to discriminate the ideal from the 2nd ideal car during the SPC phase in 40% of the cases, during the GPC phase in 36.6% and during the broker-biased BAC phase in 23.3%.

Furthermore, we can see that participants spent on average 10 minutes in Step 1 (the worst case was a user who used the whole 30-minute time slice). Users tried to take into consideration every aspect they could imagine, since a test collection with the viable choices was not available. As a result, the process of preference expression was time consuming and also led to a number of inconsistent preferences. Specifically, each participant expressed on average 9.73 preferences, of which 6.7 were Specific preferences and the remaining 3.06 were General preferences. Of the 6.7 Specific preferences, 1.53 were inconsistent (i.e. the user finally selected an ideal car that did not satisfy them). In addition, 3.2 preferences per user were not applicable to our collection and, as a result, were not used. This result shows that users spend a lot of time expressing preferences that are either inconsistent with their final decision or not applicable to the selection of the ideal car.

On the other hand, participants spent on average 4 minutes to find the ideal car in the list of 50 cars in Step 2. We can argue that in the case of a list of thousands of cars, participants would have to spend a lot more time in order to find the ideal car. In this case we could exploit available information thinning approaches and the proposed preference framework.

Another important conclusion is that only 2 out of the 30 participants provided a prioritized list of

preferences. This might explain the differences between the ideal car users selected and the ideal car

that was picked by the broker, since for example the price is one of the most important factors when

purchasing a car.

Finally, we can conclude that even though most of the participants have a high educational level, they were not able to provide the appropriate preferences that could lead to the ideal car for them. They spent on average 10 minutes providing preferences that were either not applicable or were overridden when they chose the ideal car from the test collection.

Statistical Significance Test

We also conducted a statistical significance test to check whether our results could be attributed to chance. In our evaluation we have dichotomous data, where each individual in the sample is classified in one of two categories. The first category contains the individuals who expressed preferences that can lead to the ideal car for them (CIdeal) and the second category the individuals who expressed preferences that could not lead to the ideal car (CNon−Ideal). A suitable statistical test in our case is a one-tailed (lower-tailed) binomial significance test, since we have dichotomous data, observations are independent of each other, the probabilities of success and failure are constant across trials and the critical region falls at one end of the possible values (Griffiths (2009)).

Our null hypothesis H0 is:

Null Hypothesis (H0) More than half of the users expressed their preferences without exploring available cars,

and were returned the ideal car for them from a car collection.

Then the alternative hypothesis H1 is:

Alternative Hypothesis (H1) Less than half of the users expressed their preferences without exploring available cars, and were returned the ideal car for them from a car collection.

In our case, we want the user selected car id (cidu) (i.e. the ideal car from the car collection for the

user) and the first ideal car id selected by the broker (cidb) to be the same (cidu = cidb), for more than

half of the cases.


User
User Information                   Step 1              Step 2
Id  Age  Gen.  Educ.     Country   SP  GP  TP  Time    CID  Grade                 Time
1   32   M     MSc       Greece    6   3   9   4 m.    40   {1✓✓◦✓✓}              2 m.
2   25   F     MSc       Greece    4   5   9   5 m.    46   {◦◦✓✓}                3 m.
3   26   F     MSc       Greece    7   0   7   7 m.    31   {✓4−✓◦◦✓}             1 m.
4   27   M     MSc       Greece    3   4   7   5 m.    46   {✓✓✓}                 3 m.
5   26   M     PhD       Greece    5   1   6   6 m.    46   {✓−−✓−}               2 m.
6   30   F     PhD       France    6   1   7   10 m.   47   {−✓✓−◦−◦}             5 m.
7   32   M     Univers.  Austria   8   1   9   10 m.   16   {✓✓◦−−−✓}             3 m.
8   33   F     PhD       Estonia   3   5   8   7 m.    48   {◦−✓}                 1 m.
9   33   M     MSc       Norway    4   7   11  10 m.   45   {−✓✓✓}                3 m.
10  31   F     PhD       Bulgaria  4   6   10  15 m.   34   {✓✓✓✓}                15 m.
11  32   M     PhD       India     10  0   10  7 m.    4    {{✓−},{−◦✓◦◦◦✓◦}}     5 m.
12  31   M     PhD       Greece    7   3   10  5 m.    40   {{✓✓},{✓✓✓✓1}}        2 m.
13  29   M     MSc       Greece    11  0   11  10 m.   34   {◦◦−◦◦◦◦✓✓✓✓}         5 m.
14  28   M     MSc       Greece    10  0   10  10 m.   14   {◦◦✓✓✓✓✓◦−✓}          3 m.
15  37   F     MSc       Greece    15  6   21  13 m.   49   {−✓✓−◦◦◦◦◦◦◦◦◦◦◦}     4 m.
16  23   F     MSc       Greece    3   8   11  20 m.   46   {◦✓✓}                 3 m.
17  76   M     HighSch.  Greece    14  1   15  15 m.   46   {◦◦◦◦◦◦◦✓✓◦−✓−−}      5 m.
18  44   M     PhD       Austria   5   5   10  30 m.   17   {◦✓−✓−}               30 s.
19  25   M     Master    Greece    5   6   11  5 m.    46   {✓−−✓✓}               2 m.
20  42   M     Master    Greece    10  4   14  14 m.   46   {1✓✓◦−}               18 m.
21  47   F     Univers.  Greece    3   2   5   15 m.   15   {✓−−}                 1 m.
22  46   M     Univers.  Greece    8   0   8   5 m.    30   {✓◦✓✓✓✓✓−✓}           1 m.
23  22   M     Univers.  Greece    9   0   9   8 m.    21   {◦✓−✓✓✓−✓−}           1 m.
24  26   F     MSc       Greece    4   0   4   5 m.    30   {◦✓✓✓}                5 m.
25  30   F     PhD       Greece    5   12  17  10 m.   36   {◦◦◦−✓}               5 m.
26  26   F     MSc       Greece    9   0   9   7 m.    31   {◦◦◦◦−−−✓✓}           1 m.
27  25   M     MSc       Greece    5   4   9   15 m.   46   {−✓✓−−}               5 m.
28  32   M     MSc       Greece    9   0   9   12 m.   32   {✓✓−−◦✓✓−−}           5 m.
29  26   F     MSc       Greece    6   4   10  10 m.   48   {−2−✓✓−}              5 m.
30  27   M     MSc       Greece    3   3   6   5 m.    4    {◦✓✓}                 5 m.
Average Values: SP 6.7, GP 3.06, TP 9.73, Time (Step 1) 10.1 m., Time (Step 2) 4.01 m.

Broker and Results
    Ideal                      2nd Ideal                  1st vs 2nd  Br vs Us  General
Id  CID  Grade                 CID  Grade                 wins in     wins in   NIIP  IIPR    NUP  UPR     S1  S2
1   40   {1✓✓◦✓✓}              4    {3✓✓✓✓✓}              SPC         -         0     0%      0    0%      ✓
2   46   {◦◦✓✓}                40   {◦◦✓✓}                GPC         -         0     0%      5    55.5%   ✓
3   30   {✓4✓✓◦◦✓}             38   {✓0✓✓◦◦✓}             SPC         SPC       1     14.2%   2    28.5%   −
4   46   {✓✓✓}                 30   {✓✓✓}                 GPC         -         0     0%      0    0%      ✓
5   15   {✓−−✓✓}               16   {✓−−✓✓}               BAC         SPC       3     60%     0    0%      −
6   10   {−✓✓−✓−◦}             32   {−✓✓−✓−◦}             BAC         SPC       3     50%     1    14.2%   −
7   31   {✓✓✓◦✓✓✓}             34   {✓✓✓◦✓✓✓}             BAC         SPC       3     42.8%   0    0%      −   −
8   48   {◦−✓}                 4    {◦−✓}                 BAC         -         1     33.3%   5    62.5%   ✓
9   41   {−✓✓✓}                42   {−✓✓✓}                BAC         BAC       1     25%     7    63.5%   −
10  30   {✓✓✓✓}                35   {✓✓✓✓}                GPC         GPC       0     0%      5    50%     −
11  6    {{✓✓},{−◦−◦◦◦−◦}}     4    {{✓−},{−◦✓◦◦◦✓◦}}     SPC         SPC       2     20%     3    30%     −   ✓
12  40   {{✓✓},{✓✓✓✓1}}        42   {{✓✓},{✓✓−✓2}}        SPC         -         0     0%      0    0%      ✓
13  11   {◦◦✓◦◦◦◦✓✓✓✓}         16   {◦◦−◦◦◦◦✓✓✓✓}         SPC         SPC       1     9.09%   6    54.5%   −   −
14  26   {◦◦✓✓✓✓✓◦✓✓}          43   {−◦✓✓✓✓✓◦✓✓}          SPC         SPC       1     10%     1    10%     −   −
15  33   {✓✓✓✓◦◦◦◦◦◦◦◦◦◦◦}     50   {−✓✓✓◦◦◦◦◦◦◦◦◦◦◦}     SPC         SPC       2     13.3%   14   82.3%   −
16  43   {✓✓✓}                 28   {✓✓✓}                 GPC         SPC       0     9%      5    45.5%   −   −
17  31   {◦◦◦◦◦◦◦✓✓◦−✓−✓}      46   {◦◦◦◦◦◦◦✓✓◦−✓−−}      SPC         SPC       3     21.4%   7    46.6%   −   ✓
18  37   {◦✓✓✓−}               17   {◦✓−✓−}               SPC         SPC       2     40%     4    40%     −   ✓
19  15   {✓−✓−✓}               14   {✓−✓−✓}               GPC         GPC       2     40%     2    18.8%   −
20  20   {1✓✓✓✓}               17   {1−✓✓✓}               SPC         SPC       1     20%     7    50%     −   −
21  20   {−✓✓}                 17   {−−✓}                 SPC         SPC       2     66.6%   0    0%      −   −
22  30   {✓◦✓✓✓✓✓−✓}           46   {✓◦−−◦−✓−✓}           SPC         -         1     12.5%   1    12.5%   ✓
23  47   {◦−−✓✓✓✓✓✓}           5    {◦−−✓✓✓✓✓✓}           GPC         SPC       3     33.3%   0    0%      −
24  46   {◦✓✓✓}                30   {◦✓✓✓}                GPC         GPC       0     0%      1    25%     −   ✓
25  30   {◦◦◦✓✓}               16   {◦◦◦✓✓}               GPC         SPC       1     20%     10   58.8%   −
26  16   {◦◦◦◦✓✓✓−−}           30   {◦◦◦◦✓−✓−✓}           BAC         SPC       3     37.5%   0    0%      −   −
27  16   {✓✓✓✓✓}               32   {✓✓✓✓✓}               GPC         SPC       3     60%     3    33.3%   −
28  30   {✓✓−−◦✓✓−✓}           39   {✓−✓−◦✓✓−−}           BAC         SPC       4     44.4%   3    33.3%   −   −
29  14   {−0✓−✓✓}              39   {−0✓−✓✓}              GPC         SPC       3     50%     3    30%     −   −
30  49   {◦✓✓}                 30   {◦✓✓}                 GPC         GPC       0     0%      1    16.6%   −   −
Average Values: NIIP 1.53, IIPR 24.4%, NUP 3.2, UPR 28.71%
Total Number: S1 6, S2 4

Table 6.3: Results of the hypothesis evaluation


If Y is the number of successes in n trials, then the probability of getting exactly y successes is given by the binomial distribution (Griffiths (2009)):

\[
P(Y = y) = \binom{n}{y} \, p^{y} \, (1 - p)^{n - y} = \frac{n!}{y! \, (n - y)!} \, p^{y} \, q^{n - y}
\]

where p is the probability of success and q = 1 − p the probability of failure. In our case n = 30, p = 0.5 and q = 0.5. We want to check the probability that 6 or fewer participants are successful in finding the appropriate car for them, i.e. we want to calculate:

\[
P(Y \le 6) = \sum_{i=0}^{6} P(Y = i)
\]

which will provide the p-value (the probability of obtaining a test statistic at least as extreme as the one that was actually observed). Figure 6.4 shows the probabilities of the binomial distribution for different numbers of successes, the cumulative distribution function and the Type I error area (i.e. falsely rejecting a true null hypothesis).

Regarding the significance level, according to Wasserman (2004) an α value of 0.05, which is commonly used in the literature, provides strong evidence against H0, while a value of less than 0.01 provides very strong evidence against H0. In our case we used an α value of 0.01. The α value determines the risk of a Type I error.

We used the R language⁴ in order to calculate the above probability. Specifically, we executed the command:

binom.test(6, 30, 0.5, alternative="less")

which returned a p-value = 0.0007155 ≤ α = 0.01. As a result we have very strong evidence against H0. So we can reject the null hypothesis H0 and conclude that: "Less than half of the users can express their preferences without exploring available cars, in such a way that they can be returned the ideal car for them from a car collection".

⁴ R is an open source programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis (http://www.r-project.org/).

Figure 6.4: Probabilities and Distribution Function of the Binomial Distribution

Furthermore, if we also consider the second ideal cars, where the number of successes is 10, we can execute the command:

binom.test(10, 30, 0.5, alternative="less")

which returned p-value = 0.04937 ≥ α = 0.01. As a result, with a significance level α = 0.01 we cannot reject H0. But if we relax our significance level to α = 0.05, then p-value = 0.04937 ≤ α = 0.05, and as a result we have strong evidence (instead of the very strong evidence that is provided by α = 0.01) to reject H0 and accept H1.
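
The two decisions can be cross-checked without R. The sketch below, in Python, sums the lower tail of the binomial distribution directly from the formula given earlier in this section; the helper name `lower_tail_p_value` is a hypothetical choice and the snippet is an independent re-computation, not the code used in the evaluation.

```python
from math import comb


def lower_tail_p_value(successes: int, trials: int, p: float = 0.5) -> float:
    """P(Y <= successes) for Y ~ Binomial(trials, p): the one-tailed (lower) p-value."""
    q = 1.0 - p
    return sum(comb(trials, y) * p**y * q**(trials - y) for y in range(successes + 1))


for successes in (6, 10):  # ideal car only, then ideal or 2nd ideal car
    p_value = lower_tail_p_value(successes, trials=30)
    for alpha in (0.01, 0.05):
        verdict = "reject H0" if p_value <= alpha else "cannot reject H0"
        print(f"successes={successes}, p-value={p_value:.7f}, alpha={alpha}: {verdict}")
```

Summing the exact terms avoids any normal approximation, which matters here because the interesting probabilities lie far in the tail.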

6.4 Evaluation of Various Exploration Approaches

We conducted a comparative evaluation between a) an interface providing FDT over static metadata, b) an interface providing a clustering algorithm, and c) a combination of FDT with both static and dynamically mined metadata (i.e. clustering). The purpose of this evaluation was to assess the effectiveness of the FDT scheme compared to other exploratory schemes like clustering.

Thirteen users participated in the evaluation, with ages ranging from 20 to 30, 61.5% males and 38.5% females. We can distinguish two groups: the advanced group consisting of 3 users and the regular one consisting of 10 users. The advanced users had prior experience in using clustering and multidimensional browsing services, while the regular ones had not.

We specified four information needs (or tasks) of exploratory nature, and for each of the first three we specified three variations. According to Lindgaard and Chattratichart (2007), a large number of tasks can improve usability test performance. All tasks were refined using the task refinement steps described in Kules and Capra (2008):

• The task descriptions should include words or semantically close terms that are values of a facet.

• By using keywords of the task description, the user should not be able to complete the task by using the first 10 results of the answer set (otherwise the task would be too easy and would not require exploratory search).

• The facets should be useful without having to click the "show more" link of a facet.

In the described comparative approach the idea is to let users compare a number of different systems

and rank them according to the following different criteria:

Log Data Analysis

During the evaluation we logged and counted for each user: (a) the number of submitted queries and (b)

the number of clicked zoom points (by facet).

Task Completeness

We measured the average percentage of the correct URLs that users (both regular and advanced) found for the evaluation tasks in each user interface, out of the total number of correct URLs in our testbed.

User Preference

To identify the most preferred interface (for regular and advanced users) we aggregated the preference rankings for each task using Plurality ranking (i.e. we counted the first positions) and Borda ranking (Borda (1781)) (i.e. we summed the positions of each interface in all rankings). In a Plurality column, the higher a value is, the better (i.e. the more first positions it got), while in a Borda column the lower a value is, the better. The rows marked with All show the sum of the values of all tasks. The best values for each task are marked in bold.
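
The two aggregation schemes can be sketched in a few lines of Python. The rankings below are hypothetical data for the three interfaces (a), (b) and (c), not the rankings collected in the evaluation, and the function names are illustrative; each ranking lists the interfaces from best to worst and is assumed to contain no ties.

```python
from collections import Counter


def plurality(rankings):
    """Count how many times each interface is ranked first (higher is better)."""
    return Counter(ranking[0] for ranking in rankings)


def borda(rankings):
    """Sum the positions (1 = best) of each interface over all rankings (lower is better)."""
    totals = Counter()
    for ranking in rankings:
        for position, interface in enumerate(ranking, start=1):
            totals[interface] += position
    return totals


# Hypothetical rankings given by four users, best interface first.
rankings = [
    ["c", "a", "b"],
    ["a", "c", "b"],
    ["c", "a", "b"],
    ["a", "b", "c"],
]

print(plurality(rankings))
print(borda(rankings))
```

In this made-up example (a) and (c) tie on Plurality, while the Borda sums separate them, which illustrates why both aggregations are reported side by side.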

User Satisfaction

Users ranked the interfaces based on their satisfaction, and we aggregated the satisfaction rankings for

each task again using Plurality Ranking and Borda ranking.

User Friendliness

Users ranked the interfaces based on their user friendliness, and we aggregated the rankings for each

task again using Plurality Ranking and Borda ranking.

The results, which are described in detail in Papadakos et al. (2012a), showed that the FDT-based approaches were the most preferred. Specifically, both advanced and regular users were able to achieve a significantly higher degree of task completeness with the FDT-based approaches than with the plain clustering one. Furthermore, they submitted the smallest number of queries with the FDT interfaces (advanced users made more than 50% fewer queries with FDT). The plain clustering interface was the least preferred for 58.3% of the advanced users and for 65% of the regular ones, while the (c) interface was the most preferred one for the advanced users. For regular users there was a tie between (a) and (c). Finally, regarding satisfaction, 55% of the advanced users were highly satisfied with (c), while 50% of the regular users were satisfied with (a). Only 16.6% of the advanced users and 12.5% of the regular users were highly satisfied with the plain clustering interface.

Furthermore, a statistical analysis was conducted, where the upper and lower limits with 95% confidence were computed. For regular users, we made the following observations:

• Only 5% of the regular users, with a ±9.2 error, were not satisfied by the FDT interface (A)

• Only 5% of the regular users, with a ±15.91 error, have low preference for the combined interface (C)

• Only 12.5% of the regular users, with a ±20 error, find the clustering interface highly satisfactory

• Only 12.5% of the regular users, with a ±20 error, have low satisfaction for the combined interface (C)

For the advanced users, we did not reach a clear conclusion due to the large errors.

Summarizing the results of the evaluation, the UI providing the FDT interaction scheme over static metadata was the most preferred UI for regular users. On the other hand, advanced users preferred by a small margin an FDT UI which in addition provided dynamic metadata through a clustering algorithm. In any case, the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred, provided better user satisfaction, and resulted in a higher task completeness degree with fewer queries than other browsing interaction schemes (i.e. plain clustering).

6.5 Evaluation of Hippalus System

We evaluated the Hippalus system with two different user groups, plain users and expert users. These two groups were asked to complete a number of tasks over two different UIs:

• a) UI1: Hippalus system with exploration and browsing capabilities only (preference functionality was disabled)

• b) UI2: Hippalus system with exploration and browsing capabilities and preference functionality enabled

We compared the above interfaces with respect to ease of use, ease of learning, usefulness, user preference,

user satisfaction, and task accomplishment. Furthermore, we wanted to examine how users really used the

above interfaces and for this reason we conducted a log analysis (usage-based evaluation). Finally, for

each user action we calculated a number of the metrics that were described in Section 6.1.2, so that we

could evaluate each user action. Figure 6.5 depicts the steps of our evaluation process.

Participants

In this study, 26 persons⁵, males and females of varying age (i.e. between 23 and 43 years) and expertise (i.e. from tertiary education to PhD level), participated. We formed two groups. The first group, named plain users, consisted of 20 regular users, while the second one, expert users, consisted of 6 people with prior experience in using multi-dimensional services and preferences. Before starting the evaluation, all participants were given a simple tutorial of 15 minutes⁶. Specifically, users were initially given a description of the information base (domain, attributes). In the next five minutes the interactive process of information thinning was described to them, and finally the rest of the tutorial demonstrated the preference actions by showing specific examples. Users were allowed to get acquainted with the UI and complete a number of simple tasks.

⁵ According to Faulkner (2003), 10 evaluators are enough for getting more than 82% of the usability problems of a user interface (at least in their experiments).

⁶ A video of the tutorial is available at http://www.youtube.com/watch?v=Cah-z7KmlXc

Figure 6.5: Comparative Evaluation Process

Attribute              Users percentage    Attribute          Users percentage
Price                  90% (27/30)         Trunk              40% (12/30)
Manufacturer           90% (27/30)         Year               36.7% (11/30)
Engine Volume          80% (24/30)         Number of doors    33.3% (10/30)
Body Type              73.3% (22/30)       Max Speed          16.7% (5/30)
Fuel type              53.3% (16/30)       Torque             13.3% (4/30)
Power                  43.3% (13/30)       Acceleration       13.3% (4/30)
Consumption city       43.3% (13/30)       Drive System       6.7% (2/30)
Consumption national   43.3% (13/30)

Table 6.4: Percentages of the 30 Users that Expressed a Preference Over a Valid Attribute

Information Base

We used an information base of 50 cars, indexed under a large number of classes and subclasses. Specifically, there is a total of 23 classes and 85 subclasses. Some of them are hierarchically organized, like Manufacturer, and others are flat, like Vehicle Type (an example is shown in Figure 5.5).

Tasks

A question during the design of the user tasks was which attributes to use. In order to provide representative tasks, we designed them on top of attributes for which real users expressed their preferences (these user preferences were collected in the evaluation described previously in Section 6.3). Specifically, Table 6.4 shows the percentages of the 30 users that participated in the evaluation described in Section 6.3 who expressed a preference over an attribute which is valid in our information base. Users expressed preferences for a total of 15 attributes that appear in our information base (and for a number of other attributes valid in our collection base, like color, ABS, etc.). In order to create the tasks of this evaluation we identified the most important attributes for which users expressed their preferences. Specifically, we only considered those with a percentage bigger than 50% (i.e. price, manufacturer, engine volume, body type and fuel type) for the design of the tasks. Notice that for the hierarchically organized values of the attribute Manufacturer, a number of users expressed preferences of the form Audi is better than BMW, while others expressed preferences like Japanese are better than European which are better than American and Korean, so we try to capture both of them in our task description. Finally, in this specific evaluation we make the assumption that the user does not change his preference criteria as he is exploring the available choices.

We created two variations of equal⁷ tasks for the plain users evaluation and for the expert users evaluation. Each task used prioritized preference actions in its first subtask and Pareto composition in its second one. Tasks for plain users were designed on top of only 3 criteria. The tasks for the expert users were more difficult and complicated, since they used 6 different criteria. Specifically, the tasks that users completed were the following:

⁷ In our context task equality is defined as tasks that consist of the same kind of preference actions and criteria.

Plain User-Based Evaluation

Task A You are supposed to buy a new car, which you will select through the Hippalus system. In order to identify the best or the set of best cars, you have to consider the following criteria: a) Engine Volume: You would like a car with an engine volume around 1200cc. b) Price: You are willing to pay around 10000 Euros. c) Manufacturers: Generally, you prefer European to Korean, and German manufacturers to other European. You consider Japanese cars better than Korean ones.

• Subtask 1: Which are the best cars according to the above description, if you consider that a) and b) are equally important and the most important criteria for you, followed by the criterion c)?

• Subtask 2: Which are the best cars according to the above description, if you consider that all of

the 3 criteria are equally important?

Task B You are supposed to buy a new car, which you will select through the Hippalus system. In

order to identify the best or the set of best cars, you have to consider the following criteria: a) Engine

Volume: You would like a car with an engine volume around 1600cc. b) Price: You are willing to pay

around 14000 Euros. c) Manufacturers: Generally, you prefer Asian manufacturers to European. From

European you prefer German. Finally, European are better than American.

• Subtask 1: Which are the best cars according to the above description, if you consider that a) and b) are equally important and the most important criteria for you, followed by c)?

• Subtask 2: Which are the best cars according to the above description, if you consider that all of

the 3 criteria are equally important?

Expert-Based Evaluation

Task A You are supposed to buy a new car, which you will select through the Hippalus system. In order to identify the best or the set of best cars, you have to consider the following criteria: a) Engine Volume: You would like a car with an engine volume around 1200cc. b) Price: You are willing to pay around 10000 Euros. c) Manufacturers: Generally, you prefer European manufacturers to American, and German to other European manufacturers. You consider Japanese cars better than Korean ones. d) Body Type: You want a car with a hatchback body type. Finally, e) Fuel type: fuel should be gasoline and not diesel and f) Year: you prefer a modern car, i.e. a car from a recent year.

• Subtask A.1: Which are the best cars according to the above description, if you consider that a), b) and c) are equally important and the most important criteria for you, while d) is more important than e) which is more important than f)?

• Subtask A.2: Which are the best cars according to the above description, if you consider that all

of the 6 criteria are equally important?

Task B You are supposed to buy a new car, which you will select through the Hippalus system. In

order to identify the best or the set of best cars, you have to consider the following criteria: a) Engine

Volume: You would like a car with an engine volume around 1400cc. b) Price: You are willing to pay

around 14000 Euros. c) Manufacturers: Generally, you prefer European manufacturers to Asian and

German to other European. Finally, you prefer Japanese to Korean. d) Body Type: You do not want a car

with a body type of a minivan. Finally, e) Fuel type: fuel type should be diesel and f) Doors: you prefer

a car with 5 doors instead of 3.

• Subtask B.1: Which are the best cars according to the above description, if you consider that a), b) and c) are equally important and the most important criteria for you, while d) is more important than e) which is more important than f)?

• Subtask B.2: Which are the best cars according to the above description, if you consider that all

of the 6 criteria are equally important?
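
To make the difference between the two kinds of subtasks concrete, the following Python sketch contrasts a prioritized composition (Subtask 1) with a Pareto composition (Subtask 2) for the three criteria of the plain-user Task A. It is only an illustration under stated assumptions: the four cars are hypothetical, "around X" is modelled as the absolute distance from X, the manufacturer preference is simplified to a rank over three origins (the Japanese/Korean statement is left out), and the function names are not taken from the Hippalus implementation.

```python
def pareto_better(a, b, criteria):
    """True if a is at least as good as b on every criterion and strictly better on one."""
    pairs = [(score(a), score(b)) for score in criteria]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def prioritized_better(a, b, levels):
    """Levels are lists of criteria, most important first. A lower level is
    consulted only when the higher levels score both cars identically."""
    for criteria in levels:
        if pareto_better(a, b, criteria):
            return True
        if any(score(a) != score(b) for score in criteria):
            return False  # b is better, or the cars are incomparable at this level
    return False


def best(cars, better):
    """Cars for which no other car in the collection is better."""
    return [a for a in cars if not any(better(b, a) for b in cars if b is not a)]


# Hypothetical cars; every criterion returns a penalty (lower is better).
cars = [
    {"id": "car1", "engine": 1200, "price": 12000, "origin": "Korean"},
    {"id": "car2", "engine": 1200, "price": 12000, "origin": "European"},
    {"id": "car3", "engine": 1400, "price": 10000, "origin": "Korean"},
    {"id": "car4", "engine": 1600, "price": 15000, "origin": "German"},
]
ORIGIN_RANK = {"German": 0, "European": 1, "Korean": 2}  # assumed simplification


def engine(car):
    return abs(car["engine"] - 1200)


def price(car):
    return abs(car["price"] - 10000)


def origin(car):
    return ORIGIN_RANK[car["origin"]]


# Subtask 1: a) and b) equally important, followed by c).
subtask1 = best(cars, lambda a, b: prioritized_better(a, b, [[engine, price], [origin]]))
# Subtask 2: all three criteria equally important.
subtask2 = best(cars, lambda a, b: pareto_better(a, b, [engine, price, origin]))

print([car["id"] for car in subtask1])
print([car["id"] for car in subtask2])
```

With these made-up cars, the German car survives only under the Pareto composition: its manufacturer makes it incomparable to the others there, whereas under the prioritized composition it is already beaten on engine volume and price before the manufacturer is considered.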

We used rotation and counterbalancing in order to control for order effects and to increase the chance that results can be attributed to the experimental treatments and conditions (Kelly 09). Specifically, we used a Graeco-Latin Square Design, rotating both the order of tasks and the order in which subjects experience the interfaces. We created 4 user groups, UGP1, UGP2, UGP3, and UGP4⁸. Each group completed the tasks as shown in Table 6.5, where column headings represent points in time and order and the rows represent subjects⁹.

Users   Time 1                   Time 2
UGP1    UI1: TaskA1, TaskB2      UI2: TaskA2, TaskB1
UGP2    UI1: TaskA2, TaskB1      UI2: TaskA1, TaskB2
UGP3    UI2: TaskA1, TaskB2      UI1: TaskA2, TaskB1
UGP4    UI2: TaskA2, TaskB1      UI1: TaskA1, TaskB2

Table 6.5: Graeco-Latin Square Design

⁸ Unfortunately we only formed two groups of expert users, UGE1 and UGE2, since only 6 experts were available.
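
The counterbalancing of Table 6.5 can be checked mechanically. The Python sketch below transcribes the schedule and counts how often each interface and each pair of subtasks occurs per time slot and in combination; the data layout and variable names are illustrative assumptions, and TaskA1 is read as Subtask 1 of Task A.

```python
from collections import Counter

# Schedule transcribed from Table 6.5:
# group -> [(interface, subtasks) at Time 1, (interface, subtasks) at Time 2].
schedule = {
    "UGP1": [("UI1", ("A1", "B2")), ("UI2", ("A2", "B1"))],
    "UGP2": [("UI1", ("A2", "B1")), ("UI2", ("A1", "B2"))],
    "UGP3": [("UI2", ("A1", "B2")), ("UI1", ("A2", "B1"))],
    "UGP4": [("UI2", ("A2", "B1")), ("UI1", ("A1", "B2"))],
}

interface_by_slot = Counter()   # (time slot, interface) -> number of groups
tasks_by_interface = Counter()  # (interface, subtasks)  -> number of groups
tasks_by_slot = Counter()       # (time slot, subtasks)  -> number of groups

for sessions in schedule.values():
    for slot, (interface, subtasks) in enumerate(sessions, start=1):
        interface_by_slot[(slot, interface)] += 1
        tasks_by_interface[(interface, subtasks)] += 1
        tasks_by_slot[(slot, subtasks)] += 1

# Balanced means: all four combinations occur, each for exactly two groups.
balanced = all(
    len(counts) == 4 and set(counts.values()) == {2}
    for counts in (interface_by_slot, tasks_by_interface, tasks_by_slot)
)
print("balanced:", balanced)
```

Every interface is thus seen first by two groups and second by two groups, and every pair of subtasks is completed equally often with each interface and at each point in time.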

Evaluation

Users were asked to evaluate two different UIs over the Hippalus system, using the previously described tasks. In the first UI (UI1), preference actions were disabled. As a result, in order to complete the aforementioned tasks, users browsed the car collection and used the available information thinning functionality (selection of appropriate facets and terms to restrict their focus). The second UI (UI2), in addition to the information thinning functionality described previously, provided the preference actions proposed in this thesis through context menus. For both UIs, the users provided the set of cars which they believed fulfilled the needs of each task.

For each task, an expert user provided the ordering of the collection according to preference. The order was a bucket order, meaning that two cars can be incomparable (i.e. equally preferred).

Users provided scores for the two exploratory systems regarding Ease of use, Ease of learning, Usefulness, Preference and Satisfaction, using a psychometric Likert scale. We calculated Effectiveness (Task completeness) and Efficiency (Time to complete a task) using the logged data.

Main Results

We gathered a number of interesting results from this evaluation. The main results can be synopsized

to the following:

• All plain users preferred the preferences UI instead of the non-preferences UI. Specifically 75%

of the 20 plain users preferred the preference UI very strongly, 20% strongly and only 5% strongly

enough. In addition all 6 expert users preferred the preference UI, 50% of them very strongly and

the other 50% strongly.9For expert users due to their low number, we only rotated the interfaces.


• The preference-enabled UI allowed the users to successfully complete all the tasks, on average in less than a third of the time and with a third of the user interactions compared to the plain FDT UI.

• None of the users was able to successfully complete both of the tasks with the plain UI (only 1

expert user and 2 plain users completed successfully one of the two tasks they were assigned using

UI1).

• As a result, we verify the conclusions of the theoretical user effort analysis, since the preference-based UI helped the users to find the desired results in less time, with fewer actions and fewer decisions.

Fine Grained Results

Here we discuss in more detail the results of this evaluation.

Specifically, Figure 6.6 depicts the aggregated results according to Plurality (i.e. how many times each UI was ranked first) and Borda (i.e. the total score each UI gathered from all users and tasks), regarding (a) Ease of Use, (b) Usefulness, (c) Preference and (d) Satisfaction. Scores are given for both plain and expert users.
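To make the two aggregation rules concrete, the following Python sketch computes them exactly as described above: Plurality counts how often a UI receives the strictly highest score, and Borda sums the scores a UI gathers. The ballots are hypothetical and the 5-point scale is an assumption; the sketch is not the script used for Figure 6.6.

```python
def plurality(ballots):
    """Count, per option, the ballots on which it has the strictly highest score.

    Returns (wins, ties), where ties is the number of ballots with no unique winner.
    """
    wins = {option: 0 for ballot in ballots for option in ballot}
    ties = 0
    for ballot in ballots:
        best = max(ballot.values())
        winners = [option for option, score in ballot.items() if score == best]
        if len(winners) == 1:
            wins[winners[0]] += 1
        else:
            ties += 1
    return wins, ties


def borda(ballots):
    """Total score gathered by each option over all ballots."""
    totals = {}
    for ballot in ballots:
        for option, score in ballot.items():
            totals[option] = totals.get(option, 0) + score
    return totals


# Hypothetical ballots: one per (user, task), scores on an assumed 1-5 Likert scale.
ballots = [
    {"UI1": 3, "UI2": 5},
    {"UI1": 4, "UI2": 4},  # a tie
    {"UI1": 2, "UI2": 5},
]

print(plurality(ballots))  # ({'UI1': 0, 'UI2': 2}, 1)
print(borda(ballots))      # {'UI1': 9, 'UI2': 14}
```

Under this reading, the top score a UI can reach is the maximum scale value multiplied by the number of ballots.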

It is easy to see that, for each of the above criteria, UI2 (i.e. the UI with preferences) was almost always ranked first by both expert and plain users. There were a number of ties between the two UIs, especially in the case of plain users (e.g. 14 ties regarding Satisfaction). Notice, though, that the fewest ties (i.e. the most wins for UI2) occur for the Preference criterion, where UI2 is a clear winner.

Regarding the total scores of each UI according to Borda, for plain users UI2 almost always scored about 1/3 more than UI1. Specifically, UI2 reached almost 9/10 of the top score a system could achieve (200 in the case of plain users). Expert users, on the other hand, gave somewhat lower scores to UI2, which in this case reached 3/4 of the top score (60 in the case of expert users). Again, UI2 was a clear winner over all criteria, while UI1 reached 1/2 of the top score.

Table 6.6 reports the average, max and min timings and actions for each user group of plain and expert users. The results show that the timings and the number of user actions for UI2 (i.e. the UI with preferences) are much smaller than those gathered using UI1 (i.e. the UI without preferences). Furthermore, we can see that the deviations of min and max actions and timings of UI2


Figure 6.6: Plurality and Borda results for (a) Ease of Use, (b) Usefulness, (c) Preference and (d) Satisfaction.

from the average ones are also much smaller than the respective deviations of UI1 (footnote 10). In addition, the timings and interactions with UI2 are higher for expert users than for plain users, since the expert users had to express many more preference actions.

In more detail, Table 6.7 reports the average, max and min timings and actions over all tasks and per task for both UIs. Let us first discuss the average timings and user actions over all tasks, for both plain and expert users. UI2 is clearly much more efficient in terms of timings and interactions for both user groups. Specifically, plain users were on average almost 3 times more efficient with UI2 than with UI1, for both timings and user actions. Expert users, on the other hand, were on average more than 3.3 times faster and made half the interactions with UI2 compared to UI1 (footnote 11).
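The speedups quoted above are plain ratios of the UI1 value over the UI2 value. As a small illustration, the Python sketch below recomputes them from the "All Tasks" averages of Table 6.7; the function name is hypothetical, and the last digit may differ from the table because of rounding.

```python
def speedup(baseline, improved):
    """Ratio baseline / improved; values above 1 mean the improved UI needs less."""
    if improved <= 0:
        raise ValueError("improved value must be positive")
    return baseline / improved


# "All Tasks" averages from Table 6.7: (UI1, UI2)
averages = {
    ("plain", "sec"): (710.66, 233.32),
    ("plain", "act."): (107.67, 37.87),
    ("expert", "sec"): (952.08, 286.88),
    ("expert", "act."): (127.17, 61.17),
}

for (group, unit), (ui1, ui2) in averages.items():
    print(f"{group:6s} {unit:5s} speedup = {speedup(ui1, ui2):.2f}x")
```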

10 Notice that, since a number of users were checking the correctness of the preferred cars returned by Hippalus for UI2, the timings and numbers of user actions reported here should be bigger than the results that would be gathered from users that are confident about Hippalus.

11 It seems that expert users were more conservative regarding their interactions with the system.


Plain UGP1    UI1                                         UI2
              A1 (sec)   A1 (act.)  B2 (sec)   B2 (act.)  A2 (sec)   A2 (act.)  B1 (sec)   B1 (act.)
Average       688.71     108.2      681.00     86.2       379.40     43         223.74     38.8
Max           1071.53    176        1104.38    122        803.70     58         409.92     49
Min           270.75     71         347.41     50         146.19     31         130.22     25

Plain UGP2    UI2                                         UI1
              A2 (sec)   A2 (act.)  B1 (sec)   B1 (act.)  A1 (sec)   A1 (act.)  B2 (sec)   B2 (act.)
Average       220.06     41.8       226.52     39.2       596.15     100.2      705.22     100.8
Max           357.92     75         367.29     61         1438.11    165        1744.77    167
Min           136.49     32         151.04     31         294.34     53         178.76     43

Plain UGP3    UI2                                         UI1
              A1 (sec)   A1 (act.)  B2 (sec)   B2 (act.)  A2 (sec)   A2 (act.)  B1 (sec)   B1 (act.)
Average       284.58     36.2       154.47     33.2       878.51     136.6      554.29     87.8
Max           347.59     45         189.43     42         1431.88    297        881.39     110
Min           202.60     29         121.29     25         500.75     73         256.45     51

Plain UGP4    UI1                                         UI2
              A2 (sec)   A2 (act.)  B1 (sec)   B1 (act.)  A1 (sec)   A1 (act.)  B2 (sec)   B2 (act.)
Average       949.86     116.4      631.56     125.2      243.77     37.8       133.99     33
Max           1824.65    166        1062.11    205        391.87     52         188.46     36
Min           480.35     77         119.93     19         146.22     33         75.82      27

Expert UGE1   UI1                                         UI2
              A1 (sec)   A1 (act.)  B2 (sec)   B2 (act.)  A2 (sec)   A2 (act.)  B1 (sec)   B1 (act.)
Average       862.46     142.33     1020.00    125.33     192.47     47.33      308.85     57.33
Max           1416.10    246        1636.02    157        260.92     58         355.73     64
Min           441.72     59         394.92     70         61.67      27         282.89     50

Expert UGE2   UI2                                         UI1
              B2 (sec)   B2 (act.)  A1 (sec)   A1 (act.)  A2 (sec)   A2 (act.)  B1 (sec)   B1 (act.)
Average       280.99     69.33      365.20     70.66      1083.08    148.33     842.79     92.66
Max           346.04     75         580.59     85         1530.93    194        1447.35    100
Min           185.60     58         246.26     42         853.40     57         434.22     78

Table 6.6: Plain and Expert Users Average, Max and Min Timings and User Actions for each Task for both UIs per each User Group

For plain users, the task that benefited the most from the preference interaction seems to be Task B2, since the speedup for timings was 4.80x. In contrast, the speedup for user actions was almost the same for all tasks. Furthermore, notice that for Task B1 there was a plain user that completed the task quickly and with only 19 interactions (footnote 12). Regarding expert users, Task A2 benefited the most from the preference interaction, since the speedup was 5.63x for timings and 3.13x for user actions. For the other tasks, the speedup for user actions was much smaller. The above results are summarized in Figure 6.7 (a).

Finally, the preference-based approach gave better average values for each used metric during the session of each exploratory task. Specifically, none of the users was able to successfully complete both of the tasks with the plain UI (UI1). In contrast, all the users successfully completed all the tasks with UI2, a result that highlights the user-friendliness and efficiency of the proposed interaction scheme.

12 The correct answer of this task included 4 cars and this user found only one of them.


Plain All Tasks   UI1 (sec)   UI2 (sec)   Speedup     UI1 (act.)  UI2 (act.)  Speedup
Average           710.66      233.32      3.04x       107.67      37.87       2.84x

Plain Task A1
Average           642.43      264.17      2.43x       104.2       37          2.81x
Max               1438.11     391.87      3.66x       176         52          3.38x
Min               270.75      146.22      1.85x       53          29          1.82x

Plain Task A2
Average           914.18      299.73      3.04x       126.5       42.4        2.98x
Max               1824.65     803.702     2.27x       297         75          3.96x
Min               480.35      136.49      3.51x       73          31          2.35x

Plain Task B1
Average           592.93      225.13      2.63x       106.5       39          2.73x
Max               1062.11     409.92      2.59x       205         39.2        5.22x
Min               119.93      130.22      0.92x       19          38.8        0.48x

Plain Task B2
Average           693.11      144.23      4.80x       93.5        33.1        2.82x
Max               1744.77     189.43      9.21x       167         42          3.97x
Min               178.76      75.82       2.35x       43          25          1.72x

Expert All Tasks  UI1 (sec)   UI2 (sec)   Speedup     UI1 (act.)  UI2 (act.)  Speedup
Average           952.08      286.88      3.32x       127.17      61.17       2.08x

Expert Task A1
Average           862.46      365.20      2.36x       142.33      70.66       2.01x
Max               1416.11     580.60      2.44x       246         85          2.89x
Min               441.72      246.27      1.79x       59          42          1.40x

Expert Task A2
Average           1083.08     192.47      5.63x       148.33      47.33       3.13x
Max               1530.94     260.92      5.87x       194         58          3.35x
Min               853.41      61.68       13.84x      57          27          2.11x

Expert Task B1
Average           842.79      308.85      2.73x       92.67       57.33       1.62x
Max               1447.35     355.73      4.07x       100         64          1.56x
Min               434.22      282.90      1.53x       78          50          1.56x

Expert Task B2
Average           1020.00     281.0       3.00x       125.33      69.33       1.34x
Max               1636.03     346.05      4.73x       157         75          1.33x
Min               394.92      185.61      2.13x       70          58          1.34x

Table 6.7: Plain and Expert Users Average, Max and Min Timings and User Actions per each Task and All Tasks for both UIs

Notice that 1 expert user and 2 plain users managed to successfully complete one of the two tasks they were assigned using UI1. The above are reflected in the calculated Recall, Precision and Average Precision metrics, which are reported in Table 6.8. Notice that, on average over all tasks, there is a 2.30x and a 3.49x improvement regarding average precision for plain and expert users respectively. Task B2 seems to be the most difficult task for both plain and expert users, since the biggest gains in all three metrics


Figure 6.7: Average Values in Last Step of Each Task. (a) for Timings (T) and Actions (A), while (b) Depicts the Values for Recall (R), Precision (P) and Average Precision (AP)

were observed here (i.e. regarding average precision, more than 3.62x improvement for plain users and 6.36x improvement for expert users). Furthermore, notice that the improvements with UI2 were higher for expert users on each metric, since their tasks were much more complicated and involved more criteria than the tasks of the plain users. As a result, although experts, these users achieved lower scores with UI1 for almost all metrics. The above results are summarized in Figure 6.7 (b). Since the results of the two approaches show such significant differences for the basic metrics of Recall, Precision and Average Precision, we did not consider evaluating the other more refined metrics described in Section 6.1.
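For reference, the three basic metrics can be sketched in a few lines of Python. This is a simplified illustration with binary relevance (a car either belongs to the correct answer or not), whereas the evaluation above compares against an expert-provided bucket order; the car identifiers and function names are hypothetical.

```python
def recall(answer, relevant):
    """Fraction of the relevant items that appear in the answer."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(set(answer) & relevant) / len(relevant)


def precision(answer, relevant):
    """Fraction of the answer that is relevant (0.0 for an empty answer)."""
    answer_set = set(answer)
    if not answer_set:
        return 0.0
    return len(answer_set & set(relevant)) / len(answer_set)


def average_precision(ranked_answer, relevant):
    """Mean of precision@k over the ranks k that hold a relevant item,
    normalised by the total number of relevant items."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must not be empty")
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked_answer, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


correct = ["car1", "car2", "car3", "car4"]  # hypothetical correct answer of 4 cars

# A user who finds only one of the four correct cars.
print(recall(["car1"], correct), precision(["car1"], correct))  # 0.25 1.0

# A user who returns exactly the correct answer scores 1 on all three metrics.
print(recall(correct, correct), precision(correct, correct),
      average_precision(correct, correct))  # 1.0 1.0 1.0
```

The first case mirrors the situation where a task is finished quickly but incompletely: precision stays high while recall and average precision drop.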

6.6 Evaluation Conclusion

In this chapter we discussed a number of evaluation metrics and approaches for exploratory search and selected those that are applicable to our preference-based approach.

In addition, the theoretical analysis of user effort in FDT interaction schemes, described in Section 6.2, shows the benefits of the FDT interaction (i.e. a small number of interactions and decisions). Specifically, the section provided an example where a user could find the desired 10 objects in a peta-sized collection with only 30 clicks (the number of decisions is 90). The extension of this study to the proposed preference-enriched scheme, assuming that a user has expressed a preference relation for each facet and that the most preferred choice is prompted first, shows that the number of decisions is reduced to the number of clicks (i.e. to 30 for a peta-sized collection).
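The figures quoted above can be reproduced with a back-of-the-envelope calculation. The Python sketch below does not reproduce the model of Section 6.2; it assumes that every click selects one of b equally sized alternatives, that a plain FDT user examines all b alternatives per click, and that b = 3, a value chosen here only because it matches the quoted 30 clicks and 90 decisions. All names are hypothetical.

```python
def clicks_needed(collection_size, target_size, branching):
    """Smallest number of clicks c such that the focus shrinks to the target,
    i.e. target_size * branching**c >= collection_size (exact integer arithmetic)."""
    clicks = 0
    while target_size * branching ** clicks < collection_size:
        clicks += 1
    return clicks


def decisions_plain(clicks, branching):
    """Plain FDT (assumption): each click requires examining all alternatives."""
    return clicks * branching


def decisions_with_preferences(clicks):
    """Preference-enriched FDT: the most preferred choice is prompted first,
    so each click costs a single decision."""
    return clicks


PETA = 10 ** 15
clicks = clicks_needed(PETA, target_size=10, branching=3)
print(clicks)                              # 30
print(decisions_plain(clicks, 3))          # 90
print(decisions_with_preferences(clicks))  # 30
```

Under these assumptions the number of clicks grows only logarithmically with the collection size, which is why the decision cost, rather than the click count, is where the preference-enriched scheme gains.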


Metric              Plain Users                Expert Users
All Tasks           UI1      UI2      Improv.  UI1      UI2      Improv.
Recall              0.56     1        1.7x     0.52     1        1.92x
Precision           0.61     1        1.62x    0.43     1        2.31x
Average Precision   0.433    1        2.30x    0.28     1        3.49x

Task A1             UI1      UI2      Improv.  UI1      UI2      Improv.
Recall              0.63     1        1.57x    0.66     1        1.5x
Precision           0.48     1        2.06x    0.44     1        2.25x
Average Precision   0.42     1        2.33x    0.44     1        2.25x

Task A2             UI1      UI2      Improv.  UI1      UI2      Improv.
Recall              0.55     1        1.81x    0.33     1        3x
Precision           0.80     1        1.24x    0.53     1        1.87x
Average Precision   0.54     1        1.84x    0.26     1        3.75x

Task B1             UI1      UI2      Improv.  UI1      UI2      Improv.
Recall              0.61     1        1.62x    0.83     1        1.2x
Precision           0.77     1        1.28x    0.38     1        2.57x
Average Precision   0.487    1        2.049x   0.277    1        3.601x

Task B2             UI1      UI2      Improv.  UI1      UI2      Improv.
Recall              0.46     1        2.14x    0.25     1        4x
Precision           0.39     1        2.53x    0.36     1        2.76x
Average Precision   0.27     1        3.62x    0.15     1        6.36x

Table 6.8: Plain and Expert Users Recall, Precision and Average Precision Metrics per each and all Tasks for both UIs

Subsequently, we formulated a hypothesis expressing the Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO), and the conducted user study showed that, without the ability to explore the existing choices, the expression of preferences is time-consuming and in most cases results in incomplete preferences. Specifically, we found that only 20% of the users were able to identify the ideal car from a list of cars according to their previously expressed preferences (the percentage rises to 33% if we also consider the second ideal car). Furthermore, users expressed preferences that were inconsistent with their final decision (23% of the preferences). The statistical analysis of the results provides strong evidence against the formulated null hypothesis, and we can conclude that “without exploring available cars only less than half of the users can express their preferences in a way sufficient for returning the ideal car for them from a car collection”.

We also conducted two comparative user studies, one for evaluating the FDT per se and another

for the proposed preference-enriched FDT interaction. The first one was conducted over the Mitos

WSE and evaluated a number of different exploratory interfaces. This evaluation showed that the UI

that supported the FDT interaction scheme over static metadata was the most preferred UI among the


regular users. Advanced users, on the other hand, preferred by a small margin the FDT UI which, in addition to the static metadata, also uses dynamic metadata produced by a clustering algorithm. In any case, the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred, provided better user satisfaction, and resulted in a higher task completeness degree with fewer queries than the other browsing interaction schemes (i.e. plain clustering).

Finally, the second user-based comparative evaluation, which we conducted over the Hippalus system, showed that 100% of the users (both expert and plain ones) preferred the preference-based UI, a result that was supported by each distinct qualitative result. The preference-enabled UI allowed users to successfully complete all the tasks, on average in less than a third of the time and with a third of the user interactions compared to the plain FDT UI. Furthermore, none of the users was able to successfully complete both of the tasks with the plain UI (1 expert user and 2 plain users successfully completed one of the two tasks they were assigned using UI1). As a result, we verify the conclusions of the theoretical user effort analysis, since the preference-based UI helps users to find the desired results in less time, with fewer actions and fewer decisions. Finally, the preference-based approach gave better average values for each used metric during the session of each exploratory task.


Chapter 7

Conclusion and Future Research

Contents

7.1 Synopsis of Contributions

7.2 Directions for Future Work and Research

7.1 Synopsis of Contributions

In this thesis we motivated the need for real-time preference elicitation and we introduced a language (including its syntax, semantics and GUI-level exploitation methods) for enriching the interaction scheme of Faceted and Dynamic Taxonomies (FDT) with preference elicitation and preference-based interaction. Key aspects of the proposed approach include the support of hierarchically organized values, the support of set-valued attributes, and the incremental preference specification mode with the scope-based method for resolving conflicts. In addition, the rapid reduction of the information space that is possible with FDT makes preference-based ordering feasible on large information bases, since the introduced algorithms for producing the preference order are independent of the size of the information base; they depend on the size of the focus and the number of preference actions enacted by the user. Furthermore, we provided a top-k variation of the algorithm, suitable for the case where the size of the focus is big.

To demonstrate the feasibility of our approach and to identify possible difficulties or other issues related to implementation and application, we have designed and implemented a proof of concept prototype, the Hippalus system. This system provides exploration services over RDF information bases and supports the introduced preference framework through HTML5 context menus. Specifically, the user is able to order classes, subclasses and objects, and can compose object-related preferences using Priority, Pareto and Pareto Optimal compositions.
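To give an intuition of how such compositions behave, the Python sketch below composes two single-facet preferences with the textbook definitions of Pareto and Priority (prioritized) composition and derives a bucket order by repeatedly extracting the undominated objects. It is a simplified illustration and not the algorithm implemented in Hippalus: each facet preference is reduced to a rank function (lower is better), and the cars, attributes and function names are hypothetical.

```python
def pareto(ranks):
    """a is better than b if it is at least as good on every facet
    and strictly better on at least one."""
    def better(a, b):
        ra = [rank(a) for rank in ranks]
        rb = [rank(b) for rank in ranks]
        return all(x <= y for x, y in zip(ra, rb)) and any(x < y for x, y in zip(ra, rb))
    return better


def priority(ranks):
    """The first facet on which a and b differ decides (facets listed by priority)."""
    def better(a, b):
        for rank in ranks:
            if rank(a) != rank(b):
                return rank(a) < rank(b)
        return False
    return better


def bucket_order(items, better):
    """Repeatedly peel off the objects that no remaining object is better than."""
    remaining = list(items)
    buckets = []
    while remaining:
        top = [x for x in remaining if not any(better(y, x) for y in remaining)]
        buckets.append(top)
        remaining = [x for x in remaining if x not in top]
    return buckets


# Hypothetical car collection and single-facet preferences.
cars = {
    "c1": {"maker": "European", "price": 10},
    "c2": {"maker": "Asian", "price": 8},
    "c3": {"maker": "European", "price": 12},
    "c4": {"maker": "Asian", "price": 15},
}
by_maker = lambda c: 0 if cars[c]["maker"] == "European" else 1  # European preferred
by_price = lambda c: cars[c]["price"]                            # cheaper preferred

print(bucket_order(cars, pareto([by_maker, by_price])))
# [['c1', 'c2'], ['c3'], ['c4']]
print(bucket_order(cars, priority([by_maker, by_price])))
# [['c1'], ['c3'], ['c2'], ['c4']]
```

In the Pareto case the first bucket contains the Pareto-optimal (skyline) objects. Note that the cost of this naive procedure depends only on the number of objects in the focus, which is consistent with the observation that preference-based ordering becomes feasible once FDT has reduced the information space.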

We provided a theoretical analysis of user effort in FDT interaction schemes, both plain and preference-enabled ones, that suggests the effectiveness of the proposed approach with respect to the interaction and decision cost. In addition, we formulated the Difficulty of Formulating Effective Preferences without Knowing the Options (DiFEPreKO) hypothesis, and the conducted user study showed that, without the ability to explore the existing choices, the expression of preferences is time-consuming and in most cases results in incomplete preferences. Finally, we conducted two comparative user studies, one for evaluating the FDT per se and another for the proposed preference-enriched FDT interaction. The first one was conducted over the Mitos WSE and suggested that the browsing interaction scheme of FDT, with or without dynamic metadata, was preferred, provided better user satisfaction, and resulted in a higher task completeness degree with fewer queries than the other browsing interaction schemes (i.e. plain clustering). The second one, conducted over the Hippalus system, showed that 100% of the users (both expert and plain ones) preferred the preference-based UI. In more detail, this UI allowed users to successfully complete all the tasks, on average in less than a third of the time and with a third of the user interactions compared to the plain FDT UI. Furthermore, none of the users was able to successfully complete both of the tasks with the plain UI. As a result, we verify the conclusions of the theoretical user effort analysis, since the preference-based UI helps users to find the desired results in less time, with fewer actions and fewer decisions.

7.2 Directions for Future Work and Research

There are several issues that are worth further work and research.

Regarding applicability, it is worth developing wrappers that can be used for feeding Hippalus (synchronously or asynchronously) with the results of queries from web search engines (e.g. at least those which are OpenSearch compatible), database sources, SPARQL queries, etc. The availability of such wrappers can lead to a generic client of search services that can bring the benefits of the Hippalus system to a plethora of users. Furthermore, up to now Hippalus does not support multi-valued attributes.


Regarding the interaction model, we have not identified any substantial need for change or advancement. This is also supported by the results of the user study.

As far as the algorithmic part is concerned, in this thesis we strongly suggest a process that contains both information thinning and preference actions since, apart from giving users the required overview for decision making, it also significantly reduces the computational effort for deriving the preference order. It is still interesting, however, to investigate optimizations for the case where the current answer is very big, i.e. to further research the direction described in Section 4.

Finally, considering the structure of the information space, either of the information corpus or of the search results (i.e. objects described according to a multidimensional space with hierarchically organized values), one possible future direction could be to consider more complex structures. For example, objects described with values accompanied by numbers expressing various quality aspects like accuracy (Powley and Dale, 2007), specificity (Tzitzikas et al., 2013), certainty (Webber et al., 2012), trust, authority and popularity (Kazai and Milic-Frayling, 2008), etc. Then we can investigate the required advancements of both the interaction model and the preference framework.


References

Abel, F., Celik, I., and Siehndel, P. 2011. “Towards a Framework for Adaptive Faceted Search on Twitter”. In Procs of the International Workshop on Dynamic and Adaptive Hypertext (DAH’11), ACM Hypertext, Eindhoven, The Netherlands.

Agrawal, R., Borgida, A., and Jagadish, H. 1989. “Efficient Management of Transitive Relationships in

Large Data and Knowledge Bases”. ACM SIGMOD Record 18, 2, 253–262.

Agrawal, R., Gollapudi, S., Halverson, A., and Ieong, S. 2009. “Diversifying Search Results”. In Procs of the

Second ACM International Conference on Web Search and Data Mining (WSDM’09). ACM, New York, NY, USA,

5–14.

Agrawal, R. and Wimmers, E. L. 2000. “A Framework for Expressing and Combining Preferences”. In Procs

of the 2000 ACM SIGMOD international conference on Management of data (SIGMOD ’00). ACM, New York, NY,

USA, 297–306.

Andreka, H., Ryan, M., and Schobbens, P.-Y. 2002. “Operators and Laws for Combining Preference Rela-

tions”. Journal of Logic and Computation 12, 1, 13–53.

Azzopardi, L. 2009. “Usage Based Effectiveness Measures: Monitoring Application Performance in Information Retrieval”. In Procs of the 18th ACM Conference on Information and Knowledge Management (CIKM’09). ACM, New York, NY, USA, 631–640.

Balke, W.-T. and Güntzer, U. 2004. “Multi-Objective Query Processing for Database Systems”. In Procs of

the Thirtieth International Conference on Very large Data Bases (VLDB’04). VLDB Endowment, 936–947.

Barrett, R. and Salles, M. 2006. “Social Choice with Fuzzy Preferences”. Economics Working Paper Archive (University of Rennes 1 & University of Caen), Center for Research in Economics and Management (CREM), University of Rennes 1, University of Caen and CNRS.


Basu, C., Hirsh, H., and Cohen, W. W. 1998. “Recommendation as Classification: Using Social and Content-Based Information in Recommendation”. In Procs of the Fifteenth National Conference on Artificial Intelligence (AAAI/IAAI’98). 714–720.

Becker, C. and Bizer, C. 2009. “Exploring the Geospatial Semantic Web with DBpedia Mobile”. Web Seman-

tics: Science, Services and Agents on the World Wide Web 7, 4, 278 – 286.

Ben-Yitzhak, O., Golbandi, N., Har’El, N., Lempel, R., Neumann, A., Ofek-Koifman, S., Sheinwald, D.,

Shekita, E., Sznajder, B., and Yogev, S. 2008. “Beyond Basic Faceted Search”. In Procs of the Interna-

tional Conference on Web Search and Web Data Mining, (WSDM’08). Palo Alto, California, USA, 33–44.

Binshtok, M., Brafman, R. I., Shimony, S. E., Martin, A., and Boutilier, C. 2007. “Computing Optimal

Subsets”. In Procs of the 22nd National Conference on Artificial Intelligence - Volume 2 (AAAI’07). AAAI Press,

1231–1236.

Bizer, C., Heath, T., and Berners-Lee, T. 2009. “Linked Data - The Story So Far”. International Journal of

Semantic Web Information Systems 5, 3, 1–22.

Borda, J. C. 1781. “Memoire sur les Elections au Scrutin”. Histoire de l’Academie Royale des Sciences,

Paris.

Bot, R. S. and Wu, Y. B. 2004. “Improving Document Representations Using Relevance Feedback: The RFA Algorithm”. In Procs of the 13th ACM International Conference on Information and Knowledge Management (CIKM’04). Washington, USA.

Boutilier, C., Brafman, R. I., Domshlak, C., Hoos, H. H., and Poole, D. 2004. “CP-nets: A Tool for Representing and Reasoning with Conditional Ceteris Paribus Preference Statements”. Journal of Artificial Intelligence Research 21, 135–191.

Brafman, R. I., Domshlak, C., Shimony, S. E., and Silver, Y. 2006. “Preferences Over Sets”. In Procs of the

21st National Conference on Artificial Intelligence - Volume 2 (AAAI’06). AAAI Press, 1101–1106.

Braziunas, D. 2006. “Computational Approaches to Preference Elicitation”. Tech. rep., Department of

Computer Science, University of Toronto.


Breese, J., Heckerman, D., and Kadie, C. 1998. “Empirical Analysis of Predictive Algorithms for Collab-

orative Filtering”. In Procs of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98).

Morgan Kaufmann, San Francisco, CA, 43–52.

Burges, C. J. C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G. N. 2005.

“Learning to Rank Using Gradient Descent”. In Procs of the 22nd international conference on Machine learn-

ing (ICML’05). 89–96.

Byström, K. and Järvelin, K. 1995. “Task Complexity Affects Information Seeking and Use”. In Information

Processing and Management. 191–213.

Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. “The Skyline Operator”. In Procs of the 17th International

Conference on Data Engineering (ICDE’01). 421–430.

Callan, J. 1996. “Document Filtering with Inference Networks”. In Procs of the 19th Annual International

Conference on Research and Development in Information Retrieval (SIGIR’96). New York, NY, USA, 262–269.

Carpineto, C., Osiński, S., Romano, G., and Weiss, D. 2009. “A Survey of Web Clustering Engines”. ACM

Computing Surveys 41, 3, 17:1–17:38.

Carterette, B., Kanoulas, E., and Yilmaz, E. 2011. “Simulating Simple User Behavior for System Effective-

ness Evaluation”. In Procs of the 20th ACM International Conference on Information and Knowledge Manage-

ment (CIKM’11). ACM, New York, NY, USA, 611–620.

Carterette, B., Kanoulas, E., and Yilmaz, E. 2012. “Evaluating Web Retrieval Effectiveness”. In Web Search

Engine Research, D. Lewandowski, Ed. Emerald Books, 105–137.

Chakrabarti, K., Chaudhuri, S., and Hwang, S. 2004. “Automatic Categorization of Query Results”. Procs

of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD’04), 755–766.

Chan, C.-Y., Jagadish, H. V., Tan, K.-L., Tung, A. K. H., and Zhang, Z. 2006a. “Finding k-Dominant Skylines

in High Dimensional Space”. In Procs of the 2006 ACM SIGMOD International Conference on Management of

Data (SIGMOD’06). ACM, New York, NY, USA, 503–514.


Chan, C.-Y., Jagadish, H. V., Tan, K.-L., Tung, A. K. H., and Zhang, Z. 2006b. “On High Dimensional Skylines”.

In Procs of the 10th International Conference on Advances in Database Technology (EDBT’06). Springer-Verlag,

Berlin, Heidelberg, 478–495.

Chang, K. C. and Hwang, S. 2002. “Minimal Probing: Supporting Expensive Predicates for Top-k Queries”. In Procs of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD’02). 346–357.

Chapelle, O., Ji, S., Liao, C., Velipasaoglu, E., Lai, L., and Wu, S.-L. 2011. “Intent-Based Diversification of Web Search Results: Metrics and Algorithms”. Information Retrieval 14, 6, 572–592.

Chapelle, O., Metzler, D., Zhang, Y., and Grinspan, P. 2009. “Expected Reciprocal Rank for Graded Relevance”. In Procs of the 18th ACM Conference on Information and Knowledge Management (CIKM’09). 621–630.

Chaudhuri, S. and Gravano, L. 1999. “Evaluating Top-k Selection Queries”. In Procs of 25th International

Conference on Very Large Data Bases (VLDB’99). 397–410.

Chen, G. and Kotz, D. 2000. “A Survey of Context-Aware Mobile Computing Research”. Tech. rep.,

Hanover, NH, USA.

Chen, L. and Pu, P. 2004. “Survey of Preference Elicitation Methods”. Tech. rep., Swiss Federal Institute of Technology in Lausanne (EPFL).

Choi, J., Kim, M., and Raghavan, V. V. 2001. “Adaptive Feedback Methods in an Extended Boolean Model”.

In ACM SIGIR Workshop on Mathematical/Formal Methods in Information Retrieval. New Orleans, LA.

Chomicki, J. 2003. “Preference Formulas in Relational Queries”. ACM Transactions on Database Systems 28, 4, 427–466.

Chomicki, J. 2007. “Database Querying Under Changing Preferences”. Annals of Mathematics and Artificial Intelligence 50, 1-2, 79–109.

Chomicki, J., Godfrey, P., Gryz, J., and Liang, D. 2003. “Skyline with Presorting”. Procs of Data Engineering,

International Conference (ICDE’03), 717–719.

Chowdhury, S., Gibb, F., and Landoni, M. 2011. “Uncertainty in Information Seeking and Retrieval: A

Study in an Academic Environment”. Information Processing & Management 47, 2, 157–175.


Ciaccia, P. and Torlone, R. 2011. “Modeling the Propagation of User Preferences”. In Procs of the 30th

International Conference on Conceptual Modeling (ER’11). 304–317.

Clarke, C. L., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., and MacKinnon, I. 2008.

“Novelty and Diversity in Information Retrieval Evaluation”. In Procs of the 31st Annual International ACM

SIGIR Conference on Research and Development in Information Retrieval (SIGIR’08). ACM, New York, NY, USA,

659–666.

Cohen, W. W., Schapire, R. E., and Singer, Y. 1999. “Learning to Order Things”. Journal of Artificial Intelli-

gence Research 10, 243–270.

Cooper, W. S. 1968. “Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the

Weak Ordering Action of Retrieval Systems”. In American Documentation. 30–41.

Crawford, D. E. 2006. “Supporting Exploratory Search”. Communications of the ACM 49, 4.

Croft, B. W. and Lafferty, J., Eds. 2003. “Language Modeling for Information Retrieval”. The Information

Retrieval Series, vol. 13. Springer.

Dakka, W., Ipeirotis, P., and Wood, K. R. 2005. “Automatic Construction of Multifaceted Browsing Inter-

faces”. In Procs of the 14th ACM International Conference on Information and Knowledge Management (CIKM

’05). New York, NY, USA, 768–775.

Dash, D., Rao, J., Megiddo, N., Ailamaki, A., and Lohman, G. 2008. “Dynamic Faceted Search for Discovery-

Driven Analysis”. In Procs of CIKM.

Delgado, J. and Ishii, N. 1999. “Memory-Based Weighted-Majority Prediction for Recommender Systems”.

desJardins, M., Eaton, E., and Wagstaff, K. L. 2006. “Learning User Preferences for Sets of Objects”. In Procs

of the 23rd International Conference on Machine Learning (ICML ’06). ACM, New York, NY, USA, 273–280.

desJardins, M. and Wagstaff, K. 2005. “DD-PREF: A Language for Expressing Preferences Over Sets”. In

Procs of the 20th national conference on Artificial intelligence (AAAI’05). 620–626.

Doyle, J. 2004. “Prospects for Preferences”. Computational Intelligence 20, 2, 111–136.


Fafalios, P., Kitsos, I., Marketakis, Y., Baldassarre, C., Salampasis, M., and Tzitzikas, Y. 2012a. “Web Searching with Entity Mining at Query Time”. In Procs of the 5th Information Retrieval Facility Conference (IRFC’12). Vienna, Austria.

Fafalios, P., Kitsos, I., and Tzitzikas, Y. 2012b. “Scalable, Flexible and Generic Instant Overview Search”.

In Procs of the 21st international conference companion on World Wide Web. ACM, 333–336.

Fafalios, P., Salampasis, M., and Tzitzikas, Y. 2013. “Exploratory Patent Search with Faceted Search and

Configurable Entity Mining”. In Procs of the 1st International Workshop on Integrating IR technologies for

Professional Search (ECIR 2013 Workshop). Moscow, Russia.

Fafalios, P. and Tzitzikas, Y. 2013. “X-ENS: Semantic Enrichment of Web Search Results at Real-Time”.

In Procs of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval

(SIGIR’13 Demo paper). Dublin, Ireland.

Faulkner, L. 2003. “Beyond the Five-User Assumption: Benefits of Increased Sample Sizes in Usability

Testing”. Behavior Research Methods, Instruments & Computers 35, 3, 379–383.

Ferré, S. and Hermann, A. 2012. “Reconciling Faceted Search and Query Languages for the Semantic

Web”. International Journal of Metadata, Semantics and Ontologies 7, 1, 37–54.

Fishburn, P. 1970. “Utility Theory for Decision Making”. Wiley, New York.

Fishburn, P. 1999. “Preference Structures and their Numerical Representations”. Theoretical Computer

Science 217, 359–383.

Gadanho, S. C. and Lhuillier, N. 2007. “Addressing Uncertainty in Implicit Preferences”. In Procs of the

2007 ACM Conference on Recommender Systems (RecSys ’07). ACM, New York, NY, USA, 97–104.

Georgiadis, P., Kapantaidakis, I., Christophides, V., Nguer, E. M., and Spyratos, N. 2008. “Efficient Rewrit-

ing Algorithms for Preference Queries”. In Procs of the 24th International Conference on Data Engineering

(ICDE’08).

Golfarelli, M., Rizzi, S., and Biondi, P. 2011. “myOLAP: An Approach to Express and Evaluate OLAP Preferences”. IEEE Transactions on Knowledge and Data Engineering 23, 7, 1050–1064.


Griffiths, D. 2009. “Head First Statistics”. Head first. O’Reilly, Sebastopol, CA.

Hansson, S. O. 2001. “Preference Logic”. In Handbook of Philosophical Logic, D. Gabbay and F. Guenthner,

Eds. Vol. 4. Kluwer, Chapter 4, 319–393.

Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., and Yee, K.-P. 2002. “Finding the Flow in Web Site Search”. Communications of the ACM 45, 9, 42–49.

Hearst, M. and Pedersen, J. 1996. “Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results”. In Procs of the 19th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’96). Zurich, Switzerland, 76–84.

Hearst, M. A. 2006. “Clustering versus Faceted Categories for Information Exploration”. Communications

of the ACM 49, 4, 59–61.

Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. 1999. “An Algorithmic Framework for Performing

Collaborative Filtering”. In Procs of the 22nd Annual International ACM conference on Research and Develop-

ment in Information Retrieval (SIGIR’99). ACM Press, New York, NY, USA, 230–237.

Hildebrand, M., van Ossenbruggen, J., and Hardman, L. 2006. “/facet: A Browser for Heterogeneous

Semantic Web Repositories”. In Procs of International Semantic Web Conference, (ISWC’06). Athens, GA,

USA, 272–285.

Hofmann, T. and Puzicha, J. 1999. “Latent Class Models for Collaborative Filtering”. In Procs of the Sixteenth

International Joint Conference on Artificial Intelligence (IJCAI’99). Morgan Kaufmann Publishers Inc., San

Francisco, CA, USA, 688–693.

Hyvönen, E., Mäkelä, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S., Junnila, M., and Kettula, S. 2005.

“MuseumFinland – Finnish Museums on the Semantic Web”. Journal of Web Semantics 3, 2, 25.

Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2004a. “Supporting Top-k Join Queries in Relational

Databases”. VLDB Journal 13, 3, 207–221.

Ilyas, I. F., Shah, R., Aref, W. G., Vitter, J. S., and Elmagarmid, A. K. 2004b. “Rank-Aware Query Optimization”. In Procs of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD ’04). 203–214.


Inan, H. 2006. “Search Analytics: A Guide to Analyzing and Optimizing Website Search Engines”. Book Surge

Publishing.

Järvelin, K. and Kekäläinen, J. 2002. “Cumulated Gain-Based Evaluation of IR Techniques”. ACM Transactions on Information Systems 20, 4, 422–446.

Järvelin, K., Price, S. L., Delcambre, L. M. L., and Nielsen, M. L. 2008. “Discounted Cumulated Gain Based

Evaluation of Multiple-Query IR Sessions”. In European Conference on Information Retrieval (ECIR’08). 4–15.

Jin, R., Chai, J. Y., and Si, L. 2004. “An Automatic Weighting Scheme for Collaborative Filtering”. In Procs of

the 27th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’04).

ACM Press, 337–344.

Kahn, A. B. 1962. “Topological Sorting of Large Networks”. Communications of the ACM 5, 11, 558–562.

Käki, M. and Aula, A. 2008. “Controlling the Complexity in Comparing Search User Interfaces via User

Studies”. Information Processing & Management 44, 1, 82–91.

Kanoulas, E., Carterette, B., Clough, P., and Sanderson, M. 2011a. “Evaluating Multi-Query Sessions”. In

Procs of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval

(SIGIR’11). 1053–1062.

Kanoulas, E., Carterette, B., Clough, P., and Sanderson, M. 2011b. “Session Track 2011 Overview”. In Procs of the Twentieth Text REtrieval Conference (TREC 2011). National Institute of Standards and Technology.

Karlson, A. K., Robertson, G. G., Robbins, D. C., Czerwinski, M. P., and Smith, G. R. 2006. “FaThumb: a Facet-Based Interface for Mobile Search”. In Procs of the Conference on Human Factors in Computing Systems (CHI’06). New York, NY, USA, 711–720.

Kashyap, A., Hristidis, V., and Petropoulos, M. 2010. “FACeTOR: Cost-Driven Exploration of Faceted Query

Results”. In Procs of the 19th ACM international conference on Information and knowledge management (CIKM

’10). ACM, New York, NY, USA, 719–728.


Kazai, G. and Milic-Frayling, N. 2008. “Trust, Authority and Popularity in Social Information Retrieval”.

In Procs of the 17th ACM conference on Information and knowledge management (CIKM’08). ACM, New York,

NY, USA, 1503–1504.

Keeney, R. L. and Raiffa, H. 1976. “Decisions with Multiple Objectives: Preferences and Value Tradeoffs”. John

Wiley & Sons.

Kelly, D. 2009. “Methods for Evaluating Interactive Information Retrieval Systems with Users”. Founda-

tions and Trends in Information Retrieval 3, 1-2, 1–224.

Kelly, D. and Belkin, N. J. 2001. “Reading Time, Scrolling and Interaction: Exploring Implicit Sources of

User Preferences for Relevance Feedback”. In Procs of the 24th Annual International ACM SIGIR Conference

on Research and Development in Information Retrieval (SIGIR’01). ACM, New York, NY, USA, 408–409.

Kelly, D., Dumais, S., and Pedersen, J. O. 2009. “Evaluation Challenges and Directions for Information-

Seeking Support Systems”. Computer 42, 3, 60–66.

Kelly, D. and Teevan, J. 2003. “Implicit Feedback for Inferring User Preference: a Bibliography”. SIGIR

Forum 37, 2, 18–28.

Kießling, W. 2002. “Foundations of Preferences in Database Systems”. In Procs of the 28th International

Conference on Very Large Data Bases (VLDB’02). VLDB Endowment, 311–322.

Kießling, W., Endres, M., and Wenzel, F. 2011a. “The Preference SQL System - An Overview”. IEEE Data

Engineering Bulletin 34, 2, 11–18.

Kießling, W., Hafenrichter, B., Fischer, S., and Holland, S. 2001. “Preference XPATH: A Query Language for E-Commerce”. In Wirtschaftsinformatik, H. U. Buhl, A. Huther, and B. Reitwiesner, Eds. Physica Verlag / Springer, 32.

Kießling, W. and Köstler, G. 2002. “Preference SQL - Design, Implementation, Experiences”. In Procs of the 28th International Conference on Very Large Data Bases (VLDB’02). Hong Kong, China, 990–1001.

Kießling, W., Soutschek, M., Huhn, A., Roocks, P., Endres, M., Mandl, S., Wenzel, F., and Zelend, A. 2011b. “Context-Aware Preference Search for Outdoor Activity Platforms”. Tech. rep., Institut für Informatik, Universität Augsburg, Augsburg, Germany. November.


Kitsos, I., Magoutis, K., and Tzitzikas, Y. 2013. “Scalable Entity-Based Summarization of Web Search Results Using MapReduce”. Distributed and Parallel Databases.

Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Gordon, L. R., and Riedl, J. 1997. “GroupLens:

Applying Collaborative Filtering to Usenet News”. Communications of the ACM 40, 3, 77–87.

Kopidaki, S., Papadakos, P., and Tzitzikas, Y. 2009. “STC+ and NM-STC: Two Novel Online Results Cluster-

ing Methods for Web Searching”. In Procs of the 10th International Conference on Web Information Systems

Engineering (WISE’09).

Koren, J., Zhang, Y., and Liu, X. 2008. “Personalized Interactive Faceted Search”. In Procs of the 17th

International Conference on World Wide Web (WWW’08). WWW, 477–486.

Korfhage, R. R. 1997. “Information Storage and Retrieval”. John Wiley & Sons.

Kossmann, D., Ramsak, F., and Rost, S. 2002. “Shooting Stars in the Sky: An Online Algorithm for Skyline

Queries”. In Procs of the 28th International Conference on Very large Data Bases (VLDB’02). 275–286.

Koutrika, G. and Ioannidis, Y. 2005. “Personalized Queries under a Generalized Preference Model”. In Procs of the 21st International Conference on Data Engineering (ICDE ’05). IEEE Computer Society, Washington, DC, USA, 841–852.

Koutrika, G. and Ioannidis, Y. E. 2004. “Personalization of Queries in Database Systems”. In Procs of the

20th International Conference on Data Engineering (ICDE ’04). 597–608.

Kules, B. and Capra, R. 2008. “Creating Exploratory Tasks for a Faceted Search Interface”. In Workshop on

Computer Interaction and Information Retrieval, (HCIR’08 Workshop). 18–21.

Kules, B., Capra, R., Banta, M., and Sierra, T. 2009. “What do Exploratory Searchers Look at in a Faceted

Search Interface?”. In Procs of the 9th ACM/IEEE-CS joint conference on Digital libraries (JCDL’09). 313–322.

Le Phuoc, D., Parreira, J. X., Reynolds, V., and Hauswirth, M. 2010. “RDF On the Go: RDF Storage and

Query Processor for Mobile Devices”. In Procs of the 9th International Semantic Web Conference (ISWC’10

Posters&Demos).

Lee, J., You, G.-w., and Hwang, S.-w. 2009. “Personalized Top-k Skyline Queries in High-Dimensional

Space”. Information Systems 34, 1, 45–61.


Levandoski, J. J., Mokbel, M. F., and Khalefa, M. E. 2010. “FlexPref: A Framework for Extensible Preference

Evaluation in Database Systems”. In Procs of the 26th International Conference on Data Engineering (ICDE’10).

828–839.

Lewis, D. D. 2001. “Applying Support Vector Machines to the TREC-2001 Batch Filtering and Routing

Tasks”. In Text Retrieval Conference (TREC-10). 286–292.

Li, G., Feng, J., Zhou, X., and Wang, J. 2011. “Providing Built-In Keyword Search Capabilities in RDBMS”.

The VLDB Journal 20, 1–19.

Lichtenstein, S. and Slovic, P. 2006. “The Construction of Preference” Thirteenth Ed. Cambridge University

Press.

Lin, X., Yuan, Y., Zhang, Q., and Zhang, Y. 2007. “Selecting Stars: the k Most Representative Skyline Operator”. In Procs of the 23rd International Conference on Data Engineering (ICDE’07).

Linden, G., Hanks, S., and Lesh, N. 1997. “Interactive Assessment of User Preference Models: The Au-

tomated Travel Assistant”. In Procs of the Sixth International Conference of User Modeling (UM’97), C. P. A.

Jameson and C. Tasso, Eds. Springer Wien, 67–78.

Lindgaard, G. and Chattratichart, J. 2007. “Usability Testing: What Have we Overlooked?”. In Procs of the

SIGCHI Conference on Human Factors in Computing Systems (CHI’07). ACM, New York, NY, USA, 1415–1424.

Liu, T.-Y. 2011. “Learning to Rank for Information Retrieval”. Springer.

Mäkelä, E., Hyvönen, E., and Saarela, S. 2006. “Ontogator - A Semantic View-Based Search Engine Service for Web Applications”. In Procs of International Semantic Web Conference (ISWC’06). Athens, GA, USA, 847–860.

Mäkelä, E., Viljanen, K., Lindgren, P., Laukkanen, M., and Hyvönen, E. 2005. “Semantic Yellow Page Ser-

vice Discovery: The Veturi Portal”. Poster paper at International Semantic Web Conference (ISWC’05),

Galway, Ireland.

Manolis, N. and Tzitzikas, Y. 2011. “Interactive Exploration of Fuzzy RDF Knowledge Bases”. In Procs of

the 8th Extended Semantic Web Conference (ESWC’11). Heraklion, Greece.


Marchionini, G. 2006. “Exploratory Search: From Finding to Understanding”. Communications of the

ACM 49, 4, 41–46.

Meij, E., Mika, P., and Zaragoza, H. 2009. “An Evaluation of Entity and Frequency Based Query Com-

pletion Methods”. In Procs of the 32nd International ACM SIGIR Conference on Research and Development in

Information Retrieval (SIGIR’09). ACM, 678–679.

Melville, P., Mooney, R. J., and Nagarajan, R. 2001. “Content-Boosted Collaborative Filtering”. In Procs of

the 2001 SIGIR Workshop on Recommender Systems (SIGIR’01 Workshop).

Moffat, A. and Zobel, J. 2008. “Rank-Biased Precision for Measurement of Retrieval Effectiveness”. ACM Transactions on Information Systems 27, 1, 2:1–2:27.

Neumann, G. and Schmeier, S. 2012. “Exploratory Search on the Mobile Web”. In 4th International Confer-

ence on Agents and Artificial Intelligence (ICAART 2012). SciTePress, 110–119.

Neves, R. D. S. and Kaci, S. 2010. “Combining Totalitarian and Ceteris Paribus Semantics in Database

Preference Queries”. Logic Journal of the IGPL 18, 3, 464–483.

O’Brien, H. L., Toms, E. G., Kelloway, K., and Kelly, E. 2008. “Developing and Evaluating a Reliable Measure

of User Engagement”. 45, 1, 1–10.

Oren, E., Delbru, R., and Decker, S. 2006. “Extending Faceted Navigation for RDF Data”. In Procs of the 5th International Semantic Web Conference (ISWC’06). Athens, GA, USA, 559–572.

Over, P. 1997. “TREC-7 Interactive Track Report”. In Procs of Text REtrieval Conference (TREC’97). 57–64.

Papadakos, P. 2009. “Exploratory Web Searching with Dynamic Taxonomies, Results Clustering and Vi-

sualization”. In Procs of the 13th European Conference on Digital Libraries Doctoral Consortium (ECDL’09 DC).

Corfu, Greece. http://www.ieee-tcdl.org/Bulletin/v6n1/Papadakos/papadakos.html.

Papadakos, P., Armenatzoglou, N., Kopidaki, S., and Tzitzikas, Y. 2012a. “On Exploiting Static and Dy-

namically Mined Metadata for Exploratory Web Searching”. Knowledge and Information Systems 30, 3,

493–525.


Papadakos, P., Kopidaki, S., Armenatzoglou, N., and Tzitzikas, Y. 2009a. “Exploratory Web Searching with

Dynamic Taxonomies and Results Clustering”. In Procs of the 13th European Conference on Digital Libraries

(ECDL’09).

Papadakos, P., Kopidaki, S., Armenatzoglou, N., and Tzitzikas, Y. 2009b. “Exploratory Web Searching with

Dynamic Taxonomies and Results Clustering”. In Procs of the 8th Hellenic Data Management Symposium

(HDMS’09).

Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., and Tzitzikas, Y. 2008a. “Mitos: Design

and Evaluation of a DBMS-Based Web Search Engine”. In Procs of the 12th Pan-Hellenic Conference on

Informatics (PCI’08). Greece.

Papadakos, P., Theoharis, Y., Marketakis, Y., Armenatzoglou, N., and Tzitzikas, Y. 2009c. “Object-

Relational Database Representations for Text Indexing”. CoRR abs/0906.3112.

Papadakos, P., Tzitzikas, Y., and Zafeiri, D. 2012b. “An Interactive Exploratory System with Real-Time Preference Elicitation”. In Procs of the 13th International Conference on Web Information Systems Engineering (WISE’12 Demo Paper).

Papadakos, P., Vasiliadis, G., Theoharis, Y., Armenatzoglou, N., Kopidaki, S., Marketakis, Y., Daskalakis,

M., Karamaroudis, K., Linardakis, G., Makrydakis, G., Papathanasiou, V., Sardis, L., Tsialiamanis, P.,

Troullinou, G., Vandikas, K., Velegrakis, D., and Tzitzikas, Y. 2008b. “The Anatomy of Mitos Web Search

Engine”. CoRR, Information Retrieval abs/0803.2220. Available at http://arxiv.org/abs/0803.2220.

Papadias, D., Ta, Y., Fu, G., and Seeger, B. 2005. “Progressive Skyline Computation in Database Systems”.

ACM Transactions on Database Systems 30, 1, 41–82.

Peintner, B., Viappiani, P., and Yorke-Smith, N. 2008. “Preferences in Interactive Systems: Technical

Challenges and Case Studies”. AI Magazine 29, 4, 13–24.

Pitkow, J., Schutze, H., Cass, T., Cooley, R., Turnbull, D., Edmonds, A., Adar, E., and Breuel, T. 2002. “Person-

alized Search: A Contextual Computing Approach may Prove a Breakthrough in Personalized Search

Efficiency”. Communications of the ACM 45, 9, 50–55.


Pound, J., Paparizos, S., and Tsaparas, P. 2011. “Facet Discovery for Structured Web Search: a Query-Log

Mining Approach”. In Procs of the 2011 International Conference on Management of Data (SIGMOD’11). ACM,

New York, NY, USA, 169–180.

Powley, B. and Dale, R. 2007. “Evidence-Based Information Extraction for High-Accuracy Citation Ex-

traction and Author Name Recognition”. In Procs of the 8th RIAO International Conference on Large-Scale

Semantic Access to Content.

Pu, P. and Chen, L. 2008. “User-Involved Preference Elicitation for Product Search and Recommender

Systems”. AI Magazine 29, 4, 93–103.

Rashid, A. M., Albert, I., Cosley, D., Lam, S. K., Mcnee, S. M., Konstan, J. A., and Riedl, J. 2002. “Getting to

Know You: Learning New User Preferences in Recommender Systems”. In Procs of the 7th International

Conference on Intelligent User Interfaces (IUI’02). ACM Press, New York, NY, USA, 127–134.

Reisner, P. 1981. “Human Factors Studies of Database Query Languages: A Survey and Assessment”. ACM

Computing Surveys 13, 1, 13–31.

Robertson, S. 2008. “A New Interpretation of Average Precision”. In Procs of the 31st Annual International

ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’08). ACM, New York, NY,

USA, 689–690.

Robertson, S. E. and Jones, S. K. 1976. “Relevance Weighting of Search Terms”. Journal of the American

Society for Information Science 27, 3, 129–146.

Robertson, S. E., Kanoulas, E., and Yilmaz, E. 2010. “Extending Average Precision to Graded Relevance

Judgments”. In Procs of the 33rd International ACM SIGIR Conference on Research and Development in Infor-

mation Retrieval (SIGIR’10). ACM, New York, NY, USA, 603–610.

Rocchio, J. 1971. “Relevance Feedback in Information Retrieval”. In The SMART Retrieval System, G. Salton, Ed. Prentice Hall, Englewood Cliffs, NJ, 313–323.

Rose, D. E. and Levinson, D. 2004. “Understanding User Goals in Web Search”. In Procs of the 13th Interna-

tional Conference on World Wide Web (WWW’04). ACM, New York, NY, USA, 13–19.


Ross, K. A. 2007. “On the Adequacy of Partial Orders for Preference Composition”. Tech. rep., In DBRank

Workshop.

Rossi, F., Venable, K. B., and Walsh, T. 2008. “Preferences in Constraint Satisfaction and Optimization”. AI Magazine 28, 4.

Roy, S. B. and Das, G. 2009. “TRANS: Top-k Implementation Techniques of Minimum Effort Driven Faceted

Search For Databases”. In Procs of the 15th International Conference on Management of Data (COMAD’09),

S. Chawla, K. Karlapalem, and V. Pudi, Eds. Computer Society of India.

Roy, S. B., Wang, H., Das, G., Nambiar, U., and Mohania, M. 2008. “Minimum-Effort Driven Dynamic Faceted Search in Structured Databases”. In Procs of the 17th ACM Conference on Information and Knowledge Management (CIKM’08). New York, NY, USA, 13–22.

Ruotsalo, T., Athukorala, K., Glowacka, D., Konyushkova, K., Oulasvirta, A., Kaipiainen, S., Kaski, S., and

Jacucci, G. 2013a. “Supporting Exploratory Search Tasks with Interactive User Modelling”. In Procs of

ASIST 2013, the 76th ASIS&T Annual Meeting.

Ruotsalo, T., Peltonen, J., Eugster, M. J., Głowacka, D., Konyushkova, K., Athukorala, K., Kosunen, I., Reijo-

nen, A., Myllymäki, P., Jacucci, G., et al. 2013b. “Directing Exploratory Search with Interactive Intent

Modeling”. In Procs of the 22nd ACM International Conference on Information and Knowledge Management

(CIKM’13).

Sacco, G. 2006a. “Some Research Results in Dynamic Taxonomy and Faceted Search Systems”. In Procs

of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Workshop on Faceted Search (SIGIR’06).

Sacco, G. M. 2006b. “Analysis and Validation of Information Access Through Mono, Multidimensional

and Dynamic Taxonomies”. In Flexible Query Answering Systems, 7th International Conference (FQAS’06).

659–670.

Sacco, G. M. and Tzitzikas, Y., Eds. 2009. “Dynamic Taxonomies and Faceted Search: Theory, Practice and Experience”. Springer.

Schafer, J. B., Konstan, J. A., and Riedl, J. 2001. “E-Commerce Recommendation Applications”. Data Mining and Knowledge Discovery 5, 1-2, 115–153.


Scherer, K. R. 2005. “What are Emotions? And How can They be Measured?”. Social Science Information 44,

695–729.

Schraefel, M. C., Karam, M., and Zhao, S. 2003. “mSpace: Interaction Design for User-Determined, Adapt-

able Domain Exploration in Hypermedia”. In Procs of Workshop on Adaptive Hypermedia and Adaptive Web

Based Systems (AH’03). Nottingham, UK, 217–235.

Schuth, A. and Marx, M. 2011. “Evaluation Methods for Rankings of Facetvalues for Faceted Search”.

In Procs of the Second International Conference on Multilingual and Multimodal Information Access Evaluation

(CLEF’11). Springer-Verlag, Berlin, Heidelberg, 131–136.

Shawe-Taylor, J., Cancedda, N., Cesa-Bianchi, N., Conconi, A., Gentile, C., Goutte, C., Graepel, T., Li, Y., and

Renders, J.-M. 2002. “Kernel Methods for Document Filtering”. In The Eleventh Text Retrieval Conference

(TREC 2002), E. Voorhees and L. P. Buckland, Eds. Vol. NIST Special Publication 500-251. Department of

Commerce, National Institute of Standards and Technology.

Shokouhi, M. 2013. “Learning to Personalize Query Auto-Completion”. In Procs of the 36th International

ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’13). ACM, New York, NY,

USA, 103–112.

Shokouhi, M. and Radinsky, K. 2012. “Time-Sensitive Query Auto-Completion”. In Procs of the 35th Inter-

national ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’12). ACM, New

York, NY, USA, 601–610.

Spyratos, N., Sugibuchi, T., and Yang, J. 2011. “Personalizing Queries over Large Data Tables”. In Procs

of the 15th East-European Conference on Advances in Databases and Information System (ADBIS 2011). Vienna,

Austria.

Stefanidis, K., Drosou, M., and Pitoura, E. 2010. “PerK: Personalized Keyword Search in Relational

Databases through Preferences”. In Procs of the 14th International Conference on Advances in Database

Technology (EDBT’10). 585–596.

Stefanidis, K., Koutrika, G., and Pitoura, E. 2011a. “A Survey on Representation, Composition and Appli-

cation of Preferences in Database Systems”. ACM Transactions on Database Systems 36, 19:1–19:45.


Stefanidis, K., Pitoura, E., and Vassiliadis, P. 2011b. “Managing Contextual Preferences”. Information Systems 36, 8, 1158–1180.

Tao, Y., Ding, L., Lin, X., and Pei, J. 2009. “Distance-Based Representative Skyline”. In Procs of the 2009

IEEE International Conference on Data Engineering (ICDE’09). IEEE Computer Society, Washington, DC, USA,

892–903.

Toms, E. G., O’Brien, H. L., Kopak, R. W., and Freund, L. 2005. “Searching for Relevance in the Relevance

of Search”. In Procs of the 5th International Conference on Conceptions of Library and Information Sciences

(CoLIS’05). 59–78.

Torlone, R. and Ciaccia, P. 2002. “Which are my Preferred Items?”. In Workshop on Recommendation and

Personalization in eCommerce, RPEC-2002. Malaga, Spain, 217–225.

Tvarožek, M. 2006. “Personalized Navigation in the Semantic Web”. In Procs of the 4th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (AH’06) (2006-06-27), V. P. Wade, H. Ashman, and B. Smyth, Eds. Lecture Notes in Computer Science Series, vol. 4018. Springer, 467–472.

Tvarožek, M., Barla, M., Frivolt, G., Tomša, M., and Bieliková, M. 2008. “Improving Semantic Search Via Integrated Personalized Faceted and Visual Graph Navigation”. In Procs of the 34th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM’08) (2008-01-09). Lecture Notes in Computer Science Series, vol. 4910. Springer, 778–789.

Tvarožek, M. and Bieliková, M. 2007a. “Adaptive Faceted Browser for Navigation in Open Information

Spaces”. In Procs of the 16th International Conference on World Wide Web (WWW’07). ACM, New York, NY,

USA, 1311–1312.

Tvarožek, M. and Bieliková, M. 2007b. “Personalized Faceted Browsing for Digital Libraries”. In Procs of

the 11th European Conference on Digital Libraries (ECDL’07). 485–488.

Tvarožek, M. and Bieliková, M. 2007c. “Personalized Faceted Navigation for Multimedia Collections”. In

Procs of the Second International Workshop on Semantic Media Adaptation and Personalization (SMAP’07). IEEE

Computer Society, Washington, DC, USA, 104–109.

Tvarožek, M. and Bieliková, M. 2007d. “Personalized Faceted Navigation in the Semantic Web”. Web

Engineering, 511–515.


Tzitzikas, Y., Armenatzoglou, N., and Papadakos, P. Sept. 3, 2008. “FleXplorer: A Framework for Providing

Faceted and Dynamic Taxonomy-based Information Exploration”. In Procs of 20th International Database

and Expert Systems Application Workshop FIND’2008 (DEXA’08 FIND Workshop). Torino, Italy, 212–216.

Tzitzikas, Y., Kampouraki, M., and Analyti, A. 2013. “Curating the Specificity of Ontological Descriptions

under Ontology Evolution”. Journal on Data Semantics, 1–32.

Tzitzikas, Y. and Papadakos, P. 2013. “Interactive Exploration of Multi-Dimensional and Hierarchical

Information Spaces with Real-Time Preference Elicitation”. Fundamenta Informaticae 122, 4, 357–399.

Vee, E., Shanmugasundaram, J., and Amer-Yahia, S. 2009. “Efficient Computation of Diverse Query Re-

sults”. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 32, 4, 57–64.

Wagner, A., Ladwig, G., and Tran, T. 2011. “Browsing-Oriented Semantic Faceted Search”. In Procs of the 22nd International Conference on Database and Expert Systems Applications (DEXA’11). 303–319.

Wagstaff, K. L., desJardins, M., and Eaton, E. 2010. “Modelling and Learning User Preferences Over Sets”.

Journal of Experimental & Theoretical Artificial Intelligence 22, 237–268.

Wang, J., de Vries, A. P., and Reinders, M. J. T. 2006. “Unifying User-Based and Item-Based Collaborative

Filtering Approaches by Similarity Fusion”. In Procs of the 29th Annual International ACM Conference on

Research and Development in Information Retrieval (SIGIR’06). ACM Press, New York, NY, USA, 501–508.

Wasserman, L. 2004. “All of Statistics: A Concise Course in Statistical Inference”.

Webber, W., Chandar, P., and Carterette, B. 2012. “Alternative Assessor Disagreement and Retrieval Depth”. In Procs of the 21st ACM international conference on Information and knowledge management (CIKM’12). ACM, New York, NY, USA, 125–134.

Wellman, M. P. and Doyle, J. 1991. “Preferential Semantics for Goals”. In Procs of the 9th National Conference

on Artificial Intelligence (AAAI’91). 698–703.

White, R. W., Bennett, P. N., and Dumais, S. T. 2010. “Predicting Short-Term Interests Using Activity-

Based Search Context”. In Procs of the 19th ACM International Conference on Information and Knowledge

Management (CIKM’10). ACM, New York, NY, USA, 1009–1018.


White, R. W., Drucker, S. M., Marchionini, G., Hearst, M. A., and Schraefel, M. C. 2007. “Exploratory Search

and HCI: Designing and Evaluating Interfaces to Support Exploratory Search Interaction”. In Procs of

the Extended Abstracts on Human Factors in Computing Systems (CHI’07 EA), M. B. Rosson and D. J. Gilmore,

Eds. ACM, 2877–2880.

Wilson, M. L. and Schraefel, M. C. 2007. “Bridging the Gap: Using IR Models for Evaluating Exploratory

Search Interfaces”. In Workshop on Exploratory Search and HCI (SIGCHI’2007). ACM.

Xia, T., Zhang, D., and Tao, Y. 2008. “On Skylining with Flexible Dominance Relation”. In Procs of the 2008

IEEE 24th International Conference on Data Engineering (ICDE’08). IEEE Computer Society, Washington, DC,

USA, 1397–1399.

Yang, Y. and Lad, A. 2009. “Modeling Expected Utility of Multi-session Information Distillation”. In

Procs of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval

Theory (ICTIR’09). Springer-Verlag, Berlin, Heidelberg, 164–175.

Yang, Y., Yoo, S., Zhang, J., and Kisiel, B. 2005. “Robustness of Adaptive Filtering Methods in a Cross-

Benchmark Evaluation”. In Procs of the 28th Annual International ACM Conference on Research and Develop-

ment in Information Retrieval (SIGIR’05). ACM, New York, NY, USA, 98–105.

Yee, K., Swearingen, K., Li, K., and Hearst, M. 2003. “Faceted Metadata for Image Search and Browsing”.

Procs of the SIGCHI Conference on Human Factors in Computing Systems (CHI’03), 401–408.

Yilmaz, E., Shokouhi, M., Craswell, N., and Robertson, S. 2010. “Expected Browsing Utility for Web Search

Evaluation”. In Procs of the 19th ACM international conference on Information and knowledge management

(CIKM’10). 1561–1564.

Yiu, M. L. and Mamoulis, N. 2007. “Efficient Processing of Top-k Dominating Queries on Multi-

Dimensional Data”. In Procs of the 33rd International Conference on Very Large Data Bases (VLDB’07). VLDB

Endowment, 483–494.

Yu, K., Tresp, V., and Yu, S. 2004. “A Non-Parametric Hierarchical Bayesian Framework for Information

Filtering”. In Procs of the 27th annual International ACM Conference on Research and Development in Informa-

tion Retrieval (SIGIR’04). ACM, New York, NY, USA, 353–360.


Zamir, O. and Etzioni, O. 1998. “Web Document Clustering: A Feasibility Demonstration”. In Procs of the 21st Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR’98). Melbourne, Australia, 46–54.

Zha, H., Zheng, Z., Fu, H., and Sun, G. 2006. “Incorporating Query Difference for Learning Retrieval

Functions in World Wide Web Search”. In Procs of the 15th ACM International Conference on Information

and Knowledge Management (CIKM’06). ACM, New York, NY, USA, 307–316.

Zhai, C. and Lafferty, J. 2006. “A Risk Minimization Framework for Information Retrieval”. Information

Processing and Management 42, 31–55.

Zhai, C. X., Cohen, W. W., and Lafferty, J. 2003. “Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval”. In Procs of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’03). ACM, New York, NY, USA, 10–17.

Zhang, S., Mamoulis, N., Cheung, D. W., and Kao, B. 2010. “Efficient Skyline Evaluation Over Partially

Ordered Domains”. Procs of VLDB Endowment 3, 1-2, 1255–1266.

Zhang, X. and Chomicki, J. 2011. “Preference Queries Over Sets”. In Procs of the 27th International Conference

on Data Engineering (ICDE’11). 1019–1030.

Zhang, Y. and Koren, J. 2007. “Efficient Bayesian Hierarchical User Modeling for Recommendation Sys-

tem”. In Procs of the 30th Annual International ACM Conference on Research and Development in Information

Retrieval (SIGIR’07). ACM, New York, NY, USA, 47–54.

Zigoris, P. and Zhang, Y. 2006. “Bayesian Adaptive User Profiling with Explicit & Implicit Feedback”. In

Procs of the 15th ACM international Conference on Information and Knowledge Management (CIKM’06). ACM,

New York, NY, USA, 397–404.


Appendix A

Complete Syntax of Preference Language

This appendix gives the complete syntax of the preference language described in Section 3.1.

⟨stmt⟩ ::= ⟨scopeType⟩⟨spec⟩

| facets order : prefer facet ⟨Fi⟩ to ⟨Fj⟩

| terms order : prefer term ⟨ti⟩ to ⟨tj⟩

| objects order : prefer term ⟨ti⟩ to ⟨tj⟩

| objects order : Pareto ⟨setOfFacets⟩

| objects order : ParetoOptimal ⟨setOfFacets⟩

| objects order : Priority ⟨orderOfFacets⟩

| objects order : Combinational ⟨bucketOrderOfFacets⟩

⟨scopeType⟩ ::= facets order : | terms order : | objects order :

⟨spec⟩ ::= ⟨anchor⟩⟨rankSpec⟩

⟨anchor⟩ ::= facet ⟨Fi⟩

| term ⟨tj⟩

| object ⟨ok⟩

| ϵ // the empty string


⟨rankSpec⟩ ::= {lexicographic | count | value | indexedBy} {min|max}

| best | worst

| use scoreFunction ⟨score()⟩ {min|max}

⟨nonEmptyFacetElems⟩ ::= ⟨Fi⟩ { “,” ⟨Fj⟩ }

⟨setOfFacets⟩ ::= “{” ⟨nonEmptyFacetElems⟩ “}”

⟨orderOfFacets⟩ ::= “<” ⟨nonEmptyFacetElems⟩ “>”

⟨bucketOrderOfFacets⟩ ::= “<” ⟨setOfFacets⟩ “>”
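To make the productions above easier to check by hand, the following is a minimal sketch of a recognizer for ⟨stmt⟩, written with regular expressions. It is not the implementation used in the thesis. Several points are assumptions made only for illustration: facet, term, object and score-function names are taken to be simple identifiers, whitespace between tokens is free, and the names `is_statement`, `Price`, `Stars`, `Location`, `Greece` and `Italy` are hypothetical. The bucket order is read literally from the grammar, so it wraps a single set of facets.

```python
import re

# Assumption: names of facets, terms, objects and score functions are identifiers.
IDENT = r"[A-Za-z_]\w*"

# <nonEmptyFacetElems> ::= <Fi> { "," <Fj> }
FACETS = IDENT + r"(?:\s*,\s*" + IDENT + r")*"
# <setOfFacets> ::= "{" <nonEmptyFacetElems> "}"
SET_OF_FACETS = r"\{\s*" + FACETS + r"\s*\}"
# <orderOfFacets> ::= "<" <nonEmptyFacetElems> ">"
ORDER_OF_FACETS = r"<\s*" + FACETS + r"\s*>"
# <bucketOrderOfFacets> ::= "<" <setOfFacets> ">"
BUCKET_ORDER = r"<\s*" + SET_OF_FACETS + r"\s*>"

MINMAX = r"(?:min|max)"
# <rankSpec>: a built-in criterion with min/max, best, worst, or a score function
RANK_SPEC = (
    r"(?:(?:lexicographic|count|value|indexedBy)\s+" + MINMAX
    + r"|best|worst"
    + r"|use\s+scoreFunction\s+" + IDENT + r"\s+" + MINMAX + r")"
)
# <anchor>: facet, term, object, or the empty string
ANCHOR = r"(?:(?:facet|term|object)\s+" + IDENT + r"\s+)?"
# <scopeType>
SCOPE = r"(?:facets|terms|objects)\s+order\s*:\s*"
OBJECTS = r"objects\s+order\s*:\s*"
PREFER_TAIL = IDENT + r"\s+to\s+" + IDENT

# One regular expression per alternative of <stmt>, in the order of the grammar.
ALTERNATIVES = [
    SCOPE + ANCHOR + RANK_SPEC,
    r"facets\s+order\s*:\s*prefer\s+facet\s+" + PREFER_TAIL,
    r"terms\s+order\s*:\s*prefer\s+term\s+" + PREFER_TAIL,
    OBJECTS + r"prefer\s+term\s+" + PREFER_TAIL,
    OBJECTS + r"Pareto\s*" + SET_OF_FACETS,
    OBJECTS + r"ParetoOptimal\s*" + SET_OF_FACETS,
    OBJECTS + r"Priority\s*" + ORDER_OF_FACETS,
    OBJECTS + r"Combinational\s*" + BUCKET_ORDER,
]
STMT = re.compile("|".join("(?:" + alt + ")" for alt in ALTERNATIVES))


def is_statement(text):
    """Return True if text is derivable from <stmt> under the assumptions above."""
    return STMT.fullmatch(text.strip()) is not None
```

For instance, `is_statement("objects order : Priority <Price, Stars>")` is accepted, while `is_statement("terms order : prefer facet A to B")` is rejected because the terms scope only admits `prefer term`. A recognizer of this kind only checks syntax; it says nothing about whether the named facets or terms exist in the information space.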

Page 197: Papadakos PhD 2013

Appendix B

Binary Relations

Here we list several typical properties of binary relations. A binary relation R over a set S is called:

• reflexive, if ∀a ∈ S, aRa

• irreflexive, if ∀a ∈ S, ¬(aRa)

• symmetric, if ∀a, b ∈ S, aRb ⇒ bRa

• asymmetric, if ∀a, b ∈ S, aRb ⇒ ¬(bRa)

• antisymmetric, if ∀a, b ∈ S, (aRb ∧ bRa) ⇒ a = b

• transitive, if ∀a, b, c ∈ S, (aRb ∧ bRc) ⇒ (aRc)

• negatively transitive, if ∀a, b, c ∈ S, (¬(aRb) ∧ ¬(bRc)) ⇒ ¬(aRc)

• connected (strongly complete or total), if ∀a, b ∈ S, (aRb) ∨ (bRa) ∨ (a = b)

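The above properties are not independent. Asymmetry implies irreflexivity (take b = a in the definition of asymmetry), while irreflexivity and transitivity together imply asymmetry (if both aRb and bRa held, transitivity would give aRa, contradicting irreflexivity).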
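For finite sets, the properties listed above can be checked mechanically. The sketch below is only an illustration and assumes a particular encoding that does not come from the thesis: a relation R over S is a Python set of ordered pairs (a, b), read as aRb. The function names are hypothetical. The helper `dependencies_hold` enumerates every relation over a small set and confirms the two dependencies stated above for that set, which is a sanity check and not a proof.

```python
from itertools import product

# Assumed encoding: R is a set of pairs (a, b) with a, b in S, meaning "a R b".

def is_reflexive(S, R):
    return all((a, a) in R for a in S)

def is_irreflexive(S, R):
    return all((a, a) not in R for a in S)

def is_symmetric(S, R):
    return all((b, a) in R for (a, b) in R)

def is_asymmetric(S, R):
    return all((b, a) not in R for (a, b) in R)

def is_antisymmetric(S, R):
    return all(a == b for (a, b) in R if (b, a) in R)

def is_transitive(S, R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_negatively_transitive(S, R):
    return all((a, c) not in R
               for a, b, c in product(S, repeat=3)
               if (a, b) not in R and (b, c) not in R)

def is_connected(S, R):
    return all(a == b or (a, b) in R or (b, a) in R
               for a, b in product(S, repeat=2))

def all_relations(S):
    """Yield every binary relation over S; there are 2 ** (len(S) ** 2) of them."""
    pairs = list(product(S, repeat=2))
    for mask in range(2 ** len(pairs)):
        yield {p for i, p in enumerate(pairs) if (mask >> i) & 1}

def dependencies_hold(S):
    """Check both stated dependencies on every relation over the finite set S."""
    for R in all_relations(S):
        if is_asymmetric(S, R) and not is_irreflexive(S, R):
            return False
        if is_irreflexive(S, R) and is_transitive(S, R) and not is_asymmetric(S, R):
            return False
    return True
```

As a usage example, with `S = {1, 2, 3}` and `LT = {(a, b) for a in S for b in S if a < b}`, the usual "less than" relation is irreflexive, asymmetric, transitive, negatively transitive and connected, but neither reflexive nor symmetric.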

Based on its properties a binary relation is characterized as follows:

• A binary relation is a preorder or quasi-order, if it is reflexive and transitive. If it is in addition

antisymmetric, then it is a partial order.

• A binary relation is a strict partial order (or irreflexive partial order) if it is irreflexive, asymmetric and transitive.


• A binary relation is a total order, if it is a strict partial order and it is also connected.

• A binary relation is a weak order, if it is a negatively transitive strict partial order.
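The characterization above can be turned into a small classifier for finite relations. This is a sketch under the same assumed encoding as before (a relation is a set of ordered pairs over a finite set S); `properties` and `classify` are hypothetical names introduced only for this illustration, and the labels follow the definitions of this appendix literally, so "total order" here means a connected strict partial order.

```python
from itertools import product

def properties(S, R):
    """Return the names of the listed properties that R (a set of pairs) has over S."""
    pairs = list(product(S, repeat=2))
    triples = list(product(S, repeat=3))
    checks = {
        "reflexive": all((a, a) in R for a in S),
        "irreflexive": all((a, a) not in R for a in S),
        "asymmetric": all((b, a) not in R for (a, b) in R),
        "antisymmetric": all(a == b for (a, b) in R if (b, a) in R),
        "transitive": all((a, c) in R for a, b, c in triples
                          if (a, b) in R and (b, c) in R),
        "negatively transitive": all((a, c) not in R for a, b, c in triples
                                     if (a, b) not in R and (b, c) not in R),
        "connected": all(a == b or (a, b) in R or (b, a) in R for a, b in pairs),
    }
    return {name for name, holds in checks.items() if holds}

def classify(S, R):
    """List every kind of order from this appendix that R satisfies."""
    p = properties(S, R)
    kinds = []
    if {"reflexive", "transitive"} <= p:
        kinds.append("preorder")
        if "antisymmetric" in p:
            kinds.append("partial order")
    if {"irreflexive", "asymmetric", "transitive"} <= p:
        kinds.append("strict partial order")
        if "connected" in p:
            kinds.append("total order")
        if "negatively transitive" in p:
            kinds.append("weak order")
    return kinds
```

Over `S = {1, 2, 3}`, the relation `{(1, 2), (1, 3)}` (element 1 preferred to both 2 and 3, which are incomparable to each other) is classified as a strict partial order and a weak order but not a total order, whereas `{(1, 2)}` alone is only a strict partial order, because 1 is not related to 3 and 3 is not related to 2, yet 1 is related to 2.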

Page 199: Papadakos PhD 2013

Appendix C

Acronyms

AI Artificial Intelligence

DB Database

BMO Best Matches Only

DiFEPreKO Difficulty of Formulating Effective Preferences without Knowing the Options

ES Exploratory Search

FDT Faceted and Dynamic Taxonomies

HCI Human Computer Interaction

IIR Interactive Information Retrieval

IIPP Intentional Inconsistent Preferences Percentage

IR Information Retrieval

IS Information System

NIIP Number of Intentional Inconsistent Preferences

NUP Number of Unused Preferences

UI User Interface

UPP Unused Preferences Percentage

WSE Web Search Engine
