RESEARCH ROADMAP REPORT - CORDIS · the quality of the cluster analysis on the trajectory data is...


RESEARCH ROADMAP REPORT


MODAP Working Group Structure

The MODAP research community and the research area will be stimulated through working groups under the structure specified below. Individual working groups prepared their research roadmaps for the course of MODAP. The main aim of each WG is given below, and the research roadmap of each working group follows in the subsequent sections.

• WG1: Privacy Observatory: Responsible for the harmonization of mobility, data mining and privacy issues.

• WG2: Applications: Responsible for the new applications within MODAP which will motivate/direct MODAP research areas and could be adopted by high-tech SMEs.

• WG3: Mobility Data Representation: Various data models will be studied and evaluated for MODAP.

• WG4: Mobility Data Storage: Data streaming and warehousing issues will be investigated under this WG.

• WG5: Mobility Patterns and Pattern Mining: Mobility data mining algorithms will be investigated under this WG.

• WG6: Visual Analytics: Visual analytics tools for supporting mobility data mining will be investigated under this WG.

[Figure: MODAP working group structure, showing WG2 (Applications), WG3 (Data Collection and Representation), WG4 (Data Storage), WG5 (Mobility Patterns), WG6 (Visual Analytics) and WG1 (Privacy Observatory).]


MODAP WG1 – Privacy Observatory

Dino Pedreschi, Anna Monreale, Chiara Renso (KDDLab, CNR, Italy)

Maria Luisa Damiani, Pierluigi Perri, Giovanni Ziccardi (University of Milan, Italy)

1. Introduction

The objective of WG1 is twofold: one part is community-based and the other research-based. The first objective is to build an interdisciplinary community (the Privacy Observatory) of researchers together with lawyers, jurists, and national and international privacy authorities, with the aim of integrating research methods with privacy regulations.

The second objective is the definition of a theoretical, methodological and operational framework for fair knowledge discovery in support of the knowledge society, where fairness refers to privacy-preserving knowledge discovery and discrimination-aware knowledge deployment. In other words, the general objective is the reformulation of the foundations of data mining in such a way that privacy protection and discrimination prevention are inscribed into the foundations themselves, dealing with every moment in the data-knowledge life-cycle: from (off-line and on-line) data capture, to data mining and analytics, up to the deployment of the extracted models. The notions of privacy, anonymity and discrimination are the object of laws and regulations, and they are in continuous development. This implies that the technologies for data mining and its deployment must be flexible enough to embody rules and definitions that may change over time and adapt to different contexts. Here lies a big scientific challenge: we cannot hope to construct a technology by hardwiring the rules and the definitions into our software systems with a procedure-oriented approach. We need to come up with a declarative, intelligible representation of the legal rules and definitions that may be used to drive data mining and the deployment of its results, taking into account that rules and concepts derive their dynamic meaning from case-to-case reasoning or legal hermeneutics.

2. State of the art

2.1 Privacy-Preserving Data Mining and Data Publishing

The development of techniques that incorporate privacy concerns has become a fruitful direction for database and data mining research, mainly in the context of general tabular data sets. As an example, [SAM01] presents a large category of privacy attacks that re-identify individuals by joining the published table with some external information modeling the background knowledge of users. To avoid this type of attack, the mechanism of k-anonymity was proposed in [SS98] and [SWE02]. These works introduce the distinction between quasi-identifier attributes, i.e., the minimal set of attributes in the table that can be joined with external information to re-identify individual records, and sensitive attributes, i.e., the information to be protected. A dataset is k-anonymous (k >= 1) if, on the quasi-identifier attributes, each record is indistinguishable from at least k - 1 other records within the same dataset. The larger the value of k, the better the privacy is protected. Although it has been shown that finding an optimal k-anonymization is NP-hard [MW04] and that k-anonymity has some limitations, this framework is still very relevant. The limitations of k-anonymity are addressed in [MGKV06][LLV07], where the authors proposed l-diversity and t-closeness as alternative solutions. Some recent works addressed the privacy issues in spatio-temporal data [ABN08][NAS07][TM08][YBLW09][MAAGPW10][MTRPB10][AABG07][AABGP07]. In [ABN08], the authors studied the privacy-preserving publication of a moving object database. They proposed the notion of (k, delta)-anonymity for moving object databases, where delta represents the possible location imprecision. The authors also proposed an approach, called Never Walk Alone, based on trajectory clustering and spatial translation. In [NAS07], Nergiz et al. addressed privacy issues regarding the identification of individuals in static trajectory datasets. They provided privacy protection by (1) first enforcing k-anonymity, i.e., all released information refers to at least k users/trajectories, and (2) randomly reconstructing a representation of the original dataset from the anonymization. Yarovoy et al. [YBLW09] study the k-anonymization of moving object databases for the purpose of publishing them. Different objects in this context may have different quasi-identifiers, and thus the anonymization groups associated with different objects may not be disjoint. Therefore, an innovative notion of k-anonymity based on spatial generalization is provided. The authors proposed two approaches for generating anonymity groups that satisfy this novel notion of k-anonymity, called Extreme Union and Symmetric Anonymization. In [TM08], a suppression-based algorithm is suggested. Given the head of the trajectories, it reduces the probability of disclosing the tail of the trajectories. It is based on the assumption that different attackers know different and disjoint portions of the trajectories and that the data publisher knows the attackers' knowledge. Thus, the solution is to suppress all the dangerous observations.
A very recent work [MAAGPW10] proposed a method for achieving true anonymity in a dataset of published trajectories, by defining a transformation of the original GPS trajectories based on spatial generalization and k-anonymity. The proposed method offers a formal data protection safeguard, quantified as a theoretical upper bound on the probability of re-identification. The proposed anonymity technique reconciles the conflicting goals of data utility and data privacy: the achieved anonymity protection is much stronger than the theoretical worst case, while the quality of the cluster analysis on the trajectory data is preserved. In [MTRPB10], the authors presented the general idea of an approach for the generalization of semantic trajectories that can be adopted for obtaining datasets satisfying the c-safe property. Specifically, this method exploits ontologies to realize a framework for publishing semantic trajectories while preserving the privacy of the tracked users. Lastly, [AABG07, AABGP07] tackled the problem of hiding frequent trajectory patterns that are considered sensitive, while keeping the quality and utility of the data high. The hiding of sensitive trajectory patterns is obtained by coarsening (i.e., removing a few automatically selected spatio-temporal points from) some of the trajectories in the database.
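The k-anonymity definition given above for tabular data can be illustrated with a minimal sketch. It only measures the anonymity level of an already generalized table and does not perform any anonymization; the function name `anonymity_level`, the attribute names and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def anonymity_level(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous.

    Records are grouped by their values on the quasi-identifier
    attributes; k is the size of the smallest group. An empty
    table yields 0.
    """
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical generalized table: ZIP codes truncated, ages bucketed.
# "diagnosis" plays the role of the sensitive attribute.
table = [
    {"zip": "561**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "561**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "561**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "561**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "561**", "age": "30-39", "diagnosis": "diabetes"},
]

k = anonymity_level(table, ["zip", "age"])
print(k)  # 2: the smallest group on (zip, age) has two records
```

Here every record shares its (zip, age) combination with at least one other record, so the table is 2-anonymous on those attributes. The check says nothing about the diversity of the sensitive values inside a group, which is the kind of limitation that l-diversity and t-closeness address.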

2.2 On-line privacy of semantic locations

Location information is peculiar in many respects. For example, location has a dual dimension, geometric and semantic [HB01]: the geometric position consists of coordinates, whereas a semantic location is a geographical place. Protecting the geometric position is generally not enough to ensure the protection of geographical places. On the other hand, the protection of geographical places typically only ensures the privacy of a subset of points in space. This dual view of location privacy was first emphasized in the research area of location-based services (LBS).

Since the pioneering work of Gruteser et al. [GG03] and Beresford et al. [BS03], a large number of solutions have been developed to prevent undesirable inferences in LBS that can defeat the privacy protection mechanism. Various schemes have also been proposed to categorize those solutions, ranging from fine-grained classifications focused on narrow contexts, such as [JLY09, BMWF09], to coarse-grained classifications embracing a large spectrum of approaches, such as [DK06, K09]. In all these solutions, however, the position has an exclusively geometric value.


The privacy of places is a more recent issue [DBS08]. Places can be sensitive. The privacy of semantic locations is compromised when the sensitive place in which the user is located is disclosed without the user's explicit consent [DBS10-1]. To prevent this kind of privacy breach, a possible approach is to provide methods for generating coarse cloaked regions that blur the actual place. The position, however, cannot be coarsened only when the user is inside a sensitive region, because it would then be trivial to infer that the user is in a position that he or she wants to keep private. The problem is thus how to determine the proper granularity of the region to be released and to define an appropriate computational model so as to prevent undesirable inferences. A few techniques have recently been developed to safeguard semantic location privacy. A first class of solutions has been developed to strengthen the privacy of the user's identity in anonymous LBS [BLW08][KGMP07][XKP09]. These solutions, however, have important limitations in that they assume a simplified notion of place [DBS]. A second class of solutions targets the safeguarding of semantic location privacy in non-anonymous LBS. For example, Cheng et al. [CZBP06] propose a probabilistic privacy metric and an approach to compute spatial queries over an uncertainty area blurring the user's position. A different probabilistic approach is proposed in Probe [DBS10-1, DBS10-2, DBS]. In this case, the positions are assumed to be non-uniformly distributed, i.e., the places that people frequent and the degree of frequentation differ and are publicly known. The Probe solution, however, is only effective when the users' positions are reported sporadically. Otherwise, an attacker can leverage knowledge of the movement, for example of the user's speed, to prune the cloaked regions and thus bound the user's position more precisely. This type of privacy attack is called a velocity-based linkage attack. A solution is presented in [GDSB09].
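To make the velocity-based linkage attack concrete, the following sketch shows how an attacker who knows an upper bound on the user's speed could prune a cloaked region using the previously released one. It is an illustration under simplifying assumptions, not the method of [GDSB09]: cloaked regions are axis-aligned rectangles in a metric coordinate system, and the reachable area is over-approximated by enlarging the previous rectangle by the maximum travel distance on every side. The names `Rect` and `prune_by_velocity` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned cloaked region, coordinates in metres (assumed)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def expand(self, margin: float) -> "Rect":
        # Enlarge the rectangle by `margin` on every side.
        return Rect(self.x_min - margin, self.y_min - margin,
                    self.x_max + margin, self.y_max + margin)

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        # Return the overlapping rectangle, or None if there is none.
        x_min = max(self.x_min, other.x_min)
        y_min = max(self.y_min, other.y_min)
        x_max = min(self.x_max, other.x_max)
        y_max = min(self.y_max, other.y_max)
        if x_min > x_max or y_min > y_max:
            return None
        return Rect(x_min, y_min, x_max, y_max)

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def prune_by_velocity(previous, current, elapsed, max_speed):
    """Part of `current` reachable from `previous` within `elapsed` seconds.

    The bounding-box expansion over-approximates the true reachable
    set (whose corners are rounded), so the result is a conservative
    estimate of what the attacker can infer.
    """
    reachable = previous.expand(max_speed * elapsed)
    return current.intersect(reachable)


previous = Rect(0.0, 0.0, 100.0, 100.0)    # region released at time t
current = Rect(150.0, 0.0, 400.0, 100.0)   # region released 10 s later
pruned = prune_by_velocity(previous, current, elapsed=10.0, max_speed=10.0)
print(current.area(), pruned.area())  # 25000.0 5000.0
```

In this example the attacker shrinks the second cloaked region to one fifth of its released area, which is why regions that are safe in isolation may not be safe when positions are reported frequently.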

3. Research Roadmap

The research roadmap follows the directions highlighted in the state of the art: privacy-by-design and non-discrimination-by-design.

3.1 Privacy-by-design

Off-line Privacy-by-Design. The goal is to devise a coherent theory of privacy-by-design, as generic as possible, enabling new disruptive standards for the development of legally-supported privacy-preserving KDD systems. We plan to consider:

• a comprehensive repertoire of analytical and mining models (patterns, clustering, classification and prediction models, etc.)

• several challenging forms of human activity data, ranging from tabular micro-data and mobility data to query-logs and social networking data

• several realistic attack models: the stronger and more diverse the reference attacks considered, the stronger the safeguards that can be devised.

The challenge is indeed high: the rich semantics of social relations in the large social graphs that are becoming available, for instance, support the discovery of novel models of the hidden rules and dynamics governing our society, but at the same time novel and surprising attacks on our privacy arise that take advantage of the topology of the network. The development of a theory of privacy-by-design will require solving several open problems, including the formal definition of context knowledge describing data, mining models and legal rules, of data transformations and associated properties, of measures of protection and risk, of the analytical utility of the transformed data in terms of distortion from the original data, and finally, of what is legally reasonable and technically measurable with reference to the context. Different contexts will call for different techniques for inscribing privacy protection within the KDD process, including:

• privacy-preserving data publishing methods, when the goal is the safe disclosure of transformed data under suitable formal safeguards

• privacy-preserving knowledge publishing methods, when the goal is the safe disclosure of mined models under suitable formal safeguards, since extracted patterns, rules or clusters may also reveal sensitive information in certain circumstances

• knowledge hiding methods, when the goal is the disclosure of data transformed in such a way that certain specified secret patterns, hidden in the original data, cannot be found anymore

• secure multiparty mining over distributed datasets, when the goal is to mine datasets that are partitioned and distributed among several parties that do not want to or cannot share the data.

• measures of privacy assurance, quantifying the extent of privacy granted by different methods listed above.

• privacy attacks catalogue, collecting together known attacks on published or collected datasets, e.g. attacks on obfuscation methods, attacks on k-anonymization results, etc.

The last item is helpful in analyzing the robustness of proposed privacy-by-design schemes, but it is not a complete solution, as new, yet unknown types of attacks will no doubt be designed and carried out. Moreover, for methods that rely on modifying the original data, stronger privacy protection generally comes at the price of lower quality of the data they produce. It will therefore be interesting to investigate the cost, computational or economic, of different attacks, and to characterize the robustness of different privacy protection schemes not in an absolute sense, but against attacks of a certain strength (or budget).

On-line Privacy-by-Design. Typically, KDD is an off-line process, i.e., the complete dataset is entirely acquired before the analysis/mining starts; accordingly, privacy is not intended to be protected against the entity that acquires the data, but for later release of the data, and privacy-preserving techniques can assume that all the data to be published is available at the time the technique is applied. On the other hand, there are situations in which personal data need to be protected at the time they are acquired, and before being stored in the data collector infrastructure. Examples are on-line service requests or social network postings, in which service providers (SP) or other users that would have immediate access to the data are not fully trusted. Consider, for example, a location-based service request reporting the user to be in a given place at a given time and asking for the nearest restaurants serving a particular kind of food. If the SP is not trusted and the user considers the fact of being at that precise location to be private information, the request must be modified before reaching the SP, ensuring that the location information the SP acquires about the user satisfies the user's privacy preferences as well as any legislation that may apply. Note that this on-line process is actually an incremental process, since each time new data is acquired privacy must be protected taking into account both the new data and all previously released data. In the above example, the knowledge of the position of the user at an earlier time combined with the current position may reveal movements or patterns of movement that the user may not want to disclose. Location Based Services (LBS) demanding on-line privacy and data protection rights are becoming increasingly popular both as stand-alone services (e.g., Google Latitude) and as services integrated in geo-referenced social networks (geoSN, e.g., Brightkite, Loopt, Twitter, and soon Facebook). They include not only queries to detect the nearest resources, but also continuous queries to detect friends whenever they happen to be in proximity to the user, queries to find events currently attended by many people, and queries to get all the news coming from a specific area. These new services require on-line privacy protection while still supporting a good quality of service. For this purpose, WG1 also aims at investigating on-line privacy-by-design techniques.
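As a minimal illustration of modifying a request before it reaches an untrusted SP, the sketch below replaces the exact position with the grid cell that contains it. This is only one simple cloaking strategy, assumed here for illustration: the function names `cloak_to_grid` and `build_request`, the request layout and the use of a metric coordinate system are all hypothetical. The sketch treats each request in isolation, so it does not address the incremental aspect discussed above, nor inferences about sensitive places.

```python
import math


def cloak_to_grid(x, y, cell_size):
    """Return the grid cell (x_min, y_min, x_max, y_max) containing (x, y).

    `cell_size` is the user's preferred granularity; a larger cell
    hides the position better but degrades the quality of service.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return (col * cell_size, row * cell_size,
            (col + 1) * cell_size, (row + 1) * cell_size)


def build_request(x, y, query, cell_size):
    """Build the request sent to the SP: it carries a region, not a point."""
    return {"region": cloak_to_grid(x, y, cell_size), "query": query}


# The exact position (1234.5, 987.0) never leaves the client.
request = build_request(1234.5, 987.0, "nearest vegetarian restaurant", 500)
print(request["region"])  # (1000, 500, 1500, 1000)
```

The SP can answer the query for the whole cell and let the client filter the results locally, which is where the trade-off between privacy and quality of service becomes visible.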

Data audit techniques for privacy/anonymity assessment. Assume that a given data provider discloses (e.g., sells) human activity datasets, claiming that they are anonymous because they were obtained as a result of effective privacy-preserving data publishing methods. How can we trust this statement? How can we certify that the released data guarantee a sufficiently low probability that a given repertoire of attack models succeeds? We believe that, as a by-product of our privacy-by-design framework, it will be relatively easy to create data audit tools explicitly tailored to the above purpose, i.e., measuring and certifying the privacy-preserving quality of released data. Such tools will be valuable aids for the legal protection and enforcement carried out by control authorities, such as the Data Protection commissions in Europe. The participation of the Italian Garante Privacy in our project will facilitate the design and assessment of the resulting data audit tools.

3.2 Non-Discrimination-by-Design

Discrimination is the unfair or unequal treatment of people based on membership of a category, group or minority, without regard to individual merit. Civil rights laws prohibit discrimination on the basis of several attributes (race, color, religion, nationality, sex, marital status, age and pregnancy) and in a number of settings, including: credit and insurance; sale, rental, and financing of housing; personnel selection and wages; access to public accommodations, education, nursing homes, adoptions, and health care. The key legal references are European Union and United Nations legislation and recommendations. Many authorities (regulation boards, consumer advisory councils, commissions) monitor and report on discrimination. For instance, the European Commission publishes an annual report on the progress in implementing the Equal Treatment Directives by the member states [BCP07]. Given the current state of the art of decision support systems (DSS), socially sensitive decisions may be taken by automatic systems, e.g., for screening or ranking applicants for a job position, a loan, school admission and so on. For instance, data mining and machine learning classification models are constructed on the basis of historical data precisely with the purpose of learning the distinctive elements of different classes, or profiles, such as good/bad debtor in credit/insurance scoring systems or good/bad worker in personnel selection. When applied for automatic decision making, DSS can potentially guarantee more uniform decisions, but they can still be discriminatory in the social, unfair sense. Currently, what the state of the art can offer is the verification of a hypothesis of possible discrimination by means of statistical analysis of past decision records.

Our aim is precisely to devise a coherent theory of non-discrimination-by-design, as generic as possible, enabling the development of legally-supported KDD systems which are, by construction, inscribed with formal and measurable protections against discrimination. As in the case of privacy-by-design, the development of a theory of non-discrimination-by-design will require solving many open problems, including the formal definition of context knowledge describing data, decision models and legal rules, of data transformations and associated properties, of measures of protection and risk, of realistic attack models against indirect discrimination, of the analytical utility of the transformed data in terms of distortion from the original data, and finally, of what is legally reasonable and technically measurable with reference to the context. In particular, we will study the data-driven, legally-grounded definition of quantitative measures of the discrimination suffered by a given group (e.g., an ethnic minority) in a given context (e.g., a geographic area) with respect to a decision (e.g., credit denial). Indirect discrimination also needs to be formalized, in terms of latent rules of inference against a private attribute, e.g., “belonging to a minority group”. Different contexts will call for different techniques for inscribing discrimination protection within the KDD process, in order to go beyond the discovery (and possibly sanctioning) of unfair discrimination and achieve the much more challenging goal of preventing discrimination before it takes place.
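As a hedged illustration of what a quantitative measure of discrimination over past decision records might look like, the sketch below compares the rate of a negative decision for a protected group with the rate for everyone else. Risk difference and risk ratio are standard statistics used here as assumed examples; they are not the measures the working group commits to, and the function name, field names and sample records are hypothetical. Restricting the analysis to a context (e.g., a geographic area) would amount to filtering the records beforehand.

```python
def discrimination_measures(records, group_key, protected_value,
                            outcome_key, negative_outcome):
    """Compare negative-outcome rates of a protected group and the rest.

    Returns the risk difference (p_protected - p_others) and the risk
    ratio (p_protected / p_others). Both are illustrative statistics,
    not legally grounded definitions.
    """
    protected = [r[outcome_key] == negative_outcome
                 for r in records if r[group_key] == protected_value]
    others = [r[outcome_key] == negative_outcome
              for r in records if r[group_key] != protected_value]
    if not protected or not others:
        raise ValueError("both groups must contain at least one record")
    p_protected = sum(protected) / len(protected)
    p_others = sum(others) / len(others)
    ratio = p_protected / p_others if p_others > 0 else float("inf")
    return {"risk_difference": p_protected - p_others, "risk_ratio": ratio}


# Hypothetical past credit decisions: 3 of 4 minority applicants were
# denied, against 2 of 8 applicants from the rest of the population.
records = (
    [{"group": "minority", "decision": "deny"}] * 3
    + [{"group": "minority", "decision": "grant"}] * 1
    + [{"group": "other", "decision": "deny"}] * 2
    + [{"group": "other", "decision": "grant"}] * 6
)

measures = discrimination_measures(
    records, "group", "minority", "decision", "deny")
print(measures)  # {'risk_difference': 0.5, 'risk_ratio': 3.0}
```

Such a measure only detects direct disparities in recorded decisions. Indirect discrimination, where a seemingly neutral attribute acts as a proxy for group membership, would require the additional formalization mentioned above.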

3.3 Privacy usability and user awareness in on-line applications

The privacy-by-design paradigm provides the conceptual framework and the guidelines that can potentially drive the design of privacy-enhanced information services. It is, however, difficult to foresee when and to what extent these guidelines will be adopted, for example, by forthcoming on-line applications. On the other hand, we are witnessing a boom in on-line services offering more and more functionality with little or incomplete support for privacy protection. A key question is thus how to find a balance between the tumultuous evolution of commercial applications, which calls for quick answers to privacy concerns, and the development of principled approaches to privacy protection. To that end, we emphasize the need for a complementary bottom-up approach which targets user participation. We highlight the following lines of research:

Creating user awareness: systematic approaches to privacy analysis need to be investigated and applied to the assessment of privacy risks in existing LBS and recent standards. For example, the W3C has recently proposed a privacy-enhanced geo-location standard which allows websites to gather the accurate position of their users. Although this standard is currently used by most browsers, there is still little awareness of its implications for privacy. The privacy analysis should be conducted in the light of some reference paradigm such as the aforementioned privacy-by-design. The overall purpose is to lay the basis for better user awareness and education. Importantly, this effort can bring the additional advantage of providing input to research on privacy-enhancing technologies (PET) as well as on juridical aspects, contributing in this way to the development of a privacy ecosystem.

Privacy personalization: a key notion concerns the distinction between personal and sensitive data. This distinction is fundamental in any privacy law, and as such it is also included in the Madrid Resolution of 2009 concerning the development of standards for the protection of personal data and privacy. However, the very nature of personal data (i.e., whether personal data is sensitive or not) is variable and depends on individual preferences, profile and surrounding context. As such, the protection of sensitive data cannot be entirely delegated to an automated process. Rather, users must be given the capability of specifying and controlling what is sensitive for them, e.g., being located in a hospital. A user-centric vision of privacy implementation provides a flexible solution for obtaining up-to-date protection of data and compliance with laws.

Privacy usability. Privacy usability is a major issue in LBS. The problem is to find a tradeoff between simplicity of use and effectiveness of protection. For example, a privacy-enhanced application can request users to give explicit consent to the disclosure of location data. However, if this mechanism becomes too invasive, users may prefer simply to disable any protection. Usability also means involving users in the early evaluation of new PETs. This raises technical and organizational problems that have been only partially experienced and reported in the literature.

4. Privacy Observatory

The process of building the Privacy Observatory follows different steps. The core component has been the MODAP partners, enlarged to researchers and lawyers involved in different activities. A first activity has been to prepare a European proposal called FairKDD, submitted to FET Open and focusing on “Fair knowledge discovery techniques”, ranging from privacy-preserving data mining to non-discrimination techniques. This proposal enlarged the community towards the following universities and institutes: Tilburg University (Prof. Koops), the Polish Instytut Podstaw Informatyki Polskiej (Prof. Matwin), Eindhoven Technical University (Prof. Calders), Free University of Brussels (Prof. Gutwirth), the Spanish University Rovira I Virgili (Prof. Ferrer) and the Italian Privacy authority (Dr. Comella), in addition to the University of Milan and Wind, which already participate in MODAP. A second activity towards the enlargement of the Privacy Observatory community has been to organize a web magazine, where privacy experts from the areas of both law and research collaborate in writing articles on different aspects of privacy in movement data. The URL is http://www.privacyobservatory.org and the first issue is in preparation at the moment. People who may contribute to the first issue include Prof. Stan Matwin, Dr Riccardo Mazza from Wind, Domingo-Ferrer, Dr. Rototà Stefano, Dr. Mireille Hildebrandt, and Kenneth Neil Cukier from The Economist. We believe that the role of the Privacy Observatory, once established, would also be to act as an adviser and counselor to the European Community in defining the new European Privacy law.

References

[AABGP07] O. Abul, M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi. "Privacy-Aware Knowledge Discovery from Location Data". In Proceedings of the International Workshop on Privacy-Aware Location-based Mobile Services (PALMS), in conjunction with the 8th International Conference on Mobile Data Management, 2007.

[AABG07] O. Abul, M. Atzori, F. Bonchi and F. Giannotti. "Hiding Sensitive Trajectory Patterns". In Proceedings of the 6th International Workshop on Privacy Aspects of Data Mining, held in conjunction with the IEEE International Conference on Data Mining (ICDM 2007).

[ABN08] O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In ICDE, pages 376–385, 2008.

[BCP07] M. Bell, I. Chopin, F. Palmer. Developing Anti-Discrimination Law in Europe. European Network of Legal Experts in Anti-Discrimination, 2007. http://ec.europa.eu/social/main.jsp?langId=en&catId=423

[BLW08] B. Bamba, L. Liu, P. Pesti, and T. Wang. Supporting Anonymous Location Queries in Mobile Environments with PrivacyGrid. In Proc. of the 17th International World Wide Web Conference (WWW), 2008.

[BMWF09] C. Bettini, S. Mascetti, X. S. Wang, D. Freni, and S. Jajodia. Anonymity and historical-anonymity in location-based services. In Privacy in Location-Based Applications, pages 1-30, 2009.

[BS03] A. R. Beresford and F. Stajano. Location Privacy in Pervasive Computing. IEEE Pervasive Computing, 2(1):46-55, 2003.

[CMA09] C. Chow, M. F. Mokbel, and W. G. Aref. Casper*: Query Processing for Location Services without Compromising Privacy. ACM Transactions on Database Systems, (34)4, 2009.

[CZBP06] R. Cheng, Y. Zhang, E. Bertino, and S. Prabhakar. Preserving User Location Privacy in Mobile Data Management Infrastructures. In Proc. of the 6th Workshop on Privacy Enhancing Technologies, 2006.

[DBS] M.L. Damiani, E. Bertino, C. Silvestri. Fine-grained cloaking of sensitive positions in location sharing applications. IEEE Pervasive Computing (to appear).

[DBS08] M.L. Damiani, E. Bertino, C. Silvestri. Protecting Location Privacy through Semantics-aware Obfuscation Techniques. Proc. IFIPTM 2008 Conferences on Privacy, Trust Management and Security, Trondheim, June 2008.

[DBS10-1] M.L. Damiani, E. Bertino, and C. Silvestri. The PROBE Framework for the Personalized Cloaking of Private Locations. Transactions on Data Privacy, (3)2:123-148, 2010.

[DBS10-2] M.L. Damiani, C. Silvestri, E. Bertino. Analyzing semantic location cloaking techniques in a probabilistic grid-based map. ACM GIS 2010.

[DK06] M. Duckham and L. Kulik. Location privacy and location aware computing. In J. Drummond (ed.), Dynamic & mobile GIS: investigating change in space and time. Boca Raton: CRC Press, 2006.

[GDSB09] G. Ghinita, M.L. Damiani, C. Silvestri, and E. Bertino. Preventing Velocity-based Linkage Attacks in Location-Aware Applications. In Proc. of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), 2009.

[GG03] M. Gruteser and D. Grunwald. Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking. In Proc. of the 1st International Conference on Mobile Systems, Applications and Services. ACM Press, 2003.

[HB01] J. Hightower, G. Borriello. Location Systems for Ubiquitous Computing. Computer, vol. 34, no. 8, pp. 57-66, 2001.

[JLY09] C. S. Jensen, H. Lu, and M.L. Yiu. Location Privacy Techniques in Client-Server Architectures. In Privacy in Location-Based Applications: Research Issues and Emerging Trends. Springer-Verlag, 2009.

[K09] J. Krumm. A survey of computational location privacy. Personal and Ubiquitous Computing, (13)6:391-399, 2009.

[KGMP07] P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias. Preventing Location-Based Identity Inference in Anonymous Spatial Queries. IEEE Transactions on Knowledge and Data Engineering, (19)12:1719-1733, 2007.

[MAAGPW10] A. Monreale, G. Andrienko, N. Andrienko, F. Giannotti, D. Pedreschi, S. Rinzivillo, S. Wrobel. Movement Data Anonymity through Generalization. Transactions on Data Privacy, 2010.

[LLV07] N. Li, T. Li, and S. Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In ICDE, pages 106–115. IEEE, 2007.

[MGKV06] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE. IEEE Computer Society, 2006.

[MTRPB10] A. Monreale, R. Trasarti, C. Renso, D. Pedreschi, V. Bogorny. Preserving Privacy in Semantic-Rich Trajectories of Human Mobility. SPRINGL 2010 (to appear).

[MW04] A. Meyerson and R. Williams. On the complexity of optimal k-anonymity. In PODS ’04, pages 223–228, 2004.

[NAS07] M. E. Nergiz, M. Atzori, and Y. Saygin. Perturbation-driven anonymization of trajectories. Technical Report 2007-TR-017, ISTI-CNR, Pisa, 2007.

[SAM01] P. Samarati. “Protecting respondents’ identities in microdata release.” IEEE Transactions on Knowledge and Data Engineering, vol. 13, no. 6, pp. 1010–1027, 2001.

[SS98] P. Samarati and L. Sweeney. “Generalizing data to provide anonymity when disclosing information.” In PODS’98.

[SWE02] L. Sweeney. “K-anonymity: a model for protecting privacy.” International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, vol. 10, no. 5, pp. 557–570, 2002.

[TM08] M. Terrovitis and N. Mamoulis. Privacy preservation in the publication of trajectories. In MDM, pages 65–72, 2008.

[YBLW09] R. Yarovoy, F. Bonchi, L. V. S. Lakshmanan, and W. H. Wang. Anonymizing moving objects: how to hide a mob in a crowd? In EDBT, pages 72–83, 2009.

[XKP09] M. Xue, P. Kalnis, and H.K. Pung. Location Diversity: Enhanced Privacy Protection in Location Based Services. In Proc. of the 4th International Symposium on Location and Context Awareness (LoCA), 2009.

MODAP WG2 – Applications

Monica Wachowicz (ALTERRA)

1. Introduction

WG2 examines the current status of data collection methods employing location/time-aware devices to observe evolving patterns of spatio-temporal behaviour, including patterns that are affected by ICTs. Drawing mostly on transport research, it is suggested that two streams of development have emerged: a “passive” stream that maximises the automatic interpretation of positioning data, and an “active” stream that uses increasingly sophisticated mobile computing devices and/or the Internet to engage respondents in the validation, interpretation and enhancement of their own data. Recent and future developments promise to go beyond simply using technologies to carry out conventional travel surveys: some new classes of data may be obtained, notably because of common or overlapping interests with other fields, such as public health research. There are, however, some ethical and public acceptability constraints that must be respected.

2. State of the art in data collection strategies and its impact on the development of new applications

2.1 Human Behaviour

Collecting data on people’s space-time behaviour has a long history in disciplines such as transportation, geography, tourism and urban planning. Such data have predominantly been collected using travel surveys. Respondents were typically asked to report, for one or more days, the activities and travel they conducted during that time, plus some trip characteristics such as duration, motive, destination and transport mode. Because response rates have been declining, researchers and survey agencies have explored new data collection strategies that are less demanding for participants. The use of GPS technology has been dominant in this regard.

Table 1 provides an overview of studies that examined the use of GPS to collect data on travel information. It shows that the first pilot studies on GPS data collection date from the late 1990s. Most of these were small-scale pilot studies based on in-car GPS systems. Later these were complemented with wearable devices. In the beginning, the devices were quite heavy, but the last decade has witnessed a rapid increase in light GPS devices with longer-lasting batteries, and in PDAs and cellular phones with a GPS device. The availability of such systems, and the fact that they are easier to use, significantly increased the potential of GPS data collection. GPS has therefore been introduced into official data collections, predominantly in the U.S. (Wolf et al., 2003). Moreover, Table 1 shows that in addition to transportation, it has been used in retailing, tourism and related disciplines.

The overview also shows that early studies have focused on the general acceptance of GPS, their accuracy and on the comparison of travel data collected using traditional means and by using GPS. These studies suggest that GPS outperforms traditional data collection approaches in terms of collecting more accurate spatial and temporal data on travel of vehicles and people. For example, Battelle (1997) concluded for the Lexington Area Travel Data Collection that respondents take more trips of shorter distances than past national estimates would suggest. Similar results were also found by Murakami and Wagner (1999).

In particular, GPS traces provide information about the route taken, distance covered, timing and duration of travel, and speed, while additional data can be derived from these data. For example, stops and the number of trips are often derived by searching the traces for some arbitrarily set period of no movement (usually from 30 seconds to 2 minutes; see e.g. Schönfelder et al., 2002; Chung and Shalaby, 2005; Wolf et al., 2004; Forrest and Pearson, 2005; Li and Shalaby, 2008; Bohte and Maat, 2008). In some cases, this may be too crude and additional information may help to identify trip ends and activity locations.
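The dwell-time rule described above can be sketched as follows. This is only an illustrative sketch, not the procedure of any of the cited studies: the `Fix` record, the function names and the 50 m radius used to decide that a device has "not moved" are assumptions introduced here; only the dwell threshold range (30 seconds to 2 minutes) comes from the text.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Fix:
    """One GPS fix (hypothetical record layout)."""
    t: float    # timestamp in seconds
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(h))


def detect_stops(fixes, min_dwell_s=120.0, max_radius_m=50.0):
    """Return (start_t, end_t) intervals with no movement.

    A stop is a run of consecutive fixes that all lie within
    max_radius_m (an assumed tolerance) of the first fix of the run
    and that lasts at least min_dwell_s.
    """
    stops = []
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        # Extend the run while fixes stay close to the anchor fix i.
        while j < n and haversine_m(fixes[i], fixes[j]) <= max_radius_m:
            j += 1
        if fixes[j - 1].t - fixes[i].t >= min_dwell_s:
            stops.append((fixes[i].t, fixes[j - 1].t))
            i = j          # continue after the stop
        else:
            i += 1         # no stop anchored here, try the next fix
    return stops
```

Trips can then be taken as the segments between consecutive stops. The choice of `min_dwell_s` matters: a short threshold splits trips at traffic lights, while a long one misses brief activities, which is why the text calls the rule potentially too crude.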

It should also be acknowledged, however, that although GPS traces provide a lot of information, they are not perfectly reliable. For example, Ohmori et al. (2006) found that the average error in distance was about 50-150 metres. Moreover, data may be missing. This may be due to weak signals and other technology issues, respondent burden or inherent characteristics of the technology. For example, in-vehicle GPS systems do not necessarily provide information about the trip destination, as the car may be parked and the last leg of the trip is likely to be made on foot. Hence, GPS traces need to be filtered and some imputation will be necessary. Errors may arise from weak satellite signals, urban canyons and tunnels, limitations of the technology itself, such as the time required for the GPS device to locate the satellites (cold starts; see Stopher et al., 2003), limited battery life, and human behaviour (e.g., GPS not turned on).
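As an illustration of what such filtering and imputation might look like, the sketch below drops fixes that imply an implausible speed and fills short gaps by linear interpolation. Both techniques are assumptions chosen for simplicity; the text does not say which methods the cited studies used. Coordinates are assumed to be already projected to metres, and all names and thresholds are hypothetical.

```python
def drop_speed_outliers(points, max_speed_mps=50.0):
    """Remove fixes whose implied speed from the last kept fix is implausible.

    points: list of (t, x, y) tuples sorted by time, with t in seconds
    and x, y in metres (assumed planar projection).
    """
    kept = [points[0]] if points else []
    for t, x, y in points[1:]:
        t0, x0, y0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed = ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5 / dt
        if speed <= max_speed_mps:
            kept.append((t, x, y))
    return kept


def fill_gaps(points, step_s=10.0, max_gap_s=60.0):
    """Insert linearly interpolated fixes every step_s seconds inside gaps.

    Only gaps longer than step_s and no longer than max_gap_s are
    filled; longer gaps (e.g. a tunnel or a switched-off device) are
    left untouched because a straight line is unlikely to be realistic.
    """
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        out.append((t0, x0, y0))
        gap = t1 - t0
        if step_s < gap <= max_gap_s:
            k = 1
            while t0 + k * step_s < t1:
                f = k * step_s / gap
                out.append((t0 + k * step_s, x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
                k += 1
    if points:
        out.append(points[-1])
    return out
```

Applying `drop_speed_outliers` before `fill_gaps` keeps a single spurious fix from being propagated into the interpolated points.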

If we wish to provide travellers with context-sensitive information or recommendations, in addition to these basic data, we need to interpret and enrich these data to induce what activity they are doing, and we may also need to capture information about their past performance and identify any routines and responses to particular information/advice. Table 1 shows that this kind of work is still in its infancy. Existing research relies on accurate and detailed geo-spatial information systems, including transport networks with stop locations. In addition, geo-coded land-use data (educational establishments, shopping centres, hospitals, parks, etc.) are used jointly with ad hoc heuristics to induce activities (e.g., Wolf et al., 2001). As argued by Tsui and Shalaby (2006) and Schuessler and Axhausen (2008), fuzzy logic seems to provide new possibilities for extracting data on activity type from GPS traces. Unfortunately, no empirical results have been presented yet in these studies. However, ongoing work based on Bayesian belief networks rather than fuzzy logic indicates that activities can be induced with sufficient accuracy (Moiseeva et al., 2009).
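A very simple heuristic of the land-use kind mentioned above might assign each detected stop the activity associated with the nearest geo-coded site. The sketch below is a hypothetical illustration only: the category names, the category-to-activity mapping and the 100 m search radius are assumptions, and it implements neither the fuzzy-logic nor the Bayesian approaches cited in the text.

```python
from math import hypot

# Hypothetical mapping from land-use category to induced activity.
LAND_USE_TO_ACTIVITY = {
    "school": "education",
    "shopping_centre": "shopping",
    "hospital": "health care",
    "park": "leisure",
}


def infer_activity(stop_xy, land_use_sites, max_dist_m=100.0):
    """Label a stop with the activity of the nearest land-use site.

    stop_xy: (x, y) of the stop in metres (assumed planar projection).
    land_use_sites: iterable of (x, y, category) tuples.
    Returns "unknown" when no site lies within max_dist_m.
    """
    best_category = None
    best_dist = max_dist_m
    for x, y, category in land_use_sites:
        d = hypot(stop_xy[0] - x, stop_xy[1] - y)
        if d <= best_dist:
            best_category, best_dist = category, d
    return LAND_USE_TO_ACTIVITY.get(best_category, "unknown")
```

A rule like this is ambiguous in mixed-use areas, where several categories fall within the radius, which is one reason for considering probabilistic methods instead.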

Because GPS traces and the data imputation algorithms are not perfect, and because researchers may require additional data such as the number of people travelling and the costs associated with travel, so-called prompted recall surveys have become increasingly popular. In such surveys, respondents receive the interpreted GPS traces and are invited to check these results, modify incorrect data and provide details for the additional questions. Prompted recall instruments vary in terms of administration and the technology used to collect the information. As discussed by Doherty et al. (2006), sequential methods involve systematic querying of subjects for missing or supplemental activity-trip pattern attributes at a later time, either by phone or via the internet. Alternatively, the data can be collected simultaneously with location tracking. Temporal/tabular prompted recall methods involve a time-ordered display of GPS or cellular phone-detected activities and trips, including start and end times, travel times/modes and activity locations. Spatial prompted recall methods involve the use of a GIS or cellular phone to generate a map showing a person’s routes, activity stops/locations, and an array of text boxes or other map attributes depicting such items as mode, speed, location name, start/end times, and trip and activity sequence (in the day), overlaid on the road and land-use network for context.

Prompted recall surveys have typically been used to check interpreted GPS traces and provide supplementary data for a single day. One aspect of GPS data that does not seem to have been taken into account is that multiple observations of essentially the same trip may allow a better identification of travel information, and that observations of other respondents may be used to better estimate missing information of a particular respondent. Moiseeva et al. (2009) argue that if learning and data imputation algorithms are applied, respondent burden should decrease over time and hence it may become increasingly feasible to collect multi-day travel information across longer periods of time, with acceptable respondent burden. A pilot study will start soon. Such multi-week information is critical in better understanding the dynamics underlying behaviour and in assessing responses to information/advice (e.g. Axhausen et al., 2002). Geo-information is required to identify activity types. Even if such data are not available immediately, however, prompted recall data can be used to collect data over time, using intelligent learning algorithms. Similarly, temporal information can be used.

Table 1: Overview of studies that examined the use of GPS to collect data on travel information

Authors | Year | Country | Technology | Sample | Length | Facets (Trips, Route, Destination, Mode, Activity) | Learning | Remarks
Auld et al. | 2009 | US | Portable GPS device + web-based prompted recall | 5 | 2 weeks | x x x x p | No |
Bellemans et al. | 2008 | Belgium | GPS-enabled PDA | 816 | 1 week | x x x x x | No |
Schuessler & Axhausen | 2008 | Switzerland | Portable GPS device | 4882 | 6 days | x x x x | No | Fuzzy logic
Bohte & Maat | 2008 | Netherlands | Portable GPS device + web-based prompted recall | 1104 | 1 week | x x x x | Limited |
Li & Shalaby | 2008 | Canada | Portable GPS device + web-based prompted recall | 15 | 35 days | | |
deVries et al. | 2008 | Eindhoven | Cell phone | 8 | 1 day | x x | No | Pictures, etc.
Spek van der | 2008 | Europe | Portable GPS device | 150-250 | hours | x | No | pedestrians
Krijgsman et al. | 2008 | South Africa | Cell phone | 83-129 | 2 days | x x x x | No |
Zou & Golledge | 2007 | USA | GPS + pocket PC | 20 | 1 week | x x x x x | No | speech
Shoval & Isaacson | 2007 | Israel | Portable GPS device | 1 | hours | x | No | tourism
Asakura & Iryo | 2007 | Japan | Wearable GPS | 56 | 1 day | x x | No | tourism
Ahas et al. | 2007 | Estonia | Cell phone | Roaming | 1 year | x x | No | tourism
Du & Aultman-Hall | 2007 | USA | In-vehicle GPS | 276 | 10 days | x x x | No |
Hato et al. | 2006 | Japan | Cell phone + web-based prompted recall | 100 | 1 month | x x x x x | No |
Ohmori et al. | 2006 | Japan | Cell phone + PDA | 49 | 2 days | x x x x | No |
Hato | 2006 | Japan | Multiple sensor | 1 | trip | x x x x | No | Sound, pressure
Itsubo & Hato | 2006 | Japan | Cell phone + web-based prompted recall | 31 | 5 days | x x x x | No | No
Tsui & Shalaby | 2006 | Toronto, Canada | GPS | 9 | 1 day | x x x x | No | Hot/cold stops, fuzzy logic
Doherty et al. | 2006 | Canada | Portable GPS device + web-based prompted recall | 1 | 1 day | x x x x x | No |
Shoval & Isaacson | 2006 | Israel | Portable GPS device | 1 | 2+ hours | x | No | pedestrians
Ohmori et al. | 2005 | Japan | Cell phone + PDA | 13-38 | 1 week | x x x | No |
Li et al. | 2005 | Atlanta | In-vehicle GPS | 182 | 10 days | x x x | No | Commuter trips
Stopher et al. | 2005 | Australia | Wearable GPS | 32-51 | 6 days | x x x | No |
Stopher & Collins | 2005 | Australia | GPS device + web-based prompted recall | 29 | 1 day | x x x | No |
Chung & Shalaby | 2005 | Toronto | Wearable GPS | 1 | --- | x x | No |
Forrest & Pearson | 2005 | Laredo | In-vehicle GPS | 150 | 1 day | x | |
Axhausen et al. | 2004 | Sweden | In-vehicle GPS | 186 | 30 days | x x x x | No | Ad hoc processing
Wolf et al. | 2004 | Sweden | In-vehicle GPS | 186 | 30 days | x x x x | No | Ad hoc processing
Stopher et al. | 2003 | Sydney | In-vehicle GPS + prompted recall | 52 | 5 days | x x x | No |
Pierce et al. | 2003 | USA | In-vehicle GPS + PDA | 20 | 4 m | x x x | No |
Zmud & Wolf | 2003 | USA | Wearable GPS | 292 | 20 weeks | x x x | No |
Marca et al. | 2002 | USA | In-vehicle GPS | 4 | ? | x x x x | No | Kriging surfaces
Schönfelder et al. | 2002 | Sweden | In-vehicle GPS | 310 v | 14 months | x x x x | No |

2.2 Environmental Monitoring

Automatic and participatory environmental monitoring has been an active research area in recent years. Several environmental observing systems have been designed and implemented (Arzberger 2005). These include the Oceans Observatory Initiative (OOI), the National Ecological Observatory Network (NEON), the Collaborative Large-scale Engineering Analysis Network for Environmental Research (CLEANER), the Hydrologic Observatory Initiative (HOI), the Southern California Earthquake Center (SCEC) (http://www.scec.org/), the Real-time Observatories, Applications, and Data management Network (ROADNet) (http://roadnet.ucsd.edu/), Science Environment for Ecological Knowledge (SEEK) (http://seek.ecoinformatics.org/), and Laboratory for the Ocean Observatory Knowledge INtegration Grid (LOOKING) (http://lookingtosea.ucsd.edu/). The monitoring targets vary and include the sea (e.g. LOOKING), the sky, the earth (e.g. SCEC), the galaxies, plants, seabird nesting environments (Mainwaring, Polastre et al. 2002) and animals (Naumowicz, Freeman et al. 2008). However, these systems mainly use mote-based sensors, and none of these observatories targets acoustic observation. Mote-based sensors have program memory and data storage capacity and can communicate with other sensors. However, their memory and data storage capacities are very limited and their communication range is quite short.

A number of systems exist for monitoring animal behaviour using acoustic or imagery sensors. Examples include Gage’s system for monitoring bird calls (Gage, Ummadi et al. 2004), cameras strapped to crows (Greenemeier 2007), and DeerNet (DeerNet 2007). These projects use webcams or home-made systems containing webcams and radios as sensors. The drawback of these sensors is their short communication distance.

The Owl project at MIT (http://owlproject.media.mit.edu/) used mobile phones to capture bird calls via two different approaches. The first approach allows a phone to be called and sound to be captured through a VOIP connection. The second approach allows sound capture through a custom microphone array connected to smartphones via Bluetooth. We have found the sound quality of ordinary phone calls (which are filtered and compressed by the telephone hardware and network) to be too poor to use. Mobile phones have also been used as sensors for personal medical sensing or for social networking (Oliver and Flores-Mangas 2006; Kansel, Goraczko et al. 2007). The drawback of using mobile phones as sensors is their relatively high communication cost. Mobile phone sensors also need more power than traditional mote sensors.

Unlike mote-based sensors, smartphone sensors have relatively large memory and higher processing power, and can communicate with existing mobile networks directly. Relying on existing mobile networks means that we can make better use of existing networking facilities and avoid the in-network processing required in mote-based systems. The sophisticated programming facilities are another advantage arising from the popularity of smartphones. Micro-servers having all these capabilities are more expensive than smartphones. By choosing to use smartphones we benefit from the investment in their hardware capabilities and their relatively low price.

For acoustic sensing, data loggers provide a simple solution, but they cannot be remotely controlled, nor can their data be accessed in real time. The difficulty of accessing the selected grounds, such as an airport or an isolated island, makes data loggers undesirable. Controlling data collection and making the data available in real time requires some kind of networking. Audio data is large, so a high-bandwidth connection is necessary. Directional WiFi offers a potential solution; however, several powered repeaters would be required. A bigger problem with the airport application is the need to get any high-power radio equipment approved by the airport to guarantee that it will not interfere with the airport’s existing communications infrastructure.

In Australia, Telstra has deployed a state-of-the-art 3G network across both urban and rural areas. This is present at the selected sites (Brisbane airport, Samford Ecological Research Facility, and St Bees Island) and offers an ideal communications medium. The disadvantage of 3G networks compared with WiFi is the relatively high operating cost. We believe the prices for 3G networking and smartphones will fall due to mass production, market competition, and the nature of electronic products.

The sensors themselves need to be powerful enough to record and compress audio, to temporarily store audio data files and to drive a 3G radio. They also need to be programmable and remotely controllable so that scientists can manage and reconfigure them without going to the field. Currently, these requirements favour some kind of micro-server rather than mote-type sensors. Smartphones typically contain a powerful processor, tens of megabytes of memory, an SD card storage facility, a lithium-ion battery, a microphone and a 3G radio. In addition, they are programmable. Phone software is fixed and cannot be easily changed, so a phone operating system which supports the addition of new services and applications is necessary. We have used phones running the Windows Mobile operating system, but others could be used. It should be noted that smartphones can be obtained at a price similar to that of cheap notebooks. While notebooks require considerably more power than smartphones, they can be used, at least as a test platform.

We have also found the inbuilt microphones on smartphones to be very good, with enough sound frequency response to satisfy the needs of ecologists. We also have the option to use the hands-free capability to connect a better quality microphone if required. Unfortunately, the sound is only monaural. We use external microphones for better quality, weatherproofing, and flexibility in positioning. The external microphones can also be connected to amplifiers to boost the volume of signals.

Due to the size of the acoustic/photographic data obtained from sensors and corresponding power requirements, it is not possible for sensors to maintain continuous recording. The sensors need to be designed to take periodic readings according to a schedule dictated by the scientists using a network.
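The trade-off between schedule and data volume can be made concrete with a small sketch. The function names, the fixed-period duty cycle and the example bitrate are assumptions for illustration; the text does not describe the scheduling scheme actually used.

```python
def recording_windows(day_start_s, day_end_s, period_s, duration_s):
    """Return (start, end) recording windows for a fixed duty cycle.

    One recording of duration_s seconds starts every period_s seconds
    between day_start_s and day_end_s (all values in seconds).
    """
    if duration_s > period_s:
        raise ValueError("duration_s must not exceed period_s")
    windows = []
    t = day_start_s
    while t + duration_s <= day_end_s:
        windows.append((t, t + duration_s))
        t += period_s
    return windows


def volume_bytes(windows, bitrate_bps):
    """Approximate storage/upload volume for the given windows.

    bitrate_bps is the compressed audio bitrate in bits per second
    (an assumed, hypothetical figure).
    """
    total_s = sum(end - start for start, end in windows)
    return total_s * bitrate_bps // 8
```

For instance, two minutes of recording every half hour at an assumed 64 kbit/s amounts to 960,000 bytes per recording, so scientists can tune `period_s` and `duration_s` against the available storage, battery and 3G budget.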

3. Open research issues

WG2 points to a rethinking of the architecture of human activity measurement to accommodate multiple dimensions and multiple scales. But to what extent has data collection aided by some ICTs put us in a position to understand the effects of ICTs in general on spatio-temporal behaviour? At the current time, the greatest benefit no doubt comes from the rapidly improving potential for person-based (as opposed to vehicle-based or place-based) observation. The combination of micro-behavioural detail and long periods of observation would seem to offer, in the near future, the prospect of holistic analyses of activity space/time at a level of sophistication never before seen, and with unprecedented opportunities to study the spatio-temporal fragmentation of activities. In addition, multi-person streams of data can extend this to households, social networks and groups. There is promise, albeit less imminent, to observe the articulation between personal movement and personal communication, by linking automatically logged communications episodes to automatically logged personal movement using LADs. The interpretation of such articulation, for example in terms of scheduling and micro-coordination, and the observation of other ICT-related phenomena such as multi-tasking, depends for the foreseeable future on what we have called the “active track”, in which respondents react to summary representations of their own data and add content to the observations. But the tools of “prompted recall” are also developing rapidly.

From a research perspective, this can be seen as a highly adaptive approach to surveys. But in addition to the increasing difficulty of setting priorities among a large number of possibilities, we can expect that the collective effects of a burgeoning number of ICT-aided studies will lead to increased public concern about privacy. Until now, properly informed respondents have almost universally welcomed, not feared, technologies such as GPS when used in surveys; indeed, we could reasonably conclude that far fewer respondents have thought in terms of “Big Brother” than was widely anticipated. But there is a legitimate and real concern about the cumulative build-up of data, especially those on personal movements. We should expect increasingly strong ethics guidelines for publicly funded research. In the United States, certain types of data must be destroyed when the study is finished, and to the authors’ knowledge this has been applied to at least one GPS study: such a rule, widely applied, would be a major loss to any work on the evolution of activity and travel patterns. There are some obvious risks: sloppy data management; the abuse of research data in lawsuits; or the publication of results that compromise an individual, a household or a neighbourhood. These risks have been around for decades and are largely in the hands of researchers to avoid. In the case of most universities, researchers must seek the counsel and approval of ethics review boards. It is the experience of the authors that such boards help to clarify the available solutions. They will want assurance that respondents are fully informed about the nature of the monitoring, and are well instructed in how to opt out at any stage, before giving their consent, and that data security measures are commensurate with the sensitivity of personal time-space traces. The nature of ICT-aided measurement means that some errors of judgement by researchers could attract sensational publicity. As in other fields, there are some things that we may learn how to do that we choose not to do. There is a need for survey researchers to share ways of respecting reasonable limits on our data collection, and of being proactive in designing our own procedures to protect our respondents.

One key way to continue to offset future privacy concerns is to focus on applications that are seen to “empower” consumers rather than “surveille” them. This implies that the plethora of data that can be obtained automatically from a person, be it GPS, physiological, or otherwise, is transformed into useful information that the respondent has complete control over using, sharing, selling, or feeding into other applications for their own personal benefit.

Further advances in wearable tracking devices are sure to open not only new applications, but also those with more capacity. In the case of LAD technologies, new applications will come in the form of indoor movement tracking (as signals become more usable), longer term observation for the detection of behavioural patterns such as routines, disturbances or triggers (as battery drain is minimized), and more specific behavioural pattern identification such as impaired movements (as coordinates become more accurate and/or precise). New applications will also surely accrue as GPS is combined with other wearable or local information to produce new synergies (accelerometers, physiological conditions, telecommunications use, weather, ambient), or as larger segments of the population become location-enabled, allowing detection of group or mass-movement behavioural patterns. In principle, evaluation methods could thereby embrace new classes of real-time experiment, for example of user response to dynamic congestion pricing of roads.

Perhaps the key to radically new applications will not be how data collection is designed, but rather how the data are processed – meaning the transformation of large amounts of personal movement data into more viable, simple and useful information for the analyst and the respondent alike. This is beginning to happen in the context of particular studies, such as the public health examples. But there is also tremendous potential from the integration of data over large territories and long periods, provided that the concerns about cumulative archiving are addressed. Some aspects of individual and public decision-making could be transformed by the synoptic mapping of such data, made possible by the flexibility to freely aggregate the data at many different temporal and spatial scales. For example, the mapping of energy use or carbon production resulting from spatio-temporal behaviour could integrate patterns that currently are observed only within silos such as vehicle trips or residential fuel demand. In the future, such synopses may have wide implications for decision-support in public policy areas such as land-use and transport planning, resource allocation or energy efficiency. It may also turn out that some of the same tools will help individuals make more sense of the environmental and financial consequences of the choices they make as consumers, home-buyers or job-seekers.
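Aggregation at freely chosen temporal and spatial scales can be illustrated with a minimal sketch that counts observations per grid cell and time bin. The function name, the square grid and the planar coordinates in metres are assumptions made here for illustration.

```python
from collections import Counter


def aggregate(points, cell_m, bin_s):
    """Count observations per (time bin, grid column, grid row).

    points: iterable of (t, x, y) with t in seconds and x, y in metres
    (assumed planar projection). cell_m is the grid cell size in metres
    and bin_s the length of a time bin in seconds; changing either
    value changes the scale of the resulting synopsis.
    """
    counts = Counter()
    for t, x, y in points:
        key = (int(t // bin_s), int(x // cell_m), int(y // cell_m))
        counts[key] += 1
    return counts
```

Running the same function with, say, 1 km cells and hourly bins and then with 10 km cells and daily bins gives two views of the same data. Coarser cells and bins also reduce how much can be learned about any single individual, which is relevant to the archiving concerns raised above.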

4. Dissemination, training/educational activities

Dissemination relies on the organisation of workshops, tutorials, special sessions in conferences, and joint publications that emphasise in-depth multi-disciplinary discussion of the latest research and new experimental approaches aimed at addressing the current gaps in knowledge in the field of MODAP. The topics considered in scope for MODAP include, but are not limited to:

Tracking and Sensing Issues

- Experiences in providing the location of mobile objects: Is the future really about real-time processing? Do we really need massive data collection?

- Experiences with privacy as a concern when collecting the data sets: How to unravel trust and confidence problems, as well as security and privacy rights?

Methodological Issues

- Experiences in analysing tracking logs: Can we really understand implicit and explicit rules underlying movement behaviour based on trajectory patterns?

- Experiences in developing and applying privacy preserving data mining, space-time reasoning, and visualisation techniques: What have been the advantages and limitations?

Technology Issues

- Experiences with mobile technologies: How can we make better use of existing mobile technologies? Are there gaps? Is there a need for new technologies?

5. Conclusions

Sustainability is just one word, and yet there exist over 300 definitions. The term was originally used in 1987 by the World Commission on Environment and Development, which coined what has become the most often-quoted definition of sustainable development as “forms of progress that meet the needs of the present without compromising the ability of future generations to meet their needs.” In particular, sustainable mobility refers to the needs of society to move freely, gain access, communicate, trade and establish relationships without sacrificing other essential human or ecological requirements today or in the future (Mobility Project 2030, World Business Council for Sustainable Development). This can only be achieved by establishing a set of principles that provide a framework for policy goals that will change over time, in response to priorities in the economy (e.g. access to jobs and economic resiliency), social equity (e.g. mobility choices, healthy societies, and community legacy) and the environment (e.g. climate change, pollution, energy use, landscape, and resource efficiency). The strategies derived from these principles would not only be about the modes people are using, nor only about transportation. They will evolve from knowledge about the collective movement patterns that are evidence of human behaviour at different scales. The proliferation of mobile technologies for “everywhere, anytime” services and applications is already helping in the fulfilment of some of these strategies in travel behaviour, nature preservation, and health monitoring. Recently, Gartner Inc. has identified eight mobile technologies that have evolved significantly through 2010, and will have an impact on short-term sustainable mobility strategies and policies. They are Bluetooth 3.0, Mobile User Interfaces (UIs), Location Sensing, 802.11n, Display Technologies, Mobile Web and Widgets, Cellular Broadband, and Near Field Communication. The work in the MODAP WG2 is expected to have an impact on the use of location sensing technology and its wider applicability in supporting sustainable mobility by developing innovative applications.

References

Ahas, R., Aasa, A., Mark, Ü., Pae, T. & Kull, T. (2007). Seasonal tourism spaces in Estonia: case study with mobile positioning data. Tourism Management, 28, pp. 898-910.

Arzberger, P. (2005). Sensors for Environmental Observatories. National Science Foundation (NSF) Report, Center for Embedded Network Sensing.

Asakura, Y. & Iryo, T. (2007). Analysis of tourist behaviour based on the tracking data collected using a mobile communication. Transportation Research Part A, 41, pp. 684-690.

Auld, J., Williams, C., Mohammadian, A. & Nelson, P. (2009). An automated GPS-based prompted recall survey with learning algorithms. Transportation letters, 1, pp. 58-79.

Axhausen, K.W., Zimmermann, A., Schonfelder, S., Rindsfuser, G. & Haupt, Th. (2002). Observing the rhythms of daily life: A six-week travel diary. Transportation, 29, pp. 95-124.

Battelle (1997). Lexington area travel data collection test. Final Report. Prepared for Federal Highway Administration, September, 1997.

Bellemans, T., Kochan, B., Janssens, D., Wets, G. & Timmermans, H.J.P. (2008). In the field evaluation of the impact of a GPS-enabled personal digital assistant on activity-travel diary data quality. Presented at 87th Annual Meeting of the Transportation Research Board, Washington, DC.

Bohte, W. & Maat, K. (2008). Deriving and validating trip destinations and modes for multi-day GPS-based travel surveys: An application in the Netherlands. Proceedings 87th Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Chung, E. & Shalaby, A. (2005). Development of a trip reconstruction tool for GPS-based personal travel surveys. Journal of Transportation Planning and Technology, 28, pp. 381-401.

DeerNet. (2007). DeerNet: Wireless sensor networking for wildlife behavior analysis and interaction modeling. Retrieved 29 Oct., 2007, from https://winet.ece.ufl.edu/deernet/index.php/Main_Page.

Doherty, S.T., Lee-Gosselin, M.E.H. & Papinski, D. (2006). Internet-based prompted recall diary with automated GPS activity-trip detection: system design. Proceedings of the 85th Annual Conference of the Transportation Research Board, Washington, D.C., USA.

Du, J. & L. Aultman-Hall (2007). Increasing the accuracy of trip rate information from passive multi-day GPS travel datasets: Automatic trip end identification issues. Transportation Research A, 41, pp. 220-232.

Forrest, T.L. & Pearson, D.F. (2005). A comparison of trip determination methods in GPS enhanced household travel surveys. Presented at 84th Annual Meeting of the Transportation Research Board, Washington, D.C., 2005.

Gage, S., P. Ummadi, et al. (2004). Using GIS to develop a network of acoustic environmental sensors. ESRI International User Conference.

Greenemeier, L. (2007) The Secret Lives of Tool-Wielding Crows. Scientific American, October 4, 2007

Hato, (2006), Development of behavioral context addressable loggers in the shell for travel-activity analysis. Proceedings IATRB Conference, Kyoto, Japan.

Itsubo, S. & Hato, E. (2006). A study of the effectiveness of a household travel survey using GPS-equipped cell phones and a WEB diary through a comparative study with a paper based travel survey. Proceedings 85th Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Kansal, A., M. Goraczko, et al. (2007). Building a sensor network of mobile phones. Proceedings of the 6th International Conference on Information Processing in Sensor Networks (IPSN ’07): 547-548.

Krijgsman, S., Nel, J. & De Jong, T.G. (2008). Deriving transport data with cellphones: Methodological lessons from South Africa. Paper presented at the 8th International Conference on Survey Methods in Transport, Annecy, France.

Li, H., Guensler, R., Ongle, J. & Wang, J. (2005). Using GPS data to understand the day-to-day dynamics of the morning commute behavior. Proceedings 84th Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Li, Z. & Shalaby, A. (2008). Web-based GIS system for prompted recall of GPS-assisted personal travel surveys: system development and experimental study. Proceedings 87th Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Mainwaring, A., J. Polastre, et al. (2002). Wireless Sensor Networks for Habitat Monitoring. Proceedings of the 1st ACM International Workshop on Wireless Sensor Networks and Applications, Atlanta, Georgia, USA: 88-97.

Marca, J., Rindt, C.R. & McNally, M. (2002). Collecting activity data from GPS readings. Proceedings 81st Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Moiseeva, A., Arentze, T.A. & H.J.P. Timmermans (2009), Imputing relevant information from multi-day GPS tracers for retail planning and management using fusion and context-sensitive learning. The 16th RARSS conference, Niagara Falls, Canada, July 6 – 9 2009.

Murakami, E. & Wagner, D.P. (1999). Can using Global Positioning System (GPS) improve trip reporting? Transportation Research C, 7, pp. 149-165.

Naumowicz, T., R. Freeman, et al. (2008). Autonomous monitoring of vulnerable habitats using a wireless sensor network. The workshop on real-world wireless sensor networks ACM, New York, NY, USA 51-55

Ohmori, N., Nakazato, M. & Harata, N. (2005). GPS mobile phone-based activity diary survey. Proceedings of the Eastern Asia Society of Transportation Studies, 5, pp. 1104-1115.

Ohmori, N., Nakazato, M., Harata, N., Sasaki, K. & Nishii, K. (2006). Activity diary surveys using GPS mobile phones and PDA. Proceedings 85th Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Oliver, N. & Flores-Mangas, F. (2006). HealthGear: A Real-time Wearable System for Monitoring and Analyzing Physiological Signals. Proceedings of the International Workshop on Wearable and Implantable Body Sensor Networks (BSN'06): 61-64.

Pierce, B., Casas, J. & Giaimo, G. (2003). Estimating trip rate under-reporting: preliminary results from the Ohio household travel survey. Proceedings 83rd Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Schönfelder, S., K.W. Axhausen, N. Antille & M. Bierlaire (2002). Exploring the potentials of automatically collected GPS data for travel behaviour analysis - a Swedish data source. In J. Möltgen & A. Wytzisk (Eds.), GI-Technologien für Verkehr und Logistik, IfGIprints, 13, Institut für Geoinformatik, Universität Münster, Münster, 220, pp. 155-179.

Schuessler, N. & Axhausen, K. (2008). Identifying trips and activities and their characteristics from GPS raw data without further information. Paper presented at the 8th International Conference on Survey Methods in Transport, Annecy, France.

Shoval, N. & Isaacson, M. (2006). The application of tracking technologies to the study of pedestrian spatial behaviour. The Professional Geographer, 58, pp. 172-183.

Shoval, N. & Isaacson, M. (2007). Tracking tourists in the digital age. Annals of Tourism Research, 34, pp. 141-159.

Spek, S. van der (2008). Measuring and observing pedestrian activity: Tracking pedestrians in Norwich, Rouen and Koblenz. Proceedings Walk 21.

Stopher, P., Bullock, P.J. & Horst, F.N.H. (2003). Conducting a GPS survey with a time-use diary. Proceedings 83rd Annual Meeting of the Transportation Research Board, Washington, D.C.

Stopher, P., Greaves, S. & Fitzgerald, C. (2005). Developing and deploying a new wearable GPS device for transport applications. Proceedings 18th Australasian Transport Research Forum, Sydney, Australia.

Stopher, P. & Collins, A. (2005). Conducting a GPS prompted recall survey over the internet. TRB Annual Meeting, CD-ROM, Transportation Research Board, National Research Council, Washington, D.C..

Tsui, S.Y.A. & Shalaby, A.S. (2006). An enhanced system for link and mode identification for GPS-based personal travel surveys. Proceedings 85th Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Vries, B. de, Lin, Y. & Jessurun, J. (2008). Sense of the City. Proceedings DDSS Conference, Leende, The Netherlands.

Wolf, J., Loechl, M., Myers, J. & Arce, C. (2001). Trip rate analysis in GPS-enhanced personal travel surveys. International Conference on Transport Survey Quality and Innovation, Kruger Park, South Africa, August 2001.

Wolf, J., Oliviera, M. & Thompson, M. (2003). Impact of underreporting on mileage and travel time estimates: results from Global Positioning System-enhanced household travel survey. In Transportation Research Record: Journal of the Transportation Research Board, No. 1854, TRB, National Research Council, Washington D.C., 2003, pp. 189-198.

Wolf, J., Schönfelder, S., Samaga, U., Oliveira, M. & Axhausen, K.W. (2004). Eighty weeks of GPS traces: approaches to enriching trip information. Proceedings 83rd Annual Meeting of the Transportation Research Board, Washington D.C., USA.

Zhou, J. & Golledge, R. (2007). Real-time tracking of activity scheduling/schedule execution within unified data collection framework. Transportation Research A, 41, pp. 444-463.

Zmud, J. & Wolf, J. (2003). Identifying the correlates of trip misreporting - results from the California statewide household travel survey GPS study. 10th International conference on Travel Behavior Research, 2003.

MODAP WG3: Mobility Data Collection and Representation

Christine Parent (UNIL)

1. Introduction

The general goal of MODAP is to coordinate and promote mobility research activities by 1) creating a forum to bring the fragmented research work together, 2) consolidating the results that have already been obtained, and 3) paving the way for future research directions and innovative application ideas.

The specific domain addressed by WG3 is data collection and representation. The intent, as specified in the project proposal, is for WG3 to investigate how improved data collection techniques can provide applications with data sets that are both richer and more compact than the raw data initially collected by sensors. To achieve this goal, WG3 has to explore which representations, in terms of data structures and concepts, are best suited to hold mobility data and organize it in the way that best corresponds to application requirements, while ensuring data privacy protection.

WG3's choice to foster this activity has been to first build a common reference framework to help in homogenizing the vocabulary used by different teams. The framework will provide agreed definitions for basic terms and concepts in mobility data handling, as well as agreed semantic descriptions of the methods that can be used to process mobility data.

Beyond elaborating a description of the domain (one could say a mobility ontology) that would allow any researcher external to the field to quickly understand the major features of the research domain, the benefit of such a framework lies mainly in facilitating exchanges among the participating teams and consequently promoting complementarity of efforts.

The above general goals materialize into a work programme that focuses, on the one hand, on representational issues, mainly at the conceptual level, to promote reasoning at the application level in terms of trajectories and behaviors of mobile objects and, on the other hand, on the processes, known as trajectory reconstruction, that actually transform low-level data into semantically meaningful trajectories. The output of these activities provides suitable input for further activities in knowledge extraction, behavior understanding and pattern identification: trajectory analysis and trajectory mining are typical examples of such knowledge-oriented processes.

Following a general trend, WG3 focuses on movement data generated by moving agents, e.g., the movement of persons (e.g. for city services planning), animals (e.g. for environmental monitoring), cars (e.g. for traffic management), and parcels (e.g. for logistics). Records of such movements typically hold a sequence of (point, instant) couples that records the evolving position of the agent, spatially represented as a point. This is the raw data collected by GPS and other similar devices. Other forms of movement exist (e.g. movement of body parts, movement of pollution clouds represented as areal objects with shape deformations) but they are only of interest for specific applications.

The remainder of this report presents a preliminary state of the art in data collection, representation and analysis, and develops a first set of ideas for a research roadmap identifying significant open research issues in the field.

2. State of the art

2.1 Trajectory Modeling: An Application-Oriented View of Movement

To be useful to applications, raw movement data has to be synthesized and reorganized into data structures that correspond to application requirements. To this end, researchers have defined conceptual data models that provide a semantic view of movement with the genericity and flexibility needed to accommodate representational requirements from a variety of applications.

The proposal in [SPD+ 08], elaborated in the context of the GeoPKDD EU project, has gained substantial consensus. This work emphasized the need to model movement in terms of trajectories, each trajectory holding a segment of the agent's movement that is a semantic unit of interest for a given application. Each trajectory is a sequence of (point, instant) couples, and is identified by the agent and the two specific spatio-temporal positions, called Begin and End, that are the first and the last positions of the agent for this trajectory. Agents usually perform many trajectories, one after the other and temporally disjoint. Trajectories may include semantic gaps, i.e. periods where movement is irrelevant and therefore purposely not captured, as well as holes, i.e. periods where data is accidentally missing due to some data capture problem (e.g. a car going through a tunnel).
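As a minimal sketch of the concepts above, the following Python fragment represents a trajectory as a time-ordered list of (point, instant) couples with derived Begin and End positions, and records semantic gaps and holes as time intervals. The class and field names (Sample, Trajectory, gaps, holes) and the planar coordinates are illustrative assumptions; they are not taken from [SPD+ 08].

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    """One (point, instant) couple: a planar position and its timestamp."""
    x: float
    y: float
    t: float  # seconds since an arbitrary origin (assumption)


@dataclass
class Trajectory:
    agent_id: str
    samples: List[Sample]
    # (start, end) time intervals; both field names are hypothetical.
    gaps: List[Tuple[float, float]] = field(default_factory=list)   # purposely not captured
    holes: List[Tuple[float, float]] = field(default_factory=list)  # accidentally missing

    def __post_init__(self) -> None:
        if not self.samples:
            raise ValueError("a trajectory needs at least one sample")
        # Keep the samples ordered by time so Begin and End are well defined.
        self.samples = sorted(self.samples, key=lambda s: s.t)

    @property
    def begin(self) -> Sample:
        return self.samples[0]

    @property
    def end(self) -> Sample:
        return self.samples[-1]


def temporally_disjoint(a: Trajectory, b: Trajectory) -> bool:
    """True when the time spans of the two trajectories do not overlap."""
    return a.end.t < b.begin.t or b.end.t < a.begin.t


commute = Trajectory("car-1",
                     [Sample(0, 0, 0), Sample(5, 0, 60), Sample(9, 1, 120)],
                     holes=[(60, 120)])  # e.g. a tunnel
errand = Trajectory("car-1", [Sample(9, 1, 500), Sample(3, 4, 620)])
```

In this sketch the two trajectories of the same agent are kept as separate objects, which mirrors the statement that an agent performs many temporally disjoint trajectories.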

The trajectory concept provides the basic construct to support application processes. It has been used, for example, for animal monitoring [AA 07]. It has also been used (e.g. in [Z 09]) to provide a more structured representation of movement as trajectories with alternating stops (periods where the agent is considered stationary) and moves (periods in between stops). In many other cases trajectory data is enhanced with additional semantic data aiming at providing answers to questions such as what were the activities of the agent during a trajectory, what were the places visited, what was the purpose of the trajectory, and so on. Such additional data materializes as annotations on trajectories [GMSK 08] and trajectory components (e.g. stops and moves). Having a set of annotations associated with trajectories enables their analysis in whatever terms correspond to application goals. For example, annotations about the means of transportation used by moving persons enable transport-related analyses within a city [ZCX+ 10]. To support analyses, trajectories are dynamically structured into episodes that are homogeneous with respect to the value of a given annotation.

Another key to specific interpretations of trajectories is their behavioral features. Trajectory behavior (also called pattern) is indeed frequently useful for characterizing a trajectory or a part of it. Formally, a behavior is a predicate that bears on any characteristics (spatial, temporal, semantic) of the trajectories, and selects all the trajectories that contain at least one segment that complies with the predicate. Behaviors may be simple or complex, i.e., composed of simple behaviors. Examples of frequent behaviors are:

• Shape behaviors used to characterize and select the trajectories that draw a specific shape in space, like the "Straight", "Star", "Loop" behaviors;

• Geo-located behaviors, e.g., crossing or staying for a while in a specific place or a specific kind of place;

• "Go and come back" behavior;

• "Repetitive" behaviors where the moving agent repeats the same behavior many times.

• "Sequence" behaviors define a list of simple behaviors that must be satisfied in a specific order while traveling the trajectory. For example: Start at some given point, then pass through a specific place P1, later pass through a specific place P2, and finally end at the starting point.

• A "Tourist" behavior may be defined as: at least 70% of the trajectory stops are in places of kind "TouristPlace".
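The view of a behavior as a predicate can be sketched as follows. In this illustration a trajectory is reduced to the list of place kinds of its stops, the predicate is evaluated on the whole trajectory rather than on each of its segments, and the helper names (place_kind_share, all_of, select) are hypothetical. The 70% threshold and the "TouristPlace" kind come from the example above.

```python
from typing import Callable, Dict, List, Sequence

# Simplification (assumption): a trajectory is described only by the
# kinds of the places where its stops occur.
Stops = Sequence[str]
Behavior = Callable[[Stops], bool]


def place_kind_share(kind: str, min_share: float) -> Behavior:
    """Simple behavior: at least min_share of the stops are in places of this kind."""
    def predicate(stops: Stops) -> bool:
        if not stops:
            return False
        hits = sum(1 for s in stops if s == kind)
        return hits / len(stops) >= min_share
    return predicate


def all_of(*behaviors: Behavior) -> Behavior:
    """Complex behavior: holds when every component behavior holds."""
    return lambda stops: all(b(stops) for b in behaviors)


def select(trajectories: Dict[str, Stops], behavior: Behavior) -> List[str]:
    """Return the identifiers of the trajectories that satisfy the behavior."""
    return [tid for tid, stops in trajectories.items() if behavior(stops)]


tourist = place_kind_share("TouristPlace", 0.70)

data = {
    "t1": ["TouristPlace", "TouristPlace", "Restaurant", "TouristPlace"],  # 3 of 4 stops
    "t2": ["Office", "TouristPlace", "Home"],                              # 1 of 3 stops
}
selected = select(data, tourist)
```

Representing behaviors as plain functions keeps simple and complex behaviors interchangeable, since a combinator such as all_of returns another predicate of the same type.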

In many applications, the ultimate goal of analyzing trajectories is getting more information about their moving agents. The agent behavior is a list of activities that are performed by the agent during the trajectories. This list may be enhanced by the identification of the places and times where and when the activities took place. Given the general agreement on the above set of concepts, ongoing research in trajectory modeling focuses mainly on formal definitions of additional concepts derived to support specific uses of trajectory data. Alternative models may be found in specific domains, e.g. robotics [R 09], but given their restricted scope they are not considered here.

2.2 Trajectory reconstruction

Raw movement data in itself is semantically poor. Moreover, the imprecision and malfunctioning of GPS devices frequently lead to erroneous, missing and noisy values in the data. Techniques for data cleaning are therefore needed (see e.g. [SA 09]). These techniques include error correction, outlier detection [LHL 08], noise elimination, and interpolation of missing points. Data compression is also used to reduce the volume of source data. The literature abounds with contributions on these issues, which constitute a traditional research area in series or sequence data processing. When dealing with spatially-constrained data, as in the movement of cars, trucks, trains, and planes, map-matching processes have to be used to ensure that the points in a trajectory are indeed positioned on the underlying network and that their sequence forms a meaningful path within that network [QON 07]. Once the raw data has been cleaned and corrected, it can be segmented into trajectories. This is the trajectory identification step (see e.g. [SA 08]), where several criteria can be used, typically based on spatial and temporal gaps in the raw data sequence.
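The trajectory identification step can be illustrated with a short sketch that cuts a cleaned point sequence wherever the temporal or spatial gap between consecutive points exceeds a threshold. The function name, the planar (x, y, t) representation and the default threshold values are assumptions chosen for the example; they are not prescribed by [SA 08].

```python
import math
from typing import List, Tuple

# (x, y, t): planar coordinates in metres and time in seconds (assumption).
Point = Tuple[float, float, float]


def split_into_trajectories(points: List[Point],
                            max_time_gap: float = 300.0,
                            max_space_gap: float = 1000.0) -> List[List[Point]]:
    """Cut the time-ordered sequence at every temporal or spatial gap."""
    trajectories: List[List[Point]] = []
    current: List[Point] = []
    for p in sorted(points, key=lambda q: q[2]):
        if current:
            px, py, pt = current[-1]
            too_late = p[2] - pt > max_time_gap
            too_far = math.hypot(p[0] - px, p[1] - py) > max_space_gap
            if too_late or too_far:
                # Close the current trajectory and start a new one.
                trajectories.append(current)
                current = []
        current.append(p)
    if current:
        trajectories.append(current)
    return trajectories


raw = [(0, 0, 0), (50, 0, 30), (100, 0, 60), (120, 0, 900), (150, 0, 930)]
parts = split_into_trajectories(raw)  # the 840 s pause splits the sequence in two
```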

2.3 Structuring trajectories

Applications often need more than raw trajectories: they need structured trajectories to which semantics has been added by analyzing the raw trajectories, their geographical and temporal context and, if they exist, the annotations provided by the application. A frequent analysis is segmenting the trajectory into episodes of kind stop and move. A stop is defined as any maximal part of the trajectory such that, during at least a minimal amount of time, the speed is lower than a given threshold [PBK+ 08], or the direction varies a lot [MTO+ 10], or the moving object stays (possibly moving) in one of the geographic places defined by the application as of interest [ABK+ 07]. In a second step more information may be attached to the stops. For instance, [PBK+ 08] annotates every stop whose location is inside the geographic extent of one of the objects defined as of interest for the application. The annotation identifies the geographic object.
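The speed-based definition of a stop can be sketched as below: a stop is reported for every maximal run of consecutive points whose speed stays under a threshold for at least a minimal duration. This is an illustration of the definition only, not the algorithm of [PBK+ 08]; the function name, the units and the default parameter values are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t) in metres and seconds (assumption)


def detect_stops(points: List[Point],
                 max_speed: float = 0.5,
                 min_duration: float = 120.0) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) of each maximal slow run that lasts long enough."""
    stops: List[Tuple[int, int]] = []
    run_start = None  # index of the first point of the current slow run
    for i in range(1, len(points)):
        (x0, y0, t0), (x1, y1, t1) = points[i - 1], points[i]
        dt = t1 - t0
        speed = math.hypot(x1 - x0, y1 - y0) / dt if dt > 0 else float("inf")
        if speed < max_speed:
            if run_start is None:
                run_start = i - 1
        elif run_start is not None:
            # The slow run ended at point i - 1; keep it if it lasted long enough.
            if points[i - 1][2] - points[run_start][2] >= min_duration:
                stops.append((run_start, i - 1))
            run_start = None
    if run_start is not None and points[-1][2] - points[run_start][2] >= min_duration:
        stops.append((run_start, len(points) - 1))
    return stops


track = [(0, 0, 0), (100, 0, 60), (101, 0, 120), (102, 0, 180),
         (103, 0, 240), (200, 0, 300), (300, 0, 360)]
stops = detect_stops(track)  # points 1..4 barely move for 180 s
```

The parts of the trajectory between two reported stops are the moves.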

Beyond stops and moves, other kinds of episodes may be inferred. For instance, [ZCX+ 10] automatically infers the transportation modes of humans from their GPS trajectories by analyzing their velocity, acceleration, heading change rate, and stop rate. The trajectories are segmented into episodes according to the changes in the means of transportation.
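Once each sample carries an annotation, segmenting a trajectory into episodes amounts to grouping consecutive samples that share the same annotation value. The sketch below assumes the transportation mode labels are already available; it does not implement the inference of [ZCX+ 10], and the (time, mode) sample layout is an assumption.

```python
from itertools import groupby
from typing import List, Tuple

AnnotatedSample = Tuple[float, str]   # (timestamp, annotation value)
Episode = Tuple[str, float, float]    # (annotation value, start time, end time)


def episodes(annotated: List[AnnotatedSample]) -> List[Episode]:
    """Group consecutive samples with the same annotation into episodes."""
    result: List[Episode] = []
    for value, group in groupby(annotated, key=lambda s: s[1]):
        items = list(group)
        result.append((value, items[0][0], items[-1][0]))
    return result


samples = [(0, "walk"), (60, "walk"), (120, "bus"),
           (180, "bus"), (240, "bus"), (300, "walk")]
modes = episodes(samples)
```

Because groupby only merges adjacent items, the two "walk" periods stay separate episodes, which is the behavior wanted for episodes that are homogeneous with respect to one annotation.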

2.4 Inferring the behavior of a trajectory

Dodge et al. have defined a classification of trajectory behaviors, called patterns [DWL 08]. Most of the patterns are defined for sets of trajectories, i.e. patterns that may show up only when there exist several trajectories, e.g., the Convergence pattern. A few patterns apply to a single trajectory. The classification is based on the spatial and temporal characteristics of the trajectories. Single trajectory patterns listed by [DWL 08] are: the patterns that define trajectories (or parts of trajectories) such that their spatio-temporal characteristics do not change (Constancy), change irregularly (Fluctuation), or change regularly (Trend); the operator-patterns that combine patterns (Repetition, Periodicity); and the sequence pattern, which is defined as an ordered list of places to be visited, possibly with temporal constraints. The sequence pattern is more semantic than the previous ones, because it refers to geographic objects of the application. Other patterns have been defined, like the spatial patterns that are defined by the spatial shape drawn by the trajectory (or part of it). Examples of spatial patterns are Star, Straight, Loop.
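The sequence pattern can be illustrated by a small matcher that checks whether a list of places is visited in the given order, optionally within a maximal time span. The representation of a trajectory as (place, arrival time) visits, the function name and the single max_span constraint are assumptions made for this sketch; [DWL 08] does not prescribe them.

```python
from typing import List, Optional, Tuple

Visit = Tuple[str, float]  # (place identifier, arrival time)


def matches_sequence(visits: List[Visit],
                     pattern: List[str],
                     max_span: Optional[float] = None) -> bool:
    """True if the places of `pattern` are visited in order, within max_span if given."""
    if not pattern:
        return True
    for start, (place, t_first) in enumerate(visits):
        if place != pattern[0]:
            continue
        matched = 1
        t_last = t_first
        # Greedily match each remaining place at its earliest later occurrence.
        for p, t in visits[start + 1:]:
            if matched == len(pattern):
                break
            if p == pattern[matched]:
                matched += 1
                t_last = t
        if matched == len(pattern) and (max_span is None or t_last - t_first <= max_span):
            return True
    return False


day = [("Home", 0), ("P1", 600), ("Cafe", 1200), ("P2", 1800), ("Home", 3600)]
round_trip = ["Home", "P1", "P2", "Home"]
```

Other places may be visited in between, as with the "Cafe" visit here, since the pattern only constrains the relative order of the listed places.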

On the other hand, many works analyze the trajectories in order to find whether they comply with some pattern and which one(s), and ultimately to infer the activity or goal pursued by the agent. Many approaches have come out in the last few years in the Artificial Intelligence field. Most of them use classification techniques such as decision trees or Markov models to infer the activity of the person. For instance, [GNP+ 07] perform Sequence pattern mining, where a Sequence pattern is defined in terms of both space (i.e., the places visited) and time (i.e., the duration of visits). Some approaches use an elaborate description of the application context in order to infer more semantics about the trajectories and their agents. In the Athena project [BMR 09] the domain knowledge is represented in an ontology and axioms help infer agents' behaviors, e.g., persons whose trajectory stops in hotels and museums are classified as tourists. The AIDA system [AIDA] analyzes the trajectories of cars in a city and uses knowledge about the application context, like business and shopping districts, tourist and residential areas, real-time event information and environmental conditions. Driver preferences are also integrated into AIDA. One mandatory task for AIDA is to predict the destination of the drivers as well as the most likely route that they will follow.

2.5 Mining sets of trajectories

In their classification of trajectory behaviors, Dodge et al. listed many patterns for sets of trajectories [DWL 08]. Their list contains three main classes of patterns: patterns that require a set of trajectories sharing the same or similar spatio-temporal characteristics (Meet, Moving-cluster); patterns that require a set of trajectories with spatio-temporal characteristics that change in the same way (Concurrence), in opposite ways (Opposition), or in random ways (Dispersion); and application-dependent patterns that are meaningful only in their specific application domain. Examples of the last class are Pursuit/Evasion for animals and Congestion for cars.

A group of works developed methods for discovering patterns by focusing on the geometrical properties of the set of trajectories. For instance, [LKI 05] discover patterns like Convergence, Encounter, Flock, and Leadership. Flock patterns have been mined by [BGH+ 06], [GKS 07] and [GK 06], and Moving-cluster patterns by [LHY 04] and [KMB 05].

A second group of works mine sets of trajectories in order to get more information about geo-objects of the application, like popular places of interest or routes. For instance, Gkoulalas-Divanis et al. designed algorithms for identifying frequent routes that are traveled by users [GDV08, GDVB 09]. In these works, a route is considered to be frequent if it contains sequences of places that are regularly visited by many people at approximately the same time of day (i.e., within pre-specified time windows).

Another group of works partition the set of trajectories into clusters sharing common properties. There exist many clustering algorithms, but they are tailored to work with point data and not moving points. Several works have extended some of these methods to trajectories, clustering either whole trajectories or parts of trajectories. Nanni and Pedreschi [NP 06] proposed T-OPTICS, an adaptation of the OPTICS [ABK+ 99] density-based clustering algorithm to whole trajectories. Pelekis et al. [PKK+ 09] proposed an approach that takes advantage of local patterns in the time dimension as the base for identifying clusters of whole uncertain trajectories. Gaffney et al. [CGS 00] [GS 99] proposed probabilistic algorithms for clustering whole short trajectories using a regression mixture model. TRACLUS [LHW 07] is a partition-and-group framework for clustering trajectories based on the discovery of common sub-trajectories. The algorithm partitions the trajectories by using the minimum description length principle. Based on this idea of sub-trajectories, Lee et al. [LHL+ 08] proposed an algorithm for trajectory classification, showing that it is necessary and important to mine interesting knowledge on sub-trajectories rather than on whole trajectories.

Lately, research began on trajectory sampling, i.e., finding a representative trajectory for each cluster. Trajectory sampling has a variety of applications including trajectory summarization, visualization, searching and retrieval. In [PKP+ 10] the authors extend the idea of density-biased sampling by taking into account density properties and the similarity of trajectory segments. An approach focusing on the visualization of large trajectory datasets has been recently proposed in [AAR+ 09a], [AAR+ 09b]. The authors use uniform sampling, density-based clustering, and partitioning and medoids. In [PPK 09], the authors proposed an approach for expressing the "representativeness" of a trajectory via a voting process that is applied to each segment of a given trajectory. Extending this approach, the authors of [PKP+ 10] recently proposed an unsupervised trajectory sampling approach.

3. Open Research Issues

3.1 Modeling

Trajectory Data Models. Work has to be continued to further characterize and formally define concepts related to movement, trajectory, behavior, and their analyses.

Semantic Trajectory Data Warehouse Model. This work aims at devising a semantic model for trajectory data warehouses. The basic idea is to make the trajectory a first-class component in data warehousing. Currently, a model based on episodes has been defined, and some basic operations have been specified and implemented. The work needs to be extended to OLAP operations for trajectories and to the use of complex trajectory objects in fact tables.

Semantic Trajectory Visualization. Suitable visual artifacts can be powerful tools for understanding trajectories [AAW 07]. Relying on semantic trajectory models makes visualization readily useful for application users. A method that uses ontologies to describe and automatically choose trajectory visualization artifacts already exists, but further work is needed.

Privacy issues and solutions related to the representation of sensitive mobility data must be investigated.

3.2 Privacy

Protection against disclosure of sensitive places. The goal is to extend and consolidate the results achieved so far regarding the protection of sensitive places in location-aware applications. A place is sensitive for an individual if the individual does not want it to be known that he or she has been in this place. Examples of sensitive places are hospitals and religious buildings. A comprehensive model for the protection of sensitive locations, called PROBE, has been presented in [DBS 10a] and extensions are reported in [DSB 10b, GDS+ 09]. Future work includes extending the PROBE privacy model to include a semantically richer description of places and the protection against location inferences when the position is continuously recorded.

Anonymization of semantic trajectory datasets. Semantic trajectory datasets, representing the places where people stop, may disclose sensitive information, even when the identifiers of the trajectories are removed. This problem has led researchers to propose novel methods for privacy-aware trajectory data publishing. State-of-the-art research in this area is primarily conducted along two principal directions: providing on-site, restricted access to in-house trajectory data [GDV2 08], and providing off-site publication of sanitized (a.k.a. anonymous) data [TM 08, ABN 08]. The latter direction involves the construction of a dataset that can be broadly disseminated as it prohibits the disclosure of sensitive knowledge regarding users’ whereabouts. However, techniques are still lacking for offering (semantic) k-anonymity on trajectory datasets, where the generalization method is based on a domain ontology. An objective can be to propose a new privacy model called c-safe, where the probability of disclosing places visited by users is below a given value c. The method is based on generalization through a domain ontology, thus preserving the semantics of the trajectories.
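To make the idea of ontology-based generalization concrete, the sketch below replaces each visited place by its parent category in a small taxonomy and then checks a simple k-anonymity condition, namely that every sequence of places occurs at least k times in the dataset. It does not implement the c-safe model; the taxonomy, the place names and the function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical one-level taxonomy: each place maps to a more general category.
TAXONOMY: Dict[str, str] = {
    "HospitalA": "Health", "ClinicB": "Health",
    "MuseumC": "Museum", "MuseumD": "Museum",
}


def generalize(places: Sequence[str], taxonomy: Dict[str, str]) -> Tuple[str, ...]:
    """Replace every place by its category; unknown places are kept unchanged."""
    return tuple(taxonomy.get(p, p) for p in places)


def is_k_anonymous(sequences: Sequence[Sequence[str]], k: int) -> bool:
    """True if every distinct sequence of places occurs at least k times."""
    counts = Counter(tuple(s) for s in sequences)
    return all(c >= k for c in counts.values())


raw = [["HospitalA", "MuseumC"], ["ClinicB", "MuseumD"], ["ClinicB", "MuseumC"]]
generalized = [generalize(s, TAXONOMY) for s in raw]
```

In this toy dataset every raw sequence is unique, whereas after generalization all three trajectories read (Health, Museum), so an observer can no longer tell which specific health facility a given user visited.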

3.3 Trajectory reconstruction (and noise reduction).

Problem outline: Given a sequence of time-stamped location recordings, concatenate them into ‘realistic’ trajectories. Two variations exist: either online (as soon as recordings are produced and sent to a central server) or offline (recordings are already stored in a database and ad-hoc trajectories are generated for analysis purposes).

Online cases can be characterized by a number of parameters that should be taken into consideration in order to decide whether a new time-stamped location recording belongs to an existing trajectory or initiates a new one, or should be removed because it is redundant, or should be flagged and removed because it is noise.
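A minimal sketch of such an online decision is given below. For each incoming recording it returns one of the four outcomes listed above, based on three parameters: a maximal temporal gap, a maximal plausible speed, and a minimal displacement. The parameter names, their default values and the order in which the tests are applied are assumptions of this example.

```python
import math
from enum import Enum
from typing import Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, t) in metres and seconds (assumption)


class Decision(Enum):
    EXTEND = "append to the current trajectory"
    NEW = "start a new trajectory"
    REDUNDANT = "drop as redundant"
    NOISE = "flag and drop as noise"


def classify(last: Optional[Point], new: Point,
             max_time_gap: float = 300.0,
             max_speed: float = 50.0,
             min_move: float = 2.0) -> Decision:
    """Decide what to do with `new`, given the last recording kept so far."""
    if last is None:
        return Decision.NEW
    dt = new[2] - last[2]
    dist = math.hypot(new[0] - last[0], new[1] - last[1])
    if dt <= 0:
        return Decision.NOISE       # out-of-order or duplicated timestamp
    if dt > max_time_gap:
        return Decision.NEW         # temporal gap: the previous trajectory has ended
    if dist / dt > max_speed:
        return Decision.NOISE       # implausible jump for the assumed movement type
    if dist < min_move:
        return Decision.REDUNDANT   # no significant displacement
    return Decision.EXTEND


origin = (0.0, 0.0, 0.0)
```

Thresholds such as max_speed depend on the movement type (pedestrian, car, and so on).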

For the offline case, gaps between consecutive locations can be detected according to trajectory definitions set by the user. The main weaknesses of current results are that (i) the values of the parameters have to be set by the user, and (ii) network constraints are not supported [MFN+ 08]. As for the first, work on (semi-) automatic detection of the ‘good’ settings based on statistics and the identification of different movement types (pedestrian, bicycle, motorbike, car, truck, etc.) can offer a solution so as to apply customized trajectory reconstruction. The second can be addressed by exploring map-matched extensions of existing approaches. Inherent location uncertainty (e.g. GPS errors) should be taken into account.

3.4 Trajectory compression and map matching.

Problem outline: Given a trajectory (perhaps, as the result of the reconstruction step described earlier) which is not necessarily map-matched, compress it as much as possible and, at the same time, make it fit to the underlying road network.

The work of the Piraeus group offers, on the one hand, a theoretical analysis of the effect of trajectory compression on querying and, on the other hand, improves and extends line simplification algorithms to compress noise-free trajectories using appropriate shortest path computations. These approaches cover both the offline and the online case. The main weakness of their current results is the running time complexity, which is quadratic. Work is ongoing to reduce this cost.
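As a point of reference for the line simplification family mentioned above, the following is a minimal sketch of the classic Douglas-Peucker algorithm on planar points. It is not the Piraeus group's method: it ignores time, the road network and shortest paths, and the function names and `tolerance` parameter are assumptions made for illustration. Like the approaches discussed, it can take quadratic time in the worst case.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b; all arguments are (x, y) tuples."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return hypot(px - ax, py - ay)
    # Parameter of the projection of p on the line, clamped to the segment.
    u = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))
    return hypot(px - (ax + u * dx), py - (ay + u * dy))


def simplify(points, tolerance):
    """Keep the end points plus every point farther than tolerance from its chord."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # explicit stack instead of recursion
    while stack:
        first, last = stack.pop()
        worst_dist, worst_idx = 0.0, None
        for i in range(first + 1, last):
            d = point_segment_distance(points[i], points[first], points[last])
            if d > worst_dist:
                worst_dist, worst_idx = d, i
        if worst_idx is not None and worst_dist > tolerance:
            keep[worst_idx] = True
            stack.append((first, worst_idx))
            stack.append((worst_idx, last))
    return [p for p, kept in zip(points, keep) if kept]
```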

3.5 Trajectory sampling.

Problem outline: Given a set of trajectories, provide a subset of sample trajectories that more or less illustrates the behavior of the full dataset. Variation: Given a set of trajectories, first partition each of them with global criteria, so that each sub-trajectory is represented uniformly with respect to the whole database, then select a sample of sub-trajectories that best describes the whole database (i.e., in the first case the result consists of trajectories, while in the second case it consists of sub-trajectories).

Current work of the Piraeus group follows a voting approach where every trajectory votes for its neighbors based on their similarity. As such, every trajectory (actually, every reasonable sub-trajectory) gathers votes and, finally, the overall winners are the representatives / samples of the original dataset. As in the previous piece of work, this work is complemented by evaluating the result of the approach by adopting it in trajectory data mining tasks (e.g. clustering), as well as on its applications in visualization. The main weakness of current results is that robust criteria for assessing the effectiveness of the approach remain to be found.
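The voting idea can be sketched in a few lines. This is a simplified illustration, not the algorithm of [PKP+ 10]: it assumes that all trajectories are sampled at the same timestamps (so points can be compared pairwise), uses a hypothetical similarity `exp(-mean distance / sigma)`, votes on whole trajectories rather than sub-trajectories, and simply returns the top-k vote winners.

```python
from math import exp, hypot


def mean_distance(a, b):
    """Mean point-wise distance between two equally sampled trajectories."""
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def vote(trajectories, sigma=1.0):
    """Every trajectory votes for every other one, weighted by similarity."""
    votes = [0.0] * len(trajectories)
    for i, voter in enumerate(trajectories):
        for j, candidate in enumerate(trajectories):
            if i != j:
                votes[j] += exp(-mean_distance(voter, candidate) / sigma)
    return votes


def sample(trajectories, k, sigma=1.0):
    """Return the indices of the k trajectories that gathered the most votes."""
    votes = vote(trajectories, sigma)
    ranked = sorted(range(len(trajectories)), key=lambda j: votes[j], reverse=True)
    return sorted(ranked[:k])
```

A plain top-k selection tends to pick several representatives from the same dense bundle of trajectories, so a practical sampler would also need to account for coverage of the dataset.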

3.6 Knowledge extraction

Understanding the places visited by a moving person. It is possible to infer a probability-ranked list of possible places visited by moving persons. The main aim is to associate each trajectory stop with a (list of) place(s) visited and to make a final inference about the overall activity of the traveling person. The heuristic is based on a taxonomy of places and a set of behavioral rules. Possible directions of research include enhancing the heuristic to assign a probability to the association between a stop and a point of interest, using more detailed domain-dependent rules, and developing automatic tuning of application-dependent parameters.

Inferring people activity during a stop. Such approaches aim at developing a method for analyzing the movement of an agent during a trajectory stop. A stop is not necessarily a zero-velocity part of a trajectory, but a part of the trajectory where the velocity is considerably lower than the average velocity of the entire trajectory. In a sense, a “stop” here is a place where the agent performs some activity. Applications include animal movements, ship movements, and people moving in a city.
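The velocity-based notion of a stop described above can be made concrete with a small sketch. It assumes time-ordered fixes `(t, x, y)` with strictly increasing timestamps and planar coordinates; the `ratio` and `min_duration` parameters are hypothetical stand-ins for "considerably lower than the average velocity" and for a minimum stop length.

```python
from math import hypot


def detect_stops(fixes, ratio=0.25, min_duration=60.0):
    """Return (t_start, t_end) intervals whose speed stays below ratio * average speed."""
    if len(fixes) < 2:
        return []
    segments, total_dist = [], 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(fixes, fixes[1:]):
        d = hypot(x1 - x0, y1 - y0)
        total_dist += d
        segments.append((t0, t1, d / (t1 - t0)))
    # Average velocity of the entire trajectory, as in the definition above.
    threshold = ratio * total_dist / (fixes[-1][0] - fixes[0][0])

    stops, start, end = [], None, None
    for t0, t1, speed in segments:
        if speed < threshold:
            if start is None:
                start = t0
            end = t1
        elif start is not None:
            if end - start >= min_duration:
                stops.append((start, end))
            start = None
    if start is not None and end - start >= min_duration:
        stops.append((start, end))
    return stops
```

The fixes that fall inside a returned interval are the portion of the trajectory whose internal movement would then be analyzed to infer the activity.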

3.7 Mining

Semantic trajectories data mining. Mining movement data is inherently complex due to the role of the context in giving meaning to data and discovered patterns. The idea of this line of research is to define an enhanced knowledge discovery process, where semantic information is used for interpreting discovered patterns. Preliminary results have been reached by analyzing flock patterns in pedestrian movement.

Mining interactions among moving objects. The basic idea is to develop a method for extracting semantics from moving objects’ trajectory interactions. Another objective is to understand the properties of those interactions, which can characterize the interactions and behavior of the set of moving agents. This work needs a model for describing interactions of moving objects. Currently, methods based on complex network techniques are being investigated.

References

[AA 07] N. Andrienko, G. Andrienko, Designing visual analytics methods for massive collections of movement data, Cartographica, 2007, v.42 (2), pp. 117-138

[AAR+ 09a] Andrienko G., Andrienko N., Rinzivillo S., Nanni M., and Pedreschi D.: A visual analytics toolkit for cluster-based classification of mobility data. In Proceedings of SSTD (2009)

[AAR+ 09b] Andrienko G., Andrienko N., Rinzivillo S., Nanni M., Pedreschi D., and Giannotti F.: Interactive visual clustering of large collections of trajectories. In: Proc of VAST (2009)

[AAW 07] G. Andrienko, N. Andrienko, S. Wrobel, Visual Analytics Tools for Analysis of Movement Data, ACM SIGKDD Explorations, 2007, v.9 (2), pp. 38-46

[ABK+ 99] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. Optics: Ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, 1999

[ABK+ 07] Alvares, L. O., Bogorny, V., Kuijpers, B., de Macedo, J. A. F., Moelans, B., and Vaisman, A. A model for enriching trajectories with semantic geographical information. In ACM-GIS, 2007, pages 162–169, New York, NY, USA. ACM Press

[ABN 08] O. Abul, F. Bonchi, M. Nanni. Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases. ICDE 2008.

[AIDA] Affective Intelligent Driving Agent (AIDA) - http://senseable.mit.edu/aida/

[DBS 10a] M.L. Damiani, E. Bertino, C. Silvestri. The PROBE framework for the personalized cloaking of private locations, Transactions on Data Privacy, Aug. 2010

[DSB 10b] M.L. Damiani, C. Silvestri, E. Bertino. Analyzing semantic location cloaking techniques in a probabilistic grid-based map, ACM GIS 2010, Nov. 2010 (Demo)

[BGH+ 06] M. Benkert, J. Gudmundsson, F. Hubner, and T. Wolle. Reporting flock patterns. In ESA ’06: Proceedings of the 14th European Symp. on Algorithms, pp. 660–671, 2006

[BMR 09] M. Baglioni, J.A. de Macêdo, C. Renso, R. Trasarti, M. Wachowicz: Towards Semantic Interpretation of Movement Behavior, in AGILE Conference 2009: 271-288

[CGS 00] I. V. Cadez, S. Gaffney, and P. Smyth, A general probabilistic framework for clustering individuals and objects, In Proceedings of SIGKDD, 2000

[GDS+ 09] G. Ghinita, M.L. Damiani, C. Silvestri, E. Bertino, Preventing Velocity-based Linkage Attacks in Location-Aware Applications, ACM GIS 2009, Seattle (US), Nov. 2009

[GDV 08] A. Gkoulalas-Divanis, V. S. Verykios. A Free Terrain Model for Trajectory K-Anonymity. DEXA 2008.

[GDV2 08] A. Gkoulalas-Divanis, V. S. Verykios. A Privacy-Aware Trajectory Tracking Query Engine. ACM SIGKDD Explorations, 10(1): 40-49, 2008.

[GDVB 09] A. Gkoulalas-Divanis, V. S. Verykios, P. Bozanis. A Network-Aware Privacy Model for Online Requests in Trajectory Data. Data & Knowledge Engineering, 68(4): 431-452, 2009.

[DWL 08] S. Dodge, R. Weibel, A.K. Lautenschütz, Taking a Systematic Look at Movement: Developing a Taxonomy of Movement Patterns. The AGILE workshop on GeoVisualization of Dynamics, Movement and Change, Girona, Spain, May 5, 2008

[GNP+ 07] Giannotti, F., Nanni, M., Pinelli, F., and Pedreschi, D. (2007). Trajectory pattern mining. In ACM KDD 2007, Berkhin, P., Caruana, R., and Wu, X., Eds, pp. 330–339

[GK 06] J. Gudmundsson and M. J. van Kreveld. Computing longest duration flocks in trajectory data. Proceedings of the 14th ACM International Symposium on Geographic Information Systems, 2006.

[GKS 07] Gudmundsson J., Kreveld M. J., and Speckmann B.: Efficient detection of patterns in 2d trajectories of moving points. GeoInformatica, 11(2):195-215 (2007)

[GMSK 08] B. Guc, M. May, Y. Saygin, C. Korner, Semantic Annotation of GPS Trajectories, 11th AGILE International Conference on Geographic Information Science, Gerona, Spain, 2008

[GS 99] Gaffney S., and Smyth P.: Trajectory Clustering with Mixtures of Regression Models. In Proceedings of SIGKDD (1999)

[KMB 05] Kalnis P., Mamoulis N., Bakiras S.: On discovering moving clusters in spatio-temporal data. In Proceedings of SSTD (2005)

[LHL 08] J.-G. Lee, J. Han, and X. Li. Trajectory Outlier Detection: A Partition-and-Detect Framework. ICDE, 140–149, 2008

[LHL+ 08] J.G. Lee, J. Han, X. Li, and H. Gonzalez. Traclass: trajectory classification using hierarchical region-based and trajectory-based clustering. PVLDB, pp. 1081-1094, 2008

[LHY 04] Li Y., Han J., and Yang J.: Clustering moving objects. In Proceedings of KDD 2004

[LHW 07] J.G. Lee, J. Han, and K.Y. Whang, Trajectory clustering: a partition-and-group framework. In Proc. of SIGMOD, 2007

[LKI 05] P. Laube, M. van Kreveld, and S. Imfeld. Finding REMO - Detecting relative motion patterns in geospatial lifelines. In Developments in Spatial Data Handling, Proceedings of the 11th International Symposium on Spatial Data Handling, pp. 201-214, 2004

[MFN+ 08] G. Marketos, E. Frentzos, I. Ntoutsi, N. Pelekis, A. Raffaeta and Y. Theodoridis. Building Real-World Trajectory Warehouses. In the Proceedings of the 7th International ACM SIGMOD Workshop on Data Engineering for Wireless and Mobile Access (MobiDE’08), Vancouver, Canada, 2008.

[MTO+ 10] Manso, J. A., Times, V. C., Oliveira, G., Alvares, L. O., and Bogorny, V. (2010). Db-smot: a direction-based spatio-temporal clustering method. In IEEE International Conference on Intelligent Systems (IEEE IS), 2010

[NP 06] Nanni M. and Pedreschi D.: Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems, 27(3) (2006)

[PBK+ 08] Palma, A. T., Bogorny, V., Kuijpers, B., and Alvares, L. O. A clustering-based approach for discovering interesting places in trajectories. In ACM SAC 2008, pp. 863–868, New York, NY, USA. ACM Press

[PKK+ 09] Pelekis N., Kopanakis I., Kotsifakos E., Frentzos E. and Theodoridis Y.: Clustering Trajectories of Moving Objects in an Uncertain World. In: Proc. of ICDM (2009)

[PKP+ 10] N. Pelekis, I. Kopanakis, C. Panagiotakis and Y. Theodoridis. “Unsupervised Trajectory Sampling”, In the Proceedings of the ECML PKDD 2010 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD’10), LNAI 6323, pp. 17-33. Springer, Barcelona, Spain, 2010

[PPK 09] C. Panagiotakis, N. Pelekis, and I. Kopanakis. Trajectory Voting and Classification based on Spatiotemporal Similarity in Moving Object Databases, In the Proceedings of the 8th International Symposium on Intelligent Data Analysis (IDA’09), Lyon, France, 2009

[QON 07] M. A. Quddus, W. Y. Ochieng, and R. B. Noland. Current map-matching algorithms for transport applications: State-of-the art and future research directions. Transportation Research Part C: Emerging Technologies, 15(5): 312–328, 2007

[R 09] P. Roduit, Trajectory Analysis using Point Distribution Models, EPFL PhD Thesis N°4262, 2009

[SA 08] N. Schuessler, K. W. Axhausen, Identifying Trips and Activities and their Characteristics from GPS Raw Data without further information. 8th International Conference on Survey Methods in Transport, Annecy, France, May 25-31, 2008

[SA 09] N. Schüssler and K. W. Axhausen. Processing GPS Raw Data Without Additional Information. Transportation Research, 8, 2009

[SPD+ 08] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, and C. Vangenot. A Conceptual View on Trajectories. Data and Knowledge Engineering, 65:126–146, 2008

[TM 08] M. Terrovitis, N. Mamoulis. Privacy Preservation in the Publication of Trajectories. MDM 2008.

[Z 09] Y. Zhixian, Towards Semantic Trajectory Data Analysis: A Conceptual and Computational Approach, PhD Consortium, VLDB'09, Lyon, France, August 24-28, 2009

[ZCX+ 10] Y. Zheng, Y. Chen, X. Xie, and W.Y. Ma, Understanding transportation mode based on GPS data for Web applications, ACM Transactions on the Web, Association for Computing Machinery, Inc., January 2010

MODAP WG4 – Mobility Data Storage

Yannis Theodoridis (U. Piraeus)

1. Introduction

“The research area of MODAP has the need for representing mobility data in databases to perform ad hoc querying, analysis, as well as data mining on them. During the last decade, there has been a lot of research ranging from data models and query languages to implementation aspects, such as efficient indexing, query processing, and optimization techniques. The realization of data models proposed in the literature as well as packaging corresponding functionality to specific technical solutions results in moving object database engines. However, privacy needs to be integrated tightly into mobility data storage and retrieval as a first step in mobility data mining. In the context of WG4, the issue of privacy preserving storage and querying of mobility data will be investigated” (source: MODAP technical annex, p. 20).

2. State of the art in (Privacy-preserving) Mobility Data Storage and Management

The significant advances in the technology of location-detection devices (GPS, RFIDs, etc.) have made possible the collection of user locations with very high accuracy. Datasets depicting user mobility are increasingly compiled nowadays to support decision making in tasks such as urban planning, transportation engineering and traffic control, as well as to enable the offering of location-aware advertising and location-based services. However, potential disclosure of accurate trajectory information about individuals to untrusted third parties (e.g., companies that specialize in mining such data) can severely compromise the individuals’ privacy. This is because user location, together with the corresponding time of its recording, forms a quasi-identifier and can be used to reveal the true identity of users when the data is not properly anonymized. Research in the field combines advances in Moving Object Database (MOD) management systems, MOD aggregation and warehousing, and privacy-preserving MOD publishing. In particular:

2.1. MOD management systems

The MOD research area [GS05] has addressed the need for representing movements of objects (i.e. trajectories) in databases in order to perform ad-hoc querying, analysis, as well as data mining on them. During the last decade there has been a lot of research ranging from data models and query languages to implementation aspects, such as efficient index structures and query processing-optimization techniques. Realization of the data models proposed in the literature as well as packaging corresponding functionality to specific technical solutions results in MOD engines. Focusing on trajectory data types there are two working systems introduced in the field.

The first development, called Secondo, continues the study of abstract moving object data types and algorithms defined in [GBE+00]. Whereas [FGN+00] provides only a succinct look into this issue, in [LFG+03] the authors present a systematic study of algorithms for a subset of the methods introduced in [GBE+00]. The final outcome of this work has been demonstrated in [AGB06]. The system has recently been extended with nearest neighbor search algorithms [GBX10] and a benchmark framework for evaluating MOD engines [DBG09].

An alternative framework, called Hermes, has been recently introduced in [PTV+06], [PT06], [PFG+08]. It aids a database developer in modeling, constructing, and querying a database with dynamic objects that change location, shape and size, either discretely or continuously in time. Hermes exploits the extensibility interface of ORDBMSs that already have extensions for static spatial data types and methods following the Open Geospatial Consortium (OGC) standard, and extends a spatially-enabled ORDBMS by supporting time-varying geometries. It further extends the data definition and manipulation language of the ORDBMS with spatio-temporal semantics and functionality based on advanced spatio-temporal indexing and query processing techniques.

2.2. MOD aggregation and warehousing

Despite the efficiency that a MOD engine can achieve by using indexes and techniques that reduce the size of trajectory data while preserving error bounds, storing all moving object trajectories is simply infeasible when the data to be managed arrives as a stream and is hence unbounded. Thus, analytical applications for massive mobility datasets, such as those generated by cellular phone tracks, can hardly be built on top of MOD systems like the ones presented in the previous section unless attention is restricted to a limited timespan. As a solution to this bottleneck, a Trajectory Data Warehouse (TDW) offers powerful technological support for the visual analysis of movement data by efficiently aggregating the data in various ways and at different spatial and temporal scales.

TDW is a fairly new research topic, with significant intersection with at least two extensively studied fields: MODs [GS05] and spatial data warehouses [HSK98]. In [GKM+09] the authors exhaustively survey some relevant approaches in both fields and present a description of the recent developments on Spatio-Temporal Data Warehouses (STDW). As stated in [VZ09], there is no commonly agreed definition of what a STDW is and what functionality such a data warehouse should support. The same paper presents a conceptual framework for defining STDWs and a survey of the main approaches in the literature, classified according to a taxonomy of supported spatio-temporal OLAP queries.

The first prototype of a TDW, called T-Warehouse, has been presented in [OOR+07]. It introduces a data model for storing measures related to trajectories, and it proposes methods to deal with the more challenging issues involved in its implementation. Then, the same model is used in [MFN+08] to evaluate design solutions that integrate MODs [PFG+08] and TDWs. Finally, in [LMF+10, RLM+10], this framework is applied to examine traffic data, in combination with tools for the visual analysis of spatio-temporal data. Technically, T-Warehouse is a system that incorporates all the required steps for Visual TDW, from trajectory reconstruction and ETL processing to Visual OLAP analysis on mobility data. In particular, its overall architecture consists of (i) a stream-based module whose purpose is to reconstruct trajectories from the collected observations (i.e. spatial locations with temporal labels); (ii) Hermes MOD storing the reconstructed trajectories and offering powerful and efficient operations for their manipulation and for the implementation of an efficient trajectory-oriented Extract-Transform-Load (ETL) process; (iii) a TDW which incorporates appropriate aggregation mechanisms suitable for the trajectory oriented cube model; and (iv) a Visual OLAP interface that allows for multidimensional and interactive analysis.

2.3. Privacy-preserving MOD publishing

Two directions of approaches have been recently proposed in the research area of privacy-aware data publishing to help tackle this serious problem. The first direction (e.g., [TM08, ABN08]) aims at providing off-site publication of sanitized trajectory data; the unsafe mobility data is first deprived of sensitive (unique) identifiers and subsequently is anonymized so that the recorded trajectories can no longer be matched to specific individuals – the transformed (a.k.a. anonymous) dataset is then published.

The second direction of approaches, still in its infancy, aims at providing on-site, restricted access to in-house data. Unlike the methodologies in the first category, which are based on the implicit assumption that most of the knowledge in a dataset can become publicly available without causing privacy breaches, the approaches of the second category are motivated by the requirement of organizations to keep most of their data private. To achieve that, the data has to reside in-house at the hosting organization, and privacy-enhancing mechanisms have to be supported by the MOD to prohibit the disclosure of confidential information when user queries about trajectories are answered. The first approach of this kind was proposed by Gkoulalas-Divanis and Verykios in [GDV08]; it operates by generating carefully crafted fake trajectory records to augment the real ones in query answer sets of fewer than K records, thus providing K-anonymous answers to the end-user. The supported types of queries range from traditional queries involving non-spatial, non-temporal attributes to spatiotemporal queries and, in particular, nearest-neighbor queries, range queries, queries for landmarks and queries for routes. The proposed privacy-preserving query engine allows the MOD to block two types of attacks, namely user identification and sequential tracking, that would otherwise allow malevolent users to expose the identity of the owners of the trajectories (which are part of the answer set) through the use of carefully crafted ad hoc queries.
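The padding step behind K-anonymous answers can be sketched as follows. This is only an illustration of the general idea, not the engine of [GDV08]: the fake generator here is a naive random shift of a real trajectory (the original relies on carefully crafted fakes), the treatment of empty answer sets is an assumption, and all function and parameter names are hypothetical.

```python
import random


def make_fake(real, rng, jitter):
    """Naive fake: the whole trajectory shifted by one random offset (illustrative only)."""
    dx = rng.uniform(-jitter, jitter)
    dy = rng.uniform(-jitter, jitter)
    return [(t, x + dx, y + dy) for t, x, y in real]


def k_anonymous_answer(answer, k, rng=None, jitter=50.0):
    """Pad a query answer of fewer than k trajectories with fakes, then shuffle."""
    rng = rng or random.Random()
    if not answer or len(answer) >= k:
        return list(answer)  # assumption: empty answers are returned unchanged
    padded = list(answer)
    while len(padded) < k:
        padded.append(make_fake(rng.choice(answer), rng, jitter))
    rng.shuffle(padded)  # so that position in the result does not reveal the fakes
    return padded
```

A shift-based fake of this kind is easy to tell apart from real movement, which illustrates why better concealment of the real trajectories among the fakes matters.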

3. Open research issues

During the next few years, advances in basic research and prototype system development in the field of (privacy-preserving) mobility data storage and management are expected to span the following directions.

3.1. Trajectory reconstruction and map matching

Reconstruction of trajectories from position samples (either in free movement or under road network constraints) is evidently necessary. This procedure involves (i) data cleaning, such as filtering out noise, (ii) data compression, such as reducing the erroneous recordings, (iii) map matching if motion is network-constrained, (iv) further data abstraction for computing high-level semantic trajectories, etc. Current MODAP work (by EPFL and Piraeus) includes algorithms and systems for trajectory reconstruction. In particular:

o Self-tuned trajectory reconstruction: In order to build a trajectory data warehouse, Marketos et al. [MFN+08] use several meaningful parameters for trajectory reconstruction as the first step. Typical parameters include spatial gap, temporal gap, maximum speed, distance tolerance, etc. Self-tuning of those parameters for online trajectory reconstruction is a challenge for future work.

o Efficient trajectory compression: Trajectory data is generated continuously and usually at a high frequency. Sooner or later the data will exceed the storage capacity of application systems. Therefore, data compression is a fundamental technique for supporting scalable applications. Taking [MB04] as the starting point, more robust and efficient compression techniques are required to handle the erroneous recordings in trajectory reconstruction.

o Integrated map matching: For network-constrained trajectory data, map matching can be applied to determine the correct position and remove noise, by integrating positioning data with road networks to identify the correct road on which a vehicle is traveling and to determine the approximate location (if needed) on that road. Map matching can be used for data cleaning as well as for data compression (keeping only the roads, not the exact locations).

o Semantic-based trajectory reconstruction: Semantic-based trajectory models have recently emerged as a hot topic for reconstructing trajectories, such as the stop-move concept in [SPD+08]. From the semantic point of view, raw trajectories given as a sequence of GPS points can be abstracted as sequences of meaningful episodes (e.g. ‘begin’, ‘move’, ‘stop’, ‘end’). Yan et al. [YSC+10] [YPSC10] design a computing platform to progressively generate such trajectories from raw GPS tracking feeds, including spatio-temporal, structured and semantic trajectories. Stops and moves can be computed using given geographic artefacts or using only spatio-temporal criteria such as density, velocity, direction, etc.
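For the map matching item in the list above, the two uses mentioned (cleaning by snapping to the road, compression by keeping only the roads) can be sketched with the simplest geometric matcher. This is an assumption-laden illustration: roads are straight segments in a hypothetical `roads` dictionary, each point is matched independently to its nearest segment, and network topology, heading and speed are ignored, unlike in the more advanced matchers used in practice.

```python
from math import hypot


def project(p, a, b):
    """Return (distance, snapped_point) for point p and road segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    u = 0.0 if length_sq == 0.0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    u = max(0.0, min(1.0, u))  # clamp the projection to the segment
    qx, qy = ax + u * dx, ay + u * dy
    return hypot(px - qx, py - qy), (qx, qy)


def match(points, roads):
    """Match each point to its nearest road; roads maps road_id -> (a, b)."""
    matched = []
    for p in points:
        best_id, best_dist, best_point = None, float("inf"), None
        for road_id, (a, b) in roads.items():
            dist, q = project(p, a, b)
            if dist < best_dist:
                best_id, best_dist, best_point = road_id, dist, q
        matched.append((best_id, best_point))
    return matched


def compress(matched):
    """Keep only the sequence of roads travelled, dropping exact locations."""
    road_ids = []
    for road_id, _ in matched:
        if not road_ids or road_ids[-1] != road_id:
            road_ids.append(road_id)
    return road_ids
```

Point-wise nearest-segment matching tends to fail near junctions and parallel roads, which is one reason to integrate matching with the other reconstruction steps rather than running it in isolation.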

In addition to these perspectives on trajectory reconstruction, real-time trajectory computation is becoming a demanding requirement in real-life trajectory applications. Therefore, the next trajectory reconstruction roadmap should cover online cleaning, compression, and the computation of semantic trajectories. The ideal system should also consider distributed computation, because real-time positioning samples originate from different antenna sources.

3.2. TDW modeling and ETL/OLAP processing

Definition of a new flexible model of TDW (flexible = arbitrary space and time partitioning), support of non-numerical measures (e.g. trajectory representatives or frequent patterns), development of appropriate aggregate indexes to improve ETL/OLAP performance are important aspects. Current MODAP work (by Venice partner) includes TDW modelling and ETL/OLAP processing using predefined space and time partitioning. In particular:

o General TDW model: In order to adequately satisfy the user requirements, it is important that a TDW model could be customised according to different application scenarios. For example, if we consider cars moving along a network, the spatial domain would consist of a set of road segments as minimum granularity and districts-counties-countries as spatial hierarchy. Hence the TDW model should allow for an arbitrary space and time partitioning.

o Non-numerical measures: Interesting measures about trajectory data can be obtained through a knowledge discovery process. The aim is to model a representative trajectory that describes the trend of movement within a spatio-temporal granule of the TDW. The main research challenge concerns the definition of aggregate functions that generate the representative trajectories for coarser granularities, used to answer roll-up queries. In particular, some approximations should be introduced in aggregate computation, in order to give a feasible solution to the problem posed by the holistic nature of the pattern aggregation functions.
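The two items above, arbitrary partitioning and the difficulty of rolling up holistic measures, can be illustrated with a toy cube. The sketch below is not the T-Warehouse model: it uses a much simpler holistic measure (the number of distinct trajectories per granule) instead of representative trajectories or patterns, and `build_cube`, `roll_up` and the partitioning callbacks are hypothetical names. Partitioning is passed in as functions, so any space or time partitioning can be plugged in.

```python
from collections import defaultdict


def build_cube(observations, space_of, time_of):
    """Map each (trajectory_id, t, x, y) observation to a (space, time) granule."""
    cube = defaultdict(set)
    for tid, t, x, y in observations:
        cube[(space_of(x, y), time_of(t))].add(tid)
    return cube


def roll_up(cube, coarser_space, coarser_time):
    """Aggregate granules into coarser ones by uniting their trajectory id sets."""
    coarse = defaultdict(set)
    for (s, t), ids in cube.items():
        coarse[(coarser_space(s), coarser_time(t))] |= ids
    return coarse


# Example partitioning (an assumption): 10 m grid cells, hourly intervals,
# rolled up to 2x2 blocks of cells.
observations = [("a", 0, 5, 5), ("a", 10, 15, 5), ("b", 20, 15, 5)]
cube = build_cube(
    observations,
    space_of=lambda x, y: (int(x // 10), int(y // 10)),
    time_of=lambda t: int(t // 3600),
)
coarse = roll_up(cube, lambda s: (s[0] // 2, s[1] // 2), lambda t: t)

distinct = len(coarse[((0, 0), 0)])              # exact: trajectories a and b
naive_sum = sum(len(ids) for ids in cube.values())  # sums per-cell counts
```

Summing the per-cell counts counts trajectory "a" twice, so the roll-up is only exact here because the full id sets are kept, which does not scale; this is the kind of situation where approximate aggregate functions become necessary.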

3.3. Prototype systems for privacy-aware MOD/TDW management

Basic research results should be incorporated into prototype systems and tools in order for their efficiency and effectiveness to be assessed. Current MODAP work (by Piraeus, Thessaly and Venice partners) includes Hermes MOD engine, its Hermes++ privacy-aware variation, and T-Warehouse. In particular:

o Improving Hermes MOD engine: Hermes currently offers quite a wide range of MOD management functionality; nevertheless, two parallel lines of research are already in progress. The first builds upon and extends the current framework with novel indexing and optimization techniques aiming at enhancing the scalability of the system. The second aims to transform Hermes into an integrated MOD/KDD platform, where well-known and/or new data mining algorithms are tightly integrated with the structures the system offers and are thus handled as advanced query processing techniques.

o Improving Hermes++ privacy-aware MOD engine: Current work on privacy preservation in the context of WG4 aims at significantly improving the approach of [GDV08] to protect the MOD against more types of potential attacks through intelligent auditing strategies, introduce less distortion to the original database when answering (originally) insecure user queries, and provide increased privacy by better concealing the real user trajectories of the produced answer set among the introduced fakes.

o Improving T-Warehouse: In order to make T-Warehouse more general and flexible, we are currently working on several topics. First, we want to design a new TDW model which is able to deal with arbitrary space and time partitioning, overcoming the limitations of our previous prototypes [OOR+07,LGM+10], where space is organised as a set of regular grids. Second, we plan to manage and aggregate non-numerical measures such as representatives for sets of trajectories and frequent patterns in TDW granules. This will require a tight integration of data mining algorithms to compute results on a mix of aggregate and raw data. Finally, a different TDW model calls for a different visualisation and interaction paradigm. We will investigate appropriate spatial and temporal visualisation techniques supporting OLAP analysis on these complex measures.

3.4. Benchmark(s) for MODs/TDWs

Apart from the above specific directions, a parallel activity would include a wrap-up of the work done so far (GeoPKDD and early MODAP results) and the establishment of a well-formed benchmark for MODs/TDWs focusing on data storage and management issues; related work includes [DBG09]. Previous GeoPKDD work (by KDDLAB and Piraeus partners) includes preliminary standalone benchmarks for each of the proposed contributions and trajectory synthesizers for MODs. However, a more consolidated approach is needed, one that validates the prototype results more thoroughly, exposes new interrelationships between the various approaches, and thereby reveals new opportunities for research in the domain. At the same time, such an approach is expected to be well accepted by the research community, as individual research outcomes could be verified by following ultimately standardized methodologies and ground-truth-producing tools.

4. Dissemination, training/educational activities

Dissemination would exploit tools and systems built during GeoPKDD. In particular:

o Real mobility dataset repository – a collection of a great variety of real datasets is crucial for arguing about the usefulness and applicability of our research

o Synthetic mobility dataset generator s/w – collected real datasets cannot cover the full range of possible motion behaviors. As such, data generators are important for experimental purposes.

o GeoPKDD s/w dissemination – Systems and tools developed so far (Hermes, Hermes++, T-Warehouse, etc.) could be considered for free distribution for research purposes, as desktop applications and/or web applications/services. The open-source software paradigm could be investigated as well.

Regarding training / education, our contribution could range from short courses/tutorials (perhaps online) to textbooks. This type of activity would best be done in synchronisation with the relevant activities of other WGs.

5. Conclusions

Research on mobility data storage and management is challenging. Our overall goal is that MODAP could turn out to be a step more productive and successful than (the generally accepted as very successful) GeoPKDD!

Acknowledgments

Many thanks to Aris Gkoulalas-Divanis (IBM Zurich), Nikos Pelekis (Piraeus), Alessandra Raffaetà (Venice), Vassilis Verykios (Thessaly), and Zhixian Yan (EPFL), for their contributions.

References

[ABN08] O. Abul, F. Bonchi, M. Nanni. Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases. In Proceedings of ICDE, 2008.

[AGB06] V.T. de Almeida, R.H. Güting, T. Behr. Querying Moving Objects in SECONDO. In Proceedings of MDM, 2006.

[DBG09] C. Düntgen, T. Behr, R. H. Güting. BerlinMOD: a benchmark for moving object databases. VLDB Journal, 18(6): 1335-1368, 2009.

[FGN+00] L. Forlizzi, R. H. Güting, E. Nardelli, M. Schneider. A Data Model and Data Structures for Moving Objects Databases. In Proceedings of ACM SIGMOD, 2000.

[GBX10] R. H. Güting, T. Behr, J. Xu. Efficient k-nearest neighbor search on moving object trajectories. VLDB Journal, 2010.

[GDV08] A. Gkoulalas-Divanis, V. S. Verykios. A Privacy-Aware Trajectory Tracking Query Engine. ACM SIGKDD Explorations, 10(1): 40-49, 2008.

[GKM+09] L. Gómez, B. Kuijpers, B. Moelans, A. Vaisman. A Survey on Spatio-Temporal Data Warehousing. International Journal of Data Warehousing and Mining, 5(3): 28-55, 2009.

[GS05] R.H. Güting and M. Schneider. Moving Objects Databases. Morgan Kaufmann Publishers, 2005.

[HSK98] J. Han, N. Stefanovic, K. Koperski. Selective Materialization: An Efficient Method for Spatial Data Cube Construction. In Proceedings of PAKDD, 1998.

[LFG+03] J. A. C. Lema, L. Forlizzi, R. H. Güting, E. Nardelli, M. Schneider. Algorithms for Moving Objects Databases. The Computer Journal, 46(6): 680-712, 2003.

[LMF+10] L. Leonardi, G. Marketos, E. Frentzos, N. Giatrakos, S. Orlando, N. Pelekis, A. Raffaetà, A. Roncato, C. Silvestri, Y. Theodoridis. T-Warehouse: Visual OLAP Analysis on Trajectory Data. ICDE 2010.

[MB04] N. Meratnia and R. A. de By. Spatiotemporal Compression Techniques for Moving Point Objects. In Proceedings of EDBT, 2004.

[MFN+08] G. Marketos, E. Frentzos, I. Ntoutsi, N. Pelekis, A. Raffaetà, and Y. Theodoridis. Building real-world trajectory warehouses. In Proceedings of MobiDE, 2008.

[OOR+07] S. Orlando, R. Orsini, A. Raffaetà, A. Roncato, C. Silvestri. Trajectory Data Warehouses: Design and Implementation Issues. JCSE, 1(2): 240-261. 2007.

[PFG+08] N. Pelekis, E. Frentzos, N. Giatrakos and Y. Theodoridis. HERMES: Aggregative LBS via a Trajectory DB Engine. In Proceedings of ACM SIGMOD, 2008.

[PT06] N. Pelekis, Y. Theodoridis. Boosting Location-Based Services with a Moving Object Database Engine. In Proceedings of MobiDE, 2006.

[PTV+06] N. Pelekis, Y. Theodoridis, S. Vosinakis and T. Panayiotopoulos. Hermes - A Framework for Location-Based Data Management. In Proceedings of EDBT, 2006.

[RLM+10] A. Raffaetà, L. Leonardi, G. Marketos, G. Andrienko, N. Andrienko, E. Frentzos, N. Giatrakos, S. Orlando, N. Pelekis, A. Roncato, C. Silvestri. Visual Mobility Analysis using T-Warehouse. Accepted for publication in International Journal of Data Warehousing and Mining (IJDWM).

[SPD+08] S. Spaccapietra, C. Parent, M. L. Damiani, J. A. de Macedo, F. Porto, and C. Vangenot. A Conceptual View on Trajectories. Data and Knowledge Engineering, 65:126–146, 2008.

[TM08] M. Terrovitis, N. Mamoulis. Privacy Preservation in the Publication of Trajectories. In Proceedings of MDM, 2008.

[VZ09] A. Vaisman, E. Zimányi. What is Spatio-Temporal Data Warehousing? In Proceedings of DaWaK, 2009.

[YPSC10] Z. Yan, C. Parent, S. Spaccapietra, and D. Chakraborty. A Hybrid Model and Computing Platform for Spatio-Semantic Trajectories. In Proceedings of ESWC, 2010.

[YSC+10] Z. Yan, L. Spremic, D. Chakraborty, C. Parent, S. Spaccapietra, and K. Aberer. Automatic Construction and Multi-level Visualization of Semantic Trajectories. In Proceedings of GIS, 2010.

MODAP WG5 – Mobility Patterns and Pattern Mining

Dimitrios Gunopulos (NKUA)

1. Introduction

Mobility data are produced in several applications. Examples include the mobility data of objects (such as people, vehicles, molecules, etc.) or of more complex events (groups of people, natural phenomena, etc.) in real or virtual spaces. The ever-increasing availability of cheap and ubiquitous monitoring and tracking devices and the exponential growth of inexpensive data storage create huge volumes of mobility data.

Taking the specific example of the trajectories of moving objects (for example, a car moving through the city), such a trajectory is typically modeled as a sequence of consecutive locations in a multi-dimensional (generally two- or three-dimensional) Euclidean space. Depending on the application, it is possible to further constrain the trajectory, for instance to lie on the street grid. Such data types arise in many applications where the location of a given object is measured repeatedly over time. Typical trajectory data are obtained during a tracking procedure with the aid of various sensors. Herein lies a main limitation of such data: they may contain a significant number of outliers or incorrect measurements (unlike, for example, stock data, which are recorded essentially without measurement error).

Trajectory data with the characteristics discussed above (multi-dimensional and noisy) also appear in many scientific domains. In environmental, earth science and biological data analysis, scientists may be interested in identifying similar patterns (e.g., weather patterns), clustering related objects or subjects based on their trajectories, and retrieving subjects with similar movements (e.g., in animal migration studies). Similar problems may occur in medical applications, for example when multiple-attribute response curves in drug therapy are analyzed.

In many monitoring applications, detecting movements of objects or subjects that exhibit similarity in space and time can be useful. These movements may have been reconstructed from a set of sensors, including cameras and movement sensors, and are therefore inherently noisy. Another set of applications arises from cell phone and mobile communication systems, where mobile users are tracked over time and patterns and clusters of these users can be used to improve the quality of the network (e.g., by allocating appropriate bandwidth over time and space).

Mobility data in general share a set of characteristics. They are typically produced as streams, which can be multidimensional and can originate from spatially separated sources. The data distribution can be stationary but typically is not, as is the case, for example, when a natural phenomenon is monitored. Consequently, the techniques to be developed have to consider concept drift and provide dynamic (or dynamically updated) models that capture the distribution of the data when parametric methods are used. In many applications mobility data contain a great deal of information about the users. Consequently, it is imperative to take the need for user privacy into account when using the data. Finally, mobility data are inherently distributed: they are typically captured at several locations, they may refer to objects with spatial extent, and because of their volume they may be impractical to consolidate in one location.

In WG5 we focus mainly on techniques for applications of Mobility Data Mining and Analysis. Specifically, we consider techniques for clustering, classification, and frequent pattern mining. It is important to address the specific characteristics of the data when designing specific algorithms. In addition, it is worthwhile to explore general mechanisms that allow a user to explore the data interactively. This can be achieved using efficient and flexible query languages and advanced visualization techniques. As a result, work in this area is closely tied to work on new techniques for Mobile Object and Trajectory Databases (where the data is stored) and Mobile Wireless Sensor Networks (where the data is typically collected).

When new techniques are developed, the following desiderata must be addressed:

i) improved scalability: This can be achieved by the investigation of several factors, including database integration approaches, distributed algorithms, incremental learning;

ii) privacy protection: To achieve this, methods for provably and measurably protecting the privacy of the data in the extracted patterns are needed;

iii) ease of use: Making systems easy and intuitive to use is a fundamental part of making them truly useful. Mechanisms are needed that build on well-established database paradigms to express constraints and queries in a simplified data mining query language in which the data mining tasks can be formulated.

2. State of the art Review

In this section we review the current state of the art in the field of Mobility Data Mining.

2.1. Trajectory Similarity/Distance Measures

Computing the similarity or the distance between two trajectories is a fundamental operation that can be used in clustering, classification, and indexing tasks. As a result, there has been a large volume of research work focusing on producing flexible and useful distance measures that can also be efficiently computed [KGT08][KVG08].

Trajectories are modeled as multi-dimensional time series. Most of the early work on trajectory data analysis has concentrated on the use of an Lp norm as the metric. For p=2 it is the well-known Euclidean distance and for p=1 the Manhattan distance. The advantage of this simple metric is that it allows efficient indexing with a dimensionality reduction technique [AFS93], [FRM94]. On the other hand, the model cannot deal well with outliers and is very sensitive to small distortions in the time axis [VKG02]. There are a number of interesting extensions to the above model that support various transformations such as scaling [RM00], shifting and normalization [GK95], and moving average. Other techniques to define time series similarity are based on extracting certain features (landmarks [PWZP00] or signatures [FJMM97]) from each time series and then using these features to define the similarity. Another approach is to represent a time series using the direction of the sequence at regular time intervals [QWWS98].
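As an illustration of the Lp-norm distance discussed above, the following is a minimal Python sketch. It assumes two-dimensional trajectories of equal length that were sampled at the same timestamps; the function name lp_distance and the tuple-based point representation are illustrative choices, not taken from the cited works.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) representation of one sample


def lp_distance(a: Sequence[Point], b: Sequence[Point], p: float = 2.0) -> float:
    """Lp distance between two equal-length 2-D trajectories.

    Samples are compared position by position, so both trajectories
    must have been sampled at the same timestamps.
    """
    if len(a) != len(b):
        raise ValueError("Lp norms require trajectories of equal length")
    if p < 1:
        raise ValueError("p must be at least 1")
    total = sum(
        abs(ax - bx) ** p + abs(ay - by) ** p
        for (ax, ay), (bx, by) in zip(a, b)
    )
    return total ** (1.0 / p)
```

Because samples are compared strictly index by index, a single outlying sample or a small shift along the time axis changes the result directly, which is the weakness noted in the text.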

Although the majority of database/data mining research on time series has focused on Euclidean distance, virtually all real-world systems that use time series matching as a subroutine utilize a similarity measure that allows warping. In retrospect, this is not very surprising, since most real-world processes, particularly biological processes, can evolve at varying rates. For example, in bioinformatics, it is well understood that functionally related genes will express themselves in similar ways, but possibly at different rates. Therefore, the Dynamic Time Warping (DTW) distance has been used for many datasets of this type. The method to compute DTW between two sequences is based on Dynamic Programming [BC94], [L66] and is more expensive than computing Lp norms. Approaches to mitigate the large computational cost of the DTW distance for trajectories have appeared in [VHGK03], [VGD04], [ZS03], where lower-bounding functions are used to speed up the execution of DTW. Furthermore, an approach to combine the benefits of warping distances and Lp norms has been proposed in [CNg04].
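The dynamic programming computation of DTW mentioned above can be sketched as follows. This is a minimal, unoptimized version that uses the Euclidean distance between samples as the local cost and applies no warping-window constraint or lower bounding; the function name dtw_distance is an illustrative assumption.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) representation of one sample


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Dynamic Time Warping distance between two 2-D trajectories.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the
    dynamic programming table in memory.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    # prev[j] holds the cost of the best warping path aligning the first
    # i-1 samples of a with the first j samples of b.
    prev: List[float] = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr: List[float] = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

The quadratic cost of filling the table, compared with the linear cost of an Lp norm, is what motivates the lower-bounding techniques cited in the text. Note also that every sample, including an outlier, must be matched to some sample of the other trajectory.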

The flexibility provided by DTW is very important; however, its accuracy deteriorates for noisy data, since by matching all the points it also matches the outliers, distorting the true distance between the sequences. An alternative approach is the use of the Longest Common Subsequence (LCSS), which is a variation of the edit distance [L66]. The basic idea is to match two sequences by allowing them to stretch, without rearranging the order of the elements but allowing some elements to be unmatched. Using the LCSS of two sequences, one can define the distance using the length of this subsequence [BDGM97], [DGM97].
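The LCSS idea can be sketched as follows. This is one common formulation, under stated assumptions: two samples are considered a match when both coordinates differ by at most a spatial threshold eps and their indices differ by at most delta. These two parameters, the normalization by the shorter length, and the function names are illustrative and are not specified in the text above.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) representation of one sample


def lcss_length(a: Sequence[Point], b: Sequence[Point],
                eps: float, delta: int) -> int:
    """Length of the longest common subsequence under matching thresholds."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (ax, ay), (bx, by) = a[i - 1], b[j - 1]
            close_in_space = abs(ax - bx) <= eps and abs(ay - by) <= eps
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Leave one sample unmatched and keep the better option.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_distance(a: Sequence[Point], b: Sequence[Point],
                  eps: float, delta: int) -> float:
    """Distance in [0, 1]; 0 means the shorter trajectory is fully matched."""
    if not a or not b:
        raise ValueError("trajectories must be non-empty")
    return 1.0 - lcss_length(a, b, eps, delta) / min(len(a), len(b))
```

For example, with a = [(0, 0), (1, 1), (100, 100), (2, 2)] and b = [(0, 0), (1, 1), (2, 2)], the outlier (100, 100) is simply left unmatched, so lcss_distance(a, b, eps=0.5, delta=1) is 0.0, whereas DTW would be forced to align the outlier with some sample of b.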

2.2. Trajectory Clustering

Clustering is one of the general approaches for exploring and analyzing large amounts of data, since it allows the analyst to consider groups of objects rather than individual objects, which are too numerous. Clustering associates objects in groups (clusters) such that the objects in each group share some properties that do not hold (or hold much less) for the other objects. Spatial clustering builds clusters from objects that are spatially close and/or have similar spatial properties (shapes, spatial relationships among components, etc.). Clustering of trajectories implies considering space, time and movement characteristics within a similarity notion: simple distance-based clustering methods are not effective in separating trajectory clusters that exhibit a non-convex (non-globular) shape, as often occurs in practice [RPN+08], [AAR+09], [RDN+09], [NP06]. In [AAW07] a route similarity distance function was proposed and used for clustering trajectories, followed by an incremental procedure of progressive clustering with the use of multiple similarity functions [RPN+08]. This approach was extended to clustering very large trajectory databases by means of classification [AAR+09]. A method for computing visual summaries of clusters of similar trajectories was proposed in [AA10] and was extended to anonymizing trajectories [MAA+10].

2.3. Trajectory Classification and Prediction

Predictive models for trajectory data include classification methods for inferring the category of a trajectory (e.g., the means of transportation associated with a trajectory: private car, public transportation, pedestrian, etc.) and predictors of the next location of a moving object given its past trajectory [I06], [CJB03], [MPTG09], [M06], [M07].

There is strong current interest in inferring the location of a mobile object based on its previous locations, in that it enables several intelligent location-based services. Examples include Location Based Advertisements and Remote Farming. In location-based commerce advertisement, if customers are willing to receive advertisements, retail stores will distribute e-Flyers to potential customers’ mobile devices based on their locations. In this setting, finding common moving patterns of mobile devices is valuable for inferring the potential movement of mobile device users, and thus helps to distribute the advertisements efficiently. In remote farming, the animals in a large farming area can be tracked using a remote sensing system. The sensors are limited in power and may fail from time to time. By mining the imprecise trajectories of the animals, it is possible to determine the migration patterns of certain animals or groups of animals. These patterns could be useful for analyzing the migration behavior of different species of animals.

In the literature, this task is achieved by applying various learning methods to the history of each moving object in order to create an individual location predictor. Other approaches first discover moving patterns that are common to a large set of mobile objects and then use these patterns to predict the future locations of an object.

2.4. Trajectory Pattern Mining

Knowledge discovery over trajectory databases [GP08] discovers behavioral patterns of moving objects that can be exploited in several fields. Example domains include traffic engineering, climatology, social anthropology and zoology, which study vehicle position data, hurricane track data, and human and animal movement data, respectively. Several works in the literature try to classify trajectory data [LGL+08]; to sample them in an unsupervised way, aiming at preserving the hidden mobility patterns of the whole dataset in the sampled subset [PKP+10]; to analyze them in an exploratory way [AAW07], [AAR+09]; and to mine movement-aware patterns, such as clusters of moving objects [GS99], [CGS00], [NP06], [LHW07], [PKK+09], [LHY04], [KMB05], sequential trajectory patterns [GNPP07], flock patterns [GKS07], convoy patterns [JYZ+08], online discovery of flock patterns [VBT09], and sub-trajectory outliers [LHL08]. A novel notion of spatio-temporal pattern was introduced in [GNPP07], which formalizes the idea of aggregated movement behaviors. A trajectory pattern, as defined in [GNPP07], represents a set of individual trajectories that share the property of visiting the same sequence of places with similar travel times. Therefore, two notions are central: (i) the regions of interest in the given space, and (ii) the typical travel time of moving objects from region to region. In this approach a trajectory pattern is a sequence of spatial regions that, on the basis of the source trajectory data, emerge as frequently visited in the order specified by the sequence; in addition, the transition between two consecutive regions in such a sequence is annotated with a typical travel time that, again, emerges from the input trajectories.
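To make the notion of a trajectory pattern more concrete, the following Python sketch represents a pattern as a sequence of regions annotated with typical travel times and checks whether a single trajectory is compatible with it. It is a simplified illustration only and not the mining algorithm of [GNPP07]: the rectangular regions, the time tolerance tau, and all class and function names are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # assumed (t, x, y) representation


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangular region of interest (an assumption)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class TrajectoryPattern:
    regions: List[Region]
    travel_times: List[float]  # typical time between consecutive regions


def supports(traj: Sequence[Sample], pattern: TrajectoryPattern,
             tau: float) -> bool:
    """True if traj visits the regions in order with travel times within tau."""
    if len(pattern.travel_times) != len(pattern.regions) - 1:
        raise ValueError("need one travel time per pair of consecutive regions")

    def search(k: int, start: int, prev_t: float) -> bool:
        if k == len(pattern.regions):
            return True
        for idx in range(start, len(traj)):
            t, x, y = traj[idx]
            if not pattern.regions[k].contains(x, y):
                continue
            if k > 0 and abs((t - prev_t) - pattern.travel_times[k - 1]) > tau:
                continue
            if search(k + 1, idx + 1, t):
                return True
        return False

    return search(0, 0, 0.0)
```

Counting the trajectories in a dataset for which supports returns True would give the support of a candidate pattern; how the regions and typical travel times are derived from the data is the actual mining problem and is not shown here.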

2.5. Privacy-preserving MOD publishing

Current advances in location-detection devices (GPS, RFIDs) are enabling the collection of user mobility data that can be used to identify individual users and therefore jeopardize the privacy expectations of users. The threat arises through the disclosure of information about individuals to untrusted third parties. Even if individual data are not released, there is the potential that, through the combination of more general queries and data analysis, an adversary may deduce data about individual users. Recent work in the field focuses on approaches that protect the privacy of users yet allow meaningful data mining and pattern discovery from the data [KPSS10], [KPSS08], [RPT10], [MAA+10].

3. Open research issues

During the next few years, advances in basic research and prototype system development in the field of (privacy-preserving) mobility data mining could address the following directions.

3.1. Pattern Mining

Based on previous work and the current state of the art, the research roadmap on mobility patterns and pattern mining includes the following perspectives:

(i) Due to the complex nature of mobility data, a weakness of the proposals so far is the lack of measures and guarantees of the statistical significance of the discovered patterns. Further theoretical work is required to establish the foundations of the various mobility patterns.

(ii) Algorithms for the discovery of mobility patterns often pay little attention to scalability, settling for a rather vague sketch of how a general-purpose indexing structure could be used to make the algorithm more efficient. A more comprehensive approach is therefore needed, and tight integration with Mobile Object Database engines will be advantageous. Current work is considering how to efficiently evaluate complex queries in large motion databases [VBT10]. It is useful to investigate how efficient query engines can be utilized in an algorithm that identifies frequent patterns of motion. Another interesting direction is the investigation of synergies between spatio-temporal query languages and MOD engines enhanced with advanced querying and mining operations, both under a unified framework.

(iii) A closely related topic is the invention of new algorithms and techniques exploring inter-pattern relationships and transformations between the various proposals [JYZ+08]. This will allow far more powerful explorative and progressive analysis.

(iv) The computational paradigm followed so far is based on offline methods (which assume a priori knowledge of the data). An interesting direction is to adapt the proposed techniques, also taking into account the above issues, so that they operate in an online fashion, which will enable a new class of dynamic applications. Distributed and cloud computing techniques seem to be promising methodologies for achieving these goals.

3.2. Mining Mobility Data using Mobile Devices

Following up on the general Pattern Mining problem, we focus on the important case of mobility pattern mining for and from mobile devices.

In recent years, there has been growing demand for applications that execute on powerful, programmable mobile devices (such as smartphones, navigation systems, etc.). The expanded capabilities of mobile devices have strengthened the need for, and the development of, programming constructs that simplify the programming and deployment of applications on these devices. On the other hand, such devices have strict resource restrictions on energy, bandwidth, and memory usage. Also, the nature of the applications and of the networked system that connects such devices creates additional requirements for real-time guarantees in computation and communication. [CMG10], [CZG10] provide online optimization techniques to efficiently utilize resources in a mobile sensor network environment. Future research directions include:

(i) Efficient, online algorithms, both centralized and distributed, for collecting data from mobile devices and either analyzing them locally or sending them to a central server without disturbing regular processing. Instrumenting operating systems is a current approach towards this need. Note that efficient local analysis techniques provide additional benefits, because they make it easier to integrate privacy considerations.

(ii) Investigating the demands that real-time constraints (and also the energy, computation and bandwidth constraints) place on the algorithms. This includes research on scalable streaming algorithms and on optimizing in-network computation. We envisage the use of learning techniques to enhance resource management so that constraints can be satisfied.

A fundamental problem is creating the frameworks and platforms that will harness the ever-increasing capabilities of powerful connected mobile devices. In [DKG+10] we address the problem of developing applications in distributed and mobile settings. Current practices include the development of platform-specific languages and run-time systems, such as nesC and TinyOS for sensor systems, or open programming platforms such as Android. In all such developments, the main goal is to provide a simple way to facilitate the efficient usage of the mobile infrastructure; however, the techniques are either low-level and therefore difficult to use, limited, or ad hoc. We have proposed Misco, a MapReduce framework targeted at cell phones, mobile devices and personal commodity machines. Our goal is to provide a powerful programming abstraction that allows software development without the need to deal with the underlying problems of distributed computing. We have implemented our system on a testbed of Nokia's third-generation NSeries phones.
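The MapReduce abstraction mentioned above can be illustrated with a small, self-contained Python sketch that counts location samples per grid cell. It runs in a single process and does not use the Misco API, which is not described in this report; the function names, the (object_id, t, x, y) sample layout and the cell_size parameter are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Sample = Tuple[str, float, float, float]  # assumed (object_id, t, x, y)
Cell = Tuple[int, int]


def map_sample(sample: Sample, cell_size: float) -> Iterator[Tuple[Cell, int]]:
    """Map step: emit (grid cell, 1) for one location sample."""
    _object_id, _t, x, y = sample
    yield (int(x // cell_size), int(y // cell_size)), 1


def reduce_counts(cell: Cell, counts: List[int]) -> Tuple[Cell, int]:
    """Reduce step: sum the counts emitted for one grid cell."""
    return cell, sum(counts)


def run_mapreduce(samples: Iterable[Sample], cell_size: float) -> Dict[Cell, int]:
    """Sequential stand-in for the shuffle phase of a MapReduce runtime."""
    groups: Dict[Cell, List[int]] = defaultdict(list)
    for sample in samples:
        for key, value in map_sample(sample, cell_size):
            groups[key].append(value)
    return dict(reduce_counts(cell, counts) for cell, counts in groups.items())
```

The application developer writes only the map and reduce functions; in a distributed framework, the grouping step performed here by run_mapreduce would be handled by the runtime across devices, which is the kind of abstraction the paragraph above describes.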

(iii) An important future direction of research is the development of general, easy-to-use platforms for building distributed applications that can take advantage of the cloud infrastructure and, most importantly, exploit the power and connectivity of current and future mobile devices. Such mechanisms have the potential to enable applications that can revolutionize the use of the network for mobile users.

3.3. Incorporating Mobile Data Mining concepts into NetScience

The analysis of movement has recently attracted a lot of attention in the network science research community. The first proxy of human mobility used in this area was the data from a popular banknote tracking web site [BHG06]. Later, large datasets of mobile phone call records were analyzed for the purpose of discovering and validating the macro-level laws of human mobility, such as the law governing the distribution of travelled distances [GHB08], [SKPB09], [SQBB10]. Applications of these findings concern the spreading patterns of phone viruses [WGHB09] and the analysis of the entropy and predictability of human mobility.

Combining the two paradigms (Mobile Data Mining and Netscience for mobility) will yield the formal framework needed to build i) a new generation of movement simulators; ii) new analysis tools which combine general laws with local patterns.

3.4. Understanding Mobility phenomena

Mobility data represent a glimpse of very complex and sophisticated phenomena. For example, we may have the sequence of locations that a person has visited; however, this sequence is the expression of an abstract process driven by the person's motives and needs. In general, understanding higher-level events from plain data is a major research goal. The Mobile Data Mining machinery is a powerful tool that can be put to work in many different situations in order to better understand complex mobility phenomena, and also to present them more intuitively to the user by finding presentation metaphors that simplify and master the complexity. There are two different issues on which we envisage future work:

(i) to provide software platforms aimed at mastering the entire knowledge discovery process. M-Atlas represents a robust and comprehensive framework, while Movemine represents a library-based approach [LJL+10], [T10], [T09], [TGN+10].

(ii) to use the mobility mining tools within real application scenarios, in order to build new services [MPTG09][ZZXM09][ZZXY09][LDH+10][GNPP09].

Mobility data mining (MDM) is a multi-level analysis process: mobility is a collective phenomenon that emerges as the sum of a large number of individual phenomena (the mobility of each individual itself being the result of a complex combination of personal needs, habits and constraints) occurring in several different (and dynamic) contexts and environments. We expect that a full understanding of global mobility requires a satisfactory joint analysis of individual behaviours and features of local contexts.

3.5. Creating benchmarks

Currently, the various proposed approaches are evaluated in an ad hoc way, independently of each other and without targeting a wide range of applications. A more concerted approach is therefore required that will aim, on the one hand, at designing benchmark methodologies and, on the other hand, at developing techniques and platforms to support the validation of the proposals.

A specific important topic is the creation of a well-formed benchmark for Trajectory and Mobility Data Mining to help evaluate future techniques in an objective and open way. Previous GeoPKDD work (by KDDLAB and Piraeus partners) includes preliminary standalone benchmarks for each of the proposed contributions and trajectory synthesizers for MODs. However, a more consolidated approach is needed: one that validates the prototype results more thoroughly and exposes new interrelationships between the various approaches, thereby revealing new research opportunities in the domain. Such a benchmark is also expected to be well received by the research community, since individual research outcomes could then be verified using standardized methodologies and ground-truth-producing tools.

Acknowledgments

Many thanks to Gennady Andrienko, Fosca Giannotti, Katharina Morik, Nikos Pelekis, Chiara Renso, and Yannis Theodoridis for their contributions.

REFERENCES

[AA10] Andrienko N., Andrienko G.: Spatial Generalization and Aggregation of Massive Movement Data. IEEE Transactions on Visualization and Computer Graphics (TVCG) (2010)

[AAR+09] Andrienko G., Andrienko N., Rinzivillo S., Nanni M., Pedreschi D., Giannotti F.: Interactive Visual Clustering of Large Collections of Trajectories. In: Proc. of Visual Analytics Science and Technology (VAST), pp. 3-10 (2009)

[AAR+09] Andrienko G., Andrienko N., Rinzivillo S., Nanni M., and Pedreschi D.: A visual analytics toolkit for cluster-based classification of mobility data. In: Proc. of SSTD, pp. 432-435 (2009)

[AAW07] Andrienko G., Andrienko N., and Wrobel S.: Visual Analytics Tools for Analysis of Movement Data. ACM SIGKDD Explorations, 9(2) (2007)

[AFS93] Agrawal R., Faloutsos C., Swami A.: Efficient similarity search in sequence databases. In: Proc. of FODO, pp. 69–84 (1993)

[AGKS06] Agouris P., Gunopulos D., Kalogeraki V., Stefanidis A.: Knowledge Acquisition and Data Storage in Mobile GeoSensor Networks. In: Proc. of GSN, pp. 86-108 (2006)

[BC94] Berndt D., Clifford J.: Using dynamic time warping to find patterns in time series. In: AAAI Workshop on Knowledge Discovery in Databases, pp. 229–248 (1994)

[BDGM97] Bollobás B., Das G., Gunopulos D., Mannila H.: Time-series similarity problems and well-separated geometric sets. In: Proc. of SCG (1997)

[BHG06] Brockmann D., Hufnagel L., and Geisel T.: The scaling laws of human travel. Nature, 439:462-465 (2006)

[BMR+09] Baglioni M., Macedo J., Renso C., Trasarti R., and Wachowicz M.: Towards semantic interpretation of movement data. In: Proc. of AGILE (2009)

[CGS00] Cadez V., Gaffney S., and Smyth P.: A general probabilistic framework for clustering individuals and objects. In: Proc. of SIGKDD (2000)

[CJB03] Cheng C., Jain R., and Berg E.: Location prediction algorithms for mobile wireless systems. Wireless internet handbook: technologies, standards, and application, B. Furht and M. Ilyas editors, pp. 245–263. CRC Press (2003)

[CHMG09] Chatzimilioudis G., Hakkoymaz H., Mamoulis N., Gunopulos D.: Operator Placement for Snapshot Multi-predicate Queries in Wireless Sensor Networks. In: Proc. of Mobile Data Management, pp. 21-30 (2009)

[CMG10] Chatzimilioudis G., Mamoulis N., Gunopulos D.: A Distributed Technique for Dynamic Operator Placement in Wireless Sensor Networks. In: Proc of Mobile Data Management, pp. 167-176 (2010)

[CNg04] Chen L., Ng R.T.: On the marriage of lp-norms and edit distance. In: Proc. of VLDB, pp. 792–803 (2004)

[CZG10] Chatzimilioudis G., Zeinalipour-Yazti D., Gunopulos D.: Minimum-hot-spot query trees for wireless sensor networks. In: Proc of MobiDE, pp. 33-40 (2010)

[DGM97] Das G., Gunopulos D., Mannila H.: Finding Similar Time Series. In: Proc. of PKDD, pp. 88–100 (1997)

[DKG+10] Dou A.-J., Kalogeraki V., Gunopulos D., Mielikäinen T., Tuulos V.H.: Misco: a MapReduce framework for mobile systems. In: Proc. of PETRA (2010)

[FJMM97] Faloutsos C., Jagadish H.V., Mendelzon A., Milo T.: Signature technique for similarity-based queries. In: Proc. of SEQUENCES (1997)

[FRM94] Faloutsos C., Ranganathan M., Manolopoulos I.: Fast subsequence matching in time series databases. In: Proc. of SIGMOD, May (1994)

[GHB08] Gonzalez M., Hidalgo C.A., and Barabasi A.-L.: Understanding individual human mobility patterns. Nature, 453:779–782 (2008)

[GK95] Goldin D., Kanellakis P.: On similarity queries for time-series data. In: Proc. of CP (1995)

[GKS07] Gudmundsson J., Kreveld M. J., and Speckmann B.: Efficient detection of patterns in 2d trajectories of moving points. GeoInformatica, 11(2):195-215 (2007)

[GNPP07] Giannotti F., Nanni M., Pinelli F., Pedreschi D.: Trajectory pattern mining. In: Proc. of KDD, pp. 330-339 (2007)

[GNPP09] Giannotti F., Nanni M., Pedreschi D., Pinelli F.: Trajectory pattern analysis for urban traffic. GIS-IWCTS, pp. 43-47 (2009)

[GP08] Giannotti F., Pedreschi D.: Mobility, Data Mining and Privacy, Geographic Knowledge Discovery. Springer-Verlag (2008)

[GS99] Gaffney S., and Smyth P.: Trajectory Clustering with Mixtures of Regression Models. In: Proc. of SIGKDD (1999)

[I06] Ioannidis Y. et al. (Eds.): TrajPattern: Mining Sequential Patterns. In: Proc. of EDBT, LNCS 3896:664–681, Springer-Verlag Berlin Heidelberg (2006)

[JYZ+08] Jeung H., Yiu M. L., Zhou X., Jensen C. S., Shen H. T.: Discovery of Convoys in Trajectory Databases. In: Proc. of VLDB (2008)

[KMB05] Kalnis P., Mamoulis N., Bakiras S.: On discovering moving clusters in spatio-temporal data. In: Proc. of SSTD (2005)

[KPSS10] Emre Kaplan, Thomas Brochmann Pedersen, Erkay Savas, Yücel Saygin: Discovering private trajectories using background information. Data Knowl. Eng. 69(7): 723-736 (2010)

[KPSS08] Emre Kaplan, Thomas Brochmann Pedersen, Erkay Savas, Yücel Saygin: Privacy Risks in Trajectory Data Publishing: Reconstructing Private Trajectories from Continuous Properties. KES (2) 2008: 642-649

[KTG08] George Kollios, Vassilis J. Tsotras, Dimitrios Gunopulos: Mobile Object Indexing. Encyclopedia of GIS 2008: 662-670

[KVG08] George Kollios, Michail Vlachos, Dimitrios Gunopulos: Trajectories, Discovering Similar. Encyclopedia of GIS 2008: 1168-1173

[L66] Levenshtein V.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics – Doklady 10(8):707–710 (1966)

[LDH+10] Li Z., Ding B., Han J., Kays R., Nye P.: Mining periodic behaviors for moving objects. In: Proc. of SIGKDD, pp 1099–1108 (2010)

[LGL+08] Lee J.-G., Han J., Li X. and Gonzalez H.: TraClass: Trajectory Classification Using Hierarchical Region-Based and Trajectory-Based Clustering. PVLDB, Auckland, New Zealand, (2008)

[LGL+10] Li Z., Ji M., Lee J.-G., Tang L.A., Yu Y., Han J., and Kays R.: Movemine: mining moving object databases. In: Proc. of SIGMOD, pp. 1203–1206 (2010)

[LHL08] Lee J.-G., Han J. and Li X.: Trajectory Outlier Detection: A Partition-and-Detect Framework. In: Proc. of ICDE (2008)

[LHW07] Lee J.-G., Han J., and Whang K.-Y.: Trajectory clustering: a partition-and-group framework. In: Proc. of SIGMOD (2007)

[LHY04] Li Y., Han J., and Yang J.: Clustering moving objects. In: Proc. of KDD (2004)

[M06] Morzy M.: Prediction of moving object location based on frequent trajectories. In: Proc. of ISCIS, pp. 583-592 (2006)

[M07] Morzy M.: Mining frequent trajectories of moving objects for location prediction. In: Proc. of MLDM, pp. 667-680 (2007)

[MAA+10] Monreale A., Andrienko G., Andrienko N., Giannotti F., Pedreschi D., Rinzivillo S., Wrobel S.: Movement Data Anonymity through Generalization. Transactions on Data Privacy, 3(3):91-121 (2010)

[MPTG09] Monreale A., Pinelli F., Trasarti R., and Giannotti F.: WhereNext: a Location Predictor on Trajectory Pattern Mining. In: Proc. of SIGKDD (2009)

[NP06] Nanni M., Pedreschi D.: Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems, 27(3):267-289 (2006)

[PKK+09] Pelekis N., Kopanakis I., Kotsifakos E., Frentzos E. and Theodoridis Y.: Clustering Trajectories of Moving Objects in an Uncertain World. In: Proc. of ICDM (2009)

[PKP+10] Pelekis N., Kopanakis I., Panagiotakis C. and Theodoridis Y.: Unsupervised Trajectory Sampling. In: Proc. of ECML PKDD (2010)

[RPT10] Salvatore Ruggieri, Dino Pedreschi, Franco Turini: Data mining for discrimination discovery. TKDD 4(2): (2010)

[PWZP00] Perng S., Wang H., Zhang S., Parker D.S.: Landmarks: a new model for similarity-based pattern querying in time series databases. In: Proc. of ICDE, pp. 33–42 (2000)

[QWW98] Qu Y., Wang C., Wang X.S.: Supporting fast search in time series for movement patterns in multiple scales. In: Proc. of ACM CIKM, pp. 251–258 (1998)

[RM00] Rafiei D., Mendelzon A.: Querying time series data based on similarity. IEEE Transactions on Knowledge and Data Engineering, 12(5):675–693 (2000)

[RPN+08] Rinzivillo S., Pedreschi D., Nanni M., Giannotti F., Andrienko N., Andrienko G.: Visually–driven analysis of movement data by progressive clustering. Information Visualization, 7(3/4):225-239 (2008)

[RPN+09] Rinzivillo S., Pedreschi D., Nanni M., Giannotti F., Andrienko N., and Andrienko G.: Interactive Visual Clustering of Large Collections of Trajectories. In: Proc. of IEEE VAST (2009)

[SKWB10] Song C., Koren T., Wang P., and Barabasi A.-L.: Modelling the scaling properties of human mobility. Nature Physics, 6(10):818-823 (2010)

[SQBB10] Song C., Qu Z., Blumm N., and Barabasi A.-L.: Limits of predictability in human mobility. Science, 327:1018–1021 (2010)

[T10] Trasarti R.: Mastering the Spatio-Temporal Knowledge Discovery Process. PhD in Computer science, University of Pisa (2010)

[TBR09] Trasarti R., Baglioni M., and Renso C.: Damsel: A system for progressive querying and reasoning on movement data. In: DEXA Workshops, pp. 452–456 (2009)

[TGN+10] Trasarti R., Giannotti F., Nanni M., Pedreschi D., and Renso C.: A query language for mobility data mining. International Journal of Data Warehousing and Mining (IJDWM) (2010, to appear)

[TRP+10] Trasarti R., Rinzivillo S., Pinelli F., Nanni M., Monreale A., Renso C., Pedreschi D., Giannotti F.: Exploring Real Mobility Data with M-Atlas. ECML/PKDD (3): 624-627 (2010)

[VBT10] Marcos R. Vieira, Petko Bakalov, Vassilis J. Tsotras: Querying trajectories using flexible patterns. EDBT 2010: 406-417

[VBT09] Marcos R. Vieira, Petko Bakalov, Vassilis J. Tsotras: On-line discovery of flock patterns in spatio-temporal data. GIS 2009: 286-295

[VGD04] Michail Vlachos, Dimitrios Gunopulos, Gautam Das: Rotation invariant distance measures for trajectories. KDD 2004: 707-712

[VKG02] Vlachos M., Kollios G., Gunopulos D.: Discovering similar multidimensional trajectories. In: Proc. of ICDE, pp. 673–684 (2002)

[VHGK03] Michail Vlachos, Marios Hadjieleftheriou, Dimitrios Gunopulos, Eamonn J. Keogh: Indexing multi-dimensional time-series with support for multiple distance measures. KDD 2003: 216-225

[WGHB09] Wang P., Gonzalez M., Hidalgo C.A., and Barabasi A.-L.: Understanding the spreading patterns of mobile phone viruses. Science, 324:1071–1076 (2009)

[ZLG06] Demetrios Zeinalipour-Yazti, Song Lin, Dimitrios Gunopulos: Distributed spatio-temporal similarity search. CIKM 2006: 14-23

[ZS03] Zhu Y., Shasha D.: Query by humming: a time series database approach. In: Proc. of SIGMOD (2003)

[ZZXM09] Zheng Y., Zhang L., Xie X., Ma W.Y.: Mining interesting locations and travel sequences from gps trajectories for mobile users. In: Proc. of the 18th International Conference on World Wide Web (WWW), pp. 791–800 (2009)

[ZZXY09] Zheng V.W., Zheng Y., Xie X., Yang Q.: Collaborative location and activity recommendations with gps history data. In: Proc. of WWW, pp 1029–1038 (2009)

MODAP WG6 – Visual Analytics

Gennady Andrienko (Fraunhofer)

1. Introduction

The essence of Visual Analytics is enabling synergistic work of humans and machines in analyzing large and complex data and solving complex, ill-defined problems. In other words, Visual Analytics is about creating such working conditions in which humans and computers can utilize their inherent capabilities in the best possible ways while complementing and amplifying the capabilities of the other side. Interactive visual interfaces serve as a means of communication between humans and computers.

Humans have many unique capabilities that are valuable for analysis and problem solving. For instance, humans are flexible and inventive, can deal with new situations, can act reasonably in cases of incomplete and/or inconsistent information, and can solve problems that are hard to formalize. Two inherent qualities of humans are especially relevant in the context of MODAP:

• the capability to flexibly employ previous knowledge and experience, not only those related to special education and to professional activities but also those related to everyday life and common sense intelligence;

• the capability to establish various associations among pieces of information.

Since these qualities are precious for analysis, Visual Analytics aims at enabling humans to utilize them in the most effective ways. However, the utilization of these capabilities in data analysis has the potential of increasing the threats to the privacy of individuals whose characteristics or activities are reflected in the data. This applies, in particular, to data about mobility. For example, Andrienko et al. (2007) demonstrate how easily a person’s home, workplace, and other frequently visited places can be identified by interpreting the spatial and temporal patterns of the person’s moves and stops using ordinary common sense. The interpretation emerged from considering movement data in spatial and temporal contexts. The spatial context was provided by visualizing the data in a cartographic map display. The temporal context (in particular, days of the week and hours of the day) was provided by temporal histogram displays.

Researchers on privacy protection in data analysis are typically concerned with the possible threats to privacy arising from computational data processing and from integration of two or more datasets. Little has been done to study the privacy issues arising from the involvement of human analysts empowered with interactive visual tools. Regarding mobility data, it appears necessary to investigate what associations can be established and what inferences can be made by a human, in particular, by considering the data in context.

Visual Analytics can contribute to privacy protection research in two ways. First, visual analytics researchers can identify what kinds of information can be extracted from various forms of mobility data by means of visually supported analysis and consider potential implications for personal privacy. These findings can be communicated to privacy protection researchers for developing methods to remove or decrease the detected privacy threats. Second, to allow humans to deal with large datasets, visual analytics researchers often employ techniques for data generalization and abstraction. Some of the techniques that are devised for the purposes of visualization can be adapted for protecting personal privacy. A good example is the recent work by Monreale et al. (2010). Both work directions imply close inter-disciplinary collaboration. The MODAP project aims at promoting such collaborations.

The remainder of this report considers the state of the art in Visual Analytics in relation to mobility analysis and outlines the major research directions relevant to privacy protection.

2. State of the art

2.1 Supporting visual analysis of movement in context

Movement always takes place in a certain context. The context includes

• geographical (or, more generally, physical) space and inherent properties of different locations and parts of the space (e.g. street vs. park);

• physical time and inherent properties of different time moments and intervals (e.g. day vs. night);

• various objects existing or occurring in the space and time: static spatial objects (objects having particular constant positions in space), events (objects having particular positions in time), and other moving objects (objects having spatial positions that change over time).

Comprehensive analysis of movement requires consideration of the movement context and investigation of various relationships occurring between moving objects and the context.

A cartographic map can convey to a certain extent the heterogeneity of geographical space and various relationships occurring within it (Andrienko et al. 2008). Hence, visual representation of movement on a map enables a human analyst to detect by eye some of the relationships between the movement and the spatial context. Since maps are quite weak in representing time, they are often used in combination with other displays representing the temporal aspect, such as a time graph (Andrienko et al. 2003). The most common approach to dealing with space and time together is map animation; however, its effectiveness is quite limited (Tversky et al. 2002). Another approach is the space-time cube, where the horizontal plane represents space and the vertical dimension represents time. The idea was introduced by T. Hägerstrand in the 1960s (Hägerstrand 1970), but software implementations appeared relatively recently (Kraak 2003, Andrienko et al. 2003, Kapler and Wright 2005). Both the map and the space-time cube are quite limited with respect to the number of trajectories that can be effectively explored, the length of the time period, and the capacity to represent various aspects of the movement context: when much information is included in a display, it becomes illegible due to visual clutter.

As larger and larger collections of movement data become available, visual analytics researchers combine interactive visualization with computational techniques for data processing and analysis. Data aggregation and clustering are the groups of computational techniques being predominantly used in visual analytics. These groups are reviewed in more detail in the next subsection. A common feature in applying computational techniques is that they operate on movement data only, i.e. do not involve any data about the context. Investigation of relationships between the movement and the context is supported by putting the computation results (i.e. clusters or aggregates) on a cartographic background and/or on temporal displays, which enables the analyst to detect certain relationships visually. Examples described in the literature demonstrate that such visualizations enable sophisticated interpretations (Dykes and Mountain 2003, Mountain 2005, Andrienko et al. 2007, Willems et al. 2009, Andrienko and Andrienko 2010).

There are relatively few works on the use of computational techniques to support establishing links between movement and context. Methods have been developed for extracting particular types of relationships between moving objects and certain aspects of the context, e.g. other moving objects, selected locations, or weather conditions. To enable exploration and interpretation of the results by a human analyst, the authors of the methods suggest suitable visualizations. Thus, Crnovrsanin et al. (2009) visualize the dynamics of the distances of moving objects to selected locations on a time graph. Lundblad et al. (2009) attach data about weather conditions to positions of ships and visualize the data on interactive linked displays. Yu (2006) computationally detects occurrences of three types of spatio-temporal relationships among moving objects: co-location in space, co-location in time, and co-location in both space and time. The places and times of the occurrences are visualized in a spatio-temporal GIS. Orellana et al. (2009) detect occurrences of proximity between moving objects and visualize their spatial and temporal positions and characteristics.
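To make the notion of co-location in both space and time more concrete, the following Python sketch finds pairs of position records from different objects that lie within given spatial and temporal thresholds. It is only an illustration under stated assumptions: the function name `proximity_events`, the record layout `(object_id, x, y, t)`, and the thresholds `max_dist` and `max_dt` are hypothetical, and the brute-force pairwise comparison is not the method of Yu (2006) or Orellana et al. (2009).

```python
import math
from itertools import combinations

def proximity_events(positions, max_dist, max_dt):
    """Find pairs of records from different objects that are close in
    both space and time.

    positions: list of (object_id, x, y, t) records (assumed layout).
    Returns a list of (id_a, id_b, t_a, t_b) tuples.
    """
    events = []
    # Brute-force comparison of all record pairs: quadratic in the
    # number of records, adequate only for small illustrative inputs.
    for a, b in combinations(positions, 2):
        if a[0] == b[0]:
            continue  # records of the same object are not an encounter
        close_in_time = abs(a[3] - b[3]) <= max_dt
        close_in_space = math.hypot(a[1] - b[1], a[2] - b[2]) <= max_dist
        if close_in_time and close_in_space:
            events.append((a[0], b[0], a[3], b[3]))
    return events
```

Dropping the temporal test yields co-location in space only, and dropping the spatial test yields co-location in time only. For realistic data volumes a spatial or temporal index would be needed in place of the pairwise loop.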

2.2 Aggregation and abstraction of mobility data in visual analytics

A survey of the aggregation methods used in visual analytics for dealing with large amounts of movement data is provided by Andrienko and Andrienko (2010). To study the distribution of movement characteristics over space, movement data are aggregated into continuous density surfaces (e.g. Dykes and Mountain 2003, Willems et al. 2009) or discrete grids (e.g. Forer and Huisman 2000, Andrienko and Andrienko 2010). Mountain (2005) further processes density surfaces generated from movement data to extract their topological features: peaks, pits, ridges, saddles, and so on. Brillinger et al. (2004) aggregate movement data into a vector field using a regular grid: in each grid cell, a vector is built with the angle corresponding to the prevailing movement direction and with length and width proportional to the average speed and the amount of movement, respectively.
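As a minimal sketch of grid-based aggregation (not the implementation of any of the cited works), the following Python code bins trajectory segments into square cells and derives, for each cell, the number of moves, the mean speed, and a direction angle. The function name `aggregate_to_grid`, the point layout `(x, y, t)`, and the `cell_size` parameter are assumptions made for illustration. Two simplifications are also assumed: each segment is assigned to the cell of its start point, and the direction is taken from the summed displacement, which only approximates a prevailing direction.

```python
import math
from collections import defaultdict

def aggregate_to_grid(trajectories, cell_size):
    """Aggregate trajectory segments into a regular grid.

    trajectories: list of trajectories, each a list of (x, y, t) points
    ordered by time (assumed layout).
    Returns {(col, row): (count, mean_speed, direction)} where direction
    is the angle in radians of the summed displacement in the cell.
    """
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0])  # count, speed, dx, dy
    for traj in trajectories:
        for (x0, y0, t0), (x1, y1, t1) in zip(traj, traj[1:]):
            dt = t1 - t0
            if dt <= 0:
                continue  # skip duplicate or out-of-order timestamps
            dx, dy = x1 - x0, y1 - y0
            # Simplification: the segment belongs to its start cell.
            cell = (math.floor(x0 / cell_size), math.floor(y0 / cell_size))
            acc = sums[cell]
            acc[0] += 1
            acc[1] += math.hypot(dx, dy) / dt
            acc[2] += dx
            acc[3] += dy
    return {
        cell: (n, speed / n, math.atan2(dy, dx))
        for cell, (n, speed, dx, dy) in sums.items()
    }
```

In a vector-field display of the kind described above, the returned angle would set the orientation of the symbol in each cell, the mean speed its length, and the count its width.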

To study links between places, movement data are aggregated into origin-destination matrices (Guo 2007) and flow maps (Tobler 1987, 2005). The algorithm for automated design of flow maps suggested by Phan et al. (2005) minimizes crossings between symbols. Drecki and Forer (2000) suggest a three-dimensional representation of flows among locations in several consecutive time intervals.
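The origin-destination aggregation mentioned above can be sketched in a few lines of Python. This is an illustrative example only: the function name `od_matrix`, the point layout `(x, y, t)`, and the caller-supplied `place_of` function that maps coordinates to a place identifier are all assumptions, and each trajectory is treated as a single trip from its first to its last point.

```python
from collections import Counter

def od_matrix(trajectories, place_of):
    """Count trips between places.

    trajectories: list of trajectories, each a list of (x, y, t) points.
    place_of: function mapping (x, y) to a place identifier (assumed to
    be supplied by the caller, e.g. a lookup of administrative regions).
    Returns a Counter keyed by (origin_place, destination_place).
    """
    flows = Counter()
    for traj in trajectories:
        if len(traj) < 2:
            continue  # a single point has no origin-destination pair
        (x0, y0, _), (x1, y1, _) = traj[0], traj[-1]
        flows[(place_of(x0, y0), place_of(x1, y1))] += 1
    return flows
```

The resulting counts are the values that a flow map would encode, for example as line widths between the place symbols.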

One of the methods originally invented for generalizing a large number of trajectories and building flow maps (Andrienko and Andrienko 2011) suggested a possible approach to privacy protection and has recently been adapted for anonymization of trajectories (Monreale et al. 2010).

In all aggregations discussed so far the results are numeric values such as counts, sums, statistical means, etc. Buliung & Kanaroglou (2004) derive a kind of geometric summary of several trajectories. The authors use functions of ArcGIS to build a convex hull containing the trajectories, compute the central tendency and dispersion of the paths, and represent the results on a map as the averaged path. Such geometric summarization can work well only when the trajectories are similar in shape and close in space. It can be applied, for example, to groups of similar trajectories resulting from clustering.

Guo et al. (2010) suggest a graph-based approach to generalization of movement data that divides the space into a hierarchy of regions such that locations inside a region share more trajectories with each other than with locations in other regions. The hierarchy of regions can help generalize trajectories for better comparison and clustering.

Clustering is a data abstraction method actively used in visual analytics. In particular, clustering has been applied to trajectories of moving objects (Andrienko et al. 2007, Rinzivillo et al. 2008, Andrienko and Andrienko 2009, Guo et al. 2010), to spatio-temporal positions of stops extracted from movement data (Andrienko et al. 2007), and to aggregates derived from movement data (Andrienko et al. 2010). Andrienko and Andrienko (2010, 2011) suggest approaches to a summarized representation of clusters based on spatial generalization and aggregation.

2.3 Analysis of community-contributed data

A recent trend in visual analytics research is the analysis of community-contributed data from Web 2.0, for example, personal blogs. Some of the Web 2.0 data have geographical and temporal references, in particular, photos posted on the photo sharing web sites Flickr and Panoramio and Twitter messages. These data are related to the mobility and activities of people. A number of research papers describe extraction of various kinds of information from the community-contributed spatio-temporal data (Girardin et al. 2008, Crandall et al. 2009, Andrienko et al. 2009, Kisilevich et al. 2010, Jankowski et al. 2010, Andrienko et al. 2010). These include estimations of people densities over space, interesting places, interesting events, temporal patterns of people’s visits to places, spatio-temporal patterns of event occurrences, patterns of people’s movement between places, and meetings between people, in particular, recurring meetings of the same people. These kinds of information may be valuable for some applications, such as tourism, transportation management, city planning, public event management, etc. Information extracted from community-contributed spatio-temporal data can also be used as context information in analysis of other data. However, there are also threats to personal privacy. It is quite probable that people who put their space- and time-referenced data in Web 2.0 are not well aware of the kinds of personal information that can be extracted by analyzing their data in context.

2.4 Privacy

Privacy issues have not been addressed in visual analytics research so far. However, responses from the leading scientists to a questionnaire distributed by the WP6 participants showed their potential interest in research related to data privacy. There is a need to raise the awareness of the visual analytics research community about privacy issues and to provide guidelines on the ways in which visual analytics researchers can contribute to protecting personal privacy.

3. Research directions

Visual and interactive techniques can pose specific challenges to privacy by enabling a human analyst to link data to context, pre-existing knowledge, and additional information obtained from various sources. Unlike in computational analysis, relevant knowledge and information do not have to be represented in a structured form in order to be used effectively by a human. Furthermore, humans can note such kinds of patterns and relationships that are hard to formalize and detect by computational techniques.

The privacy issues related to the use of visual analytics methods are currently studied neither in the area of visual analytics nor in the area of data mining and computational analysis. There is a need to fill this gap, which requires concerted inter-disciplinary efforts. The following research directions are suggested for the inter-disciplinary research community.

3.1 Taxonomy of movement context

Sensitive personal information may be uncovered by linking movement data to the context. Human analysts are very flexible in using various kinds of context information available in various forms, e.g. as structured data, as background knowledge, or as texts or images retrieved from the Web. The research question is: What kinds of general and specific knowledge about movement context can enable unwanted discoveries of personal information from movement data? Creation of a taxonomy describing various elements of movement context, their relevant properties, and possible relationships to people’s activities and movement may form a basis for a systematic investigation of the potential threats to personal privacy arising from linking movement data to context. The taxonomy should include typologies of spatial locations, paths, spatial objects, time moments/intervals, events, etc. with regard to people’s activities and movement. For instance, the typology of spatial locations should contain such notions as home place, work place, shopping place, recreation place, business area, and so on. The typology of paths should include notions of high speed road, main street, minor street, footpath, crossing, etc. Besides, the taxonomy of context should include the possible types of relationships that may occur between moving objects and elements of the context (e.g. spatial proximity, temporal proximity).

3.2 Taxonomy of activities

Movement of people is connected to people’s activities. There are examples demonstrating that general knowledge of the possible types of activities and their typical places, and/or typical times, and/or typical durations may allow a human analyst to extract personal information from movement data (e.g. Andrienko et al. 2007). An analyst may also use specific knowledge about the kinds of activities that are usually conducted in particular places. A taxonomy of activities and their possible relationships to elements of movement context (places, times, objects, events) would allow researchers to go beyond particular examples and derive general understanding of what can be inferred from movement data by involving general or specific knowledge about people’s activities in combination with context information.

3.3 Taxonomy of derivable knowledge/information

This taxonomy should describe the types of knowledge/information that can be extracted from movement data linked to context and activity information. The potentially sensitive types of information should be identified.

A step in this direction is the taxonomy of movement patterns suggested by Dodge et al. (2008). This taxonomy, however, is limited to defining possible relationships between movements of two or more objects. With respect to a particular moving object, other moving objects are a part of the movement context. However, other parts of the movement context and activities of moving objects are not considered.

The theoretical work outlined in subsections 3.1-3.3 is useful not only for the research on preserving personal privacy. It may also provide foundations for developing new analysis methods, both in visual analytics and in data mining. In particular, visual analytics researchers may use the taxonomies in the design of the visual interfaces and interactive tools that can effectively support establishing links between movement data and other kinds of information and inferring new information.

3.4 Generalization and abstraction

Methods for data generalization and abstraction are used in visual analytics for the visualization of large amounts of data. A side effect of using these methods is that detailed personal information may be hidden, which is a positive feature from the perspective of preserving personal privacy. Hence, data generalization methods can potentially be adapted to the needs of preserving privacy. This refers, in particular, to methods devised for movement data. We suggest that one of the future research directions should be examination of existing and emerging methods for generalization and abstraction of movement data from the perspective of their possible adaptation for privacy protection. Of course, generalization alone does not necessarily guarantee data anonymity and safety. Hence, like the other research directions, this direction requires cooperation between specialists in visual analytics, data mining, and privacy protection.
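The point that generalization hides detail but does not by itself guarantee anonymity can be illustrated with a small Python sketch. It is not the method of Monreale et al. (2010): a plain square grid is used here in place of their spatial generalization, and the function names `generalize` and `rare_sequences`, the point layout `(x, y, t)`, and the parameters `cell_size` and `k` are assumptions made for illustration.

```python
import math
from collections import Counter

def generalize(traj, cell_size):
    """Replace each (x, y, t) point by its grid cell and drop consecutive
    repeats, yielding a coarse sequence of visited cells."""
    cells = []
    for x, y, _ in traj:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return tuple(cells)

def rare_sequences(trajectories, cell_size, k):
    """Return the generalized sequences shared by fewer than k
    trajectories, together with their counts. Such sequences may still
    single out individuals despite the loss of spatial detail."""
    counts = Counter(generalize(t, cell_size) for t in trajectories)
    return {seq: n for seq, n in counts.items() if n < k}
```

A non-empty result from `rare_sequences` indicates that, at the chosen cell size, some generalized trajectories remain distinguishable, so further measures would be needed before the data could be considered safe.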

References

G. Andrienko and N. Andrienko, “Interactive Cluster Analysis of Diverse Types of Spatiotemporal Data”, ACM SIGKDD Explorations, 2009, v.11 (2), pp. 19-28

G. Andrienko and N. Andrienko, “A General Framework for Using Aggregation in Visual Exploration of Movement Data”, The Cartographic Journal, 2010, v.47 (1), pp. 22-40

G. Andrienko, N. Andrienko, P. Bak, S. Kisilevich, D. Keim, “Analysis of community-contributed space- and time-referenced data (example of flickr and panoramio photos)”, in Proceedings IEEE Visual Analytics Science and Technology (VAST 2009), IEEE Computer Society Press, 2009, pp.213-214

G. Andrienko, N. Andrienko, S. Bremm, T. Schreck, T. von Landesberger, P. Bak, and D. Keim, “Space-in-Time and Time-in-Space Self-Organizing Maps for Exploring Spatiotemporal Patterns”, Computer Graphics Forum, Vol.29(3), 2010, pp. 913-922.

G. Andrienko, N. Andrienko, J. Dykes, S. Fabrikant, and M. Wachowicz , “Geovisualization of dynamics, movement and change: key issues and developing approaches in visualization research”, Information Visualization, vol. 7, no. 3/4, pp. 173-180, 2008.

G. Andrienko, N. Andrienko, M. Mladenov, M. Mock, C. Poelitz, “Discovering Bits of Place Histories from People's Activity Traces”, in Proceedings IEEE Visual Analytics Science and Technology (VAST 2010), accepted

G. Andrienko, N. Andrienko, and S. Wrobel, “Visual Analytics Tools for Analysis of Movement Data”, ACM SIGKDD Explorations, 2007, v.9 (2), pp.38-46

N. Andrienko and G. Andrienko, “Spatial Generalization and Aggregation of Massive Movement Data”, IEEE Transactions on Visualization and Computer Graphics, 2011, accepted; published version: http://doi.ieeecomputersociety.org/10.1109/TVCG.2010.44

N. Andrienko, G. Andrienko, and P. Gatalsky, “Exploratory Spatio-Temporal Visualization: an Analytical Review”, J. Visual Languages and Computing, vol. 14, no. 6, pp. 503-541, Dec. 2003

D.R. Brillinger, H.K. Preisler, A.A. Ager, and J.G. Kie, “An exploratory data analysis (EDA) of the paths of moving animals”, Journal of statistical planning and inference, 122(2), 2004, pp. 43-63

D.J. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg, “Mapping the world’s photos”, In Proceedings of the 18th international conference on World Wide Web, ACM, 2009, pp. 761–770.

T. Crnovrsanin, C. Muelder, C. Correa, and K.-L. Ma, “Proximity-based Visualization of Movement Trace Data”, In Proc. IEEE Symposium on Visual Analytics Science and Technology (VAST), October 12 - 13, 2009, Atlantic City, New Jersey, USA, pp. 11-18

S. Dodge, R. Weibel, and A.-K. Lautenschütz, “Towards a taxonomy of movement patterns”, Information Visualization, 7(3-4), Autumn/Winter 2008, pp. 240-252.

I. Drecki and P. Forer, “Tourism in New Zealand - International Visitors on the Move” (A1 Cartographic Plate); Tourism, Recreation Research and Education Center (TRREC): Lincoln University, Lincoln, New Zealand, 2000.

J. A. Dykes and D. M. Mountain, “Seeking structure in records of spatio-temporal behaviour: visualization issues, efforts and applications”, Computational Statistics and Data Analysis, vol. 43, pp. 581-603, 2003.

P. Forer and O. Huisman, “Space, Time and Sequencing: Substitution at the Physical/Virtual Interface”, In D.G. Janelle and D.C. Hodge (editors), “Information, Place and Cyberspace: Issues in Accessibility”, Springer-Verlag, Berlin, 2000, pp. 73-90

F. Girardin, F. Fiore, C. Ratti, and J. Blat, “Leveraging explicitly disclosed location information to understand tourist dynamics: a case study”, Journal of Location Based Services, 2008, 2(1), pp. 41-56

D. Guo, “Visual Analytics of Spatial Interaction Patterns for Pandemic Decision Support”, International Journal of Geographical Information Science, 21(8), 2007, pp. 859-877

D. Guo, S. Liu, and H. Jin, “A Graph-based Approach to Vehicle Trajectory Analysis”, Journal of Location Based Services, 2010, in press

T. Hägerstrand, “What about people in regional science?”, Papers of the Regional Science Association, vol. 24, pp. 7-21, 1970.

P. Jankowski, N. Andrienko, G. Andrienko, and S. Kisilevich, “Discovering landmark preferences and movement patterns from photo postings”, Transactions in GIS, 2010, accepted

T. Kapler and W. Wright, “GeoTime information visualization”, Information Visualization, vol. 4, no. 2, pp. 136-146, 2005

S. Kisilevich, M. Krstajic, D. Keim, N. Andrienko, and G. Andrienko, “Event-based analysis of people's activities and behavior using Flickr and Panoramio geotagged photo collections”, In Proc. 14th International Conference on Information Visualization IV 2010 (27-29 July, 2010, London, UK), IEEE Computer Society, Los Alamitos, California, 2010, pp. 289-296.

M.-J. Kraak, “The space-time cube revisited from a geovisualization perspective”, Proc. 21st Int. Cartographic Conf., pp. 1988-1995, 2003.

P. Lundblad, O. Eurenius, and T. Heldring, “Interactive Visualization of Weather and Ship Data”, In Proc. 13th Int. Conf. Information Visualization IV2009, IEEE Computer Society, 2009, pp. 379-386.

A. Monreale, G. Andrienko, N. Andrienko, F. Giannotti, D. Pedreschi, S. Rinzivillo, and S. Wrobel, “Movement Data Anonymity through Generalization”, Transactions on Data Privacy, 3(3), 2010, pp. 91-121

D.M. Mountain, “Visualizing, querying and summarizing individual spatio-temporal behaviour”, in J.A. Dykes, M.J. Kraak, and A.M. MacEachren (editors), “Exploring Geovisualization”, Elsevier, London, 2005, pp. 181-200

D. Orellana, M. Wachowicz, N. Andrienko, and G. Andrienko, “Uncovering Interaction Patterns in Mobile Outdoor Gaming”, Proc. Int. Conf. Advanced Geographic Information Systems & Web Services, pp. 177-182, 2009.

D. Phan, L. Xiao, R. Yeh, P. Hanrahan, and T. Winograd, “Flow Map Layout”, In Proc. IEEE Symposium on Information Visualization InfoVis 05, Minneapolis, Minnesota, USA, 23-25 October, 2005, pp. 219-224

S. Rinzivillo, D. Pedreschi, M. Nanni, F. Giannotti, N. Andrienko, and G. Andrienko, “Visually-driven analysis of movement data by progressive clustering”, Information Visualization, 7(3/4), 2008, pp. 225-239.

W. Tobler, “Experiments in migration mapping by computer”, The American Cartographer, 14 (2), 1987, pp. 155-163

W. Tobler, “Display and Analysis of Migration Tables”, http://www.geog.ucsb.edu/~tobler/presentations/shows/A_Flow_talk.htm, 2005

B. Tversky, J. B. Morrison, and M. Bétrancourt, “Animation: can it facilitate?”, Int. J. Human-Computer Studies, vol. 57, no. 4, pp. 247-262, 2002

N. Willems, H. van de Wetering, and J.J. van Wijk, “Visualization of vessel movements”, Computer Graphics Forum (Proc. EuroVis 2009), 28(3), 2009, pp. 959-966

H. Yu, “Spatial-temporal GIS design for exploring interactions of human activities”, Cartography and Geographic Information Science, vol. 33, no. 1, pp. 3-19, 2006.