
A Solution for Navigating User-Generated Content

Severi Uusitalo, Peter Eskolin, Petros Belimpasakis

Mixed Reality Solutions, Nokia Research Center, Finland
{severi.uusitalo, peter.eskolin, petros.belimpasakis}@nokia.com

ABSTRACT

We have implemented a prototype solution for discovering shared digital media via a novel user interface, presenting the spatial relationships of the content.

1 INTRODUCTION

We are interested in how to contextualize the digital content of an individual user with the help of content from other users, how to create a positive user experience from that, and what kind of interaction between users it can encourage. As a result, the environment can be modelled. Creating a digital representation of the world from photographs is an exciting opportunity, e.g. as a registration system for Augmented Reality (AR), forming one cornerstone of what Höllerer et al. [3] call Anywhere Augmentation (AA). In this paper we present the user experience of the prototype and its key features.

2 RELATED WORK

A large amount of documented work exists on structuring photos. Davis et al. [2] emphasize the use of automatically created metadata for organizing, searching, and browsing digital photos, and for creating new experiences. Geotagging has become a common way to structure photos and present them on top of a map view in services like Flickr or Google Earth.

In the Aspen moviemap, described by e.g. Mohl [4], a series of spatially structured imagery provided an immersive, interactive navigation experience. Google Street View follows the moviemap principle. Snavely et al. [6] developed a structure-and-motion computing pipeline for structuring and rendering photo collections. This way, 3D models are automatically constructed from photographs when a sufficient number of photos of a subject exist. The system has been made available to the public as Microsoft Photosynth. Our system aims at creating a navigable mirror world from even a sparse network of photos. Torniai et al. [7] record heading information from a separate device to enable spatial browsing. Their browser interface uses metadata to provide arrows for moving towards photos taken in the respective direction from the currently open viewpoint. The system does not capture the full camera attitude, as pitch and roll are not detected.

People have different behaviors with regard to consumer photography. Chalfen [1] and subsequent studies describe in detail the behavior of telling narratives with the aid of one’s photos. The prime audience and co-actors of the narratives in this Kodak Culture are friends and family. We have aimed to support this and some other documented photography behaviors with our solution.

3 SYSTEM USER EXPERIENCE

The system uses photos from the community of users, starting from an empty 3D space linked to real-world coordinates. As the system is used, the space starts to fill with photos from users. Together with a map and satellite imagery, the neighboring photos and videos provide the user with cognitive cues for understanding the semantic structure of a place. In the following user experience description we follow the Get-Enjoy-Manage-Share (GEMS) lifecycle model of digital media [5].

3.1 Get

The user experience of creating content for Image Space starts when the user takes photos with the regular camera application of the device. The only additional information in the camera application is an indication of the current attitude of the device and the sensor calibration status (Figure 1). A photo is taken by pressing the shutter key as usual. The user can upload the photo to the online sharing service, utilizing the built-in sharing software that ships with the device.

Figure 1. Camera pose presented at the bottom of the viewfinder.

3.2 Enjoy

As a result of their activity in taking photos, the users can enjoy a developing representation of their part of the world: a Mirror World with a level of detail defined by them. The representation is currently accessible via a standard web browser. The user is initially presented with the latest photo uploaded to their account. The map underneath the picture shows the location as well as the heading of the photo. If the user moves the cursor to the upper part of the view, other photos visible from the current viewpoint are shown as rectangles, in the same attitude as the camera phone was when the photo was taken (Figure 2).

Figure 2. Web UI of Image Space

Photos (rectangles) that are viewed at angles between 90 and 270 degrees, i.e. from the back side, are not shown in the 3D view.
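To illustrate this visibility rule, the following Python sketch (our own simplified geometry, not the actual client code; the axis and angle conventions are assumptions) orients a photo rectangle from its recorded yaw, pitch, and roll, and culls it when the virtual camera views it from the back hemisphere, i.e. at a relative angle between 90 and 270 degrees.

```python
import numpy as np

def rotation_from_ypr(yaw, pitch, roll):
    """Rotation matrix from yaw, pitch, roll (radians); Z-Y-X order is assumed."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def photo_rectangle(position, yaw, pitch, roll, width=1.0, height=0.75):
    """Corners of the photo quad, oriented as the camera was at capture time."""
    R = rotation_from_ypr(yaw, pitch, roll)
    hw, hh = width / 2.0, height / 2.0
    local = [(-hw, -hh, 0), (hw, -hh, 0), (hw, hh, 0), (-hw, hh, 0)]
    return [np.asarray(position) + R @ np.asarray(c) for c in local]

def is_front_facing(photo_position, yaw, pitch, roll, camera_position):
    """True if the virtual camera sees the photo from the front.
    Photos seen from the back hemisphere (90-270 degrees) are culled."""
    R = rotation_from_ypr(yaw, pitch, roll)
    normal = R @ np.array([0.0, 0.0, 1.0])      # photo facing direction
    to_camera = np.asarray(camera_position) - np.asarray(photo_position)
    return float(np.dot(normal, to_camera)) > 0.0
```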




Selecting a rectangle positions the virtual camera at the respective photo, first displaying a flying animation towards that image in order to give the illusion of moving in the space. By navigating through the photos of a place, awareness of it can be obtained. The user can divert from their own photos and find new places, or new views of places, from various times of day and seasons. Audio and video clips provide additional cognitive and memory cues. The user can link other content to photos for annotation purposes, and comment on and tag them.
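The fly-to transition can be thought of as interpolating the virtual camera between the current viewpoint and the selected photo's pose. The sketch below uses SciPy's rotation slerp and an ease-in/ease-out curve; it is an assumed illustration of the idea, not the client's actual animation code.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def fly_to(cam_pose, photo_pose, steps=60):
    """Yield intermediate camera poses for the fly-to animation.

    A pose is (position, (yaw, pitch, roll)), angles in degrees;
    the 'zyx' Euler convention is an assumption for this sketch.
    """
    (p0, a0), (p1, a1) = cam_pose, photo_pose
    rotations = Rotation.from_euler('zyx', [a0, a1], degrees=True)
    slerp = Slerp([0.0, 1.0], rotations)
    for i in range(1, steps + 1):
        t = i / steps
        s = 3 * t**2 - 2 * t**3              # ease-in / ease-out
        pos = (1.0 - s) * np.asarray(p0) + s * np.asarray(p1)
        yield pos, slerp(s)                   # interpolated position and attitude
```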

Another mode of viewing photos is provided if the viewed area has enough photos for 3D reconstruction by structure-and-motion computation (Figure 3). A point cloud model is presented and the user can rotate it in 3D space. In this mode the photos are projected onto a plane within the model. The attitude and position of the projection plane are defined by the distribution of points extracted from the photo in question.

Figure 3. Browsing the 3D reconstruction
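The attitude and position of such a projection plane can be obtained with a least-squares fit to the points reconstructed from the photo. The minimal sketch below assumes the reconstruction yields a set of 3D points per photo; the paper does not specify the exact fitting method.

```python
import numpy as np

def fit_projection_plane(points):
    """Fit a plane to the 3D points extracted from one photo.

    Returns the plane centre and unit normal: the centre positions the
    projection plane and the normal gives its attitude. The fit is a
    least-squares plane via SVD of the centred point cloud.
    """
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centre)
    normal = vt[-1]                      # direction of least variance
    return centre, normal / np.linalg.norm(normal)
```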

To emphasize the social nature of the solution, the author's nickname and avatar image are shown in the lower left corner of the photo. Additionally, the friends of the user are shown on a panel to the right of the browsing view and the map. Individual friends can be selected and their photos and Scenes (see 3.3) can be viewed. The panel also has a tab for showing photos taken nearby by any user. The user can filter the displayed photos along a temporal axis, according to the order of creation. This reveals the tracks users have taken while creating them.
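Filtering on the temporal axis amounts to keeping only the photos captured within a selected time window and ordering them by creation time; a small sketch of the idea follows (the Photo fields are assumed, not the service's actual data model).

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Photo:
    owner: str
    taken_at: datetime
    lat: float
    lon: float

def temporal_filter(photos, start, end):
    """Return photos captured in [start, end], in order of creation.

    Plotting the positions of the returned photos in this order traces
    the track a user took while shooting them.
    """
    selected = [p for p in photos if start <= p.taken_at <= end]
    return sorted(selected, key=lambda p: p.taken_at)
```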

3.3 Manage and Share

Photos find their places and attitudes in the 3D world automatically if they have sufficient metadata. Descriptions, comments, or tags authored in the external content repository service can be viewed as well. The user can also manually adjust the attitude and location of the photos. Privacy management is done in the content repository service.
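Automatic placement thus reduces to checking that an item carries a complete pose in its metadata; a hedged sketch of such a check is shown below (the metadata keys are assumptions, not the actual repository schema).

```python
REQUIRED_POSE_KEYS = ("lat", "lon", "altitude", "yaw", "pitch", "roll")

def pose_from_metadata(metadata):
    """Return the photo pose if the metadata is sufficient, else None.

    Items lacking a full pose are left for the user to place manually.
    """
    if all(key in metadata for key in REQUIRED_POSE_KEYS):
        position = (metadata["lat"], metadata["lon"], metadata["altitude"])
        attitude = (metadata["yaw"], metadata["pitch"], metadata["roll"])
        return position, attitude
    return None
```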

A Scene is a subset of photos, one's own or those of others, created to present a narrative (Figure 4). It is created by simply adding photos to a timeline. When the Play button is selected, the viewer is flown through the Scene. The respective photos on the map are highlighted as the story progresses. A Scene can be shared, by emailing a link, from within the service.

Figure 4. Screenshot showing a Scene at the lower part of the browsing view
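Conceptually, a Scene is just an ordered timeline of photo references that the viewer is flown through on playback. The sketch below captures that structure; the class, field, and viewer-method names are ours and hypothetical, not part of the actual service.

```python
from dataclasses import dataclass, field

@dataclass
class Scene:
    title: str
    photo_ids: list = field(default_factory=list)   # ordered timeline of photo references

    def add(self, photo_id):
        """Append a photo to the Scene's timeline."""
        self.photo_ids.append(photo_id)

    def play(self, viewer, photos_by_id):
        """Fly the viewer through the Scene, photo by photo.

        `viewer` is any object exposing fly_to() and highlight_on_map();
        this interface is hypothetical.
        """
        for photo_id in self.photo_ids:
            photo = photos_by_id[photo_id]
            viewer.fly_to(photo.pose)        # animate towards the photo
            viewer.highlight_on_map(photo)   # highlight it on the map as the story progresses
```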

4 SYSTEM ARCHITECTURE

Logically, the system consists of three main entities, namely the mobile client, the backend infrastructure, and the Web UI client, as shown in Figure 5.

Figure 5. High-level architecture of our system

The mobile client includes a daemon which automatically records the sensor parameters needed to represent the captured content in the 3D space. These include the exact GPS location and the yaw, roll, and pitch angles at the moment of capture. The backend infrastructure is responsible for accepting the uploaded content, storing it, and making it available to the Web UI client. Any online sharing service can be used as a content repository, as long as it provides open web interfaces for creating mash-ups with it. The Web UI client, which runs on the user's end machine, is responsible for visualizing the mirror world and providing the Image Space experience via a Flash-enabled browser.
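The essential data flow is that the mobile daemon attaches a pose record to every capture and the backend stores it alongside the photo for the Web UI client. The sketch below shows what such a record and an upload payload might look like; the field names and packaging are assumptions, not the actual service interface.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class CaptureRecord:
    """Sensor parameters recorded by the mobile daemon at capture time."""
    lat: float        # GPS latitude (degrees)
    lon: float        # GPS longitude (degrees)
    altitude: float   # metres above sea level
    yaw: float        # heading from the magnetometer (degrees)
    pitch: float      # from the accelerometer (degrees)
    roll: float       # from the accelerometer (degrees)
    taken_at: str     # ISO 8601 timestamp

def upload_payload(photo_bytes, record):
    """Package a photo and its pose metadata for upload to the content repository."""
    return {
        "metadata": json.dumps(asdict(record)),
        "image": photo_bytes,
    }

# Example record (values are illustrative only).
record = CaptureRecord(lat=60.1699, lon=24.9384, altitude=20.0,
                       yaw=135.0, pitch=-5.0, roll=1.5,
                       taken_at=datetime.now(timezone.utc).isoformat())
```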

5 CONCLUSION

We have presented a mirror world solution based on user-generated content, building on existing documented photography behaviors. By exploiting a sensor set consisting of a camera, GPS, accelerometer, and magnetometer in a single device, the system automatically structures the content and creates a spatial presentation of it. Social aspects of content sharing and storytelling have also been considered.

REFERENCES

[1] R. Chalfen. Snapshot Versions of Life. Bowling Green, Ohio: Bowling Green State University Popular Press, 1987.

[2] M. Davis, S. King, N. Good, and R. Sarvas. From context to content: leveraging context to infer media metadata. In Proceedings of the 12th Annual ACM International Conference on Multimedia (MULTIMEDIA '04), New York, NY, USA, October 10-16, 2004. ACM, New York, NY, 188-195.

[3] T. Höllerer, J. Wither, and S. DiVerdi. Anywhere Augmentation: Towards Mobile Augmented Reality in Unprepared Environments. In G. Gartner, M. P. Peterson, and W. Cartwright (Eds.), Location Based Services and TeleCartography, Lecture Notes in Geoinformation and Cartography, Springer Verlag.

[4] R. Mohl. Cognitive Space in the Interactive Movie Map: An Investigation of Spatial Learning in Virtual Environments. PhD dissertation, Education and Media Technology, M.I.T., 1981.

[5] I. Salminen, J. Lehikoinen, and P. Huuskonen. Developing an extensible mobile metadata ontology. In Proceedings of the IASTED Conference on Software Engineering and Applications (SEA), ACTA Press, 2005, 266-272.

[6] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25, 3 (Jul. 2006), 835-846.

[7] C. Torniai, S. Battle, and S. Cayzer. The Big Picture: Exploring Cities through Georeferenced Images and RDF Shared Metadata. CHI 2007 Workshop "Imaging the City", San Jose, 19 April 2007.
