Bridging the Semantic Gap with Computational Media Aesthetics

Chitra Dorai, IBM T.J. Watson Research Center
Svetha Venkatesh, Curtin University, Australia


Guest Editors’ Introduction

Content processing and analysis research in multimedia systems has one central objective: develop technologies that help sift and easily access useful nuggets of information from media data streams. A fundamental need exists to analyze, cull, and categorize information automatically and systematically from media data, and to manage and exploit it effectively in the face of rapidly accumulating digital media collections.

However, user expectations of such systems are far from being met, despite continued research for nearly a decade. Currently, only simple, generic, low-level content metadata is made available from analysis. This metadata isn't always useful because it deals primarily with representing the perceived content rather than its semantics.

In the last few years, we've seen much attention given to the semantic gap problem in automatic content annotation systems. The semantic gap is the gulf between the rich meaning and interpretation that users expect systems to associate with their queries for searching and browsing media and the shallow, low-level features (content descriptions) that the systems actually compute. For more information on this dilemma, see Smeulders et al.,[1] who discuss the problem at length and lament that while "the user seeks semantic similarity, the database can only provide similarity on data processing."
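
To make concrete what such shallow, low-level content descriptions typically look like, the sketch below computes a coarse color histogram for a single frame, one of the classic low-level features discussed in the content-based retrieval literature. The resulting vector captures color distribution but says nothing about who appears in the frame or what is happening, which is precisely the gulf at issue. (This is a minimal illustrative sketch; the frame array, bin count, and function name are hypothetical choices, not drawn from any particular system.)

```python
import numpy as np

def color_histogram(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Return a normalized RGB color histogram for one video frame.

    `frame` is assumed to be an H x W x 3 uint8 array. The result is a
    bins**3 vector: a typical low-level content description that captures
    color statistics but none of the scene's semantics.
    """
    # Quantize each channel into `bins` levels, then count joint occurrences.
    quantized = (frame.astype(np.uint32) * bins) // 256
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()  # normalize to a distribution

# Two frames with similar color statistics look "similar" to this descriptor
# even when their meanings differ completely.
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)  # stand-in frame
print(color_histogram(frame).shape)  # (512,) for bins=8
```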

A new approach for interpreting media

To address this fundamental issue of the data–meaning gulf and to build innovative, high-level, semantics-based content description tools for reliable media location, access, and navigation services, we proposed an approach called computational media aesthetics.[2] We define this approach as the algorithmic study of a variety of image, space, and aural elements employed in media, grounded in the elements' usage patterns in production and in computational analysis of the principles that have emerged for clarifying, intensifying, and interpreting events for the audience. We advocate here that if we're going to create tools for automatically understanding video, it's usually best to interpret the data with its maker's eye.
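
As one simplified illustration of interpreting data with its maker's eye: production grammar uses shot length and motion, among other devices, to manipulate the pace an audience perceives, so a tempo-like descriptor can be computed once shots have been detected. The sketch below is a hypothetical rendering of that idea; the weighting, the standardization, and the assumption that shot durations and per-shot motion magnitudes are already available are our illustrative choices, not a formula prescribed by the cited work.

```python
import numpy as np

def tempo_curve(shot_durations, shot_motion, alpha=1.0, beta=1.0):
    """Map per-shot measurements to a pace/tempo-like curve.

    Production grammar associates faster cutting (shorter shots) and stronger
    motion with higher perceived tempo. Inputs are assumed to come from prior
    shot-boundary detection and motion estimation; the weighting here is an
    illustrative choice, not a canonical definition.
    """
    d = np.asarray(shot_durations, dtype=float)
    m = np.asarray(shot_motion, dtype=float)
    # Standardize both terms; shorter-than-average shots push tempo up,
    # hence the negative sign on the duration term.
    z_dur = (d - d.mean()) / (d.std() + 1e-9)
    z_mot = (m - m.mean()) / (m.std() + 1e-9)
    return -alpha * z_dur + beta * z_mot

# Made-up measurements for six consecutive shots.
durations = [8.0, 7.5, 2.1, 1.8, 1.5, 9.0]  # seconds per shot
motion = [0.2, 0.3, 0.9, 1.1, 1.0, 0.1]     # average motion magnitude per shot
print(np.round(tempo_curve(durations, motion), 2))  # peaks over the fast, high-motion shots
```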

Numerous stakeholders are engaged in this endeavor. They represent the whole multimedia value chain and are involved in content design, authoring, production, archiving, management, distribution, and delivery. Each of these fields brings with it different facets of the semantic issue, thus emphasizing the need for and importance of media semantics research in the broadest sense.

Knowledge-guided semantic analysis of media excites many researchers who are frustrated with the continued focus on low-level features that can't answer high-level queries from real users. They're applying this principled approach, with well-grounded research, to interpreting diverse video domains such as movies, instructional media, and surveillance. For this special issue of IEEE MultiMedia, our goal is to show some of the different aspects of this growing research. We attempt to broadly paint a picture of emerging themes and show the influence of computational media aesthetics. In what follows, we briefly describe the contributions of each of the four articles appearing in the current issue.


In this issue

In "Where Does Computational Media Aesthetics Fit?" Adams provides a comprehensive survey of existing approaches to multimedia content management and examines them according to the tenets of computational media aesthetics. He highlights two types of indices generated by general content processing (structural elements and content entities) and groups popular techniques accordingly. He raises important questions about how to evaluate the effectiveness of different approaches, which data sets to benchmark against, and how to validate semantic inferences. Finally, he positions computational media aesthetics as a viable framework for addressing some of the questions he raises.

The second article, "Pivot Vector Space Approach for Audio–Video Mixing," illustrates computational media aesthetics in practice. Here Kankanhalli et al. automate audio–video mixing of home videos. Their approach exploits aesthetic principles used in mixing music and moving images to guide the decision-making process and to adeptly match audio and video clips. They correlate video shots with audio clips using a set of high-level perceptual audiovisual descriptors that are extracted and then matched, via a pivot space mapping, on the basis of aesthetic heuristics.
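
The sketch below conveys the general flavor of descriptor-space matching: video shots and audio clips are each described in a shared low-dimensional space and paired by nearest-neighbor distance. It is a generic illustration built on hypothetical hand-picked descriptors, not the authors' actual pivot vector space construction or their aesthetic heuristics.

```python
import numpy as np

def match_audio_to_video(video_desc: np.ndarray, audio_desc: np.ndarray) -> list[int]:
    """Assign each video shot the index of the closest audio clip.

    Both arguments are assumed to be (n_items, d) arrays of descriptors that
    already live in a common space (here, a hypothetical [pace, energy] pair
    per item). A real system would add aesthetic constraints; plain Euclidean
    nearest neighbor is used purely for illustration.
    """
    diffs = video_desc[:, None, :] - audio_desc[None, :, :]  # pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)                   # shot-by-clip distance matrix
    return dists.argmin(axis=1).tolist()

# Hypothetical descriptors, scaled to [0, 1].
shots = np.array([[0.9, 0.8],    # fast, bright shot
                  [0.2, 0.3]])   # slow, dark shot
clips = np.array([[0.1, 0.2],    # calm music
                  [0.85, 0.9]])  # energetic music
print(match_audio_to_video(shots, clips))  # [1, 0]
```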

In "Sounding Objects," Rocchesso et al. take on the issue of sound design for interactive multimedia systems, describing the need for designing sounds that richly convey information about the environment while simultaneously providing aesthetically interesting interface elements. The article explains how perception-guided sound design can help decipher ecologically relevant auditory phenomena and deliver faithful environmental information expressively. It argues for the use of cartoon-like physical models of sound (simplified sound descriptions with specific features exaggerated), thus achieving computational efficiency and sharpness in the sounds created.
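
A familiar instance of such a cartoon-like model is an impact rendered as a few exponentially decaying resonant modes: physically motivated, but with most detail stripped away and the pitch and decay exaggerated for clarity. The sketch below, using made-up mode frequencies and decay rates, shows the general shape of this kind of simplification; it is not code from the Sounding Objects work.

```python
import numpy as np

def cartoon_impact(duration=0.5, sample_rate=44100,
                   modes=((440.0, 8.0, 1.0), (1250.0, 15.0, 0.4))):
    """Synthesize a simplified, cartoon-like impact sound.

    Each mode is a (frequency_hz, decay_rate, amplitude) triple; the output is
    a sum of exponentially damped sinusoids. The specific values are
    illustrative, chosen to exaggerate a bright, quickly decaying strike.
    """
    t = np.arange(int(duration * sample_rate)) / sample_rate
    signal = np.zeros_like(t)
    for freq, decay, amp in modes:
        signal += amp * np.exp(-decay * t) * np.sin(2 * np.pi * freq * t)
    return signal / np.max(np.abs(signal))  # normalize to [-1, 1]

samples = cartoon_impact()
print(len(samples))  # 22050 samples of a short, bell-like strike at 44.1 kHz
```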

In "Editing Out Video Editing," Davis makes the case for a new computational model for media production that can enable mass production of video for consumers. At its core, media production is a computational process which, based on input media and parameters, can produce new content-exploiting capabilities. This model transforms media creation from an expensive, craft-based production into a standardized process with reusable parts that users can combine for mass customization. The article describes the research issues involved in such a transformation and provides examples of connectable and reusable media structures.
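
To suggest what reusable, parameterized media parts might look like as data, the sketch below defines a toy annotated-clip structure and a template-driven assembly function. The structures, field names, and roles are invented for illustration and are not taken from the article, which develops the model and its research issues in far more depth.

```python
from dataclasses import dataclass

@dataclass
class MediaPart:
    """A reusable, annotated piece of media (hypothetical structure)."""
    clip_id: str
    role: str        # e.g., "establishing", "action", "reaction"
    duration: float  # seconds

def assemble(parts: list[MediaPart], template: list[str]) -> list[str]:
    """Fill a production template by picking one part per required role.

    A toy stand-in for parametric, template-driven production: the same
    parts can be recombined under different templates.
    """
    by_role: dict[str, list[MediaPart]] = {}
    for part in parts:
        by_role.setdefault(part.role, []).append(part)
    return [by_role[role][0].clip_id for role in template if role in by_role]

library = [MediaPart("beach_wide", "establishing", 4.0),
           MediaPart("kids_running", "action", 3.0),
           MediaPart("parent_smiling", "reaction", 2.0)]
print(assemble(library, ["establishing", "action", "reaction"]))
# ['beach_wide', 'kids_running', 'parent_smiling']
```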

Future challenges

Together, these articles begin to address the fundamental issues spanning the data–meaning gulf by offering a systematic understanding and application of media production methods. However, the efforts toward building computational frameworks to bridge the semantic gap are only beginning. We still need to examine production principles for

❚ manipulation of affect and meaning;

❚ the representation, extraction, and synthesis of expressive elements in movies and video; and

❚ metrics to assess automatic extraction techniques and representational power of expressive elements.

Solutions to these issues will spur the development of novel production practices that will blur the distinction between content annotation and production. Computationally understanding expressive elements will in turn allow new and exciting modes of capture and artistic manipulation of media.

We hope that readers will find this special issue an enjoyable mix and a spotlight on new themes emerging in the field. We're grateful to all the reviewers for carefully poring over the submissions. We also want to thank the IEEE MultiMedia staff for helping us produce this issue. MM

References

1. A. Smeulders et al., "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 12, Dec. 2000, pp. 1349-1380.
2. C. Dorai and S. Venkatesh, "Computational Media Aesthetics: Finding Meaning Beautiful," IEEE MultiMedia, vol. 8, no. 4, Oct.–Dec. 2001, pp. 10-12.

Chitra Dorai is a member of the research staff at the IBM T.J. Watson Research Center, New York, where she leads the e-learning content management and media semantics projects. Her current research focuses on developing technologies for content management and media analysis in various domains that are useful in content-based structuralization, annotation and search, and smart browsing. Dorai received a BTech from the Indian Institute of Technology, Madras, an MS from the Indian Institute of Science, Bangalore, and a PhD from Michigan State University. She's a senior member of the IEEE and a member of the ACM.

Svetha Venkatesh is the codirector for the Center of Excellence in Intelligent Operations Management and a professor at the School of Computing at Curtin University of Technology, Perth, Australia. Her research focuses on large-scale pattern recognition, image understanding, and applications of computer vision to image and video indexing and retrieval.

Readers may contact Chitra Dorai at dorai@watson.ibm.com and Svetha Venkatesh at svetha@cs.curtin.edu.au.

For further information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.
