Qualitative Journal 1

Semantic annotation and retrieval of documentary media objects

Dimitris Kanellopoulos

Educational Software Development Laboratory, Department of Mathematics, University of Patras, Rio Patras, Greece

Abstract

Purpose – This paper aims to propose a system for the semantic annotation of audio-visual media objects in the documentary domain. It presents the system’s architecture, a manual annotation tool, an authoring tool and a search engine for documentary experts. The paper discusses the merits of the proposed approach, an evolving semantic network, as the basis for audio-visual content description.

Design/methodology/approach – The author demonstrates how documentary media can be semantically annotated, and how this information can be used for the retrieval of the documentary media objects. Furthermore, the paper outlines the underlying XML schema-based content description structures of the proposed system.

Findings – Currently, a flexible organization of documentary media content description and the related media data is required. Such an organization requires an adaptable construction in the form of a semantic network. The proposed approach provides semantic structures with the capability to change and grow, allowing an ongoing task-specific process of inspection and interpretation of source material. The approach also provides technical memory structures (i.e. information nodes), which represent the size, duration, and technical format of the physical audio-visual material of any media type, such as audio, video and 3D animation.

Originality/value – The proposed approach (architecture) is generic and facilitates the dynamic use of audio-visual material using links, enabling the connection from multi-layered information nodes to data on a temporal, spatial and spatial-temporal level. It enables the semantic connection between information nodes using typed relations, thus structuring the information space on a semantic as well as syntactic level. Since the description of media content holds constant for the associated time interval, the proposed system can handle multiple content descriptions for the same media unit and also handle gaps. The results of this research will be valuable not only for documentary experts but for anyone who needs to manage audiovisual content dynamically and intelligently.

Keywords Documentary, Semantic annotation, Video, Temporal and spatial levels of audiovisual data, Content management, Audiovisual media, Multimedia

Paper type Research paper

The Electronic Library, Vol. 30 No. 5, 2012, pp. 721-747
© Emerald Group Publishing Limited, 0264-0473
DOI 10.1108/02640471211275756
Received October 2011; Revised February 2012; Accepted March 2012
The current issue and full text archive of this journal is available at www.emeraldinsight.com/0264-0473.htm

1. Introduction

In the last few years, the general public’s interest in documentaries has grown enormously. A documentary is the presentation of factual events, often consisting of footage recorded at the time and place of their occurrence and generally accompanied by a narrator (Rosenthal and Corner, 2005). Documentary is a media work category, applied to photography, film and television. It has been developed internationally across a wide range of formats, including the use of dramatization, observational sequences and various combinations of interview material with images that portray the real with different degrees of referentiality and aesthetic crafting. Documentaries often depict various important topics (e.g. animal life, historical events, tourist attractions etc) by mixing photos and videos with commentaries and opinions from experts. All these elements are organized in narrative form. The definition of documentary often undertakes a discursive path. Two factors recur consistently across the various definitions:

(1) reality is captured in some forms of documents; and

(2) the documents are subjected to assemblage to serve a larger context.

For the definition of documentary, we adopt the simplest task definition, that of Vertov: “to capture fragments of reality and combine them meaningfully” (Barnouw, 1993, p. 55). Documentary making is not a science. Documentaries can relate data from science, but they are not scientific reports. They mix science, narrative and images, while the filmmakers’ point of view affects the way these are mixed. For example, a travel documentary is a documentary film (or television program) that describes travel or tourist attractions in a non-commercial way. It is not a scientific report but it is based on knowledge about tourist attractions. A representative travel documentary is Word Travels (IMDb, n.d.), which follows the lives of two young professional travel writers (Robin Esrock and Julia Dimon) as they journey around the world in search of stories to experience, write about, and file for their editors.

According to Nichols (2001), in documentary film and video we can identify six modes of representation that function something like sub-genres of the documentary film genre itself: poetic, expository, participatory, observational, reflexive, and performative. Table I shows the main characteristics and deficiencies of these documentary modes.

Table I. Documentary modes

| Documentary mode | Main characteristics | Deficiencies |
| --- | --- | --- |
| Poetic documentary (1920s) | Reassemble fragments of the world poetically | Lack of specificity, too abstract |
| Expository documentary (1920s) | Directly address issues in the historical world | Overly didactic |
| Observational documentary (1960s) | Eschew commentary and reenactment; observe things as they happen | Lack of history, context |
| Participatory documentary (1960s) | Interview or interact with subjects; use archival film to retrieve history | Excessive faith in witnesses, naive history, too intrusive |
| Reflexive documentary (1980s) | Question documentary form, defamiliarize the other modes | Too abstract, lose sight of actual issues |
| Performative documentary (1980s) | Stress subjective aspects of a classically objective discourse | Loss of emphasis on objectivity may relegate such films to the avant-garde; “excessive” use of style |

Modern lightweight digital video cameras and computer-based editing have greatly aided documentary makers. The first film to take full advantage of this change was Martin Kunert and Eric Manes’ Voices of Iraq, where 150 digital video cameras were sent to Iraq during the war and passed out to Iraqis to record themselves. Multimedia technology allows text, graphics, photos, and audio to be transmitted effectively and rapidly across media platforms. Media organizations must cope with multimedia changes that move exponentially to the next competing delivery device. Nowadays, there is a potentially wide range of applications in the media domain such as search, filtering of information, media understanding (surveillance, intelligent vision, smart cameras etc.) or media conversions (speech to text, picture to speech, visual transcoding etc). Understanding the semantics and meaning of documentaries is directly needed (Choi, 2010). Finding the bits of interest (the important part of a documentary) is becoming an increasingly difficult, frustrating and time-consuming task. Internet users need an intelligent search engine that performs complex media searches and helps them find media chunks based on the semantics of the media itself (Dorai et al., 2002). However, media is so rich in its content variety that it will never be sufficiently described by text or words (Dorai and Venkatesh, 2001). Besides, humans must take the time to annotate the media chunks.

Media information systems for documentaries should incorporate mechanisms that interpret, manipulate and generate visual media as well as audible information. A media infrastructure for documentaries should manipulate self-sufficient components of documentaries, which can be used in any given production. In order to use such an independent media item, it is required to extract the relationship between the signs of the audio-visual information unit and the semantics they represent (Eco, 1997). As a result, media information systems for documentaries such as Terminal_Time (Mateas, 2000) should manage independent media objects and their representations for use in many different productions. Therefore, we need tools that utilize human actions to extract the important syntactic, semantic and semiotic aspects of media content (Brachman and Levesque, 1983) so that descriptions (based on a formal language) can be constructed. The increasing amount of various documentaries and their combinatorial use requires the annotation of media during their production.

Media annotation and querying for documentaries is still a major challenge, as the gap between the documentary features and the existing media tools is wide. In the last two decades, many authoring tools have been proposed for multimedia data (Tien and Cecile, 2003; Ryn et al., 1989). These authoring tools are either application dependent or provide insufficient authoring features. High-level annotation facilities like annotation of objects, time, location, events etc can be provided by existing video annotation tools such as Vannotator (Costa et al., 2002), IBM VideoAnnEx (IBM, n.d.), ELAN (The Language Archive, n.d.), CAVIAR (The University of Edinburgh, n.d.), and ViPER-GT (Sourcegorge.net, n.d.). Rincon and Martinez-Cantos (2007) describe a video annotation tool (called AVISA) for video understanding. They analyze the features that must be present in a video annotation tool for video understanding. However, these features need to be complemented with the finer level annotation methods that video documentaries require. Automatic video generation systems use descriptions (annotations) of the media items in order to make decisions about how to create a video sequence. The structure of annotations is composed of two parts:

(1) The structure of the description (e.g. a documentary film can be described by fields, such as title, director).

(2) The structure of the values used to fill the description (e.g. “The Civil War” can be the value of the field title).

According to Bocconi et al. (2008) there are three different types of description structures:



(1) Keywords-based description structures (or K-annotations), in which each item is associated with a list of words that represent the item’s content. Representative video generation systems that use K-annotations are Lev Manovich’s Soft Cinema (n.d.) and the Korsakow System (Korsakow, n.d.), systems that edit in real-time by selecting media items from a database. ConTour (Murtaugh, 1996) is another indicative system that supports evolving documentaries, i.e. documentaries that could incorporate new media items as soon as they were made.

(2) Properties-based description schemes (or P-annotations), in which items are annotated with property-value pairs. A representative system of this category is SemInfo (Little et al., 2002).

(3) Structures based on relations (or R-annotations). Here, items are annotated with property-value pairs as in P-annotations, except that some of these values are references to other annotations. A representative system is DISC (Geurts et al., 2003), which is a multimedia presentation generation system for the domain of cultural heritage. DISC uses the annotated multimedia repository of the Rijksmuseum (n.d.) to create multimedia presentations.
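The difference between the three description structures can be made concrete with a minimal Python sketch. The class and field names (KAnnotation, PAnnotation, RAnnotation, Ref) and the sample values are assumptions introduced for illustration; they are not taken from Soft Cinema, SemInfo or DISC.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class KAnnotation:
    """K-annotation: a media item tagged with a flat list of keywords."""
    item_id: str
    keywords: List[str]


@dataclass
class PAnnotation:
    """P-annotation: a media item described by property-value pairs."""
    item_id: str
    properties: Dict[str, str]


@dataclass
class Ref:
    """A reference to another annotation, identified by its item id."""
    target_id: str


@dataclass
class RAnnotation:
    """R-annotation: property-value pairs whose values may reference other annotations."""
    item_id: str
    properties: Dict[str, Union[str, Ref]] = field(default_factory=dict)

    def references(self) -> List[str]:
        """Return the ids of all annotations this one points to."""
        return [v.target_id for v in self.properties.values() if isinstance(v, Ref)]


# Hypothetical sample data; only the title value echoes the example in the text.
k = KAnnotation("clip-1", ["civil war", "history", "interview"])
p = PAnnotation("clip-1", {"title": "The Civil War", "type": "documentary film"})
r = RAnnotation("clip-2", {"title": "Follow-up interview", "continues": Ref("clip-1")})
```

Here `r.references()` yields the ids that an R-annotation links to, which is the extra structure that K- and P-annotations lack.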

Benitez et al. (2000) presented description schemes (DSs) for image, video, multimedia, home media, and archive content proposed to the MPEG-7 standard. They used XML to illustrate and exemplify their description schemes by presenting applications that already use the proposed structures. These applications are the visual apprentice, the AMOS-search system, a multimedia broadcast news browser, a storytelling system, and an image meta-search engine, MetaSEEk.

The AUTEUR system (Nack and Parkes, 1997) synchronizes automatic story generation for visual media with the stylistic requirements of narrative and medium related presentation. The AUTEUR system consists of an ontological representation of narrative elements such as actions, events, and emotional and visual codes, based on a semantic net of conceptual structures related via six types of semantic links (e.g. synonym, sub-action, opposition, ambiguity, association, conceptual). A coherent action-reaction dynamic is provided by the introduction of three event phases, i.e. motivation, realization and resolution. The essential categories for the structures are action, character, object, relative position, screen position, geographical space, functional space and time. The textual representation of this ontology describes semantic, temporal and relational features of video in hierarchically organized structures, which overcomes the limitations of keyword-based approaches.

We believe that formal semantics can support the annotation, analysis, retrieval or reasoning about multimedia assets in the documentary industry. The proliferation of documentaries and their applications requires media annotation that bridges the gap between documentary technology and media semantics. In line with this, Dorai and Venkatesh (2001, p. 10) state:

A serious need exists to develop algorithms and technologies that can annotate content with deep semantics and establish semantic connections between media’s form and function, for the first time letting users access indexed media and navigate content in unforeseeable and surprising ways.

The aim of this paper is to propose an agent-oriented programming approach using a framework for describing the inherent semantics of the documentary pieces. In agent-oriented programming, agent-oriented objects typically have just one method, with a single parameter. This parameter is a sort of message that is interpreted by the receiving object, or “agent”, in a way specific to that object or class of objects. Documentary pieces are unique to video documentaries. For this reason, we have created a domain specific representation for the documentary pieces to improve the retrieval accuracy of the documentary video queries.
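The one-method, one-parameter convention described above can be sketched in Python as follows. This is only an illustration of the general idea; the names Message, Agent, AnnotationAgent and QueryAgent, and the "annotate" and "query" performatives, are assumptions and do not come from the proposed system.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Message:
    """The single parameter every agent receives: a performative plus content."""
    performative: str
    content: Dict[str, Any]


class Agent:
    """Base agent exposing exactly one method that takes exactly one parameter."""

    def receive(self, message: Message) -> str:
        raise NotImplementedError


class AnnotationAgent(Agent):
    """Hypothetical agent that stores free-text annotations per documentary piece."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def receive(self, message: Message) -> str:
        if message.performative == "annotate":
            self.store[message.content["piece"]] = message.content["text"]
            return "stored"
        return "not understood"


class QueryAgent(Agent):
    """Hypothetical agent that interprets the same kind of message as a search."""

    def __init__(self, annotations: Dict[str, str]) -> None:
        self.annotations = annotations

    def receive(self, message: Message) -> str:
        if message.performative == "query":
            term = message.content["term"].lower()
            hits = sorted(p for p, t in self.annotations.items() if term in t.lower())
            return ", ".join(hits) if hits else "no match"
        return "not understood"


annotator = AnnotationAgent()
annotator.receive(Message("annotate", {"piece": "DP2,3,4",
                                       "text": "The documentarist is taking a swim"}))
searcher = QueryAgent(annotator.store)
reply = searcher.receive(Message("query", {"term": "swim"}))
```

Each agent interprets the message in its own way: the annotation agent ignores queries, and the query agent ignores annotation requests.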

The remainder of the paper is structured as follows. In Section 2, we discuss issues concerning documentary authoring, while in Section 3 we present the semantics of documentary media. In Section 4 we describe the system architecture. In Section 5, we present our approach for implementing the repository for documentaries, that is, our semantic network based approach for data storage and management, and we illustrate the proposed XML schema-based representational structures. In Section 6, we explain the use of the proposed system through the tools for annotation, semi-automatic authoring and semantic retrieval that we have implemented for the documentary video environments. Finally, in Section 7 we conclude the paper and give directions for further work.

2. Documentary authoring

The conventional understanding of documentary production involves a three-phase workflow:

(1) pre-production;

(2) production; and

(3) post-production.

Figure 1 illustrates a traditional documentary production model. The production model formalizes a cyclic process as opposed to a linear workflow. Pre-production is a phase of research and ideation where visions are selectively audited through sketches mostly in text and graphical forms. Production and post-production are the phases of iterative processes for gathering and assessing media resources. Screening is a main method for assessment through daily production and plays an important role in assessments of daily results and edited sequences, determining further materials needed and methods for acquiring the materials. In particular, a documentary screening is the displaying of a documentary referring to a special showing as part of a documentary’s production and release cycle. The different types of screenings follow here in their order within a documentary’s development:

(1) Test screening. For early edits of a documentary, informal test screenings are shown to small target audiences to judge if a documentary will require editing, reshooting or rewriting.

(2) Focus group screenings are formal test screenings of a documentary with very detailed documentation of audience responses.

(3) Critic screenings are held for national and major market critics well in advance of print and television production-cycle deadlines, and are usually by invitation only.

(4) Public preview screenings may serve as final test screenings used to adjust marketing strategy (radio and TV promotion, etc) or the documentary itself.

(5) A sneak preview is an unannounced documentary screening before formal release, generally with the usual charge for admission.



Actually, media production for documentaries is a complex, resource demanding process that provides a multidimensional network of relationships among the multimedia information.

Documentary authoring is based on the fundamental processes of media or hypervideo production. Aubert et al. (2008) identified these fundamental (or canonical) processes that can be supported in semantically aware media production tools. According to Aubert et al. (2008) these processes are:

• Premeditate (1). Inscription of marks/organization/browsing. The premeditate process takes place in every step of the authoring activity. Input: thoughts of the author. Output: necessary schemas, annotations, queries or views.

• Create (2). This process exploits existing audiovisual documents.

• Package (3). Inscription of marks/organization/browsing. The metadata structure and accompanying queries and views are present, and can be materialized as a package.

• Annotate (4). Inscription of marks. Creation of the annotations, with spatio-temporal links to the media assets. Input: media sources. Output: annotation structure.

• Query (5). Organization. Queries allow selecting appropriate annotations. Input: basic elements. Output: basic elements matching a specified query.

• Construct message (6). Organization. Structuring of the presentation of data. Input: the ideas from the premeditate process, the annotation structure, queries. Output: draft of views.

• Organize (7). Organization. Definition of views to render the selected annotations. Input: basic elements. Output: view definitions.

• Publish (8). Browsing, publishing. Content packaging-publishing, which means generation of documents from the templates, occurs in the browsing phase and also in the publishing phase. Input: basic elements. Output: a package and/or rendered views.

• Distribute (9). Browsing, publishing. The rendition of views is currently done through a standard web browser, or the instrumented video player integrated into the prototype.

Figure 1. Traditional documentary production model

Hardman et al. (2008) identified a small set of canonical processes and specified their inputs and outputs, but deliberately did not specify their inner workings, concentrating rather on the information flow between them. Indicative examples of invoking canonical processes are given in Aubert et al. (2008). Currently, many standards facilitate the exchange between the different media process stages (Pereira et al., 2008), such as MXF (Material Exchange Format), AAF (Advanced Authoring Format), MOS (Media Object Server Protocol), and Dublin Core.

The process of documentary authoring can be arranged in three phases: modeling, annotation and authoring of documentary media.

(1) The modeling phase identifies the various semantics that exist in the documentary media.

(2) The annotation phase provides the human annotator with the various utilities for the free text representation of their perception of the documentary.

(3) The authoring phase is meant for the semiautomatic translation of the annotated media information into XML, validated by the XML Schema validation tools. Using XML technologies, the semantic multimedia content of the documentary can be represented in an interoperable way. It is therefore worthwhile to propose substantial customizations based on XML technologies for the documentaries. Thus, the produced item will be an XML document that represents the annotation of the real-time video documentary.
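As a rough illustration of the authoring phase, the following Python sketch turns one annotated documentary piece into an XML document using only the standard library. The element and attribute names (documentaryPiece, media, actions, action) and the example URI are hypothetical; they are not the schema of the proposed system. Validation against an XML Schema is not shown, because Python's standard library does not provide it and a third-party validator would be needed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def author_piece(piece_id: str, media_uri: str, start: float, end: float,
                 actions: List[Dict[str, str]]) -> str:
    """Translate one annotated documentary piece into an XML string.

    Element names are assumptions made for this sketch, not the paper's schema.
    """
    piece = ET.Element("documentaryPiece", {"id": piece_id})
    # Link the description to the media data and its time interval.
    ET.SubElement(piece, "media", {
        "uri": media_uri,
        "start": str(start),
        "end": str(end),
    })
    actions_el = ET.SubElement(piece, "actions")
    for action in actions:
        action_el = ET.SubElement(actions_el, "action")
        # Missing fields are written as "nil", mirroring an empty target.
        for key in ("agent", "verb", "target", "speed"):
            ET.SubElement(action_el, key).text = action.get(key, "nil")
    return ET.tostring(piece, encoding="unicode")


xml_text = author_piece(
    "DP2.3.4", "http://example.org/media/greece.mpg", 0.0, 12.5,
    [{"agent": "documentarist.rhand", "verb": "touch",
      "target": "gorilla.head", "speed": "low"}],
)
# Parsing the result again is a basic well-formedness check.
parsed = ET.fromstring(xml_text)
```

The round trip through `ET.fromstring` only confirms that the output is well-formed XML; conformance to a schema would be a separate step.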

Documentary information systems must accommodate these three phases, providing a common framework for the storage of the authored documentary and for its presentation interface. Documentary analysis tools should perform the interpretation of documentaries in the context of culture, mode of documentary, mode of speech, action, gestures and emotions. Existing tools and systems provide annotation features for the documentary videos often based on a particular type of documentary (Mateas, 2000). In addition, they offer a limited number of annotation facilities, thus it becomes difficult to derive generic facilities. These tools do not provide semiautomatic authoring, which is an important requirement. It is worth mentioning that Bocconi et al. (2008) describe a model for automatically generating video documentaries. This allows viewers to specify the subject and the point of view of the documentary to be generated. However, the domain of Bocconi et al. is matter-of-opinion documentaries based on interviews.

Agius and Angelides (2005) proposed the COSMOS-7 system that models the objects along with a set of events in which the objects participate, as well as events along with a set of objects and temporal relationships between the objects. This system/model represents the events only at a higher level, such as speak, play and listen, and not at the level of actions, gestures and movements. Harry and Angelides (2001) proposed a semantic content-based model for semantic-level querying that makes full use of the explicit media structure, objects, spatial relationships between objects, events and actions involving objects, temporal relationships between events and actions, and integration between syntactic and semantic information. Ramadoss and Rajkumar (2007) considered a system for the semiautomatic annotation of audio-visual media in the dance domain, while Nack and Putz (2004) presented a framework for the creation, manipulation, and archiving/retrieval of media documents, applied to the domain of news. In the digital games and entertainment industry, Burger (2008) stressed the importance of the use of formal semantics (ontologies) by providing a potential solution based on semantic technologies. AKTive Media (Chakravarthy et al., 2006) is an ontology-based cross-media annotation (images and text) system. It includes an automatic process of annotation by suggesting knowledge to the user in an interactive way while the user is annotating. This system actively works in the background, interacting with web services and querying the central annotational store to look for context specific knowledge. Chakravarthy et al. (2009) present OntoFilm, a core ontology for film production. OntoFilm provides a standardized model, which conceptualizes the domain and workflows used at various stages of the film production process starting from pre-production and planning, shooting on set, right through to editing and post-production.

In this paper, we propose a documentary video framework in order to incorporate media semantics for documentaries. This framework provides the XML authored content of the documentary from the semantic and semiotic annotations supplied by the human annotators. The proposed requirements are:

(1) A layer oriented model depicting the documentary pieces as events, which incorporates the gestures, actions and spatial-temporal relationships of the subjects (e.g. documentarists) and objects in a documentary. Besides documentary pieces, other examples of events are setup, background scene change, and role change by a documentarist.

(2) A semantic network representing the documentary, the individual documentary pieces, besides the cognitive aspects, setting, cultural features and story.

(3) An annotation tool for the documentary experts to manually perform the semantic and semiotic annotations of the documentary media objects like documentary, documentarists etc.

(4) A semantic querying tool for the documentary experts and users/spectators to browse and query the documentary media features for designing new documentary sequences. Some examples of documentary media or video queries are:

• show me all the pieces of natural history documentaries from Africa;

• tell me all documentary pieces where documentarist is in danger; and

• find all historical documentary pieces representing the invasion of Normandy etc.

The query engine should be assisted by proper representations so that the retrievedresult achieves high precision and high recall.


3. The semantics of documentary media

The spatial-temporal delivery of a sequence of the documentary pieces is recorded in a documentary video, in which each documentary piece consists of a set of subject’s actions. Each subject action denotes the action of the characters, such as commentator, speaker, interviewee etc. The action is represented as <subject-verb-object-adverb> using the verb-argument structure (Sarkar and Tripasai, 2002) that exists in linguistics. This section briefly explains some of the characteristics of documentary media.

Definition 3.1 (Documentary)

The documentary numbered $i$ ($DC_{i,n}$) consists of a set of documentary video clips ($C_{i,j}$) performed at a particular setting. That is, $DC_{i,n} = \{C_{i,1}, C_{i,2}, \ldots, C_{i,n}\}$, where $n$ is the total number of documentary clips. In this sense, the documentary $DC_{2,3} = \{C_{2,1}, C_{2,2}, C_{2,3}\}$ denotes the second documentary that consists of three documentary clips ($C_{2,1}, C_{2,2}, C_{2,3}$). For example, if the second documentary $DC_{2,3}$ is a travel documentary and is presenting Holidays in Greece, then the three video clips could be $C_{2,1}$ = Arriving at the airport of Athens, $C_{2,2}$ = Touring Athens and $C_{2,3}$ = Cruise in the Rodos island.

Definition 3.2 (Documentary Clip)

A documentary clip $C_{i,j}$ of the documentary $DC_{i,n}$ consists of a set of documentary pieces (DP) that are performed by the documentarists. That is, $C_{i,j,m} = \{DP_{i,j,1}, DP_{i,j,2}, \ldots, DP_{i,j,m}\}$, where $m$ is the total number of documentary pieces. For example, the documentary clip $C_{2,3,7} = \{DP_{2,3,1}, DP_{2,3,2}, DP_{2,3,3}, DP_{2,3,4}, DP_{2,3,5}, DP_{2,3,6}, DP_{2,3,7}\}$ denotes the third video clip (in our example, Cruise in the Rodos island) of the second documentary. This clip includes seven documentary pieces: $DP_{2,3,1}$, $DP_{2,3,2}$, $DP_{2,3,3}$, $DP_{2,3,4}$, $DP_{2,3,5}$, $DP_{2,3,6}$, $DP_{2,3,7}$.

Definition 3.3 (Documentary Piece)

A documentary piece is the basic semantic unit of a documentary, which has a set of subject’s actions that are performed either sequentially or concurrently by the subjects (documentarists). It encapsulates the mood, genre, culture, and characters, apart from the actions. A documentary piece ($DP_{i,j,k}$) of the video clip $C_{i,j}$ represents a meaningful sequence of subject’s (documentarist) actions ($A$): $DP_{i,j,k} = \{A_1, A_2, \ldots, A_k\}$, where $k$ is the total number of subject’s actions in this documentary piece. For example, the documentary piece $DP_{2,3,4} = \{A_1, A_2, A_3, A_4\}$ denotes the piece of the third video clip (which belongs to the second documentary) that includes the first four sequential actions ($A_1, A_2, A_3, A_4$) performed by the subject (documentarist). In our example, these actions could be:

$A_1$: “The documentarist is visiting the main attractions of the Rodos island in Greece”.

$A_2$: “The documentarist is taking a swim”.

$A_3$: “The documentarist is participating in the local festival”.

$A_4$: “The documentarist is taking a taste of Rodos nightlife”.
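Definitions 3.1 to 3.3 describe a three-level containment hierarchy, which can be sketched in Python as nested dataclasses. The class names and the 1-based `piece` helper are assumptions made for illustration; the example data reuses the Holidays in Greece scenario from the definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentaryPiece:
    """DP_{i,j,k}: a meaningful sequence of subject's actions."""
    actions: List[str] = field(default_factory=list)


@dataclass
class DocumentaryClip:
    """C_{i,j}: a set of documentary pieces."""
    title: str
    pieces: List[DocumentaryPiece] = field(default_factory=list)


@dataclass
class Documentary:
    """DC_{i,n}: a set of n documentary clips."""
    title: str
    clips: List[DocumentaryClip] = field(default_factory=list)

    def piece(self, j: int, k: int) -> DocumentaryPiece:
        """Return piece k of clip j, using the 1-based indices of the definitions."""
        return self.clips[j - 1].pieces[k - 1]


# The second documentary of the running example: three clips, the third with seven pieces.
cruise = DocumentaryClip("Cruise in the Rodos island",
                         [DocumentaryPiece() for _ in range(7)])
cruise.pieces[3].actions = [
    "visiting the main attractions of the Rodos island",
    "taking a swim",
    "participating in the local festival",
    "taking a taste of Rodos nightlife",
]
dc2 = Documentary("Holidays in Greece", [
    DocumentaryClip("Arriving at the airport of Athens"),
    DocumentaryClip("Touring Athens"),
    cruise,
])
```

With this layout, `dc2.piece(3, 4)` corresponds to $DP_{2,3,4}$ and holds the four actions listed above.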



Definition 3.4 (Subject’s (documentarist) action)

The subject/documentarist’s action ($A$) is represented by an action of a character and is defined as a tuple <Agent-Action-Target-Speed>, where agent and target are the body parts of the subject/object, action represents the static poses and gestures in the universe of actions, and speed denotes the speed of the delivery of the action, that is, speed = (low, medium, fast, gradual ascending, gradual descending). If only one agent is involved in an action, then it is called a primitive action; that is, the target agent is empty or nil. For example, <documentarist_i.larm - move - nil - fast> shows that documentarist i moves his left arm fast. If multiple agents are involved in an action or gesture, then the action is known as a composite action. For instance, <documentarist_i.rhand - touch - gorilla_j.head - low> denotes that documentarist i touches the head of gorilla j slowly with his right hand. The content representational structures for these documentary media semantics are discussed in the following sections.
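The action tuple of Definition 3.4 maps naturally onto a small immutable record. The sketch below is one possible Python encoding; the names Speed, Action, is_primitive and render are assumptions, and an empty target is modelled as None.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Speed(Enum):
    """The speed values listed in Definition 3.4."""
    LOW = "low"
    MEDIUM = "medium"
    FAST = "fast"
    GRADUAL_ASCENDING = "gradual ascending"
    GRADUAL_DESCENDING = "gradual descending"


@dataclass(frozen=True)
class Action:
    """The tuple <Agent-Action-Target-Speed>; target is None for a nil target."""
    agent: str
    action: str
    target: Optional[str]
    speed: Speed

    @property
    def is_primitive(self) -> bool:
        """A primitive action involves one agent only, so its target is nil."""
        return self.target is None

    def render(self) -> str:
        """Format the tuple in the angle-bracket notation used in the text."""
        target = self.target if self.target is not None else "nil"
        return f"<{self.agent} - {self.action} - {target} - {self.speed.value}>"


move_arm = Action("documentarist_i.larm", "move", None, Speed.FAST)
touch = Action("documentarist_i.rhand", "touch", "gorilla_j.head", Speed.LOW)
```

In this encoding `move_arm` is a primitive action and `touch` is a composite one, matching the two examples in the definition.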

4. The architecture for authoring and querying documentaries

The proposed system (shown in Figure 2) provides an environment supporting the annotation, authoring, archiving and querying of the documentary media objects. The aim is to apply the framework to all sorts of documentary types such as natural history documentary, travel documentary etc.

Figure 2. The architecture of the proposed system

The environment is based on various modules: annotation, archival, querying, representation structures and the underlying database. The documentary experts access each of these modules to carry out their specific tasks. It is essential for our developments that these modules are easy and simple to use, thereby minimizing the complexity of acquaintance with the system. The annotation module takes the raw digital video as input and allows the human annotator to annotate the different documentary media objects. The generated annotations are described in representational structures such as linked lists and hash tables. The authoring module takes the annotations representing the documentary sequence and translates them into XML instances automatically. The XML Schema instances that are instantiated by the authoring module are stored in the back-end database. The query-processing module allows the documentary experts to pose different free-text documentary video queries to the XML annotation, performs the search using XQuery (after stemming, removing the stop words and converting the tokens into XQuery form) and returns the results of these queries back to the users. Based on the observation, we have identified a set of required data structures and the associated relations and have developed tools for accomplishing the documentary video tasks. Figures 3-5 depict the annotation, query and semantic annotation processes, respectively.
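The query-processing steps named above (stop-word removal, stemming, conversion of tokens into XQuery form) could look roughly like the following Python sketch. Everything specific here is an assumption: the stop-word list is a toy one, the `stem` function is a crude suffix stripper standing in for a real stemmer, and the generated XQuery targets a hypothetical `annotations.xml` with `documentaryPiece` elements rather than the system's actual schema.

```python
import re
from typing import List

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "all", "an", "and", "from", "in", "is", "me",
              "of", "show", "the", "where"}


def stem(token: str) -> str:
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "ies", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(query: str) -> List[str]:
    """Lower-case and tokenize the query, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def to_xquery(terms: List[str]) -> str:
    """Build an XQuery FLWOR expression requiring every term to occur in a piece."""
    lines = ['for $p in doc("annotations.xml")//documentaryPiece']
    if terms:
        conditions = " and ".join(
            f'contains(lower-case(string($p)), "{t}")' for t in terms
        )
        lines.append(f"where {conditions}")
    lines.append("return $p")
    return "\n".join(lines)


terms = preprocess("Show me all the pieces of natural history documentaries from Africa")
xquery = to_xquery(terms)
```

Because the stems are matched with `contains`, a stem such as "documentar" still matches annotations containing "documentary" or "documentaries".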

Figure 3. The annotation process in a UML class diagram

Figure 4. The query process in a UML class diagram

Figure 5. The semantic annotation process in a UML class diagram



5. The model of semantics for documentary media

According to Nack and Putz (2004), annotation is a dynamic and iterative process, and thus annotations will be incomplete and will change over time. Consequently, it is imperative to provide semantic representation schemes with the capability to change and grow. In addition, the relation between the different types of structures should be flexible and dynamic. To achieve this, media annotation should not result in a monolithic document; rather, it should be organized as a semantic network of content description documents (Ramadoss and Rajkumar, 2007).

5.1 Layer oriented event description

In the design of the proposed system, we adopted the strata-oriented approach (Aguierre Smith and Davenport, 1992) and the setting (Parkes, 1989) for describing events such as documentary pieces. Strata-oriented content modeling is an important knowledge representation method and is well suited to modeling the events of a documentary presentation. In our framework, each video documentary is technically described using the size, duration and technical format of the material, such as mpg, avi, etc. Therefore, each documentary can be represented partially using technical details that belong to the layer of technical details. In addition, each video documentary is conceptually annotated using high-level semantic descriptors, and thus it can be complementarily represented using semantic descriptors that belong to the layer of semantic annotations. The proposed representation structure includes many layers (one layer for each description). The connection between the different layers and the data to be described (e.g. the actual audio, video, or audio-visual stream) is accomplished by a triple <media identifier, start time, end time>. For instance, a documentarist may perform a number of actions in the same time span. Start and end times can be used to identify the temporal relation between the actions. Documentary pieces can be represented in this way, thereby enabling semantic retrieval. Figure 6 depicts the layered representation of a shot of 100 frames, representing three actions. Suppose a query “find a documentary piece of a natural history documentary from Africa, where the documentarist is speaking and touching a gorilla, while the gorilla is eating a banana”. This query can be answered easily by isolating the common parts of the shot, as depicted in the shaded portion of Figure 6. The temporal relationship between the actions can be identified using the start and end points with which those actions are associated. In this way, complex structured behavior concepts can be represented and hence the audio-visual material retrieved on this basis.
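As an illustration only, the isolation of the common part of a shot can be sketched in Python. The class name Stratum, the function common_segment and the frame numbers are assumptions made for this sketch; they are not taken from Figure 6 or from the implemented system. Each stratum carries the triple of media identifier, start time and end time, and the shared segment is the intersection of all intervals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stratum:
    """One annotation layer attached to a media unit (hypothetical structure)."""
    media_id: str
    label: str
    start: int  # first frame covered, inclusive
    end: int    # last frame covered, inclusive


def common_segment(strata: List[Stratum]) -> Optional[Tuple[int, int]]:
    """Return the (start, end) interval shared by all strata, or None."""
    if not strata:
        return None
    # All layers must describe the same media unit.
    if len({s.media_id for s in strata}) != 1:
        return None
    start = max(s.start for s in strata)
    end = min(s.end for s in strata)
    return (start, end) if start <= end else None


# Three actions annotated on a shot of 100 frames (frame numbers are invented).
shot = [
    Stratum("shot-1", "documentarist speaking", 1, 60),
    Stratum("shot-1", "documentarist touching gorilla", 30, 80),
    Stratum("shot-1", "gorilla eating banana", 40, 100),
]
segment = common_segment(shot)
print(segment)
```

With these invented values the three layers overlap between frames 40 and 60, which plays the role of the shaded portion of the figure.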

Figure 6. Layered annotation of actions and isolated segment of a shot for a query

EL30,5


5.2 Nodes of the proposed framework

Nodes are used to build linked data structures concerning documentaries. A node can be thought of as a logical placeholder for some data: it is a memory block that contains a data unit and possibly references to other nodes, which in turn contain data and possibly references to yet more nodes. Links between nodes are implemented by pointers or references. By forming chains of interlinked nodes, very large and complex data structures concerning documentaries can be formed. As a consequence, semantic structures of documentary pieces can be implemented easily. In our framework, we distinguish two types of nodes, i.e. data nodes (D-nodes) and conceptual annotation nodes (CA-nodes):

(1) A D-node represents physical audio-visual material of any media type, such as text, audio, video, 3D animation, 2D image, 3D image, and graphic. The size, duration, and technical format of the material are not restricted, nor are there any limitations with respect to the content, i.e. number of persons, actions and objects. A data node might contain a complete documentary film or merely a scene. The node is identified via a URI.

(2) A CA-node provides high-level descriptions of a video documentary. A high-level description is one that describes “top-level” goals and overall features of a documentary; it is more abstract and is typically concerned with the video documentary as a whole. For example, the events that occur in a documentary (as well as the location, date and time of an event) can be described by high-level descriptors. The mood of a documentary (i.e. subjective content such as happy, sorrowful, romantic, etc.) and many other features can also be described by high-level descriptors. Such descriptors are usually difficult to obtain using automatic extraction methods. Nodes of this type are usually created manually.
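The two node types can be pictured with a small Python sketch. The class names DNode and CANode, their fields and the example URI are hypothetical and serve only to show nodes holding data plus references to other nodes; the sketch does not reproduce the data structures of the implemented system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DNode:
    """Data node: physical audio-visual material, identified by a URI."""
    uri: str
    media_type: str                      # e.g. "video", "audio", "text"
    technical_format: Optional[str] = None
    duration_s: Optional[float] = None
    size_bytes: Optional[int] = None


@dataclass
class CANode:
    """Conceptual annotation node: high-level descriptors plus references."""
    schema: str                          # e.g. "Event", "Emotion"
    descriptors: Dict[str, str] = field(default_factory=dict)
    references: List[object] = field(default_factory=list)

    def link_to(self, other: object) -> object:
        self.references.append(other)
        return other


def reachable(start: object) -> List[object]:
    """Collect every node reachable from start by following references."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        order.append(node)
        # D-nodes carry no references in this sketch, hence the default.
        stack.extend(getattr(node, "references", []))
    return order


film = DNode("http://example.org/media/gorillas.mpg", "video", "mpg")
event = CANode("Event", {"location": "Africa"})
mood = CANode("Emotion", {"mood": "happy"})
event.link_to(mood)
event.link_to(film)
chain = reachable(event)
```

Following the references from the event node reaches both the mood annotation and the media data, which is the chaining behaviour described above.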

Each node is best understood as an instantiated schema. The number of available node schemata is restricted, so that indexing and classification can be performed in a controlled way, whereas the descriptional information space might consist of just one node or up to n nodes. The obvious choice for representing CA-nodes, each of which describes audiovisual content, would have been the DDL of MPEG-7 or the schemata suggested by MPEG-7. The MPEG-7 standard (Martinez et al., 2002; Salembier and Smith, 2002) concentrates on multimedia content description and constitutes the greatest effort for multimedia description. It is based on a set of XML Schemas that define 1,182 elements, 417 attributes and 377 complex types. It is divided into four main components:

(1) the Description Definition Language (DDL, the basic building blocks for the MPEG-7 metadata language);

(2) audio (the descriptive elements for audio);

(3) visual (those for video); and

(4) the Multimedia Description Schemes (MDS, the descriptors for capturing the semantic aspects of multimedia contents, e.g. places, documentarists, objects, events, etc).



We chose not to use MPEG-7 because its main weakness is that formal semantics are not included in the definition of the descriptors in a way that can be easily implemented in a system (Nack et al., 2005). Instead, we chose to use XML Schema as the representational scheme for the documentary media due to its simplicity and maturity. The use of XML technologies implies that a great part of the semantics remains implicit. Therefore, each time an application is developed, semantics must be extracted from the standard and re-implemented.

For our documentary media environment, we have developed a set of 14 schemata that describe the denotative and technical content of the documentary video. The schemata are designed in such a way that they can be semi-automatically instantiated or authored. They are listed in Table II.

The XML schema representation of the 14 schemata can be found in Subsection 5.4. With these schemata one can browse (e.g. by documentary, actions, documentarists, documentary piece, culture, objects, etc.) and perform semantic search (e.g. “show me all natural history documentary pieces”).
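To give a feel for what an instantiated description document might look like, the following Python sketch builds and re-reads a small XML fragment with the standard xml.etree.ElementTree module. The element and attribute names (DocumentaryPiece, Link, LifeSpan, Emotion, id, href, start, end) are assumptions chosen to echo the schema names of Table II; they are not the actual XML Schema definitions of the system.

```python
import xml.etree.ElementTree as ET


def build_piece(piece_id, media_uri, start, end, mood):
    """Build a hypothetical description document for one documentary piece."""
    piece = ET.Element("DocumentaryPiece", {"id": piece_id})
    # The link ties the description to the media source on a temporal level.
    link = ET.SubElement(piece, "Link", {"href": media_uri})
    ET.SubElement(link, "LifeSpan", {"start": str(start), "end": str(end)})
    ET.SubElement(piece, "Emotion").text = mood
    return piece


piece = build_piece(
    "piece-1", "http://example.org/media/gorillas.mpg", 40, 60, "happy"
)
xml_text = ET.tostring(piece, encoding="unicode")

# Reading the document back shows how a search tool could access the fields.
parsed = ET.fromstring(xml_text)
life_span = parsed.find("Link/LifeSpan")
print(parsed.get("id"), life_span.get("start"), life_span.get("end"))
```

Because each description is a separate small document, new descriptions can be added without rewriting existing ones, in line with the non-monolithic organization argued for in Section 5.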

5.3 Relationships

In our framework, all metadata about the actual audio and video streams of the documentary are organized in the form of a semantic network. A semantic network is a network that represents semantic relations among concepts. It is often used as a form of knowledge representation and is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent the relations between them. Figure 7 depicts a possible semantic net of a documentary annotation.

From this figure, we can also understand the two ways of annotating documentary data, based on the requirements of the documentary expert:

(1) either as part of a documentary; or

(2) as a single documentary clip representing one documentary.

Schema | Description
Documentary | High-level organizational scheme of a documentary presentation containing all documentary clips
Documentary Clip | High-level scheme of a documentary consisting of all annotations and relations to other clips
Documentary Piece | An event representing a meaningful collection of the actions of documentarists
Subject/Documentarist’s Action | The basic pose, gesture or action done by the documentarist
Event | The event that occurs in a documentary clip
Person | Person participating in a documentary, e.g. documentarists, interviewees, narrators, speakers
Emotion | Subjective content like mood or feeling etc
Setting | The location, date and time of an event
LifeSpan | Duration with start and end times
Relation | Between documentary media elements
STRelation | Spatial-temporal relationships of the documentarist
Link | Connections between the media source and the document schemes
Resource | Relation to any URI address
Basic Info | Basic information about the documentary such as language, video type, recording information, archive information, access rights etc

Table II. Schemata for documentaries

Annotation networks of a documentary, clip, documentary piece and media source can be interconnected by links and relations. There are two types of connections among the nodes:

(1) Link type: to connect media source and description nodes (represented using an arrow).

(2) Relation type: to connect different annotation nodes (represented using a line).
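The distinction between the two connection types can be sketched as a small graph structure in Python. The class SemanticNet, its method names and the node identifiers are hypothetical; the sketch assumes only what is stated above, namely that links attach media sources to description nodes and that typed relations connect annotation nodes to one another.

```python
from collections import defaultdict


class SemanticNet:
    """Toy semantic network with media links and typed relations."""

    def __init__(self):
        self.links = []                     # (node_id, media_uri, start, end)
        self.relations = defaultdict(list)  # source -> [(type, target)]

    def add_link(self, node_id, media_uri, start, end):
        """Link type: media source to description node, with a life span."""
        self.links.append((node_id, media_uri, start, end))

    def add_relation(self, source, relation_type, target):
        """Relation type: typed edge between two annotation nodes."""
        self.relations[source].append((relation_type, target))

    def media_for(self, node_id):
        return [(uri, s, e) for n, uri, s, e in self.links if n == node_id]

    def related(self, source, relation_type):
        edges = self.relations.get(source, [])
        return [target for rtype, target in edges if rtype == relation_type]


net = SemanticNet()
net.add_link("clip-1", "http://example.org/media/gorillas.mpg", 0, 100)
net.add_relation("clip-1", "part of", "documentary-1")
net.add_relation("event-1", "precedes", "event-2")
print(net.media_for("clip-1"), net.related("event-1", "precedes"))
```

Keeping the two kinds of connection in separate structures mirrors the separation between the media data and the annotation network that describes it.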

A Link connects the media source (audio and video files) to the data node along with its life spans (i.e. on a temporal level). The XML schema representation of the Link type is shown below.

5.4 Description schemes for documentaries in XML Schema

The XML schema representation of the relation types is presented hereafter (Figures 8-10).

In our environment, DocumentaryDS and DocumentaryClipDS hold link types, enabling connections to the documentary video and audio sources. Note that these two description schemes serve as an entry point to the semantic network. Our front-end annotation tool performs the semi-automatic instantiation of links. Relation types perform the connection among the description schemes that are represented as CA-nodes. Between two nodes, there may exist up to m relationships. We define the following relations for our documentary media environment:

. For events: follows, precedes.

. For character, setting, object: part of, association, before, equal, meets, overlaps, during, starts, finishes.

. For documentary pieces: we propose two temporal semantic relationships, follows and precedes.
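The temporal relation names in the list above (before, equal, meets, overlaps, during, starts, finishes) coincide with names commonly used in Allen's interval algebra. Assuming that reading, which the text does not state explicitly, a relation between two life spans could be derived from their start and end times as in the following Python sketch. The function name interval_relation is hypothetical, and inverse relations that are not named in the list are reported as None.

```python
def interval_relation(a, b):
    """Name the temporal relation of interval a to interval b.

    Both intervals are (start, end) pairs with start < end. Only the seven
    relations named in the list above are recognised; anything else (for
    example a lying after b) yields None.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start == b_start and a_end == b_end:
        return "equal"
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if a_end == b_end and a_start > b_start:
        return "finishes"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    return None


print(interval_relation((0, 10), (5, 15)))
```

Deriving relations from life spans in this way is one possible route to the semi-automatic instantiation of relationships; the actual procedure used by the tool is not detailed in the text.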

These temporal semantic relationships help to infer the type of documentary during query processing. In our environment, relationships are instantiated semi-automatically by the tool. We now introduce our documentary annotation and querying tool, which instantiates the description schemes that have been designed based on the concept of a semantic net. We then introduce our search engine, which allows users to browse and query the documentary features for composing new documentaries and for learning purposes.

Figure 7. A semantic net of a documentary annotation

Figure 8.


Figure 9.



Figure 10.


6. Tools for documentaries

6.1 Annotation and authoring tool

Documentary experts can annotate the documentary or clip by watching the running video and using the annotation tool. The video player provides all the standard facilities such as play, start, stop, pause and replay. We used the Cinepak codec for the conversion of the running video (WinAmp media file) to AVI format. The annotation tool allows documentary experts to annotate the documentary pieces using free text and controlled vocabulary, independently of the storage organization of the generated annotations. We developed the annotation tool using J2SE 1.5 and Java Media Framework 2.0. Figure 11 depicts the GUI of the initial screen for determining the documentary information.

It is noteworthy that a documentary or a documentary clip constitutes an entry point to the annotation. The annotation process begins with the documentary expert describing the metadata about the documentary. The basic metadata (descriptions) that are common to all documentaries are shown in Table III.

Once the annotation of the documentary has been completed, the documentary expert can describe the individual documentary presentations that are part of that documentary. We have identified a set of features that correspond to a documentary clip, as depicted in Table IV. The metadata describing a documentary piece that can be annotated through the annotation tool are listed in Table V. The metadata about the person, object and basic media information are shown in Tables VI-VIII, respectively.

Figure 11. A snapshot of the annotation tool for determining the documentary information

The semi-automated editing suite (Figure 12) provides the documentary expert with an instant overview of the available material and its essential relations, represented through the spatial order of its presentation. The documentary expert can mark the relevant video clips or pieces by pointing at them. The order of pointing indicates the sequential appearance of the clips or pieces. The editing suite, based on a simple planner, performs an automated composition of the documentary clip. At the present stage of development, our editing suite uses the meta-information obtained from the annotation tool to support the video editing process.
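The planner itself is not described in detail, so the following Python sketch only illustrates the simplest reading of the composition step: the selected pieces are laid end to end in the order in which they were pointed at. The function compose_clip, the piece identifiers and the life spans are assumptions made for this illustration.

```python
def compose_clip(pointing_order, life_spans):
    """Lay the selected pieces end to end in the order they were pointed at.

    life_spans maps a piece id to its (start, end) position in the source.
    Returns (timeline, total_length); each timeline entry is
    (piece_id, source_start, source_end, output_start).
    """
    timeline = []
    cursor = 0
    for piece_id in pointing_order:
        start, end = life_spans[piece_id]
        timeline.append((piece_id, start, end, cursor))
        cursor += end - start
    return timeline, cursor


# Invented life spans, as they might be obtained from the annotation tool.
life_spans = {"piece-a": (0, 40), "piece-b": (40, 100), "piece-c": (100, 130)}
timeline, total = compose_clip(["piece-c", "piece-a"], life_spans)
print(timeline, total)
```

Here the expert pointed at piece-c first and piece-a second, so piece-c opens the composed clip even though it appears later in the source material.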

Documentary piece | Description
MoodID | Subjective content: happy, sorrow, romantic, etc
Culture | Indian, western, etc documentary pieces
Genre | Such as poetic, expository, observational, participatory, reflexive, performative
Mode of documentary speech | Commentary speech, presenter speech, interview speech in shot, overhead interchange, dramatic dialogue
Object | Background and foreground objects used in a documentary piece
Action | Spatial-temporal actions, gestures, poses of the characters
  Agent | Body parts involved
  Related action | Associated action
  Target | Target body part of the opponent if any
  Speed | Slow, medium, fast, gradual ascending, gradual descending
Life span | Duration of the documentary piece

Table V. Metadata of a documentary piece

Documentary clip | Description
Character name, role, gender, life span | Role played by the documentarist such as commentarist, presenter etc. The life span of the character is necessary because several roles by the same documentarist in a documentary clip are possible
Context | Identifies whether it is a historical, travel or documentary without words etc
Documentary genre | Such as poetic, expository, observational, participatory, reflexive, performative
Language | Language used by the documentarists in the audio. Several languages may be used in the same documentary
Life span | Duration of the documentary clip

Table IV. Metadata of a documentary clip

Documentary | Description
Date and time | Date and time of video recording of the documentary
Media locator | Links to video and audio streams
Media format | Format of the video such as mpg, avi etc
Media type | Type of the media like video, audio, text etc
Title | Name of the documentary
Origin | Originating country of the documentary
Duration | Life span, i.e. length of the documentary in minutes

Table III. Metadata of a documentary


6.2 Search engine

The search engine helps documentary experts to design a new documentary and lets users view the documentary pieces themselves. In particular, users can search in many dimensions for specific documentary pieces belonging to a video clip. For example, users can search for all documentary pieces denoting specific objects such as the sun, the moon, etc. In addition, users can search for certain subject’s actions incorporated into documentary pieces. Furthermore, users can search for documentary pieces in which the subject (e.g. the documentarist) has a certain mood (happy, angry, etc.). In another case, users can search for documentary pieces in which the speed of delivery of the subject’s actions is slow, medium, fast, gradually ascending or gradually descending. Users can also search for documentary pieces in which a specific song is played. Finally, users can use this search engine as a browsing tool with several built-in categories of documentary information and as a query tool to pose free-text documentary queries. The retrieval tool offers several browsing features for the users. These are:

Documentary: To browse all documentary clips along with the video of their documentary pieces. Output is rendered in the output window.

Documentary clip: To view all documentary pieces of a clip.

Documentary piece: To view all subject/documentarist actions of a particular clip.

Objects: Displays all documentary pieces denoting sun, moon, etc.

Tempo: Users can browse the documentary pieces according to the speed categories.

Mood: To browse according to the feeling like happy, romantic, etc.

Culture: Indian, western, etc.

Documentarist: All documentary pieces associated with a documentarist.

Genre: Poetic, expository, observational, participatory, reflexive, performative, etc.

Speech mode: Commentary speech, presenter speech, interview speech in shot, overhead interchange, dramatic dialogue.

Actions: View by specific actions.

Song: View documentary pieces in which a song is played.

Person | Description
Name | Name of the person
Function | Commentarist, speaker, interviewee
E-mail | Contact details

Table VI. Metadata of persons

Object | Description
Name | Name of the background or foreground object
Type | Background or foreground object
Number of | Number of objects
Shape | Shape of the object (in text)
Color | Color of the object (in text)
Texture | Pattern

Table VII. Metadata of objects

Basic information | Description
Recording speed | Speed of recording
Camera details | Description of the camera used while recording the documentary
Access rights | Access information

Table VIII. Metadata of media
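The browsing categories of this subsection amount to filtering annotated documentary pieces on one or more metadata fields. A minimal Python sketch of such a filter is given below; the Piece class, its field names (loosely following Table V) and the sample data are hypothetical and do not reproduce the search engine's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Piece:
    """Annotated documentary piece (hypothetical subset of the metadata)."""
    piece_id: str
    mood: str
    culture: str
    genre: str
    speed: str
    objects: List[str] = field(default_factory=list)


def browse(pieces, **criteria):
    """Return the pieces that satisfy every criterion.

    The special key 'object' tests membership in the piece's object list;
    every other key is compared with the attribute of the same name.
    """
    result = []
    for piece in pieces:
        matches = True
        for key, wanted in criteria.items():
            if key == "object":
                matches = wanted in piece.objects
            else:
                matches = getattr(piece, key) == wanted
            if not matches:
                break
        if matches:
            result.append(piece)
    return result


pieces = [
    Piece("p1", "happy", "western", "expository", "slow", ["sun"]),
    Piece("p2", "romantic", "Indian", "poetic", "fast", ["moon"]),
    Piece("p3", "happy", "Indian", "poetic", "fast", ["sun", "moon"]),
]
happy_with_moon = browse(pieces, mood="happy", object="moon")
print([p.piece_id for p in happy_with_moon])
```

Combining criteria in one call corresponds to searching "in many dimensions", for example by mood and by object at the same time.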

Figure 12. The semi-automated editing suite for documentary clips

Documentary users/spectators can submit their documentary queries in the query window using keywords as free text. For example, consider the query Q: “Show me all pieces of natural history documentaries”. Our framework uses a semantic information retrieval mechanism similar to that presented in Chen et al. (2010). The use of semantic information, especially that derived from spatio-temporal analysis, is of great value in multimedia annotation, archiving and retrieval. Ren et al. (2009) survey the use of spatio-temporal semantic knowledge for information-based video retrieval and draw important conclusions on where future research is headed. Liu and Chen (2009) present a novel framework for content-based video retrieval. They use an unsupervised learning method to automatically discover and locate the object of interest in a video clip. This unsupervised learning algorithm alleviates the need for training a large number of object recognizers. Regional image characteristics are extracted from the object of interest to form a set of descriptors for each video. A novel ensemble-based matching algorithm compares the similarity between two videos based on the set of descriptors each video contains. Videos containing large pose, size, and lighting variations are used to validate their approach. Finally, Chen et al. (2010) developed a semantic-enabled information retrieval mechanism that handles the processing, recognition, extraction, extension and matching of content semantics in order to:

. analyze and determine the semantic features of content, develop a semantic pattern that represents those features, and structuralize and materialize semantic features;

. analyze the user’s query and extend its implied semantics through semantic extension so as to identify more semantic features for matching; and

. generate contents with approximate semantics by matching against the extended query to provide correct contents to the querist.

This mechanism can alleviate the traditional limitations of keyword search and enables the user to perform a semantic-based query and search for the required information, thereby improving the reuse and sharing of information.
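The general idea of extending a query before matching can be conveyed with a deliberately crude Python sketch. It is not the mechanism of Chen et al. (2010): the table of related terms, the overlap-count scoring and all names below are assumptions introduced purely for illustration.

```python
def expand_query(terms, related_terms):
    """Extend the query terms with related terms from a lookup table."""
    expanded = set(terms)
    for term in terms:
        expanded.update(related_terms.get(term, ()))
    return expanded


def rank(expanded, annotations):
    """Rank annotated items by how many expanded terms they contain.

    Items without any matching term are dropped; ties are broken by id.
    """
    scores = {
        item_id: len(expanded & set(words))
        for item_id, words in annotations.items()
    }
    hits = [item_id for item_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical related-term table and annotations.
related_terms = {"natural history": ["wildlife", "nature"]}
annotations = {
    "piece-1": ["wildlife", "gorilla"],
    "piece-2": ["interview", "city"],
    "piece-3": ["nature", "natural history"],
}
expanded = expand_query(["natural history"], related_terms)
ranking = rank(expanded, annotations)
print(ranking)
```

In this toy setting, piece-1 is retrieved even though its annotation never mentions the literal query term, which is the kind of gain over plain keyword search described above.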

7. Future work: an ontology for video documentaries

Multimedia ontologies (especially MPEG-7-based ontologies) have the potential to increase the interoperability of applications producing and consuming multimedia annotations. Hunter (2003) provided the first attempt to model parts of MPEG-7 in RDFS, later integrated with the ABC model. Tsinaraki et al. (2004) start from the core of this ontology and extend it to cover the full Multimedia Description Scheme (MDS) part of MPEG-7 in an OWL DL ontology. Isaac and Troncy (2004) proposed a core audio-visual ontology inspired by several terminologies such as MPEG-7, TV Anytime or ProgramGuideML, while Garcia and Celma (2005) produced the first complete MPEG-7 ontology, automatically generated using a generic mapping from XSD to OWL. All these methods perform a one-to-one translation of MPEG-7 types into OWL concepts and properties. This translation, however, does not guarantee that the intended semantics of MPEG-7 is fully captured and formalized. On the contrary, the syntactic interoperability and conceptual ambiguity problems remain.

A video documentary ontology can increase the interoperability of documentary authoring tools. It can represent documentary concepts and their relationships, which will help to retrieve the required results. From another perspective, the application of multimedia reasoning techniques on top of semantic multimedia annotations can make a multimedia authoring application more intelligent (Van Ossenbruggen et al., 2004). Currently, we are engaged in representing the complete media semantics of a documentary using the Web Ontology Language (OWL) (Smith et al., 2004), with the aim of describing the video documentary ontology. In the near future, we will examine how we can raise the quality of documentary annotation and improve the usability of content-based video search and retrieval systems. Figure 13 depicts a portion of our ontology for documentaries.



8. Conclusions

Tools for automatically understanding video are required in the documentary domain. Semantics-based annotations will break the traditional linear manner of accessing and browsing documentaries and will support vignette-oriented access to audio and video. In this paper, we have presented a framework for the modeling, annotation, and retrieval of media documents, applied to the domain of documentary. Using a basic set of 14 semantic description schemes, we demonstrated how a documentary video can be annotated and how this information can be used for retrieval to support documentary design. We emphasized tools and technologies for the manual annotation of the documentary media objects. Flexible annotation facilities are required to facilitate documentary creativity by way of semantic networks because the annotation process is dynamic and annotations can grow over time. We have proposed a flexible organization of media content description and the related media data. This organization requires an adaptable construction in the form of a semantic network. The proposed concept features three significant functions, which make it suitable as a platform for supporting the needs of documentary production:

(1) It provides semantic and technical memory structures (i.e. information nodes) with the capability to change and grow, allowing an ongoing task-specific process of inspection and interpretation of source material.

(2) Our approach facilitates the dynamic use of audio-visual material using links, enabling the connection from multi-layered information nodes to data on a temporal, spatial and spatial-temporal level. Moreover, since the description of media content holds constant for the associated time interval, we are now in the position to handle multiple content descriptions for the same media unit and also to handle gaps.

(3) It enables the semantic connection between information nodes using typed relations, thus structuring the information space on a semantic as well as syntactic level.

We believe that our approach (audio-visual strategy) can be used for improving training and education in documentary communication, and to this end we have also indicated future efforts to create an ontology for video documentaries with enhanced annotation.

Figure 13. A part of the domain ontology for documentaries


References

Agius, H. and Angelides, M. (2005), “COSMOS-7: video-oriented MPEG-7 scheme for modeling and filtering of semantic content”, The Computer Journal, Vol. 48 No. 5, pp. 545-62.

Aguierre Smith, T.G. and Davenport, G. (1992), “The stratification system: a design environment for random access video”, Proceedings of the ACM Workshop on Networking and Operating System Support for Digital Audio and Video, San Diego, CA, Lecture Notes in Computer Science, Vol. 712, Springer, Berlin, pp. 250-61.

Aubert, O., Champin, P.-A., Prie, Y. and Richard, B. (2008), “Canonical processes in active reading and hypervideo production”, Multimedia Systems Journal, Vol. 14 No. 6, pp. 427-33.

Barnouw, E. (1993), Documentary: A History of the Non-fiction Film, Oxford University Press, Oxford.

Benitez, A., Paek, S., Chang, S.-F., Puri, A., Huang, Q., Smith, J., Li, C.-S., Bergman, L. and Judice, C. (2000), “Object-based multimedia content description schemes and applications for MPEG-7”, Signal Processing: Image Communication, Vol. 16 Nos 1/2, pp. 235-69.

Bocconi, S., Nack, F. and Hardman, L. (2008), “Automatic generation of matter-of-opinion video documentaries”, Journal of Web Semantics, Vol. 6, pp. 139-50.

Brachman, R.J. and Levesque, H.J. (1983), Readings in Knowledge Representation, Morgan Kaufmann, San Mateo, CA.

Burger, T. (2008), “The need for formalizing media semantics in the games and entertainment industry”, Journal of Universal Computer Science, Vol. 14 No. 10, pp. 1775-91.

Chakravarthy, A., Ciravegna, F. and Lanfranchi, V. (2006), “Cross-media document annotation and enrichment”, Proceedings of the 1st Semantic Authoring and Annotation Workshop (SAAW 2006), Athens, GA, November 6.

Chakravarthy, A., Beales, R., Matskanis, N. and Yang, X. (2009), “OntoFilm: a core ontology for film production”, in Chua, T.-S., Kompatsiaris, Y., Merialdo, B., Haas, W., Thallinger, G. and Bailer, W. (Eds), Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), Lecture Notes in Computer Science, Vol. 5887, Springer, Berlin, pp. 177-81.

Chen, M.-Y., Chu, H.-C. and Chen, Y.M. (2010), “Developing a semantic-enable information retrieval mechanism”, Expert Systems with Applications, Vol. 37 No. 1, pp. 322-40.

Choi, I. (2010), “From tradition to emerging practice: a hybrid computational production model for interactive documentary”, Entertainment Computing, Vol. 1 Nos 3/4, pp. 105-17.

Costa, M., Correia, N. and Guimaraes, N. (2002), “Annotations as multiple perspectives of video content”, Proceedings of the ACM Conference on Multimedia, San Francisco, CA, 2-7 November, pp. 283-6.

Dorai, C. and Venkatesh, S. (2001), “Computational media aesthetics: finding meaning beautiful”, IEEE Multimedia, Vol. 8 No. 4, pp. 10-12.

Dorai, C., Mauthe, A., Nack, F., Rutledge, L., Sikora, T. and Zettl, H. (2002), “Media semantics: who needs it and why?”, Proceedings of Multimedia ’02, December 1-6, Juan-les-Pins, pp. 580-3.

Eco, U. (1997), A Theory of Semiotics, Macmillan, London.

Garcia, R. and Celma, O. (2005), “Semantic integration and retrieval of multimedia metadata”, Proceedings of the Fifth International Workshop on Knowledge Markup and Semantic Annotation, 7 November, Galway.

Geurts, J., Bocconi, S., van Ossenbruggen, J. and Hardman, L. (2003), “Towards ontology-driven discourse: from semantic graphs to multimedia presentations”, in Fensel, D., Sycara, K. and Mylopoulos, J. (Eds), Proceedings of the Second International Semantic Web Conference (ISWC 2003), Sanibel Island, FL, 20-23 October, Springer, Berlin.



Hardman, L., Obrenovic, Z., Nack, F., Kerherve, B. and Piersol, K. (2008), “Canonical processes of semantically annotated media production”, Multimedia Systems, Vol. 14, pp. 327-40.

Harry, W.A. and Angelides, M.C. (2001), “Modeling content for semantic level querying of multimedia”, Multimedia Tools and Applications, Vol. 15 No. 1, pp. 5-37.

Hunter, J. (2003), “Enhancing the semantic interoperability of multimedia through a core ontology”, IEEE Transactions: Circuits and Systems for Video Technology, Vol. 13 No. 1, pp. 49-58.

IBM (n.d.), “alphaWorks community, VideoAnnEx annotation tool”, available at: www.alphaworks.ibm.com/tech/videoannex

IMDb (n.d.), “World Travels”, available at: www.imdb.com/title/tt1392723/

Isaac, A. and Troncy, R. (2004), “Designing and using an audio-visual description core ontology”, paper presented at the Workshop on Core Ontologies in Ontology Engineering, 5-8 October, Whittlebury.

Korsakow (n.d.), “Korsakow system”, available at: www.korsakow.com/ksy/index.html

Little, S., Geurts, J. and Hunter, J. (2002), “Dynamic generation of intelligent multimedia presentations through semantic inferencing”, Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries, Pontifical Gregorian University, Rome, Springer, Berlin.

Liu, D. and Chen, T. (2009), “Video retrieval based on object discovery”, Computer Vision and Image Understanding, Vol. 113 No. 3, pp. 397-404.

Martinez, J., Koenen, R. and Pereira, F. (2002), “MPEG-7 – The generic multimedia content description standard Part 1”, IEEE MultiMedia Magazine, Vol. 9 No. 2, pp. 78-87.

Mateas, M. (2000), “Generation of ideologically-biased historical documentaries”, Proceedings of the 17th National Conference on Artificial Intelligence and Innovative Applications of Artificial Intelligence Conference (AAAI-00), Austin, TX, pp. 36-42.

Murtaugh, M. (1996), “The automatist storytelling system”, PhD thesis, Massachusetts Institute of Technology, available at: http://alumni.media.mit.edu/~murtaugh/thesis/

Nack, F. and Parkes, A. (1997), “Towards the automated editing of theme-oriented video sequences”, Applied Artificial Intelligence, Vol. 11 No. 4, pp. 331-66.

Nack, F. and Putz, W. (2004), “Saying what it means: semi-automated (news) media annotation”, Multimedia Tools and Applications, Vol. 22 No. 3, pp. 263-302.

Nack, F., Ossenbruggen, J.v. and Hardman, L. (2005), “That obscure object of desire: multimedia metadata on the web (Part II)”, IEEE Multimedia, Vol. 12 No. 1, pp. 54-63.

Nichols, B. (2001), “What types of documentary are there?”, Introduction to Documentary, Indiana University Press, Bloomington, IN, pp. 99-138.

Parkes, A.P. (1989), “Settings and the settings structure: the description and automated propagation of networks for perusing videodisk image states”, in Belkin, N.J. and Rijsbergen, C.J. (Eds), Proceedings of SIG Information Retrieval ’89, Cambridge, MA, ACM Press, New York, NY, pp. 229-38.

Pereira, F., Vetro, A. and Sikora, T. (2008), “Multimedia retrieval and delivery: essential metadata challenges and standards”, Proceedings of the IEEE, Vol. 96 No. 4, pp. 721-44.

Ramadoss, B. and Rajkumar, K. (2007), “Semi-automated annotation and retrieval of dance media objects”, Cybernetics and Systems, Vol. 38 No. 4, pp. 349-79.

Ren, W., Singh, S., Singh, M. and Zhu, Y.S. (2009), “State-of-the-art on spatio-temporal information-based video retrieval”, Pattern Recognition, Vol. 42 No. 2, pp. 267-82.

Rijksmuseum (n.d.), available at: www.rijksmuseum.nl


Rincon, M. and Martinez-Cantos, J. (2007), “An annotation tool for video understanding”, in Moreno-Díaz, R., Pichler, F. and Quesada Arencibia, A. (Eds), Proceedings of the 11th International Conference on Computer Aided Systems Theory and Technology (EUROCAST 2007), Las Palmas, 12-16 February, Lecture Notes in Computer Science, Vol. 4739, Springer, Berlin, pp. 701-8.

Rosenthal, A. and Corner, J. (2005), New Challenges for Documentary, 2nd ed., Manchester University Press, Manchester.

Ryn, J., Sohn, Y. and Kin, M. (1989), “MPEG-7 metadata authoring tool”, Proceedings of the ACM Conference on Multimedia, pp. 267-70.

Salembier, P. and Smith, J. (2002), “Overview of MPEG-7 multimedia description schemes and schema tools”, in Manjunath, B.S., Salembier, P. and Sikora, T. (Eds), Introduction to MPEG-7: Multimedia Content Description Interface, Wiley, Chichester.

Sarkar, A. and Tripasai, W. (2002), “Learning verb argument structure from minimally annotated corpora”, Proceedings of the 19th International Conference on Computational Linguistics, August 24-September 1, Vol. 1, Taipei, pp. 1-7.

Smith, M.K., Welty, C. and McGuinness, D.L. (2004), “OWL web ontology language, W3C recommendation”, available at: www.w3c.org/TR/owl-guide/

Soft Cinema (n.d.), available at: www.softcinema.net

Sourcegorge.net (n.d.), “VIPER-GT annotation tool”, available at: http://viper-toolkit.sourcegorge.net

The Language Archive (n.d.), “ELAN annotation tool”, available at: www.lat-mpi.eu/tools/elan

Tien, T.T. and Cecile, R. (2003), “Multimedia modeling using MPEG-7 for authoring multimedia integration”, Proceedings of the ACM Conference on Multimedia Information Retrieval, pp. 171-8.

Tsinaraki, C., Polydoros, P. and Christodoulakis, S. (2004), “Integration of OWL ontologies in MPEG-7 and TVAnytime compliant semantic indexing”, Proceedings of the 16th International Conference on Advanced Information Systems Engineering (CAiSE 2004), Riga, June 7-11, pp. 143-61.

(The) University of Edinburgh (n.d.), “CAVIAR: Context Aware Vision using Image-based Active Recognition”, available at: http://homepages.inf.ed.ac.uk/rbf/CAVIAR/

Van Ossenbruggen, J., Nack, F. and Hardman, L. (2004), “That obscure object of desire: multimedia metadata on the Web (Part I)”, IEEE Multimedia, Vol. 11 No. 4, pp. 38-48.

About the author

Dimitris Kanellopoulos holds a PhD in multimedia communications from the Department of Electrical and Computer Engineering of the University of Patras, Greece. He is a member of the Educational Software Development Laboratory in the Department of Mathematics at the University of Patras. His research interests include multimedia communications, knowledge representation, intelligent systems, and Web engineering. He has authored many papers in international journals and conferences in these areas. He serves as a member of the editorial boards of ten academic journals. Dimitris Kanellopoulos can be contacted at: [email protected]

