A Proposal for a Video Modeling for Composing Multimedia Document

Cécile ROISIN - Tien TRAN_THUONG - Lionel VILLARD

Presented by: Tien TRAN THUONG

Project OPERA - INRIA

Grenoble - France

Work Context

Need: composition of semantic video fragments with other basic media elements (image, text, sound, ...)

Theme: Multimedia Document (Madeus)

Authoring system for structured multimedia documents

Basic media: sound, video, text, image, etc.

Document composed by relations

Temporal Synchronization Example

INRIA's positions document: pictures and titles synchronized with video parts.

[Figure: the video presentation and its video frames.]

Logical organization of document

[Figure: logical tree of the Inria document. Labels: Introduction, Video Presentation, Buildings Overview; images Rocq.Picture and RhôneAlpesPicture; texts Rocq.Title and RhôneAlpesTitle. The video presentation is divided into Introduction, Locations of INRIA's units and Conclusion; within Locations of INRIA's units, the fragments are Rocq. appears, Rennes appears, S.A. appears, Lorraine appears and R.A. appears.]

Time line view of the document

[Figure: time line. The raw video is cut into video fragments; the title and picture for Rocquencourt, Rennes, Sophia-Antipolis, Lorraine and Rhône-Alpes are each placed on the time axis against a video fragment. Annotation: the texts grow.]

Spatial Synchronization examples

Hyperlink and tracking: the text follows a character.

The spatial layout of the text follows a video object in the document, that is, the location of the object's region as it moves inside the video region.

[Figure: regions involved. Document Region (Left, Top, Width, Height); Video Region (Left, Top, Width, Height); Text Region (Width, Height); Video Object Region {x(t), y(t)}; relation shown: Right-Top-Align.]
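The placement suggested by this slide can be sketched in a few lines of Python. This is a minimal sketch, not the Madeus implementation. It assumes that the object position {x(t), y(t)} is given relative to the video region, that document coordinates are obtained by adding the video region's Left and Top, and that the alignment puts the text's right and top edges on the object position. The `Region` class, the `follow_object` function and the trajectory samples are hypothetical; the region sizes are taken from the spatial example at the end of this document.

```python
from dataclasses import dataclass


@dataclass
class Region:
    left: float
    top: float
    width: float
    height: float


def follow_object(video: Region, obj_xy: tuple[float, float],
                  text_width: float, text_height: float) -> Region:
    """Place a text region so its right and top edges sit on the object position."""
    x, y = obj_xy
    # Assumption: object coordinates are relative to the video region,
    # so they are shifted by the video region's position in the document.
    doc_x = video.left + x
    doc_y = video.top + y
    return Region(left=doc_x - text_width, top=doc_y,
                  width=text_width, height=text_height)


# Video region as declared in the spatial example (Left=206, Top=140, 352x288).
video = Region(left=206, top=140, width=352, height=288)
# Hypothetical samples of the object trajectory {x(t), y(t)}.
trajectory = [(100.0, 50.0), (110.0, 52.0), (120.0, 55.0)]
text_regions = [follow_object(video, p, 69, 16) for p in trajectory]
```

Recomputing the text region for each trajectory sample is what makes the text appear to follow the object; a real player would do this once per displayed frame.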

Objective and plan of this work

Research and development on video modeling for describing video content relevant to multimedia applications:

Video modeling: video description for multimedia composition.

Multimedia application: our VideoMadeus is an editing and presentation system.

Video Description

Dublin Core: a semantic indexing schema for video content description.

MPEG-7: the future standard, whose tools will make it possible to define semantic schemas for describing audiovisual information.

Video => Analysis -> Description -> Applications

Scheme of audiovisual applications

Our video modeling for composing multimedia document.

Methodology

Specification of a model for the description of video content: multi-level structuration, temporal and spatial relations, interactive actions on the video elements.

Specification in XML

Experimentation in Madeus (VideoMadeus)

Video Content Description

[Figure: multi-level structuration of the video. Structure (structural description), Semantic (semantic description), Thesaurus.]

<!--XML schema for the description of VideoContent-->

<!ELEMENT VideoContent (MetaInfo, MediaInfo, Summary, Structure, Semantic, Thesaurus)>

<!ELEMENT Structure (Sequence+, Relation?)>

<!ELEMENT Semantic (VideoObject*, EventSemantics*)>

<!ELEMENT Thesaurus (ReferenceDictionary*, UserDictionary*)>

[Figure: three description levels over a raw video. Structure: Occ.1 to Occ.4; Semantic: Nabil, Irene; Thesaurus: Researcher.]

Video Structure Description

Motivation: composition relies first on the Structure description level.

The Semantic and Thesaurus levels matter more for retrieval applications, or as support for the structuration level.

The first step is the Structure description.

High Level Description

[Figure: Video Structure hierarchy. Video, Sequences, Scenes, Shots.]

<!-- XML schema for the description of the high-level structure -->
<!ELEMENT Structure (Sequence+, Relation?)>
<!ELEMENT Sequence (Scene+, Relation?)>
<!ELEMENT Scene (Shot+, Event*, Relation?)>
<!ELEMENT Shot (Transition?, Event*, Occurrence*, Background?, Relation?)>
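The mandatory children in these content models (the items marked "+") can be checked with a short Python sketch. It is not a DTD validator: it only tests that every Structure has a Sequence, every Sequence a Scene and every Scene a Shot, and it ignores order and optional items. The `missing_children` function and the sample instance are hypothetical.

```python
import xml.etree.ElementTree as ET

# Mandatory child of each element, read from the "+" items of the content models.
REQUIRED_CHILD = {"Structure": "Sequence", "Sequence": "Scene", "Scene": "Shot"}


def missing_children(elem):
    """Return the IDs (or tags) of elements that lack their mandatory child."""
    problems = []
    for node in elem.iter():
        child = REQUIRED_CHILD.get(node.tag)
        if child is not None and node.find(child) is None:
            problems.append(node.get("ID", node.tag))
    return problems


# Hypothetical instance in which Scene "S2" has no Shot.
doc = ET.fromstring(
    '<Structure ID="St">'
    '<Sequence ID="Seq">'
    '<Scene ID="S1"><Shot ID="Sh1"/></Scene>'
    '<Scene ID="S2"/>'
    '</Sequence>'
    '</Structure>'
)
problems = missing_children(doc)
```

Full validation against the DTD would need a validating parser; this check is only meant to make the "at least one" constraints concrete.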

Shot Content Description

[Figure: shot content. Transition, SpatialLayout, Event, Background, Occurrence, CameraWork; the diagram also carries the labels Reference and Semantic Index.]

<!-- XML Shot Description -->
<!ELEMENT Shot (Transition?, Event*, Occurrence*, Background?, Relation?)>
<!ELEMENT Transition EMPTY>
<!ELEMENT Event EMPTY>
<!ELEMENT Background (Region+)>
<!ELEMENT Occurrence (Region+, Trajectory?, Occurrence*)>
<!ELEMENT SpatialLayout (2DBStringDS+)>
<!ELEMENT CameraWork (CameraMotion?)>

Occurrence Content Description

[Figure: occurrence content. An Occurrence holds a Trajectory, Regions and nested Occurrences; a Region holds Contour, Color, Texture, Centroid and nested Regions.]

<!-- XML Occurrence description -->

<!ELEMENT Occurrence (Trajectory*, Region+, Occurrence*)>

<!ELEMENT Trajectory …>

<!ELEMENT Region (Contour+, Color*, Texture*, Centroid, Region*)>

<!ELEMENT Contour … >

<!ELEMENT Color … >

<!ELEMENT Texture … >

<!ELEMENT Centroid … >

<!ELEMENT Region … >

Model summary

The model focuses on the description of video elements useful for composing a multimedia document (shot, scene, occurrence, event, relation, etc.)

It has an XML specification, which makes it independent and easy to apply to multimedia applications (e.g., our VideoMadeus).

Experimentation of the model in Madeus - VideoMadeus

Madeus Architecture

[Figure: Madeus architecture, built on Java, Xerces and JMF. Editor/presentation tools: execution view, time line view, hierarchical view, video structured view, ... Model management: parsers; logical, temporal and spatial structuration; event management. Labels also shown: Madeus document, Save, Internal Document.]

To extend Madeus to VideoMadeus, video content description is handled in both the composition and the presentation parts.

Madeus Document Model

Structured document organized according to the dimensions: Logical, temporal, spatial.

[Figure: a Madeus document and its parts: Content, Actor, Temporal, Spatial; the Logical dimension is also labeled.]

<Madeus>

<Content> … </Content>

<Actor> . . . </Actor>

<Temporal> . . . </Temporal>

<Spatial> . . . </Spatial>

</Madeus>

Parts of a Madeus document:

Content: describes the content information of the document.

Actor: defines how this basic information in the content part is used in the document (style information, link, etc.).

Temporal: synchronization between document parts.

Spatial: layout specification.

Relations

Temporal relations (Allen extension): meets, starts, equals, during, overlaps, parmin, etc.

Spatial relations: left_align, right_align, center_v, center_h, top_align, bottom_align, etc.

<Temporal> …
  <Relations>
    <start Interval1="a" Interval2="b" />
    <meet Interval1="b" Interval2="d" /> …
  </Relations>
</Temporal>

<Spatial> …
  <Relations>
    <left-align Region1="b" Region2="d" />
  </Relations>
</Spatial>
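The temporal relations named on this slide can be illustrated with the textbook definitions from Allen's interval algebra. The sketch below is an assumption-laden illustration: it uses the standard definitions, which may differ in detail from the Madeus extension, it does not cover `parmin`, and the `relation` function is a hypothetical name. Intervals are (start, end) pairs with start < end.

```python
def relation(a, b):
    """Name the Allen relation that holds from interval a to interval b, if any."""
    (a0, a1), (b0, b1) = a, b
    if a0 == b0 and a1 == b1:
        return "equals"      # same start and same end
    if a1 == b0:
        return "meets"       # a ends exactly where b begins
    if a0 == b0 and a1 < b1:
        return "starts"      # same start, a ends first
    if b0 < a0 and a1 < b1:
        return "during"      # a lies strictly inside b
    if a0 < b0 < a1 < b1:
        return "overlaps"    # a begins first and ends inside b
    return None              # one of the relations not covered here


shot = (0.0, 5.0)
title = (0.0, 5.0)
next_shot = (5.0, 9.0)
```

With these values, `relation(shot, title)` gives "equals" and `relation(shot, next_shot)` gives "meets", the two situations used most often when synchronizing text or images with video fragments.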


Overview of VideoMadeus

[Figure: VideoMadeus overview. Video edition view: Structure view, Semantic view, Thesaurus view; element management (edit, play, search). Execution view: temporal and spatial synchronization management; behavior management (hyperlink, follow-up, erase, display, etc.). Data management: internal structure (model), fed by a parser from the XML description of video content and the index on the video. Arrows between the editing and presentation tools and the model are labeled "requested descriptions", "modified description" and "modify".]

VideoMadeus document

<Madeus>
  <Content> . . .
    <VideoContentDS> . . .
      <Scene ID="MyScene" ... > . . . </Scene>
    </VideoContentDS>
  </Content>
  <Actor> . . .
    <VideoElement ID="SceneVideo" Content="MyScene" . . . > . . . </VideoElement>
  </Actor>
  <Temporal>
    <Interval ID="ScenceInt" Actor="SceneVideo" Duration="..." … />
    <Relations> . . . </Relations>
  </Temporal>
  <Spatial>
    <Region ID="ScenceReg" Actor="SceneVideo" Height="288" Width="352" … />
    <Relations> . . . </Relations>
  </Spatial>
</Madeus>

Editing features

Editing of the video description:

shot detection (automatic or manual)

manual extraction of video objects, events, spatialLayout, etc.

Creation of semantic groups (manual):

group shots in a scene, group scenes in a sequence

detect occurrences of a character (group occurrences in objects)

create other semantic indexes

classify the video elements (thesaurus)

Scenario editing (composing):

set temporal and spatial relations between video elements and other media

set actions on the video elements

Conclusion

Provide support for deeper access into video data in the multimedia authoring system:

temporal/spatial synchronization with the other media elements (image, text, sound, etc.),

actions on the video elements (hyperlink, follow-up, erasing, etc.).

Develop experimentally the video editing view to help the user create and modify descriptions of video data in accordance with our video model.

Perspectives

More experimentation on spatial synchronization.

Extension and experimentation of the semantic parts (Semantic and Thesaurus) -> semantic queries.

Use the MPEG-7 tools to specify our video model.

Develop the video content description editing tool:

integrate and adapt video analysis algorithms to generate the video elements as automatically as possible,

timeline editing view for video structure, etc.

Semantic queries for playing part of a video over the network.

Video content description in Madeus document

<?xml version="1.0"?>
<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600">
  <Content>
    <VideoContent ID="InriaInfoco" … >
      <Structure ID="InriaInfocoStruc" … >
        <Sequence ID="Seq" Start_Time="0" Stop_Time="76.69" … >
          <Scene ID="Scene1" Start_Time="0" Stop_Time="4.91" … > </Scene>
          <Scene ID="Scene2" Start_Time="4.91" Stop_Time="11.09" … >
            <Shot ID="Shoti" Start_Time="4.91" Stop_Time="8.71" … />
            <Shot ID="Shotii" Start_Time="8.71" Stop_Time="11.09" … />
          </Scene>
          <Scene ID="Scene3" Start_Time="11.09" Stop_Time="29.07" … > … </Scene>
          …
        </Sequence>
      </Structure>
      …
    </VideoContent>
    <VideoContent ID="InriaGen" … > … </VideoContent>
    …
  </Content>
</Madeus>
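A structure description of this kind makes it easy to find which sequence, scene and shot are playing at a given instant. The Python sketch below is only an illustration: it works on a simplified copy of the structure above with the elided attributes left out, it assumes that `Start_Time` and `Stop_Time` are in the same time unit as the query, and `path_at` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the structure above; elided attributes are omitted.
STRUCTURE = """
<Structure ID="InriaInfocoStruc">
  <Sequence ID="Seq" Start_Time="0" Stop_Time="76.69">
    <Scene ID="Scene1" Start_Time="0" Stop_Time="4.91"/>
    <Scene ID="Scene2" Start_Time="4.91" Stop_Time="11.09">
      <Shot ID="Shoti" Start_Time="4.91" Stop_Time="8.71"/>
      <Shot ID="Shotii" Start_Time="8.71" Stop_Time="11.09"/>
    </Scene>
  </Sequence>
</Structure>
"""


def path_at(root, t):
    """Return the IDs of the nested elements whose time range contains t."""
    path = []
    node = root
    while True:
        for child in node:
            start = float(child.get("Start_Time"))
            stop = float(child.get("Stop_Time"))
            # Half-open ranges [start, stop) so adjacent shots never both match.
            if start <= t < stop:
                path.append(child.get("ID"))
                node = child
                break
        else:
            # No child contains t (or the node has no children): stop descending.
            return path


root = ET.fromstring(STRUCTURE)
```

For example, `path_at(root, 9.0)` descends through Seq and Scene2 down to Shotii, which is the kind of lookup needed to attach a title or picture to the fragment currently on screen.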

Video element definition

Operations can be defined on an instance of the described video: Hyperlink, Tracking, Erasing, Jumping, etc.

<?xml version="1.0"?>
<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600">
  <Content> . . . </Content>
  <Actor>
    <VideoElement ID="WesternScene" Content="WesternDS.Seq.Scene1" TypeRenderer="LightWeight" . . . >
      <VideoObject ID="VO1" Object="Shot2.ActorOcc1" Actions="Follow-up;Hyperlink;..."
                   HRef="file:///C:/Users/ttran/Multimedia/Madeus/opera.html" />
      . . .
    </VideoElement> . . .
  </Actor>
  <Temporal> . . . </Temporal>
  <Spatial> . . . </Spatial>
</Madeus>

Temporal part of Inria introduction document

<?xml version="1.0"?>

<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600"> ...

<Temporal>

<T-Group ID="Temporal" Duration="pref:20s min:15s max:22s">

<!-- Interval of three hypertexts -->

<Interval ID="ControlOperaInterval" Actor="ControlOperaInfo" Duration="pref:20s min:15s max:22s"/>

<!-- Interval of the video element -->

<Interval ID="MovieInriaURScene4" Actor ="InriaURScene4" Fill="freeze" Duration="pref:20s min:15s max:22s"/>

<!-- Interval of the texts -->

<Interval ID="txtInriaURScene4Shotii" Actor ="TxtInriaURScene4" Duration="pref:20s min:15s max:22s" />

...

<Interval ID="txtInriaURScene4Shotvi" Actor ="TxtInriaURScene4Shotvi" Duration="pref:20s min:15s max:22s" />

<!-- Interval of the images -->

<Interval ID="ImgInriaURScene4Shotii" Actor ="ImgRoc" Duration="pref:20s min:15s max:22s" />

...

<Interval ID="ImgInriaURScene4Shotvi" Actor ="ImgRA" Duration="pref:20s min:15s max:22s"/>

<Relations>

<!-- Equals relations of the texts with the video elements -->

<Equals Interval1="MovieInriaURScene4.Shotii" Interval2="txtInriaURScene4Shotii.SizeAnimation" />

...

<Equals Interval1="MovieInriaURScene4.Shotvii" Interval2="txtInriaURScene4Shotvii.SizeAnimation" />

<!-- Start relations of the images with the video elements -->

<Starts Interval1="MovieInriaURScene4.Shotii" Interval2="ImgInriaURScene4Shotii" />

...

<Starts Interval1="MovieInriaURScene4.Shotvi" Interval2="ImgInriaURScene4Shotvi" />

</Relations>

</T-Group>

</Temporal>

<Spatial> … </Spatial>

</Madeus>
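The `Duration` attribute used throughout this example packs a preferred, a minimum and a maximum value into one string. The Python sketch below shows one way to read it. It is a sketch under assumptions: only the `pref:`, `min:` and `max:` fields and the `s` suffix seen in the example are handled, and the `clamp` helper reflects one plausible use of the bounds (keeping a candidate duration inside the allowed range), not a documented Madeus behavior. Both function names are hypothetical.

```python
def parse_duration(spec):
    """Parse a string such as 'pref:20s min:15s max:22s' into a dict of seconds."""
    bounds = {}
    for field in spec.split():
        key, _, value = field.partition(":")
        # Only the "s" (seconds) suffix seen in the example is handled.
        bounds[key] = float(value.removesuffix("s"))
    return bounds


def clamp(duration, bounds):
    """Keep a candidate duration within the declared min and max (assumed use)."""
    return max(bounds["min"], min(duration, bounds["max"]))


bounds = parse_duration("pref:20s min:15s max:22s")
```

A scheduler could start from `bounds["pref"]` and stretch or shrink each interval within its bounds until relations such as Equals and Starts are satisfied; how Madeus actually resolves these constraints is not described in the slides.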

Spatial part of Spatio-Temporal Relation Demo document

<?xml version="1.0"?>

<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600">

...

<Spatial>

<S-Group ID="TOTOSpatial">

<!-- Video region -->

<Region ID="WesternVideoRegion" Actor="WesternVideo" Left="206" Top="140" Height="288" Width="352" Depth="1"/>

<!-- Three hypertext regions-->

<Region ID="LinkOperaInfoRegion" Actor="ControlOperaInfo" Left="236.0" Top="492.0" Width="210.0" Depth="2.0"/>

<Region ID="LinkAutoCitroenRegion" Actor ="ControlAutoCitroen" Left="36.0" Top="492.0" Width="210.0" Depth="2.0"/>

<Region ID="LinkSTRST" Actor ="ControlSpatioTemp" Left="472.0" Top="492.0" Width="236.0" Depth="2.0"/>

<Region ID="TxtOperaIntroRegion" Actor ="TxtOperaIntro" Left="168" Top="46" Height="42" Width="429" Depth="2.0"/>

<!-- Regions of the text following the video object-->

<Region ID="TxtMotionRegion" Actor ="TxtMotion" Height="16.0" Width="69" Depth="2.0"/>

<Relations>

<Top_align Region1="WesternVideoRegion.Shot1.Obj" Region2="TxtMotionRegion" />

</Relations>

</S-Group >

</Spatial>

</Madeus>