MPEG: A Video Compression Standard for Multimedia Applications
Project Report of ISO/IEC 23000 MPEG-A Multimedia Application Format
Table of Contents
1. INTRODUCTION
2. MPEG-A MULTIMEDIA APPLICATION FORMAT
   2.1. Introduction
   2.2. Creating MAF
   2.3. Overview of Technologies used in MAF
3. WORK ITEMS
   3.1. MPEG-A Part 3 2nd Edition: Protected Music Player Application Format
   3.2. MPEG-A Part 4: Musical Slide Show Application Format
   3.3. MPEG-A Part 4 2nd Edition: Protected Musical Slide Show Application Format
   3.4. MPEG-A Part 10: Video Surveillance Application Format
4. IMPLEMENTATION
   4.1. MPEG-A Part 3 2nd Edition: Protected Music Player Application Format
   4.2. MPEG-A Part 4: Musical Slide Show Application Format
   4.3. MPEG-A Part 4 2nd Edition: Protected Musical Slide Show Application Format
   4.4. MPEG-A Part 10: Video Surveillance Application Format
5. ACHIEVEMENTS
   5.1. MPEG-A Part 3 2nd Edition: Protected Music Player Application Format
   5.2. MPEG-A Part 4: Musical Slide Show Application Format
   5.3. MPEG-A Part 4 2nd Edition: Protected Musical Slide Show Application Format
   5.4. MPEG-A Part 10: Video Surveillance Application Format
6. CONCLUSIONS
7. REFERENCES
1. Introduction
This document is the project report of the development of the MPEG-A Multimedia Application Format (MAF) standardization project. The MPEG-A standardization project includes the development of the following:
- MPEG-A Part 3 2nd Edition (ISO/IEC 23000-3 2nd Edition) Protected Music Player Application Format,
- MPEG-A Part 4 (ISO/IEC 23000-4) Musical Slide Show Application Format,
- MPEG-A Part 4 2nd Edition (ISO/IEC 23000-4 2nd Edition) Protected Musical Slide Show Application Format,
- MPEG-A Part 10 (ISO/IEC 23000-10) Video Surveillance Application Format.
The document is structured as follows: Section 2 gives an overview of MAF; Section 3 describes the specification of each MAF in the form of its file format, system architecture, and metadata schema; the implementations and reference software are presented in Section 4. Section 5 describes the achievements of the technical implementation of each MAF: MPEG input contribution documents, MPEG output documents, and research papers. Finally, Section 6 concludes this report.
2. MPEG-A Multimedia Application Format
2.1. Introduction
MPEG-A (ISO/IEC 23000) is a new standard that has been developed by the Moving Picture Experts
Group by selecting existing technologies from all published MPEG standards, as well as technologies from
other standards bodies such as JPEG and 3GPP, and combining them into so-called "Multimedia
Application Formats" or MAFs.
Selecting readily tested and verified tools available from the MPEG standards reduces the need for time-consuming research, development and testing of new technologies. If MPEG cannot provide the required
technology, then additional technologies originating from other standards bodies can be included by
reference in order to facilitate the creation of a MAF. In other words, a MAF is created by cutting
horizontally through all MPEG standards, selecting existing parts and profiles appropriate for the new
application.
The aforementioned concept of MAF is illustrated in Figure 1. Boxes on the right represent MPEG
standards while boxes on the left represent other bodies' standards. The parts and profiles of the
technologies are represented by bold square boxes, and their combinations are used by the particular MAFs
as shown in the center box. An example shown in Figure 1 is the Protected Musical Slide Show MAF, which
uses parts and profiles from MPEG-1, MPEG-4, MPEG-7 and MPEG-21 from MPEG, and JPEG and 3GPP
Timed Text from other standards.
[Figure 1 shows the MAFs defined in MPEG-A (ISO/IEC 23000) — Music Player, Protected Music Player, Musical Slide Show, Protected Musical Slide Show, Video Surveillance, and other MAFs — drawing on the MPEG technologies MPEG-1 (ISO/IEC 11172), MPEG-4 (ISO/IEC 14496), MPEG-7 (ISO/IEC 15938) and MPEG-21 (ISO/IEC 21000), and on other standards: JPEG (ISO/IEC 10918) and 3GPP Timed Text (TS 26.245).]
Figure 1 – Conceptual overview of MPEG-A
2.2. Creating MAF
The work items for creating a MAF, as shown in Figure 2, take into account the specific conceptual nature
of MAF. The work of creating a MAF starts with a submission that gives evidence that there exists a need
to develop and standardize the MAF, by providing documentation that describes an anticipated application
scenario that benefits from the existence of an appropriately designed standard for a MAF. The submission
shall include an assessment of the positioning of the proposal in the technology landscape, pointing to
solutions that may already exist, whether they are standards-based or proprietary. MPEG then requests
documentation of industry support to successfully complete the work and to deploy the candidate format,
as an important aspect in deciding whether the MAF shall be continued or not. If it is decided to continue,
the MAFs under consideration for standardization are regularly updated and published on the MPEG web
site to gather input, comments and feedback as well as contributions from interested parties. If MPEG
determines that there is enough demand to warrant a MAF, then the pertaining application scenario
description is used to derive requirements based on which a new MAF standard is drafted.
From this point, the parties developing the MAF should express their commitment by creating and releasing
reference software as the initial implementation of the MAF, and support is documented in the form of
registered MPEG input contributions. Based on such documentation, the proponents, with the help of
knowledgeable MPEG experts, select the technologies that the MAF shall employ to arrive at a detailed
technical specification for the MAF. The chosen technologies are expected to have advanced in their
respective standardization processes to the Final Draft International Standard (FDIS) stage or beyond by
the time the MAF itself reaches the FDIS stage.
To make the MAF under development known, the proponents are requested to produce relevant marketing
material, including at least one white paper explaining the benefits of the new MAF. In the final step,
participating experts check the validity of the specification, which marks the completion of the work for a
new MAF. Cross-checking of multimedia standards is done by exchanging bit streams among different
parties to check whether the bits created by one party according to the new specification can be decoded
and executed successfully by another party using a decoder implementation which has been built according
to the specification.
[Figure 2 lists the work items for Multimedia Application Formats, grouped into documentation and technical work:
- Application Scenario Description
- Value Proposition and Technology Landscape
- Requirements
- Documented Industry Support
- Detailed Technical Specification
- Reference Software Implementation
- Marketing Material – White Paper
- Cross-Checking
leading to the MAF as Part X of MPEG-A.]
Figure 2 – Work Items of the MPEG-A process for creating MAFs
2.3. Overview of Technologies used in MAF
This subsection gives an overview of the MPEG technologies used in the MAFs related to this project report.
More detailed specifications of each technology can be found in the corresponding specification
documents. Table 1 shows the MPEG technologies used in the corresponding MAFs.
Table 1 — MPEG Technologies
(Columns: Protected Music Player AF; Musical Slide Show AF; Protected Musical Slide Show AF; Video Surveillance AF. Rows: the technologies each MAF uses.)
- MPEG-1 Layer III
- MP3 on MP4
- ISO Base File Format
- MPEG-4 File Format
- MPEG-4 AVC
- MPEG-4 LASeR
- MPEG-7 Visual
- MPEG-7 MDS
- MPEG-21 DID
- MPEG-21 IPMP Components
- MPEG-21 REL
- MPEG-21 File Format
2.3.1. MPEG-1 Layer 3
MPEG-1 Layer 3 (MP3) from the MPEG-1 Audio (ISO/IEC 11172-3:1993) specification is one of the
most widely deployed MPEG audio standards ever due to its good compression performance and
simplicity of implementation. Most compressed music archives use MP3 encoding.
Layer 3 specifies a self-synchronizing transport, making it amenable to both storage in a computer
file and transmission over a channel without byte framing. In the context of transmission channels,
Layer 3 can operate over a constant-rate isochronous link, and has constant-rate headers. However,
Layer 3 is an instantaneously-variable-rate coder, which adapts to the constant-rate channel by
using a "bit buffer" and "back pointers". Each header signals the start of another block of audio,
but the compressed data for that block may lie in a prior segment of the bit stream, pointed to by
the back pointer (in Figure 3, the curved arrows pointing to main_data_begin).
Figure 3 – Layer 3 bit stream organization
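The constant-rate headers mentioned above carry enough information to step from frame to frame. As a rough sketch, assuming MPEG-1 Layer III only (the bitrate and sample-rate tables are those of ISO/IEC 11172-3; the function name is illustrative):

```python
# Hypothetical sketch: parse a 4-byte MPEG-1 Layer III frame header and
# compute the frame length in bytes. Handles only MPEG-1, Layer III.

MPEG1_L3_BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112,
                     128, 160, 192, 224, 256, 320]   # kbit/s, by index
MPEG1_SAMPLE_RATES = [44100, 48000, 32000]           # Hz, by index

def layer3_frame_length(header: bytes) -> int:
    """Return the byte length of a Layer III frame from its header."""
    # 11-bit sync word: 0xFF followed by the top 3 bits of the next byte
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("not an MPEG audio sync word")
    bitrate_index = header[2] >> 4
    sample_rate_index = (header[2] >> 2) & 0x03
    padding = (header[2] >> 1) & 0x01
    bitrate = MPEG1_L3_BITRATES[bitrate_index] * 1000
    sample_rate = MPEG1_SAMPLE_RATES[sample_rate_index]
    # Layer III: 144 * bitrate / sample_rate, plus one byte if padded
    return 144 * bitrate // sample_rate + padding
```

For example, a 128 kbit/s frame at 44.1 kHz (header bytes FF FB 90 00) is 417 bytes long; the variable-rate main_data addressed by main_data_begin lives inside these fixed-size slots.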
2.3.2. MPEG-4 “MPEG-1/2 Audio in MPEG-4”
MPEG-4 Audio (ISO/IEC 14496-3:2005) Subpart 9 "MPEG-1/2 Audio in MPEG-4" specifies a
method for segmenting and formatting Layer 3 bit streams into MPEG-4 Access Units, and is
therefore often referred to as "MP3onMP4". This consists primarily of re-arranging the
compressed data associated with a given header such that it follows the header. This typically
results in new segments that are no longer of constant length but that are perfectly in accordance
with the definition of MPEG-4 Access Units. An example is shown in Figure 4.
Figure 4 – Converting an MPEG-1/2 Layer 3 bit stream into mp3_channel_elements
2.3.3. ISO Base Media File Format
The ISO Base Media File Format is designed to contain timed media information for a
presentation in a flexible, extensible format that facilitates interchange, management, editing, and
presentation of the media. The ISO Base Media File Format is a base format for media file formats.
In particular, the MPEG-4 file format derives from this base file format.
The file structure is object oriented as shown in Figure 5, which means that a file can be
decomposed into constituent objects very simply, and the structure of the objects inferred directly
from their type. The file format is designed to be independent of any particular network protocol
while enabling efficient support for them in general.
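The object-oriented decomposition described above can be sketched as a minimal box walker; the function name is illustrative, and only the 32-bit size, 64-bit "largesize" and size-zero (to end of file) conventions of ISO/IEC 14496-12 are handled:

```python
# Minimal sketch of iterating over top-level boxes in an ISO Base Media
# file. Each box starts with a 32-bit big-endian size and a 4-character
# type; size == 1 means a 64-bit size follows, size == 0 means "to EOF".
import struct

def iter_boxes(data: bytes):
    """Yield (type, payload) for each top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:                      # 64-bit largesize follows type
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:                    # box extends to end of file
            size = len(data) - offset
        if size < header:                  # malformed box; stop walking
            break
        yield box_type.decode("ascii"), data[offset + header: offset + size]
        offset += size
```

Because the structure of each object is inferred directly from its type, the same loop applies recursively to container boxes such as 'moov' by feeding a box's payload back into the walker.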
[Figure 5 shows an ISO file containing a movie box with 'trak' boxes for video and audio plus other boxes, and a media data box holding interleaved, time-ordered video and audio frames.]
Figure 5 – Example of a simple ISO file used for interchange, containing two streams
2.3.4. MPEG-4 Part 14: File Format
ISO/IEC 14496-12:2005 and ISO/IEC 14496-14:2003 together specify the MPEG-4 File Format.
This supports storage of compressed audio data in tracks. It also provides support for metadata in
the form of 'meta' boxes at the file, movie and track level. This allows support for static (untimed)
metadata. The type of content inside the file format is specified by the file type box
'ftyp'. Figure 6 schematically illustrates the location of the boxes.
[Figure 6 shows an MP4 file with an 'ftyp' box, a file-level 'meta' box, a 'moov' (movie) box containing a movie-level 'meta' box and 'trak' boxes each with their own 'meta' box, and an 'mdat' (media data) box.]
Figure 6 – ISO/MP4 file schema
2.3.5. MPEG-4 Part 10: Advanced Video Coding
ISO/IEC 14496-10 Advanced Video Coding (MPEG-4 AVC) provides higher compression of
moving pictures for various applications such as videoconferencing, digital storage media,
television broadcasting, internet streaming, and communication. It is also designed to enable the
use of the coded video representation in a flexible manner for a wide variety of network
environments.
A conceptual distinction has been made in the specification between a video coding layer (VCL)
and a network abstraction layer (NAL). The VCL comprises the signal processing part of the
codec, such as transform, quantization, etc. The output of the VCL is referred to as slices, containing an
integer number of macroblocks and the information of the slice header. A macroblock is a
16x16 block of luma and the corresponding chroma samples. The NAL provides formatting and
encapsulation of the VCL output in a way compliant with the chosen transmission channel or storage
medium. Packet-oriented as well as bitstream systems are supported by adding appropriate
header information.
Higher-layer meta information necessary to appropriately handle the data and to operate the
decoder is conveyed in parameter sets. The specification distinguishes between two types of
parameter sets: the sequence parameter set (SPS) and the picture parameter set (PPS). An active sequence
parameter set remains unchanged throughout a coded video sequence and an active picture
parameter set remains unchanged within a coded picture. Higher-layer meta information is
supposed to be transmitted reliably and in advance. Figure 7 shows the layer abstraction of
MPEG-4 AVC.
A main property of the specification is the decoupling of the decoding process from time (e.g.
sampling time, transmission time, presentation time, etc.). The design requires only 16-bit
arithmetic for processing on the encoding and decoding side. Furthermore, it is the first MPEG video
standard achieving exact quality of decoded video because of the definition of an exact-match
inverse transform.
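The NAL packaging described above can be illustrated with a toy splitter that cuts an Annex B byte stream on start codes and reads each unit's type from the low 5 bits of its first byte. This is an assumption-laden sketch (it ignores emulation-prevention bytes, among other details), not a conforming parser:

```python
# Toy sketch: split an AVC Annex B stream on 0x000001 / 0x00000001 start
# codes. nal_unit_type values: 7 = SPS, 8 = PPS, 5 = IDR slice (VCL).
def split_nal_units(stream: bytes):
    """Return a list of (nal_unit_type, nal_bytes) from an Annex B stream."""
    units = []
    pos = stream.find(b"\x00\x00\x01")
    while pos >= 0:
        start = pos + 3
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt < 0 else nxt
        # a 4-byte start code leaves a stray zero before the next 0x000001
        if nxt > 0 and stream[nxt - 1] == 0:
            end = nxt - 1
        nal = stream[start:end]
        if nal:
            units.append((nal[0] & 0x1F, nal))
        pos = nxt
    return units
```

Since the SPS/PPS meta information is supposed to arrive reliably and in advance, a decoder front end would typically route types 7 and 8 to parameter-set storage before handing VCL units to the slice decoder.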
[Figure 7 shows the MPEG-4 AVC layer structure: a sequence of NAL units (SPS, PPS, SEI, then VCL units), each consisting of a NAL header and an RBSP (Raw Byte Sequence Payload); the VCL data for each frame carries slices, each with a slice header and slice data made up of macroblocks (mb_type, mb_pred, coded_residual, with skip_run between coded macroblocks).]
Figure 7 – MPEG-4 AVC layer structure
2.3.6. MPEG-4 Part 20: LASeR
The MPEG-4 Lightweight Application Scene Representation (LASeR) (ISO/IEC 14496-20:2006)
is a scene description format that specifies various aspects of 2D scene representation and updates
of scenes as a part of rich media content. A scene description is composed of graphics, animation,
text, and spatial and temporal layout.
A scene description specifies the following areas of a presentation:
- Spatial layout of the visual elements
- Temporal organization of the media elements (e.g. synchronization)
- Interactivity (e.g. mouse clicks, key inputs)
- Change of scenes (e.g. animation effects)
LASeR is designed to be suitable for lightweight embedded devices such as mobile phones.
2.3.7. MPEG-7 Part 3: Visual
The Multimedia Content Description Interface, MPEG-7 (ISO/IEC 15938), specifies a series of
interfaces from the system to the application level to allow disparate systems to interchange information
about multimedia content. It describes the architecture for systems, a language for extensions and
specific applications, description tools in the audio and visual domains, as well as tools that are
not specific to the audio-visual domains.
ISO/IEC 15938-3 (MPEG-7 Part 3) Visual specifies tools for the description of visual content,
including still images, video and 3D models. These tools are defined by their syntax in DDL and
binary representations and by the semantics associated with the syntactic elements. They enable
description of the features of the visual material, such as color, texture, shape and motion,
as well as localization of the described objects in the image or video sequence. The structure of
MPEG-7 Visual is shown in Figure 8.
[Figure 8 groups the Visual Descriptor tools as follows:
- Basic Structures: Descriptor Containers (GridLayout, TimeSeries, MultipleView); Basic Supporting Tools (TemporalInterpolation, Spatial2DCoordinateSystem)
- Color: Color Feature Descriptors (DominantColor, ScalableColor, ColorLayout, ColorStructure, GofGopColor); Color Supporting Tools (ColorSpace, ColorQuantization)
- Texture: HomogeneousTexture, TextureBrowsing, EdgeHistogram
- Shape: RegionShape, ContourShape, Shape3D
- Motion: CameraMotion, MotionTrajectory, ParametricMotion, MotionActivity
- Localization: RegionLocator, SpatioTemporalLocator
- Other: FaceRecognition]
Figure 8 – Overview of Visual Descriptor tools
2.3.8. MPEG-7 Part 5: Multimedia Description Scheme
ISO/IEC 15938-5 (MPEG-7 Part 5) Multimedia Description Scheme specifies a metadata system
for describing multimedia content. It consists of: the basic elements that form the building blocks for
the higher-level description tools; the content description tools that describe the features of the
multimedia content and the immutable metadata related to it; the tools for navigation and access,
which describe the browsing, summarization and access of content; and classification schemes,
which organize the terms used by the description tools.
[Figure 9 groups the MDS description tools into: Basic Elements (Schema Tools, Basic Datatypes, Links & Media Localization, Basic Tools); Content Metadata and Content Description (Creation and Production, Media, Usage, Structure, Semantics); Content Organization (Collections, Models); Navigation and Access (Summaries, Views, Variations); and User Interaction (User Preferences, User History).]
Figure 9 – Overview of MDS description tools
2.3.9. MPEG-21 Part 2: Digital Item Declaration
The Multimedia Framework, MPEG-21 (ISO/IEC 21000), provides content creators, producers,
distributors and service providers with a normative open framework for multimedia delivery and
consumption. It is based on two essential concepts: the definition of a fundamental unit of
distribution and transaction (the Digital Item) and the concept of Users interacting with Digital
Items. The goal of MPEG-21 is to define the technology needed to support Users in exchanging,
accessing, consuming, trading and otherwise manipulating Digital Items in an efficient, transparent and
interoperable way.
The ISO/IEC 21000-2 (MPEG-21 Part 2) Digital Item Declaration specification describes a set of
abstract terms and concepts that form a useful model for defining Digital Items. Within this model, a
Digital Item is the digital representation of a work, and it is the thing that is acted upon (managed,
described, exchanged, collected, etc.) within the model. An example of the hierarchical structure of
the Digital Item Declaration Model is shown in Figure 10.
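The hierarchical model above can be sketched as plain data types; the class layout below mirrors the Container/Item/Component/Descriptor/Resource hierarchy of Figure 10 and is purely illustrative, not a normative representation of the DIDL schema:

```python
# Illustrative sketch of the abstract DID model as Python dataclasses.
from dataclasses import dataclass, field

@dataclass
class Resource:
    uri: str                     # the actual asset being declared

@dataclass
class Descriptor:
    text: str                    # descriptive information

@dataclass
class Component:
    descriptors: list = field(default_factory=list)
    resources: list = field(default_factory=list)

@dataclass
class Item:
    descriptors: list = field(default_factory=list)
    components: list = field(default_factory=list)
    items: list = field(default_factory=list)    # Items may nest

@dataclass
class Container:
    items: list = field(default_factory=list)

# One Item carrying a described Component with a single Resource
song = Item(descriptors=[Descriptor("track metadata")],
            components=[Component(descriptors=[Descriptor("codec info")],
                                  resources=[Resource("song.mp4")])])
album = Container(items=[song])
```

Walking such a tree top-down (Container → Items → Components → Resources) is exactly how the Protected Music Player MAF structure in Section 3.1 is later traversed.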
[Figure 10 shows an example hierarchy: a Container holding several Items; each Item carries a Descriptor and one or more Components, and each Component carries a Descriptor and a Resource.]
Figure 10 – Example of Digital Item Declaration model
2.3.10. MPEG-21 Part 4: Intellectual Property Management and Protection Components
ISO/IEC 21000-4 (MPEG-21 Part 4) Intellectual Property Management and Protection (IPMP)
Components aims to address the need for effective management and protection of intellectual
property in the Multimedia Framework over heterogeneous access and delivery infrastructures. It
specifies components for IPMP applied to Digital Items to facilitate the exchange of governed
content between peers. The standard includes ways of retrieving IPMP tools from remote
locations and of exchanging messages between IPMP tools and between these tools and the terminal. It
also addresses authentication of IPMP tools, and has provisions for integrating Rights Expressions
according to the Rights Data Dictionary and the Rights Expression Language.
The IPMP Components consist of two parts:
- the IPMP Digital Item Declaration Language, which provides for a protected representation of the
DID model, allowing a DID hierarchy which is encrypted, digitally signed or otherwise
governed to be included in a DID document in a schematically valid manner, and
- the IPMP Information schemas, defining structures for expressing information relating to the
protection of content, including tools, mechanisms and licenses. The IPMP information part is
flexible enough to signal protection information for digital media that is not declared by
the DIDL model as well.
2.3.11. MPEG-21 Part 5: Rights Expression Language
ISO/IEC 21000-5 (MPEG-21 Part 5) Rights Expression Language is a tool that can declare rights
and permissions using the terms as defined in the Rights Data Dictionary. It is intended to provide
flexible, interoperable mechanisms to support transparent and augmented use of digital resources
in publishing, distributing, and consuming of digital movies, digital music, electronic books,
broadcasting, interactive games, computer software and other creations in digital form, in a way
that protects digital content and honors the rights, conditions, and fees specified for digital
contents. It is also intended to support specification of access and use controls for digital content
in cases where financial exchange is not part of the terms of use, and to support exchange of
sensitive or private digital content.
The REL is also intended to provide a flexible interoperable mechanism to ensure personal data is
processed in accordance with individual rights and to meet the requirement for Users to be able to
express their rights and interest in a way that addresses issues of privacy and use of personal data.
The MPEG REL data model for a rights expression consists of four basic entities and the
relationships among those entities. The basic relationship is defined by the REL assertion "grant".
Structurally, a grant consists of the following: the principal to whom the grant is issued, the right that
the grant specifies, the resource to which the right applies, and the condition that must be
met before the right can be exercised. This model is shown in Figure 11, while Figure 12 shows
the authorization model in MPEG-21 REL.
[Figure 11 shows the REL data model: a Right issued to a Principal, associated with a Resource, and subject to a Condition.]
Figure 11 – REL Data Model
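The four-entity grant model can be sketched as a toy authorization check; the names and the condition representation below are illustrative assumptions for demonstration, not the normative REL model:

```python
# Toy sketch of the REL grant: principal, right, resource, condition.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Grant:
    principal: str                      # to whom the grant is issued
    right: str                          # e.g. "play", "copy"
    resource: str                       # what the right applies to
    condition: Callable[[dict], bool]   # must hold before exercising

def authorized(grant: Grant, principal: str, right: str,
               resource: str, context: dict) -> bool:
    """A request is authorized when all four entities line up."""
    return (grant.principal == principal and grant.right == right
            and grant.resource == resource and grant.condition(context))

# A hypothetical grant: alice may play song.mp4 fewer than 5 times
g = Grant("alice", "play", "song.mp4",
          condition=lambda ctx: ctx.get("play_count", 0) < 5)
```

The authorization model in Figure 12 generalizes this check: the request additionally carries an interval of time and an authorization context, and the grant may itself require an authorizing license chain.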
[Figure 12 shows the REL authorization model: an authorization request (r:Principal, r:Right, r:Resource, an interval of time, and an authorization context) is checked against an authorization story, which consists of a primitive grant, the authorized r:Grant or r:GrantGroup, the authorizer (r:License, r:Principal, a time instant, an authorization context), and r:License elements together with r:Grant elements that do not require an authorizer; the story is the authorization proof for the request.]
Figure 12 – REL Authorization Model
2.3.12. MPEG-21 Part 9: File Format
ISO/IEC 21000-9 (MPEG-21 Part 9) File Format is a storage format inherited from the MPEG-4 file
format in order to make a multi-purpose file, in which an MPEG-21 XML document such as a DID,
IPMP or REL document and some or all of its referenced content can be placed in a single 'content
package' file. This enables the interchange, editing and playback of MPEG-21 documents.
The main difference between the MPEG-21 file format and the MPEG-4 file format is that the
file-level meta box is mandatory. For a dual-function file, the MPEG-21 file format can contain both an
MP4 presentation using the movie box as well as an MPEG-21 DID, and it is permitted to use either an
MPEG-21 or an MPEG-4 player/reader.
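A reader deciding which player to hand a dual-function file to might peek at the major brand of the leading 'ftyp' box. A minimal sketch, assuming for illustration the brand strings "mp21" and "mp42" (the function name is also illustrative):

```python
# Hedged sketch: read the major brand from a file's leading 'ftyp' box.
import struct

def major_brand(data: bytes) -> str:
    """Return the 4-character major brand of the first box, if it is ftyp."""
    size, box_type, brand = struct.unpack_from(">I4s4s", data, 0)
    if box_type != b"ftyp":
        raise ValueError("file does not start with an ftyp box")
    return brand.decode("ascii")
```

A dispatcher could then route "mp21"-branded files to an MPEG-21 reader and fall back to plain MP4 playback otherwise.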
In this project report, we will limit our description to the MAFs that are part of this project, which are
described in detail in the following sections.
3. Work Items
3.1. MPEG-A Part 3 2nd Edition: Protected Music Player Application Format
3.1.1. Overview
The Music Player MAF specifies a simple and uniform way to carry MP3 coded audio content in
MPEG-4 File Format augmented by simple MPEG-7 metadata and a JPEG image for cover art. As
such, MPEG-4, MPEG-7 and MPEG-21 represent an ideal environment to support the current
“MP3 music library” user experience, and, moreover, to extend that experience in new directions.
The Protected Music Player AF builds on the Music Player AF. It adds content protection for mp4
song files, defines a default encryption for the song files, and adds protection to the mp21 album
and playlist files with flexible protection tool selection and key management components. The
following cases are possible:
a) Protected content files in mp4 file format, without Key Management components, with the
default AES-128 encryption tool and MPEG-4 IPMP-X signalling in the IPMPInfoBox
b) Protected files with flexible tool selection and Key Management components (MPEG-21
IPMP and REL) using the mp21 file format with embedded mp4 content files
c) A protected mp21 file with Key Management components (MPEG-21 IPMP and REL) but
without an embedded mp4 content file (a variation of (b)) that functions as a "license file" for an
external protected mp4 content file (a)
[Figure 13 illustrates three arrangements: a standalone protected MP4 file; an mp21 file embedding protected MP4 files, each with its own license; and an mp21 file carrying a license plus links to the license/KMS and to protected content held in external protected MP4 files.]
Figure 13 — Examples illustrating the different cases for the relationship of mp4 and mp21 files
Optional separation of protected content and license supports a broad range of "governed content
scenarios", including "superdistribution of protected content" and "subscription models". Figure
13 illustrates the different cases and gives some examples.
3.1.2. File format
The Music Player AF structure uses the MPEG-4 file format. It consists of a file type box, a movie box and
a media data box. The 'mdia' box and the child boxes of the sample description are used to
find and decode MP3 data. The combination of the 'iloc' and 'iinf' boxes in the movie-level 'meta'
box is used to present the JPEG image.
The Music Player AF specification also allows the use of the MPEG-21 file format to store an album
with one or more tracks. Each track is structured using the MPEG-4 file format (known as a
"hidden MP4 file"), while its presentation is described using 'iloc' and 'iinf' boxes in the file-level
'meta' box.
When a single hidden mp4 file is embedded in an mp21 file, the IPMP information is signalled in
the form of an XML metadata description (in its original MPEG-21 IPMP Base Profile form). The
protection description is carried in the 'meta' box at file level using MPEG-21 DID and MPEG-21
IPMP_DIDL.
Figure 14 shows an illustration of the approach. The IPMPDIDL metadata contains two major
parts. Note that the structure described below is not exhaustive; additional DID
elements may exist:
– a Descriptor that contains IPMPGeneralInfo. It is recommended that this Descriptor be
defined at the beginning of the IPMPDIDL metadata. The IPMPGeneralInfo contains:
  - a ToolList, as defined in the MPEG-21 IPMP Base Profile, and
  - a Container for licenses. The license information is described by the MPEG-21 REL MAM
Profile.
– an Item element that models the structure of the Protected Music Player MAF content. The
Item shall contain at least three child Container elements. Each Container carries a
Resource element for each sub-resource of the Protected Music Player MAF (the mp4 file with the
MP3onMP4 audio track, the JPEG image and the MPEG-7 metadata). If the sub-resource is protected,
the Resource element shall have an IPMPInfo element that describes the protection
mechanism.
The protection mechanism for a multiple-track file is similar to the case of a single-track file with
MPEG-21 metadata and file type. In the multiple-track case the same approach is applied as for
mp21 files with one embedded hidden mp4 file. However, the structure of the digital item in the
DIDL/IPMPDIDL has one more level.
Figure 15 shows the illustration for the case of protecting a multiple-track mp21 album file. Note
that the structure of the IPMPDIDL metadata now has several Item elements. Each Item element is
associated with one hidden mp4 file in the 'mdat' box.
[Figure 14 shows the file layout: an 'ftyp' box; a file-level 'meta' box whose 'iloc' box locates Item 1 (MP3onMP4), Item 2 (JPEG image) and Item 3 (MPEG-7 XML), and whose 'xml' box carries the IPMPDIDL — a Descriptor with IPMPGeneralInfo (ToolList and licenses, if any) and an Item with three Containers whose Resource elements for the MP3onMP4 audio, the JPEG image and the MPEG-7 XML each carry an IPMPInfo element; and an 'mdat' box embedding the hidden mp4 file ('ftyp', 'moov', a 'meta' box with the MPEG-7 XML, and an 'mdat' box with the MP3onMP4 access units and the JPEG image).]
Figure 14 — Protected Music player AF file format with one hidden MP4 file
[Figure 15 shows the same layout with several hidden MP4 files in the 'mdat' box: the 'iloc' box locates three items per track (Items 1–3, 4–6, …, (n-2)–n), and the IPMPDIDL 'xml' box carries, after the Descriptor with IPMPGeneralInfo (ToolList and licenses, if any), one Item element per hidden mp4 file, each with Containers whose Resources for the MP3onMP4 audio, the JPEG image and the metadata each carry an IPMPInfo element.]
Figure 15 — Protected Music Player AF file with more than one hidden MP4 file
3.1.3. System architecture
The Music Player AF specifies a lossless, reversible conversion of a standard mp3 bitstream file into
the MPEG-4 file structure described in the previous subsection. An MP3 bitstream file (containing
mp3 audio frames and an ID3 tag) is converted using two modules, as shown in Figure 16.
The first module translates an MP3 bitstream into a series of MP3 Access Units. This is
accomplished by the MP3onMP4 formatter, specified in ISO/IEC 14496-3:2005 Subpart 9. The
Access Units are stored into one audio track of an MPEG-4 file.
The second module extracts the metadata information from the input file's ID3 tag and expresses
it as an MPEG-7 descriptor (see section 3.1.4). This MPEG-7 metadata is stored - together with the
optional JPEG image for cover art - in the corresponding meta box of the audio track.
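The first step of the second module, reading the ID3 tag, can be sketched for the ID3v1.1 case, where the tag is the last 128 bytes of the file. Field offsets follow the ID3v1 layout; the function name is illustrative:

```python
# Sketch: pull an ID3v1.1 tag from the last 128 bytes of an MP3 file.
def parse_id3v1(data: bytes) -> dict:
    tag = data[-128:]
    if tag[:3] != b"TAG":
        raise ValueError("no ID3v1 tag present")
    # fields are fixed-width, zero-padded byte strings
    text = lambda b: b.split(b"\x00")[0].decode("latin-1").strip()
    return {
        "title":   text(tag[3:33]),
        "artist":  text(tag[33:63]),
        "album":   text(tag[63:93]),
        "year":    text(tag[93:97]),
        "comment": text(tag[97:125]),
        # ID3v1.1: a zero byte at offset 125 marks byte 126 as track number
        "track":   tag[126] if tag[125] == 0 else None,
        "genre":   tag[127],
    }
```

The resulting dictionary is then mapped to MPEG-7 descriptions according to Table 2 in section 3.1.4.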
[Figure 16 shows the creation flow: an MP3 file with ID3 tags feeds the MP3onMP4 formatter and an "extract ID3 tags and express in MPEG-7" module; the resulting MP3onMP4 data, MPEG-7 metadata and JPEG image are written into the MP4 file.]
Figure 16 – Creating Music player AF
Playback consists of extracting the metadata from the MPEG-4 file and displaying it on a suitable
visual interface, and extracting the MP3onMP4 data from the MPEG-4 file, filtering it with a very
light-weight de-formatting operation, and playing it through a "classic" MP3 decoder. In practice,
it may be that the MP3onMP4 data is played by an "MP3onMP4 decoder," consisting of the
concatenation of the MP3onMP4 de-formatter and the MP3 decoder. This description is illustrated
in Figure 17.
[Figure 17 shows the playback flow: from the MP4 file, the MP3onMP4 data passes through the MP3onMP4 decoder (the MP3onMP4 de-formatter followed by the MP3 decoder); the MPEG-7 metadata drives the display of artist, album and song; and the JPEG image is decoded to display the album art.]
Figure 17 – Playing Music player AF
3.1.4. Metadata
The MPEG-4 file format supports the storage of metadata associated with a data track. Associated
metadata describing the audio track, like artist or song name, is expressed in MPEG-7
nomenclature, as specified in ISO/IEC 15938-5:2003. MP3 bitstream files can contain associated
metadata, typically ID3 tags. The specific mapping from ID3 v1.1 tags and the corresponding ID3
v2.3 frames to MPEG-7 metadata is shown in Table 2. The parenthetical comments under Artist clarify
that MPEG-7 is able to make a distinction between the Artist as a person and the Artist as a group name.
Table 2 — Mapping from ID3 v1.1 and ID3 v2.3 Tags to MPEG-7
ID3 v1 ID3 v2.3 Frame Description MPEG-7 Path
Artist TOPE
(Original artist /
performer)
Artist
performing the
song
CreationInformation/Creation/Creator[Role/@href=”urn:mpeg:
mpeg7:RoleCS:2001:PERFORMER”]/Agent[@xsi:type=”Perso
nType”]/Name/{FamilyName, GivenName} (Artist Name)
CreationInformation/Creation/Creator[Role/@href=”urn:mpeg:
mpeg7:RoleCS:2001:PERFORMER”]/Agent[@xsi:type=”Perso
nGroupType”]/Name (Group Name)
Album TALB
(Album / Movie /
Show title)
Title of the
album
CreationInformation/Creation/Title[@type=”albumTitle”]
Song Title TIT2
(Title / Songname /
Content
description)
Title of the
song
CreationInformation/Creation/Title[@type=”songTitle”]
Year TORY
(Original release
year)
Year of the
recording
CreationInformation/CreationCoordinates/Date/TimePoint
(Recording date.)
Comment COMM Any comment
of any length
CreationInformation/Creation/Abstract/FreeTextAnnotation
Track TRCK
(Track number /
Position in set)
CD track
number of song
Semantics/SemanticBase[@xsi:type=”SemanticStateType”]/Att
ributeValuePair
Genre TCON
(Content type)
ID 3 V1.1
Genre
ID 3 V2 Genre
(4)(Eurodisco)
CreationInformation/Classification/Genre[@href=”urn:id3:v1:4
”]
CreationInformation/Classification/Genre[@href=”urn:id3:v1:4
”]/Term[@termID=”urn:id3:v2:Eurodisco”]
CreationInformation/Classification/Genre[@href=”urn:id3:v1:4
”]
CreationInformation/Classification/Genre[@type=”secondary][
@href=”urn:id3:v2:Eurodisco”]
Table 3 — ID3 information of a song (ID3 v1.1 field: value)

Song Title: If Ever You Were Mine
Album Title: Celtic Legacy
Artist: Natalie MacMaster
Year: 1995
Comment: AG# 3B830D8
Track: 05
Genre: 80 (Folk)
MPEG-7 Path notation is shorthand for the full XML notation; an example of the
correspondence between MPEG-7 Path and XML notation is shown below. Table 3 shows the
ID3 information of an example song and the corresponding values. The MPEG-7 representation of
this ID3 information is shown in Table 4.
Table 4 — MPEG-7 instantiation example representing the ID3 information
<?xml version="1.0" encoding="UTF-8"?>
<!-- ID3 V1.1 Example -->
<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 C:\mpeg7\is\Mpeg7-2001.xsd">
  <Description xsi:type="CreationDescriptionType">
    <CreationInformation id="track-05">
      <Creation>
        <!-- ID3 Song Title -->
        <Title type="songTitle">If Ever You Were Mine</Title>
        <!-- ID3 Album Title -->
        <Title type="albumTitle">Celtic Legacy</Title>
        <!-- ID3 Comment -->
        <Abstract>
          <FreeTextAnnotation>AG# 3B8308D8</FreeTextAnnotation>
        </Abstract>
        <!-- ID3 Artist -->
        <Creator>
          <Role href="urn:mpeg:mpeg7:RoleCS:2001:PERFORMER"/>
          <Agent xsi:type="PersonType">
            <Name>
              <FamilyName>MacMaster</FamilyName>
              <GivenName>Natalie</GivenName>
            </Name>
          </Agent>
        </Creator>
        <!-- ID3 Year -->
        <CreationCoordinates>
          <Date>
            <TimePoint>1995</TimePoint>
          </Date>
        </CreationCoordinates>
      </Creation>
      <!-- ID3 Genre (80 = Folk) -->
      <Classification>
        <Genre href="urn:id3:cs:ID3genreCS:v1:80">
          <Name>Folk</Name>
        </Genre>
      </Classification>
    </CreationInformation>
  </Description>
  <Description xsi:type="SemanticDescriptionType">
    <Semantics>
      <SemanticBase xsi:type="SemanticStateType">
        <!-- ID3 Track -->
        <AttributeValuePair>
          <Attribute>
            <TermUse href="urn:mpeg:maf:cs:musicplayer:CollectionElementsCS:2007:assetNum"/>
          </Attribute>
          <IntegerValue>6</IntegerValue>
        </AttributeValuePair>
        <!-- ID3v2 TRCK /12 -->
        <AttributeValuePair>
          <Attribute>
            <TermUse href="urn:mpeg:maf:cs:musicplayer:CollectionElementsCS:2007:assetTot"/>
          </Attribute>
          <IntegerValue>12</IntegerValue>
        </AttributeValuePair>
        <!-- ID3v2 TPOS 1/2 -->
        <AttributeValuePair>
          <Attribute>
            <TermUse href="urn:mpeg:maf:cs:musicplayer:CollectionElementsCS:2007:volumeNum"/>
          </Attribute>
          <IntegerValue>1</IntegerValue>
        </AttributeValuePair>
        <AttributeValuePair>
          <Attribute>
            <TermUse href="urn:mpeg:maf:cs:musicplayer:CollectionElementsCS:2007:volumeTot"/>
          </Attribute>
          <IntegerValue>2</IntegerValue>
        </AttributeValuePair>
      </SemanticBase>
    </Semantics>
  </Description>
</Mpeg7>
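A metadata extractor following the Table 2 mapping can be sketched as follows. This is a minimal illustration covering only the Title, Abstract and TimePoint rows; the function name and the `tags` dictionary keys ('title', 'album', 'comment', 'year') are illustrative, not part of any standard:

```python
import xml.etree.ElementTree as ET

NS = "urn:mpeg:mpeg7:schema:2001"

def q(tag):
    """Qualify a tag name with the MPEG-7 namespace."""
    return f"{{{NS}}}{tag}"

def id3_to_mpeg7(tags):
    """Build a minimal MPEG-7 description tree from ID3 v1.1 fields,
    following the Table 2 mapping (subset only)."""
    root = ET.Element(q("Mpeg7"))
    desc = ET.SubElement(root, q("Description"))
    info = ET.SubElement(desc, q("CreationInformation"))
    creation = ET.SubElement(info, q("Creation"))
    if "title" in tags:
        ET.SubElement(creation, q("Title"), type="songTitle").text = tags["title"]
    if "album" in tags:
        ET.SubElement(creation, q("Title"), type="albumTitle").text = tags["album"]
    if "comment" in tags:
        abstract = ET.SubElement(creation, q("Abstract"))
        ET.SubElement(abstract, q("FreeTextAnnotation")).text = tags["comment"]
    if "year" in tags:
        coords = ET.SubElement(creation, q("CreationCoordinates"))
        date = ET.SubElement(coords, q("Date"))
        ET.SubElement(date, q("TimePoint")).text = tags["year"]
    return root
```

Serializing the returned tree with `ET.tostring` yields an XML skeleton analogous to Table 4.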
3.2. MPEG-A Part 4: Musical Slide Show Application Format
3.2.1. Overview
The existing Music player AF was designed as a simple format for enhanced MP3 players. It
contains MP3 audio data, optional MPEG-7 meta-data and JPEG still image for cover art. The
Photo Player MAF under development combines JPEG still images with MPEG-7 meta-data.
Musical Slide show AF builds on top of the Music player and the Photo player AF and is meant as
a superset of these two MAFs.
The use of Musical slide show AF is presented in the following use cases:
Foreign language exercise materials
Photo-music album applications
Storytelling application
Personal slideshow application
Karaoke application
Karaoke + slideshow application
3.2.2. File format
The normative file structure of Musical slide show AF, as seen in Figure 18, consists of three
boxes at the file level: File Type Box ('ftyp'), Movie Box ('moov') and Media Data Box
('mdat'). The 'ftyp' box defines the type of file format that the file structure complies with:
the major-brand is 'mp42', and the brand that identifies the Musical slide show
application format, 'mss1', is carried in the compatible-brands list. The 'moov' box contains three types of tracks
(slide show, audio, text) and a metadata box:
Normative Slide show Track Box (a 'trak' box for timed JPEG images)
Normative Audio Track Box (MP3 audio)
Optional Text Track Box (timed text)
Optional Metadata Box (media resource information and LASeR scene description)
Figure 18 — Musical slide show file format (the movie-level 'meta' box carries 'iloc'/'iinf' item entries for the MP3 (item_ID = 1, content_type audio/mp3), the JPEGs #1…#n (item_ID = 2…n+1, content_type image/jpeg) and the timed text (item_ID = n+2, content_type text), plus an 'xml' box with the LASeR scene description; the slide show and MP3 'trak' boxes each contain 'mdia'/'stbl' and a 'meta' box with MPEG-7 metadata in an 'xml' box, the text 'trak' box contains 'mdia'/'stbl'/'tx3g'; the 'mdat' box stores the MP3, the 3GPP TS 26.245 text and the JPEGs 1…n)
The 'trak' boxes contain temporal and spatial information of the media data (JPEG images, MP3
audio, timed text). For the Musical slide show application format, all the images that are used in
the slide show presentation are arranged in a single track.
A Musical slide show player shall support application formats with the following number of
tracks:
Single slide show track (normative)
Single audio track (normative)
Single text track (optional)
The track handler types for the above tracks are:
'vide' for the slide show track
'soun' for the audio track
'text' for the text track
The movie-level metadata box ('meta') contains the item information box ('iinf') and the item
location box ('iloc'). For each media data item, an item ID is assigned; the physical location and
size of the media data are stored in the item location box, while the item name and the content type
information are stored in the item information box.
The 'xml' box located in the movie-level 'meta' box contains the LASeR scripts
responsible for the animation effects, and since it exists as a single "file," the meta handler type
of the 'meta' box is 'lsr1'. The 'mdat' box contains the actual media data bytes.
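The top-level box layout and the brand signalling described above can be illustrated with a short parser sketch. It handles only the common 32-bit box size (the 64-bit largesize and size-0 "to end of file" cases are omitted); the function names are illustrative:

```python
import struct

def read_boxes(buf):
    """Iterate over top-level ISO base media file format boxes,
    yielding (box_type, payload) pairs."""
    pos = 0
    while pos + 8 <= len(buf):
        size, btype = struct.unpack(">I4s", buf[pos:pos + 8])
        yield btype.decode("ascii"), buf[pos + 8:pos + size]
        pos += size

def ftyp_brands(payload):
    """Split an 'ftyp' payload into major brand, minor version and the
    list of compatible brands (each brand is a 4-character code)."""
    major = payload[:4].decode("ascii")
    minor = struct.unpack(">I", payload[4:8])[0]
    compat = [payload[i:i + 4].decode("ascii") for i in range(8, len(payload), 4)]
    return major, minor, compat
```

A player would check that 'mss1' appears among the compatible brands before treating the file as a Musical slide show AF.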
For the Musical slide show application format, there are two possible rendering modes:
Basic mode
Enhanced mode
In the "Basic" mode, the timed text and slide show of JPEG images shall be rendered using the
sample table ('stbl') box in the file format. In the "Enhanced" mode, the LASeR scene description
shall be responsible for coordinating the overall presentation (slide show, animation effects, and
timed text).
The operational flow for both Basic and Enhanced mode is shown in Figure 19.
Figure 19 — Basic and Enhanced mode operational flow diagram (a player with LASeR handling capability and a LASeR scene description in the file enters Enhanced mode, where the LASeR scene description coordinates the slide show, animation and timed text; otherwise the player falls back to Basic mode and renders the MP3, the slide show without animation, and the timed text)
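The decision logic of Figure 19 reduces to a single conjunction; a minimal sketch (function and mode names are illustrative):

```python
def select_mode(player_supports_laser, file_has_laser_scene):
    """Rendering-mode decision from Figure 19: Enhanced mode requires
    both a LASeR-capable player and a LASeR scene description in the
    file; every other combination falls back to Basic mode."""
    if player_supports_laser and file_has_laser_scene:
        return "Enhanced"
    return "Basic"
```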
Basic mode
In the "Basic" mode, MP3, JPEG images and timed text (3GPP TS 26.245) are rendered
concurrently using only the information (timing, sample size, sample offset) obtained from the
'stbl' boxes. Therefore, when the file is loaded, the 'moov' box is parsed first, and then the tracks
are read. For each track, the 'stbl' box is parsed in order to gain access to the spatial and
temporal information regarding the sample data. Players that are not capable of handling LASeR
scene description may ignore the 'meta' box where the LASeR scene description ('xml' box) is
placed. In this mode, the JPEG images are rendered based on the timing information in the 'stbl'
box.
Enhanced mode
In the "Enhanced" mode, animation effects shall be applied to the JPEG images in the slide show
presentation. The LASeR script is responsible for rendering the JPEG images and the timed text
data (using the 'text' element). Therefore, the part that describes the timing information regarding
the JPEG images is ignored: in the "Enhanced" mode, the timeline of the slide show presentation
is fully dependent on how the LASeR scene description is formed. The MP3 is played in the same
way as in the "Basic" mode. However, the sample table box of the slide show track shall still contain
timing information in case the player does not support LASeR decoding.
3.2.3. System architecture
Creating a Musical slide show application format file involves formatting different types of media
data, and storing them into an MPEG-4 file format. Figure 20 shows the system architecture of
creating Musical slide show AF file. The Musical slide show AF consists of MP3 audio
(mandatory), JPEG images (mandatory), timed text (optional) and LASeR scene description for
animation effects (optional).
Playing a Musical slide show AF file involves two parts: content extraction and content
synchronization. The method of content extraction of Musical slide show AF is similar to Music
player AF. The method of content synchronization is described in the next sub-section.
Figure 20 — System architecture of creating MSS AF (the MSS AF Creator takes MP3 audio, JPEG images, text for timed text and animation effects expressed as a LASeR scene description, creates the MP4 file structure, creates the multimedia tracks and stores the metadata, producing the MSS AF file with MP3 audio, JPEG images, timed text and LASeR scene description)
3.2.4. Synchronization
Synchronization of the media data is primarily achieved with the use of the sample table box
('stbl'). The sample table contains all the time and data indices of the media samples in a track.
For the slide show track, each JPEG image is considered to be a sample. Therefore, the timing
information (slide show duration) and the physical sizes and locations of the images in the
slide show presentation are stored inside the 'stbl' box. Specifically, the following sub-boxes
are used:
'stts' (Decoding Time to Sample Box)
'stsz' (Sample Size Box)
'stco' (Chunk Offset Box)
The slide show duration is stored in the 'stts' box, and the image size and the image location are
stored in the 'stsz' box and the 'stco' box respectively. Figure 21 shows an illustration of
allocating JPEG samples and referring to them from the sample table box.
Figure 21 — Allocating several JPEGs as a collection of samples (for each of JPEG 1 … JPEG n stored in 'mdat', the 'stts', 'stsz' and 'stco' boxes inside 'stbl' hold, respectively, its display time, its size and its offset)
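The 'stts' box stores run-length-coded (sample_count, sample_delta) pairs rather than one entry per sample. A sketch of expanding those entries into per-image display times (the function name is illustrative):

```python
def expand_stts(entries, timescale):
    """Expand 'stts' (sample_count, sample_delta) run-length entries into
    per-sample start times in seconds.

    For the slide show track each JPEG is one sample, so the resulting
    list gives the moment each image should appear."""
    times, t = [], 0
    for count, delta in entries:
        for _ in range(count):
            times.append(t / timescale)
            t += delta
    return times
```

For example, with a timescale of 100 units per second, two samples of 500 units followed by one of 200 units place the images at 0 s, 5 s and 10 s.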
In order for the timeline of the text track to be aligned with the slide show track, the timing
information in the 'stts' box of the slide show track should act as the "clock." In other words,
the text track should be fully dependent on the timeline of the slide show track. Figure 22 shows
an example of a synchronized Musical slide show application format presentation.
Figure 22 — Synchronizing resources (image samples 1–5 with time stamps at 0, 5, 7, 10 and 14 s; text samples carrying synchronized lyrics 1–4 with time stamps at 0, 4, 8 and 12 s; a LASeR script applying an animation to each image)
3.2.5. Animation
The animation effects for the Musical slide show application format focus on creating a more
entertaining image slide show presentation. Figure 23 shows an example of an animation effect
that combines image transformation and opacity control.
Figure 23 — Example of animation
It is important to note that the effects featured in the Musical slide show application format shall
only be used for image transitions during the slide show presentation, and they shall be comprised
of simple image filtering effects. In addition, the timing information within LASeR can
independently be defined regardless of the timing information in the sample table box. Table 5
shows the functionalities and the description elements (LASeR) that shall be supported for the
basic transition effects.
Table 5 — List of basic transition effects (all description elements are defined in ISO/IEC 14496-20:2006)

Grouping: effect grouping; element g (subclause 6.8.15)
Image referencing: image dimension and referencing of the image; element image (subclause 6.8.16)
Opacity control: fade-in / fade-out; element animate (subclause 6.8.4)
Geometrical transformation: translation, scale, rotation, skew; element animateTransform (subclause 6.8.7)
Object motion: object motion on a predefined path; element animateMotion (subclause 6.8.6)
Color change: changes object color; element animateColor (subclause 6.8.5)
Attribute control: sets the value of an attribute; element set (subclause 6.8.28)
Shapes and motion: motion path; element path (subclause 6.8.22)
Basic shapes: elements rect (subclause 6.8.26), circle (subclause 6.8.9), ellipse (subclause 6.8.13), line (subclause 6.8.17), polyline (subclause 6.8.24), polygon (subclause 6.8.23)
Using LASeR in a textual format provides an easy way to create and edit descriptions for the
animation effects, since the input data can simply be typed in, and the data itself can be more
intuitive in terms of understanding the functionalities. Therefore, textual format shall be the
normative way of using LASeR. In the Musical slide show application format, a reduced set of
scene description elements for animation is used in local playback settings. Therefore, the data
size or the decoding speed may not be an issue in terms of parsing or decoding the data. Figure 24
shows a possible model for a LASeR renderer.
Figure 24 — LASeR renderer model for Musical slide show application format
3.2.6. Timed text
Timed text is intended to be used for applications (e.g. “karaoke” and language study materials)
that require extensive use of textual presentation. In the Musical slide show application format,
there are two possible ways to render timed text.
For players that are not capable of handling LASeR scene description (Basic mode) or contents
that only require minimum use of textual presentation, 3GPP TS 26.245 timed text format is used.
3GPP TS 26.245 timed text data consists of:
Text samples
Sample descriptions
A text sample consists of one text string and optional text modifiers. Figure 25 shows the
structure of the timed text. Sample descriptions and text modifiers are parameters that determine
how the text string is to be displayed. Sample descriptions provide global information about a text
sample or samples, such as font, position and background color, whereas text modifiers provide
information about a text string when it is displayed. In the file format structure, the sample description
is located inside the sample table box, along with the time information of the text sample associated to
it, which is located in the time to sample box within the same sample table box, as shown in
Figure 26. This synchronization method is similar to that of the slide show.
For the Musical slide show application format, there are four types of text modifiers (optional):
'styl' (for text style)
'hlit' (for highlighted text)
'krok' (for karaoke, closed captioning and dynamic highlighting)
'blnk' (for blinking text)
For detailed sample description and text modifier syntax, refer to the 3GPP TS 26.245 specification.
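As a rough sketch of how a text sample is laid out on disk, the following builds a 16-bit byte count followed by the UTF-8 text and then modifier boxes, each with a size/type header. Modifier payloads are treated as opaque bytes here; the real 'styl'/'hlit'/'krok'/'blnk' payload syntax is given in TS 26.245, and the function name is illustrative:

```python
import struct

def build_text_sample(text, modifiers=()):
    """Assemble a 3GPP TS 26.245-style text sample: a big-endian 16-bit
    text byte count, the UTF-8 text, then any modifier boxes, each
    written as a 32-bit size, a 4-byte type and its payload."""
    data = text.encode("utf-8")
    sample = struct.pack(">H", len(data)) + data
    for box_type, payload in modifiers:
        sample += struct.pack(">I4s", 8 + len(payload), box_type) + payload
    return sample
```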
For players that are capable of handling LASeR scene description (Enhanced mode), the 'text'
element in LASeR is used for timed text functionality. The supported functionalities (optional) of
timed text are:
Characters and glyphs support
Font Support
Color Support
Text rendering position and composition
Highlighting, closed captioning and “karaoke”
Figure 25 — Timed text structure (text samples 1 … n, each consisting of a text string of characters 1 … n plus text modifiers 1 … n, reside in the media data box (mdat); each text sample is associated to one of the sample descriptions 1 … n in the sample table box (stbl))
Figure 26 — Timed text structure in file format (inside trak/mdia/stbl, the 'stts' box holds the time information and 'stsd'/'tx3g' the sample descriptions, while text samples 1 … n reside in 'mdat')
3.2.7. Metadata
For the Musical slide show application format, metadata provides simple background information,
such as creation date, artist/creator information, and title of a photo series or a song.
The two types of normative metadata (textual XML) included in the Musical slide show
application format are:
Collection and item level metadata for the slide show track (Collection-level metadata are for
allowing users to define groups/categories/sets of photos and to store metadata relating to
those groups, independently of the ordering of the slide-show; Item-level metadata are for
enabling content-based search of slides)
Metadata for the audio track
The Musical slide show application format file structure allows the metadata to be stored inside
the media tracks. The metadata handler type is 'mp7t' for both the slide show and audio tracks. Figure
27 shows the locations of the metadata in the Musical slide show application format file structure.
Figure 27 — MPEG-7 metadata in MSS AF file format (the JPEG slide show and MP3 audio 'trak' boxes each carry a 'meta' box with MPEG-7 metadata in an 'xml' box; the movie-level 'meta' box carries 'iloc', 'iinf' and the LASeR 'xml' box; the 'mdat' box holds the JPEGs 1 … n, the MP3 and the 3GPP TS 26.245 text)
3.2.7.1. Metadata for slide show
For the Musical slide show application format, images are structurally arranged in a single
track, therefore, both collection and item level metadata are contained inside the slide show
track as single XML data.
For the collection and item level (metadata for individual photos) descriptive metadata,
MPEG-7 ContentCollection and Image DS are used, respectively (aligned with the
Photo player application format). In order to combine the two metadata, the Image DS for
the item-level metadata shall be contained under the Content element in the
ContentCollection DS for the collection-level metadata. Every photo in the file shall
have a corresponding Content element in the root collection. This means there shall be as
many Content elements in the root collection as there are photos.
In the item-level metadata, MediaLocator DS is used for associating the metadata
pertaining to the individual image with its resource data within the file, identified by its
item_ID. In the collection level, images can be referenced using the ContentRef element.
The ContentRef element shall only exist in sub-collections. The normative specification of
all semantics is given in ISO/IEC 15938-2 and ISO/IEC 15938-5:2003.
The slide show track schema is defined with respect to the MPEG-7 Version 2 schema as
specified in ISO/IEC 15938-10. The namespace of the Version 2 schema providing a basis for
the slide show track schema is “urn:mpeg:mpeg7:schema:2004”.
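The collection structure described above (one Content element per photo, each tied to its stored resource by item_ID) can be sketched as follows. The element nesting is simplified relative to the full MPEG-7 schema, and the "#item_ID=…" URI form is illustrative only:

```python
import xml.etree.ElementTree as ET

NS = "urn:mpeg:mpeg7:schema:2004"

def build_collection(item_ids):
    """Sketch of the slide show track metadata skeleton: a single
    ContentCollection with one Content element per photo, each holding
    an Image description whose MediaLocator points at the photo's
    item_ID inside the file."""
    coll = ET.Element(f"{{{NS}}}ContentCollection")
    for item_id in item_ids:
        content = ET.SubElement(coll, f"{{{NS}}}Content")
        image = ET.SubElement(content, f"{{{NS}}}Image")
        locator = ET.SubElement(image, f"{{{NS}}}MediaLocator")
        uri = ET.SubElement(locator, f"{{{NS}}}MediaUri")
        uri.text = f"#item_ID={item_id}"  # illustrative reference form
    return coll
```

The invariant stated above is visible here: the number of Content elements in the root collection always equals the number of photos.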
3.2.7.2. Metadata for audio
The metadata for the audio track provides the song title, name of the artist, album title, year
and genre of the audio content. It is the same as the metadata for audio used in Music player
AF. Please refer to section 3.1.4 in this document for the semantics and example.
3.3. MPEG-A Part 4 2nd Edition: Protected Musical Slide Show Application Format
3.3.1. Overview
The Protected Musical slide show application format builds on the Musical slide show AF as
described in section 3.2. It adds content protection for MP3 audio, JPEG images, 3GPP Timed
Text, and LASeR script animation with flexible protection tool selection and key management
components.
The two rendering modes of Musical slide show AF, "Basic" mode and "Enhanced" mode as
described in section 3.2, remain available in Protected Musical slide show AF. In either case,
the protected resources shall be unprotected prior to rendering. As a player in Basic mode may ignore
the LASeR script for animation, the protection for the LASeR script may also be ignored in this mode.
The application shall be able to read the MPEG-21 DID metadata stored in the movie-level 'meta' box.
By parsing and executing the protection scheme to unprotect the protected contents (if any), the
output of the device shall be the same as the output of Musical slide show AF without the MPEG-
21 DID metadata, in either "Basic" or "Enhanced" mode.
As in Musical slide show AF, the Protected Musical slide show uses the same method of
synchronizing contents. The protection scheme is designed not to alter or modify the
synchronization of contents. Therefore, media synchronization and animation are not
described again in this section.
Table 6 compares the technologies used in Musical slide show AF with those used in
Protected Musical slide show AF.
Table 6 — Technologies used in Musical slide show AF ('mss1') and Protected Musical slide show AF ('mss2')

Used by both brands:
MPEG-1/2 Layer 3 Audio
JPEG Images
MPEG-7 metadata
LASeR script for animation
3GPP TS 26.245 timed text data

Used by 'mss2' only:
MPEG-21 DID
MPEG-21 IPMP Components Base Profile
MPEG-21 REL MAM Profile
MPEG-21 Fragment Identifier
3.3.2. File format
The file format of Protected Musical slide show AF is basically the same as that of Musical
slide show AF. It has three tracks (JPEG slide show, MP3 audio and 3GPP timed text), which
correspond to three 'trak' boxes in the movie box. The difference is the inclusion of an MPEG-21 DID in
the 'meta' box, which carries the IPMPDIDL metadata for protection information. As a consequence, the
LASeR script is now part of the MPEG-21 DID, in the same 'xml' box as in the Musical slide show AF file
format. Figure 28 shows the Protected Musical slide show AF structure.
The 'ftyp' box of the ISO Base Media File Format contains a list of "brands" that are used as
identifiers in the file format. To enable player applications to easily identify files which are
compliant to this AF specification, specific brand identifiers are defined. These brands are used in
the compatible-brands list in addition to other appropriate brand types, like 'iso2', 'mp42' or
'mp21'. The brand that identifies the Protected Musical slide show AF is 'mss2'; it follows the
brand of the Musical slide show AF, with the trailing number changed to '2' to indicate the
2nd edition.
Figure 28 — Protected Musical slide show AF file structure (as in Figure 18, with the movie-level 'meta' box carrying 'iloc'/'iinf' item entries for the MP3, JPEGs #1…#n and timed text; the 'xml' box now holds the MPEG-21 DID: a Descriptor with IPMPGeneralInfo (ToolList, License), a Descriptor with the LASeR script, and Items whose Components pair each Resource (audio/mp3, image/jpeg, text) with its IPMPInfo; the 'trak' and 'mdat' boxes are unchanged)
3.3.3. System architecture
Creating a protected Musical slide show AF file involves formatting different types of media data,
defining the protection and license information, and storing them into an MPEG-4 file format.
Based on the Musical slide show AF system architecture described in section 3.2.3, the protection
module is included in the Creator to protect the resources based on the protection and license
description. Figure 29 shows an example of protected Musical slide show AF creator system
architecture. MP3 audio, JPEG images, and text data are formatted as individual MP4 media
tracks. Descriptions for the animation effects are stored as LASeR scene description in XML
format. These resources are described in a structured way using MPEG-21 Digital Item Declaration
Language (DIDL), while the protection and license information for the protected resources is
described using MPEG-21 Intellectual Property Management and Protection (IPMP) and MPEG-
21 Rights Expression Language (REL).
Figure 29 — Protected Musical slide show AF creator system architecture (as in Figure 20, the creator takes MP3 audio, JPEG images, timed text and animation effects as a LASeR scene description; in addition it takes protection and license information, described with MPEG-21 DIDL/IPMP/REL, protects the resources, and outputs the Protected MSS AF file)
3.3.4. Metadata
Figure 30 shows the information described by IPMP and which resources are protected. The
mechanism of signalling protection for the resources with the metadata is as follows: the MP3
audio and the 3GPP timed text are each described as one item; the collection of JPEG images (the
slide show) is described as one item containing each individual JPEG image as a component; and the
LASeR script for animation is described as a Descriptor. If a resource is protected, its description is
given as a ProtectedAsset using the IPMP DIDL description scheme.
Figure 30 — Protected resources pointed to by the metadata (in the movie-level 'xml' box, the protected LASeR script Descriptor and every Resource (audio/mp3, image/jpeg, text) carry an IPMPInfo; in 'mdat', the JPEGs 1 … n, the MP3 and the 3GPP TS 26.245 text are all stored in protected form)
The IPMPDIDL metadata contains a Descriptor that carries the IPMPGeneralInfo. It is
recommended that this Descriptor be defined at the beginning of the IPMPDIDL metadata. The
IPMPGeneralInfo contains:
ToolList, as defined in MPEG-21 IPMP Base Profile
A container for licenses; the license information is described by the MPEG-21 REL MAM
Profile
With this specification, the collection and item level metadata are left intact. MPEG-21
is not used to describe any information that is already described by the MPEG-7 metadata in
the audio and slide show 'trak' boxes. Instead, it is only used to describe the structure and the
governance of the multimedia content inside the AF.
An example of metadata instantiation protecting the contents shown in Figure 30 is given
in Table 7. The tool list is carried at the top of the IPMP DIDL together with the necessary license
collection. As described in the ipmpinfo:IPMPToolID element, the AES-128 encryption tool can be
signalled by using the tool name as the identification tag. The linkage that references the tool used for
content protection is established between the localID attribute in the ToolList element of the
IPMPGeneralInfoDescriptor and the localidref attribute in the
IPMPInfoDescriptor. An Item in the IPMP DIDL represents a collection of MP3 audio,
slide show (JPEG images) or timed text. The protected resources are described by the Resource
element. The structure of the IPMP Components Base Profile is shown in Figure 31.
Figure 31 — IPMP Component Base Profile
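The localID/localidref linkage described above can be resolved mechanically; a sketch using only the element and attribute names that appear in Table 7 (namespace-prefix handling is simplified):

```python
import xml.etree.ElementTree as ET

IPMPINFO_NS = "urn:mpeg:mpeg21:2004:01-IPMPINFO-BASE-NS"

def resolve_tools(didl_xml):
    """Map each ToolRef/@localidref back to the IPMPToolID declared in
    the ToolList via ToolDescription/@localID, returning
    {localidref: tool_id} for every tool reference in the document."""
    root = ET.fromstring(didl_xml)
    tools = {}
    for desc in root.iter(f"{{{IPMPINFO_NS}}}ToolDescription"):
        tools[desc.get("localID")] = desc.findtext(f"{{{IPMPINFO_NS}}}IPMPToolID")
    refs = {}
    for ref in root.iter(f"{{{IPMPINFO_NS}}}ToolRef"):
        refs[ref.get("localidref")] = tools.get(ref.get("localidref"))
    return refs
```

Applied to the Table 7 example, every localidref of "10" resolves to the AES-128-CBC tool.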
Table 7 — Metadata example of protecting contents in Protected Musical slide show AF
<?xml version="1.0" encoding="UTF-8"?>
<DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS"
      xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS"
      xmlns:ipmpdidl="urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS"
      xmlns:ipmpinfo="urn:mpeg:mpeg21:2004:01-IPMPINFO-BASE-NS"
      xmlns:mx="urn:mpeg:mpeg21:2003:01-REL-MX-NS"
      xmlns:r="urn:mpeg:mpeg21:2003:01-REL-R-NS"
      xmlns:sx="urn:mpeg:mpeg21:2003:01-REL-SX-NS"
      xmlns:enc="http://www.w3.org/2001/04/xmlenc#"
      xmlns:dsig="http://www.w3.org/2000/09/xmldsig#"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-REL-R-NS rel-r.xsd
                          urn:mpeg:mpeg21:2003:01-REL-MX-NS rel-mx.xsd
                          urn:mpeg:mpeg21:2004:01-IPMPINFO-BASE-NS IPMPInfo-Profilev0.4.xsd
                          urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS IPMPDIDL.xsd
                          urn:mpeg:mpeg21:2002:01-DII-NS dii.xsd
                          urn:mpeg:mpeg21:2002:02-DIDL-NS DIDL.xsd">
  <Container>
    <Descriptor>
      <Statement mimeType="text/xml">
        <ipmpinfo:IPMPGeneralInfoDescriptor>
          <ipmpinfo:ToolList>
            <ipmpinfo:ToolDescription localID="10">
              <ipmpinfo:IPMPToolID>AES-128-CBC</ipmpinfo:IPMPToolID>
            </ipmpinfo:ToolDescription>
          </ipmpinfo:ToolList>
          <ipmpinfo:LicenseCollection>
            <ipmpinfo:RightsDescriptor>
              <ipmpinfo:License>
                <!-- license information -->
              </ipmpinfo:License>
            </ipmpinfo:RightsDescriptor>
          </ipmpinfo:LicenseCollection>
        </ipmpinfo:IPMPGeneralInfoDescriptor>
      </Statement>
    </Descriptor>
    <Descriptor>
      <Statement mimeType="application/ipmp">
        <ipmpdidl:ProtectedAsset mimeType="application/laser">
          <ipmpdidl:Identifier>
            <dii:Identifier>IPMPId001</dii:Identifier>
          </ipmpdidl:Identifier>
          <ipmpdidl:Info>
            <ipmpinfo:IPMPInfoDescriptor>
              <ipmpinfo:Tool>
                <ipmpinfo:ToolRef localidref="10"/>
              </ipmpinfo:Tool>
            </ipmpinfo:IPMPInfoDescriptor>
          </ipmpdidl:Info>
          <ipmpdidl:Contents>
            <![CDATA[-------------- animation script or code ---------]]>
          </ipmpdidl:Contents>
        </ipmpdidl:ProtectedAsset>
      </Statement>
    </Descriptor>
    <Item id="1">
      <Component>
        <Resource mimeType="application/ipmp">
          <ipmpdidl:ProtectedAsset mimeType="audio/mp3">
            <ipmpdidl:Identifier>
              <dii:Identifier>IPMPId002</dii:Identifier>
            </ipmpdidl:Identifier>
            <ipmpdidl:Info>
              <ipmpinfo:IPMPInfoDescriptor>
                <ipmpinfo:Tool>
                  <ipmpinfo:ToolRef localidref="10"/>
                </ipmpinfo:Tool>
              </ipmpinfo:IPMPInfoDescriptor>
            </ipmpdidl:Info>
            <ipmpdidl:Contents ref="#mp(/byte(0,4550000))"/>
          </ipmpdidl:ProtectedAsset>
        </Resource>
      </Component>
    </Item>
    <Item id="2">
      <Component>
        <Resource mimeType="application/ipmp">
          <ipmpdidl:ProtectedAsset mimeType="image/jpeg">
            <ipmpdidl:Identifier>
              <dii:Identifier>IPMPId003</dii:Identifier>
            </ipmpdidl:Identifier>
            <ipmpdidl:Info>
              <ipmpinfo:IPMPInfoDescriptor>
                <ipmpinfo:Tool>
                  <ipmpinfo:ToolRef localidref="10"/>
                </ipmpinfo:Tool>
              </ipmpinfo:IPMPInfoDescriptor>
            </ipmpdidl:Info>
            <ipmpdidl:Contents ref="#mp(/byte(4550000,4550370))"/>
          </ipmpdidl:ProtectedAsset>
        </Resource>
      </Component>
      <Component>
        <Resource mimeType="application/ipmp">
          <ipmpdidl:ProtectedAsset mimeType="image/jpeg">
            <ipmpdidl:Identifier>
              <dii:Identifier>IPMPId004</dii:Identifier>
            </ipmpdidl:Identifier>
            <ipmpdidl:Info>
              <ipmpinfo:IPMPInfoDescriptor>
                <ipmpinfo:Tool>
                  <ipmpinfo:ToolRef localidref="10"/>
                </ipmpinfo:Tool>
              </ipmpinfo:IPMPInfoDescriptor>
            </ipmpdidl:Info>
            <ipmpdidl:Contents ref="#mp(/byte(4550370,4550892))"/>
          </ipmpdidl:ProtectedAsset>
        </Resource>
      </Component>
    </Item>
    <Item id="3">
      <Component>
        <Resource mimeType="application/ipmp">
          <ipmpdidl:ProtectedAsset mimeType="text/txt">
            <ipmpdidl:Identifier>
              <dii:Identifier>IPMPId005</dii:Identifier>
            </ipmpdidl:Identifier>
            <ipmpdidl:Info>
              <ipmpinfo:IPMPInfoDescriptor>
                <ipmpinfo:Tool>
                  <ipmpinfo:ToolRef localidref="10"/>
                </ipmpinfo:Tool>
              </ipmpinfo:IPMPInfoDescriptor>
            </ipmpdidl:Info>
            <ipmpdidl:Contents ref="#mp(/byte(4550892,4550900))"/>
          </ipmpdidl:ProtectedAsset>
        </Resource>
      </Component>
    </Item>
  </Container>
</DIDL>
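The Contents elements above locate each protected resource by an MPEG-21 fragment-style byte-range reference into 'mdat'. A parsing sketch follows; the two numbers are read here as (start, end) offsets, which matches the contiguous ranges in the Table 7 example, though the exact fragment grammar is defined by the MPEG-21 Fragment Identifier specification and the function name is illustrative:

```python
import re

BYTE_REF = re.compile(r"/byte\(\s*(\d+)\s*,\s*(\d+)\s*\)")

def extract_protected(ref, mdat):
    """Pull a protected resource's bytes out of 'mdat' from a reference
    such as '#mp(/byte(4550000,4550370))', treating the two numbers as
    (start, end) byte offsets."""
    m = BYTE_REF.search(ref)
    if not m:
        raise ValueError(f"no byte range in {ref!r}")
    start, end = int(m.group(1)), int(m.group(2))
    return mdat[start:end]
```

The extracted bytes would then be handed to the decryption tool signalled by the resource's ToolRef before decoding.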
Several protection scenarios are possible:
Protecting individual items
Each resource (MP3 audio, JPEG images, 3GPP timed text, and LASeR script animation) can be
protected individually:
Figure 32 shows how the protection can be described in the metadata for the MP3 audio only.
Figure 33 shows how the protection can be described in the metadata for the JPEG slide show
(all JPEGs in the slide show track are treated as one single slide show item).
Figure 34 shows how the protection can be described in the metadata for the 3GPP timed text
only.
Figure 35 shows how the protection can be described for the LASeR script only.
[Diagram: MAF file layout; the IPMPInfo descriptor is attached to the audio/mp3 Resource, and the MP3 data in 'mdat' is protected.]
Figure 32 — Protecting MP3 audio
[Diagram: MAF file layout; IPMPInfo descriptors are attached to each image/jpeg Resource, and the JPEG data in 'mdat' is protected.]
Figure 33 — Protecting JPEG slide show
[Diagram: MAF file layout; the IPMPInfo descriptor is attached to the text Resource, and the 3GPP TS 26.245 timed text in 'mdat' is protected.]
Figure 34 — Protecting 3GPP timed text
[Diagram: MAF file layout; the LASeR script Descriptor carries the IPMPInfo and is protected, while the media data in 'mdat' remain unprotected.]
Figure 35 — Protecting LASeR script
Table 8 shows a metadata instantiation example in which only the MP3 audio is protected.
Table 8 — Metadata instantiation example of protecting MP3 audio
<?xml version="1.0" encoding="UTF-8"?> <DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS" xmlns:ipmpdidl="urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS" xmlns:ipmpinfo="urn:mpeg:mpeg21:2004:01-IPMPINFO- BASE-NS" xmlns:mx="urn:mpeg:mpeg21:2003:01-REL-MX-NS" xmlns:r="urn:mpeg:mpeg21:2003:01-REL-R-NS" xmlns:sx="urn:mpeg:mpeg21:2003:01-REL-SX-NS" xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-REL-R-NS rel-r.xsd urn:mpeg:mpeg21:2003:01-REL-MX-NS rel-mx.xsd urn:mpeg:mpeg21:2004:01IPMPINFO-BASE-NS IPMPInfo-Profilev0.4.xsd urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS IPMPDIDL.xsd urn:mpeg:mpeg21:2002:01-DII-NS dii.xsd urn:mpeg:mpeg21:2002:02-DIDL-NS DIDL.xsd"> <Container> <Descriptor> <Statement mimeType="text/xml"> <ipmpinfo:IPMPGeneralInfoDescriptor> <ipmpinfo:ToolList> <ipmpinfo:ToolDescription localID="10"> <ipmpinfo:IPMPToolID>AES-128-CBC</ipmpinfo:IPMPToolID> </ipmpinfo:ToolDescription> </ipmpinfo:ToolList> <ipmpinfo:LicenseCollection> <ipmpinfo:RightsDescriptor> <ipmpinfo:License> <!-- license information --> </ipmpinfo:License> </ipmpinfo:RightsDescriptor> </ipmpinfo:LicenseCollection> </ipmpinfo:IPMPGeneralInfoDescriptor> </Statement> </Descriptor>
<Descriptor> <Statement mimeType=" application/laser "> <![CDATA[-------------- animation script or code ---------]]> </Statement> </Descriptor> <Item id="1"> <Component> <Resource mimeType="application/ipmp"> <ipmpdidl:ProtectedAsset mimeType="audio/mp3"> <ipmpdidl:Identifier> <dii:Identifier>IPMPId001</dii:Identifier> </ipmpdidl:Identifier> <ipmpdidl:Info> <ipmpinfo:IPMPInfoDescriptor> <ipmpinfo:Tool> <ipmpinfo:ToolRef localidref="10"/> </ipmpinfo:Tool> </ipmpinfo:IPMPInfoDescriptor> </ipmpdidl:Info> <ipmpdidl:Contents ref="#mp (/byte(0, 4550000)"/> </ipmpdidl:ProtectedAsset> </Resource> </Component>
</Item> <Item id="2">
<Component> <Resource mimeType="image/jpeg” ref="#mp (/byte(4550000, 4550370)"/> </Component> <Component> <Resource mimeType="image/jpeg” ref="#mp (/byte(4550370, 4550892)"/> </Component>
<Component> <Resource mimeType="image/jpeg” ref="#mp (/byte(4550892, 4551006)"/> </Component>
</Item>
<Item id="3"> <Component> <Resource mimeType="text/txt" ref="#mp (/byte(4551006, 4551078)"/> </Component> </Item> </Container> </DIDL>
Table 9 shows a metadata instantiation example in which the LASeR animation script for the
JPEG slide show is protected.
Table 9 — Metadata instantiation example of protecting LASeR animation script
<?xml version="1.0" encoding="UTF-8"?> <DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS" xmlns:ipmpdidl="urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS" xmlns:ipmpinfo="urn:mpeg:mpeg21:2004:01-IPMPINFO-BASE-NS" xmlns:mx="urn:mpeg:mpeg21:2003:01-REL-MX-NS" xmlns:r="urn:mpeg:mpeg21:2003:01-REL-R-NS" xmlns:sx="urn:mpeg:mpeg21:2003:01-REL-SX-NS" xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-REL-R-NS rel-r.xsd urn:mpeg:mpeg21:2003:01-REL-MX-NS rel-mx.xsd urn:mpeg:mpeg21:2004:01-IPMPINFO-BASE-NS IPMPInfo-Profilev0.4.xsd urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS IPMPDIDL.xsd urn:mpeg:mpeg21:2002:01-DII-NS dii.xsd urn:mpeg:mpeg21:2002:02-DIDL-NS DIDL.xsd"> <Container> <Descriptor> <Statement mimeType="text/xml"> <ipmpinfo:IPMPGeneralInfoDescriptor> <ipmpinfo:ToolList> <ipmpinfo:ToolDescription localID="10"> <ipmpinfo:IPMPToolID>AES-128-CBC</ipmpinfo:IPMPToolID> </ipmpinfo:ToolDescription> </ipmpinfo:ToolList> <ipmpinfo:LicenseCollection> <ipmpinfo:RightsDescriptor> <ipmpinfo:License> <!-- license information --> </ipmpinfo:License> </ipmpinfo:RightsDescriptor> </ipmpinfo:LicenseCollection> </ipmpinfo:IPMPGeneralInfoDescriptor> </Statement> </Descriptor>
<Descriptor> <Statement mimeType="application/laser"> <ipmpdidl:ProtectedAsset mimeType="application/laser"> <ipmpdidl:Identifier> <dii:Identifier>IPMPId001</dii:Identifier> </ipmpdidl:Identifier> <ipmpdidl:Info> <ipmpinfo:IPMPInfoDescriptor> <ipmpinfo:Tool> <ipmpinfo:ToolRef localidref="10"/> </ipmpinfo:Tool> </ipmpinfo:IPMPInfoDescriptor> </ipmpdidl:Info> <ipmpdidl:Contents> <![CDATA[-------------- animation script or code ---------]]> </ipmpdidl:Contents> </ipmpdidl:ProtectedAsset> </Statement> </Descriptor> <Item id="1"> <Component> <Resource mimeType="audio/mp3" ref="#mp (/byte(0, 4550000)"/>
</Component> </Item> <Item id="2">
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4550000, 4550370)"/> </Component> <Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4550370, 4550892)"/> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4550892, 4551024)"/> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4551024, 4551812)"/> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4551812, 4552002)"/> </Component>
</Item> <Item id="3">
<Component> <Resource mimeType="text/txt" ref="#mp (/byte(4552002, 4552082)"/> </Component> </Item> </Container> </DIDL>
Protecting a combination of individual items
All of the resources, or any combination of them (e.g. the MP3 audio and the JPEG images, or the
JPEG images and their animation), can be protected at the same time. Figure 36 shows the
protection description for protecting the JPEG images and the slide show animation.
[Diagram: MAF file layout; IPMPInfo descriptors are attached to the LASeR script Descriptor and to each image/jpeg Resource; the JPEG data in 'mdat' are protected.]
Figure 36 — Protecting JPEG slide show and LASeR script
Protecting one or more JPEG images
Each JPEG image inside the image track can be protected individually or collectively. Figure 37
shows the protection description for protecting two JPEG images.
[Diagram: MAF file layout; IPMPInfo descriptors are attached to the first two image/jpeg Resources only, while the remaining media data are unprotected.]
Figure 37 — Protecting two JPEG images
Table 10 shows a metadata instantiation example in which two JPEG images are protected.
Table 10 — Metadata instantiation example of protecting two JPEG images in slide show
<?xml version="1.0" encoding="UTF-8"?> <DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS" xmlns:ipmpdidl="urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS" xmlns:ipmpinfo="urn:mpeg:mpeg21:2004:01-IPMPINFO- BASE-NS" xmlns:mx="urn:mpeg:mpeg21:2003:01-REL-MX-NS" xmlns:r="urn:mpeg:mpeg21:2003:01-REL-R-NS" xmlns:sx="urn:mpeg:mpeg21:2003:01-REL-SX-NS" xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-REL-R-NS rel-r.xsd urn:mpeg:mpeg21:2003:01-REL-MX-NS rel-mx.xsd urn:mpeg:mpeg21:2004:01IPMPINFO-BASE-NS IPMPInfo-Profilev0.4.xsd urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS IPMPDIDL.xsd urn:mpeg:mpeg21:2002:01-DII-NS dii.xsd urn:mpeg:mpeg21:2002:02-DIDL-NS DIDL.xsd"> <Container> <Descriptor> <Statement mimeType="text/xml"> <ipmpinfo:IPMPGeneralInfoDescriptor> <ipmpinfo:ToolList> <ipmpinfo:ToolDescription localID="10"> <ipmpinfo:IPMPToolID>AES-128-CBC</ipmpinfo:IPMPToolID> </ipmpinfo:ToolDescription> </ipmpinfo:ToolList> <ipmpinfo:LicenseCollection> <ipmpinfo:RightsDescriptor> <ipmpinfo:License> <!-- license information --> </ipmpinfo:License> </ipmpinfo:RightsDescriptor> </ipmpinfo:LicenseCollection> </ipmpinfo:IPMPGeneralInfoDescriptor> </Statement> </Descriptor>
<Descriptor> <Statement mimeType="application/laser"> <![CDATA[-------------- animation script or code ---------]]> </Statement> </Descriptor> <Item id="1"> <Component> <Resource mimeType="audio/mp3" ref="#mp (/byte(0, 4550000)"/> </Component>
</Item> <Item id="2">
<Component> <Resource mimeType="application/ipmp"> <ipmpdidl:ProtectedAsset mimeType="image/jpeg"> <ipmpdidl:Identifier> <dii:Identifier>IPMPId001</dii:Identifier> </ipmpdidl:Identifier> <ipmpdidl:Info> <ipmpinfo:IPMPInfoDescriptor> <ipmpinfo:Tool> <ipmpinfo:ToolRef localidref="10"/> </ipmpinfo:Tool> </ipmpinfo:IPMPInfoDescriptor> </ipmpdidl:Info> <ipmpdidl:Contents ref="#mp (/byte(4550000, 4550370)"/> </ipmpdidl:ProtectedAsset> </Resource> </Component> <Component> <Resource mimeType="application/ipmp"> <ipmpdidl:ProtectedAsset mimeType="image/jpeg"> <ipmpdidl:Identifier> <dii:Identifier>IPMPId002</dii:Identifier> </ipmpdidl:Identifier> <ipmpdidl:Info> <ipmpinfo:IPMPInfoDescriptor> <ipmpinfo:Tool> <ipmpinfo:ToolRef localidref="10"/> </ipmpinfo:Tool> </ipmpinfo:IPMPInfoDescriptor> </ipmpdidl:Info> <ipmpdidl:Contents ref="#mp (/byte(4550370, 4550892)"/> </ipmpdidl:ProtectedAsset> </Resource> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4550892, 4551024)"/> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4551024, 4551812)"/> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4551812, 4552002)"/> </Component>
</Item> <Item id="3">
<Component> <Resource mimeType="text/txt" ref="#mp (/byte(4552002, 4552082)"/> </Component> </Item> </Container> </DIDL>
Protecting a specific segment of a resource
Using the MPEG-21 Fragment Identification (ISO/IEC 21000-17), it is possible to protect a specific
segment of the content, such as a defined byte range of the MP3 audio or a defined rectangular
region of a JPEG image, as shown in Figure 38 and Figure 39,
respectively.
[Diagram: MAF file layout; an Anchor with a Fragment reference and IPMPInfo is attached to the audio/mp3 Resource; only a partition of the MP3 data in 'mdat' is protected.]
Figure 38 — Protecting specific segment in MP3 audio
[Diagram: MAF file layout; an Anchor with a Fragment reference and IPMPInfo is attached to one image/jpeg Resource; only a region of that JPEG in 'mdat' is protected.]
Figure 39 — Protecting specific segment in a JPEG image
Table 11 shows a metadata instantiation example in which a specific segment of the MP3
audio data (from byte 35,430 to byte 3,200,234) is protected.
Table 11 — Metadata instantiation example of protecting specific segment of MP3 audio
<?xml version="1.0" encoding="UTF-8"?> <DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS" xmlns:ipmpdidl="urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS" xmlns:ipmpinfo="urn:mpeg:mpeg21:2004:01-IPMPINFO- BASE-NS" xmlns:mx="urn:mpeg:mpeg21:2003:01-REL-MX-NS" xmlns:r="urn:mpeg:mpeg21:2003:01-REL-R-NS" xmlns:sx="urn:mpeg:mpeg21:2003:01-REL-SX-NS" xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-REL-R-NS rel-r.xsd urn:mpeg:mpeg21:2003:01-REL-MX-NS rel-mx.xsd urn:mpeg:mpeg21:2004:01IPMPINFO-BASE-NS IPMPInfo-Profilev0.4.xsd urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS IPMPDIDL.xsd urn:mpeg:mpeg21:2002:01-DII-NS dii.xsd urn:mpeg:mpeg21:2002:02-DIDL-NS DIDL.xsd"> <Container> <Descriptor> <Statement mimeType="text/xml"> <ipmpinfo:IPMPGeneralInfoDescriptor>
<ipmpinfo:ToolList> <ipmpinfo:ToolDescription localID="10"> <ipmpinfo:IPMPToolID>AES-128-CBC</ipmpinfo:IPMPToolID> </ipmpinfo:ToolDescription> </ipmpinfo:ToolList> <ipmpinfo:LicenseCollection> <ipmpinfo:RightsDescriptor> <ipmpinfo:License> <!-- license information --> </ipmpinfo:License> </ipmpinfo:RightsDescriptor> </ipmpinfo:LicenseCollection> </ipmpinfo:IPMPGeneralInfoDescriptor> </Statement> </Descriptor>
<Descriptor> <Statement mimeType=" application/laser "> <![CDATA[-------------- animation script or code ---------]]> </Statement> </Descriptor> <Item id="1"> <Component> <Resource mimeType="audio/mp3" ref="#mp (/byte(0, 4550000)"/>
<Anchor>
<Fragment fragmentId="#mp (/byte(35430, 3200234) "> <ipmpdidl:ProtectedAsset mimeType="audio/mp3">
<ipmpdidl:Identifier> <dii:Identifier>IPMPId001</dii:Identifier> </ipmpdidl:Identifier> <ipmpdidl:Info> <ipmpinfo:IPMPInfoDescriptor> <ipmpinfo:Tool> <ipmpinfo:ToolRef localidref="10"/> </ipmpinfo:Tool> </ipmpinfo:IPMPInfoDescriptor> </ipmpdidl:Info> </ipmpdidl:ProtectedAsset> </Fragment>
</Anchor> </Component>
</Item> <Item id="2">
<Component> <Resource mimeType="image/jpeg” ref="#mp (/byte(4550000, 4550370)"/> </Component> <Component> <Resource mimeType="image/jpeg” ref="#mp (/byte(4550370, 4550892)"/> </Component>
<Component> <Resource mimeType="image/jpeg” ref="#mp (/byte(4550892, 4551006)"/> </Component>
</Item> <Item id="3">
<Component> <Resource mimeType="text/txt" ref="#mp (/byte(4551006, 4551078)"/> </Component> </Item> </Container> </DIDL>
Table 12 shows a metadata instantiation example in which specific regions of two JPEG
images are protected. The first image is protected in a rectangular region whose upper-left pixel
coordinate is [20, 20] and whose lower-right pixel coordinate is [40, 40]. The second protected image
(the third image of the slide show) is protected in an elliptical region whose circumscribing rectangle
runs from the upper-left coordinate [45, 30] to the lower-right coordinate [120, 120].
Table 12 — Metadata instantiation example of protecting specific segment of a JPEG image
<?xml version="1.0" encoding="UTF-8"?> <DIDL xmlns="urn:mpeg:mpeg21:2002:02-DIDL-NS" xmlns:dii="urn:mpeg:mpeg21:2002:01-DII-NS" xmlns:ipmpdidl="urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS" xmlns:ipmpinfo="urn:mpeg:mpeg21:2004:01-IPMPINFO- BASE-NS" xmlns:mx="urn:mpeg:mpeg21:2003:01-REL-MX-NS" xmlns:r="urn:mpeg:mpeg21:2003:01-REL-R-NS" xmlns:sx="urn:mpeg:mpeg21:2003:01-REL-SX-NS" xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mpeg:mpeg21:2003:01-REL-R-NS rel-r.xsd urn:mpeg:mpeg21:2003:01-REL-MX-NS rel-mx.xsd urn:mpeg:mpeg21:2004:01IPMPINFO-BASE-NS IPMPInfo-Profilev0.4.xsd urn:mpeg:mpeg21:2004:01-IPMPDIDL-NS IPMPDIDL.xsd urn:mpeg:mpeg21:2002:01-DII-NS dii.xsd urn:mpeg:mpeg21:2002:02-DIDL-NS DIDL.xsd"> <Container> <Descriptor> <Statement mimeType="text/xml"> <ipmpinfo:IPMPGeneralInfoDescriptor> <ipmpinfo:ToolList> <ipmpinfo:ToolDescription localID="10"> <ipmpinfo:IPMPToolID>AES-128-CBC</ipmpinfo:IPMPToolID> </ipmpinfo:ToolDescription> </ipmpinfo:ToolList> <ipmpinfo:LicenseCollection> <ipmpinfo:RightsDescriptor> <ipmpinfo:License> <!-- license information --> </ipmpinfo:License> </ipmpinfo:RightsDescriptor> </ipmpinfo:LicenseCollection> </ipmpinfo:IPMPGeneralInfoDescriptor> </Statement> </Descriptor>
<Descriptor> <Statement mimeType="application/laser"> <![CDATA[-------------- animation script or code ---------]]> </Statement> </Descriptor> <Item id="1"> <Component> <Resource mimeType="audio/mp3" ref="#mp (/byte(0, 4550000)"/> </Component>
</Item> <Item id="2">
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4550000, 4550370)"/>
<Anchor>
<Fragment fragmentId="#mp (/region(rect(20,20,40,40))) "> <ipmpdidl:ProtectedAsset mimeType="image/jpeg">
<ipmpdidl:Identifier> <dii:Identifier>IPMPId001</dii:Identifier> </ipmpdidl:Identifier> <ipmpdidl:Info> <ipmpinfo:IPMPInfoDescriptor> <ipmpinfo:Tool> <ipmpinfo:ToolRef localidref="10"/> </ipmpinfo:Tool> </ipmpinfo:IPMPInfoDescriptor> </ipmpdidl:Info> </ipmpdidl:ProtectedAsset> </Fragment>
</Anchor> </Component> <Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4550370, 4550892)"/> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4550892, 4551024)"/>
<Anchor>
<Fragment fragmentId="#mp (~region(ellipse(45,30,120,120))) "> <ipmpdidl:ProtectedAsset mimeType="image/jpeg">
<ipmpdidl:Identifier> <dii:Identifier>IPMPId002</dii:Identifier> </ipmpdidl:Identifier> <ipmpdidl:Info> <ipmpinfo:IPMPInfoDescriptor> <ipmpinfo:Tool> <ipmpinfo:ToolRef localidref="10"/> </ipmpinfo:Tool> </ipmpinfo:IPMPInfoDescriptor> </ipmpdidl:Info> </ipmpdidl:ProtectedAsset> </Fragment>
</Anchor> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4551024, 4551812)"/> </Component>
<Component> <Resource mimeType="image/jpeg" ref="#mp (/byte(4551812, 4552002)"/> </Component>
</Item> <Item id="3">
<Component> <Resource mimeType="text/txt" ref="#mp (/byte(4552002, 4552082)"/> </Component> </Item> </Container> </DIDL>
Moreover, it is also possible to combine the protection of a specific segment of one resource with
the protection of other content, e.g. protecting a specific segment of the MP3 audio together with
the JPEG images that are synchronized to the timestamps of that protected segment, as shown in
Figure 40.
[Diagram: MAF file layout; Anchors with Fragment references and IPMPInfo are attached to the audio/mp3 Resource and to one image/jpeg Resource; the corresponding partitions in 'mdat' are protected.]
Figure 40 — Protecting specific segment in MP3 audio as well as specific segment in a JPEG image
3.4. MPEG-A Part 10: Video Surveillance Application Format
3.4.1. Overview
The Video Surveillance AF is a file format designed to provide a first level of interoperability
for video-based surveillance systems. It contains MPEG-4 AVC Baseline profile video and
associated MPEG-7 metadata; usage of other coded video formats is outside its scope. The
Video surveillance application format is not intended to directly accommodate legacy content
and components. Rather, it is intended to provide a lightweight and useful wrapper for video
content, based on the MPEG technologies that are the best fit for purpose at the date of expected
finalisation. However, the description of any relation existing between Video surveillance
application format content and other video content will need to be addressed. Currently, JPEG and
MPEG-4 Part 2 are arguably the most commonly deployed digital video standards. However, in
due course it is expected that AVC will be more commonly deployed, not least because it is
understood to be the most 'fit-for-purpose' of the available video technologies.
3.4.2. File format
A Video Surveillance AF contains of a set of self-contained AF fragments which are connected to
each other. A Video surveillance application format fragment covers a limited amount of time.
Each Video surveillance application format fragment is identified by an UUID (universal unique
identifier) which is linked to a predecessor and successor fragment through their UUIDs as shown
in Figure 41. All Video surveillance application format data is stored within the Video
surveillance application format fragments. If a fragment has no predecessor or successor its value
is set the current fragment. Additionally an URI can be given serving as a hint to the location of
the predecessor and successor fragments. A Video surveillance application format fragment
remains self contained even if unhinged. Note that there is no requirement to use more than one
Video surveillance application format fragment. The concept of using fragments e.g. enables ring
buffer architectures. Each fragment shall be a valid AVC file as defined by the AVC file format.
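The linking rules above can be sketched as a doubly linked chain of fragment headers. The class and function names below are illustrative, not taken from the specification; note how an unlinked fragment points to itself, as required:

```python
import uuid

class VSFragment:
    """Minimal sketch of a VS AF fragment header (illustrative field names)."""
    def __init__(self, start_time, duration):
        self.uuid = uuid.uuid4()
        self.start_time = start_time
        self.duration = duration
        # A fragment with no predecessor or successor points to itself.
        self.predecessor = self.uuid
        self.successor = self.uuid

def link(fragments):
    """Chain fragments in recording order via their UUIDs."""
    for prev, nxt in zip(fragments, fragments[1:]):
        prev.successor = nxt.uuid
        nxt.predecessor = prev.uuid

a, b, c = (VSFragment(t, 60) for t in (0, 60, 120))
link([a, b, c])
assert a.predecessor == a.uuid   # first fragment: no predecessor
assert b.predecessor == a.uuid and b.successor == c.uuid
assert c.successor == c.uuid     # last fragment: no successor
```

A ring buffer would simply relink these UUIDs as old fragments are discarded and new ones appended.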
To store the compressed video data in sample units together with its corresponding metadata, the
Video surveillance application format provides a file structure based on the ISO Base Media File
Format, which takes an object-oriented approach to the design of boxes for data storage. The file
structure of the Video surveillance application format contains the boxes specifically designed for
the AVC file format.
Figure 42 shows the file structure of the Video surveillance application format, which consists of
all the MPEG-4 file format mandatory boxes: the file type ('ftyp') box to store the information
identifying the file format, the movie container ('moov') box to store the location and timing
information of video samples, and the media data container ('mdat') box to store the compressed
video data in sample units. The sample unit for AVC bitstreams is the NAL (Network Abstraction
Layer) unit.
In the file type box, the major brand 'vsf1', which stands for 'video surveillance format 1', is
specified to identify the Video surveillance application format. This means any Video surveillance
application format-capable player should identify this brand in order to parse the file structure of
the Video surveillance application format and to handle its contents.
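A reader can recognize a VS AF file by checking the major brand of the leading 'ftyp' box. The sketch below parses that box according to the generic ISO Base Media box layout (32-bit size, then a four-character type, then the major brand); the helper itself is an illustrative assumption, not code from the standard:

```python
import struct

def major_brand(data):
    """Return the major brand of the leading 'ftyp' box, or None."""
    if len(data) < 12:
        return None
    size, box_type = struct.unpack(">I4s", data[:8])
    if box_type != b"ftyp":
        return None
    return data[8:12].decode("ascii")  # major_brand precedes minor_version

# A synthetic 16-byte ftyp box: size, 'ftyp', major brand 'vsf1', minor version 0.
ftyp = struct.pack(">I4s4sI", 16, b"ftyp", b"vsf1", 0)
print(major_brand(ftyp))  # vsf1
```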
[Diagram: a chain of VS AF fragments (n-1, n, n+1); each fragment carries a UUID, start time, duration, predecessor UUID, and successor UUID, linking it to its neighbouring fragments.]
Figure 41 — VS AF fragments
In the movie container box, each track box carries information about the video contents in
terms of locations, times, durations, track type handlers, AVC-specific information (profiles and
levels for codec types, bit rates, frame rates, screen sizes), media information, etc., which is used
to identify and enable access to the corresponding media contents. By default, at least one
video track shall exist in the Video surveillance application format. The specification of the Video
surveillance application format allows only one video data type, encoded using the AVC
Baseline profile up to Level 3.1. The handler type for each media handler box is specified based on
the content it refers to. Each video track is identified by the 'vide' handler type, while each timed
metadata track is signalled by the 'meta' handler type.
The Video surveillance application format requires the capture timestamp to be stored for every
video frame. The timestamps are stored in timestamp metadata samples in a time-parallel metadata
track which is linked to the video track by means of a track reference of type 'vsmd'. For every
video sample, a timestamp metadata sample shall exist with decoding time equal to the decoding
time of the corresponding video sample. The Video surveillance application format defines the
storage of a binary-coded timestamp for all video samples of a video track. However, future
versions of the Video surveillance application format might store more information about a video
sample. The video surveillance metadata sample entry contains a version number to inform the
reader of the sample format used in this metadata track.
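The requirement that every video sample has a timestamp metadata sample with an equal decoding time can be checked with a simple set difference over the two sample tables. The function below is an illustrative sketch with hypothetical decoding-time inputs:

```python
def missing_timestamps(video_decode_times, metadata_decode_times):
    """Return decoding times of video samples lacking a timestamp sample."""
    return sorted(set(video_decode_times) - set(metadata_decode_times))

video = [0, 3000, 6000, 9000]    # decoding times from the video track
meta = [0, 3000, 9000]           # timestamp sample for t=6000 is missing
print(missing_timestamps(video, meta))  # [6000]
```

A conforming VS AF file would yield an empty list here for every video track.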
[Diagram: VS AF file structure; an 'ftyp' box, a file-level 'meta' box holding 'vsmi' and MPEG-7 XML, and a 'moov' box with AVC video tracks and a timed metadata track, where each track carries a track-level 'meta' box ('cami' with MPEG-7 XML) and a sample table ('stbl'); media samples reside in 'mdat'.]
Figure 42 — VS AF file structure
3.4.3. System architecture
Creating a Video surveillance application format file involves the use of movie fragments (not to
be confused with Video surveillance application format fragments). In general, movie and track
fragments extend a presentation in time. All fragments must be stored in the sequence given by an
ordinal sequence number. Each movie fragment contains one or more track fragments for all
tracks in the movie. Using movie fragments, the video data can be read while the Video
surveillance application format file is still being stored and created. Figure 43 illustrates how
movie fragments are used in the file format.
[Diagram: a 'moov' box with its tracks, an 'mfra' box whose 'tfra' entries hold track fragment run arrays with free space for appended array fields, and an 'mdat' box to which AVC video samples and timed metadata samples are appended over time.]
Figure 43 — Using movie fragment
3.4.4. AVC video
Because of the multi-functionality of MPEG-4 AVC, subsets of its tools have been defined in
order to allow effective implementations of the standard. These subsets, called 'Profiles', limit the
tool set which shall be implemented. For each of these Profiles, one or more Levels have been set
to restrict the computational complexity of implementations.
MPEG-4 AVC accepts various sizes of input picture within the capability specified by the
Profile and Level. In this AF, usage of the MPEG-4 AVC video codec is required. The Baseline
Profile tool set shall be used up to Level 3.1 (the maximum value of level_idc shall be 31). Both
constraint_set0_flag and constraint_set1_flag shall be set to 1. The MPEG-4 AVC Baseline
profile features and the Levels from 1 to 3.1 are shown in Table 13 and Table 14,
respectively.
Table 13 — MPEG-4 AVC Baseline profile
Features                                  Baseline
I and P Slices                            Yes
B Slices                                  No
SI and SP Slices                          No
Multiple Reference Frames                 Yes
In-Loop Deblocking Filter                 Yes
CAVLC Entropy Coding                      Yes
CABAC Entropy Coding                      No
Flexible Macroblock Ordering (FMO)        Yes
Arbitrary Slice Ordering (ASO)            Yes
Redundant Slices (RS)                     Yes
Data Partitioning                         No
Interlaced Coding (PicAFF, MBAFF)         No
4:2:0 Chroma Format                       Yes
Monochrome Video Format (4:0:0)           No
4:2:2 Chroma Format                       No
4:4:4 Chroma Format                       No
8 Bit Sample Depth                        Yes
9 and 10 Bit Sample Depth                 No
11 to 14 Bit Sample Depth                 No
8x8 vs. 4x4 Transform Adaptivity          No
Quantization Scaling Matrices             No
Separate Cb and Cr QP control             No
Separate Color Plane Coding               No
Predictive Lossless Coding                No
Table 14 — MPEG-4 AVC Levels up to level 3.1
Level    Max macroblocks   Max frame size   Max video bit rate (VCL)     Examples for high resolution @ frame
number   per second        (macroblocks)    for Baseline, Extended       rate (max stored frames) in Level
                                            and Main Profiles
1        1485              99               64 kbit/s                    128x96@30.9 (8)
1b       1485              99               128 kbit/s                   128x96@30.9 (8)
1.1      3000              396              192 kbit/s
1.2      6000              396              384 kbit/s                   320x240@20.0 (7)
1.3      11880             396              768 kbit/s                   320x240@36.0 (7)
2        11880             396              2 Mbit/s                     320x240@36.0 (7)
2.1      19800             792              4 Mbit/s                     352x480@30.0 (7)
2.2      20250             1620             4 Mbit/s
3        40500             1620             10 Mbit/s                    352x480@61.4 (12), 352x576@51.1 (10)
3.1      108000            3600             14 Mbit/s                    720x480@80.0 (13), 720x576@66.7 (11)
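The limits of Table 14 lend themselves to a direct conformance check. The dictionary below transcribes the table (macroblocks per second, frame size in macroblocks, Baseline VCL bit rate in kbit/s); the validation helper itself is an illustrative sketch, not part of the specification:

```python
# (max MB/s, max frame size in macroblocks, max Baseline VCL bit rate in kbit/s)
# per level, transcribed from Table 14.
LEVEL_LIMITS = {
    "1":   (1485,   99,   64),
    "1b":  (1485,   99,   128),
    "1.1": (3000,   396,  192),
    "1.2": (6000,   396,  384),
    "1.3": (11880,  396,  768),
    "2":   (11880,  396,  2000),
    "2.1": (19800,  792,  4000),
    "2.2": (20250,  1620, 4000),
    "3":   (40500,  1620, 10000),
    "3.1": (108000, 3600, 14000),
}

def conforms(level, mb_per_sec, frame_mbs, bitrate_kbps):
    """Check a stream's parameters against the Table 14 limits for a level."""
    max_mbps, max_fs, max_rate = LEVEL_LIMITS[level]
    return mb_per_sec <= max_mbps and frame_mbs <= max_fs and bitrate_kbps <= max_rate

# 720x576 = 45x36 = 1620 macroblocks at 25 fps -> 40500 MB/s: fits Level 3.
print(conforms("3", 1620 * 25, 1620, 8000))    # True
print(conforms("2.2", 1620 * 25, 1620, 8000))  # False (MB/s and bit rate too high)
```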
3.4.5. Metadata
To store the information about the time stamp of each video sample, the timed metadata track is
used in the „moov‟ box. In addition, to store the MPEG-7 metadata, the „meta‟ boxes are used.
There are two types of „meta‟ boxes used: file level „meta‟ box to store the file level metadata; and
track level „meta‟ box to store track level metadata.
File level metadata
The file level metadata is used to describe information regarding the identification and creation
time of a Video surveillance application format file and the textual annotation of the contents
within the Video surveillance application format. It also allows for the use of classification schemes
to describe the content. The metadata should be located at the top level of the Video surveillance
application format file in order to enable easy access to the identification information of the Video
surveillance application format file. In the file structure hierarchy of the Video surveillance
application format, this metadata is located at the file level, hence the name file level
metadata.
At file level, two different boxes may be used to store metadata. The AF Identification Box is
required, and shall be included in every Video surveillance application format fragment. The
'vsmi' box shall provide the Video surveillance application format fragment identification UUID
and the UUIDs of the successor and predecessor. It may also provide URIs to the successor and
predecessor. If these URIs are provided, it must be possible to resolve them. The box
covers the following information:
File identification: a UUID identifying every Video surveillance application format
fragment
Successor and predecessor identification: the UUIDs of the previous/next fragment in
composition time shall be included (URIs describing the corresponding locations may be
included)
The UTC-based timestamps of the first and the last sample in the video tracks
An additional metadata box containing further information may also be included in a Video
Surveillance AF fragment. If present, the metadata contained inside this additional box is represented
using MPEG-7. The following profile applies to the MPEG-7 in file level metadata:
1. General text annotation
a. Textual annotation should be added using either free text or structured annotation. All
elements within structured annotation are limited to zero or one.
2. For file level metadata:
a. ID
i. The UUID of the Video surveillance application format file shall be repeated in the
PublicIdentifier of the DescriptionMetadata element. There shall be one of
these descriptors.
ii. It is assumed the UUID of the camera already contains the information of the cluster the
camera belongs to.
b. Time
i. The Video surveillance application format start-time should be repeated at file-level
(CreationTime of the DescriptionMetadata element). This indicates the
creation time of the fragment.
c. Textual annotations should be added using Comment of the DescriptionMetadata
element. Zero or one of the types described in 1.a. shall be used.
d. Classification schemes and Terms can be defined and used as described in MPEG-7 MDS.
The cardinality of the Definition element of the TermDefinitionBaseType shall be
zero or one. The cardinality of the Name element of the TermDefinitionBaseType
shall be zero or one. There shall be zero of the preferred attribute of the Name element of
the TermDefinitionBaseType.
e. Maintaining object references
i. Object references are grouped using the Graph DS and referenced using Relation DS
elements. The objects can be any DS, as used within the XML document, e.g. a still
region of video at track level.
ii. The attributes within a Relation are restricted to contain unary values. E.g. source, target
and type can only contain a single reference to a Classification Scheme Term, id reference,
etc. i.e. the values of the termReferenceType.
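As an illustrative sketch of the file-level profile above, the following snippet assembles a minimal description. The element names (DescriptionMetadata, PublicIdentifier, CreationTime, Comment, FreeTextAnnotation) come from the profile text; the flat nesting and the omission of the MPEG-7 namespace are simplifying assumptions, not the normative schema:

```python
import xml.etree.ElementTree as ET

def build_file_level_metadata(fragment_uuid, creation_time, comment=None):
    """Sketch of a simplified file-level MPEG-7 description.

    The real schema uses the MPEG-7 namespace and richer structure;
    this only mirrors the cardinalities stated in the profile.
    """
    meta = ET.Element("DescriptionMetadata")
    # 2.a.i: the fragment UUID is repeated in PublicIdentifier (exactly one).
    ET.SubElement(meta, "PublicIdentifier").text = fragment_uuid
    # 2.b.i: the fragment start time is repeated as CreationTime.
    ET.SubElement(meta, "CreationTime").text = creation_time
    # 2.c: zero or one textual annotation via Comment.
    if comment is not None:
        c = ET.SubElement(meta, "Comment")
        ET.SubElement(c, "FreeTextAnnotation").text = comment
    return ET.tostring(meta, encoding="unicode")
```

The descriptor values would then be stored in the additional metadata box of the fragment alongside the required AF Identification Box.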
Track level metadata
The track level metadata is used to describe the content of the Video surveillance application
format. It describes the track identification information, camera equipment, timing information
for each track, text annotations describing events, the decomposition of frames, the locations of
objects as ROIs in the frame, the color appearance of the objects, and the identification of the
objects. The current specification of the Video surveillance application format uses a limited set
of MPEG-7 Visual descriptors, namely the dominant color and scalable color descriptors. Since
the track-level metadata describes the video content, the metadata with the MPEG-7 dominant
color and scalable color descriptor values is located inside the respective track boxes of the Video
surveillance application format, hence being called the track level metadata.
Track level metadata is included in two boxes. A required camera identifier box shall be included;
it provides the camera identification UUID and the user-defined identification extensions used to
create a particular track in a Video surveillance application format. The camera UUID should be
assigned to a physical camera or to a camera location. The camera identification box may be
enlarged (indicated by the size of the box) if storage of user-defined identification data is required.
This extra information might not be understood by all Video surveillance application format
readers. If there are alternative tracks holding different encodings from the same camera, then the
camera UUID shall be identical for all these alternate tracks. This box contains the camera
identification:
A UUID identifying the camera
Additional space for user-defined identification
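As a sketch of how such a box could be laid out on disk, the snippet below serializes a camera identifier box. The 4-byte size / 4-byte type header is the standard ISO base media file format box header; the exact payload layout (16 UUID bytes followed by the user-defined extension) and the use of the 'cami' type code mentioned later in the text are assumptions for illustration:

```python
import struct
import uuid

def build_camera_id_box(camera_uuid: uuid.UUID, user_data: bytes = b"") -> bytes:
    """Serialize a camera identifier box: standard ISO BMFF header
    (32-bit size, 4-character type) followed by the 16-byte camera UUID
    and optional user-defined identification bytes (hypothetical layout)."""
    payload = camera_uuid.bytes + user_data
    size = 8 + len(payload)  # header bytes plus payload bytes
    return struct.pack(">I4s", size, b"cami") + payload
```

Because the box size field covers the whole box, a reader that does not understand the user-defined extension can still skip over it correctly.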
An additional metadata box may be included. If present, the metadata contained inside this
additional box is presented using MPEG-7 with the following profiles:
a. Metadata for each track is described using the Video Segment DS, e.g. VideoType.
Only one of these types shall be used.
b. ID.
i. The camera id shall be repeated from the 'cami' box in PublicIdentifier of the
DescriptionMetadata element. There shall be one of these descriptors.
c. Equipment
i. The camera / cluster settings should be given (Instrument of the
DescriptionMetadata element). If present, zero or one of these types shall be used.
ii. Additional information regarding the cluster to which the camera belongs should be
given by EntityIdentifier and its VideoDomain element. If present, one of the
EntityIdentifier types shall be used and zero or more of the VideoDomain types
should be used. The VideoDomain elements reference entries from a Classification
Scheme (ClassificationScheme).
iii. The camera stream should be identified with StreamID. Zero or more of these types
shall be used. It is necessary to include the element InstanceIdentifier, although
this can be kept empty.
iv. The camera geographic position should be given using CreationLocation. Zero or
one of these types shall be used.
v. Camera calibration should be provided with the
Spatial2DCoordinateSystemType. Describing more than one of these
descriptors allows a calibration function to be provided for each preset of a PTZ camera.
Zero or one of these types shall be used.
vi. If the media is outside of the Video surveillance application format fragment and
referenced using the Data Reference Box ('dref'), the MediaURI shall contain the same
reference. If no Data Reference Box ('dref') is present, the MediaURI should contain a
valid reference to a media instance. It is necessary to include the element
InstanceIdentifier, although this can be kept empty.
d. Time
i. Video offset has no specific element, so the Description Metadata DS
(DescriptionMetadata), Instrument and its Tool Setting elements should be used.
The setting name is “offset”. If present, one of these types shall have the format and
precision as given in Section 6.
ii. The time of the video shall be given with a media time element (MediaTime). Within
this element one time point shall be given. Duration information can be also given here.
To specify the duration of a video decomposition MediaDuration should be used. This
is a representation of the duration of the track as given in Section 6.
iii. To isolate where the StillRegion exists in the video, the MediaTimePoint shall be
used.
e. Decomposition
i. Groups of frames should be defined within the video using the
TemporalDecomposition. If present, one of these types shall be used.
ii. Single frames should be decomposed using the StillRegion DS. If a frame
(StillRegion DS) is decomposed its time position shall be specified.
iii. To isolate a region within a frame, a choice shall be made between a Box or Polygon.
Zero or one of these can be described per StillRegion. If more regions are required
for a frame, then another StillRegion can be instantiated, referencing the same media
time point (mediaTimePoint).
f. Visual Descriptions
i. Color should be described in the StillRegion DS by the VisualDescriptor DS
or the GridLayout DS. The VisualDescriptor shall include one color descriptor.
The GridLayout can specify an arbitrary number of cells, each should contain one
color descriptor.
ii. DominantColor and ScalableColor shall be the only descriptors present from the
VisualDescriptor DS and GridLayout DS.
g. Semantic descriptions
i. A camera track should define semantic descriptions using TextAnnotation (see
1, "General text annotation"). This is possible with FreeTextAnnotation at the
DescriptionMetadata, Video and StillRegion levels.
ii. A camera track should define structured semantic descriptions using the
StructuredAnnotation (see 1, "General text annotation") at the
DescriptionMetadata, Video and StillRegion levels.
iii. In order to provide detailed meaning to semantic descriptions, Terms should be referenced
from Classification schemes (ClassificationScheme) – see 2.d.
h. Maintaining object references
i. Object references are defined as described in 2.e.
4. Implementation
4.1. MPEG-A Part 3 2nd Edition: Protected Music Player Application Format
4.1.1. Reference software
The reference software consists of a Protected Music player AF authoring tool and player, which
are built in C++ using MFC in Visual C++ 6.0. We use additional libraries to decode the
MP3 audio and JPEG image data and to parse the XML metadata. The FMOD SoundSystem
library by Firelight Technologies is used to decode the MP3 audio. The CxImage library by
Davide Pizzolato is used to decode JPEG data. To parse the XML metadata in the player, we use
the MSXML SDK. This reference software also uses the ISOLIB library to parse the ISO base
file format.
4.1.1.1. Authoring tool
The software architecture to create the aforementioned file format is shown in Figure 44. First,
the MP3 audio files and JPEG images are loaded into the creator. The creator handles the
list of candidate files to be packaged into the Protected Music player AF, categorized by
type: audio and images. Next, one audio file and one image file are selected to be bound in the
AF file. The binding information is used by the creator to define the file format structure and
protection information. Based on this information, the creator then loads the actual
resources and packages them into one or more tracks within a Protected Music player AF file.
Figure 44 — Software architecture of Protected Music player AF creator
In the user interface, as shown in Figure 45, the following procedure applies:
1. Add an MP3 file using the "Add MP3" button. Repeat this step to add more MP3 files.
2. Add a JPEG file using the "Add JPEG" button. Repeat this step to add more JPEG files.
3. Select the desired MP3 and JPEG files to be combined in one MAF track. The selected files
are shown in the text box in the Resource preview area.
4. Select the desired protection tool to be applied to the MAF file by pressing the "Protection
tool" button. The protection tool GUI is displayed as shown in Figure 46. Insert the
protection key in the protection key text input. Although currently not supported by the
reference software, it is also possible to add a remote reference to the protection tool.
5. Define the license for the MP3 resource by double-clicking the MP3 file in the list. The
License information GUI is displayed as shown in Figure 47.
6. Combine the MP3 and JPEG using the "Bind JPEG to MP3" button. This step creates one
MAF track.
7. Repeat steps 3 to 6 to create another track.
8. Finally, press the "Save" button to define a name for the new MAF file.
Figure 45 — Protected Music Player MAF Authoring Tool GUI
Figure 46 — Protection tool GUI
Figure 47 — License information GUI
4.1.1.2. Player
The software architecture of the Protected Music player AF application is shown in Figure 48.
Basically, it consists of three parts: the box parser, the un-protector, and resource playback. The
box parser is invoked first and parses all necessary information stored inside the boxes; the
un-protector decrypts the encrypted protected resource(s); and the resource playback plays the
MP3 audio data and displays the corresponding JPEG image data and MPEG-7 metadata based
on the parsed information. To reduce memory usage, a resource is loaded into memory only
during playback. The player is an extension of the Music player AF player reference software.
The extension includes the use of the ISOLIB library to parse the ISO base file format, the IPMP
DIDL parser, and the un-protection (decryption) module.
Figure 48 — Protected Music player AF player system architecture
To parse the protected Music player AF we use the ISOLIB API with the following procedure:
1. Parse the file-level meta box to get the primary data, i.e. the IPMP DIDL metadata
2. Parse the file-level meta box to get the first item, i.e. the hidden MP4 file
3. Parse the hidden MP4 file to get the resources: MP3 audio, JPEG image and MPEG-7
metadata
By parsing the IPMP DIDL metadata, we obtain the identifier of the protection tool used
inside the file. The un-protection tool is invoked just once, when a protected item is found
during the third step.
The IPMP DIDL metadata is parsed using the MSXML library, following the schema of the IPMP
Component base profile and the profiled REL schema. This reference software supports several
protection tools for unprotecting protected resource(s): an XOR algorithm and the AES
(Rijndael) algorithm. The protection tool is applied to each access unit of the protected MP3
audio data.
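The XOR case can be sketched as follows. The key handling is an assumption (the key bytes are repeated cyclically over each access unit); the normative tool parameters live in the IPMP metadata:

```python
def xor_tool(access_units, key: bytes):
    """Apply a cyclic XOR key to every access unit. XOR is its own
    inverse, so the same routine both protects and un-protects."""
    return [bytes(b ^ key[i % len(key)] for i, b in enumerate(au))
            for au in access_units]
```

Running the tool twice with the same key restores the original access units, which is why a single un-protection module suffices for this tool.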
The following describes the method used to play back the three types of resources stored inside
the hidden MP4 file. The MP3 data is stored inside the 'mdat' box of the hidden MP4 file,
partitioned into many chunks (access units). The method used to play back the audio data is as
follows. The application first parses the 'iloc' box to read each item's position and length. Next,
it parses the 'infe' box to read the item's name and content type for the respective item index.
Based on the content type in the 'iinf' box, the resources' variables can be separated into audio
and image, which makes them easier to manage.
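The box parsing underneath the ISOLIB calls can be pictured with a minimal walker over the 4-byte-size / 4-byte-type box headers of the ISO base file format. This is a sketch: it ignores the 64-bit largesize and size==0 ("to end of file") forms:

```python
import struct

def walk_boxes(buf: bytes):
    """Yield (type, payload) for the top-level ISO BMFF boxes in buf.

    Handles only the common 32-bit size form of the box header.
    """
    pos = 0
    while pos + 8 <= len(buf):
        size, btype = struct.unpack(">I4s", buf[pos:pos + 8])
        if size < 8:
            break  # malformed box: size must at least cover the header
        yield btype.decode("ascii"), buf[pos + 8:pos + size]
        pos += size
```

Nested containers such as 'moov' or 'meta' can be handled by recursing into a payload with the same routine.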
Since the JPEG data is regarded as an item of the 'meta' box, the ISOFindItemByID() and
ISOGetItemData() APIs are used to obtain the data from the 'mdat' box and display it
using the CxImage library.
The methods to parse the IPMP DIDL metadata inside the file-level 'meta' box of the MPEG-21
file and the MPEG-7 metadata inside the track-level meta box of the hidden MP4 file are the
same. In both modules, we use the ISOGetFileMeta() API to get file-level metadata, while
the ISOGetTrackMeta() API is used to get track-level metadata. A parser built using the
MSXML library is then used to obtain the information located inside the metadata.
For this reference software we provide several files as follows:
Single track MAF
This example consists of a single protected MP3 audio file and a JPEG image. The audio is a
Korean pop song, "Gido", and the image shows the artist. The output is shown in
Figure 49.
Figure 49 — Protected Music Player MAF with single track MAF
Multiple tracks MAF
This example consists of two MAF tracks, each with one MP3 audio file and one JPEG image.
The first track is a Korean folk story audio file, and its image shows the title of the story. The
second track is a Korean drama soundtrack with an image from the drama. The output is
shown in Figure 50 (a) for the first track and (b) for the second track.
(a)
Figure 50a — Protected Music Player MAF with two tracks MAF: first track
(b)
Figure 50b — Protected Music Player MAF with two tracks MAF: second track
To unprotect the protected resource, the MAF player automatically invokes a dialog box asking
the user to insert the protection key, based on the protection tool defined inside the IPMP DIDL
metadata. Figure 51 shows the dialog box.
Figure 51 — Un-protector GUI
4.2. MPEG-A Part 4: Musical Slide Show Application Format
4.2.1. Reference software
The Musical slide show AF file can be created using the Protected Musical slide show AF
reference software by excluding the protection information. See section 4.3.2 of this document
for a detailed description of the reference software.
4.3. MPEG-A Part 4 2nd Edition: Protected Musical Slide Show Application Format
4.3.1. Implementation on PDA
As part of the development of the Protected musical slide show AF, a player application has been
built for PDAs running the Windows Mobile OS. The user interface is shown in Figure 52.
Figure 52 — User interface of Musical Slide Show AF player
Playing unprotected resource
This is a very simple case: the Musical slide show application format file consists of MP3 data,
several JPEG images and a set of lyrics, without any protection. The user can freely play the
resource an unlimited number of times within an unlimited time range.
Figure 53 — User interface of AF Player application showing unprotected resource
As shown in Figure 53, if the user presses the "File Information" button, the application shows
"No license information available" for the unprotected resource.
Playing protected resource
In the case of playing a protected resource, as shown in Figure 54, the user can see the license
information of the protected resource using the "File Information" button. In this example, the
user has been granted the right to play the resource 20 times between January 1, 2006 and
January 1, 2007. For this application, we define the condition of "playing content" as "playing
content continuously for more than 75% of the length of the MP3 data". This means that if the
user plays the resource and listens to more than three quarters of the song without any sliding or
pausing, the user's exercise limit decreases by one. In the example we can see that the user has
already played the content once, so the file information shows the remaining count (19/20 means
only 19 plays are left, out of the 20 given by the license).
Figure 54— User interface of AF Player application showing license of protected resource
Playing protected resource when the exercise limit has expired
When the limited exercise license has already expired, the application still allows the user to
view the unprotected JPEG images and the file information; however, the user cannot play the
protected resource. As shown in Figure 55, the user was granted 2 plays of the content and has
already played it continuously twice. When the user tries to play the content one more time, the
application shows a message that the exercise limit has expired.
Figure 55 — User interface of AF Player application shows expired exercise limit
Playing protected resource when the validity condition has expired
When the validity condition of the license has expired, the application still allows the user to
view the unprotected JPEG images and the file information; however, the user cannot play the
protected resource. As shown in Figure 56, we have a Protected Musical slide show application
format file valid from January 1, 2005 until July 16, 2006, and the user is trying to play the
resource on July 18, 2006.
Figure 56 — User interface of AF Player application shows expired license
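The exercise-count and validity rules walked through above can be sketched as follows. The field names and return strings are illustrative; the normative conditions are expressed in MPEG-21 REL:

```python
from datetime import datetime

class PlayLicense:
    """Sketch of the license conditions described above: an exercise
    count plus a validity interval (illustrative, not the REL schema)."""

    def __init__(self, limit, not_before, not_after):
        self.remaining = limit
        self.not_before = not_before
        self.not_after = not_after

    def try_play(self, now, play_fraction, continuous):
        # Validity condition: the play must fall inside the granted range.
        if not (self.not_before <= now <= self.not_after):
            return "license expired"
        # Exercise limit: no plays left means the resource stays locked.
        if self.remaining <= 0:
            return "exercise limit expired"
        # A play counts as "exercised" only when more than 75% of the
        # audio was heard continuously (no sliding or pausing).
        if continuous and play_fraction > 0.75:
            self.remaining -= 1
        return "ok"
```

Note that a short or interrupted listen returns "ok" without decrementing the counter, matching the 75%-continuous rule described for the reference player.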
Playing protected resource with a different protection tool
When a resource was protected with a different protection tool during the authoring process, the
application invokes a message, as shown in Figure 57, telling the user that the application is
unable to unprotect the protected audio. This can happen, for example, if the user tries to play a
protected AF file acquired from a different author or producer that uses a different protection tool.
Figure 57 — User interface of AF Player application failed to unprotect the resource
4.3.2. Reference software
The reference software is built using the same platform and libraries as the reference software of
the Protected Music player AF described in section 3.1.5.2. Some modules are also built based on
those of the Protected Music player AF; therefore this section excludes the technical descriptions
of those software modules.
4.3.2.1. Authoring tool
The authoring tool is presented to introduce how the protected musical slide show can be
constructed. It has the following features:
MP3 player
JPEG display
MP3-JPEG synchronization
MP3-Timed text synchronization
Timed text font, highlight color and background color settings
Content protection: MP3, JPEG, Timed-text, LASeR (in schema only)
Choose of protection tools: XOR, AES-128-ECB, AES-128-CBC, and AES-128-CFB
Partial protection for MP3
Region protection for JPEG (experimental using XOR tool)
The authoring tool, whose user interface is shown in Figure 62, uses the following procedure to
create a Protected Musical slide show AF file.
Select an MP3 file by clicking the "Add MP3" button
Select a JPEG image by clicking the "Add JPEG" button. Repeat to add more images
Select the timed text lyrics by clicking the "Add Text" button. The timed text lyrics file is a text
file pre-formatted using the following rules (as shown in Figure 58):
Separate synchronized text segments using the slash "/" character
End the file with "/"
Figure 58 — Formatting the timed text
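The two formatting rules amount to a simple split-on-slash parse; a sketch:

```python
def parse_timed_text(raw: str):
    """Split a pre-formatted lyrics file into its synchronized text
    segments. Segments are separated by '/' and the file ends with '/',
    so the split leaves one empty trailing piece that is dropped."""
    parts = raw.split("/")
    if parts and parts[-1].strip() == "":
        parts.pop()
    return [p.strip() for p in parts]
```

Each returned segment then becomes one timed-text sample to be synchronized against the MP3 timeline.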
To synchronize images:
Select the image to be synchronized
Play the MP3 using “Play” button, or drag the slider to the desired timestamp
Select the animation effect
Click “Synchronize” button below the slider, and click “OK” to confirm
To synchronize text:
Play the MP3 using “Play” button
Click “Synchronize” button below the timed text viewer according to the synchronization
rules
To add protection to audio:
Double click the audio file directory name in resource list to invoke the protection
windows (Figure 59)
Click “Protect” to protect the whole audio file
For protecting a certain segment, first click the "|>" button to play the audio. At the desired
start timestamp, click the "[s]" button to start the protection; at the desired end timestamp,
click the "[e]" button. Click the "[r]" button to reset
Click “OK” to confirm
Figure 59 — MP3 protection user interface
To add protection to image:
Double click the image file directory name in resource list to invoke the protection
windows (Figure 60)
Click “Protect” to protect the whole image region
For protecting a certain rectangular region, click within the image to set the top-left
corner of the rectangle, and click once more to set the bottom-right corner of the
rectangle
Click “OK” to confirm
Figure 60 — Image protection user interface
To set the protection tool and license scheme:
Click “IPMP Tool” button to invoke IPMP Tool window (Figure 61)
Select provided Tool ID
Input protection key (any character)
Define the license validity range
Input the desired exercise number, or check "Unlimited" to allow the content to be
exercised an unlimited number of times
Click “OK” to confirm
To protect the LASeR animation, check the “Protect” button below the slider.
To protect the timed text, check the “Protect” button above the timed text viewer.
Finally, click “Save” to save the musical slide show file.
A video tutorial on how to use the authoring tool is available on YouTube:
http://www.youtube.com/watch?v=hJfOaEGQxsE
Figure 61 — IPMP Tool and REL settings user interface
Figure 62 — Authoring tool user interface
4.3.2.2. Player
The musical slide show player reference software is built to provide an example
implementation of how to extract contents from the specified file format and execute
them. An implementation of the protected musical slide show does not have to follow the
algorithms of this reference software. It has the following features:
MP3-JPEG-Timed text synchronized play
ISO-base file format file structure view
MPEG-7 SMP structure view
MPEG-21 IPMP/REL structure view
SVG Player (from MPEG output doc. N8821) for LASeR rendering
The player, whose user interface is shown in Figure 63, uses the following procedure to play a
Protected Musical slide show AF file:
Click “Open” to load musical slide show file
If the file is protected, input the protection key in the input window, and click “OK” to
continue
Click “Play” to play the musical slide show
The ISO-base file format file structure, the MPEG-7 SMP and the MPEG-21 IPMP are
shown in the tree structure on the right side of the player, as shown in Figure 64.
Figure 63 — Player user interface
Figure 64 — Clockwise from top left: file structure, MPEG-7 and MPEG-21 structure view
4.4. MPEG-A Part 10: Video Surveillance Application Format
4.4.1. Initial implementation
The initial implementation of Video surveillance AF was built as experiment result for MPEG
input proposal. It includes the file format creator and metadata generator. Since the current
specification of VS AF only for basic requirements, some parts are not implemented in reference
software.
4.4.1.1. Authoring tool
The authoring tool of the video surveillance MAF needs an authorized user's assistance to enter
the information regarding the creation of the MAF, along with the information describing the
equipment settings, and the video data itself from the surveillance camera. These inputs are then
processed in the authoring tool by the following modules: the video processing modules, the
metadata modules, the MAF module, and the file writing module. The system architecture of the
authoring tool is shown in Figure 65.
Figure 65 — Video surveillance AF creator system architecture
There are two video processing modules: an automatic module and a manual module. The
automatic video processing module contains algorithms (which can be complex) to
automatically extract or, if necessary, alter the video, performing processing such as object
segmentation, object tracking, motion activity and color extraction, and generating item-type
metadata such as summary or segmentation information and visual descriptors. The manual
video processing module handles any user assistance to the video processing, such as manual
video segmentation or user-assisted object extraction, and, like the automatic module, generates
item-type metadata. The metadata output from these modules is depicted by the green lines from
the processing modules to the metadata modules; the processed video data is sent to the MAF
module. The video modules use the DirectShow library to render the video during processing.
The metadata modules consist of a collection-type metadata generator module and an item-type
metadata generator module. Collection-type metadata describes the equipment information,
creation information and video collection information, while item-type metadata describes the
content of the video. The collection-type metadata generator module receives the creation
information initially from the user, while the equipment information is obtained from the user
based on the equipment's (camera) specification (it could also be obtained automatically from
the equipment itself). The item-type metadata generator receives input from both video
processing modules, as mentioned before. Both metadata generator modules send their output as
metadata in the form of XML to the MAF module, as depicted by the blue lines in the figure.
Static metadata contains the description of the creation of the VS AF. The description is as
described in Table 15.
The purpose of describing the equipment information, such as the lens and the video settings, is
to enable processing of the video based on its source's capabilities. For example, the angle
of view of the lens can be used in 3D object tracking to determine whether an object is located
near the camera or far away from it. By knowing the video compression used by the camera, the
creator of the MAF can determine what kind of tool is suitable for that video format, and so
forth.
Table 15 – Description of the creation of the surveillance AF
Element Description
Comment A brief description of the creation of the MAF file, given using
FreeTextAnnotation
Creator Describes the creator of the surveillance MAF. For now we use
PersonType descriptor with the following descriptions:
Name/GivenName, Name/FamilyName,
Affiliation/Organization/Name (three Names can be added)
CreationLocation Describes the location of the scene in the video.
CreationTime Describes the time of creation of the video
Instrument Describes the camera settings. The following descriptions are used:
tool name, camera name, lens maker, lens focal ratio, lens focal
length, lens focus range, horizontal angle of view, vertical angle of
view, video compression, video compression profile, video resolution
and video frame rate
Figure 66 shows the user interface for creating/displaying this metadata. The creation time is
obtained automatically from the video file, as is the video resolution. The metadata generated by
this application has been previously validated.
The content metadata that can be generated by the application comprises the summary
description and the video segmentation. Figure 67 shows how the video can be segmented
manually. To segment the video, simply play the video until the desired time point/position,
then pause the video and set it as a video segment (by default, a video is one big segment from
beginning to end). The start time and duration of each segment are displayed in the list, and a
description can be added for each segment if necessary. Only a sequential summary of the
segments was implemented, but it is possible to group the segments to build a hierarchical
summary.
A visual descriptor can also be generated into the metadata. Currently we can produce the
MotionActivity descriptor based on the motion intensity of each region, which is described
using the GridLayout descriptor. The intensity of the motion is calculated from the amount of
pixel difference between a frame and the previous frame. A threshold value is set so that only
large pixel differences are considered as "object is moving/active". Based on this value, we
partition the frame into 8x8 blocks, and only blocks that show pixel activity (i.e. a large pixel
value difference) are considered active. From this, we can determine the magnitude of activity
in the regions.
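The block-based measure described above can be sketched as follows. The grid size and threshold are the illustrative knobs; the text uses an 8x8 partition, while the experiments below use 3x3 and 4x4 grids:

```python
def block_activity(prev, curr, grid=8, threshold=30):
    """Mark each grid cell active when any pixel in it changed by more
    than `threshold` between the previous and current frame. `prev` and
    `curr` are equal-sized 2-D lists of grayscale values whose
    dimensions are divisible by `grid`."""
    h, w = len(curr), len(curr[0])
    bh, bw = h // grid, w // grid

    def cell_active(by, bx):
        return any(abs(curr[y][x] - prev[y][x]) > threshold
                   for y in range(by * bh, (by + 1) * bh)
                   for x in range(bx * bw, (bx + 1) * bw))

    return [[int(cell_active(by, bx)) for bx in range(grid)]
            for by in range(grid)]
```

The resulting grid of active flags is what would be summarized into the per-region MotionActivity values described above.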
One purpose of describing the activity of a region is to mark one or more regions within the
frames as areas that need more attention. This can be useful for a camera that covers quite a
large area of which only some parts of the frame should be monitored. For example, in the
"traffic" sequence, we might need to consider only the road part of the frame.
Figures 68 and 69 show the magnitude of activity for the "lab" and "traffic" sequences,
respectively. For the "lab" sequence 3x3 regions are used, while the "traffic" sequence uses 4x4
regions. From the experiment, we can say that for the "lab" sequence, region 5 might need
special attention, because it encloses the alley part of the lab. For the "traffic" sequence, special
attention might be paid to regions 9 to 12, because they enclose the road part.
The MAF module defines all necessary data regarding the creation of the MAF boxes. This
module contains the MPEG file format library (here we use the ISOLib library). It receives
two types of input data: metadata and video data. Metadata is put in the 'meta' box, while
video data is put in the 'mdat' box. After these MAF definitions (the boxes' variable values) are
set, the writing module writes the MAF file as the output of the authoring tool.
Figure 66 — Static description metadata user interface
Figure 67 — Segmenting the video
Figure 68 — Grid layout (regions numbered 1 to 9, left to right, top to bottom) and motion activity for "lab" sequence
Figure 69 — Grid layout (regions numbered 1 to 16, left to right, top to bottom) and motion activity for "traffic" sequence
4.4.1.2. Player
The video surveillance AF player basically does the reverse of the authoring tool. It parses the
boxes inside the MAF to extract the metadata and the video stored within it, and plays the
video based on the information described in the metadata. As shown in Figure 70, the player
consists of the following modules: MAF parser, XML parser, video renderer and display unit.
The MAF parser first reads the MAF file to determine whether a valid MAF file has been loaded
into the player. Next, it parses all the boxes inside the MAF and extracts the data within them.
The video data itself, stored in the 'mdat' box of the MAF file, is also extracted using the MAF
parser module.
Based on the data parsed by the MAF parser module, the XML parser parses the description of
the collection-type and item-type metadata and stores the values in memory, where they are
used to define how the video data will be rendered/played. This module uses the MSXML
library to parse the metadata, working through the nodes of the XML file to obtain the values.
Figure 70 — Video surveillance AF player system architecture
The video renderer renders the video based on the information obtained from the XML parser (depicted by the dotted orange line in the figure). The information on how the video is rendered includes the video segmentation and the visual descriptors. Along with the video data, the renderer is also responsible for rendering any descriptors, such as the grid or the object markers. As in the authoring tool, we use the DirectShow library in this module to perform this job. Finally, the display unit displays the rendered video (depicted by the blue line in the figure) and any static and visual descriptions (green line) in the user interface.
Figure 71 — Video surveillance AF creator and player user interface
The player can parse the MAF file, extract the metadata and the video, and play the video. Based on the segment descriptions in the metadata, the user can jump to the positions they describe. The user can also see the camera information and the intensity of motion in the regions described by the grid layout descriptors. The user interface is shown in Figure 71.
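As a sketch of how the grid overlay could be mapped onto the video, the helper below computes the pixel rectangle for each 1-based, row-major region index (matching the 3×3 and 4×4 layouts of Figures 68 and 69); the integer division scheme is an assumption for illustration.

```python
def grid_cells(width: int, height: int, rows: int, cols: int) -> dict:
    # Map each region index (1-based, row-major, as in the grid figures)
    # to its (x, y, w, h) rectangle for drawing the overlay and labels.
    cell_w, cell_h = width // cols, height // rows
    return {r * cols + c + 1: (c * cell_w, r * cell_h, cell_w, cell_h)
            for r in range(rows) for c in range(cols)}
```

For a 4×4 layout, regions 9 to 12 then cover the third row of the frame, matching the road area highlighted for the "traffic" sequence.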
4.4.2. Reference software
The reference software for the MPEG-A Part 10 standardization is being developed jointly with Kingston University, UK. ICU is responsible for the conformance points in the reference software. This subsection follows the descriptions in the MPEG contribution documents and the output documents of the reference software.
The Video surveillance application format reference software is normative in the sense that it
correctly implements the normative clauses contained in ISO/IEC 23000-10. Conforming ISO/IEC
23000-10 implementations are not expected to follow the algorithms or the programming
techniques used by the Video surveillance application format reference software. Although the
packing software is considered normative, it is not expected to add anything normative to the
Video surveillance application format textual clauses included in ISO/IEC 23000-10. The
reference software consists of an authoring tool (Video Surveillance Packer, VSP) and a player (Video Surveillance Viewer, VSV).
4.4.2.1. Authoring tool
The authoring tool is a simple packager. It packages XML data and AVC video into the VS AF file format using a modified ISOLib library. The modification adds the CAMI and VSMI boxes as described in the specification.
An initialization settings file (an .INI file) is used to define the UUIDs of the AVC video and the role of the XML metadata to be packaged. As shown in Figure 72, the content handler reads the initialization file and, accordingly, loads the XML metadata and AVC video. The file format generator then packages the contents and produces a VS AF file that conforms to the VS AF specification.
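The settings step can be sketched with Python's standard configparser; the section and key names below are hypothetical, since the actual .INI layout is not reproduced in this report:

```python
import configparser

INI_TEXT = """\
[video]
uuid = 6b6840f2-5f24-4f5c-83ea-f94787e4e363
file = camera1.avc

[metadata]
role = vsmi
file = camera1.xml
"""

def read_settings(text: str) -> dict:
    # Collect the UUID of the AVC video and the role of the XML
    # metadata so the content handler knows what to load and package.
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return {"video_uuid": cfg["video"]["uuid"],
            "video_file": cfg["video"]["file"],
            "metadata_role": cfg["metadata"]["role"],
            "metadata_file": cfg["metadata"]["file"]}
```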
Figure 72 — VSP system architecture (content handler reading the initialization settings, XML metadata, and AVC video; file format generator producing the VS AF file via ISOLib)
4.4.2.2. Player
The player is a simple application that unpacks the contents of the VS AF file. It contains the MP7JRS library for parsing the MPEG-7 metadata and the JM decoder for decoding the AVC bitstream. It works like the player in the implementation application of the surveillance AF; however, as shown in Figure 73, additional conformance check modules for both the MPEG-7 metadata and the AVC video are implemented to verify the conformance of the metadata and video data inside the VS AF file. As shown in Figure 74 and Figure 75, the user interface of the reference software displays information on the conformance points of the VS AF file unpacked by the player.
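One simple form such a conformance check can take is verifying that the boxes required by the file format are all present; the required set below is illustrative, not the normative list from ISO/IEC 23000-10:

```python
REQUIRED_BOXES = {"ftyp", "moov", "meta", "mdat"}  # illustrative, not normative

def check_boxes(found: list) -> dict:
    # Report whether every required box was found while parsing the
    # file, and which ones are missing, for display in the viewer.
    missing = REQUIRED_BOXES - set(found)
    return {"conformant": not missing, "missing": sorted(missing)}
```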
Figure 73 — VSV system architecture (file format parser, XML parser, video decoder, display unit, and conformance check modules for the metadata and video)
Figure 74 — VSV main screen showing contents and meta-data
Figure 75 — VSV track player screen showing contents and meta-data
5. Achievements
In this section we present the achievements made during the development of each standard. For each MAF, we list the MPEG input contributions, MPEG output documents, and papers.
5.1. MPEG-A Part 3 2nd Edition: Protected Music Player Application Format
5.1.1. MPEG input contributions
a. M12197, Hendry, Munchurl Kim, Protecting and Governing Music MAF Player Format
based Contents by using MPEG-21 IPMP, Poznan, Poland, July 2005.
This input document is the initial proposal for protecting the Music player AF using MPEG-21 IPMP.
b. M12588, Hendry, Munchurl Kim, Florian Pestoni, A Flexible and Extensible Protection of Music Player MAF using Lightweight MPEG-21 IPMP, Nice, France, October 2005.
This input document updates the previous input document by specifying the profiled MPEG-21 IPMP named "Lightweight MPEG-21 IPMP".
c. M12855, Hendry, Munchurl Kim, Takafumi Ueno, Shen ShengMei, ZhongYang Huang,
Florian Pestoni, Satoshi Ito, Jeho Nam, Protected Music Player MAF based on MPEG-21
IPMP and REL Profiles, Bangkok, Thailand, January 2006.
This input document updates the previous input document by including rights expression information in the protection using the REL profile.
d. M13223, Hendry, Munchurl Kim, Florian Pestoni, Zhongyang Huang, Updated Text for WD
of AMD1 Protected Music Player MAF — Section 2, Montreux, Switzerland, April 2006.
This input document updates the output document from the previous meeting regarding the encryption scheme for the Protected Music player AF.
e. M13370, Zhongyang Huang, Shengmei Shen, Takafumi Ueno, Hendry, Munchurl Kim, Idea
to harmonize section 1 and section 2 of WD 1.0 AMD/1 Protected MPEG-A Music Player,
Montreux, Switzerland, April 2006.
This input document is the initial working draft of the Protected Music player AF, based on the protection scheme described in the output documents of the previous meeting. The result of this input document is the working draft of the Protected Music player AF.
f. M13658, Houari Sabirin, Jeongyeon Lim, Hendry, Munchurl Kim, Contribution to Reference
Software of ISO/IEC 23000-2: MPEG Music Player Application Format, Klagenfurt, Austria,
July 2006.
This input document is the initial implementation of the Music player AF. The work in this document is used as the basis for developing the reference software of the Protected Music player AF.
g. M14172, Hendry, Houari Sabirin, Munchurl Kim, Contribution for Protected Music Player
MAF Reference Software, Marrakech, Morocco, January 2007.
This input document is the initial implementation of the Protected Music player AF reference software.
h. M14176, Hendry, Houari Sabirin, Munchurl Kim, Editor's Study of ISO/IEC FCD 23000-2 MPEG-A, Music Player 2nd Edition, Marrakech, Morocco, January 2007.
5.1.2. MPEG output documents
a. N7874, Working Draft of AMD/1 Protected MPEG-A Music Player Section 1, Bangkok,
Thailand, January 2006.
This output document describes the file structure specification for the Protected Music player AF.
b. N7875, Working Draft of AMD/1 Protected MPEG-A Music Player Section 2, Bangkok,
Thailand, January 2006.
This output document describes the encryption scheme specification for the Protected Music player AF.
c. N8091, Working Draft of 2nd Edition of MPEG-A Music Player Section 2: Protected Music
Player, Montreux, Switzerland, April 2006.
This output document is the result of input document M13370, combining output documents N7874 and N7875.
d. N8359, ISO/IEC CD 23000-2 MPEG-A Music Player 2nd edition, Klagenfurt, Austria, July
2006.
This output document advances the working draft produced at the previous meeting.
e. N8582, ISO/IEC FCD 23000-2 MPEG-A Music Player 2nd edition, Hangzhou, China,
October, 2006.
This output document advances the committee draft produced at the previous meeting.
f. N8583, Reference Software Workplan for MPEG-A Music Player 2nd edition, Hangzhou,
China, October, 2006.
This output document describes the workplan for developing the reference software for the Protected Music player AF, based on contribution M13658.
g. N8820, Study of ISO/IEC FCD 23000-2 MPEG-A Music Player 2nd edition, Marrakech,
Morocco, January 2007.
This output document advances the final committee draft produced at the previous meeting.
h. N8821, Reference Software Workplan v2.0 for MPEG-A Music Player 2nd edition including
initial software, Marrakech, Morocco, January 2007.
This output document describes the workplan for developing the reference software for the Protected Music player AF, based on contribution M14172.
i. N9122, Text of ISO/IEC 23000-2 FDIS Music Player Application Format 2nd Edition, San
Jose, USA, April 2007.
This output document is the final draft of the international standard for the Protected Music player AF and concludes the work on the Protected Music player AF.
5.2. MPEG-A Part 4: Musical Slide Show Application Format
5.2.1. MPEG input contributions
a. M12396, Jeongyeon Lim, Munchurl Kim, Synchronization of Multiple JPEG data to MP3
tracks in Music MAF Player Format, Poznan, Poland, July 2005.
This input document is the initial proposal for implementing a slide show in the Music player AF.
b. M12589, Chansuk Yang, Jeongyeon Lim, Munchurl Kim, Extensions to Music MAF Player
Format for Multiple JPEG images and Text data with Synchronizations to MP3 data, Nice,
France, October 2005.
This input document is the updated proposal for implementing the slide show in the Music player AF, with the addition of text synchronization.
c. M13673, Houari Sabirin, Jeongyeon Lim, Hendry, Munchurl Kim, Contribution to Reference
Software of ISO/IEC 23000-4: MPEG Musical Slideshow Application Format, Klagenfurt,
Austria, July 2006.
This input document is the initial implementation of the Musical slide show AF. Most parts of the contribution are used as the basis for developing the Protected Musical slide show AF.
d. M13563, H. Jean Cha, Tae Hyeon Kim, Harald Fuchs, Munchurl Kim, Updated text for WD
1.0 Musical Slide Show MAF, Klagenfurt, Austria, July 2006.
This input document is the updated working draft of Musical slide show AF.
e. M14184, Hyouk-Jean Cha, Tae Hyeon Kim, Harald Fuchs, Munchurl Kim, Editor‟s study text
of ISO/IEC FCD 23000-4 Musical slide show MAF, Marrakech, Morocco, January 2007.
This input document is the updated final committee draft of the Musical slide show AF.
5.2.2. MPEG output documents
a. N8131, WD of ISO/IEC 23000-4 (Musical Slide Show MAF), Montreux, Switzerland, April
2006.
This output document is the first working draft of Musical slide show AF.
b. N8397, Text of ISO/IEC 23000-4/CD (Musical Slide Show MAF), Klagenfurt, Austria, July
2006.
This output document is the update of the working draft from the previous meeting. It implements the proposal on the synchronization method for timed text and JPEG images as one slide show sample.
c. N8674, Text of ISO/IEC 23000-4/FCD (Musical Slide Show MAF), Hangzhou, China,
October, 2006.
This output document is the update of the committee draft from the previous meeting.
d. N8880, Study Text of ISO/IEC 23000-4/FCD (Musical Slide Show MAF), Marrakech,
Morocco, January 2007.
This output document is the update of the final committee draft from the previous meeting.
e. N9038, Text of ISO/IEC 23000-4/FDIS (Musical Slide Show MAF), San Jose, USA, April
2007.
This output document is the final draft of the international standard for the Musical slide show AF and concludes the work on the Musical slide show AF.
5.2.3. Papers
a. Muhammad Syah Houari Sabirin, Munchurl Kim, "Authoring Tool of Musical Slide Show MAF Content", 2006 Korean Society of Broadcast Engineers Conference, November 10, 2006, Seoul National University of Technology.
5.3. MPEG-A Part 4 2nd Edition: Protected Musical Slide Show Application Format
5.3.1. MPEG input contributions
a. M13722, Houari Sabirin, Hendry, Munchurl Kim, Proposal to Improve Musical Slideshow
File Format, Klagenfurt, Austria, July 2006.
This input document is an initial proposal for Protected Musical slide show AF. It contains
very basic information on the idea of protecting contents in Musical Slide show AF.
b. M14175, Hendry, Houari Sabirin, Munchurl Kim, Proposal for Protected Musical Slide Show
MAF with IPMP, Marrakech, Morocco, January 2007.
This input document is a proposal for adding content protection in Musical slide show AF
using MPEG-21 IPMP and MPEG-21 REL. It contains metadata instantiation examples of
content protection for Musical slide show AF.
c. M14477, Houari Sabirin, Hendry, Munchurl Kim, Updated Proposal for Protected Musical
Slide Show MAF with IPMP, San Jose, USA, April 2007.
This input document is the updated text of the previous proposal for the Protected Musical slide show AF.
It provides more detailed descriptions and metadata instantiation examples for protected
Musical slide show AF.
d. M14644, Houari Sabirin, Munchurl Kim, Proposed text for Protected Musical Slide Show
MAF PDAM, Lausanne, Switzerland, July 2007.
This input document is a proposed text for amendment of Musical slide show AF. It contains
updated description of technical specifications and metadata schemes for protecting contents
of Musical slide show AF.
e. M15124, Houari Sabirin, Munchurl Kim, Use cases for content protection in Musical slide show Application Format 2nd Edition, Antalya, Turkey, January 2008.
This input document describes some use cases for content protection in the Protected Musical slide show AF. The proposed use cases have been incorporated into the Annex of FDIS ISO/IEC 23000-4.
f. M15417, Houari Sabirin, Munchurl Kim, ISO/IEC 23000-4 2nd Edition Reference Software, Archamps, France, April 2008.
This input document describes the specifications of the Protected Musical slide show AF reference software. It contains the descriptions of the authoring tool and the player, and also provides descriptions of the conformance files created by the authoring tool. The proposed reference software has been incorporated into Amendment 2 of FDIS ISO/IEC 23000-4 and is now in PDAM status.
5.3.2. MPEG output documents
a. N9040, WD1.0 of ISO/IEC 23000-4/Amd.2 Protected Musical Slide Show, San Jose, USA,
April 2007.
This output document is the first working draft of Protected Musical slide show AF based on
input documents M13722, M14175 and M14477.
b. N9290, Text of ISO/IEC 23000-4/CD Musical Slide Show 2nd Edition, Lausanne,
Switzerland, July 2007.
This output document advances the working draft produced at the previous meeting. It includes the updated input document M14644.
c. N9389, Text of ISO/IEC 23000-4/FCD Musical Slide Show 2nd Edition, Shenzhen, China,
October 2007.
This output document advances the committee draft produced at the previous meeting.
d. N9691, Study Text of ISO/IEC FCD 23000-4 Musical Slide Show 2nd Edition, Antalya,
Turkey, January 2008.
This output document advances the final committee draft produced at the previous meeting.
e. N9843, Text of ISO/IEC FDIS 23000-4 Musical Slide Show 2nd Edition, Archamps, France,
April 2008.
This output document is the final draft of the international standard for the Protected Musical slide show AF and concludes the Protected Musical slide show standardization work. Included in the document is contribution M15124.
f. N9847, ISO/IEC 23000-4:2008/PDAM2 Protected MSS Conf. & Ref. Software, Archamps,
France, April 2008.
This document is the committee draft of Amendment 2 of the Protected Musical slide show AF (the 1st amendment is the reference software for the Musical slide show).
5.3.3. Papers
a. Muhammad Syah Houari Sabirin, Hendry, Munchurl Kim, "Musical Slide Show MAF with
Protection and Governance using MPEG-21 IPMP Components and REL," 19th IS&T/SPIE
Symposium on Electronic Imaging: Multimedia on Mobile Devices 2007, January 2007, San
Jose, California, USA.
5.4. MPEG-A Part 10: Video Surveillance Application Format
5.4.1. MPEG input contributions
a. M14173, Jeongyeon Lim, Houari Sabirin, Munchurl Kim, Proposal for Surveillance MAF,
Marrakech, Morocco, January 2007.
This input document is the initial proposal for VS AF. It includes the proposal for the file
format and the MPEG-7 MDS and Visual to be implemented in VS AF. The document also
includes some use cases.
b. M14486, Houari Sabirin, Jeongyeon Lim, Munchurl Kim, A Proposal for Basic Video
Surveillance Application Format, San Jose, USA, April 2007.
This input document is the update of the previous proposal for the VS AF; it contains specific descriptions of the file format and MPEG-7 metadata based on the requirements for the basic version of the VS AF.
c. M14645, Houari Sabirin, Munchurl Kim, MPEG-7 core description profile and visual
descriptors for Video Surveillance MAF, Lausanne, Switzerland, July 2007.
This input document proposes the use of MPEG-7 core description profile as the starting point
to create MPEG-7 profile for VS AF, based on the requirements of VS AF as described in
MAF Overview document from the previous meeting.
d. M14893, Houari Sabirin, James Annesley, Munchurl Kim, and James Orwell, Visual
Surveillance Multimedia Application Format: MPEG-7 Profile, Shenzhen, China, October
2007.
This input document proposes the MPEG-7 profile for VS AF. Most of the descriptions are
based on the input document M14645.
e. M15426, Houari Sabirin, Munchurl Kim, Proposal for the usage of MPEG-7 and MPEG-21 in
Advanced Video Surveillance AF, Archamps, France, April 2008.
This input document proposes the use of more elements in MPEG-7 and MPEG-21 for the
next stage of development of VS AF.
f. M15472, James Annesley, Houari Sabirin, Update of ISO/IEC 23000-10/Amd1 WD1.0 Conformance and Reference Software, Archamps, France, April 2008.
This input document is an update to the working draft of the VS AF conformance and
reference software produced in previous meeting.
5.4.2. MPEG output documents
a. N9295, Text of ISO/IEC 23000-10/CD (Video Surveillance MAF), Lausanne, Switzerland,
July 2007.
This output document includes the proposal described in input document M14645.
b. N9412, Study Text of ISO/IEC 23000-10/FCD (Video Surveillance Application Format),
Shenzhen, China, October 2007.
This output document is the updated result of VS AF committee draft. It includes the proposal
described in input document M14893.
c. N9706, Text of ISO/IEC FCD 23000-10 (Video Surveillance Application Format), Antalya,
Turkey, January 2008.
This output document advances the final committee draft produced at the previous meeting.
d. N9707, Text of ISO/IEC 23000-10/AMD1 WD1.0 Conformance and Reference Software,
Antalya, Turkey, January 2008.
This output document is the initial work of developing conformance files and reference
software for VS AF. It includes the first version of the reference software.
e. N9856, Study Text of ISO/IEC FCD 23000-10 (Video Surveillance Application Format),
Archamps, France, April 2008.
This output document advances the final committee draft produced at the previous meeting.
f. N9857, Text of ISO/IEC 23000-10/AMD1 WD2.0 Conformance and Reference Software,
Archamps, France, April 2008.
This output document is the updated version of the working draft from previous meeting. It
includes the second version of the reference software.
g. N9858, Future Work on Surveillance AF's - collection of requirements, Archamps, France,
April 2008.
This output document is the list of work to be done for the next stage of development of VS
AF. It includes the proposal described in input document M15426.
5.4.3. Papers
a. Wonsang You, M.S. Houari Sabirin, and Munchurl Kim, "Moving Object Tracking in
H.264/AVC bitstream," MCAM 2007, LNCS 4577, pp.483-492.
b. M. Syah Houari Sabirin, Munchurl Kim, "Computation of MPEG-7 Motion Descriptors in AVC|H.264 Bitstreams for Video Surveillance MAF", 2007 Korean Society of Broadcast Engineers Conference, pp. 117-118, November 3, 2007, Changeui Hall, Korea University Science and Engineering Campus.
6. Conclusions
This document describes the work on the project of standardizing several parts of the MPEG-A Multimedia Application Format. MPEG-A is a standard from the Moving Picture Experts Group (MPEG) that specifies storage formats combining existing technologies to create rich-content multimedia applications.
The work spans over two years and has resulted in three final drafts of international standards: MPEG-A Part 3 2nd Edition: Protected Music Player Application Format, MPEG-A Part 4: Musical Slide Show Application Format, and MPEG-A Part 4 2nd Edition: Protected Musical Slide Show Application Format. One part, MPEG-A Part 10: Video Surveillance Application Format, is still in final committee draft status, and its reference software is still in development (to be completed as soon as the specification reaches final draft status).
Most of the contributions proposed during the development of each MAF have been promoted to output documents, such as the implementation of protection in the Music player application format and the Musical slide show application format, the synchronization of JPEG images for the slide show, and the metadata profile for the Video surveillance application format. The work has also resulted in reference software for the Protected Music player application format, the Protected Musical slide show application format (which also includes the non-protected part), and parts of the Video surveillance application format.
7. References
1. ISO/IEC JTC1/SC29/WG11 MPEG2005/N7068, Busan, Korea, April 2005, White Paper on MPEG-A
2. ISO/IEC JTC1/SC29/WG11 MPEG2008/N9840, Archamps, France, April 2008, MAF Overview
3. ISO/IEC 14496-12:2005, Information technology – Coding of audio-visual objects – Part 12: ISO base media
file format
4. ISO/IEC 14496-14:2003, Information technology – Coding of audio-visual objects – Part 14: MP4 file format
5. ISO/IEC 14496-20:2006, Information technology – Coding of audio-visual objects – Part 20: Lightweight
Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
6. ISO/IEC 15938-1, Information technology – Multimedia content description interface – Part 1: System
7. ISO/IEC 15938-3, Information technology – Multimedia content description interface – Part 3: Visual
8. ISO/IEC 15938-5, Information technology – Multimedia content description interface – Part 5: Multimedia
description schemes
9. ISO/IEC 21000-2, Information technology – Multimedia framework (MPEG-21) – Part 2: Digital Item
Declaration
10. ISO/IEC 21000-4, Information technology – Multimedia framework (MPEG-21) – Part 4: Intellectual Property
Management and Protection Components
11. ISO/IEC 21000-5, Information technology – Multimedia framework (MPEG-21) – Part 5: Rights Expression
Language
12. ISO/IEC 21000-17, Information technology – Multimedia framework (MPEG-21) – Part 17: Fragment
identification of MPEG resources
13. 3GPP TS 26.245, Transparent end-to-end Packet switched Streaming Service (PSS); Timed text format, V7.0.0,
2007-06-21
14. ISO/IEC JTC1/SC29/WG11 MPEG2006/M13658, Houari Sabirin, Jeongyeon Lim, Hendry, Munchurl Kim,
Contribution to Reference Software of ISO/IEC 23000-2: MPEG Music Player Application Format, Klagenfurt,
Austria, July 2006
15. ISO/IEC JTC1/SC29/WG11 MPEG2007/M14172, Hendry, Houari Sabirin, Munchurl Kim, Contribution for
Protected Music Player MAF Reference Software, Marrakech, Morocco, January 2007
16. ISO/IEC JTC1/SC29/WG11 MPEG2006/M13673, Houari Sabirin, Jeongyeon Lim, Hendry, Munchurl Kim,
Contribution to Reference Software of ISO/IEC 23000-4: MPEG Musical Slideshow Application Format,
Klagenfurt, Austria, July 2006.
17. ISO/IEC JTC1/SC29/WG11 MPEG2008/M15417, Houari Sabirin, Munchurl Kim, ISO/IEC 23000-4 2nd
Edition Reference Software, Archamps, France, April 2008.
18. ISO/IEC JTC1/SC29/WG11 MPEG2008/N9856, Study Text of ISO/IEC FCD 23000-10 (Video Surveillance
Application Format), Archamps, France, April 2008.
19. ISO/IEC JTC1/SC29/WG11 MPEG2008/N9857, Text of ISO/IEC 23000-10/AMD1 WD2.0 Conformance and
Reference Software, Archamps, France, April 2008.