Post on 21-Dec-2015
June 2002
by Mona Vajihollahi Roozbeh Farahbod
The MPEG-7The MPEG-7The MPEG-7The MPEG-7
Visual Standard for Content Visual Standard for Content DescriptionDescription
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Agenda• Introduction• Scope of the Standard• Development of the Standard• Visual Descriptors• Other Components of MPEG-7• References
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Introduction• Image/Video Retrieval
– Text-based Retrieval – Content-based Retrieval
• MPEG-7:– An international standard for
descriptions and description systems– Goal: To search, identify, filter and
browse audiovisual content
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Agenda• Introduction• Scope of the Standard• Development of the Standard• Visual Descriptors• Other Components of MPEG-7• References
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Scope of the Standard• Diversity of Applications
– Multimedia, Music/Audio, Graphics, Video
• Descriptors (Ds)– Describe basic characteristics of
audiovisual content– Examples: Shape, Color, Texture, …
• Description Schemes (DSs)– Describe combinations of descriptors- Example: Spoken Content
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Scope of the Standard (2)
DescriptionProduction(extraction)
DescriptionConsumption
StandardDescription
Normative part ofMPEG-7 standard
• MPEG-7 does not specify -How to extract descriptions-How to use descriptions-The similarity between contents
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Agenda• Introduction• Scope of the Standard• Development of the Standard• Visual Descriptors• Other Components of MPEG-7• References
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Development of the Standard• Call for Proposals
– Goal: Specify requirements for technology
• Experimentation Model (XM)– Goal: Specify and implement the feature
extraction, encoding & decoding algorithms, search engines
• Core Experiments– Goal: Improve the current technology in XM– If successful, it is incorporated in the new
XM
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Components of MPEG-7
1) MPEG-7 Systems2) MPEG-7 Description Definition
Language 3) MPEG-7 Visual4) MPEG-7 Audio5) MPEG-7 Multimedia DSs6) MPEG-7 Reference Software7) MPEG-7 Conformance
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Agenda• Introduction• Scope of the Standard• Development of the Standard• Visual Descriptors• Other Components of MPEG-7• References
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Visual Descriptors
• Color Descriptors• Texture Descriptors• Shape Descriptors• Motion Descriptors for Video
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Color Descriptors
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Color Spaces• Constrained color spaces
– Scalable Color Descriptor uses HSV– Color Structure Descriptor uses HMMD
• MPEG-7 color spaces:– Monochrome– RGB – HSV– YCrCb– HMMD
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Scalable Color Descriptor• A color histogram in HSV color space• Encoded by Haar Transform
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Dominant Color Descriptor• Clustering colors into a small number of
representative colors• It can be defined for each object,
regions, or the whole image
• F = { {ci, pi, vi}, s}• ci : Representative colors
• pi : Their percentages in the region
• vi : Color variances
• s : Spatial coherency
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Color Layout Descriptor• Clustering the image into 64 (8x8)
blocks• Deriving the average color of each
block (or using DCD)• Applying DCT and encoding• Efficient for
– Sketch-based image retrieval– Content Filtering using image
indexing
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Color Structure Descriptor• Scanning the image by an 8x8 pixel
block• Counting the number of blocks
containing each color• Generating a color histogram (HMMD)• Main usages:
– Still image retrieval– Natural images retrieval
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
GoF/GoP Color Descriptor• Extends Scalable Color Descriptor• Generates the color histogram for a
video segment or a group of pictures• Calculation methods:
– Average– Median– Intersection
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Visual Descriptors
• Color Descriptors• Texture Descriptors• Shape Descriptors• Motion Descriptors for Video
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Texture Descriptors
• Homogenous Texture Descriptor• Non-Homogenous Texture
Descriptor (Edge Histogram)
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Homogenous Texture Descriptor• Partitioning the frequency domain into 30
channels (modeled by a 2D-Gabor function)
• Computing the energy and energy deviation for each channel
• Computing mean and standard variation of frequency coefficients
• F = {fDC, fSD, e1,…, e30, d1,…, d30}• An efficient implementation:
– Radon transform followed by Fourier transform
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
2D-Gabor Function• It is a Gaussian
weighted sinusoid • It is used to
model individual channels
• Each channel filters a specific type of texture
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Radon Transform• Transforms images with lines into a domain of
possible line parameters• Each line will be transformed to a peak point
in the resulted image
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Non-Homogenous Texture Descriptor
• Represents the spatial distribution of five types of edges– vertical, horizontal, 45°, 135°, and non-
directional
• Dividing the image into 16 (4x4) blocks• Generating a 5-bin histogram for each
block• It is scale invariant
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Non-Homogenous Texture Descriptor (2)
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Visual Descriptors
• Color Descriptors• Texture Descriptors• Shape Descriptors• Motion Descriptors for Video
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Shape Descriptors
• Region-based Descriptor• Contour-based Shape Descriptor• 2D/3D Shape Descriptor• 3D Shape Descriptor
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Region-based Descriptor• Expresses pixel distribution within a 2-D
object region• Employs a complex 2D-Angular Radial
Transformation (ART)• Advantages:
– Describes complex shapes with disconnected regions
– Robust to segmentation noise– Small size– Fast extraction and matching
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Region-based Descriptor (2)
• Applicable to figures (a) – (e)• Distinguishes (i) from (g) and (h)• (j), (k), and (l) are similar
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Contour-Based Descriptor• It is based on Curvature Scale-
Space representation
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Curvature Scale-Space• Finds curvature zero
crossing points of the shape’s contour (key points)
• Reduces the number of key points step by step, by applying Gaussian smoothing
• The position of key points are expressed relative to the length of the contour curve
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Curvature Scale Space (2)
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Contour-Based Descriptor• It is based on Curvature Scale-
Space representation• Advantages:
– Captures the shape very well– Robust to the noise, scale, and
orientation– It is fast and compact
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Contour-Based Descriptor (2)
• Applicable to (a)• Distinguishes
differences in (b)• Find similarities in
(c) - (e)
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Comparison
• Blue: Similar shapes by Region-Based• Yellow: Similar shapes by Contour-
Based
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
2D/3D Shape Descriptor• A 3D object can be roughly
described by snapshots from different angels
• Describes a 3D object by a number of 2D shape descriptors
• Similarity Matching: matching multiple pairs of 2D views
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
3D Shape Descriptor• Based on Shape spectrum• An extension of Shape Index (A local
measure of 3D Shape to 3D meshes)• Captures information about local
convexity• Computes the histogram of the shape
index over the whole 3D surface
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Visual Descriptors
• Color Descriptors• Texture Descriptors• Shape Descriptors• Motion Descriptors for Video
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Motion Descriptors• Motion Activity Descriptors• Camera Motion Descriptors• Motion Trajectory Descriptors• Parametric Motion Descriptors
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Motion Activity Descriptor• Captures ‘intensity of action’ or
‘pace of action’• Based on standard deviation of
motion vector magnitudes• Quantized into a 3-bit integer [1,
5]
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Camera Motion Descriptor• Describes the movement of a
camera or a virtual view point• Supports 7 camera operations
Track left
Track right
Boom up
Boom down
Dollybackward
Dollyforward Pan right
Pan left
Tilt up
Tilt downRoll
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Motion Trajectory• Describes the movement of one
representative point of a specific region• A set of key-points (x, y, z, t) • A set of interpolation functions describing the
path
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Parametric Motion• Characterizes the evolution of
regions over time• Uses 2D geometric transforms• Example:
– Rotation/Scaling: • Dx(x,y) = a + bx + cy
• Dy(x,y) = d – cx + by
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Agenda• Introduction• Scope of the Standard• Development of the Standard• Visual Descriptors• Other Components of MPEG-7• References
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Other Components• MPEG-7 Audio• MPEG-7 Multimedia Description
Schemes• MPEG-7 Description Definition
Language• MPEG-7 Systems• MPEG-7 Reference Software• MPEG-7 Conformance
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
MPEG-7 Audio• Comprises 5 technologies:
– Audio description framework (17 low-level descriptors)
– High-Level Audio Description Tools (Ds & DSs)
• Instrumental timbre description tools• Sound recognition tools• Spoken content description tools• Melody description tools (facilitate query-by-
humming)
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Multimedia Description Schemes• Specific metadata structures • Describe & annotate audio-visual concepts• Contain MPEG-7 Descriptors or other DSs
Basic datatypes
Links & media localization
Basic Tools
Models
Basic elements
Navigation & Access
Content management
Content description
Collections
Summaries
Variations
Content organization
Creation & Production
Media Usage
Semantic aspects
Structural aspects
User interaction
User Preferences
Schema Tools
User History Views Views
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Description Definition Language (DDL)• “…a language that allows the creation
of new Description Schemes and, possibly, Descriptors.”
• “It also allows the extension and modification of existing Description Schemes.”
MPEG-7 Requirement Documents V.13
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
DDL (2)
• It is based on XML Schema Language
• Consists of– XML Schema Structural Components– XML Schema Data Types– MPEG-7 Specific Extensions
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
DDL (3)
• A Simplified Example:
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
MPEG-7 Systems• Defines
– the terminal architecture and the normative interfaces.
– how descriptors and description schemes are stored, accessed and transmitted
– tools that are needed to allow synchronization between content and descriptions
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Reference Software: the XM
• XM implements– MPEG-7 Descriptors (Ds) – MPEG-7 Description Schemes (DSs)– Coding Schemes– DDL
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
MPEG-7 Conformance
• Includes the guidelines and procedures for testing conformance of MPEG-7 implementations
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
References1. T. Sikora, “The MPEG-7 Visual Standard for Content
Description – An Overview”, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 696-702, June 2001
2. S.-F. Chang, T.Sikora, and A. Puri, “Overview of MPEG-7 Standard”, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 688-695, June 2001
3. J. M. Martinez, "Overview of the MPEG-7 Standard", ISO/IEC JTC1/SC29/WG1, 2001
4. B.S. Manjunath, J.-R. Ohm, V.V. Vasudevan, and A. Yamada, “MPEG-7 Color and Texture Descriptors”, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 703-715, June 2001
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
References (2)
5. M. Bober, “MPEG-7 Visual Shape Descriptors”, IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 716-719, June 2001
6. A. Divakaran, “An Overview of MPEG-7 Motion Descriptors and Their Applications”, 9th Int. Conf. on Computer Analysis of Images and Patterns , CAIP 2001 Warsaw, Poland, 2001, Lecture Notes in Computer Science vol.2124, pp. 29-40
7. J. Hunter, "An overview of the MPEG-7 description definition language (DDL)", IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 765-772, June 2001
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
References (3)
8. F. Mokhtarian, S. Abbasi, and J. Kittler, “Robust and Efficient Shape Indexing through Curvature Scale Space”, Proc. International Workshop on Image DataBases and MultiMedia Search, pp. 35-42, Amsterdam, The Netherlands, 1996
9. CSS Demo, http://www.ee.surrey.ac.uk/Research/VSSP/imagedb/demo.html
10. Gabor Function, http://disney.ctr.columbia.edu/jrsthesis/node43.html
11. Radon Transform, http://eivind.imm.dtu.dk/staff/ptoft/Radon/Radon.html
By Mona Vajihollahi [mvajihol@cs.sfu.ca] & Roozbeh Farahbod [rfarahbo@cs.sfu.ca]
Presented for
Multimedia Systems CourseProf. Ze-Nian Li
School of Computing ScienceSimon Fraser University
June 2002
Most of the pictures or their basic ideas are taken from the listed papers and web pages.