MPEG-7
• MPEG-7 overview
  – What is…
  – Why?
  – Objectives and scope
  – Main elements and organization
• MPEG-7 Audio
  – Low-level features
  – High-level tools
What is MPEG-7?
• "Multimedia Content Description Interface"
• ISO/IEC standard by MPEG (Moving Picture Experts Group)
• Provides metadata for multimedia
• MPEG-1, -2, -4: make content available; MPEG-7: makes content accessible, retrievable, filterable, manageable (via device / computer).
• Allows multiple degrees of interpretation of the information's meaning
• Supports as broad a range of applications as possible.
• A compatible (with existing technology) and extensible standard.
Why MPEG-7?
• “The value of information often depends on how easy it can be found, retrieved, accessed, filtered and managed.”
• Past: digital multimedia sources were scarce -> simple access mechanisms were enough
• Now: a growing amount of audiovisual information -> identifying and managing it efficiently is becoming more difficult, e.g. “record only news about sport.”
Why MPEG-7?
• For future multimedia services, content representation and description may have to be addressed jointly.
• Many services dealing with content representation will have to deal first with content description
  – “a non-described content may be useless”
• Need for access only to the content description:
  – New original services (e.g. optimizing personal time)
  – Adaptation to networks and terminal capabilities
Application domains
• Broadcast media selection (e.g., radio channel, TV channel).
• Digital libraries (e.g., film, video, audio and radio archives).
• E-Commerce (e.g., personalized advertising).
• Education (e.g., repositories of multimedia courses, multimedia search for support material).
• Home Entertainment (e.g., management of personal multimedia collections, including manipulation of content, e.g. karaoke).
• Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face).
• Multimedia directory services (e.g. yellow pages, GIS).
• Surveillance and remote sensing.
MPEG-7 Objectives
Standardize content-based description for various types of audiovisual information
• Independent from media support (encoding and storage)
• Different granularity
  – Low-level features: shape, size, key, tempo changes
  – High-level semantic info: “scene with a barking brown dog on the left and with the sound of passing cars in the background.”
• Meaningful in the context of the application
  – Same material -> different types of features and combinations, e.g. timbre vs. loudness
MPEG-7 Objectives
• Information about the content
  – The form: e.g. the coding format used
  – Conditions for accessing the material: e.g. intellectual property rights / price
  – Classification: e.g. parental rating
  – Links to other relevant materials
  – The context: e.g. “Olympic Games 1996, final of 200 meter hurdles, men”
• Information present in the content:
  – Combination of low-level and high-level descriptors
Scope of the Standard
• Processing chain: [figure]
An example of architecture
• Pull: (Client queries -> Descriptions repository -> Matched Ds)
• Push: (Filter descriptions -> Programmed actions)
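
The pull and push modes can be illustrated with a toy Python sketch. Nothing here comes from the standard: the `Description` record, its `keywords` field, and the `pull` and `push` functions are hypothetical names chosen only to show the direction of the data flow.

```python
from dataclasses import dataclass, field


@dataclass
class Description:
    """Hypothetical stand-in for a stored content description."""
    content_id: str
    keywords: set = field(default_factory=set)


def pull(repository, query_keywords):
    """Pull: a client query is matched against the description repository."""
    wanted = set(query_keywords)
    # A description matches when it carries every requested keyword.
    return [d.content_id for d in repository if wanted <= d.keywords]


def push(incoming, filter_keywords, action):
    """Push: incoming descriptions are filtered; each match triggers an action."""
    wanted = set(filter_keywords)
    for d in incoming:
        if wanted & d.keywords:
            action(d)


repository = [
    Description("clip-1", {"news", "sport"}),
    Description("clip-2", {"news", "weather"}),
    Description("clip-3", {"film", "drama"}),
]

# Pull: the client asks for sport news.
matches = pull(repository, ["news", "sport"])

# Push: a programmed action (here, "record") fires for anything about sport.
recorded = []
push(repository, ["sport"], lambda d: recorded.append(d.content_id))
```

In both cases only the descriptions are inspected; the audiovisual content itself is never touched.
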
Where are the descriptions from?
• Preservation of existing descriptive data (e.g. scripts) through production/delivery
• Generated automatically by capture devices (e.g. time or GPS location in a camera)
• Extracted automatically & semi-automatically (i.e. with some human assistance)
• Manually produced (e.g. for legacy material such as existing film archives)
Main Elements of MPEG-7
• Relationship among elements introduced above.
Descriptions
• MPEG-7 approaches the description of content from several viewpoints.
• A set of methods and tools for the different viewpoints of the description (not a monolithic system)
• Interrelated and can be combined in many ways.
• Associated with the content itself: (searching, filtering)
• Location: (document vs. stream)
  – physically located with the material
  – or somewhere else on the globe
• Interoperability with other metadata standards: (XML)
Major Functionalities
• MPEG-7 Systems
• MPEG-7 Description Definition Language
• MPEG-7 Visual
• MPEG-7 Audio
• MPEG-7 Multimedia Description Schemes
• Reference Software: the eXperimentation Model (test)
• MPEG-7 Conformance (syntax checking)
• MPEG-7 Extraction and use of descriptions (technical report)
MPEG-7 Audio
• Audio provides structures, building upon some basic structures from the MDS (Multimedia Description Schemes), for describing audio content.
• Low-level Descriptors:
  – audio features that cut across many applications
• High-level Description Tools:
  – more specific to a set of applications.
Low-level Features
Low-level Features (details)
• Basic: (temporally sampled scalar values for general use)
  – AudioWaveform Descriptor
    • waveform envelope (for display purposes)
  – AudioPower Descriptor
    • temporally-smoothed instantaneous power (quick summary of a signal)
• Silence segment: (no significant sound)
  – aid further segmentation of the audio stream, or as a hint not to process a segment
  – Applicable to all kinds of signals
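
A minimal sketch of the idea behind a smoothed power value and a silence hint, in plain Python. The frame length, the threshold, and the function names `frame_power` and `silent_frames` are assumptions made for illustration; the normative extraction in the standard is not reproduced here.

```python
import math


def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame.

    Averaging over a frame is the temporal smoothing step.
    """
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def silent_frames(powers, threshold):
    """Flag frames whose power falls below an (assumed) silence threshold."""
    return [p < threshold for p in powers]


# 100 samples of silence followed by 100 samples of a sine with amplitude 0.5.
quiet = [0.0] * 100
tone = [0.5 * math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
signal = quiet + tone

powers = frame_power(signal, frame_len=50)
flags = silent_frames(powers, threshold=1e-4)
```

Frames flagged as silent can be skipped by later analysis stages or used as candidate segment boundaries.
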
Low-level Features (details)
• Basic Spectral: (single time-frequency analysis of signal)
  – AudioSpectrumEnvelope: (Base class)
    • the short-term power spectrum (display, synthesize, general-purpose search)
  – AudioSpectrumCentroid:
    • dominated by high or low frequencies?
  – AudioSpectrumSpread:
    • the power spectrum centered near the spectral centroid, or spread out over the spectrum?
    • pure-tone and noise-like sounds
  – AudioSpectrumFlatness: (the presence of tonal components)
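
The three questions these descriptors answer (where is the spectrum centered, how widely is it spread, how tonal is it) can be sketched with textbook formulas. This is a simplified illustration on a linear frequency axis with made-up spectra; it does not reproduce the normative definitions, and the function names are hypothetical.

```python
import math


def spectral_centroid(freqs, power):
    """Power-weighted mean frequency."""
    total = sum(power)
    return sum(f * p for f, p in zip(freqs, power)) / total


def spectral_spread(freqs, power):
    """Power-weighted standard deviation around the centroid."""
    c = spectral_centroid(freqs, power)
    total = sum(power)
    return math.sqrt(sum((f - c) ** 2 * p for f, p in zip(freqs, power)) / total)


def spectral_flatness(power):
    """Geometric mean over arithmetic mean of the power values.

    Close to 1 for a flat (noise-like) spectrum, close to 0 when a few
    tonal components dominate. All power values must be positive.
    """
    n = len(power)
    geometric = math.exp(sum(math.log(p) for p in power) / n)
    arithmetic = sum(power) / n
    return geometric / arithmetic


freqs = [100.0, 200.0, 300.0, 400.0, 500.0]
noise_like = [1.0, 1.0, 1.0, 1.0, 1.0]        # equal power in every bin
tone_like = [1e-6, 1e-6, 1.0, 1e-6, 1e-6]     # one dominant component

noise_stats = (spectral_centroid(freqs, noise_like),
               spectral_spread(freqs, noise_like),
               spectral_flatness(noise_like))
tone_stats = (spectral_centroid(freqs, tone_like),
              spectral_spread(freqs, tone_like),
              spectral_flatness(tone_like))
```

Both example spectra share the same centroid, so it is the spread and the flatness that separate the pure-tone case from the noise-like one.
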
Low-level Features (details)
• Signal Parameters: (periodic or quasi-periodic signals)
  – AudioFundamentalFrequency:
    • “confidence measure”, replacing “pitch-tracking”
  – AudioHarmonicity:
    • distinction between sounds with a harmonic / inharmonic / non-harmonic spectrum
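
The slides do not say how a fundamental frequency and its confidence are extracted. As an illustration only, the sketch below uses plain autocorrelation as a familiar stand-in: the best lag gives the frequency estimate, and the normalized correlation at that lag serves as a rough confidence value. The function name, the search range, and the test signal are all assumptions.

```python
import math


def estimate_f0(samples, sample_rate, f_min, f_max):
    """Return (f0 estimate in Hz, confidence) from normalized autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    energy = sum(x * x for x in samples)
    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        score = corr / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag, best_score


# 0.1 s of a 100 Hz sine sampled at 8 kHz (a strongly periodic signal).
sample_rate = 8000
samples = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

f0, confidence = estimate_f0(samples, sample_rate, f_min=50, f_max=400)
```

A noise-like input would produce a low score at every lag, which is the kind of case where a confidence value is more informative than a forced pitch estimate.
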
Low-level Features (details)
• Timbral Temporal: (temporal characteristics of segments of sounds, musical timbre)
  – LogAttackTime
  – TemporalCentroid
    • where in time the energy of a signal is focused.
    • Useful when attack times are identical
[Figure: illustration of log-attack time; signal envelope(t) plotted against time t, with T0 and T1 marked]
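
A small sketch of both timbral temporal features computed from a sampled envelope. It assumes that T0 is the first instant at which the envelope reaches a small fraction of its peak and that T1 is the instant of the peak; the 2% fraction and the function names are illustrative choices, not values taken from the slides.

```python
import math


def log_attack_time(envelope, sample_rate, start_fraction=0.02):
    """log10 of the attack duration T1 - T0, in seconds.

    Assumes the peak occurs after the start of the attack (T1 > T0).
    """
    peak = max(envelope)
    t0 = next(i for i, v in enumerate(envelope) if v >= start_fraction * peak)
    t1 = envelope.index(peak)
    return math.log10((t1 - t0) / sample_rate)


def temporal_centroid(envelope, sample_rate):
    """Envelope-weighted mean time, in seconds."""
    total = sum(envelope)
    return sum(i * v for i, v in enumerate(envelope)) / (total * sample_rate)


# Envelope sampled at 100 Hz: linear rise over 0.1 s, then linear decay over 0.4 s.
sample_rate = 100
envelope = [i / 10 for i in range(11)] + [1 - i / 40 for i in range(1, 41)]

lat = log_attack_time(envelope, sample_rate)
centroid = temporal_centroid(envelope, sample_rate)
```

Two sounds with the same attack time but different decay lengths would share `lat` and differ in `centroid`, which is the situation the TemporalCentroid bullet refers to.
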
Low-level Features (details)
• Timbral Spectral: (spectral features in a linear-frequency space)
  – SpectralCentroid:
    • power-weighted average of the frequency of the bins in the linear power spectrum.
    • distinguishing musical instrument timbres
  – 4 Ds for the harmonic, regularly spaced components of signals:
    • HarmonicSpectralCentroid
    • HarmonicSpectralDeviation
    • HarmonicSpectralSpread
    • HarmonicSpectralVariation
Low-level Features (details)
• Spectral Basis: (low-dimensional projections of a spectral space to aid compactness and recognition)
  – AudioSpectrumBasis:
    • a series of (time-varying / statistically independent) basis functions derived from the singular value decomposition of a normalized power spectrum.
  – AudioSpectrumProjection:
    • low-dimensional features of a spectrum after projection upon a reduced-rank basis.
  – independent subspaces of a spectrum correlate strongly with different sound sources.
  – Provide more salience using less space.
• Used with the Sound Classification and Indexing Description Tools.
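
To make the basis/projection idea concrete, the sketch below finds a rank-1 basis for a tiny spectrogram and projects each frame onto it. It approximates the first right singular vector by power iteration instead of calling a full SVD routine, and it skips the normalization steps mentioned above; the function names and the toy data are assumptions.

```python
import math


def dominant_basis_vector(frames, iterations=100):
    """Approximate the first right singular vector of `frames` (frames x bins)
    by power iteration on X^T X."""
    n_bins = len(frames[0])
    v = [1.0 / math.sqrt(n_bins)] * n_bins
    for _ in range(iterations):
        # u = X v (one value per frame), then w = X^T u (one value per bin).
        u = [sum(x * b for x, b in zip(row, v)) for row in frames]
        w = [sum(row[j] * u[i] for i, row in enumerate(frames))
             for j in range(n_bins)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def project(frames, basis_vector):
    """Reduce each spectral frame to one coefficient along the basis vector."""
    return [sum(x * b for x, b in zip(row, basis_vector)) for row in frames]


# Toy spectrogram: every frame is a scaled copy of one spectral shape,
# as if a single source were changing only in loudness.
shape = [3.0, 4.0, 0.0]
gains = [1.0, 2.0, 0.5]
frames = [[g * s for s in shape] for g in gains]

basis = dominant_basis_vector(frames)
features = project(frames, basis)
```

Here three spectral bins per frame collapse to a single coefficient per frame without losing information, because the toy data has rank 1; real spectra would keep a handful of basis functions instead.
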
High-level audio Description Tools (Ds and DSs)
• Exchange some generality for descriptive richness:
  – a smaller set of audio features (as compared to visual features) that may canonically represent a sound without domain-specific knowledge.
• Audio Signature (DS)
• Musical Instrument Timbre
• Melody
• General Sound Recognition and Indexing
• Spoken Content
High-level audio Description Tools (details)
• Audio Signature Description Scheme
  – SpectralFlatness Ds
  – a unique content identifier for the purpose of robust automatic identification
  – e.g. audio fingerprinting
High-level audio Description Tools (details)
• Musical Instrument Timbre Description Tools
  – HarmonicInstrumentTimbre Ds:
    • LogAttackTime Descriptor
  – PercussiveInstrumentTimbre Ds:
    • SpectralCentroid Descriptor
High-level audio Description Tools (details)
• Melody Description Tools:
  – efficient, robust, and expressive melodic similarity matching.
  – MelodyContour Description Scheme:
    • terse, efficient melody contour / rhythm
  – MelodySequence Description Scheme:
    • verbose, complete, expressive melody / rhythm.
    • Interval encoding
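
A terse contour can be sketched by quantizing the interval between successive notes. The five-level scheme below (no change, small step, larger leap, each up or down) is an assumption used for illustration; the slides do not give the thresholds, and the function name `contour` and the MIDI-number input are hypothetical.

```python
def contour(midi_notes):
    """Quantize successive pitch intervals (in semitones) to five levels."""
    levels = []
    for prev, cur in zip(midi_notes, midi_notes[1:]):
        step = cur - prev
        if step == 0:
            level = 0                          # repeated note
        elif abs(step) <= 2:
            level = 1 if step > 0 else -1      # step of one or two semitones
        else:
            level = 2 if step > 0 else -2      # leap of three semitones or more
        levels.append(level)
    return levels


# C4 D4 D4 G4 F4 C4 as MIDI note numbers.
melody = [60, 62, 62, 67, 65, 60]
melody_contour = contour(melody)
```

Because only coarse interval classes are kept, the same contour results if the melody is transposed, which is what makes such a code tolerant of imprecise queries.
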
High-level audio Description Tools (details)
• General Sound Recognition and Indexing Description Tools:
  – SoundModel Description Scheme
  – SoundClassificationModel Description Scheme
    • a set of SoundModel DS -> multi-way classifier
  – SoundModelStatePath Descriptor
    • indices to states generated by a SoundModel of a segment
  – immediately applied to sound effects
  – automatically index and segment sound tracks.
  – Low -> mid -> high level analyses
High-level audio Description Tools (details)
• Spoken Content Description Tools:
  – detailed description of words spoken within an audio stream.
  – indexing into and retrieval of an audio stream
  – indexing of multimedia objects annotated with speech.
• Recall of audio/video data by memorable spoken events.
  – a character or person spoke a particular word
• Spoken Document Retrieval
  – separate spoken documents
• Annotated Media Retrieval
  – photograph retrieved using a spoken annotation
Timbre Descriptor Estimation
[Figure: block diagram. Labelled blocks: Signal, Sliding Analysis Window, STFT, f0, Harmonic Peaks Detection, Power Spectrum, Signal envelope, z-1. Labelled outputs: Instantaneous HarmonicSpectralCentroid, Instantaneous HarmonicSpectralDeviation, Instantaneous HarmonicSpectralSpread, Instantaneous HarmonicSpectralVariation, SpectralCentroid, Temporal Centroid, LogAttackTime.]
MPEG-7 Audio Amendment 2
• Will include extended functionality of audio metadata that is complementary to the low-level audio descriptors in ISO/IEC 15938-4, providing high-level description tools such as chord pattern and rhythm pattern, both of which support compact representation of timbre and rhythm.