CS 414 - Spring 2011 CS 414 – Multimedia Systems Design Lecture 11 – MP3 Audio & Introduction to...

CS 414 - Spring 2011

CS 414 – Multimedia Systems Design Lecture 11 – MP3 Audio &Introduction to MPEG-4 (Part 6)

Klara Nahrstedt

Spring 2011


Administrative MP1 – deadline February 18

Outline MP3 Audio Encoding MPEG-2 and Intro to MPEG-4 Reading:

Media Coding book, Section 7.7.2 – 7.7.5 Recommended Paper on MP3: Davis Pan, “A Tutorial on MPEG/Audio

Compression”, IEEE Multimedia, pp. 6-74, 1995 Recommended books on JPEG/ MPEG Audio/Video Fundamentals:

Haskell, Puri, Netravali, “Digital Video: An Introduction to MPEG-2”, Chapman and Hall, 1996


MPEG-1 Audio Encoding

CharacteristicsPrecision 16 bitsSampling frequency: 32KHz, 44.1 KHz, 48 KHz3 compression layers: Layer 1, Layer 2, Layer 3

(MP3) Layer 3: 32-320 kbps, target 64 kbps Layer 2: 32-384 kbps, target 128 kbps Layer 1: 32-448 kbps, target 192 kbps


MPEG Audio Encoding Steps


MPEG Audio Filter Bank Filter bank divides input into multiple sub-bands

(32 equal frequency sub-bands) Sub-band i defined

- filter output sample for sub-band i at time t, C[n] – one of 512 coefficients, x[n] – audio input sample from 512 sample buffer


]64[*]64[(*64

)16)(12(cos(3][

7

0

7

0

jkxjkCki

iSjk

t

][],31,0[ iSi t

MPEG Audio Psycho-acoustic Model MPEG audio compresses by removing

acoustically irrelevant parts of audio signals Takes advantage of human auditory systems

inability to hear quantization noise under auditory masking

Auditory masking: occurs when ever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible.



MPEG/audio divides audio signal into frequency sub-bands that approximate critical bands. Then we quantize each sub-band according to the audibility of quantization noise within the band

MPEG Audio Bit Allocation This process determines number of code bits allocated to each sub-

band based on information from the psycho-acoustic model Algorithm:

1. Compute mask-to-noise ratio: MNR=SNR-SMR Standard provides tables that give estimates for SNR resulting from

quantizing to a given number of quantizer levels

2. Get MNR for each sub-band

3. Search for sub-band with the lowest MNR

4. Allocate code bits to this sub-band. If sub-band gets allocated more code bits than appropriate, look up new

estimate of SNR and repeat step 1


MP3 Audio Format


Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg

MPEG Audio Comments Precision of 16 bits per sample is needed to get good SNR

ratio Noise we are getting is quantization noise from the

digitization process For each added bit, we get 6dB better SNR ratio Masking effect means that we can raise the noise floor

around a strong sound because the noise will be masked away

Raising noise floor is the same as using less bits and using less bits is the same as compression


MPEG-2 Standard Extension MPEG-1 was optimized for CD-ROM and

apps for 1.5 Mbps (video strictly non-interleaved)

MPEG-2 adds to MPEG-1: More aspect ratios: 4:2:2, 4:4:4Progressive and interlaced frame codingFour scalable modes

spatial scalability, data partitioning, SNR scalability, temporal scalability


MPEG-1/MPEG-2

Pixel-based representations of content takes place at the encoder

Lack support for content manipulatione.g., remove a date stamp from a video turn off “current score” visual in a live game

Need support manipulation and interaction if the video is aware of its own content


MPEG-4 Compression


Interact With Visual Content


Original MPEG-4

Conceived in 1992 to address very low bit rate audio and video (64 Kbps)

Required quantum leaps in compressionbeyond statistical- and DCT-based techniquescommittee felt it was possible within 5 years

Quantum leap did not happen


The “New” MPEG-4 Support object-based features for content

Enable dynamic rendering of contentdefer composition until decoding

Support convergence among digital video, synthetic environments, and the Internet


MPEG-4 Components Systems – defines architecture, multiplexing structure, syntax Video – defines video coding algorithms for animation of synthetic and

natural hybrid video (Synthetic/Natural Hybrid Coding) Audio – defines audio/speech coding, Synthetic/Natural Hybrid Coding

such as MIDI and text-to-speech synthesis integration Conformance Testing – defines compliance requirements for MPEG-4

bitstream and decoders Technical report DSM-CC Multimedia Integration Framework – defines Digital Storage

Media – Command&Control Multimedia Integration Format; specifies merging of broadcast, interactive and conversational multimedia for set-top-boxes and mobile stations


MPEG-4 Example

ISO N3536 MPEG4 CS 414 - Spring 2011

MPEG-4 Example

Daras, P. MPEG-4 Authoring Tool, J. Applied Signal Processing, 9, 1-18, 2003


Interactive Drama

http://www.interactivestory.netCS 414 - Spring 2011

MPEG-4 Characteristics and Applications


Media Objects

An object is called a media objectreal and synthetic images; analog and

synthetic audio; animated faces; interaction

Compose media objects into a hierarchical representation form compound, dynamic scenes


Composition

Scene

furniture

desk globe

character

voicesprite


Video Syntax Structure


New MPEG-4 Aspect: Object-based layered syntactic structure

Content-based Interactivity Achieves different qualities for different objects

with a fine granularity in spatial resolution, temporal resolution and decoding complexity

Needs coding of video objects with arbitrary shapes

Scalability Spatial and temporal scalability Need more than one layer of information (base and

enhancement layers)


Examples of Base and Enhancement Layers


Coding of Objects Each VOP corresponds to an entity that after

being coded is added to the bit stream Encoder sends together with VOP

Composition information where and when each VOP is to be displayed

Users are allowed to change the composition of the entire scene displayed by interacting with the composition information


Spatial Scalability


VOP which is temporally coincident with I-VOP in the base layer, is encoded as P-VOP in the enhancement layer. VOP which is temporally coincident with P-VOP in the base layer is encoded as B-VOP in the enhancement layer.

Temporal Scalability


Composition (cont.)

Encode objects in separate channelsencode using most efficient mechanism transmit each object in a separate stream

Composition takes place at the decoder,rather than at the encoder requires a binary scene description (BIFS)

BIFS is low-level language for describing:hierarchical, spatial, and temporal relations


MPEG-4 Rendering


Interaction as Objects

Change colors of objects Toggle visibility of objects Navigate to different content sections Select from multiple camera views

change current camera angle Standardizes content and interaction

e.g., broadcast HDTV and stored DVD


Hierarchical Model

Each MPEG-4 movie composed of trackseach track composed of media elements (one

reserved for BIFS information)each media element is an objecteach object is a audio, video, sprite, etc.

Each object specifies its:spatial information relative to a parent temporal information relative to global timeline


Synchronization

Global timeline (high-resolution units)e.g., 600 units/sec

Each continuous track specifies relatione.g., if a video is 30 fps, then a frame should

be displayed every 33 ms.

Others specify start/end time


MPEG-4 parts MPEG-4 part 2

Includes Advanced Simple Profile, used by codecs such as Quicktime 6

MPEG-4 part 10 MPEG-4 AVC/H.264 also called Advanced

Video Coding Used by coders such as Quicktime 7Used by high-definition video media like Blu-

ray DiscCS 414 - Spring 2011

CS 414 - Spring 2011 CS 414 – Multimedia Systems Design Lecture 11 – MP3 Audio & Introduction to...

Documents

Transcript of CS 414 - Spring 2011 CS 414 – Multimedia Systems Design Lecture 11 – MP3 Audio & Introduction to...