CS 414 - Spring 2012
CS 414 – Multimedia Systems Design Lecture 11 – MP3 and MP4 Audio (Part 7)
Klara Nahrstedt
Spring 2012
Outline
MP3 Audio Encoding
MP4 Audio
Reading:
  Media Coding book, Sections 7.7.2 – 7.7.5
  Recommended paper on MP3: Davis Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia, pp. 60-74, 1995
  Recommended book on JPEG/MPEG audio/video fundamentals: Haskell, Puri, Netravali, “Digital Video: An Introduction to MPEG-2”, Chapman and Hall, 1996
Why Compression is Needed
Data rate = sampling rate * quantization bits * channels (+ control information)
For example (digital audio): 44100 Hz, 16 bits, 2 channels generates about 1.4 Mbit of data per second, 85 Mbit per minute, and 5 Gbit per hour
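As a quick check of these figures, here is a minimal Python sketch of the data-rate formula. The function name `pcm_data_rate_bps` is made up for illustration, and control information is ignored.

```python
def pcm_data_rate_bps(sampling_rate_hz, bits_per_sample, channels):
    """Raw PCM data rate in bits per second (control information ignored)."""
    return sampling_rate_hz * bits_per_sample * channels


# CD-quality audio: 44100 Hz, 16 bits, 2 channels
rate = pcm_data_rate_bps(44100, 16, 2)
print(rate)         # 1411200 bits per second, about 1.4 Mbit/s
print(rate * 60)    # 84672000 bits per minute, about 85 Mbit
print(rate * 3600)  # 5080320000 bits per hour, about 5 Gbit
print(rate // 8)    # 176400 bytes per second
```

Note that the slide's figures are in bits; dividing by 8 gives the rate in bytes.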
MPEG-1 Audio
Lossy compression of audio
In the late 1980s, ISO’s MPEG group started to standardize
  TV broadcasting
  Use of audio on CD-ROM (later DVD)
MPEG-1 Audio – 1992
MPEG-2 Audio – 1994
MPEG-1 Audio Layers I, II, III
Criteria for A Good Standard
Achieve desired outcome
Be comprehensible
Allow efficient implementation
Support competition
Give benchmark tests
Be supported by industry
Be good for end users
…
Two models:
  implement first, then standardize
  standardize first, then implement
MPEG-1 Audio Layer II
Called MP2
Dominant standard for audio broadcasting
  DAB digital radio and DVB digital television
Came out of MUSICAM codecs with bit rates 64-192 kbps
  MUSICAM audio coding is the basis for MPEG-1 and MPEG-2 audio
Sampling rates: 32, 44.1, 48 kHz
Bit rates: 32, 48, 56, 64, 80, 96, … 384 kbps
Format: mono, stereo, dual channel, …
MP2 is a sub-band audio encoder in the time domain
MPEG-1 Audio Layer III
MPEG-1 Layer III is called the MP3 format
Popular for PC and Internet applications
Goal is to compress to 128 kbps, but audio can be compressed at higher or lower bit rates, with correspondingly higher or lower quality
Utilization of psychoacoustics
  Scientific study of sound perception
MPEG Audio – MP3
First psychoacoustic masking codec was proposed in 1979 at AT&T Bell Labs, Murray Hill
MP3 is based on OCF (optimum coding in the frequency domain) and PXFM (perceptual transform coding)
MPEG-1 Audio Layer III – public release 1993
MPEG-2 Audio Layer III – public release 1995
MPEG Audio – MP3
1997 – mp3.com – offering thousands of MP3s created by independent artists for free
1999 – Napster MP3 peer-to-peer file sharing
Problem: copyright infringement
Authorized services: Amazon.com, Rhapsody, Juno Records, ..
MPEG-1 Audio Encoding
Characteristics
  Precision: 16 bits
  Sampling frequency: 32 kHz, 44.1 kHz, 48 kHz
  3 compression layers: Layer 1, Layer 2, Layer 3 (MP3)
    Layer 3: 32-320 kbps, target 64 kbps
    Layer 2: 32-384 kbps, target 128 kbps
    Layer 1: 32-448 kbps, target 192 kbps
MPEG Audio Filter Bank
Filter bank divides input into multiple sub-bands (32 equal frequency sub-bands)
Sub-band i is defined by

$$S_t[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} \cos\left(\frac{(2i+1)(k-16)\pi}{64}\right) \left( C[k+64j] \cdot x[k+64j] \right), \quad i \in [0, 31]$$

$S_t[i]$ – filter output sample for sub-band i at time t, C[n] – one of 512 coefficients, x[n] – audio input sample from 512 sample buffer
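A direct, unoptimized Python sketch of this sub-band computation may help make the indices concrete. It assumes the analysis equation as given in Pan's tutorial (a cosine term in (2i+1)(k-16)π/64 applied to the windowed samples C[k+64j]·x[k+64j]). The function name `subband_samples` is hypothetical, and the 512 coefficients C[n] come from a table in the standard that is not reproduced here, so the caller has to supply them; the all-ones window in the usage lines is only a placeholder to exercise the code, not the real window.

```python
import math


def subband_samples(x, C):
    """Compute the 32 sub-band output samples S_t[i] for one time step.

    x: the 512 most recent audio input samples (the 512-sample buffer)
    C: the 512 analysis window coefficients (supplied by the caller)
    """
    if len(x) != 512 or len(C) != 512:
        raise ValueError("x and C must each hold 512 values")

    # Inner sum over j: fold the 512 windowed samples into 64 partial sums.
    z = [sum(C[k + 64 * j] * x[k + 64 * j] for j in range(8))
         for k in range(64)]

    # Outer sum over k: apply the cosine term for each sub-band i.
    return [
        sum(math.cos((2 * i + 1) * (k - 16) * math.pi / 64) * z[k]
            for k in range(64))
        for i in range(32)
    ]


# Placeholder window (NOT the standard's coefficients) and a unit impulse.
placeholder_C = [1.0] * 512
x = [0.0] * 512
x[16] = 1.0  # k = 16, j = 0, so the cosine argument is zero for every i
print(subband_samples(x, placeholder_C)[:3])  # [1.0, 1.0, 1.0]
```

Real encoders use a faster factorization of this computation; the double sum is kept here because it mirrors the formula term by term.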
MPEG Audio Psycho-acoustic Model
MPEG audio compresses by removing acoustically irrelevant parts of audio signals
Takes advantage of the human auditory system’s inability to hear quantization noise under auditory masking
Auditory masking: occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible
Loudness and Pitch (Review of Psychoacoustic Effects)
More sensitive to loudness at mid frequencies than at other frequencies
  Intermediate frequencies at [500 Hz, 5000 Hz]
  Human hearing frequencies at [20 Hz, 20000 Hz]
Perceived loudness of a sound changes based on the frequency of that sound
  Basilar membrane reacts more to intermediate frequencies than to other frequencies
Fletcher-Munson Contours
Each contour represents an equal perceived loudness
Perception sensitivity (loudness) is not linear across all frequencies and intensities
Masking Effects (Review of Psychoacoustic Effects)
Frequency masking
Temporal masking
MPEG/audio divides audio signal into frequency sub-bands that approximate critical bands. Then we quantize each sub-band according to the audibility of quantization noise within the band
MPEG Audio Bit Allocation
This process determines the number of code bits allocated to each sub-band based on information from the psycho-acoustic model
Algorithm:
1. Compute the mask-to-noise ratio from the signal-to-noise ratio (SNR) and the signal-to-mask ratio (SMR): MNR = SNR - SMR
   The standard provides tables that give estimates for the SNR resulting from quantizing to a given number of quantizer levels
2. Get the MNR for each sub-band
3. Search for the sub-band with the lowest MNR
4. Allocate code bits to this sub-band. When the sub-band gets allocated more code bits, look up the new estimate of its SNR and repeat from step 1, until no more code bits can be allocated
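The loop above can be sketched as a small greedy routine in Python. This is only an illustration under stated assumptions, not the standard's procedure: the function name `allocate_bits` and its parameters are hypothetical, the SNR is approximated as about 6.02 dB per allocated bit instead of being looked up in the standard's tables, and the bit budget is counted as one bit per step rather than per block of samples.

```python
def allocate_bits(smr_db, bit_budget, max_bits=15, snr_per_bit_db=6.02):
    """Greedy bit allocation driven by the mask-to-noise ratio (MNR).

    smr_db: signal-to-mask ratio per sub-band, from the psycho-acoustic model
    bit_budget: total number of bits to hand out (simplified accounting)
    max_bits: assumed cap on bits per sub-band
    snr_per_bit_db: assumed SNR gain per bit, standing in for the SNR tables
    """
    bits = [0] * len(smr_db)

    def mnr(i):
        # MNR = SNR - SMR, with SNR estimated from the current allocation.
        return snr_per_bit_db * bits[i] - smr_db[i]

    remaining = bit_budget
    while remaining > 0:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break  # every sub-band is at its cap
        worst = min(candidates, key=mnr)  # sub-band with the lowest MNR
        bits[worst] += 1                  # give it one more bit
        remaining -= 1
    return bits


# Sub-band 0 has the most audible noise (highest SMR), so it is served first.
print(allocate_bits([20.0, 0.0, -10.0], 5))  # [4, 1, 0]
```

In the example, sub-band 0 receives bits until its MNR rises above that of sub-band 1, which then gets the last bit; the sub-band whose signal is already masked (negative SMR) gets nothing.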
Audio Quality and Bit Rate
With too low a bit rate, we get compression artifacts
  Ringing
  Pre-echo – sound is heard before it occurs. It is most noticeable in impulsive sounds from percussion instruments such as cymbals
  Occurs in transform-based audio compression algorithms
Quality depends on the encoder and the encoding parameters
  Constant bit rate encoding
  Variable bit rate encoding
MP3 Audio Format
Source: http://wiki.hydrogenaudio.org/images/e/ee/Mp3filestructure.jpg
MPEG Audio Comments
Precision of 16 bits per sample is needed to get a good SNR
Noise we are getting is quantization noise from the digitization process
For each added bit, we get about 6 dB better SNR
Masking effect means that we can raise the noise floor around a strong sound because the noise will be masked away
Raising the noise floor is the same as using fewer bits, and using fewer bits is the same as compression
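The 6 dB per bit rule of thumb can be checked with a short Python sketch. It assumes the simplest model, in which the SNR is the ratio of the full-scale range to one quantization step (2^bits) expressed in dB; the function name `quantization_snr_db` is hypothetical.

```python
import math


def quantization_snr_db(bits):
    """Approximate SNR of a uniform quantizer: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


print(round(quantization_snr_db(16), 1))  # 96.3 dB for 16-bit samples
# Each extra bit doubles the number of levels and adds about 6.02 dB.
print(round(quantization_snr_db(16) - quantization_snr_db(15), 2))  # 6.02
```

Read in the other direction, raising the noise floor by about 6 dB inside a masked region frees roughly one bit per sample there.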
Successor of MP3
Advanced Audio Coding (AAC) – now part of MPEG-4 Audio
Inclusion of up to 48 full-bandwidth audio channels
Default audio format for iPhone, iPad, Nintendo, PlayStation, Nokia, Android, BlackBerry
Introduced in 1997 as MPEG-2 Part 7
In 1999 – updated and included in MPEG-4
AAC’s Improvements over MP3
More sample frequencies (8-96 kHz)
Arbitrary bit rates and variable frame length
Higher efficiency and simpler filterbank
  Uses pure MDCT (modified discrete cosine transform)
    Also used in Windows Media Audio
MPEG-4 Audio
Variety of applications
  General audio signals
  Speech signals
  Synthetic audio
  Synthesized speech (structured audio)
MPEG-4 Audio (Part 3)
Includes a variety of audio coding technologies
  Lossy speech coding (e.g., CELP)
    CELP – code-excited linear prediction – speech coding
  General audio coding (AAC)
  Lossless audio coding
  Text-to-Speech interface
  Structured Audio (e.g., MIDI)
MPEG-4 Part 14
Called MP4, with extension .mp4
Multimedia container format
  Stores digital video and audio streams and allows streaming over the Internet
Container or wrapper format
  Meta-file format whose specification describes how different data elements and metadata coexist in a computer file
MPEG-4 Audio
Bit rate 2-64 kbps
Scalable for variable rates
MPEG-4 defines a set of coders
  Parametric Coding Techniques: low bit rates 2-6 kbps, 8 kHz sampling frequency
  Code Excited Linear Prediction: medium bit rates 6-24 kbps, 8 and 16 kHz sampling rate
  Time Frequency Techniques: high quality audio, 16 kbps and higher bit rates, sampling rate > 7 kHz