The MPEG-4 General Audio Coder€¦ · Scalable GA Coder : Combination with CELP Coder (I) • Very...
Transcript of The MPEG-4 General Audio Coder€¦ · Scalable GA Coder : Combination with CELP Coder (I) • Very...
grl 6/98 page 1
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
The MPEG-4 General Audio Coder
Bernhard GrillFraunhofer Institute for Integrated Circuits (IIS)
grl 6/98 page 2
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Outline • MPEG-2 Advanced Audio Coding (AAC)
• MPEG-4 Extensions:– Perceptual Noise Substitution (PNS)
– Long Term Prediction
– TwinVQ Coding Core
• The MPEG-4 Scalable General Audio Coder
• Results of Listening Tests
• Demonstration of a Real-Time Player
grl 6/98 page 3
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
MPEG-2 AACIn
put t
ime
sign
al Inte
nsity
/C
oupl
ing
Bitstream Multiplexer
Perceptual Model
Gain Control
Filter Bank
TNS M/S
Rate/Distortion Control
Noi
sele
ss
Cod
ing
Bitstream Output
Sca
leF
acto
rs
Pre
dict
ion
Quant.
EncoderOverview:
grl 6/98 page 4
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Inpu
t tim
e si
gnal In
tens
ity/
Cou
plin
g
Bitstream Multiplexer
Perceptual Model
Gain Control
Filter Bank
TNS M/S
Rate/Distortion Control
Noi
sele
ss
Cod
ing
Bitstream Output
Sca
leF
acto
rs
Pre
dict
ion
Quant.
Extension: Perceptual Noise Substitution (PNS)
PNS
grl 6/98 page 5
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Perceptual Noise Substitution (2)• Parametric coding of noise-like signal components has been used
widely e.g. in speech coding
• Perceptual Noise Substitution (PNS) permits a frequency selectiveparametric coding of noise-like signal components
• Noise-like signal components are detected on a scalefactor band basis
• Corresponding groups of spectral coefficients are excluded fromquantization/coding
• Instead, only a "noise substitution flag" plus total power of thesubstituted band is transmitted in the bitstream
• Decoder inserts pseudo random vectors with desired target power asspectral coefficients
Background:
MPEG-4:
grl 6/98 page 6
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Perceptual Noise Substitution (3)"Perceptual Noise
Substitution" (PNS):
Perceptual coder +
parametric represent.
of noise-like signals
Quantization & Coding
Bitstream Multiplexer
Analysis Filterbank
Perceptual Model
Bitstream
Out
Audio
Input
Inverse Quantization
Synthesis Filterbank
Audio
Output
Bitstream
In
Bitstream Demultiplexer
Encoder
Decoder
Noise Detection
Noise subst. signaling
Substituted signal energies
Noise Generator
Noise subst. signaling Substituted signal energies
grl 6/98 page 7
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Inpu
t tim
e si
gnal In
tens
ity/
Cou
plin
g
Bitstream Multiplexer
Perceptual Model
Gain Control
Filter Bank
TNS M/S
Rate/Distortion Control
Noi
sele
ss
Cod
ing
Bitstream Output
Sca
leF
acto
rs
Pre
dict
ion
Quant.
Extension 2: Long Term Prediction
grl 6/98 page 8
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Long Term Prediction (2)• Tone-like signals require much higher coding precision than noise-like
signals (e.g. 20 dB vs. 6 dB)
• Tonal signal components are predictable
• Prediction of each spectral coefficient with backward adaptive predictor
• High complexity (ca. 50% of decoder computation & RAM)
• Long Term Predictor (LTP) as known from speech coding
• Lower complexity: Saving of approx. 50% in terms of computation andmemory over MPEG-2 predictors
• Comparable performance to MPEG-2 predictors
Motivation:
MPEG-2 AAC:
MPEG-4:
grl 6/98 page 9
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Extension 3: Twin-VQIn
put t
ime
sign
al Inte
nsity
/C
oupl
ing
Bitstream Multiplexer
Perceptual Model
Gain Control
Filter Bank
TNS M/S
Rate/Distortion Control
Noi
sele
ss
Cod
ing
Bitstream Output
Sca
leF
acto
rs
Pre
dict
ion
Quant.
grl 6/98 page 10
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Transform-Domain Weighted Interleave VQ (2)• Audio coding at extremely low bitrates (6-8 kbit/s)
• CELP speech coders do not perform well for music
• 0.5 Bits per frequency line at these data rates !!
• Transform-Domain Weighted Interleave Vector Quantization (TwinVQ)as alternative coding kernel
– Vector selection under control of the perceptual model
• Fully integrated into MPEG-4 AAC coding system:
– Uses same spectral representation as AAC coder
– Makes use of other MPEG-4 tools(e.g. LTP, TNS, joint stereo coding)
Background:
MPEG-4:
grl 6/98 page 11
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Transform-Domain Weighted Interleave VQ (3)• Normalization of spectral coefficients:
– LPC envelope (overall spectral shape)
– Periodic component coding (harmonic components)
– Bark-scale envelope coding (additional flattening)
• Vector Quantization (VQ) process:
– Interleaving of spectral coefficients into new sub-vectors
– Vector quantization(two sets of codebooks, weighted distortion measure allowsdistortion control by perceptual model)
⇒ no bit/noise allocation or rate control iteration
Structure:
grl 6/98 page 12
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Definition:
Types ofScalability:
• Capability to decode useful sub-sets of the bitstream
• SNR / NMR (Noise to Mask Ratio) Scalability:
– “Extension layers improve the SNR/NMR of the coded signal”
• Audio Bandwidth Scalability:
– “Extension layers increase the decodable audio band width”
• Restriction of Generality:
– Very low bit rate core coder optimized for special signals, e.g speech.Additional layers provide good quality for all types of signals.
• Implementation Complexity:
Scalability
grl 6/98 page 13
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Application examples• Network based (packetized) transmission
– Requires routers which know about the importance of a packet
– Less important (outer layer) packets may be dropped if the availablebandwidth decreases
• Broadcast
– The most important (inner layer) packets are transmitted with abetter error protection scheme
• Music data base
– High quality content is encoded and stored
– Access to a lower quality version is possible without recoding toallow for pre-listening with a lower quality
grl 6/98 page 14
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable GA Coder (I)
• Encoding of the error signal of an AAC or Twin-VQ Quantizationand Coding (Q&C) module in a second, or third, or n-th similarquantization module in the frequency domain
• Solutions using only AAC, or only Twin-VQ modules possible
• Additionally, Twin-VQ / AAC combinations defined
• Useful for large enhancement steps of >= 8 kbit/s per step
MDCT
Perceptual Model
Q&C ReconstructSpectrum Q&C+
-FSS
Encoder Block Diagram
grl 6/98 page 15
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable GA Coder (II)• Twin-VQ Q&C Modules
– 8 kbit/s fixed step size Vector Quantizer (VQ) modules
– optional 6 kbit/s in first layer
– first choice for a 6 or 8 kbit/s base layer for the codingof general audio signals
• AAC Q&C modules
– Any step size possible
– Reasonable step sizes from 8 to >64 kbit/s
– The same end quality can be achieved as from a singlestep AAC coder
– However, a higher bit rate may be required for the sameaudio quality
Decoder Block Diagram
ReconstructSpectrum
ReconstructSpectrum Inv.
FSS +Add
IMDCT
grl 6/98 page 16
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable GA Coder : Combination with CELP Coder (I)
• Very low bitrate core coder ( e.g. speech coder)
• Core coder typically operating at a lower sampling frequency
• MDCT used for efficient up-sampling
MDCT
MDCT
CELPCODEC FSS
Perceptual. model
Quantization& Coding
Encoder
grl 6/98 page 17
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable GA Coder : Combination with Core Coder (II)
IMDCTMDCTCELPDECODER IFSS
Decoder
Re-quantization
Re-quantization
+
grl 6/98 page 18
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable Stereo Coding: Stereo / Stereo
grl 6/98 page 19
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable Stereo Coding: Mono / Stereo
grl 6/98 page 20
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable Stereo Coding: Mono Core / Mono GA / Stereo GA
grl 6/98 page 21
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Scalable GA Coder : Typical Configurations• Some successfully tested mono/mono combinations:
6 kbit/s CELP + 18 kbit/s AAC6 kbit/s TwinVQ + 18 kbit/s AAC8 kbit/s TwinVQ + 8 kbit/s TwinVQ
6 kbit/s CELP + 18 kbit/s + 24 kbit/s AAC
• Mono/stereo combinations
6 kbit/s mono CELP + 18 kbit/s mono + 24 kbit/s stereo AAC24 kbit/s mono + 16 kbit/s stereo + 16 kbit/s stereo AAC24 kbit/s mono + 72 kbit/s stereo AAC
• Stereo/stereo combinations
2 x 6 kbit/s mono CELP + 36 kbit/s stereo AAC
grl 6/98 page 22
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Results (I) Mono Configurations
0
0,5
1
1,5
2
2,5
3
3,5
4
4,5
Ser ies 1
Ser ies 2 3,5 4,1 3,85 3,65 3,27
Lay er 3 24 kb it/s
A A C 24 kb it/s
CELP+AA C 24 kb it/s
Tw inV Q + A A C
24 kb it/s
A A C 18 kb it/s
grl 6/98 page 23
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Results (II) Mono / Stereo Configuration
0
0,5
1
1,5
2
2,5
3
3,5
4
4,5
5
L3 24k bit/s
A A C24
k bit/s
s c al24 24k bit/s
L3 40k bit/s
A A C40
k bit/s
s c al40
k bit/s
L3 56k bit/s
A A C56
k bit/s
s c al56
k bit/s
Mono Stereo
grl 6/98 page 24
115
FRAUNHOfER InstitutIntegrierte Schaltungen
The MPEG-4 General Audio Coder
Conclusions • Highest quality coding with proven AAC technology
• PNS, LTP and TwinVQ further enhance the very low bitrateperformance
• Mono, Stereo, and Multi-channel Stereo supported
• Bitrate range 6 - ~300 kbit/s per channel at 8 - 96 kHz SR
• Additional flexibility with the scalable coding modes
– Unique capabilities through the availability of the mono-stereo coding modes
• Overall complexity within the limits of today’s hardware
• ==>
• The MPEG-4 GA coder the most versatile audio codingsystem available today
• Low-Delay and Error Resilience Additions inMPEG-4 Version 2