MPEG-1. MUMT-614, Jan. 23, 2002. Wes Hatch.
Purpose of MPEG encoding
• To decrease the data rate
• How? Two choices:
  – decrease the sample rate, but this reduces the available bandwidth
  – decrease the word size, but this introduces noise into the signal (lower S/N ratio)
• Solution: perceptual coding
  – reduce the word size based on signal conditions
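The trade-off between word size and noise can be checked numerically. The following is a minimal sketch, not part of the original slides: it quantizes a near-full-scale sine to a given word size and measures the resulting S/N ratio. The function names, the 997 Hz test tone, and the 48 kHz rate are assumptions chosen for illustration.

```python
import math

def quantize(x, bits):
    """Uniformly quantize x in [-1.0, 1.0) to a signed word of `bits` bits."""
    levels = 2 ** (bits - 1)
    q = round(x * levels)
    q = max(-levels, min(levels - 1, q))  # clamp to the representable range
    return q / levels

def snr_db(bits, n=4800, fs=48000, freq=997.0):
    """Measured S/N ratio (dB) of a 0.99 amplitude sine after quantization."""
    signal = [0.99 * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - quantize(s, bits)) ** 2 for s in signal)
    return 10 * math.log10(signal_power / noise_power)

if __name__ == "__main__":
    for bits in (16, 12, 8):
        print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

Each bit removed costs roughly 6 dB of S/N ratio, which is why a perceptual coder only shortens the word where the added noise will be masked.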
Quick Overview
• MPEG removes “irrelevancy” & statistical redundancy
• lossy (but not perceptibly so)
• reduces 1.41 Mbps (CD audio) to between 64 and 448 kbps (a 95% to 68% reduction)
• ratios of 4:1 and 6:1 can be transparent in advanced listening tests
• supports 32, 44.1, 48 kHz sampling rates
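The reduction figures above follow from simple arithmetic on the CD audio rate (44.1 kHz, 16 bits, 2 channels). A short sketch, with function names chosen here for illustration:

```python
# CD audio: 44,100 samples/s x 16 bits x 2 channels = 1,411,200 bit/s
CD_RATE = 44100 * 16 * 2

def reduction_percent(target_bps, source_bps=CD_RATE):
    """Percentage of the source data rate removed by coding at target_bps."""
    return 100.0 * (1 - target_bps / source_bps)

def compression_ratio(target_bps, source_bps=CD_RATE):
    """Source rate divided by coded rate (e.g. 4.0 means 4:1)."""
    return source_bps / target_bps

if __name__ == "__main__":
    for kbps in (64, 448):
        bps = kbps * 1000
        print(kbps, "kbps:",
              round(reduction_percent(bps)), "% reduction,",
              round(compression_ratio(bps), 1), ": 1")
```

By the same arithmetic, the 4:1 and 6:1 ratios correspond to roughly 353 kbps and 235 kbps for a stereo CD source.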
MPEG-1 Types
• 3 layers: I, II, and III
  – I is the simplest, III is the most complex
  – a decoder for a given layer can also play encodings of the layers beneath it
    • e.g. a Layer III decoder can play I, II, and III; a Layer II decoder can play only I and II
Components
• There are two “components” to MPEG-1: the encoder and the decoder.
  – the decoder is what is actually described in the specification; the encoder is not.
  – improvements to the encoder have immediate effects on quality without necessitating corresponding changes to the decoder
Encoder vs. Decoder
• Encoder
  – does all the work
  – forward adaptive encoding
    • all allocation of bits is performed by the encoder
    • the psychoacoustic model used to determine “irrelevant” data is contained here
    • improving the psychoacoustic model or otherwise changing the encoder doesn’t require changing the decoder
• Decoder does less work
Encoder Details
• audio (PCM) passes through a polyphase filter bank, splitting the signal into 32 bands
  – the filter bank outputs one sample per band for every 32 samples in
• layer I: after each band gets 12 samples, the encoder determines the bit allocation for that band
• layer II: operates on 12 x 3 = 36 samples per band (larger frame). Lower bands may receive 15 bits, middle bands 7 bits, and high bands 3 bits max
• layer III is different…we’ll come back to it.
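The frame sizes implied by these figures can be worked out directly: 32 bands times the samples collected per band. A small sketch, with names chosen here for illustration:

```python
SUBBANDS = 32
SAMPLES_PER_BAND = {"I": 12, "II": 36}  # samples per band per frame, as above

def frame_samples(layer):
    """PCM samples consumed per frame: one subband sample per 32 input samples."""
    return SUBBANDS * SAMPLES_PER_BAND[layer]

def frame_duration_ms(layer, fs_hz):
    """Duration of one frame in milliseconds at sampling rate fs_hz."""
    return 1000.0 * frame_samples(layer) / fs_hz

if __name__ == "__main__":
    for layer in ("I", "II"):
        print("Layer", layer, frame_samples(layer), "samples,",
              frame_duration_ms(layer, 48000), "ms at 48 kHz")
```

So a Layer I frame covers 384 input samples and a Layer II frame covers 1152, which is 8 ms and 24 ms respectively at 48 kHz.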
Encoder Details
• An FFT is performed (w/ Hann window)
  – 512-point for layer I
  – 1024-point for layer II
• a psychoacoustic model analyzes the FFT output and uses it to calculate masking thresholds
• these determine which components are audible (i.e. the signal-to-mask ratio, SMR)
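To make the analysis step concrete, here is a rough sketch of a Hann-windowed power spectrum for one 512-sample block. It uses a naive DFT from the standard library rather than a real FFT, and it does not reproduce the masking-threshold calculation or any normalization from the standard. The function names are assumptions for illustration.

```python
import cmath
import math

def hann(n_points):
    """Periodic Hann window of length n_points."""
    return [0.5 * (1 - math.cos(2 * math.pi * n / n_points))
            for n in range(n_points)]

def power_spectrum_db(block):
    """Power (dB, unnormalized) in bins 0..N/2 of a Hann-windowed block."""
    n = len(block)
    windowed = [w * x for w, x in zip(hann(n), block)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(10 * math.log10(max(abs(acc) ** 2, 1e-20)))
    return spectrum

if __name__ == "__main__":
    # A tone centred exactly on bin 32 of a 512-point analysis
    tone = [math.sin(2 * math.pi * 32 * i / 512) for i in range(512)]
    spec = power_spectrum_db(tone)
    print("peak bin:", spec.index(max(spec)))
```

A psychoacoustic model would take a spectrum like this, locate tonal and noise-like maskers, and derive a masking threshold per band.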
How bits are allocated
• the data in each band is what gets coded, NOT the FFT data.
• the more “audible” components (i.e. those highest above the masking threshold) are assigned the most bits
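One way to picture this allocation is a greedy loop: keep giving one more bit to whichever band's quantization noise sits furthest above its masking threshold. The sketch below is a simplification under stated assumptions, not the standard's procedure: it assumes each bit buys about 6.02 dB of S/N ratio, ignores the real quantizer tables and the cost of side information, and uses a hypothetical function name and parameters.

```python
def allocate_bits(smr_db, bit_pool, samples_per_band=12, max_bits=15):
    """Greedy allocation from per-band signal-to-mask ratios (dB).

    Returns the number of bits per sample assigned to each band.
    Assumes S/N ratio ~= 6.02 dB per bit (a simplification).
    """
    bits = [0] * len(smr_db)
    while bit_pool >= samples_per_band:
        candidates = [b for b in range(len(smr_db)) if bits[b] < max_bits]
        if not candidates:
            break
        # Mask-to-noise ratio: MNR = SNR - SMR; the lowest MNR is most audible
        worst = min(candidates, key=lambda b: 6.02 * bits[b] - smr_db[b])
        if 6.02 * bits[worst] - smr_db[worst] >= 0:
            break  # noise is already below the mask in every candidate band
        bits[worst] += 1
        bit_pool -= samples_per_band  # one extra bit for each sample in the band
    return bits

if __name__ == "__main__":
    print(allocate_bits([20.0, 5.0, -10.0], bit_pool=1000))
```

In this example the band whose signal is 10 dB below the mask receives no bits at all, while the band 20 dB above the mask receives the most.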
Encoder Details
• Scale factor is calculated
  – the largest sample value in the band for each frame is found. Each of the 12 samples in the band is divided by this factor
  – layer II has 3 scale factors (for 3 groups of 12 samples), but one may suffice if the differences are small
• Corresponds to max. SPL in each band
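The normalization described above can be sketched as follows. This is an illustrative simplification: it uses the raw peak value as the scale factor, whereas a real encoder would pick the nearest entry from a fixed table of scale factors. Function names are hypothetical.

```python
def normalize_block(samples):
    """Return (scale, normalized) for one block of 12 subband samples.

    The scale factor here is simply the largest absolute sample value.
    """
    scale = max(abs(s) for s in samples)
    if scale == 0.0:
        return 0.0, [0.0] * len(samples)
    return scale, [s / scale for s in samples]

def denormalize_block(scale, normalized):
    """Decoder side: multiply the samples back by the scale factor."""
    return [scale * v for v in normalized]

if __name__ == "__main__":
    block = [0.1, -0.4, 0.2, 0.05]
    scale, norm = normalize_block(block)
    print(scale, norm)
    print(denormalize_block(scale, norm))
```

After normalization every block peaks at magnitude 1.0, so the quantizer's full range is used regardless of how loud the band is.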
Encoder Details (layer III)
• layer III:
  – each band is transformed into 18 spectral coefficients with an MDCT (50% overlap)
    • gives 576 coefficients, each representing a BW of 41.67 Hz at 48 kHz; 24 ms
  – window size of the MDCT is variable
    • from long windows for steady-state signals (36 samples) to short windows for transients (12 samples)
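The numbers above can be reproduced with a short sketch. The `mdct` function is a naive, unwindowed textbook MDCT that maps 2N input samples to N coefficients, so a 36-sample long block yields 18 coefficients and a 12-sample short block yields 6. It omits the window shapes and aliasing-reduction steps of a real Layer III encoder, and the function names are assumptions.

```python
import math

def mdct(block):
    """Naive MDCT: 2N input samples -> N coefficients (no window applied)."""
    n_half = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(n_half)
    ]

def layer3_resolution(fs_hz, subbands=32, coeffs_per_band=18):
    """Return (total coefficients, bandwidth in Hz of each coefficient)."""
    total = subbands * coeffs_per_band
    return total, (fs_hz / 2) / total

if __name__ == "__main__":
    print(len(mdct([0.0] * 36)), "coefficients from a long block")
    print(len(mdct([0.0] * 12)), "coefficients from a short block")
    print(layer3_resolution(48000))
```

With 32 bands of 18 coefficients each, the 24 kHz of bandwidth available at a 48 kHz sampling rate is divided into 576 slices of about 41.67 Hz.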
Encoder Details (layer III)
• framerate varies in layer III
• can also use a bit reservoir when more accuracy is needed
• Huffman encoding employed
Encoder
• a portion of the data stream is consumed by coding info:
  – headers
  – bit allocation info
  – scale factors
  – samples from each band
Other Features
• stereo joint coding
  – stereophonic irrelevance/redundancy eliminated
  – sum and difference signals (layer III)
  – L/R high frequency band samples summed into one channel, but scale factors remain independent
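The sum and difference idea can be sketched in a few lines. This assumes the common 1/sqrt(2) scaling for the mid and side signals, which makes the transform its own inverse; the function names are chosen here for illustration, and the sketch works on plain sample lists rather than on the coded spectral values.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)

def ms_encode(left, right):
    """Convert L/R into sum (mid) and difference (side) signals."""
    mid = [(l + r) * INV_SQRT2 for l, r in zip(left, right)]
    side = [(l - r) * INV_SQRT2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover L/R from the mid and side signals."""
    left = [(m + s) * INV_SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) * INV_SQRT2 for m, s in zip(mid, side)]
    return left, right

if __name__ == "__main__":
    left = [0.5, 0.25, -0.1]
    right = [0.5, -0.25, -0.1]
    mid, side = ms_encode(left, right)
    print(side)  # zero wherever the two channels are identical
    print(ms_decode(mid, side))
```

When the two channels are similar, the side signal is close to zero and needs very few bits, which is where the redundancy saving comes from.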
Decoder Details
• Put signal back together:
  – decode bit allocation info
  – samples multiplied by scale factors and run through an inverse filterbank
  – delays typically range from 10 to 30 ms