CS 294-9 :: Fall 2003 Audio Coding Ketan Mayer-Patel.

Overview of Today
• PCM (sampling techniques)
– Linear
– µ-law
• DPCM (generic coding techniques)
• ADPCM (generic coding techniques)
• MPEG-1 (psychoacoustic coding)
• Vocoding (speech-specific techniques)

Audio Signals
• Analog audio is basically voltage as a continuous function of time.
• Unlike video, which is 3D, audio is a 1D signal.
– Can capture without having to discretize the higher dimensions.
• Audio sampling basically boils down to quantizing the signal level to a set of values.
• Digital audio parameters:
– bits per sample
– sampling rate
– number of channels

Sampling
• Pulse Amplitude Modulation (PAM)
– Each sample’s amplitude is represented by 1 analog value
• Sampling theory (Nyquist)
– If the input signal has maximum frequency (bandwidth) f, the sampling frequency must be at least 2f
– With a low-pass filter to interpolate between samples, the input signal can be fully reconstructed
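
A minimal sketch of what goes wrong when the Nyquist condition is violated. The 4 kHz sampling rate and the two tone frequencies are hypothetical values chosen for illustration; they are not from the slides.

```python
import math

def sample_tone(freq_hz, rate_hz, count):
    """Return `count` samples of a unit cosine at freq_hz, sampled at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE = 4000                             # hypothetical sampling rate (Hz)
ok = sample_tone(1000, RATE, 8)         # 1 kHz is below RATE / 2: Nyquist satisfied
aliased = sample_tone(3000, RATE, 8)    # 3 kHz is above RATE / 2: Nyquist violated

# After sampling, the 3 kHz tone is indistinguishable from the 1 kHz tone.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(ok, aliased))
print(same)
```

Because the two sample sequences coincide, no reconstruction filter can tell the tones apart, which is why the input must be band-limited to half the sampling rate.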

PCM
• Pulse Code Modulation (PCM)
– Each sample’s amplitude is represented by an integer code-word
– Each bit of resolution adds 6 dB of dynamic range
– The number of bits required depends on the amount of noise that is tolerated
[Figure: sampled waveform and its PCM bitstream, with the quantization error (“noise”) marked]
Bits required for a given signal-to-noise ratio (SNR, in dB):
$n = \dfrac{\mathrm{SNR} - 4.77}{6.02}$
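
The relation above can be turned into a small calculator. This is a sketch that assumes SNR is expressed in dB and that a fractional result is rounded up to a whole number of bits; the function names and the 90 dB target are hypothetical.

```python
import math

def snr_db(bits):
    """SNR in dB from the slide's relation rearranged: SNR = 6.02 * n + 4.77."""
    return 6.02 * bits + 4.77

def bits_needed(target_snr_db):
    """Smallest whole number of bits whose SNR meets the target."""
    return math.ceil((target_snr_db - 4.77) / 6.02)

print(bits_needed(90))   # bits for a hypothetical 90 dB target
print(snr_db(16))        # SNR predicted for 16-bit samples
```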

Linear PCM
• Uses evenly spaced quantization levels.
• Typically 16 bits per sample.
• Provides a large dynamic range.
• Difficult for humans to perceive quantization noise.
• Compact Discs
– 16-bit linear sampling
– 44.1 kHz sampling rate
– 2 channels
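
The three digital audio parameters multiply to give the raw data rate. A minimal sketch using the Compact Disc figures from the slide; the function name is hypothetical.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM data rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

cd_rate = pcm_bit_rate(16, 44_100, 2)
print(cd_rate)        # 1411200 bits per second
print(cd_rate // 8)   # 176400 bytes per second
```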

Non-linear Sampling
• If we try to use 8 bits per sample, dynamic range is reduced significantly and quantization noise can be heard.
• In particular, we end up with not enough levels for the lower amplitudes.
• Solution is to sample more densely in the lower amplitudes and less densely in the higher amplitudes.
• Sort of like a log scale.

µ-law and A-law
• Non-linear sampling is called “companding”
• 8 bits companded provide dynamic range equivalent to 12 bits.
• µ-law and A-law are companding standards defined in G.711
• The difference is in the exact shape of the piece-wise linear companding function.

µ-Law companding
$f(x) = 127 \cdot \operatorname{sign}(x) \cdot \dfrac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$ (x normalized to [-1, 1])
• Provides 14-bit quality (dynamic range) with an 8-bit encoding
• Used in North American & Japanese ISDN voice service
• Simple to compute encoding
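
A minimal sketch of the companding function as a direct formula evaluation. The slide does not state the value of µ; µ = 255, the value used by G.711 µ-law, is assumed here, and the function name is hypothetical.

```python
import math

MU = 255  # assumption: G.711 mu-law value; not stated on the slide

def mu_law_compress(x):
    """Map x in [-1, 1] to a signed code in [-127, 127] using the slide's f(x)."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [-1, 1]")
    sign = (x > 0) - (x < 0)
    return round(127 * sign * math.log(1 + MU * abs(x)) / math.log(1 + MU))

for x in (0.0, 0.01, 0.1, 0.5, 1.0):
    print(x, mu_law_compress(x))
```

Small amplitudes receive far more code values than a linear mapping would give them, which is the "log scale" behaviour described earlier.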

µ-Law Encoding
[Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits) → table lookup → 8-bit µ-law encoding. Receiver: 8-bit µ-law encoding → inverse table lookup → 14-bit decoding.]

Input amplitude   Step size   Segment   Quantization   Code value
0-1               1           000       0000           0
1-3               2           000       0001           1
...               ...         ...       ...            ...
29-31             2           000       1111           15
31-35             4           001       0000           16
...               ...         ...       ...            ...
91-95             4           001       1111           31
95-103            8           010       0000           32
...               ...         ...       ...            ...
215-223           8           010       1111           47
223-239           16          011       0000           48
...               ...         ...       ...            ...
463-479           16          011       1111           63

µ-Law Decoding
[Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits) → table lookup → 8-bit µ-law encoding. Receiver: 8-bit µ-law encoding → inverse table lookup → 14-bit decoding.]

µ-law encoding   Multiplier   Decoded amplitude
0000000          1            0
0000001          2            2
...              ...          ...
0001111          2            30
0010000          4            33
...              ...          ...
0011111          4            93
0100000          8            99
...              ...          ...
0101111          8            219
0110000          16           231
...              ...          ...
0111111          16           471
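
The encoding and decoding tables can be rebuilt from their step sizes. The sketch below covers only the four segments shown on the slides (amplitudes 0 to 479) and ignores the sign bit; the names `EDGES`, `encode_magnitude`, and `decode_magnitude` are hypothetical, and boundary values are assigned to the upper interval as an assumption.

```python
from bisect import bisect_right

# Interval edges read off the encoding table: one interval of width 1,
# then 15 of width 2, and 16 each of widths 4, 8 and 16 (segments 000 to 011).
EDGES = [0, 1]
for step, count in [(2, 15), (4, 16), (8, 16), (16, 16)]:
    for _ in range(count):
        EDGES.append(EDGES[-1] + step)

def encode_magnitude(amplitude):
    """Return the code value (0..63) for an amplitude in [0, 479)."""
    if not 0 <= amplitude < EDGES[-1]:
        raise ValueError("only the four tabulated segments are covered")
    return bisect_right(EDGES, amplitude) - 1

def decode_magnitude(code):
    """Return the midpoint of the interval that `code` stands for."""
    return (EDGES[code] + EDGES[code + 1]) // 2

print(encode_magnitude(100), decode_magnitude(encode_magnitude(100)))
```

Decoding to the interval midpoint reproduces the decoded amplitudes listed in the decoding table (2, 30, 33, 93, 99, 219, 231, 471).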

Difference Encoding
• Differential-PCM (DPCM)
– Exploit temporal redundancy in samples
– The difference between 2 x-bit samples can be represented with significantly fewer than x bits
– Transmit the difference (rather than the sample)
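
A minimal sketch of difference encoding without any quantization, assuming the predictor starts at 0 so that the first transmitted value is the first sample itself. The sample values are made up for illustration.

```python
def dpcm_encode(samples):
    """Return the difference between each sample and the one before it."""
    previous = 0
    diffs = []
    for s in samples:
        diffs.append(s - previous)
        previous = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    total = 0
    out = []
    for d in diffs:
        total += d
        out.append(total)
    return out

samples = [1000, 1003, 1001, 1004, 1010]
diffs = dpcm_encode(samples)
print(diffs)               # [1000, 3, -2, 3, 6]
print(dpcm_decode(diffs))  # [1000, 1003, 1001, 1004, 1010]
```

After the first value, the differences are much smaller than the samples, which is the redundancy DPCM exploits.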

Slope Overload Problem
• Differences in high-frequency signals near the Nyquist frequency cannot be represented with a smaller number of bits!
– The error introduced leads to severe distortion in the higher frequencies
[Figure: waveform and bitstream illustrating “slope overload”]
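
Slope overload can be seen by limiting the size of the transmitted difference. This is a sketch under the assumption that an out-of-range difference is simply clamped; `max_diff` and the input values are hypothetical.

```python
def dpcm_clamped(samples, max_diff):
    """Track the signal using differences limited to [-max_diff, +max_diff]."""
    reconstructed = []
    current = 0
    for s in samples:
        diff = max(-max_diff, min(max_diff, s - current))
        current += diff
        reconstructed.append(current)
    return reconstructed

# A sudden jump of 100 cannot be followed with differences of at most 30.
print(dpcm_clamped([0, 100, 100, 100, 100], max_diff=30))  # [0, 30, 60, 90, 100]
```

The reconstruction needs four samples to catch up with the jump; for a signal that keeps changing quickly it never catches up.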
Adaptive DPCM (ADPCM)
• Use a larger step-size to encode differences between high-frequency samples & a smaller step-size for differences between low-frequency samples
• Use previous sample values to estimate changes in the signal in the near future

ADPCM
• To ensure differences are always small...
– Adaptively change the step-size (quanta)
– (Adaptively) attempt to predict the next sample value
[Figure: ADPCM encoder. The predicted PCM sample n+1 is subtracted from the y-bit PCM sample; a difference quantizer turns the result into the x-bit ADPCM “difference”; a step-size adjuster, a dequantizer, and a predictor form the feedback loop.]

IMA’s proposed ADPCM
• Predictor is not adaptive and simply uses the last sample value
• Quantization step-size increases logarithmically with signal frequency
[Figure: IMA ADPCM encoder. PCM sample n–1, held in a register, is subtracted from the 16-bit PCM sample; a difference quantizer produces the 4-bit ADPCM difference; a step-size adjuster and a dequantizer form the feedback loop.]

IMA Difference Quantization
[Figure: IMA ADPCM encoder; the 4-bit ADPCM difference is expressed in step-size units]

Difference range                               Quantizer output   Step-size multiple
difference < 1/4 step_size                     000                0.0
1/4 step_size < difference < 1/2 step_size     001                0.25
1/2 step_size < difference < 3/4 step_size     010                0.50
3/4 step_size < difference < step_size         011                0.75
step_size < difference < 5/4 step_size         100                1.0
5/4 step_size < difference < 3/2 step_size     101                1.25
3/2 step_size < difference < 7/4 step_size     110                1.5
7/4 step_size < difference                     111                1.75
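
The quantization table amounts to counting whole quarter-steps in the magnitude of the difference. The sketch below follows the slide's simplified table rather than the IMA reference algorithm; it returns the sign separately on the assumption that it occupies the fourth bit, and it assigns exact boundary values to the upper code, which the table leaves unspecified. The function name is hypothetical.

```python
def quantize_difference(difference, step_size):
    """Return (sign, 3-bit code, reconstructed difference) for one sample."""
    sign = -1 if difference < 0 else 1
    # Each code covers a quarter of a step; anything past 7/4 saturates at 111.
    code = min(7, int(abs(difference) * 4 // step_size))
    reconstructed = sign * code * 0.25 * step_size
    return sign, code, reconstructed

print(quantize_difference(5, 7))     # (1, 2, 3.5)
print(quantize_difference(-49, 55))  # (-1, 3, -41.25)
```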

IMA Step-size Table
Index    Step size (one value per index, in order)
0-17     7 8 9 10 11 12 13 14 16 17 19 21 23 25 28 31 34 37
18-35    41 45 50 55 60 66 73 80 88 97 107 118 130 143 157 173 190 209
36-53    230 253 279 307 337 371 408 449 494 544 598 658 724 796 876 963 1060 1166
54-71    1282 1411 1552 1707 1878 2066 2272 2499 2749 3024 3327 3660 4026 4428 4871 5358 5894 6484
72-88    7132 7845 8630 9493 10442 11487 12635 13899 15289 16818 18500 20350 22385 24623 27086 29794 32767

Adaptive Step-size Selection
[Figure: step-size adjuster inside the IMA ADPCM encoder. The quantizer output selects an index adjustment (step-size table index adjustment lookup); the adjustment is added to the previous index, held in a register; the sum is range-limited to 0 to 88; a step-size table lookup on the resulting index gives the new step-size.]

Adaptive Step-size Selection
[Figure: difference quantizer output → step-size table index adjustment lookup → added to previous index (register) → range limit (0 to 88) → step-size table lookup → new step-size]

Difference range                               Quantizer output   Index adjustment   Step-size adjustment
difference < 1/4 step_size                     000                -1                 × 0.91
1/4 step_size < difference < 1/2 step_size     001                -1                 × 0.91
1/2 step_size < difference < 3/4 step_size     010                -1                 × 0.91
3/4 step_size < difference < step_size         011                -1                 × 0.91
step_size < difference < 5/4 step_size         100                2                  × 1.21
5/4 step_size < difference < 3/2 step_size     101                4                  × 1.46
3/2 step_size < difference < 7/4 step_size     110                6                  × 1.77
7/4 step_size < difference                     111                8                  × 2.14
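
A minimal sketch of the index update described on this slide: look up the adjustment for the quantizer output, add it to the previous index, and range-limit the result. The last lines check an assumption made for illustration, namely that each table entry is roughly 1.1 times the previous one, which would account for the listed step-size multipliers. The names are hypothetical.

```python
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by the 3-bit quantizer output
MAX_INDEX = 88

def next_index(previous_index, code):
    """Add the adjustment for `code` and range-limit the result to 0..88."""
    return max(0, min(MAX_INDEX, previous_index + INDEX_ADJUSTMENT[code]))

print(next_index(0, 7))    # 8
print(next_index(85, 7))   # 88 (range-limited)

# Assumption: a growth factor of about 1.1 per table index.
approx = [round(1.1 ** adj, 2) for adj in (-1, 2, 4, 6, 8)]
print(approx)              # [0.91, 1.21, 1.46, 1.77, 2.14]
```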

IMA ADPCM Example
[Figure: IMA ADPCM encoder with input Xn and register output Xn–1]

Columns: X = input, Diff = difference, Step = step size, Q = quantizer output, Adj = index adjustment, I = step-size table index, M = step-size multiplier, Rec = reconstituted difference, Decode = predicted value.

X     Diff   Step   Q     Adj   I    M      Rec     Decode
150          7                  0                   150
155   5      7      010   -1    0    0.5    3.5     154
167   13     7      111   8     8    1.75   12      166
170   4      16     001   -1    7    0.25   4       170
250   80     14     111   8     15   1.75   24.5    195
250   55     31     111   8     23   1.75   54      249
250   1      66     000   -1    22   0.0    0       249
250   1      60     000   -1    21   0.0    0       249
200   -49    55     011   -1    20   0.75   -41     208
200   (six further inputs of 200 follow; their remaining columns are not filled in)
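
The worked example can be replayed in a few lines. This sketch follows the simplified tables on the slides, not the integer arithmetic of the IMA reference algorithm. It assumes the first sample (150) is sent as-is with the index starting at 0, and that predicted values are rounded half up, which is what the Decode column appears to do. All names are hypothetical.

```python
import math

STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37,
    41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209,
    230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166,
    1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818, 18500,
    20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]

def encode(samples, first_value, index=0):
    """Return one (code, new_index, predicted) row per input sample."""
    predicted = first_value
    rows = []
    for x in samples:
        step = STEP_SIZES[index]                 # step size chosen by the previous index
        diff = x - predicted
        sign = -1 if diff < 0 else 1
        code = min(7, (abs(diff) * 4) // step)   # whole quarter-steps, saturating at 111
        index = max(0, min(88, index + INDEX_ADJUSTMENT[code]))
        # Round half up, matching the rounding seen in the example table.
        predicted = math.floor(predicted + sign * code * 0.25 * step + 0.5)
        rows.append((code, index, predicted))
    return rows

rows = encode([155, 167, 170, 250, 250, 250, 250, 200], first_value=150)
for row in rows:
    print(row)
```

A decoder would run the same index and prediction updates from the received codes and signs, so both sides stay in step as long as no code is lost.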

Networking Considerations
• The IMA codec is reasonably robust to errors
• An interval with a low-level signal will correct any step-size error: small differences apply the -1 index adjustment repeatedly, so encoder and decoder both reach the lower range limit (index 0) and agree again
[Figure: IMA ADPCM decoder: dequantizer, step-size adjuster, and a register holding PCM sample n–1, shown with the difference quantization table]

Quantizer output   Step-size table index adjustment
000                -1
001                -1
010                -1
011                -1
100                2
101                4
110                6
111                8
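
A sketch of why a quiet interval repairs a step-size error. It assumes a hypothetical transmission error has left the decoder's table index 15 above the encoder's, and that a low-level signal produces the quantizer output 000 for every sample. The starting indices and run lengths are made up for illustration.

```python
INDEX_ADJUSTMENT = [-1, -1, -1, -1, 2, 4, 6, 8]

def run_indices(start_index, codes):
    """Apply the index adjustment for each code, range-limited to 0..88."""
    index = start_index
    for code in codes:
        index = max(0, min(88, index + INDEX_ADJUSTMENT[code]))
    return index

short_quiet = [0] * 10
long_quiet = [0] * 40

# Encoder starts at index 20; the decoder is (hypothetically) stuck at 35.
print(run_indices(20, short_quiet), run_indices(35, short_quiet))  # 10 25
print(run_indices(20, long_quiet), run_indices(35, long_quiet))    # 0 0
```

Once both indices hit the lower limit they are equal, and every later adjustment keeps them equal.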

Psychoacoustic Properties
• Human perception of sound is a function of frequency and signal strength
– (MPEG exploits this relationship.)
[Figure: sound level (dB, 0 to 100) versus frequency (kHz, 0.02 to 20), with regions labeled “Inaudible” and “Audible”]

Auditory Masking
• The presence of tones at certain frequencies makes us unable to perceive tones at other “nearby” frequencies
– Humans cannot distinguish between tones within 100 Hz at low frequencies and 4 kHz at high frequencies
[Figure: sound level (dB, 0 to 100) versus frequency (kHz, 0.02 to 20), with “Inaudible” and “Audible” regions and a masking tone next to a masked tone]

MPEG Encoder Block Diagram
[Figure: PCM audio samples (32, 44.1, 48 kHz) → Mapping → Quantizer → Coding → Frame Packing → encoded bitstream. Also shown: Psychoacoustic Model and Ancillary Data blocks.]

Subband Filter
• Transforms signal from time domain to frequency domain.
– 32 PCM samples yield 32 subband samples.
• Each subband corresponds to a frequency band; the bands are evenly spaced from 0 to the Nyquist frequency.
– Filter actually works on a window of 512 samples that is shifted over 32 samples at a time.
• Subband coefficients are analyzed with the psychoacoustic model, quantized, and coded.
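
Since the 32 subbands are evenly spaced up to the Nyquist frequency, their width follows directly from the sampling rate. A minimal sketch using the three input rates named in the encoder block diagram; the function name is hypothetical.

```python
SUBBANDS = 32

def subband_edges(sample_rate_hz):
    """Return (low, high) frequency edges in Hz for each of the 32 subbands."""
    nyquist = sample_rate_hz / 2
    width = nyquist / SUBBANDS
    return [(i * width, (i + 1) * width) for i in range(SUBBANDS)]

for rate in (32_000, 44_100, 48_000):
    low, high = subband_edges(rate)[0]
    print(rate, high - low)   # subband width in Hz
```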

Layer 1
• 384 samples per frame.
• Iterative bit allocation process:
– For each subband, determine the mask-to-noise ratio (MNR).
– Increase number of quantization bits for the subband with the smallest MNR.
– Iterate until all bits are used.
• Fixed allocation of bits among subbands for a particular frame.
• Up to 448 kb/s
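
The iterative allocation loop can be sketched as a greedy procedure. This is a simplification under stated assumptions, not the Layer 1 procedure itself: MNR is taken as SNR minus the signal-to-mask ratio (SMR) from the psychoacoustic model, SNR is approximated as about 6 dB per allocated bit, and scale-factor overhead and per-subband bit limits are ignored. The SMR values and the function name are hypothetical.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one bit to the subband with the lowest MNR."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # MNR = SNR - SMR, with SNR approximated as db_per_bit per allocated bit.
        mnr = [db_per_bit * b - smr for b, smr in zip(bits, smr_db)]
        worst = mnr.index(min(mnr))
        bits[worst] += 1
    return bits

# Three subbands with made-up signal-to-mask ratios (dB) and a budget of 4 bits.
print(allocate_bits([20.0, 5.0, -3.0], total_bits=4))   # [3, 1, 0]
```

The subband whose signal stands furthest above its masking threshold gets the most bits, while a subband already below the threshold gets none.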

Layer 2
• 1152 samples per frame.
• Iterative bit allocation.
• Subband allocation is dynamic.
• Up to 384 kb/s

Layer 3
• 1152 samples
– Up to 320 kb/s
• Each subband is further analyzed using the MDCT to create 576 frequency lines.
– 4 different windowing schemes depending on whether samples contain an “attack” of new frequencies.
• Lots of bit allocation options for quantizing frequency coefficients.
• Quantized coefficients are Huffman coded.

Vo-coding
• Concept: Develop a mathematical model of the vocal cords & throat
– Derive/compute model parameters for a short interval and transmit to the decoder
– Use the parameters to synthesize speech at the decoder
• So what is a good model?
– A “buzzer” in a “tube”!
– The buzzer is characterized by its intensity & pitch
– The tube is characterized by its formants

Vocoding - Basic Concepts
• Formant: frequency maxima & minima in the spectrum of the speech signal
• Vocoders group and code portions of the signal by amplitude
[Figure: speech spectrum, amplitude (0 to 75) versus frequency (kHz)]

“Buzzer” and “Tube” Model
• Vocoding principles:
– voice = formants + buzz pitch & intensity
– voice – estimated formants = “residue”
• Linear Predictive Coding (LPC)
– A sample is represented as a linear combination of p previous samples
$y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \, x(n)$
[Figure: speaker saying “yadda yadda yadda”]
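
A minimal sketch of the LPC synthesis relation, in which each output sample is a weighted sum of the p previous outputs plus the scaled excitation. It assumes samples before the start of the signal are zero; the coefficient values and the impulse excitation are made up for illustration, and the function name is hypothetical.

```python
def lpc_synthesize(coefficients, gain, excitation):
    """Compute y(n) = sum_k a_k * y(n - k) + G * x(n), with y(n) = 0 for n < 0."""
    output = []
    for n, x in enumerate(excitation):
        predicted = sum(
            a * output[n - k]
            for k, a in enumerate(coefficients, start=1)
            if n - k >= 0
        )
        output.append(predicted + gain * x)
    return output

# One coefficient (p = 1) driven by a single impulse: a decaying response.
print(lpc_synthesize([0.5], 1.0, [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

In the "buzzer and tube" picture, the excitation plays the role of the buzzer and the coefficients describe the tube.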

LPC
• Decoder artificially generates speech via formant synthesis
– A mathematical simulation of the vocal tract as a series of bandpass filters
– Encoder codes & transmits filter coefficients, pitch period, gain factor, & nature of excitation
• Standards:
– Regular Pulse Excited Linear Predictive Coder (RPE-LPC): digital cellular standard GSM 06.10 (13 kbps)
– Code Excited Linear Predictive Coder (CELP): US Federal Standard 1016 (4.8 kbps)
– Linear Predictive Coder (LPC): US Federal Standard 1015 (2.4 kbps)