Global System for Mobile Communications (GSM)
Eloi Batlle
Digital Speech Processing
Universitat Pompeu Fabra
Overview of the AMR concept
o Unlike previous GSM codecs which operate at a fixed rate and constant error protection level, AMR adapts to the local radio channel and traffic conditions
o Full-rate only for maximum robustness to channel errors
o Half-rate only for maximum capacity advantage
Multi-rate adaptation
o The AMR adaptation is based on the quality of the radio channel
o Modes

Channel     Bit rates (kbit/s)
Full-rate   12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75
Half-rate   7.95, 7.40, 6.70, 5.90, 5.15, 4.75
Bit allocation
Mode          Bits per frame
4.75 kbit/s   95
5.15 kbit/s   103
5.90 kbit/s   118
6.70 kbit/s   134
7.40 kbit/s   148
7.95 kbit/s   159
10.2 kbit/s   204
12.2 kbit/s   244
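The bits-per-frame figures follow directly from the bit rate and the 20 ms frame length (50 frames/s). The short sketch below checks that arithmetic; the names `FRAME_MS`, `AMR_MODES_KBITS` and `bits_per_frame` are illustrative only and do not come from the slides.

```python
FRAME_MS = 20  # one speech frame lasts 20 ms (50 frames/s)

# The eight AMR modes listed in the table, in kbit/s.
AMR_MODES_KBITS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]


def bits_per_frame(rate_kbits: float) -> int:
    """Bits in one 20 ms frame: (kbit/s) * (ms) gives bits."""
    return round(rate_kbits * FRAME_MS)


for rate in AMR_MODES_KBITS:
    print(f"{rate:5.2f} kbit/s -> {bits_per_frame(rate)} bits per frame")
```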
[Figure: speech processing functions on the transmit side and the receive side]
Notes
  1) 8-bit A-law or μ-law PCM (ITU-T Recommendation G.711), 8 000 samples/s;
  2) 13-bit uniform PCM, 8 000 samples/s;
  3) Voice Activity Detector (VAD) flag;
  4) Encoded speech frame, 50 frames/s, number of bits/frame depending on the AMR codec mode;
  5) Silence Descriptor (SID) frame (marked SID_FIRST or SID_UPD);
  6) TX_TYPE, 2 bits, indicates whether information bits are available and if they are speech or SID information;
  7) Information bits delivered to the radio subsystem;
  8) Information bits received from the radio subsystem;
  9) RX_TYPE, the type of frame received quantized into three bits (classified by the RSS).
Discontinuous transmission
o Each direction of transmission is occupied about 50% of the time
o Discontinuous transmission (DTX)
  - Mobile Station battery life will be prolonged
  - Better radio frequency spectrum efficiency
o DTX requires some functions
  - Voice Activity Detector (VAD)
  - Comfort noise
Voice Activity Detection
o The input to the VAD is a set of parameters computed by the encoder
o Every 20 ms the system decides whether the frame contains speech or not
Comfort noise insertion
o When transmission is on, the background noise is transmitted together with the speech
o When the speech ends, transmission is switched off and the perceived noise would drop to a very low level
o This step modulation of the noise may be perceived as annoying
o Comfort noise
  - Evaluation of background noise
  - Noise parameters encoding and decoding
  - Generation of comfort noise in the receiver
Lost speech frame substitution
o Frames may be lost due to transmission errors
o In order to mask the effect of an isolated lost frame, it is substituted by a predicted one based on previous frames
o For several lost frames, a muting technique shall be used
AMR codec homing
o All modules shall always react to a given input sequence with the corresponding bit-exact output sequence, provided that the tested modules are in their home state when starting
o Special in-band signalling frames have been defined to trigger these homing functions in remotely placed modules as well
Frequency response
[Figure: AMR codec frequency response. Gain in dB (axis from -35 to 5) versus frequency in Hz (50 to 3830), one curve per mode: 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s]
Speech encoder
o Pre-processing
o Linear prediction analysis and quantization
o Open-loop pitch analysis
o Impulse response computation
o Target signal computation
o Adaptive codebook
o Algebraic codebook
o Quantization of the adaptive and fixed codebook gains
o Memory update
Speech encoder (2)
o Speech frames of 20 ms (160 samples)
o Sampling frequency of 8000 samples/s
o Speech frame divided into 4 subframes of 5 ms (40 samples)
o Adaptive and fixed codebook parameters are transmitted every subframe
Speech encoder (and 3)
[Figure: encoder block diagram. Frame level: pre-processing; LPC analysis (twice per frame) by windowing and autocorrelation followed by Levinson-Durbin to obtain A(z); conversion to LSP and LSP quantization (LSP indices); interpolation of the LSP for the 4 subframes and conversion back to A(z) and Â(z); computation of the weighted speech (4 subframes); open-loop pitch search. Subframe level: adaptive codebook search (compute the target x(n) for the adaptive codebook, find the best delay and gain, compute the impulse response h(n), quantize the LTP gain; outputs pitch index and LTP gain index); innovative codebook search (compute the adaptive codebook contribution, compute the target for the innovation, find the best innovation; output code index); filter memory update (fixed codebook gain quantization, compute the excitation, update the filter memories for the next subframe; output fixed codebook gain index)]
Speech decoder
[Figure: decoder block diagram. Frame level: decode the LSP from the LSP indices, interpolate the LSP for the 4 subframes and convert them to Â(z). Subframe level: decode the adaptive codebook (pitch index), decode the innovative codebook (code index), decode the gains (gains indices), construct the excitation and pass it through the synthesis filter to obtain ŝ(n). Post-processing: the post filter produces ŝ'(n)]
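CELP synthesis model
o Code-excited linear predictive coding model
[Figure: CELP synthesis model. The adaptive codebook vector v(n), scaled by the gain g_p, and the fixed codebook vector c(n), scaled by the gain g_c, are added to form the excitation u(n); LP synthesis with the filter 1/Â(z) gives ŝ(n) and post-filtering gives ŝ'(n)]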
Pre-processing
o High-pass filtering
o Down-scaling (factor of 2)

$$H_{h1}(z) = \frac{0.927246093 - 1.8544941\, z^{-1} + 0.927246903\, z^{-2}}{1 - 1.906005859\, z^{-1} + 0.911376953\, z^{-2}}$$
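As an illustration, the two pre-processing steps can be sketched in floating point as a direct-form second-order filter. The function name `preprocess` and the choice to halve the samples before filtering are assumptions of this sketch (the two operations are linear, so their order does not change the result here); it is not the fixed-point reference code.

```python
# Coefficients of H_h1(z) as given on the slide.
B = (0.927246093, -1.8544941, 0.927246903)  # numerator
A = (1.0, -1.906005859, 0.911376953)        # denominator


def preprocess(samples):
    """Halve each input sample, then apply the second-order high-pass filter."""
    x1 = x2 = y1 = y2 = 0.0  # filter memory: previous two inputs and outputs
    out = []
    for s in samples:
        x0 = s / 2.0  # down-scaling by a factor of 2
        y0 = B[0] * x0 + B[1] * x1 + B[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


# A constant (DC) input is almost completely removed once the filter settles.
settled = preprocess([1000.0] * 4000)[-1]
print(abs(settled) < 1.0)
```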
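Windowing and autocorrelation
o 12.2 kbits/s
  - Twice per frame analysis using two different asymmetric windows with $L_1^{(I)}=160$, $L_2^{(I)}=80$, $L_1^{(II)}=232$ and $L_2^{(II)}=8$

$$w_I(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{\pi n}{L_1^{(I)}-1}\right), & n = 0,\ldots,L_1^{(I)}-1 \\ 0.54 + 0.46\cos\left(\dfrac{\pi\,(n-L_1^{(I)})}{L_2^{(I)}-1}\right), & n = L_1^{(I)},\ldots,L_1^{(I)}+L_2^{(I)}-1 \end{cases}$$

$$w_{II}(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{2L_1^{(II)}-1}\right), & n = 0,\ldots,L_1^{(II)}-1 \\ \cos\left(\dfrac{2\pi\,(n-L_1^{(II)})}{4L_2^{(II)}-1}\right), & n = L_1^{(II)},\ldots,L_1^{(II)}+L_2^{(II)}-1 \end{cases}$$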
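A minimal sketch of the two analysis windows, written straight from the formulas for $w_I(n)$ and $w_{II}(n)$. The function names `window_i` and `window_ii` are hypothetical; the default lengths are the 12.2 kbit/s values, and both windows span 240 samples.

```python
import math


def window_i(l1=160, l2=80):
    """First asymmetric window: a rising half-Hamming part, then a falling one."""
    w = [0.54 - 0.46 * math.cos(math.pi * n / (l1 - 1)) for n in range(l1)]
    w += [0.54 + 0.46 * math.cos(math.pi * (n - l1) / (l2 - 1))
          for n in range(l1, l1 + l2)]
    return w


def window_ii(l1=232, l2=8):
    """Second asymmetric window: half-Hamming rise, quarter-cosine fall."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (2 * l1 - 1)) for n in range(l1)]
    w += [math.cos(2 * math.pi * (n - l1) / (4 * l2 - 1))
          for n in range(l1, l1 + l2)]
    return w


print(len(window_i()), len(window_ii()))
```

Calling `window_ii(200, 40)` gives the single window that the slides describe for the other modes.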
Windowing and autocorrelation (2)
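o 12.2 kbits/s
[Figure: position along the time axis t of the analysis windows $w_I(n)$ and $w_{II}(n)$ relative to frame n-1 and frame n; a frame is 20 ms (160 samples) and a subframe is 5 ms (40 samples)]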
Windowing and autocorrelation (3)
o 12.2 kbits/s
  - The autocorrelation of the windowed speech $s'(n)$ is computed by

$$r_{ac}(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k), \qquad k = 0,\ldots,10$$

Windowing and autocorrelation (4)
o 12.2 kbits/s
  - A 60 Hz bandwidth expansion is applied by lag windowing the autocorrelation

$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0\, i}{f_s}\right)^2\right], \qquad i = 1,\ldots,10$$

$$r'_{ac}(k) = r_{ac}(k)\, w_{lag}(k), \qquad k = 1,\ldots,10$$

  - where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8000$ Hz is the sampling frequency
Windowing and autocorrelation (5)
o 12.2 kbits/s
  - A white noise correction factor is used
  - It is equivalent to adding a noise floor of -40 dB

$$r'_{ac}(0) = 1.0001\, r_{ac}(0)$$
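The lag windowing and the white noise correction can be combined in one small routine. This is only a sketch under the definitions given on the slides; the function name `condition_autocorrelation` is hypothetical, and the input is assumed to be the list $r_{ac}(0),\ldots,r_{ac}(10)$.

```python
import math


def condition_autocorrelation(r, f0=60.0, fs=8000.0):
    """Return r'_ac: white noise correction at lag 0, lag window at lags >= 1."""
    out = list(r)
    out[0] = 1.0001 * r[0]  # white noise correction (noise floor of -40 dB)
    for k in range(1, len(r)):
        w_lag = math.exp(-0.5 * (2.0 * math.pi * f0 * k / fs) ** 2)
        out[k] = r[k] * w_lag  # 60 Hz bandwidth expansion
    return out


# With a flat input the output shows the lag window itself at lags 1..10.
print(condition_autocorrelation([1.0] * 11))
```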
Windowing and autocorrelation (and 6)
o All other modes
  - Once per frame analysis using the $w_{II}(n)$ window with $L_1=200$ and $L_2=40$
  - Autocorrelation of the windowed speech and a 60 Hz bandwidth expansion
  - White noise correction factor equivalent to a noise floor of -40 dB
Levinson-Durbin
$E(0) = r'_{ac}(0)$
for $i = 1$ to $10$ do
    $a_0^{(i-1)} = 1$
    $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'_{ac}(i-j)\right] / E(i-1)$
    $a_i^{(i)} = k_i$
    for $j = 1$ to $i-1$ do
        $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$
    end
    $E(i) = (1 - k_i^2)\, E(i-1)$
end
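The recursion can be sketched in a few lines of floating-point Python. The function name `levinson_durbin` and its return value (coefficients plus final prediction error) are choices made for this illustration; the coefficient convention assumed is $A(z) = 1 + \sum_i a_i z^{-i}$, which matches the sign of $k_i$ in the recursion.

```python
def levinson_durbin(r, order=10):
    """Solve for the LP coefficients a[0..order] (a[0] = 1) from autocorrelations r."""
    a = [1.0] + [0.0] * order
    err = r[0]  # E(0)
    for i in range(1, order + 1):
        # k_i = -sum_{j=0}^{i-1} a_j^(i-1) r(i-j) / E(i-1)
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err
        prev = a[:]  # coefficients of the previous order
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        err *= 1.0 - k * k  # E(i) = (1 - k_i^2) E(i-1)
    return a, err


# Autocorrelation of a first-order process: only a_1 should be non-zero.
coeffs, final_err = levinson_durbin([0.5 ** k for k in range(4)], order=3)
print(coeffs, final_err)
```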
LP to LSP conversion
o The linear prediction coefficients (LP) are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes
o LSPs are defined as the roots of

$$F_1'(z) = A(z) + z^{-11} A(z^{-1})$$
$$F_2'(z) = A(z) - z^{-11} A(z^{-1})$$
LP to LSP conversion (2)
o $F_1'(z)$ is symmetric and $F_2'(z)$ is anti-symmetric
o All these roots are on the unit circle and they alternate with each other
o $F_1'(z)$ corresponds to the vocal tract with the glottis closed and $F_2'(z)$ with the glottis open
LP to LSP conversion (3)
o $F_1'(z)$ has a root $z=-1$ ($\omega=\pi$)
o $F_2'(z)$ has a root $z=1$ ($\omega=0$)
o To eliminate these two roots we define

$$F_1(z) = \frac{F_1'(z)}{1+z^{-1}}, \qquad F_2(z) = \frac{F_2'(z)}{1-z^{-1}}$$

LP to LSP conversion (4)
o Each polynomial has 5 pairs of conjugate roots on the unit circle, therefore they can be written as

$$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2 q_i z^{-1} + z^{-2}\right)$$
$$F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2 q_i z^{-1} + z^{-2}\right)$$

o where $q_i=\cos(\omega_i)$, with $\omega_i$ being the line spectral frequencies (LSF) and $q_i$ the LSPs in the cosine domain
LP to LSP conversion (5)
o The LSP are found by evaluating F1(z) and F2(z) at 60 points equally spaced between 0 and π and checking for sign changes
o A sign change indicates a root; the interval containing it is then bisected 4 times to better track the root
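The grid search with bisection can be sketched as follows. This is a floating-point illustration, not the reference procedure: it evaluates $F_1(z)$ and $F_2(z)$ on the unit circle with a plain cosine sum, takes the midpoint of the final interval as the root, and finds at most one root per grid interval. All function names are hypothetical.

```python
import math


def _deflated_polys(a):
    """Coefficients of F1(z) and F2(z) for A(z) = sum_k a[k] z^-k, a[0] = 1."""
    m = len(a) - 1                      # prediction order (10 in AMR)
    ext = list(a) + [0.0]               # A(z) padded to degree m + 1
    f1p = [ext[i] + ext[m + 1 - i] for i in range(m + 2)]  # F1'(z)
    f2p = [ext[i] - ext[m + 1 - i] for i in range(m + 2)]  # F2'(z)
    f1, f2 = [f1p[0]], [f2p[0]]
    for i in range(1, m + 1):
        f1.append(f1p[i] - f1[i - 1])   # divide F1'(z) by (1 + z^-1)
        f2.append(f2p[i] + f2[i - 1])   # divide F2'(z) by (1 - z^-1)
    return f1, f2


def _on_circle(f, w):
    """Real value of e^{jw m/2} F(e^{jw}) for a symmetric F(z) of degree m."""
    half = (len(f) - 1) / 2
    return sum(c * math.cos((half - k) * w) for k, c in enumerate(f))


def lp_to_lsp(a, grid=60, bisections=4):
    """Return the LSPs q_i = cos(w_i), searching F1 and F2 alternately."""
    polys = _deflated_polys(a)
    step = math.pi / grid
    lsp = []
    w_lo, y_lo = 0.0, _on_circle(polys[0], 0.0)
    for n in range(1, grid + 1):
        f = polys[len(lsp) % 2]         # roots of F1 and F2 alternate
        w_hi = n * step
        y_hi = _on_circle(f, w_hi)
        if y_lo * y_hi <= 0.0:          # sign change: a root lies in between
            lo, hi, ylo = w_lo, w_hi, y_lo
            for _ in range(bisections):
                mid = 0.5 * (lo + hi)
                ymid = _on_circle(f, mid)
                if ylo * ymid <= 0.0:
                    hi = mid
                else:
                    lo, ylo = mid, ymid
            lsp.append(math.cos(0.5 * (lo + hi)))
            y_hi = _on_circle(polys[len(lsp) % 2], w_hi)  # switch polynomial
        w_lo, y_lo = w_hi, y_hi
    return lsp
```

With 60 grid points and 4 bisections the final interval is $\pi/960$ wide, so the midpoint is within about 0.0016 rad of the true line spectral frequency.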
LP to LSP conversion (6)
o LPC vs. LSP
LP to LSP conversion (and 7)
LSP coefficients quantization
o 12.2 kbits/s
  - The two sets of LP coefficients are quantized using the LSP representation in the frequency domain

$$f_i = \frac{f_s}{2\pi} \arccos(q_i), \qquad i = 1,\ldots,10$$
LSP coefficients quantization (2)
o 12.2 kbits/s
  - A 1st order MA prediction is applied and the prediction residual vectors are given by

$$r^{(1)}(n) = z^{(1)}(n) - p(n)$$
$$r^{(2)}(n) = z^{(2)}(n) - p(n)$$

  - where $z^{(1)}(n)$ and $z^{(2)}(n)$ are the mean-removed LSF vectors and $p(n)$ is the predicted vector

$$p(n) = 0.65\, \hat{r}^{(2)}(n-1)$$
LSP coefficients quantization (3)
o 12.2 kbits/s
  - The two residual vectors are jointly quantized using split matrix quantization (SMQ)
  - The matrix $(r^{(1)}\ r^{(2)})$ is split into 5 submatrices of dimension 2x2
  - The 5 submatrices are quantized with 7, 8, 8+1, 8 and 6 bits respectively (the third submatrix uses a signed codebook)
LSP coefficients quantization (4)
o All other modes
  - The LP coefficients are quantized using the LSP representation in the frequency domain
  - A 1st order MA prediction is applied
LSP coefficients quantization (and 5)
o All other modes
  - The residual vector is split into 3 subvectors of dimensions 3, 3 and 4, quantized with the following numbers of bits

Mode          Subvector 1   Subvector 2   Subvector 3
4.75 kbit/s   8             8             7
5.15 kbit/s   8             8             7
5.90 kbit/s   8             9             9
6.70 kbit/s   8             9             9
7.40 kbit/s   8             9             9
7.95 kbit/s   9             9             9
10.2 kbit/s   8             9             9
LSP interpolation
o 12.2 kbits/s
  - The two sets of LP parameters are used for the 2nd and 4th subframes
  - The 1st and 3rd subframes use a linear interpolation of the parameters in the adjacent subframes

$$\hat{q}_1^{(n)} = 0.5\,\hat{q}_4^{(n-1)} + 0.5\,\hat{q}_2^{(n)}$$
$$\hat{q}_3^{(n)} = 0.5\,\hat{q}_2^{(n)} + 0.5\,\hat{q}_4^{(n)}$$
LSP interpolation (and 2)
o All other modes
  - The LP parameters are used for the 4th subframe
  - The 1st, 2nd and 3rd subframes use a linear interpolation of the parameters in the adjacent subframes

$$\hat{q}_1^{(n)} = 0.75\,\hat{q}_4^{(n-1)} + 0.25\,\hat{q}_4^{(n)}$$
$$\hat{q}_2^{(n)} = 0.5\,\hat{q}_4^{(n-1)} + 0.5\,\hat{q}_4^{(n)}$$
$$\hat{q}_3^{(n)} = 0.25\,\hat{q}_4^{(n-1)} + 0.75\,\hat{q}_4^{(n)}$$
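Both interpolation schemes reduce to weighted sums of LSP vectors. The sketch below returns the four per-subframe vectors; the function name `interpolate_lsp` and the optional-argument interface (passing `q2_cur` selects the 12.2 kbit/s scheme) are choices made for this illustration.

```python
def interpolate_lsp(q4_prev, q4_cur, q2_cur=None):
    """Return the LSP vectors for subframes 1 to 4 of the current frame.

    q4_prev: quantized LSPs of subframe 4 of the previous frame
    q4_cur:  quantized LSPs of subframe 4 of the current frame
    q2_cur:  quantized LSPs of subframe 2 (12.2 kbit/s only), else None
    """
    def mix(wa, va, wb, vb):
        return [wa * x + wb * y for x, y in zip(va, vb)]

    if q2_cur is not None:  # 12.2 kbit/s: two LSP sets per frame
        return [mix(0.5, q4_prev, 0.5, q2_cur), list(q2_cur),
                mix(0.5, q2_cur, 0.5, q4_cur), list(q4_cur)]
    return [mix(0.75, q4_prev, 0.25, q4_cur),  # all other modes
            mix(0.5, q4_prev, 0.5, q4_cur),
            mix(0.25, q4_prev, 0.75, q4_cur),
            list(q4_cur)]


print(interpolate_lsp([0.0], [1.0]))
print(interpolate_lsp([0.0], [1.0], q2_cur=[0.4]))
```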
LSP to LP conversion
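o Once the LSPs are quantized and interpolated, they are converted back to the LP coefficient domain
o The coefficients $f_1(i)$ are computed by the recursion

for $i = 1$ to $5$
    $f_1(i) = -2\, q_{2i-1}\, f_1(i-1) + 2 f_1(i-2)$
    for $j = i-1$ down to $1$
        $f_1(j) = f_1(j) - 2\, q_{2i-1}\, f_1(j-1) + f_1(j-2)$
    end
end

o with initial values $f_1(0) = 1$ and $f_1(-1) = 0$; $f_2(i)$ is computed in the same way with $q_{2i-1}$ replaced by $q_{2i}$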
LSP to LP conversion (2)
o $f_1'(i)$ and $f_2'(i)$ are found by

$$f_1'(i) = f_1(i) + f_1(i-1), \qquad i = 1,\ldots,5$$
$$f_2'(i) = f_2(i) - f_2(i-1), \qquad i = 1,\ldots,5$$

o And the LP coefficients by

$$a_i = \begin{cases} 0.5\, f_1'(i) + 0.5\, f_2'(i), & i = 1,\ldots,5 \\ 0.5\, f_1'(11-i) - 0.5\, f_2'(11-i), & i = 6,\ldots,10 \end{cases}$$
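The LSP to LP conversion can be sketched as below. The helper `_half_poly` runs the recursion for one of the two polynomials (it only stores the first half of the symmetric coefficients), and `lsp_to_lp` combines the results; both names are hypothetical and the code is a floating-point illustration only.

```python
def _half_poly(qs):
    """Coefficients f(0..len(qs)) of prod (1 - 2 q z^-1 + z^-2) over qs."""
    n = len(qs)
    f = [1.0] + [0.0] * n               # f(0) = 1; f(-1) is treated as 0
    for i in range(1, n + 1):
        q = qs[i - 1]
        f[i] = -2.0 * q * f[i - 1] + 2.0 * (f[i - 2] if i >= 2 else 0.0)
        for j in range(i - 1, 0, -1):
            f[j] += -2.0 * q * f[j - 1] + (f[j - 2] if j >= 2 else 0.0)
    return f


def lsp_to_lp(q):
    """Convert LSPs q_1..q_10 (cosine domain) to a = [1, a_1, ..., a_10]."""
    half = len(q) // 2                  # 5 in AMR
    m = 2 * half                        # prediction order
    f1 = _half_poly(q[0::2])            # uses q_1, q_3, ..., q_9
    f2 = _half_poly(q[1::2])            # uses q_2, q_4, ..., q_10
    # f1'(i) and f2'(i); index 0 is unused padding
    f1p = [0.0] + [f1[i] + f1[i - 1] for i in range(1, half + 1)]
    f2p = [0.0] + [f2[i] - f2[i - 1] for i in range(1, half + 1)]
    a = [1.0] + [0.0] * m
    for i in range(1, half + 1):
        a[i] = 0.5 * f1p[i] + 0.5 * f2p[i]
    for i in range(half + 1, m + 1):
        a[i] = 0.5 * f1p[m + 1 - i] - 0.5 * f2p[m + 1 - i]
    return a


print(lsp_to_lp([0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.5, -0.7, -0.9]))
```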
Monitoring resonance
o Resonances in the LPC filter are monitored to detect possible problems
Open-loop pitch analysis
o How to determine the location and height of the impulses?
o Based on analysis-by-synthesis
o Filtering of the input signal with a perceptual weighting filter

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}$$

$$s_w(n) = s(n) + \sum_{i=1}^{10} a_i \gamma_1^i\, s(n-i) - \sum_{i=1}^{10} a_i \gamma_2^i\, s_w(n-i), \qquad n = 0,\ldots,L-1$$
Open-loop pitch analysis (2)
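o 12.2 kbits/s
  - Search for the maxima of the correlation

$$O_k = \sum_{n=0}^{79} s_w(n)\, s_w(n-k)$$

  - in three ranges

$$i = 3:\ 18,\ldots,35 \qquad i = 2:\ 36,\ldots,71 \qquad i = 1:\ 72,\ldots,143$$

  - And normalized by dividing by

$$\sqrt{\sum_n s_w^2(n - t_i)}, \qquad i = 1,\ldots,3$$

  - where $t_i$ is the delay of the maximum found in range $i$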
Open-loop pitch analysis (3)
o The best open-loop delay is found by
$T_{op} = t_1$
$M(T_{op}) = M_1$
if $M_2 > 0.85\, M(T_{op})$
    $M(T_{op}) = M_2$
    $T_{op} = t_2$
end
if $M_3 > 0.85\, M(T_{op})$
    $M(T_{op}) = M_3$
    $T_{op} = t_3$
end
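The selection rule favours shorter delays: a maximum from a lower range replaces the current choice as soon as it exceeds 85% of it. A minimal sketch, assuming the normalized maxima and their delays are passed as `(M_i, t_i)` pairs ordered from the longest-delay range (i = 1) to the shortest (i = 3); the function name is hypothetical.

```python
def select_open_loop_delay(candidates, threshold=0.85):
    """Pick T_op from [(M_1, t_1), (M_2, t_2), (M_3, t_3)]."""
    m_best, t_op = candidates[0]
    for m_i, t_i in candidates[1:]:
        if m_i > threshold * m_best:
            m_best, t_op = m_i, t_i
    return t_op


# The second range wins (0.9 > 0.85 * 1.0); the third does not (0.7 < 0.85 * 0.9).
print(select_open_loop_delay([(1.0, 100), (0.9, 50), (0.7, 25)]))
```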
Open-loop pitch analysis (4)
o 10.2 kbits/s
  - Twice per frame (every 10 ms)
  - The correlation is determined by

$$C(d) = \sum_{n=0}^{79} s_w(n)\, s_w(n-d)\, w(d), \qquad d = 20,\ldots,143$$

  - Window

$$w(d) = w_l(d)\, w_n(d)$$

  - Low pitch lag (tables)

$$w_l(d) = cw(d)$$

  - Previous frame lag

$$w_n(d) = \begin{cases} cw(|T_{old} - d| + d_L), & v > 0.3 \\ 1.0, & \text{otherwise} \end{cases}$$

o $d_L = 20$
o $T_{old}$ is the median-filtered pitch lag of the 5 previous voiced speech half-frames
Open-loop pitch analysis (5)
o 7.95, 7.40, 6.70, 5.90 kbits/s
  - Twice per frame (every 10 ms)

$$O_k = \sum_{n=0}^{79} s_w(n)\, s_w(n-k)$$

$$i = 3:\ 20,\ldots,39 \qquad i = 2:\ 40,\ldots,79 \qquad i = 1:\ 80,\ldots,143$$

  - And normalized by

$$\sqrt{\sum_n s_w^2(n - t_i)}, \qquad i = 1,\ldots,3$$

  - Maxima and delays are $(M_i, t_i)$, $i=1,2,3$
Open-loop pitch analysis (6)
o The best open-loop delay is found by
$T_{op} = t_1$
$M(T_{op}) = M_1$
if $M_2 > 0.85\, M(T_{op})$
    $M(T_{op}) = M_2$
    $T_{op} = t_2$
end
if $M_3 > 0.85\, M(T_{op})$
    $M(T_{op}) = M_3$
    $T_{op} = t_3$
end
Open-loop pitch analysis (7)
o 5.15, 4.75 kbits/s
  - Once per frame (every 20 ms)

$$O_k = \sum_{n=0}^{79} s_w(n)\, s_w(n-k)$$

$$i = 3:\ 20,\ldots,39 \qquad i = 2:\ 40,\ldots,79 \qquad i = 1:\ 80,\ldots,143$$

  - And normalized by

$$\sqrt{\sum_n s_w^2(n - t_i)}, \qquad i = 1,\ldots,3$$

  - Maxima and delays are $(M_i, t_i)$, $i=1,2,3$
Open-loop pitch analysis (and 8)
o The best open-loop delay is found by
$T_{op} = t_1$
$M(T_{op}) = M_1$
if $M_2 > 0.85\, M(T_{op})$
    $M(T_{op}) = M_2$
    $T_{op} = t_2$
end
if $M_3 > 0.85\, M(T_{op})$
    $M(T_{op}) = M_3$
    $T_{op} = t_3$
end
Impulse response
o The impulse response of the weighted synthesis filter is computed each subframe
o This impulse response will be used for the search of codebooks
$$H(z)\,W(z) = \frac{A(z/\gamma_1)}{\hat{A}(z)\, A(z/\gamma_2)}$$
Adaptive codebook
o Adaptive codebook search is performed on a subframe basis
o The parameters are the delay and gain of the pitch filter
o The codebook contains entries taken from the previously synthesized excitation signal
Algebraic codebook
o Based on interleaved single-pulse permutation (ISPP) design
  - A few sparse impulse sequences that are phase-shifted versions of each other
  - All the pulses have the same magnitude
  - Amplitudes are +1 or -1
Algebraic codebook (and 2)
o 12.2 kbits/s
  - 10 non-zero pulses
o 10.2 kbits/s
  - 8 non-zero pulses
o 7.95, 7.40 kbits/s
  - 4 non-zero pulses
o 6.70 kbits/s
  - 3 non-zero pulses
o 5.90, 5.15, 4.75 kbits/s
  - 2 non-zero pulses
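To make the idea of a sparse algebraic code vector concrete, the sketch below places unit pulses of sign +1 or -1 in a 40-sample subframe. The pulse positions in the example are arbitrary; the actual track layout that restricts where each pulse may sit is not given on the slides and is not modelled here. The names `PULSES_PER_MODE` and `build_code_vector` are hypothetical, and letting coinciding pulses add up is an assumption of this sketch.

```python
SUBFRAME = 40  # samples per 5 ms subframe

# Non-zero pulses per subframe for each mode (kbit/s), as listed on the slide.
PULSES_PER_MODE = {12.2: 10, 10.2: 8, 7.95: 4, 7.40: 4,
                   6.70: 3, 5.90: 2, 5.15: 2, 4.75: 2}


def build_code_vector(positions, signs, length=SUBFRAME):
    """Build c(n) from pulse positions and signs (+1 or -1)."""
    c = [0] * length
    for pos, sign in zip(positions, signs):
        if sign not in (1, -1):
            raise ValueError("pulse amplitudes are +1 or -1")
        c[pos] += sign  # assumption: coinciding pulses simply add up
    return c


# Two pulses, as in the 5.90, 5.15 and 4.75 kbit/s modes (positions are arbitrary).
c = build_code_vector([3, 17], [1, -1])
print(c)
```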
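CELP synthesis model
o Code-excited linear predictive coding model
[Figure: CELP synthesis model. The adaptive codebook vector v(n), scaled by the gain g_p, and the fixed codebook vector c(n), scaled by the gain g_c, are added to form the excitation u(n); LP synthesis with the filter 1/Â(z) gives ŝ(n) and post-filtering gives ŝ'(n)]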
CELP synthesis model (and 2)
o To reconstruct speech
  - A noise-like excitation model
  - A pitch filter model of the glottal vibrations
  - A linear prediction filter model of the vocal tract
Speech decoder
[Figure: decoder block diagram. Frame level: decode the LSP from the LSP indices, interpolate the LSP for the 4 subframes and convert them to Â(z). Subframe level: decode the adaptive codebook (pitch index), decode the innovative codebook (code index), decode the gains (gains indices), construct the excitation and pass it through the synthesis filter to obtain ŝ(n). Post-processing: the post filter produces ŝ'(n)]
Speech decoder (2)
o Decoding
  - LP parameters
  - Adaptive codebook vector
  - Adaptive codebook gain
  - Innovative codebook vector
  - Innovative codebook gain
o Smoothing of the fixed codebook gain
o Anti-sparseness processing
o Speech synthesis
Adaptive codevector
o The received pitch index is used to find the integer and fractional parts of the pitch lag
o The adaptive codebook vector v(n) is obtained by interpolating the past excitation u(n) at the pitch delay
o The received index is used to find the quantised adaptive codebook gain $g_p$ from the quantisation table
Algebraic codebook
o The received index is used to extract the positions and amplitude signs of the excitation pulses and to find the algebraic code vector c(n)
o The received index is used to compute the quantised fixed codebook gain gc
Reconstructing speech
o The input excitation is

$$u(n) = g_p\, v(n) + g_c\, c(n)$$

o The excitation is filtered by the LP filter

$$Y(z) = \frac{1}{A(z)}\, U(z)$$
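The two reconstruction steps can be sketched as follows: the excitation is formed from the scaled codebook vectors and then passed through the all-pole filter $1/A(z)$. The function name `reconstruct`, the `memory` argument and the coefficient convention $A(z) = 1 + \sum_i a_i z^{-i}$ are assumptions of this illustration; post-filtering is not included.

```python
def reconstruct(v, c, g_p, g_c, a, memory=None):
    """Return (u, y): the excitation u(n) and the synthesized signal y(n).

    a = [1, a_1, ..., a_m] are the LP coefficients; memory holds the previous
    outputs y(n-1), ..., y(n-m) and defaults to zeros.
    """
    order = len(a) - 1
    mem = list(memory) if memory is not None else [0.0] * order
    u = [g_p * vi + g_c * ci for vi, ci in zip(v, c)]
    y = []
    for un in u:
        # y(n) = u(n) - sum_{i=1}^{m} a_i y(n-i)
        yn = un - sum(a[i] * mem[i - 1] for i in range(1, order + 1))
        y.append(yn)
        mem = ([yn] + mem)[:order]  # shift the filter memory
    return u, y


# A single excitation pulse through 1 / (1 - 0.5 z^-1) decays by half each sample.
u, y = reconstruct([1.0, 0.0, 0.0, 0.0], [0.0] * 4, 1.0, 0.0, [1.0, -0.5])
print(y)
```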