Speech Processing
Homomorphic Signal Processing
April 19, 2023 Veton Këpuska 2
Outline
Principles of Homomorphic Signal Processing
Details of Homomorphic Processing
Variants of Homomorphic Processing
Application of homomorphic systems to speech analysis and synthesis
Principles of Homomorphic Processing
Superposition Property of Linear Systems:
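[Figure: a linear system L applied to a1x1[n] + a2x2[n] gives the same output as applying L to x1[n] and x2[n] separately, scaling the outputs by a1 and a2, and adding them.]
$$L\{x_1[n] + x_2[n]\} = L\{x_1[n]\} + L\{x_2[n]\}$$
$$L\{a\,x[n]\} = a\,L\{x[n]\}$$
$$L\{a_1 x_1[n] + a_2 x_2[n]\} = a_1 L\{x_1[n]\} + a_2 L\{x_2[n]\}$$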
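The superposition property can be checked numerically. This is only an illustration. `fir_filter` is a hypothetical helper that convolves with an arbitrarily chosen short impulse response and plays the role of the linear system L. `squarer` is a deliberately nonlinear counterexample.

```python
def fir_filter(x, h=(0.5, 0.3, 0.2)):
    """LTI system: y[n] = sum_k h[k] x[n - k], zero initial conditions."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def squarer(x):
    """A nonlinear system used as a counterexample: y[n] = x[n]**2."""
    return [v * v for v in x]


def satisfies_superposition(system, x1, x2, a1, a2, tol=1e-12):
    """Compare system(a1*x1 + a2*x2) with a1*system(x1) + a2*system(x2)."""
    combined = [a1 * u + a2 * v for u, v in zip(x1, x2)]
    lhs = system(combined)
    rhs = [a1 * u + a2 * v for u, v in zip(system(x1), system(x2))]
    return all(abs(p - q) < tol for p, q in zip(lhs, rhs))


x1 = [1.0, -2.0, 0.5, 3.0, 0.0]
x2 = [0.2, 0.4, -1.0, 1.5, 2.0]
print(satisfies_superposition(fir_filter, x1, x2, 2.0, -0.5))  # True
print(satisfies_superposition(squarer, x1, x2, 2.0, -0.5))     # False
```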
Principles of Homomorphic Processing
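Example 6.1: If two signals fall in non-overlapping frequency bands, they are separable by linear filtering. Let x[n] = x1[n] + x2[n], where X1(ω) = ℱ{x1[n]} is nonzero only for |ω| ∈ [0, π/2] and X2(ω) = ℱ{x2[n]} is nonzero only for |ω| ∈ [π/2, π]. Filter x[n] with h[n], whose frequency response is
$$H(\omega) = \begin{cases} 0, & |\omega| \in [0, \pi/2] \\ 1, & |\omega| \in [\pi/2, \pi] \end{cases}$$
Then
$$y[n] = h[n] * (x_1[n] + x_2[n]) = h[n] * x_1[n] + h[n] * x_2[n] = h[n] * x_2[n] = x_2[n]$$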
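A small sketch of Example 6.1 follows. The ideal high-pass filter is applied in the DFT domain as a circular, finite-length stand-in for h[n]. The test signals, the length N = 64 and the helper names are assumptions chosen so that each cosine falls exactly on a DFT bin. The direct O(N²) DFT keeps the example self-contained.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def ideal_highpass(x):
    """Zero every DFT bin with |omega| < pi/2 and keep |omega| >= pi/2."""
    N = len(x)
    X = dft(x)
    for k in range(N):
        omega = 2 * math.pi * min(k, N - k) / N   # |omega| of bin k, in [0, pi]
        if omega < math.pi / 2:
            X[k] = 0
    return [v.real for v in idft(X)]


N = 64
x1 = [math.cos(2 * math.pi * 4 * n / N) for n in range(N)]    # omega = pi/8 (low band)
x2 = [math.cos(2 * math.pi * 24 * n / N) for n in range(N)]   # omega = 3*pi/4 (high band)
x = [a + b for a, b in zip(x1, x2)]
y = ideal_highpass(x)
print(max(abs(a - b) for a, b in zip(y, x2)))   # close to zero: y[n] recovers x2[n]
```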
Principles of Homomorphic Processing
A generalized superposition concept that supports the separation of nonlinearly combined signals leads to the notion of generalized linear filtering.
Properties:
H(x1[n] □ x2[n]) = H(x1[n]) ○ H(x2[n])
H(c : x[n]) = c ◈ H(x[n])
Here □ and : are the input combination and scalar rules, and ○ and ◈ are the output combination and scalar rules. Systems that satisfy these two properties are referred to as homomorphic systems and are said to satisfy a generalized principle of superposition.
[Figure: homomorphic system H(·) mapping x[n] (input rules □, :) to y[n] (output rules ○, ◈)]
Principles of Homomorphic Processing
The importance of homomorphic systems for speech processing lies in their ability to transform nonlinearly combined signals into additively combined signals, so that linear filtering can be performed on them.
Homomorphic systems can be expressed as a cascade of three homomorphic subsystems, depicted in the figure below and referred to as the canonic representation:
[Figure: canonic representation of H: (I) characteristic system D□ maps x[n] (rules □, :) to x̂[n] (rules +, ·); (II) linear system L maps x̂[n] to ŷ[n]; (III) inverse characteristic system D○⁻¹ maps ŷ[n] (rules +, ·) to y[n] (rules ○, ◈).]
Canonic Representation of a Homomorphic System
i. The characteristic system D□: transforms □ into addition "+".
ii. The linear system L: transforms addition into addition.
iii. The inverse characteristic system D○⁻¹: transforms addition into ○.
Homomorphic Systems
Let the goal be the removal of an undesired component of the signal (e.g., noise):
| Type of combination rule | System | Operation |
|---|---|---|
| Signal & additive noise | Linear system | Linear filtering |
| Signal & multiplicative noise | Multiplicative system | Multiplicative filtering |
| Signal & convolutional noise | Convolutional system | Convolutional filtering |
Multiplicative Homomorphic Systems
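Consider the multiplicative homomorphic system depicted below:
Use D● to convert MULT into ADD. Use D●⁻¹ to convert ADD back into MULT.
Which rule (operation) transforms MULT into ADD?
[Figure: multiplicative homomorphic system M[·] from x[n] to y[n], and its canonic form: (I) D● maps x[n] to x̂[n]; (II) linear system L maps x̂[n] to ŷ[n]; (III) D●⁻¹ maps ŷ[n] to y[n].]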
Multiplicative Homomorphic Systems
If x[n] = x1[n]●x2[n], with x1[n] > 0 and x2[n] > 0 for all n, then
$$\log(x_1[n]\,x_2[n]) = \log x_1[n] + \log x_2[n]$$
However, x[n] is not always positive. For general (including complex) signals, write
$$x[n] = |x[n]|\,e^{j\arg(x[n])}$$
which requires the definition of a complex logarithm operator.
Multiplicative Homomorphic Systems
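An implementation of a multiplicative homomorphic system:
[Figure: x[n] → complex log (I) → x̂[n] → linear system (II) → ŷ[n] → complex exp (III) → y[n]]
Definition of the complex log:
$$\hat{x}[n] = \log x[n] = \log|x[n]| + j\arg(x[n])$$
Complex exp (inverse operation):
$$e^{\hat{x}[n]} = e^{\log|x[n]| + j\arg(x[n])} = e^{\log|x[n]|}\,e^{j\arg(x[n])} = x[n]$$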
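A minimal sketch of the canonic multiplicative system follows. It assumes strictly positive signals, so the real logarithm suffices and the complex-log case is not handled. The linear system in stage II is a constant gain `gamma`, a hypothetical choice. With it, the overall system computes x[n]^gamma, a simple dynamic-range compressor.

```python
import math


def characteristic(x):
    """Stage I, D_mult: maps products to sums (assumes every sample > 0)."""
    return [math.log(v) for v in x]


def linear_system(x_hat, gamma=0.5):
    """Stage II: any linear system; here a constant gain (hypothetical choice)."""
    return [gamma * v for v in x_hat]


def inverse_characteristic(y_hat):
    """Stage III, D_mult^{-1}: maps sums back to products."""
    return [math.exp(v) for v in y_hat]


def multiplicative_system(x, gamma=0.5):
    return inverse_characteristic(linear_system(characteristic(x), gamma))


x1 = [1.0, 4.0, 9.0, 16.0]
x2 = [2.0, 0.5, 3.0, 0.25]
# Generalized superposition: H(x1 * x2) should equal H(x1) * H(x2)
lhs = multiplicative_system([a * b for a, b in zip(x1, x2)])
rhs = [a * b for a, b in zip(multiplicative_system(x1), multiplicative_system(x2))]
print(lhs)
print(rhs)
```

The scalar rule also holds: raising the input to a power c raises the output to the same power.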
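Homomorphic Systems for Convolution
Consider the homomorphic system for convolution depicted below:
Use D* to convert “*” into ADD. Use D*⁻¹ to convert ADD back into “*”.
How can “*” be transformed into ADD?
[Figure: convolutional homomorphic system C[·] from x[n] to y[n], and its canonic form: (I) D* maps x[n] to x̂[n]; (II) linear system L maps x̂[n] to ŷ[n]; (III) D*⁻¹ maps ŷ[n] to y[n].]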
Homomorphic Systems for Convolution
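Let x[n] = x1[n]*x2[n].
I. Characteristic system D*:
x[n] → Z[·] → X(z) → log[·] → X̂(z) → Z⁻¹[·] → x̂[n]
(Z maps * into ●, log maps ● into +, and Z⁻¹ maps + into +; the input is in "time" and the output in pseudo-"time".)
III. Inverse characteristic system D*⁻¹ (inverse operation):
ŷ[n] → Z[·] → Ŷ(z) → exp[·] → Y(z) → Z⁻¹[·] → y[n]
(Z maps + into +, exp maps + into ●, and Z⁻¹ maps ● into *.)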
Homomorphic Systems for Convolution
For x[n] = x1[n]*x2[n]:
1. X(z) = X1(z)X2(z)
2. log X(z) = log[X1(z)X2(z)] = log X1(z) + log X2(z), a complex logarithm. This operation requires special handling because:
X(z) is in general not real and positive; for complex X(z) the phase is not uniquely defined (only up to multiples of 2π).
X(z) has to be defined on the unit circle (e.g., the z-transform of a stable sequence).
In practice, operate on the unit circle z = e^{jω}, i.e., with the Fourier transform:
$$\hat{X}(e^{j\omega}) = \log X(e^{j\omega}) = \log|X(e^{j\omega})| + j\arg X(e^{j\omega})$$
$$\hat{x}[n] = \mathcal{F}^{-1}\{\hat{X}(e^{j\omega})\}$$
Homomorphic Systems for Convolution
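Two cases are possible in computing x̂[n]:
1. Complex Cepstrum (CC):
$$\hat{x}[n] = \mathcal{F}^{-1}\left\{\log|X(e^{j\omega})| + j\arg X(e^{j\omega})\right\}$$
2. Real Cepstrum (RC):
$$c[n] = \mathcal{F}^{-1}\left\{\log|X(e^{j\omega})|\right\}$$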
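The real cepstrum can be approximated with a DFT. In this sketch the helper names and the DFT length `nfft` are illustrative. Index N − n of the result holds quefrency −n. For the minimum-phase test sequence x[n] = δ[n] − 0.5δ[n−1] the exact values are c[±1] = −0.25 and c[±2] = −0.0625, because the complex cepstrum is −0.5ⁿ/n for n > 0.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def real_cepstrum(x, nfft=256):
    """c[n] = IDFT{ log|X(k)| }; element N - n holds quefrency -n."""
    padded = list(x) + [0.0] * (nfft - len(x))
    log_mag = [math.log(abs(v)) for v in dft(padded)]
    return [v.real for v in idft(log_mag)]


# x[n] = delta[n] - 0.5 delta[n - 1], i.e. X(z) = 1 - 0.5 z^-1 (minimum phase)
c = real_cepstrum([1.0, -0.5])
print(c[0], c[1], c[2], c[-1])
```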
Homomorphic Systems for Convolution
Example 6.3 Consider a sequence x[n] consisting of a system impulse response h[n] convolved with an impulse train p[n]:
$$p[n] = \sum_{k} a_k\,\delta[n - kP], \qquad x[n] = h[n] * p[n]$$
[Figure: p[n] → h[n] → x[n]]
The goal is to estimate h[n]. First form the canonical representation for convolution:
$$\hat{x}[n] = D_*\{x[n]\} = D_*\{h[n]\} + D_*\{p[n]\} = \hat{h}[n] + \hat{p}[n]$$
If D* is such that p̂[n] remains a train of pulses and ĥ[n] falls between the pulses, then separation is possible.
Example 6.3 (cont.)
Let L denote such an operation (i.e., a rectangular window that separates p̂[n] from ĥ[n]):
$$\hat{y}[n] = L\{\hat{x}[n]\} = L\{\hat{h}[n]\} + L\{\hat{p}[n]\} = \hat{h}[n] + 0 = \hat{h}[n]$$
$$h[n] = D_*^{-1}\{\hat{y}[n]\}$$
Example 6.4
a, b real and positive ⇒ log(ab) = log(a) + log(b)
a, b real but b < 0 ⇒ log(ab) = log(a|b|e^{jπk}) = log(a) + log(|b|) + jπk, for any odd k (k = ±1, ±3, ±5, …), so log(ab) is ambiguous.
This example indicates that special care must be taken in defining the logarithm operator for complex X(z), so that the logarithm of a product equals the sum of the logarithms.
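The ambiguity is easy to reproduce with the principal-value logarithm in Python's `cmath`. In this sketch the two unit-magnitude numbers are illustrative. Each has a phase of 2 rad, so the sum of their logs has imaginary part 4 rad. The principal log of their product wraps to 4 − 2π, so the two sides differ by exactly one multiple of 2πj.

```python
import cmath

z1 = cmath.exp(2.0j)          # unit magnitude, phase 2 rad (inside (-pi, pi])
z2 = cmath.exp(2.0j)

sum_of_logs = cmath.log(z1) + cmath.log(z2)   # phase part 2 + 2 = 4 rad
log_of_product = cmath.log(z1 * z2)           # principal value: 4 - 2*pi rad
wraps = (sum_of_logs - log_of_product).imag / (2 * cmath.pi)
print(sum_of_logs.imag, log_of_product.imag, wraps)   # wraps is 1 (up to rounding)

# Real case with b < 0: the principal value picks log|ab| + j*pi, i.e. k = 1
print(cmath.log(2 * -3))
```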
Homomorphic Systems for Convolution-Complex Logarithm
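Suppose that X(z) is evaluated on the unit circle (z = e^{jω}).
Let x[n] = x1[n]*x2[n] ⇒ X(ω) = X1(ω)X2(ω).
Consider then the complex log of X(ω):
$$\log X(\omega) = \log\left[|X(\omega)|\,e^{j\angle X(\omega)}\right] = \log|X(\omega)| + j\angle X(\omega)$$
Considering that X(ω) = X1(ω)X2(ω), then:
$$\log X(\omega) = \log\left[|X_1(\omega)|\,e^{j\angle X_1(\omega)}\,|X_2(\omega)|\,e^{j\angle X_2(\omega)}\right] = \log|X_1(\omega)| + \log|X_2(\omega)| + j\left[\angle X_1(\omega) + \angle X_2(\omega)\right]$$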
Homomorphic Systems for Convolution-Complex Logarithm
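In the previous expression the following was assumed:
$$\log|X(\omega)| = \log|X_1(\omega)| + \log|X_2(\omega)|$$
This expression holds if |X1(ω)| > 0 and |X2(ω)| > 0.
Also:
$$\angle X(\omega) = \angle X_1(\omega) + \angle X_2(\omega)$$
This expression generally does not hold, due to the ambiguity in the definition of the phase:
$$\angle X(\omega) = PV[\angle X(\omega)] + 2\pi k$$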
Homomorphic Systems for Convolution-Complex Logarithm
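Note that PV denotes the principal value of the phase, which falls in the interval [-π, π]. An arbitrary multiple of 2π can be added to the principal phase value; thus the additive property generally does not hold.
How can uniqueness be imposed?
1. Force continuity of the phase: select k such that ∠X(ω) = PV[∠X(ω)] + 2πk is a continuous function (Figure 6.5, next slide).
2. Phase derivative approach: it can be shown that
$$\angle X'(\omega) = \frac{d}{d\omega}\angle X(\omega) = \frac{X_r(\omega)X_i'(\omega) - X_i(\omega)X_r'(\omega)}{|X(\omega)|^2}$$
where X_r(ω) and X_i(ω) are the real and imaginary parts of X(ω), and
$$\angle X(\omega) = \int_0^{\omega}\angle X'(\beta)\,d\beta$$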
Fourier Transform Phase Continuity
Homomorphic Systems for Convolution
Relationship of the complex cepstrum to the real cepstrum c[n]: if x[n] is real, then |X(ω)| is real and even, and thus log|X(ω)| is real and even, while ∠X(ω) is odd. Hence the sequence
$$\hat{x}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\log X(\omega)\,e^{j\omega n}\,d\omega$$
is real; it is referred to as the complex cepstrum. The even component of the complex cepstrum,
$$c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2}$$
is referred to as the real cepstrum.
Complex Cepstrum of Speech-Like Sequences
Sequences with rational z-transform: the general form of this class of sequences is given below:
$$X(z) = A\,z^{r}\,\frac{\prod_{k=1}^{M_i}(1 - a_k z^{-1})\,\prod_{k=1}^{M_o}(1 - b_k z)}{\prod_{k=1}^{N_i}(1 - c_k z^{-1})\,\prod_{k=1}^{N_o}(1 - d_k z)}$$
M_i and N_i are the numbers of zeros and poles inside the unit circle; M_o and N_o are the numbers of zeros and poles outside the unit circle. |a_k|, |b_k|, |c_k|, |d_k| are all < 1 ⇒ there are no singularities on the unit circle. A > 0.
Complex Cepstrum of Speech-Like Sequences
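Applying the complex logarithm gives (the shift term z^r is set aside here; it is treated in Example 6.5):
$$\hat{X}(z) = \log X(z) = \log A + \sum_{k=1}^{M_i}\log(1 - a_k z^{-1}) + \sum_{k=1}^{M_o}\log(1 - b_k z) - \sum_{k=1}^{N_i}\log(1 - c_k z^{-1}) - \sum_{k=1}^{N_o}\log(1 - d_k z)$$
X̂(z) is the z-transform of the sequence x̂[n] = Z⁻¹{X̂(z)}.
We want x̂[n] to be absolutely summable ⇒ the ROC of X̂(z) must include the unit circle, |z| = 1.
This condition is equivalent to having all constituent terms of X̂(z) have ROCs that include the unit circle, |z| = 1.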
Complex Cepstrum of Speech-Like Sequences
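In order to obtain the ROC for expressions of the form log(1 - αz⁻¹) and log(1 - αz), they are expressed as power series expansions:
$$\log(1 - \alpha z^{-1}) = -\sum_{n=1}^{\infty}\frac{\alpha^{n}}{n}z^{-n}, \qquad |z| > |\alpha|$$
$$\log(1 - \alpha z) = -\sum_{n=1}^{\infty}\frac{\alpha^{n}}{n}z^{n}, \qquad |z| < 1/|\alpha|$$
[Figure: z-plane ROCs. For log(1 - αz⁻¹) the ROC lies outside the circle of radius |α|; for log(1 - αz) it lies inside the circle of radius 1/|α|. With |α| < 1, both include the unit circle.]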
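The ROC condition can be seen numerically by truncating the first series. In this sketch α = 0.8 and the evaluation points are illustrative. On the unit circle (|z| = 1 > |α|) the truncated series matches the principal complex log. At z = 0.5 (|z| < |α|) the terms grow like (α/z)ⁿ/n and the partial sums blow up.

```python
import cmath


def log_one_minus_series(alpha, z, terms=200):
    """Truncated series for log(1 - alpha z^-1) = -sum_{n>=1} (alpha^n / n) z^-n."""
    return -sum((alpha ** n / n) * z ** (-n) for n in range(1, terms + 1))


alpha = 0.8
z_on_circle = cmath.exp(0.3j)    # |z| = 1 > |alpha|: inside the ROC
error = abs(log_one_minus_series(alpha, z_on_circle) - cmath.log(1 - alpha / z_on_circle))
print(error)                     # tiny: the series converges on the unit circle

z_inside = 0.5                   # |z| < |alpha|: outside the ROC, partial sums diverge
print(abs(log_one_minus_series(alpha, z_inside, terms=50)))
```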
Complex Cepstrum of Speech-Like Sequences
The ROC of X̂(z) is therefore an annulus defined by the poles and zeros of X(z) closest to the unit circle:
[Figure: z-plane ROC for a typical rational X̂(z): an annulus that contains the unit circle.]
Complex Cepstrum of Speech-Like Sequences
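The complex cepstrum associated with a rational X(z) can therefore be expressed as:
$$\hat{x}[n] = \log A\,\delta[n] - \sum_{k=1}^{M_i}\frac{a_k^{n}}{n}u[n-1] + \sum_{k=1}^{N_i}\frac{c_k^{n}}{n}u[n-1] + \sum_{k=1}^{M_o}\frac{b_k^{-n}}{n}u[-n-1] - \sum_{k=1}^{N_o}\frac{d_k^{-n}}{n}u[-n-1]$$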
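The closed form above can be evaluated directly. This sketch assumes real root parameters with magnitude below 1 and no shift term; the function name is illustrative. Factors inside the unit circle contribute only for n > 0, and factors outside contribute only for n < 0.

```python
import math


def rational_complex_cepstrum(A, a=(), b=(), c=(), d=(), n_max=5):
    """Closed-form complex cepstrum of
    X(z) = A * prod(1 - a_k z^-1) * prod(1 - b_k z) / (prod(1 - c_k z^-1) * prod(1 - d_k z)),
    assuming A > 0, real |a_k|, |b_k|, |c_k|, |d_k| < 1 and no shift term z^r.
    Returns {n: x_hat[n]} for -n_max <= n <= n_max."""
    x_hat = {}
    for n in range(-n_max, n_max + 1):
        if n == 0:
            x_hat[n] = math.log(A)
        elif n > 0:   # zeros/poles inside the unit circle
            x_hat[n] = (-sum(ak ** n for ak in a) + sum(ck ** n for ck in c)) / n
        else:         # zeros/poles outside the unit circle
            x_hat[n] = (sum(bk ** (-n) for bk in b) - sum(dk ** (-n) for dk in d)) / n
    return x_hat


# X(z) = 2 (1 - 0.5 z^-1)(1 - 0.4 z): one zero inside, one zero outside the unit circle
xh = rational_complex_cepstrum(2.0, a=[0.5], b=[0.4])
for n in sorted(xh):
    print(n, xh[n])
```

For a minimum-phase X(z) (no b_k, d_k) the result is right-sided, which is the case exploited later for minimum-phase synthesis.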
Example 6.5
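Let:
$$X(z) = z^{-r}\,\frac{(1 - a z^{-1})(1 - b z)}{1 - c z^{-1}}$$
where a, b, c are real with magnitude less than 1. The ROC of X(z) includes the unit circle, so x[n] is stable. The delay z^{-r} corresponds to a shift of the sequence. Thus the complex cepstrum is given by:
$$\hat{x}[n] = -\frac{a^{n}}{n}u[n-1] + \frac{c^{n}}{n}u[n-1] + \frac{b^{-n}}{n}u[-n-1] + \mathcal{Z}^{-1}\{\log z^{-r}\}$$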
Example 6.5 (cont.)
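The inverse z-transform of the shift term (evaluated on the unit circle, where log z^{-r} = -jωr) is given by:
$$\mathcal{Z}^{-1}\{\log z^{-r}\} = \begin{cases} -\dfrac{r\cos(\pi n)}{n}, & n \neq 0 \\ 0, & n = 0 \end{cases}$$
The contribution of the z^{-r} term is significant. On the unit circle, z^{-r} = e^{-jωr} = 1∠(-ωr) contributes a linear ramp to the phase; for a large shift r, it dominates the phase representation and gives a large discontinuity at ω = π and ω = -π.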
Complex Cepstrum of Speech-Like Sequences
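Relation of the complex cepstrum and the real cepstrum for x[n] with a rational z-transform that is minimum phase:
The complex cepstrum of a minimum-phase sequence with a rational z-transform is right-sided, and therefore
$$\hat{x}[n] = l[n]\,c[n], \qquad l[n] = \begin{cases} 1, & n = 0 \\ 2, & n > 0 \\ 0, & n < 0 \end{cases}$$
where the real cepstrum is
$$c[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2}$$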
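The relation x̂[n] = l[n]c[n] gives a way to build a minimum-phase sequence from a magnitude spectrum alone. Fold the real cepstrum, then apply the inverse characteristic system (DFT, exp, inverse DFT). This sketch assumes `nfft` is large enough that cepstral aliasing is negligible. The direct O(N²) `dft`/`idft` helpers are for self-containment, not speed. With an N-point DFT, quefrency N/2 is shared by +N/2 and −N/2, so it keeps weight 1. Feeding in the maximum-phase sequence [0.2, 0.6, 1.0] should return its minimum-phase counterpart [1.0, 0.6, 0.2], which has the same magnitude spectrum.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def minimum_phase_from_magnitude(x, nfft=64):
    """Return the minimum-phase sequence whose DFT magnitude matches that of x."""
    padded = list(x) + [0.0] * (nfft - len(x))
    c = [v.real for v in idft([math.log(abs(v)) for v in dft(padded)])]   # real cepstrum
    # x_hat[n] = l[n] c[n]: weight 1 at n = 0, 2 for 0 < n < N/2, 0 for negative quefrencies
    x_hat = [0.0] * nfft
    x_hat[0] = c[0]
    for n in range(1, nfft // 2):
        x_hat[n] = 2 * c[n]
    x_hat[nfft // 2] = c[nfft // 2]
    X_min = [cmath.exp(v) for v in dft(x_hat)]   # inverse characteristic system
    return [v.real for v in idft(X_min)]


max_phase = [0.2, 0.6, 1.0]   # zeros outside the unit circle
y = minimum_phase_from_magnitude(max_phase)
print([round(v, 6) for v in y[:4]])
```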
Impulse Train Convolved with Rational z-Transform Sequences
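A second class of sequences of interest in the speech context is the train of uniformly spaced unit samples with varying weights, and its interaction with the system:
[Figure: p[n] → h[n] → x[n], with x[n] = h[n]*p[n]]
$$p[n] = \sum_{r=0}^{Q}\alpha_r\,\delta[n - rN] \;\overset{\mathcal{Z}}{\longleftrightarrow}\; P(z) = \sum_{r=0}^{Q}\alpha_r\,z^{-rN}$$
$$P(z) = \sum_{r=0}^{Q}\alpha_r\,(z^{N})^{-r} = \alpha_0\prod_{k=1}^{Q}\left(1 - a_k z^{-N}\right)$$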
Impulse Train Convolved with Rational z-Transform Sequences
If p[n] is minimum phase, i.e., |a_k| < 1 so that all zeros of P(z) are inside the unit circle, log P(z) can be expressed as:
$$\log P(z) = \log\alpha_0 + \sum_{k=1}^{Q}\log\left(1 - a_k z^{-N}\right) = \log\alpha_0 - \sum_{k=1}^{Q}\sum_{m=1}^{\infty}\frac{a_k^{m}}{m}\,z^{-mN}$$
$$\hat{p}[n] = \mathcal{Z}^{-1}\{\log P(z)\}$$
Thus p̂[n] is an infinite right-sided sequence of impulses spaced N samples apart.
Note that, in general, for non-minimum-phase sequences the complex cepstrum is two-sided, with uniformly spaced impulses.
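For the simplest case p[n] = δ[n] + αδ[n − N] (so Q = 1, α₀ = 1 and a₁ = −α), the series gives p̂[mN] = (−1)^{m+1}αᵐ/m and zero elsewhere. This sketch checks that with a DFT-based complex cepstrum. The values α = 0.5, N = 8 and the 256-point DFT are illustrative. No phase unwrapping is done because Re{P(ω)} = 1 + α cos(Nω) > 0, so the principal-value phase is already continuous.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def complex_cepstrum_no_unwrap(x, nfft):
    """x_hat = IDFT{ log|X(k)| + j PV[angle X(k)] }.
    Assumes the principal-value phase is already continuous (true here)."""
    padded = list(x) + [0.0] * (nfft - len(x))
    X_hat = [complex(math.log(abs(v)), cmath.phase(v)) for v in dft(padded)]
    return [v.real for v in idft(X_hat)]


alpha, N = 0.5, 8
p = [1.0] + [0.0] * (N - 1) + [alpha]     # p[n] = delta[n] + alpha delta[n - N]
p_hat = complex_cepstrum_no_unwrap(p, 256)
for m in range(1, 4):
    # closed form from the power series: p_hat[mN] = (-1)^(m+1) alpha^m / m
    print(m * N, p_hat[m * N], (-1) ** (m + 1) * alpha ** m / m)
```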
Example 6.6
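Consider a sequence x[n] = h[n]*p[n], where the z-transform of h[n] is given by:
$$H(z) = A\,\frac{(1 - b z^{-1})(1 - b^{*} z^{-1})}{(1 - a z^{-1})(1 - a^{*} z^{-1})}$$
a, a* and b, b* are complex-conjugate pairs.
[Figure: z-plane plot of a, a*, b, b* relative to the unit circle]
Consider p[n] to be a train of periodic pulses; then:
[Figure: p[n] → h[n] → x[n]]
$$p[n] = \sum_{k=0}^{\infty}\beta^{k}\,\delta[n - kP], \qquad x[n] = h[n] * p[n]$$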
Example 6.6 (cont)
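If β ∈ ℝ and |β| < 1, then p[n] is a train of exponentially decaying impulses:
[Figure: p[n] with impulses of height 1, β, β², … at n = 0, P, 2P, …]
The z-transform of p[n] is given by:
$$P(z) = \sum_{k=0}^{\infty}\beta^{k}z^{-kP} = \frac{1}{1 - \beta z^{-P}}$$
Then, as derived earlier:
$$\hat{x}[n] = \hat{h}[n] + \hat{p}[n]$$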
Example 6.6 (cont)
[Figure: panels for h[n] and p[n]]
Homomorphic Filtering
In the cepstral domain:
pseudo-time ⇒ quefrency;
low quefrency ⇒ slowly varying components;
high quefrency ⇒ fast varying components.
Removal of unwanted components (i.e., filtering) can be attempted in the cepstral domain, on the signal x̂[n], in which case the filtering is referred to as liftering.
When the complex cepstrum of h[n] resides in a quefrency interval shorter than a pitch period, the two components can be separated from each other.
Homomorphic Filtering
If log[X(ω)] is viewed as a "time signal" consisting of low-frequency and high-frequency contributions, these can be separated with a high-pass/low-pass filter.
One implementation of a low-pass filter:
[Figure: x[n] = h[n]*p[n] → D* → x̂[n] → l[n] → ŷ[n] → D*⁻¹ → y[n]]
Homomorphic Filtering
Alternate view of the "liftering" operation: a filtering operation L(ω) applied in the log-spectral domain.
The time and frequency domains are interchanged by viewing the frequency-domain signal log[X(ω)] as a time signal to be filtered ⇒
the "cepstrum" can be thought of as the spectrum of log[X(ω)];
the time axis of x̂[n] is referred to as "quefrency";
the filter l[n] is referred to as the "lifter".
[Figure: x[n] = h[n]*p[n] → F → log → X̂(ω) → F⁻¹ → x̂[n] → l[n] → ŷ[n] → F → Ŷ(ω) → exp → F⁻¹ → y[n]; the F⁻¹, l[n], F blocks (dotted box) together act as L(ω).]
Homomorphic Filtering
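The three elements in the dotted box of the previous figure can be replaced by L(ω), which can be viewed as a smoothing function:
$$\hat{Y}(\omega) = L(\omega) * \log X(\omega)$$
where * denotes periodic convolution in ω, since multiplication by l[n] in quefrency corresponds to convolution in frequency.
[Figure: x[n] = h[n]*p[n] → F → log → X̂(ω) → L(ω) → Ŷ(ω) → exp → F⁻¹ → y[n]]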
Practical Implementation Issues
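Use the FFT and IFFT for the Fourier transformations. X(k) is computed by:
$$X(k) = \sum_{n=0}^{N-1}x[n]\,e^{-j\frac{2\pi}{N}kn}$$
log X(k) is computed as
$$\hat{X}(k) = \log|X(k)| + j\angle X(k)$$
and for x̂[n] use
$$\hat{x}_N[n] = \frac{1}{N}\sum_{k=0}^{N-1}\hat{X}(k)\,e^{j\frac{2\pi}{N}kn}$$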
Practical Implementation Issues
1. The cepstrum x̂[n] is infinitely long, thus x̂_N[n] is an aliased version of x̂[n]. That is:
$$\hat{x}_N[n] = \sum_{r=-\infty}^{\infty}\hat{x}[n + rN]$$
Thus it is necessary to use as large an N as possible.
2. The phase component j∠X(k) must be properly unwrapped to ensure phase continuity. The goal is to determine r[k] so that ∠X(k) is continuous:
$$\angle X(k) = PV[\angle X(k)] + 2\pi r[k]$$
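Both practical issues show up in a small experiment. This sketch computes the discrete complex cepstrum with direct O(N²) DFTs, for self-containment rather than with an FFT library. It uses a simple unwrapper that keeps successive phase steps within [−π, π] and assumes there is no linear-phase term to remove. For the mixed-phase test sequence with X(z) = (1 − 0.5z⁻¹)(1 − 0.4z), the exact values are x̂[1] = −0.5 and x̂[−1] = −0.4. With N = 8 the aliased terms x̂[1 + rN] visibly bias x̂_N[1]; with N = 256 the error is negligible.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def unwrap(phases):
    """Keep successive phase differences within [-pi, pi] by adding multiples of 2*pi."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out


def circular_buffer(samples, N):
    """Place {n: value} on an N-point circular time axis (n < 0 goes to N + n)."""
    buf = [0.0] * N
    for n, v in samples.items():
        buf[n % N] += v
    return buf


def discrete_complex_cepstrum(x):
    """x_hat_N[n] = IDFT{ log|X(k)| + j * unwrapped angle X(k) }, no linear-phase removal."""
    X = dft(x)
    phase = unwrap([cmath.phase(v) for v in X])
    X_hat = [complex(math.log(abs(v)), ph) for v, ph in zip(X, phase)]
    return [v.real for v in idft(X_hat)]


# X(z) = (1 - 0.5 z^-1)(1 - 0.4 z): x[-1] = -0.4, x[0] = 1.2, x[1] = -0.5
x = {-1: -0.4, 0: 1.2, 1: -0.5}
results = {N: discrete_complex_cepstrum(circular_buffer(x, N)) for N in (8, 256)}
for N, xh in results.items():
    print(N, xh[1], xh[N - 1])   # exact: x_hat[1] = -0.5, x_hat[-1] = -0.4
```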
Modulo 2π Phase Unwrapper
The goal is to determine r[k] so that ∠X(k) is continuous.
[Figure: Phase representation in the discrete complex spectrum: the principal value PV[∠X(ω)], between -π and π, and its samples PV[∠X(k)] on a grid of spacing 2π/N.]
Modulo 2π Phase Unwrapper Algorithm:
If PV[∠X(k)] - PV[∠X(k-1)] > 2π - ε: r[k] = r[k-1] - 1   # subtract 2π
Else if PV[∠X(k)] - PV[∠X(k-1)] < -(2π - ε): r[k] = r[k-1] + 1   # add 2π
Else: r[k] = r[k-1]   # do not change
End
Here ε is a positive tolerance.
Note: even with a fine grid in ω (spacing 2π/N, determined by N), the true phase may change by a large amount between successive PV samples (e.g., with closely spaced poles or zeros near the unit circle), in which case the jump test can fail.
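The algorithm above transcribes directly into Python. The slide leaves ε unspecified; this sketch assumes ε = π, so the threshold is π. That choice works when the true phase changes by less than π between samples. The test case is a pure delay of r0 = 10 samples, whose unwrapped phase −r0·ω is known exactly.

```python
import math


def modulo_2pi_unwrap(pv, eps=math.pi):
    """Unwrap principal-value phase samples with the modulo-2*pi rule.
    pv: PV[angle X(k)] samples in [-pi, pi]; eps: tolerance in the 2*pi - eps threshold
    (eps = pi is an assumed default, not a value from the slides)."""
    threshold = 2 * math.pi - eps
    r = [0] * len(pv)
    for k in range(1, len(pv)):
        jump = pv[k] - pv[k - 1]
        if jump > threshold:
            r[k] = r[k - 1] - 1      # subtract 2*pi
        elif jump < -threshold:
            r[k] = r[k - 1] + 1      # add 2*pi
        else:
            r[k] = r[k - 1]          # do not change
    return [p + 2 * math.pi * rk for p, rk in zip(pv, r)]


# Test case: a pure delay of r0 samples has phase -r0 * omega
N, r0 = 256, 10
omegas = [2 * math.pi * k / N for k in range(N // 2 + 1)]
true_phase = [-r0 * w for w in omegas]
pv = [math.atan2(math.sin(t), math.cos(t)) for t in true_phase]
unwrapped = modulo_2pi_unwrap(pv)
print(max(abs(u - t) for u, t in zip(unwrapped, true_phase)))
```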
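Phase Derivative-Based Phase Unwrapper
The phase derivative is uniquely defined by:
$$\angle X'(\omega) = \frac{d}{d\omega}\angle X(\omega) = \frac{X_r(\omega)X_i'(\omega) - X_i(\omega)X_r'(\omega)}{|X(\omega)|^2}$$
Then:
$$\angle X(\omega) = \int_0^{\omega}\angle X'(\beta)\,d\beta$$
However, since only X(k) is available, the integral must be estimated from discrete values.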
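Phase Derivative-Based Phase Unwrapper
Re-state the problem:
$$\angle X(\omega_k) = PV[\angle X(\omega_k)] + 2\pi q(k)$$
where q(k) is an integer-valued function.
Assuming that the phase has been correctly unwrapped up to k-1, with the value ∠X(ω_{k-1}), then:
$$\angle X(\omega_k) = \angle X(\omega_{k-1}) + \int_{\omega_{k-1}}^{\omega_k}\angle X'(\omega)\,d\omega$$
An approximation (trapezoidal rule):
$$\widehat{\angle X}(\omega_k) = \angle X(\omega_{k-1}) + \frac{\omega_k - \omega_{k-1}}{2}\left[\angle X'(\omega_{k-1}) + \angle X'(\omega_k)\right]$$
Select the value of q(k) such that
$$E[k] = \left|PV[\angle X(\omega_k)] + 2\pi q(k) - \widehat{\angle X}(\omega_k)\right|$$
is minimized over q(k).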
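A sketch of the phase-derivative unwrapper follows. It assumes a finite causal sequence, so X(ω) and dX/dω can be evaluated exactly from the DTFT sums; the test sequence and grid size are illustrative choices. The prediction uses the trapezoidal rule above. q(k) = round((predicted − PV)/2π) is the integer that minimizes E[k]. The test spectrum is a 10-sample delay times one zero inside the unit circle, so the true unwrapped phase is known in closed form.

```python
import cmath
import math


def dtft_and_derivative(x, omega):
    """X(w) = sum_n x[n] e^{-jwn} and dX/dw = sum_n (-jn) x[n] e^{-jwn}, causal finite x."""
    X = sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x))
    dX = sum(-1j * n * v * cmath.exp(-1j * omega * n) for n, v in enumerate(x))
    return X, dX


def phase_derivative(X, dX):
    """(Xr Xi' - Xi Xr') / |X|^2, the derivative of the unwrapped phase."""
    return (X.real * dX.imag - X.imag * dX.real) / abs(X) ** 2


def derivative_unwrap(x, N):
    """Unwrap the phase on omega_k = 2*pi*k/N, k = 0..N/2, by trapezoidal prediction."""
    omegas = [2 * math.pi * k / N for k in range(N // 2 + 1)]
    spectra = [dtft_and_derivative(x, w) for w in omegas]
    derivs = [phase_derivative(X, dX) for X, dX in spectra]
    phase = [cmath.phase(spectra[0][0])]   # assumes the phase at omega = 0 is its PV
    for k in range(1, len(omegas)):
        predicted = phase[-1] + 0.5 * (omegas[k] - omegas[k - 1]) * (derivs[k - 1] + derivs[k])
        pv = cmath.phase(spectra[k][0])
        q = round((predicted - pv) / (2 * math.pi))   # integer q(k) minimizing E[k]
        phase.append(pv + 2 * math.pi * q)
    return omegas, phase


# X(w) = e^{-j10w} (1 - 0.5 e^{-jw}): a 10-sample delay plus one zero inside the unit circle
x = [0.0] * 10 + [1.0, -0.5]
omegas, unwrapped = derivative_unwrap(x, 256)
true_phase = [-10 * w + cmath.phase(1 - 0.5 * cmath.exp(-1j * w)) for w in omegas]
print(max(abs(u - t) for u, t in zip(unwrapped, true_phase)))
```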
Example
Short-Time Homomorphic Analysis of Periodic Sequences
Recall the source-system model of speech production:
[Figure: p[n] → h[n] → x[n] = h[n]*p[n]]
For voiced speech, p[n] is quasi-periodic:
$$p[n] = \sum_{k}\delta[n - kP]$$
For unvoiced speech, p[n] is noise-like.
In practice, a periodic waveform is windowed by a finite-length sequence w[n]:
$$s[n] = w[n]\,x[n] = w[n]\,(p[n] * h[n])$$
Approximation to s[n]:
$$\tilde{x}[n] = (w[n]\,p[n]) * h[n]$$
Short-Time Homomorphic Analysis of Periodic Sequences
If w[n] is smooth relative to h[n], that is, if P is large enough that the replicas h[n-kP] do not substantially overlap, then:
$$s[n] \approx \tilde{x}[n] = (w[n]\,p[n]) * h[n]$$
Then the complex cepstrum of s[n] is:
$$\hat{s}[n] = \hat{p}[n] + \hat{h}[n]$$
where p̂[n] is the complex cepstrum of w[n]p[n].
It can be shown that:
$$\hat{s}[n] = \hat{p}[n] + D[n]\sum_{k}\hat{h}[n - kP]$$
where D[n] is a weighting function that depends on w[n].
Short-Time Homomorphic Analysis of Periodic Sequences
Cepstral Domain (Quefrency) Perspective
Under what conditions can we perform deconvolution?
Let x[n], a voiced speech signal, be produced by an infinite train of periodic impulses:
$$x[n] = p[n] * h[n], \qquad p[n] = \sum_{k}\delta[n - kP]$$
Thus X(ω), and hence log[X(ω)], is defined only at multiples of the fundamental frequency ω_o = 2π/P, i.e., at ω_k = (2π/P)k:
X(ω_k) = P(ω_k)H(ω_k)
log[X(ω_k)] = log[P(ω_k)] + log[H(ω_k)]
Cepstral Domain (Quefrency) Perspective
In the cepstral domain, Σ_k ĥ[n - kP] appears as a set of replicas of ĥ[n] at every kP.
Thus aliasing is an issue and needs to be handled properly. That is, can this aliasing be prevented, or at least minimized?
Consider:
s[n] = w[n]x[n] = w[n](p[n]*h[n])
$$S(\omega) = \frac{1}{2\pi}\left[P(\omega)H(\omega)\right] * W(\omega) = \frac{1}{P}\sum_{k}H(k\omega_o)\,W(\omega - k\omega_o)$$
Cepstral Domain (Quefrency) Perspective
Let’s rewrite s[n] as:
s[n] = (p[n]w[n])*g[n]where g[n] ≈ h[n].
Then:
Taking log of equations under and , and solving for log[G()] the following is obtained:
GkWP
Sk
o
1
ko
koo kWkWkHG log loglog
………(1)
Cepstral Domain (Quefrency) Perspective
To simplify, assume W(ω) consists of only the main lobe of a rectangular window:
$$W(\omega) = \begin{cases} 1, & |\omega| < \omega_o/2 \\ 0, & \text{otherwise} \end{cases}$$
with ω_o = 2π/P.
Cepstral Domain (Quefrency) Perspective
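Thus the second log term becomes zero:
$$\log G(\omega) = \log\left[\sum_{k}H(k\omega_o)\,W(\omega - k\omega_o)\right] - \underbrace{\log\left[\sum_{k}W(\omega - k\omega_o)\right]}_{=\,0}$$
and, since the shifted windows do not overlap:
$$\log G(\omega) = \log\left[\sum_{k}H(k\omega_o)\,W(\omega - k\omega_o)\right] = \sum_{k}\log\left[H(k\omega_o)\right]W(\omega - k\omega_o) = W(\omega) * \sum_{k}\log\left[H(k\omega_o)\right]\delta(\omega - k\omega_o) \qquad (2)$$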
Cepstral Domain (Quefrency) Perspective
From (1) and (2) we can write:
$$\hat{s}[n] = \hat{p}[n] + \hat{g}[n]$$
where p̂[n] is the complex cepstrum of p[n]w[n], and
$$\hat{g}[n] = \mathcal{F}^{-1}\{\log G(\omega)\} = w[n]\sum_{k}\hat{h}[n - kP]$$
Cepstral Domain (Quefrency) Perspective
The last equation is a special case of the earlier expression ŝ[n] = p̂[n] + D[n] Σ_k ĥ[n - kP], with D[n] = w[n].
As with the purely convolutional model x̃[n] = (w[n]p[n]) * h[n], the contributions of the windowed pulse train and of the impulse response are additively combined, so deconvolution is possible.
Now, however, the impulse response contribution is repeated at the pitch-period rate. This aliasing depends upon the pitch and is different from the aliasing due to an insufficient DFT length (see Section 6.4.4).
Cepstral Domain (Quefrency) Perspective
Conditions under which s[n] ≈ (w[n]p[n]) * h[n]:
1. The time-domain window w[n] should be long enough that D[n] is smooth over the extent of ĥ[n], |n| < P.
2. w[n] should be short enough to reduce the contribution of the replicas of ĥ[n]. In practice, w[n] is a Hamming window 2-3 pitch periods long.
3. w[n] should be centered at the time origin, n = 0, aligned with h[n].
Under those conditions, for a low-time lifter (filter in the cepstral domain) l[n] of length |n| < P/2:
$$l[n]\,\hat{s}[n] \approx \hat{h}[n]$$
That is, the complex cepstrum is close to that derived from the conventional model.
Note that with high-pitched speakers there is a stronger presence of p̂[n] close to the origin (as noted earlier), as well as more aliasing of the replicas of ĥ[n].
Frequency Domain Perspective
Let x[n] = p[n] * h[n], where p[n] = Σ_k δ[n - kP].
Then X(ω_k) = P(ω_k)H(ω_k), where X(ω_k) represents a line spectrum at ω_k = (2π/P)k.
The question arises: under what conditions do the window properties make the output
s[n] = w[n]x[n] = w[n](p[n]*h[n])
close to the desired convolutional model x̃[n] = (w[n]p[n]) * h[n]?
Frequency Domain Perspective
Define an error measure E(ω) that reflects the degradation in the frequency domain:
$$E(\omega) = \frac{S(\omega)}{\tilde{X}(\omega)}$$
We want to minimize:
$$D^{2} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[\log|E(\omega)|\right]^{2}d\omega$$
It was found empirically that, for a Hamming window, this spectral distance measure is minimized for a window length of roughly 2-3 pitch periods.
An implication of this result is that the length of the analysis window should be adapted to the pitch period, to make the windowed waveform as close as possible (in the sense described above) to the desired convolutional model.
Short-Time Speech Analysis
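Complex Cepstrum of Voiced Speech
Recall:
H(z) = A G(z) V(z) R_L(z)
where A is the gain, G(z) the glottal model, V(z) the vocal tract model and R_L(z) the lip radiation model.
The output speech then is:
$$x[n] = p[n] * h[n] = A\,p[n] * g[n] * v[n] * r_L[n], \qquad h[n] = A\,g[n] * v[n] * r_L[n]$$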
Complex Cepstrum of Voiced Speech
General form for a stable V(z), with zeros inside and outside the unit circle and poles inside the unit circle:
$$V(z) = \frac{\prod_{k=1}^{M_i}(1 - a_k z^{-1})\,\prod_{k=1}^{M_o}(1 - b_k z)}{\prod_{k=1}^{N_i}(1 - c_k z^{-1})}$$
The goal is to separate h[n] from p[n]. Let s[n] = w[n](p[n]*h[n]) be approximately equal to
$$\tilde{x}[n] = (w[n]\,p[n]) * h[n]$$
Complex Cepstrum of Voiced Speech
Recall that x̃[n] ≈ s[n] if the window is 2-3 pitch periods long and its center is aligned with h[n].
Using a DFT of order N, the following denotes the discrete complex cepstrum:
$$\hat{s}_N[n] = \hat{p}_N[n] + \hat{h}_N[n]$$
For a typical speaker, the duration of the short-time window lies in the range of 20-40 ms.
Assume that:
1. the source and system components lie roughly in separate quefrency regions;
2. aliasing of the replicas of ĥ[n] is negligible;
3. most of ĥ[n] occurs within P/2 of the origin;
4. the distortion function D[n] is smooth in the same range, |n| < P/2, which makes the higher-order replicas negligible for |n| > P/2.
Then a cepstral lifter function can be applied.
Complex Cepstrum of Voiced Speech
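Low-quefrency lifter:
$$l[n] = \begin{cases} 1, & |n| < P/2 \\ 0, & \text{elsewhere} \end{cases}$$
to separate h[n] from p[n]. Similarly, a high-quefrency lifter can be used to recover the input pulse train (pitch estimation):
$$l[n] = \begin{cases} 1, & |n| \geq P/2 \\ 0, & \text{elsewhere} \end{cases}$$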
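Both lifters can be tried on a synthetic sampled spectrum X(k) = H(ω_k)P(ω_k). This sketch uses hypothetical parameters: a single-pole H(z) and a decaying pulse train with period 50 samples. It computes the real cepstrum, picks the pitch as the largest cepstral value in an assumed search range, and splits the cepstrum with the two lifters, applied to |n| on the circular quefrency axis. It is not a full pitch detector: there is no analysis window and no voicing decision.

```python
import cmath
import math


def real_cepstrum_from_spectrum(X):
    """c[n] = (1/N) sum_k log|X(k)| cos(2 pi k n / N); valid because log|X(k)| is real and even."""
    N = len(X)
    log_mag = [math.log(abs(v)) for v in X]
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def lifter(c, cutoff, keep_low=True):
    """Low-quefrency lifter keeps |n| < cutoff; high-quefrency lifter keeps |n| >= cutoff."""
    N = len(c)
    out = []
    for n, v in enumerate(c):
        q = min(n, N - n)                       # |n| on the circular quefrency axis
        keep = (q < cutoff) if keep_low else (q >= cutoff)
        out.append(v if keep else 0.0)
    return out


# Hypothetical spectrum: H(z) = 1 / (1 - 0.9 z^-1), P(z) = 1 / (1 - 0.7 z^-50)
nfft, period = 512, 50
X = []
for k in range(nfft):
    w = 2 * math.pi * k / nfft
    X.append(1 / (1 - 0.9 * cmath.exp(-1j * w)) / (1 - 0.7 * cmath.exp(-1j * period * w)))

c = real_cepstrum_from_spectrum(X)
pitch_estimate = max(range(20, nfft // 2), key=lambda n: c[n])   # assumed search range
c_low = lifter(c, period // 2, keep_low=True)    # system (smooth log-spectrum) part
c_high = lifter(c, period // 2, keep_low=False)  # excitation part
print(pitch_estimate, c[1], c[period])
```

Here c[1] should be close to 0.45 (half of ĥ[1] = 0.9), and the largest high-quefrency value sits at the pulse spacing.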
Example 6.11
Voiced female speech with a pitch period of 5 ms.
Sampling rate fs = 10 kHz. Hamming window of 15 ms. A 1024-point FFT/IFFT is used to obtain the discrete complex cepstrum. The window is centered on h[n] (more about that later).
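The parameters of Example 6.11 translate into sample counts as shown below. This is only arithmetic on the stated values plus the standard Hamming window formula; the variable names are illustrative. The 15 ms window spans exactly 3 pitch periods, at the upper end of the 2-3 period guideline. The low-quefrency lifter cutoff P/2 is 25 samples, or 2.5 ms.

```python
import math

fs = 10_000             # sampling rate in Hz
pitch_period_s = 0.005  # 5 ms pitch period
window_s = 0.015        # 15 ms Hamming window
nfft = 1024             # FFT/IFFT length

P = round(fs * pitch_period_s)      # pitch period in samples
M = round(fs * window_s)            # window length in samples
periods_per_window = M / P
lifter_cutoff = P // 2              # low-quefrency lifter keeps |n| < P/2
cutoff_ms = 1000 * lifter_cutoff / fs

# Standard Hamming window of length M
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]
print(P, M, periods_per_window, lifter_cutoff, cutoff_ms)
```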
Example 6.11
Example 6.11
[Figure: maximum-phase and minimum-phase components]
Complex Cepstrum of Unvoiced Speech
Recall the transfer function model for unvoiced speech:
H(z) = A V(z) R(z)
In contrast to the voiced case, there is no glottal volume velocity contribution.
Resulting speech waveform in the time domain, with u[n] white noise:
x[n] = u[n]*h[n] = A u[n]*v[n]*r[n]
Resulting signal after applying the short-time analysis window:
s[n] = w[n](u[n]*h[n])
Complex Cepstrum of Unvoiced Speech
Similarly to the arguments applied for voiced speech, if
1. the duration of the analysis window w[n] is selected so that the formants of the unvoiced speech power spectral density are not significantly broadened, and
2. w[n] is sufficiently smooth to be nearly constant over h[n],
then the following can be assumed:
s[n] ≈ (w[n]u[n]) * h[n]
Define the windowed white noise as q[n] = u[n]w[n], and compute the discrete complex cepstrum with an N-point DFT:
Complex Cepstrum of Unvoiced Speech
$$\hat{s}_N[n] = \hat{q}_N[n] + \hat{h}_N[n]$$
q̂_N[n], the discrete complex cepstrum of the noise source, covers all quefrencies, and thus separation is not possible.
Phase unwrapping of noisy signals is very unreliable.
The real cepstrum is adequate for unvoiced speech (phase information is not important in this case), resulting in minimum-phase versions of h[n].
The deconvolved excitation may contain interesting fine source structure for some classes of sounds, e.g., voiced fricatives.
Analysis/Synthesis Structure
In speech analysis, the underlying parameters of the speech model are estimated.
In the speech synthesis stage, the waveform is reconstructed from the model parameters.
Liftering the low-quefrency region of the cepstrum ⇒ provides an estimate of the system impulse response.
Liftering the high-quefrency region of the cepstrum ⇒ provides an estimate of the source excitation signal.
The estimate of the source signal is inverted with the homomorphic system D*⁻¹ to obtain the excitation function.
Convolution of the two resulting component estimates yields the original short-time segment exactly.
Analysis/Synthesis Structure
With an overlap-add reconstruction from the short-time segments, the entire waveform is recovered. The homomorphic system performs the transformation with no information reduction. This process is analogous to reconstructing the waveform in linear prediction analysis/synthesis from the convolution of the all-pole filter and the output of its inverse filter.
In speech coding and speech modification applications, a more efficient representation is desired.
The complex or real cepstrum provides an approach to such a representation, because pitch and voicing can be estimated from the peak (or lack of a peak) in the high-quefrency region of the cepstrum.
Zero and Minimum-Phase Synthesis
Assuming that we have a succinct and accurate characterization of the speech production source (as with linear prediction-based analysis/synthesis), we are able to synthesize an estimate of the speech waveform.
This synthesis can be performed based on any one of several possible phase functions: zero-phase, minimum-phase, maximum-phase, or mixed-phase.
Zero and Minimum-Phase Synthesis
General framework for homomorphic analysis/synthesis:
[Figure: analysis window of 10-20 ms; 1024-point real cepstrum; low-quefrency lifter of length P/2]
Mixed-Phase Synthesis
Example 6.13
Contrasting Linear Prediction and Homomorphic Filtering
Homomorphic filtering can be viewed as an alternative to linear prediction.
| Linear Prediction | Homomorphic Filtering |
|---|---|
| Parametric | Non-parametric |
| Sharp, smooth resonances | Wider, spurious resonances |
| All-pole representation | Poles and zeros can be represented |
| Minimum-phase response estimate only | Minimum-phase as well as mixed-phase estimates (if the complex cepstrum is used) |
| Synthesized speech "crisper" but more "mechanical" | Synthesized speech more "natural" but "muffled" |
Contrasting Linear Prediction and Homomorphic Filtering
Similar problems arise with both methods:
| Linear Prediction | Homomorphic Filtering |
|---|---|
| Increased speech distortion with increasing pitch | Aliasing of the vocal tract impulse response at the pitch-period repetition rate |
| Linear prediction windowing results in the prediction of nonzero values of the waveform from zeros outside the window | Windowing a periodic waveform distorts the convolutional model |
| The number of poles must be chosen | The length of the low-quefrency lifter must be chosen |
For both methods, the best window and order selection is often a function of the pitch of the speaker.
Homomorphic Prediction
A number of speech analysis methods rely on combining homomorphic filtering with linear prediction; they are referred to collectively as homomorphic prediction.
Two primary advantages of combining the methods:
1. By reducing the effects of waveform periodicity, an all-pole estimate suffers less from the effect of high-pitch aliasing.
2. By removing ambiguity in waveform alignment, zero estimation can be performed without the requirement of pitch-synchronous analysis.
Homomorphic Prediction
Waveform periodicity: recall that for a waveform consisting of the convolution of a short-time impulse train and an impulse response,
x[n] = p[n]*h[n],
the autocorrelation function is given by the convolution of the autocorrelation function of the response and that of the impulse train:
r_x[τ] = r_h[τ]*r_p[τ]
Thus, as the spacing between impulses (the pitch period) decreases, the autocorrelation function of the impulse response suffers from increasing distortion.
Homomorphic Prediction
Thus, if the spectral magnitude of h[n] can be estimated accurately, linear prediction analysis can be performed with an estimate of r_h[τ] free of the waveform periodicity. This leads to the following idea:
1. Use homomorphic filtering to deconvolve and estimate h[n] by low-pass liftering the real or complex cepstrum of x[n].
2. Apply the autocorrelation method of linear prediction analysis to the resulting impulse response estimate to obtain the model parameters.
Example 6.14
Suppose h[n] is a minimum-phase all-pole sequence of order p. Consider a waveform x[n] constructed by convolving h[n] with a sequence p[n], where:
p[n] = δ[n] + αδ[n-N], with α < 1
The complex cepstrum of x[n] is given by:
$$\hat{x}[n] = \hat{p}[n] + \hat{h}[n]$$
where p̂[n] and ĥ[n] are the complex cepstra of p[n] and h[n], respectively.
The autocorrelation function is given by:
$$r_x[\tau] = (1 + \alpha^{2})\,r_h[\tau] + \alpha\,r_h[\tau - N] + \alpha\,r_h[\tau + N]$$
r_x[τ] is r_h[τ] distorted by its neighboring terms centered at τ = +N and τ = -N.
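The autocorrelation identity can be checked exactly for finite sequences. This sketch uses a truncated decaying exponential as a stand-in for the all-pole h[n], with illustrative values α = 0.5 and N = 12. The helper `autocorr` is a plain O(L²) sum, not a library call.

```python
def autocorr(x, max_lag):
    """r[tau] = sum_n x[n] x[n + tau] for a finite sequence x, tau in [-max_lag, max_lag]."""
    L = len(x)
    return {tau: sum(x[n] * x[n + tau] for n in range(L) if 0 <= n + tau < L)
            for tau in range(-max_lag, max_lag + 1)}


h = [0.8 ** n for n in range(30)]   # truncated stand-in for an all-pole h[n]
alpha, N = 0.5, 12
x = h + [0.0] * N
for n, v in enumerate(h):
    x[n + N] += alpha * v            # x = h * (delta[n] + alpha * delta[n - N])

rh = autocorr(h, 2 * N)
rx = autocorr(x, N)
predicted = {tau: (1 + alpha ** 2) * rh[tau] + alpha * rh[tau - N] + alpha * rh[tau + N]
             for tau in range(-N, N + 1)}
print(max(abs(rx[t] - predicted[t]) for t in range(-N, N + 1)))
```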
Homomorphic Prediction
Important points of the previous example:
The first p coefficients of the real cepstrum of x[n] are undistorted (if a long enough DFT length is used in the computation).
The first p coefficients of the autocorrelation function r_x[τ] of the waveform are distorted by aliasing of the autocorrelation replicas (regardless of the DFT length).
A cepstral low-pass lifter of duration less than p extracts a smoothed, non-aliased version of the spectrum.
Linear prediction coefficients can alternatively be obtained exactly through the recursive relation between the real cepstrum and the predictor coefficients of the all-pole model when h[n] is all-pole (Exercise 6.13).
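Exercise 6.13 is not reproduced here. The code below is a sketch of the standard recursion between the predictor coefficients of an all-pole model H(z) = G/(1 − Σ a_k z^{−k}), assumed minimum phase, and its complex cepstrum ĥ[n]. For n > 0 the real cepstrum is c[n] = ĥ[n]/2. The function names are illustrative. A single pole at 0.9 should give ĥ[n] = 0.9ⁿ/n, and the inverse recursion should recover the predictor coefficients from ĥ[1..p].

```python
import math


def lpc_to_cepstrum(a, gain, n_max):
    """Complex cepstrum h_hat[0..n_max] of H(z) = gain / (1 - sum_{k=1}^{p} a[k-1] z^-k),
    assuming all poles are inside the unit circle (minimum phase)."""
    p = len(a)
    c = [0.0] * (n_max + 1)
    c[0] = math.log(gain)
    for n in range(1, n_max + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c


def cepstrum_to_lpc(c, p):
    """Invert the recursion: recover a_1..a_p from h_hat[1..p]."""
    a = [0.0] * p
    for n in range(1, p + 1):
        a[n - 1] = c[n] - sum((k / n) * c[k] * a[n - k - 1] for k in range(1, n))
    return a


c_one = lpc_to_cepstrum([0.9], 1.0, 6)          # single pole at 0.9
a = [1.2, -0.5]                                  # poles at 0.6 +/- 0.374j (stable)
c_two = lpc_to_cepstrum(a, 1.0, 10)
a_back = cepstrum_to_lpc(c_two, len(a))
print(c_one[1:4])
print(a_back)
```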
Homomorphic Prediction
Zero Estimation:
Consider a transfer function of poles and zeros of the form:
$$H(z) = \frac{N(z)}{D(z)}$$
Also consider a sequence x[n] = h[n]*p[n], where p[n] is a periodic impulse train.
Suppose that:
1. an estimate of h[n] is obtained through homomorphic filtering of x[n];
2. the number of poles and zeros is known; and
3. the linear-phase component z^r has been removed.
Then the poles of h[n] can be estimated using the covariance method of linear prediction. Other methods (e.g., Shanks' method, described in Chapter 5) can be used to estimate the zeros.
Homomorphic Prediction
Summary
This chapter focused on the use of homomorphic filtering, with application to deconvolution: the separation of the source from the system.
The presented methodology is general and can be applied to more than the deconvolution of the vocal tract from the glottal source.
Example applications:
1. Control of the dynamic range of multiplicatively combined signals (Exercise 6.19).
2. Recovery of speech from degraded recordings. Old acoustic recordings suffer from convolutional distortion imparted by an acoustic horn that can be approximated by a linear resonant filter. See Exercise 6.20 for details.
3. In image processing, homomorphic filtering can be used for contrast enhancement (see Oppenheim and Schafer, "Digital Signal Processing", p. 487, Prentice Hall, 1975).
Summary
Homomorphic processing is applied in the phase vocoder and in sinewave analysis/synthesis. It has also been found useful in:
1. speech coding (Chapter 12);
2. speaker recognition (Chapter 14).
It is also a basis for the mel-cepstrum: the Fourier transform of a constant-Q filtered log-spectrum. It is hypothesized that the mel-cepstrum approximates signal processing in the early stages of human auditory perception.
Homomorphic filtering applied along the temporal trajectories of the mel-cepstral coefficients can be used to remove convolutional channel distortions, even when the cepstrum of these distortions overlaps the cepstrum of speech (Chapter 13): cepstral mean subtraction and RASTA processing.
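Cepstral mean subtraction relies on the fact that a fixed convolutional channel adds the same vector to every frame's cepstrum. This toy sketch uses made-up cepstral frames and a hypothetical channel vector. Subtracting the per-coefficient mean over time removes the channel term, along with the speech's own long-term mean. The cleaned observed frames therefore match the mean-subtracted clean frames.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over time from a list of cepstral frames."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[i] for frame in frames) / n_frames for i in range(dim)]
    return [[frame[i] - mean[i] for i in range(dim)] for frame in frames]


speech = [[0.3, -0.1, 0.05], [0.1, 0.2, -0.02], [-0.4, -0.1, -0.03]]   # made-up frames
channel = [0.5, -0.2, 0.1]                                            # hypothetical channel
observed = [[s + c for s, c in zip(frame, channel)] for frame in speech]

cms_observed = cepstral_mean_subtraction(observed)
cms_clean = cepstral_mean_subtraction(speech)
print(cms_observed)
print(cms_clean)
```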
END