Transcript of Time-Scale Modification of Speech Signal SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)
- Slide 1
- Slide 2
- Time-Scale Modification of Speech Signal SOLAFS (Synchronized
Overlap-Add, Fixed Synthesis)
- Slide 3
- Overview: Introduction; Overview of the methods; Basic Idea; SOLAFS
Method; MATLAB Code; The results; Conclusion
- Slide 4
- Introduction: Many applications need to modify the time scale of
speech, music, or other acoustic material without modifying the
pitch, that is, to speed up or slow down the speech with no Donald
Duck or Minnie Mouse effects.
- Slide 5
- Introduction: TSM (time-scale modification) refers to changing the
reproduction rate of a signal. Two primary operations are involved:
  - time-scale expansion (slow down)
  - time-scale compression (speed up)
- Slide 6
- Introduction [figure panels: original, expansion, compression]
- Slide 7
- Overview of methods: Time-scale modification uses three basic
methods:
  - frequency-domain processing methods
  - analysis/synthesis methods
  - time-domain processing methods
SOLAFS is a time-domain processing method.
- Slide 8
- Basic Idea: SOLAFS is an improvement on the earlier SOLA
(Synchronized Overlap-Add) method. SOLA consists of:
  - shifting the beginning of a new speech segment over the end of the
    preceding segment to find the point of highest cross-correlation;
  - once that point is found, overlapping the frames and averaging
    them together.
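The two SOLA steps above can be sketched in a few lines. This is a minimal pure-Python illustration, not code from the slides: the function names `best_shift` and `overlap_average` are hypothetical, the similarity is a plain (unnormalized) cross-correlation, and the overlap is a simple 50/50 average.

```python
def best_shift(prev_tail, segment, max_shift):
    """Return the shift of `segment` whose start best matches `prev_tail`.

    Assumes len(segment) >= max_shift + len(prev_tail).
    """
    n = len(prev_tail)
    best_k, best_score = 0, float("-inf")
    for k in range(max_shift + 1):
        # Cross-correlation between the end of the previous segment
        # and the start of the new segment shifted by k samples.
        score = sum(a * b for a, b in zip(prev_tail, segment[k:k + n]))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def overlap_average(prev_tail, segment, k):
    """Average the overlapping samples, then append the rest of the segment."""
    n = len(prev_tail)
    overlap = [(a + b) / 2.0 for a, b in zip(prev_tail, segment[k:k + n])]
    return overlap + segment[k + n:]
```

For example, with `prev_tail = [0, 1, 0, -1]` and `segment = [-1, 0, 1, 0, -1, 0, 1, 0]`, a search over shifts 0 to 3 selects shift 1, where the new segment continues the waveform of the previous one.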
- Slide 9
- SOLAFS: There are 4 parameters:
  - Window length (W): the smallest unit of input signal that is
    manipulated by the method.
  - Analysis shift (S_a): the inter-frame interval between successive
    search ranges for analysis windows along the input signal.
  - Synthesis shift (S_s): the inter-frame interval between successive
    analysis windows along the output signal.
  - Shift search interval (k_max): the duration of the interval over
    which an analysis window may be shifted for the purpose of aligning
    it with the region of the output signal it will overlap.
- Slide 10
- SOLAFS [figure: the four parameters used in SOLAFS]
- Slide 11
- Analysis: The analysis windows are chosen as follows: [equation not
preserved in transcript] where
  - m = a window index, i.e. it refers to the mth window
  - n = a sample index in an input buffer for the input signal, which
    buffer is W samples long
  - k_m = the number of samples of shift for the mth window
  - x_m[n] = the nth sample in the mth analysis window
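The formula for the analysis windows is missing from the transcript. A reconstruction consistent with the symbol definitions above, assuming the usual SOLAFS formulation in which the mth window starts at m S_a plus its shift k_m in the input signal x, would be:

```latex
x_m[n] =
\begin{cases}
x[\,m S_a + k_m + n\,], & n = 0, \dots, W - 1 \\
0, & \text{otherwise}
\end{cases}
```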
- Slide 12
- Analysis: The analysis windows are then used to form the output
signal y[i] recursively in accordance with the following: [equation
not preserved in transcript] where:
  - W_ov = W - S_s is the number of points in the overlap region
  - b[n] = an overlap-add weighting function, which is referred to as
    a fading factor, an averaging function, a linear fade function, and
    so forth.
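The recursion itself is missing from the transcript. A reconstruction consistent with the definitions of W_ov and b[n] above, assuming the usual SOLAFS form in which the first W_ov samples of each window are cross-faded with the existing output and the remaining samples are copied, would be:

```latex
y[m S_s + n] =
\begin{cases}
b[n]\, y[m S_s + n] + (1 - b[n])\, x_m[n], & 0 \le n < W_{ov} \\
x_m[n], & W_{ov} \le n < W
\end{cases}
\qquad W_{ov} = W - S_s
```

With an averaging function, b[n] is 1/2 throughout; with a linear fade, b[n] falls from near 1 to near 0 across the overlap region.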
- Slide 13
- Analysis: Calculation of k_m. k_m is an optimal shift that is
determined by the normalized cross-correlation between x and y in the
overlap region: [equation not preserved in transcript] where k_max is
the maximum allowable shift from the initial starting position of the
analysis window.
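The search for k_m can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not code from the slides: the normalized cross-correlation is taken to be the sum of products over the overlap region divided by the square root of the product of the two energies, and the function names `normalized_xcorr` and `optimal_shift` are hypothetical.

```python
import math


def normalized_xcorr(y_tail, x_chunk):
    """Normalized cross-correlation of two equal-length sample lists."""
    num = sum(a * b for a, b in zip(y_tail, x_chunk))
    den = math.sqrt(sum(a * a for a in y_tail) * sum(b * b for b in x_chunk))
    return num / den if den > 0 else 0.0


def optimal_shift(x, y, m, S_a, S_s, W_ov, k_max):
    """Return the shift k in 0..k_max that maximizes the similarity
    between the output overlap region and the shifted input."""
    y_tail = y[m * S_s:m * S_s + W_ov]   # overlap region in the output
    best_k, best_r = 0, float("-inf")
    for k in range(k_max + 1):
        start = m * S_a + k              # candidate start in the input
        r = normalized_xcorr(y_tail, x[start:start + W_ov])
        if r > best_r:
            best_k, best_r = k, r
    return best_k
```

Because of the normalization, a candidate that has the same shape as the output tail but a different amplitude still scores 1, so loud regions of the input are not favored over well-aligned ones.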
- Slide 14
- k_m can often be predicted without computing the similarity. The
mth shift, k_m, should be determined by a two-case rule with an "if"
branch and an "otherwise" branch [equation not preserved in
transcript].
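The two-case rule is missing from the transcript. A reconstruction, assuming the usual SOLAFS prediction, follows from a simple observation: the previous window started at (m-1) S_a + k_{m-1} in the input, so the samples that naturally continue it after S_s output samples start at m S_a + k_{m-1} + (S_s - S_a). If that shift lies inside the search range, it gives a perfect match and no search is needed. Here R^m_xy[k] denotes the normalized cross-correlation from the previous slide.

```latex
k_m =
\begin{cases}
k_{m-1} + (S_s - S_a), & \text{if } 0 \le k_{m-1} + (S_s - S_a) \le k_{max} \\
\arg\max_{0 \le k \le k_{max}} R^{m}_{xy}[k], & \text{otherwise}
\end{cases}
```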
- Slide 15
- Implement in MATLAB: There are 7 steps, as follows.
  1. As an initialization step, take W samples from the input signal,
     which are stored in an input signal buffer, and place them in an
     output sample buffer for the output signal.
  2. Find the start of the first analysis window, m S_a.
- Slide 16
- Implement in MATLAB:
  3. Next, find the maximum similarity between the first W_ov samples
     at the start of the analysis window and the samples at the end of
     the output signal, by computing the cross-correlation between the
     samples from the start of the analysis window and the samples
     from the end of the output window.
- Slide 17
- Implement in MATLAB:
  4. Shift the start of the analysis window by one or two samples and
     repeat step 3.
  5. Repeat steps 3 and 4 until the analysis window has been shifted
     by the maximum allowed amount, k_max.
- Slide 18
- Implement in MATLAB:
  6. At the shift of the analysis window where the maximum
     cross-correlation occurs, overlap-add the last W_ov samples of
     the output signal and the first W_ov samples of the shifted
     analysis window, and transfer W - W_ov further samples into the
     output buffer.
- Slide 19
- Implement in MATLAB:
  7. Steps 2-7 are repeated by choosing the next analysis window,
     until the input signal reaches its end.
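The seven steps can be put together in a compact sketch. This is a pure-Python illustration under stated assumptions, not the MATLAB code from the slides: the function name `solafs` and its argument order are hypothetical, the fade is linear, every shift from 0 to k_max is tested (no decimation), and the prediction shortcut for k_m is left out.

```python
import math


def solafs(x, W, S_a, S_s, k_max):
    """Time-scale x so that it plays roughly S_a / S_s times faster."""
    W_ov = W - S_s                        # points in the overlap region
    if not 0 < W_ov < W:
        raise ValueError("need 0 < S_s < W")
    y = list(x[:W])                       # step 1: copy the first W samples
    m = 1
    while m * S_a + k_max + W <= len(x):  # step 7: stop near the input end
        base = m * S_a                    # step 2: start of analysis window
        tail = y[m * S_s:]                # last W_ov samples of the output
        tail_energy = sum(v * v for v in tail)
        best_k, best_r = 0, float("-inf")
        for k in range(k_max + 1):        # steps 3-5: test every shift
            head = x[base + k:base + k + W_ov]
            num = sum(a * b for a, b in zip(tail, head))
            den = math.sqrt(tail_energy * sum(v * v for v in head))
            r = num / den if den > 0 else 0.0
            if r > best_r:
                best_k, best_r = k, r
        window = x[base + best_k:base + best_k + W]
        for n in range(W_ov):             # step 6: linear cross-fade
            g = (n + 1) / (W_ov + 1)      # weight of the new window
            y[m * S_s + n] = (1 - g) * y[m * S_s + n] + g * window[n]
        y.extend(window[W_ov:])           # copy the W - W_ov new samples
        m += 1
    return y
```

As a sanity check, feeding in a sine wave with a period of 20 samples and calling `solafs(x, W=40, S_a=40, S_s=20, k_max=40)` gives an output about half as long whose period is still 20 samples: the duration changes while the pitch does not.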
- Slide 20
- Parameter choices:
  - The smallest useful synthesis shift is S_s = W_ov
  - The smallest useful window length is W = 2W_ov
  - k_max = 2W
- Slide 21
- MATLAB
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % Project Spring 2005
  % Rachan Fugcharoen ECE5525
  %
  % Do SOLAFS timescale mod'n
  %
  % Y is X scaled to run F x faster. X is added-in in windows
  % W pts long, overlapping by Wov points with the previous output.
  % The similarity is calculated over the last Wsim points of output.
  % Maximum similarity skew is Kmax pts.
  % Each xcorr calculation is decimated by xdecim (8)
  % The skew axis sampling is decimated by kdecim (2)
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % Read the wave file
  % (wavread has been removed from recent MATLAB releases;
  %  [d,fs] = audioread('we.wav') is the replacement there)
  [d,fs,bit]=wavread('we.wav');
  W=400;        % window length
  Wov=W/2;      % length of the overlap region, in points
  Kmax=2*W;     % maximum shift
  Wsim=Wov;     % number of output points used for the similarity
  xdecim=8;     % decimation of each xcorr
  kdecim=2;     % decimation of the skew axis sampling
  X=d';         % work on a row vector
- Slide 22
- MATLAB
  % Factor to run x faster or slower
  F=4;
  Ss =W-Wov;
  size(X);
  xpts = size(X,2);
  ypts = round(xpts / F);
  Y = zeros(1, ypts);
  % Cross-fade win is Wov pts long - it grows
  xfwin = (1:Wov)/(Wov+1);
  % Index to add to ypos to get the overlap region
  ovix = (1-Wov):0;
  % Index for non-overlapping bit
  newix = 1:(W-Wov);
  % Index for similarity chunks
  % decimate the cross-correlation
  simix = (1:xdecim:Wsim) - Wsim;
  % prepad X for extraction
  padX = [zeros(1, Wsim), X, zeros(1,Kmax+W-Wov)];
  % Startup - just copy first bit
  Y(1:Wsim) = X(1:Wsim);
- Slide 23
- MATLAB
  xabs = 0;
  lastxpos = 0;
  km = 0;
  for ypos = Wsim:Ss:(ypts-W);
    % Ideal X position
    xpos = F * ypos;
    % Overlap prediction - assume all of overlap from last copy
    kmpred = km + (xpos - lastxpos);
    lastxpos = xpos;
    if (kmpred