Time-Scale Modification of Speech Signal SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)

Transcript of Time-Scale Modification of Speech Signal SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)

  • Slide 1
  • Slide 2
  • Time-Scale Modification of Speech Signal SOLAFS (Synchronized Overlap-Add, Fixed Synthesis)
  • Slide 3
  • Overview: Introduction; Overview of the methods; Basic idea; SOLAFS method; MATLAB code; The results; Conclusion
  • Slide 4
  • Introduction: Many applications need to modify the time scale of speech, music, or other acoustic material without modifying the pitch. The goal is to speed up or slow down the speech with no Donald Duck or Minnie Mouse effects.
  • Slide 5
  • Introduction: TSM (time-scale modification) refers to changing the reproduction rate of a signal. Two primary operations are involved: time-scale expansion (slowing down) and time-scale compression (speeding up).
  • Slide 6
  • Introduction [Figure panels: original, expansion, compression]
  • Slide 7
  • Overview of methods: Time-scale modification uses three basic families of methods: frequency-domain processing methods, analysis/synthesis methods, and time-domain processing methods. SOLAFS is a time-domain processing method.
  • Slide 8
  • Basic Idea: SOLAFS is an improvement on the earlier SOLA (Synchronized Overlap-Add) method. SOLA consists of (1) shifting the beginning of a new speech segment over the end of the preceding segment to find the point of highest cross-correlation, and (2) once that point is found, overlapping the frames and averaging them together.
  • Slide 9
  • SOLAFS: There are four parameters. Window length (W): the smallest unit of the input signal that is manipulated by the method. Analysis shift (S_a): the inter-frame interval between successive search ranges for analysis windows along the input signal. Synthesis shift (S_s): the inter-frame interval between successive analysis windows along the output signal. Shift search interval (k_max): the duration of the interval over which an analysis window may be shifted for the purpose of aligning it with the region of the output signal it will overlap.
  • Slide 10
  • SOLAFS [Figure: the four parameters used in SOLAFS]
  • Slide 11
  • Analysis: The analysis windows are chosen as follows [equation missing from transcript], where: m = a window index, i.e. it refers to the m-th window; n = a sample index in an input buffer for the input signal, which buffer is W samples long; k_m = the number of samples of shift for the m-th window; x_m[n] = the n-th sample in the m-th analysis window.
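
The slide's formula did not survive extraction. One form consistent with the definitions above is sketched here, assuming the input signal is called x (a name not given on the slide) and that the m-th search range starts at input sample m S_a, as the implementation steps suggest. Treat it as a reconstruction, not the slide's exact notation:

```latex
x_m[n] = x[\, m S_a + k_m + n \,], \qquad 0 \le n < W
```

Read this way, k_m = 0 picks the window at the start of the search range, and larger k_m slides it later in the input.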
  • Slide 12
  • Analysis: The analysis windows are then used to form the output signal y[i] recursively [equation missing from transcript], where: W_ov = W - S_s is the number of points in the overlap region; b[n] = an overlap-add weighting function, variously referred to as a fading factor, an averaging function, a linear fade function, and so forth.
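
The recursion itself is missing from the transcript. A plausible form, consistent with W_ov = W - S_s and with an overlap-add weight b[n], is sketched below. It assumes b[n] weights the existing output and falls from near 1 to near 0 across the overlap; the opposite convention, with the weight on the new window, is equally possible:

```latex
y[m S_s + n] =
\begin{cases}
b[n]\, y[m S_s + n] + (1 - b[n])\, x_m[n], & 0 \le n < W_{ov} \\
x_m[n], & W_{ov} \le n < W
\end{cases}
```

The first case cross-fades the new window into the end of the existing output; the second appends the remaining W - W_ov = S_s samples unchanged.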
  • Slide 13
  • Analysis: Calculation of k_m. k_m is an optimal shift determined by the normalized cross-correlation between x and y in the overlap region [equation missing from transcript], where k_max is the maximum allowable shift from the initial starting position of the analysis window.
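
The slide's expression is not preserved. A standard normalized cross-correlation over the overlap region would look like the sketch below, assuming the m-th search range starts at input sample m S_a and the overlap region starts at output sample m S_s. The symbol R_m is introduced here for illustration only:

```latex
k_m = \arg\max_{0 \le k \le k_{max}} R_m[k], \qquad
R_m[k] = \frac{\sum_{n=0}^{W_{ov}-1} y[m S_s + n]\; x[m S_a + k + n]}
{\sqrt{\sum_{n=0}^{W_{ov}-1} y^2[m S_s + n] \; \sum_{n=0}^{W_{ov}-1} x^2[m S_a + k + n]}}
```

The normalization keeps R_m[k] between -1 and 1, so loud and quiet candidate segments are compared on shape rather than energy.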
  • Slide 14
  • k_m can often be predicted without computing the similarity. The m-th shift, k_m, should be determined by a two-case rule [equation missing from transcript]: one value if a condition holds, another otherwise.
  • Slide 15
  • Implement in MATLAB: There are 7 steps, as follows. 1. As an initialization step, take W samples from the input signal, which are stored in an input signal buffer, and place them in an output sample buffer for the output signal. 2. Find the start of the first analysis window, mS_a.
  • Slide 16
  • Implement in MATLAB: 3. Next, find the maximum similarity between the first W_ov samples of the analysis window and the last W_ov samples of the output signal by computing the cross-correlation between the samples from the start of the analysis window and the samples from the end of the output signal.
  • Slide 17
  • Implement in MATLAB: 4. Shift the start of the analysis window by one or two samples and repeat step 3. 5. Steps 3 and 4 are repeated until the analysis window has been shifted by the maximum allowed amount, k_max.
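
Steps 3 to 5 can be sketched as an exhaustive search over shifts. This is a minimal illustration on a synthetic sinusoid, not the author's code: the variable names `start`, `trueShift`, `ytail`, and `seg` are hypothetical, the shift step is fixed at one sample, and none of the decimation used in the slides' listing is applied.

```matlab
% Sketch of steps 3-5: exhaustive search for the shift with the highest
% normalized cross-correlation. All signal values here are synthetic.
Wov   = 8;                         % overlap length (illustrative)
Kmax  = 16;                        % maximum allowed shift (illustrative)
x     = sin(2*pi*(0:63)/16);       % toy input signal, period 16 samples
start = 5;                         % 1-based start of the search range (stands in for m*S_a)
trueShift = 6;                     % the toy output tail is copied from this shift
ytail = x(start + trueShift + (0:Wov-1));   % last Wov samples of the output

bestR = -Inf;
km    = 0;
for k = 0:Kmax                     % step 4: advance the window one sample at a time
    seg = x(start + k + (0:Wov-1));          % first Wov samples of the shifted window
    % step 3: normalized cross-correlation (eps guards against division by zero)
    R = sum(ytail .* seg) / sqrt(sum(ytail.^2) * sum(seg.^2) + eps);
    if R > bestR                   % remember the best shift seen so far
        bestR = R;
        km = k;
    end
end
fprintf('best shift km = %d (R = %.3f)\n', km, bestR);
```

Because the toy output tail is an exact copy of the input at `trueShift`, the search should recover that shift; on real speech the peak is less sharp and the choice of K_max matters more.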
  • Slide 18
  • Implement in MATLAB: 6. Once the shift of the analysis window that gives the maximum cross-correlation is found, overlap-add the last W_ov samples of the output signal with the first W_ov samples of the shifted analysis window, and transfer the W - W_ov further samples into the output buffer.
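
Step 6 can be sketched with a linear fade. The buffers below are synthetic constants chosen so the result is easy to check by hand, and the names `fadeIn`, `tail`, and `xm` are illustrative. The fade shape (1:Wov)/(Wov+1) is the same ramp used for the cross-fade window in the slides' listing.

```matlab
% Sketch of step 6: cross-fade the overlap, then append the rest of the window.
W   = 8;                      % window length (illustrative)
Wov = 4;                      % overlap length (illustrative)
y   = ones(1, 10);            % output produced so far (toy data)
xm  = 3 * ones(1, W);         % shifted analysis window (toy data)

fadeIn = (1:Wov) / (Wov + 1); % weight of the new window, grows across the overlap
tail   = y(end-Wov+1:end);    % last Wov samples of the output

% Cross-fade the overlap region in place
y(end-Wov+1:end) = (1 - fadeIn) .* tail + fadeIn .* xm(1:Wov);
% Append the remaining W - Wov samples unchanged
y = [y, xm(Wov+1:W)];
```

The output grows by W - Wov samples per window, which is exactly the synthesis shift S_s; that fixed growth is what the "Fixed Synthesis" in the method's name refers to.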
  • Slide 19
  • Implement in MATLAB: 7. Steps 2-7 are repeated, choosing the next analysis window each time, until the input signal reaches its end.
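
Put together, the seven steps can be sketched as one short loop. This is a simplified illustration on a synthetic tone, not the project's listing: it searches every shift exhaustively, skips the shift prediction and the decimation, and assumes the analysis shift is S_a = round(F * S_s) for a speed factor F. The sample rate, tone frequency, and all variable names are assumptions made for the example.

```matlab
% Minimal end-to-end sketch of the seven steps on a synthetic tone.
fs = 8000;                              % sample rate in Hz (assumed)
x  = sin(2*pi*200*(0:fs/2-1)/fs);       % 0.5 s of a 200 Hz tone (toy input)
F  = 2;                                 % speed factor, F > 1 compresses (assumed)
Wov  = 80;                              % overlap length
W    = 2 * Wov;                         % window length
Ss   = W - Wov;                         % synthesis shift
Kmax = 2 * W;                           % shift search interval
Sa   = round(F * Ss);                   % analysis shift (assumption)
fade = (1:Wov) / (Wov + 1);             % linear fade-in for the new window

y = x(1:W);                             % step 1: copy the first W samples
m = 1;
while m*Sa + Kmax + W <= numel(x)       % step 7: stop when the input runs out
    base = m * Sa;                      % step 2: start of the search range (0-based offset)
    tail = y(end-Wov+1:end);            % last Wov samples of the output
    bestR = -Inf;
    km = 0;
    for k = 0:Kmax                      % steps 3-5: try every allowed shift
        seg = x(base + k + (1:Wov));
        R = sum(tail .* seg) / sqrt(sum(tail.^2) * sum(seg.^2) + eps);
        if R > bestR
            bestR = R;
            km = k;
        end
    end
    xm = x(base + km + (1:W));          % step 6: take the best-aligned window
    y(end-Wov+1:end) = (1 - fade) .* tail + fade .* xm(1:Wov);
    y = [y, xm(Wov+1:W)];               % append the non-overlapping part
    m = m + 1;
end
fprintf('input %d samples, output %d samples\n', numel(x), numel(y));
```

With F = 2 the output comes out at roughly half the input length, slightly less because the loop stops once a full search range no longer fits in the input.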
  • Slide 20
  • Parameter choices: The smallest useful synthesis shift is S_s = W_ov. The smallest useful window length is W = 2W_ov. k_max = 2W.
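
As a quick worked example of these choices, the sketch below derives the other parameters from an overlap length. The values of `Wov` and `F` are illustrative, and the relation S_a = F * S_s is an assumption about how the speed factor sets the analysis shift.

```matlab
% Derive the remaining SOLAFS parameters from the overlap length.
Wov  = 200;        % overlap length in samples (illustrative value)
F    = 2;          % speed factor: F > 1 compresses, F < 1 expands (illustrative)

Ss   = Wov;        % smallest useful synthesis shift, S_s = W_ov
W    = 2 * Wov;    % smallest useful window length, W = 2 W_ov
Kmax = 2 * W;      % shift search interval, k_max = 2W
Sa   = F * Ss;     % analysis shift implied by the speed factor (assumption)

fprintf('W=%d Ss=%d Sa=%d Kmax=%d\n', W, Ss, Sa, Kmax);
```

With Wov = 200 this gives W = 400 and Kmax = 800, the same window length and search interval set at the top of the slides' listing.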
  • Slide 21
  • MATLAB
      %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      % Project Spring 2005
      % Rachan Fugcharoen ECE5525
      %
      % Do SOLAFS timescale mod'n
      %
      % Y is X scaled to run F x faster. X is added-in in windows
      % W pts long, overlapping by Wov points with the previous output.
      % The similarity is calculated over the last Wsim points of output.
      % Maximum similarity skew is Kmax pts.
      % Each xcorr calculation is decimated by xdecim (8)
      % The skew axis sampling is decimated by kdecim (2)
      %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
      % Read the wave file
      [d,fs,bit]=wavread('we.wav');
      W=400;       % window length
      Wov=W/2;     % Overlapping point long
      Kmax=2*W;    % maximum number of shifting
      Wsim=Wov;    % Similarity point long of output
      xdecim=8;    % decimation of each xcorr
      kdecim=2;    % decimation of the skew axis sampling
      X=d'
  • Slide 22
  • MATLAB
      % Factor to run x faster or slower
      F=4;
      Ss =W-Wov;
      size(X);
      xpts = size(X,2);
      ypts = round(xpts / F);
      Y = zeros(1, ypts);
      % Cross-fade win is Wov pts long - it grows
      xfwin = (1:Wov)/(Wov+1);
      % Index to add to ypos to get the overlap region
      ovix = (1-Wov):0;
      % Index for non-overlapping bit
      newix = 1:(W-Wov);
      % Index for similarity chunks
      % decimate the cross-correlation
      simix = (1:xdecim:Wsim) - Wsim;
      % prepad X for extraction
      padX = [zeros(1, Wsim), X, zeros(1,Kmax+W-Wov)];
      % Startup - just copy first bit
      Y(1:Wsim) = X(1:Wsim);
  • Slide 23
  • MATLAB
      xabs = 0;
      lastxpos = 0;
      km = 0;
      for ypos = Wsim:Ss:(ypts-W);
        % Ideal X position
        xpos = F * ypos;
        % Overlap prediction - assume all of overlap from last copy
        kmpred = km + (xpos - lastxpos);
        lastxpos = xpos;
        if (kmpred