DESIGN OF 600 BPS SPEECH CODING
ALGORITHM BASED ON MELPE
ZOU Feng, GUO Ying, CHEN Xinfu, LIU Yan
Telecommunication Engineering College, Air Force Engineering University, Xi'an, Shaanxi, China.
Email: feng-zoug163.com
Abstract: This paper describes a new 600 bps speech coding algorithm based on the enhanced mixed excitation linear prediction (MELPe) model [1], which is the new NATO standard STANAG 4591. MELPe speech coders are robust in difficult background noise environments and are intended mostly for military communications. A multi-frame joint vector quantization algorithm is used, which takes advantage of the inherent inter-frame redundancy. A predictive multi-stage vector quantization algorithm is designed to quantize the line spectrum frequency (LSF) parameters. To handle channel errors without requiring extra overhead, a simulated annealing (SA) algorithm [2] is used to perform a judicious assignment of binary codes to the indices of the vectors in the vector quantization codebook. Simulation results show that the proposed coder performs better than LPC-10e [3] and approaches the quality of the 2.4 kbps MELP standard [4].
I. INTRODUCTION
The Enhanced Mixed Excitation Linear Predictive (MELPe) vocoder has been adopted by the North Atlantic Treaty Organization (NATO) as STANAG 4591. MELPe is based on MELP, which was selected as the 2400 bps Federal Standard vocoder by the United States Department of Defense Digital Voice Processing Consortium (DDVPC) in 1996, and it adds a 1200 bps option and speech enhancement. It is robust in difficult background noise environments at a low bit rate, so it is frequently used on HF/VHF channels.
In this paper, the important aspects of the algorithm are described. The analysis and synthesis algorithms are shared with the 2400 bps MELPe standard. The parameters of four consecutive frames are grouped together into a super-frame. A modified multi-frame joint vector quantization algorithm is used to quantize the parameters of the super-frame. A predictive multi-stage vector quantization algorithm is designed to exploit the LSF correlation between frames. A switched MA prediction [5] is used to predict the LSF parameters, which limits the propagation of decoding errors to the order of the prediction. An SA algorithm is designed to assign binary codes so that channel errors are handled without requiring extra overhead.
II. OVERVIEW
The frame interval of the proposed coder is 25 ms and contains 200 voice samples, and four consecutive frames are grouped together into a super-frame. This results in an overall algorithmic delay of 100 ms. The band-limited signal is sampled at 8000 Hz, and the input and output samples are represented using 16-bit linear PCM.
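The framing arithmetic above can be checked with a short sketch. The helper name `split_into_superframes` and the constant names are illustrative assumptions rather than part of the coder, and samples that do not fill a complete super-frame are simply dropped here.

```python
import numpy as np

SAMPLE_RATE_HZ = 8000        # sampling rate stated in the text
FRAME_MS = 25                # frame interval stated in the text
FRAMES_PER_SUPERFRAME = 4    # four consecutive frames form one super-frame

SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000    # samples in one frame
SUPERFRAME_MS = FRAME_MS * FRAMES_PER_SUPERFRAME         # super-frame duration


def split_into_superframes(pcm):
    """Reshape 16-bit PCM samples into (super-frame, frame, sample).

    Hypothetical helper: samples after the last complete super-frame are dropped.
    """
    pcm = np.asarray(pcm, dtype=np.int16)
    per_super = SAMPLES_PER_FRAME * FRAMES_PER_SUPERFRAME
    n_super = len(pcm) // per_super
    return pcm[: n_super * per_super].reshape(
        n_super, FRAMES_PER_SUPERFRAME, SAMPLES_PER_FRAME
    )
```

One second of input (8000 samples) yields ten super-frames of four 200-sample frames each.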
Six kinds of parameters are estimated in the MELPe vocoder. No bits are used to quantize the aperiodic flag and the Fourier magnitude vector. Because aperiodic pulses are used most often during transition regions between voiced and unvoiced segments, the aperiodic flag can be derived from the voiced/unvoiced decisions. The Fourier magnitude vector is quantized to one of two vectors, selected according to the V/U decisions of the super-frame: a flat vector is used for unvoiced frames, and the other vector is selected for voiced frames. The band-pass voicing, energy, pitch and spectrum are quantized jointly and transmitted.
III. PARAMETER QUANTIZATION
A. Multi-frame pitch quantization
Different pitch quantization schemes are selected according to the voiced/unvoiced decisions of the super-frame, and a 9-bit codebook is used to quantize the pitch information. The pitch is not quantized when all frames of the super-frame are unvoiced. For voicing patterns that contain only one voiced frame, the pitch value of the voiced frame is quantized on a logarithmic scale with the 99-level uniform quantizer used in the MELP standard. Otherwise, the pitch parameters are vector quantized. The unused bits are used for error protection. A special distortion measure is used in this VQ algorithm, which is described in detail in [6], [7]. The index that minimizes the distortion is selected from the codebook. The distortion measure is defined as follows:

$$d = \sum_{i=1}^{4} w_i \,|\hat{p}_i - p_i|^2 + \delta \sum_{i=1}^{4} |\Delta\hat{p}_i - \Delta p_i|^2 \tag{1}$$

$$\Delta p_i = \begin{cases} p_i - p_{i-1}, & \text{voiced frames} \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

$$w_i = \begin{cases} 1, & \text{voiced frame} \\ 0.1, & \text{unvoiced frame} \end{cases} \tag{3}$$

where $\hat{p}_i$ and $p_i$ are the quantized and unquantized log pitch values, respectively, $p_0$ is the last log pitch value of the previous super-frame, $w_i$ is the weighting coefficient, and $\delta$ is a parameter that controls the contribution of the pitch differentials, which is set to 1 in the proposed coder. This measure incorporates the pitch differentials into the codebook search.

B. Multi-frame band-pass voicing quantization
The proposed coder determines five band-pass voiced/unvoiced decisions per frame and uses a 4-bit codebook to quantize them per super-frame, taking advantage of the inter-frame redundancy of the voicing decisions. The band-pass voiced/unvoiced decisions of four consecutive frames are grouped together into a vector. The weighted Euclidean distance is used as the distortion measure:

$$d = \sum_{i=1}^{4} \sum_{j=1}^{5} w_j \,(b_{i,j} - \hat{b}_{i,j})^2 \tag{4}$$

where $i$ is the index of the frame within the current super-frame, $j$ is the index of the band within the frame, $b_{i,j} = 1$ means that the $j$-th band-pass decision is voiced and $b_{i,j} = 0$ otherwise, $\hat{b}_{i,j}$ is the quantized band-pass voiced/unvoiced decision, and $w_j$ is the weighting factor, with $\{w_1, w_2, w_3, w_4, w_5\} = \{1, 1/2, 1/4, 1/8, 1/16\}$.

C. Multi-frame LSF quantization
Multistage VQ with inter-frame MA prediction is designed to quantize the LSF parameters. The LSF parameters are quantized using a switched inter-frame MA prediction, which limits the propagation of decoding errors to the order of the prediction. The LSF parameters of four consecutive frames are grouped together into a matrix. The switched inter-frame MA prediction is designed to exploit the redundancy arising from the correlation between consecutive matrices. We found that two sets of MA predictive coefficients give good performance: one set is for periodic segments of speech, and the other is for transitional segments. As the order of the MA prediction increases, the distortion decreases. Second-order MA prediction gives a reasonably small distortion and is only weakly affected by random bit errors. The LSF residue of the previous super-frame is used to predict the LSF coefficients of the current super-frame. The input LSF parameters are predicted using this switched second-order MA prediction, and the prediction residue is quantized by a 4-stage VQ [8]. Fig. 1 shows the scheme of the LSF quantization algorithm.
Figure 1. The scheme of LSF quantization algorithm.
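The input LSF parameters are $\omega_{i,j}$. The quantized LSF parameters $\hat{\omega}_{i,j}$ are generated using:

$$\hat{\omega}_{i,j} = P_{0,j}\, r_{i,j} + \sum_{k=1}^{2} P_{k,j}\, r_{i-1,5-k} \tag{5}$$

$$P_{k,j} = \mathrm{diag}\{p_{k,j}^{1}, p_{k,j}^{2}, \ldots, p_{k,j}^{10}\} \tag{6}$$

$$\sum_{k=0}^{M} P_{k,j} = I; \quad j = 1, 2, 3, 4; \quad k = 0, \ldots, M; \quad M = 2 \tag{7}$$

where $i$ is the index of the super-frame, $j$ is the index of the frame within the super-frame, $M$ is the MA prediction order, $r_{i,j}$ is the output vector of the four-stage VQ for the $j$-th frame of the $i$-th super-frame, $I$ is the identity matrix, and $P_{k,j}$ is a diagonal prediction matrix. There are two sets of diagonal prediction matrices, and the set that minimizes the prediction residue is selected. The generalized Lloyd algorithm is used to train the MA predictive coefficients. Spectral distortion is selected as the distortion measure:

$$SD_{i,j} = \sqrt{\frac{1}{\pi} \int_{0}^{\pi} \left[ 10 \log_{10} S_{i,j}(\omega) - 10 \log_{10} \hat{S}_{i,j}(\omega) \right]^{2} d\omega} \tag{8}$$

where $i$ is the index of the super-frame, $j$ is the index of the frame within the super-frame, and $S_{i,j}(\omega)$ and $\hat{S}_{i,j}(\omega)$ are the power spectra of the unquantized and quantized signal, respectively.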
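The claim that MA prediction limits error propagation can be illustrated with a minimal sketch of Eq. (5). The coefficient values (0.3 and 0.1), the random stand-in VQ outputs, the zero initial state, and all function names are assumptions made for illustration; they are not the trained predictors of the coder.

```python
import numpy as np


def ma_coefficients(p1, p2, frames=4, order=10):
    """Diagonal MA coefficients with P0 + P1 + P2 = I, as required by Eq. (7)."""
    p1 = np.full((frames, order), p1, dtype=float)
    p2 = np.full((frames, order), p2, dtype=float)
    return np.stack([1.0 - p1 - p2, p1, p2])    # shape (3, frames, order)


def reconstruct_superframe(r_cur, r_prev, p):
    """Eq. (5) for the four frames of one super-frame.

    r_cur, r_prev: (4, 10) VQ output vectors of super-frames i and i-1.
    """
    # k = 1 refers to frame 5-1 = 4 and k = 2 to frame 5-2 = 3 of super-frame i-1
    return p[0] * r_cur + p[1] * r_prev[3] + p[2] * r_prev[2]


def decode_stream(r_seq, r_init, p):
    """Decode consecutive super-frames; r_init is an assumed initial state."""
    out, r_prev = [], r_init
    for r_cur in r_seq:
        out.append(reconstruct_superframe(r_cur, r_prev, p))
        r_prev = r_cur
    return np.array(out)


rng = np.random.default_rng(0)
p = ma_coefficients(0.3, 0.1)                   # illustrative values only
r_seq = rng.uniform(0.0, np.pi, (3, 4, 10))     # stand-in VQ outputs
r_init = np.zeros((4, 10))
clean = decode_stream(r_seq, r_init, p)

damaged = r_seq.copy()
damaged[0, 3] += 0.05                           # channel error in super-frame 0
noisy = decode_stream(damaged, r_init, p)
affected = [not np.allclose(c, n) for c, n in zip(clean, noisy)]
```

In this sketch the error injected into super-frame 0 changes the decoded LSFs of super-frames 0 and 1 only; super-frame 2 is decoded identically, because the predictor memory holds VQ outputs rather than reconstructed LSFs.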
The MSVQ codebook consists of four stages of 1024, 512, 512, and 256 levels respectively (10 + 9 + 9 + 8 = 36 bits). The search procedure is an M-best [9] approximation to a full search, with M = 8.
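A minimal sketch of an M-best search over an additive multi-stage codebook is given below. It uses plain squared error and random codebooks as stand-ins; the weighting, the trained codebooks, and the function name `mbest_msvq_search` are assumptions and do not reproduce the procedure of [9] in detail.

```python
import numpy as np

STAGE_SIZES = (1024, 512, 512, 256)                        # levels per stage, from the text
TOTAL_BITS = sum((s - 1).bit_length() for s in STAGE_SIZES)


def mbest_msvq_search(target, codebooks, m_best=8):
    """Keep the m_best lowest-error partial paths after every stage."""
    survivors = [(np.asarray(target, dtype=float), ())]    # (residual, index path)
    for cb in codebooks:
        candidates = []
        for residual, path in survivors:
            # squared error left after subtracting each codevector of this stage
            err = np.sum((residual - cb) ** 2, axis=1)
            for idx in np.argsort(err)[:m_best]:
                candidates.append((err[idx], residual - cb[idx], path + (int(idx),)))
        candidates.sort(key=lambda c: c[0])
        survivors = [(res, path) for _, res, path in candidates[:m_best]]
    best_residual, best_path = survivors[0]
    return best_path, float(np.sum(best_residual ** 2))


rng = np.random.default_rng(0)
codebooks = [rng.normal(scale=0.5 ** n, size=(size, 10))   # random stand-in codebooks
             for n, size in enumerate(STAGE_SIZES)]
indices, distortion = mbest_msvq_search(rng.normal(size=10), codebooks)
```

With `m_best=1` the search reduces to a greedy stage-by-stage search, and with `m_best` at least as large as the number of index combinations it becomes a full search.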
TABLE I. LSF QUANTIZER PERFORMANCE
Average SD (dB)    2 dB < SD < 4 dB    SD > 4 dB
1.18               5.14%               0.1%
Table 1 shows the performance of the LSF quantization algorithm. It approximately achieves "transparent quality" using only a 36-bit codebook, and the selection of the predictive coefficients uses 1 bit.
D. Multi-frame gain quantization
Two gain parameters are calculated per frame, and the logarithmic energy values from four successive frames are grouped to form a vector of 8 dimensions, $G = \{G_1, G_2, \ldots, G_8\}$. 5 bits are used to scalar quantize the mean $\bar{G} = (G_1 + G_2 + \cdots + G_8)/8$. A 4-bit codebook is used to quantize the normalized vector $\Delta G = G/\bar{G}$. The Euclidean distance is adopted as the distortion measure.
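The mean-plus-shape split described above can be sketched as follows. The 10 to 77 dB range of the mean, the uniform scalar quantizer, the normalization by the quantized (rather than unquantized) mean, and the function names are all assumptions made for this illustration; the text only fixes the 5-bit and 4-bit budgets and the Euclidean distance.

```python
import numpy as np

MEAN_BITS, SHAPE_BITS = 5, 4          # bit split stated in the text
G_MIN_DB, G_MAX_DB = 10.0, 77.0       # assumed range of the mean log gain


def _mean_step():
    """Step size of the assumed uniform quantizer for the mean."""
    return (G_MAX_DB - G_MIN_DB) / (2 ** MEAN_BITS - 1)


def quantize_gain(g, shape_codebook):
    """Return (mean index, shape index) for the 8 log gains of a super-frame."""
    g = np.asarray(g, dtype=float)
    mean_idx = int(np.clip(np.round((g.mean() - G_MIN_DB) / _mean_step()),
                           0, 2 ** MEAN_BITS - 1))
    g_bar_q = G_MIN_DB + mean_idx * _mean_step()
    delta = g / g_bar_q               # normalize so the decoder can invert the step
    errors = np.sum((shape_codebook - delta) ** 2, axis=1)   # Euclidean distance
    return mean_idx, int(np.argmin(errors))


def dequantize_gain(mean_idx, shape_idx, shape_codebook):
    """Rebuild the 8 log gains from the two indices."""
    return (G_MIN_DB + mean_idx * _mean_step()) * shape_codebook[shape_idx]
```

Here `shape_codebook` is expected to be a 16 x 8 array, so the two indices together use the 9 gain bits of the bit allocation.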
E. Bit allocation
The proposed coder operates on frames of 25 ms, and four consecutive frames are grouped together into a super-frame, for a super-frame duration of 100 ms. A total of 60 bits is used per super-frame. The bit allocation of the proposed coder is shown in Table 2.
TABLE II. BIT ALLOCATION
Parameters Bits (bit)
LSF 36
Selection of predictive coeff. 1
Pitch 9
Fourier Magnitudes 0
Band-pass Voicing 4
Aperiodic Flag 0
Gain 9
Synchronization 1
Total 60
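As a quick consistency check of Table II, the per-parameter bits can be summed and converted to a bit rate. The dictionary and variable names below are illustrative only.

```python
# Bits per super-frame, copied from Table II
BIT_ALLOCATION = {
    "LSF": 36,
    "Selection of predictive coeff.": 1,
    "Pitch": 9,
    "Fourier Magnitudes": 0,
    "Band-pass Voicing": 4,
    "Aperiodic Flag": 0,
    "Gain": 9,
    "Synchronization": 1,
}
SUPERFRAME_MS = 100                                  # four 25 ms frames

total_bits = sum(BIT_ALLOCATION.values())
bit_rate_bps = total_bits * 1000 // SUPERFRAME_MS    # bits per second
```

The entries sum to 60 bits per 100 ms super-frame, which corresponds to the 600 bps target rate.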
F. Optimization algorithm of VQ codebook based on SA
A vector quantization codebook is sensitive to transmission in error-prone environments, where channel errors can severely damage the decoded signal. To simplify the numerical computation, we assume that the channel bit error rate (BER) is sufficiently small that only single-bit error patterns in a codeword need to be considered. The SA algorithm is run to minimize the average channel distortion $D_c(b)$:

$$D_c(b) = \frac{\varepsilon}{k} \sum_{i=1}^{L} p(c_i) \sum_{j \in T_b(c_i)} d(c_i, c_j) \tag{9}$$

where $\varepsilon$ is the channel bit error rate, $k$ is the dimension of the vector, $L$ is the size of the codebook, $c_i$ is a codevector, $p(c_i)$ is the a priori probability of the codevector $c_i$, $b(c_i)$ is the binary codeword assigned to $c_i$, $T_b(c_i)$ is the set of all integers $j$ for which the Hamming distance between $b(c_i)$ and $b(c_j)$ is 1, and $d(c_i, c_j)$ is the distortion between $c_i$ and $c_j$.

IV. TEST RESULT
We select the Diagnostic Rhyme Test (DRT) and the Diagnostic Acceptability Measure (DAM) as the subjective quality tests. The 2.4 kbps MELPe standard coder was used for comparison. Figs. 2 to 4 show the waveforms and spectrograms of the test signal and of the synthesized signals.
Figure 2. The waveform and spectrogram of the test signal.
Figure 3. The waveform and spectrogram of the 2400 bps MELPe vocoder synthesized signal.
Figure 4. The waveform and spectrogram of the proposed 600 bps vocoder synthesized signal.
The coders were tested on speech with a quiet background and over a channel with 1% random bit errors. The scores averaged over male and female speakers are shown in Tables 3 and 4.
TABLE III. TEST RESULTS IN QUIET BACKGROUND
Test item DRT DAM
TABLE IV. TEST RESULTS IN 1% BER
V. CONCLUSION
In this paper, we propose a new 600 bps speech coding algorithm based on MELPe and describe the important aspects of the algorithm. To reduce the bit rate and obtain high-quality synthesized speech, we develop a modified multi-frame joint vector quantization that takes advantage of the inherent inter-frame redundancy. The predictive multi-stage vector quantization algorithm designed to quantize the LSF parameters approximately achieves "transparent quality" and is effective against channel errors. The SA algorithm is designed to assign binary codes so that channel errors are handled without requiring extra overhead. The informal subjective quality tests show that the speech quality of the proposed coder approaches that of the 2.4 kbps MELP standard and that the speech is still intelligible at a bit error rate of 1%.
ACKNOWLEDGMENT
The research is supported by the Shaanxi Natural Science Foundation of China (No. 2006F40).
REFERENCES
[1] J. S. Collura, and D. F. Brandt, "The 1.2 kbps/2.4 kbps MELP speech coding suite with integrated noise pre-processing," in Proc. IEEE Mil. Comm., Atlantic City, NJ, vol. 2, pp. 1449-1453, Oct.-Nov. 1999.
[2] Nariman Farvardin, "A study of vector quantization for noisy channels," IEEE Trans. Inform. Theory, vol. 36, no. 4, pp. 799-809, 1990.
[3] T. E. Tremain, "The government standard linear predictive coding algorithm: LPC-10," Speech Technology, pp. 40-49, April 1982.
[4] McCree A, and Truong K, "A 2.4 kbit/s MELP coder candidate for the new U.S. federal standard," Proceedings of IEEE ICASSP 1996. Piscataway, New Jersey: IEEE Press, pp. 200-203, 1996.
[5] R. Salami, and C. Laflamme, "Design and description of CS-ACELP: A toll quality 8 kb/s speech coder," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 116-130, March 1998.
[6] Wang Tian, and Koishida K, "A 1200 bps speech coder based on MELP," IEEE ICASSP 2000. Piscataway, New Jersey: IEEE Press, pp. 1375-1378, 2000.
[7] Wang Tian, and Kazuhito K, "A 1200/2400 bps coding suite based on MELP," Proc. IEEE Inter. Conf. Acoustics, Speech and Signal Processing, pp. 90-92, 2002.
[8] Chan W Y, and Gupta S, "Enhanced multistage vector quantization by joint codebook design," IEEE Transactions on Communications, vol. 40, no. 11, pp. 1693-1697, 1992.
[9] LeBlanc W P, and Bhattacharya B, "Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding," IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 373-385, 1993.