Design of 600 bps Speech Coding Algorithm Based on MELPe


ZOU Feng, GUO Ying, CHEN Xinfu, LIU Yan

Telecommunication Engineering College, Air Force Engineering University, Xi'an, Shaanxi, China

Email: feng-zou@163.com

Abstract: This paper describes a new 600 bps speech coding algorithm based on the enhanced mixed excitation linear prediction (MELPe) model [1], which is the new NATO standard STANAG 4591. MELPe speech coders are robust in difficult background noise environments and are intended mostly for military communications. A multi-frame joint vector quantization algorithm is used, which takes advantage of inherent inter-frame redundancy. A predictive multi-stage vector quantization algorithm is designed to quantize the line spectrum frequency (LSF) parameters. To handle channel errors without requiring extra overhead, a simulated annealing (SA) algorithm [2] is used to perform a judicious assignment of binary codes to index the vectors in the vector quantization codebook. Simulation results indicate that the speech quality of the proposed coder is better than that of LPC-10e [3] and close to that of the 2.4 kbps MELP standard [4].

I. INTRODUCTION

The Enhanced Mixed Excitation Linear Predictive (MELPe) vocoder has been adopted as STANAG 4591 by the North Atlantic Treaty Organization (NATO). MELPe is based on MELP, which was selected as the 2400 bps Federal Standard vocoder by the United States Department of Defense Digital Voice Processing Consortium (DDVPC) in 1996, and it adds a 1200 bps option and speech enhancement. It remains robust in difficult background noise environments at a low bit rate, so it is frequently used on HF/VHF channels.

This paper describes the important aspects of the algorithm. The analysis and synthesis algorithms are shared with the 2400 bps MELPe standard. The parameters of four consecutive frames are grouped into a super-frame, and a modified multi-frame joint vector quantization algorithm is used to quantize the super-frame parameters. A predictive multi-stage vector quantization algorithm is designed to exploit the frame-to-frame correlation of the LSFs. A switched MA prediction [5] is used to predict the LSF parameters, which limits the propagation of decoding errors to the order of the prediction. An SA algorithm is designed to assign binary codes so as to handle channel errors without requiring extra overhead.

II. OVERVIEW

The frame interval of the proposed coder is 25 ms, so each frame contains 200 speech samples, and four consecutive frames are grouped into a super-frame. This results in an overall algorithmic delay of 100 ms. The band-limited input signal is sampled at 8000 Hz, and the input and output samples are represented using 16-bit linear PCM.
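
The framing arithmetic above can be made concrete with a short sketch. It is illustrative only: the function name `split_superframes` and the decision to drop a trailing partial super-frame are assumptions, not part of the coder description.

```python
SAMPLE_RATE_HZ = 8000          # sampling rate stated in the text
FRAME_MS = 25                  # frame interval stated in the text
FRAMES_PER_SUPERFRAME = 4      # frames grouped into one super-frame
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000           # 200 samples
SUPERFRAME_LEN = FRAME_LEN * FRAMES_PER_SUPERFRAME      # 800 samples = 100 ms


def split_superframes(samples):
    """Group PCM samples into super-frames of 4 frames x 200 samples.

    Assumption: a trailing partial super-frame is simply dropped.
    """
    n_super = len(samples) // SUPERFRAME_LEN
    superframes = []
    for s in range(n_super):
        base = s * SUPERFRAME_LEN
        frames = [samples[base + f * FRAME_LEN: base + (f + 1) * FRAME_LEN]
                  for f in range(FRAMES_PER_SUPERFRAME)]
        superframes.append(frames)
    return superframes
```

Each returned super-frame is a list of four 200-sample frames, which is the unit on which the joint quantizers operate.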

Six kinds of parameters are estimated in the MELPe vocoder. No bits are used to quantize the aperiodic flag or the Fourier magnitude vector. Because aperiodic pulses are used most often in transition regions between voiced and unvoiced segments, the aperiodic flag can be derived from the voiced/unvoiced (V/U) decisions. The Fourier magnitude vector is quantized to one of two vectors, selected according to the V/U decisions of the super-frame: a flat vector is used for unvoiced frames, and the other vector is used for voiced frames. The band-pass voicing, energy, pitch and spectrum parameters are quantized jointly and transmitted.

III. PARAMETERS QUANTIZATION

A. Multi-frame pitch quantization

The pitch quantization scheme depends on the voiced/unvoiced decisions of the super-frame, and a 9-bit codebook is used to quantize the pitch information. When all frames of the super-frame are unvoiced, no pitch information is quantized. For voicing patterns that contain only one voiced frame, the pitch value of the voiced frame is quantized on a logarithmic scale with the 99-level uniform quantizer used in the MELP standard. Otherwise, the pitch parameters are vector quantized. The unused bits are used for error protection. A special distortion measure is used in this VQ algorithm, which is described in more detail in [6], [7]. The codebook index that minimizes the distortion is selected. The distortion measure is defined as follows:

$$ d = \sum_{i=1}^{4} w_i \left| p_i - \hat{p}_i \right|^2 + \delta \sum_{i=1}^{4} \left| \Delta p_i - \Delta \hat{p}_i \right|^2 \tag{1} $$

$$ \Delta p_i = \begin{cases} p_i - p_{i-1}, & \text{voiced frames} \\ 0, & \text{otherwise} \end{cases} \tag{2} $$

$$ w_i = \begin{cases} 1, & \text{voiced frame} \\ 0.1, & \text{unvoiced frame} \end{cases} \tag{3} $$

where $\hat{p}_i$ and $p_i$ are the quantized and unquantized log pitch values respectively, $p_0$ is the last log pitch value of the previous super-frame, $w_i$ is the weighting coefficient, and $\delta$ is a parameter that controls the contribution of the pitch differentials, which is set to 1 in the proposed coder. $\Delta \hat{p}_i$ is formed in the same way as $\Delta p_i$ from the quantized values. This measure incorporates the pitch differentials into the codebook search.

B. Multi-frame band-pass voicing quantization

The proposed coder determines five band-pass voiced/unvoiced decisions per frame and uses a 4-bit codebook to quantize them per super-frame, taking advantage of the inter-frame redundancy of the voicing decisions. The band-pass voiced/unvoiced decisions of four consecutive frames are grouped into a vector. The weighted Euclidean distance is used as the distortion measure.

$$ d = \sum_{i=1}^{4} \sum_{j=1}^{5} w_j \left( b_{i,j} - \hat{b}_{i,j} \right)^2 \tag{4} $$

where i indexes the frame within the current super-frame, j indexes the band within the current frame, $b_{i,j} = 1$ means that the j-th band-pass voiced/unvoiced decision is voiced and $b_{i,j} = 0$ otherwise, $\hat{b}_{i,j}$ is the quantized band-pass voiced/unvoiced decision, and $w_j$ is the weighting factor, with $\{w_1, w_2, w_3, w_4, w_5\} = \{1, 1/2, 1/4, 1/8, 1/16\}$.

C. Multi-frame LSF quantization

Multistage VQ with inter-frame MA prediction is designed to quantize the LSF parameters. The LSF parameters are quantized using a switched inter-frame MA prediction, which limits the propagation of decoding errors to the order of the prediction. The LSF parameters of four consecutive frames are grouped into a matrix. The switched inter-frame MA prediction is designed to exploit the redundancy arising from the correlation between consecutive matrices. We found that two sets of MA predictive coefficients give good performance: one set is for periodic segments of speech, and the other is for transitional segments. As the order of the MA prediction increases, the distortion decreases. Second-order MA prediction gives a reasonably small distortion and is only weakly influenced by random bit errors. The LSF residue of the prior super-frame is used to predict the LSF coefficients of the current super-frame. The input LSF parameters are predicted using this switched second-order MA prediction, and the prediction residue is quantized by a 4-stage VQ [8]. Fig. 1 shows the scheme of the LSF quantization algorithm.

Figure 1. The scheme of the LSF quantization algorithm.

The input LSF parameters are $\omega_{i,j}$. The quantized LSF parameters $\hat{\omega}_{i,j}$ are generated using:

$$ \hat{\omega}_{i,j} = P_{0,j}\, r_{i,j} + \sum_{k=1}^{2} P_{k,j}\, r_{i-1,5-k} \tag{5} $$

$$ P_{k,j} = \mathrm{diag}\{ p_{k,j}^{1}, p_{k,j}^{2}, \ldots, p_{k,j}^{10} \} \tag{6} $$

$$ \sum_{k=0}^{M} P_{k,j} = I; \quad j = 1, 2, 3, 4; \quad k = 0, \ldots, M; \quad M = 2 \tag{7} $$

where i indexes the super-frame, j indexes the frame within the super-frame, M is the MA prediction order, $r_{i,j}$ is the output vector from the four-stage VQ at the j-th frame of the i-th super-frame, I is the identity matrix, and $P_{k,j}$ is a diagonal prediction matrix. There are two sets of diagonal prediction matrices, and the set that minimizes the prediction residue is selected. The generalized Lloyd algorithm is used to train the MA predictive coefficients. Spectral distortion is selected as the distortion measure.
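
As a rough illustration of the switched second-order MA prediction described above, the following sketch reconstructs the LSFs of one super-frame from the VQ output vectors and picks between two coefficient sets. It is a minimal sketch under stated assumptions: the function names are hypothetical, each diagonal matrix is stored as a plain list of its diagonal entries, and "minimizes the prediction residue" is interpreted here as minimizing the energy of the residual that the VQ would have to encode.

```python
FRAMES = 4  # frames per super-frame


def predicted_part(prev_residuals, frame_coeffs):
    """Sum over k = 1, 2 of P_k * r_{i-1,5-k} for one frame."""
    _, p1, p2 = frame_coeffs
    r4, r3 = prev_residuals[3], prev_residuals[2]   # r_{i-1,4} and r_{i-1,3}
    return [p1[n] * r4[n] + p2[n] * r3[n] for n in range(len(p1))]


def reconstruct_lsf(cur_residuals, prev_residuals, coeffs):
    """Decoder side: omega_hat_{i,j} = P_0 r_{i,j} + predicted part."""
    out = []
    for j in range(FRAMES):
        p0 = coeffs[j][0]
        pred = predicted_part(prev_residuals, coeffs[j])
        out.append([p0[n] * cur_residuals[j][n] + pred[n] for n in range(len(p0))])
    return out


def target_residuals(lsf, prev_residuals, coeffs):
    """Encoder side: invert the prediction to get the residual to be quantized."""
    out = []
    for j in range(FRAMES):
        p0 = coeffs[j][0]
        pred = predicted_part(prev_residuals, coeffs[j])
        out.append([(lsf[j][n] - pred[n]) / p0[n] for n in range(len(p0))])
    return out


def select_predictor(lsf, prev_residuals, coeff_sets):
    """Return the index of the coefficient set with the smallest residual energy
    (assumed selection criterion)."""
    energies = [
        sum(x * x for frame in target_residuals(lsf, prev_residuals, c) for x in frame)
        for c in coeff_sets
    ]
    return min(range(len(coeff_sets)), key=energies.__getitem__)
```

Here `coeffs[j]` is a triple of diagonals for frame j, and `prev_residuals` holds the four VQ output vectors of the previous super-frame; only the last two of them enter the prediction, which is why a decoding error cannot propagate beyond the next super-frame.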

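$$ SD_{i,j} = \sqrt{ \frac{1}{\pi} \int_{0}^{\pi} \left[ 10 \log_{10} S_{i,j}(\omega) - 10 \log_{10} \hat{S}_{i,j}(\omega) \right]^2 d\omega } \tag{8} $$

where i indexes the super-frame, j indexes the frame within the super-frame, and $S_{i,j}(\omega)$ and $\hat{S}_{i,j}(\omega)$ are the power spectra of the unquantized and quantized signal, respectively.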
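
A numerical version of the spectral distortion can be sketched as follows. The sketch assumes that the power spectra are all-pole LPC spectra evaluated on a uniform frequency grid over 0 to π; the function names and the grid size are illustrative choices, not taken from the paper. The helper `summarize_sd` tabulates the kind of statistics usually reported for LSF quantizers (average SD and outlier fractions).

```python
import cmath
import math


def lpc_power_spectrum(a, omega):
    """Power spectrum 1 / |A(e^{j omega})|^2 with A(z) = 1 + sum_k a[k-1] z^{-k}."""
    A = 1 + sum(ak * cmath.exp(-1j * omega * (k + 1)) for k, ak in enumerate(a))
    return 1.0 / abs(A) ** 2


def spectral_distortion_db(a, a_hat, n_points=256):
    """RMS log-spectral difference in dB, averaged over n_points frequencies in (0, pi)."""
    total = 0.0
    for n in range(n_points):
        w = math.pi * (n + 0.5) / n_points
        diff = (10 * math.log10(lpc_power_spectrum(a, w))
                - 10 * math.log10(lpc_power_spectrum(a_hat, w)))
        total += diff * diff
    return math.sqrt(total / n_points)


def summarize_sd(sd_values):
    """Return (average SD, fraction with 2 < SD <= 4 dB, fraction with SD > 4 dB)."""
    n = len(sd_values)
    avg = sum(sd_values) / n
    mid = sum(1 for s in sd_values if 2.0 < s <= 4.0) / n
    high = sum(1 for s in sd_values if s > 4.0) / n
    return avg, mid, high
```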

The MSVQ codebook consists of four stages of 1024, 512, 512 and 256 levels, respectively. The search procedure is an M-best [9] approximation to a full search, with M = 8.
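
The M-best search can be sketched as a small tree search that keeps the best M partial paths after every stage. This is a generic illustration, not the authors' implementation: the function name `msvq_search` is hypothetical, and a plain squared error is assumed as the search criterion, whereas the coder may use a weighted measure.

```python
def msvq_search(target, codebooks, m_best=8):
    """M-best search over a multistage VQ.

    target:    list of floats to be quantized
    codebooks: one codebook per stage, each a list of codevectors
    Returns (stage indices of the best path, its squared error).
    """
    # Each candidate is (squared error, chosen indices, remaining residual).
    candidates = [(sum(x * x for x in target), [], list(target))]
    for codebook in codebooks:
        expanded = []
        for _, indices, residual in candidates:
            for ci, codevector in enumerate(codebook):
                new_residual = [r - c for r, c in zip(residual, codevector)]
                err = sum(x * x for x in new_residual)
                expanded.append((err, indices + [ci], new_residual))
        # Keep only the m_best lowest-error paths for the next stage.
        expanded.sort(key=lambda cand: cand[0])
        candidates = expanded[:m_best]
    best_err, best_indices, _ = candidates[0]
    return best_indices, best_err
```

With `m_best=1` this reduces to a greedy stage-by-stage search; keeping several survivors lets a path that looks worse after the first stage win after later stages, at a cost far below that of a full search over all stage combinations.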


TABLE I. LSF QUANTIZER PERFORMANCE

Average SD (dB)    2 dB < SD < 4 dB    SD > 4 dB
1.18               5.14%               0.1%

Table 1 shows the performance of the LSF quantization algorithm. It achieves approximately "transparent quality" using only a 36-bit codebook, plus 1 bit for the selection of the predictive coefficients.

D. Multi-frame gain quantization

Two gain parameters are calculated per frame, and the logarithmic energy values from four successive frames are grouped to form an 8-dimensional vector $G = \{G_1, G_2, \ldots, G_8\}$. Five bits are used to scalar-quantize the mean $\bar{G} = (G_1 + G_2 + \cdots + G_8)/8$, and a 4-bit codebook is used to quantize the normalized vector $\Delta G = G/\bar{G}$. The Euclidean distance is adopted as the distortion measure.
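
The mean/shape split of the gain vector can be sketched as below. Several details are assumptions made only for illustration: the 10 to 77 dB range of the uniform 5-bit mean quantizer, the normalization by the unquantized mean, and the shape codebook passed in by the caller (a trained 16-entry codebook would be used in practice).

```python
GAIN_MIN_DB, GAIN_MAX_DB = 10.0, 77.0    # assumed range, not given in the text
MEAN_BITS = 5
MEAN_LEVELS = 1 << MEAN_BITS             # 32 levels
STEP_DB = (GAIN_MAX_DB - GAIN_MIN_DB) / (MEAN_LEVELS - 1)


def quantize_mean(mean_db):
    """Uniform 5-bit scalar quantizer for the mean log gain; returns the index."""
    idx = round((mean_db - GAIN_MIN_DB) / STEP_DB)
    return max(0, min(MEAN_LEVELS - 1, idx))


def nearest(vec, codebook):
    """Index of the codevector with the smallest Euclidean distance to vec."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))


def encode_gain(gains_db, shape_codebook):
    """Return (mean index, shape index) for an 8-dimensional log-gain vector."""
    mean = sum(gains_db) / len(gains_db)
    shape = [g / mean for g in gains_db]          # Delta G = G / mean(G)
    return quantize_mean(mean), nearest(shape, shape_codebook)


def decode_gain(mean_idx, shape_idx, shape_codebook):
    """Rebuild the log-gain vector from the two indices."""
    mean_q = GAIN_MIN_DB + mean_idx * STEP_DB
    return [mean_q * s for s in shape_codebook[shape_idx]]
```

The two indices together take 5 + 4 = 9 bits per super-frame.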

E. Bit allocation

The proposed coder operates on 25 ms frames, and four consecutive frames are grouped into a super-frame with a duration of 100 ms. A total of 60 bits is used per super-frame. The bit allocation of the proposed coder is shown in Table 2.

TABLE II. BIT ALLOCATION

Parameters                        Bits
LSF                               36
Selection of predictive coeff.    1
Pitch                             9
Fourier Magnitudes                0
Band-pass Voicing                 4
Aperiodic Flag                    0
Gain                              9
Synchronization                   1
Total                             60
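
The bit budget can be checked with a few lines of arithmetic. The sketch below only restates numbers given in this section and in the description of the MSVQ stages; the variable names are illustrative.

```python
SUPERFRAME_MS = 100

BIT_ALLOCATION = {
    "LSF": 36,
    "Selection of predictive coeff.": 1,
    "Pitch": 9,
    "Fourier magnitudes": 0,
    "Band-pass voicing": 4,
    "Aperiodic flag": 0,
    "Gain": 9,
    "Synchronization": 1,
}

MSVQ_STAGE_SIZES = (1024, 512, 512, 256)


def bits_for_levels(levels):
    """Bits needed to index a codebook whose size is a power of two."""
    assert levels > 0 and levels & (levels - 1) == 0
    return levels.bit_length() - 1


total_bits = sum(BIT_ALLOCATION.values())                      # bits per super-frame
bit_rate_bps = total_bits * 1000 // SUPERFRAME_MS              # bits per second
msvq_bits = sum(bits_for_levels(n) for n in MSVQ_STAGE_SIZES)  # 10 + 9 + 9 + 8
```

Sixty bits every 100 ms gives the 600 bps rate, and the four MSVQ stage indices account for the 36 LSF bits.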

F. Optimization algorithm of VQ codebook based on SA

A vector quantization codebook is highly sensitive to transmission over error-prone channels, since bit errors in an index can severely damage the decoded signal. To simplify the numerical computation, we assume that the channel bit error rate (BER) is sufficiently small that only single-bit error patterns in a codeword need to be considered. The SA algorithm is run to minimize the average channel distortion $D_c(b)$:

$$ D_c(b) = \frac{\varepsilon}{k} \sum_{i=1}^{L} p(c_i) \sum_{j \in T_{b(c_i)}} d(c_i, c_j) \tag{9} $$

where $\varepsilon$ is the channel bit error rate, k is the dimension of the vector, L is the size of the codebook, $c_i$ is the i-th codevector, $p(c_i)$ is the a priori probability of the codevector $c_i$, $b(c_i)$ is the binary codeword assigned to $c_i$, $T_{b(c_i)}$ is the set of all integers j such that the Hamming distance between $b(c_j)$ and $b(c_i)$ is 1, and $d(c_i, c_j)$ is the distortion between $c_i$ and $c_j$.

IV. TEST RESULT

We select the Diagnostic Rhyme Test (DRT) and the Diagnostic Acceptability Measure (DAM) as the subjective quality tests. The 2.4 kbps MELPe standard coder was used for comparison. Figs. 2 to 4 show the waveforms and spectrograms of the test signal and the synthesized signals.

Figure 2. The waveform and spectrogram of the test signal.

Figure 3. The waveform and spectrogram of the 2400 bps MELPe vocoder synthesized signal.


Figure 4. The waveform and spectrogram of the proposed 600 bps vocoder synthesized signal.

The coders were tested on speech with a quiet background and over a channel with 1% random bit errors. The results, averaged over male and female speakers, are shown in Tables 3 and 4.

TABLE III. TEST RESULTS IN QUIET BACKGROUND

Test item    DRT    DAM

TABLE IV. TEST RESULTS IN 10% BER

V. CONCLUSION

In this paper, we propose a new 600 bps speech coding algorithm based on MELPe and describe its important aspects. To reduce the bit rate while obtaining high-quality synthesized speech, we develop a modified multi-frame joint vector quantization scheme that takes advantage of inherent inter-frame redundancy. A predictive multi-stage vector quantization algorithm is designed to quantize the LSF parameters; it achieves approximately "transparent quality" and is effective against channel errors. An SA algorithm is designed to assign binary codes so as to handle channel errors without requiring extra overhead. Informal subjective quality tests show that the speech quality of the proposed coder is close to that of the 2.4 kbps MELP standard and that the speech remains intelligible at a bit error rate of 1%.

ACKNOWLEDGMENT

The research is supported by the Shaanxi Natural Science Foundation of China (No. 2006F40).

REFERENCES

[1] J. S. Collura and D. F. Brandt, "The 1.2 kbps/2.4 kbps MELP speech coding suite with integrated noise pre-processing," in Proc. IEEE Mil. Comm., Atlantic City, NJ, vol. 2, pp. 1449-1453, Oct.-Nov. 1999.

[2] Nariman Farvardin, "A study of vector quantization for noisy channels," IEEE Trans. Inform. Theory, vol. 36, no. 4, pp. 799-809, 1990.

[3] T. E. Tremain, "The government standard linear predictive coding algorithm: LPC-10," Speech Technology, pp. 40-49, April 1982.

[4] McCree A and Truong K, "A 2.4 kbit/s MELP coder candidate for the new U.S. federal standard," Proceedings of IEEE ICASSP 1996. Piscataway, New Jersey: IEEE Press, pp. 200-203, 1996.

[5] R. Salami and C. Laflamme, "Design and description of CS-ACELP: A toll quality 8 kb/s speech coder," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 116-130, March 1998.

[6] Wang Tian and Koishida K, "A 1200 bps speech coder based on MELP," IEEE ICASSP 2000. Piscataway, New Jersey: IEEE Press, pp. 1375-1378, 2000.

[7] Wang Tian and Kazuhito K, "A 1200/2400 bps coding suite based on MELP," Proc. IEEE Inter. Conf. Acoustics, Speech and Signal Processing, pp. 90-92, 2002.

[8] Chan W Y and Gupta S, "Enhanced multistage vector quantization by joint codebook design," IEEE Transactions on Communications, vol. 40, no. 11, pp. 1693-1697, 1992.

[9] LeBlanc W P and Bhattacharya B, "Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding," IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, pp. 373-385, 1993.