ENCODING OF LPC SPECTRAL PARAMETERS USING SWITCHED-ADAPTIVE INTERFRAME VECTOR PREDICTION†

Mei Yong, Grant Davidson, and Allen Gersho

Department of Electrical and Computer Engineering
University of California, Santa Barbara, CA 93106

† This work was performed for the Jet Propulsion Laboratory, California Institute of Technology, sponsored by the National Aeronautics and Space Administration.

ABSTRACT

LPC spectral parameter encoding is often a challenging task in the development of low bit-rate speech coding systems. An efficient, low-complexity method called Switched-Adaptive Interframe Vector Prediction (SIVP) has been developed for this purpose. SIVP utilizes vector linear prediction to exploit the high frame-to-frame redundancy present in successive frames of LPC parameters. When SIVP is combined with scalar quantization, we have found that the LPC parameter bit-rate required to achieve high-quality synthetic speech is only 1300 bits per second. With vector quantization, the bit-rate can be reduced even further (to 1000 bits per second) without introducing any perceivable quantization noise in the reconstructed speech.

1. INTRODUCTION

One of the most powerful techniques in speech coding is Linear Predictive Coding (LPC). Its importance lies in the fact that a small set of parameters provides reasonably accurate estimates of the short-term speech spectrum. As ongoing speech coding research pushes bit-rates further downward, it has become increasingly important to find efficient LPC parameter quantization techniques.

In this paper, we describe a new method for encoding the time sequence of LPC spectral parameters extracted from a speech waveform, called Switched-Adaptive Interframe Vector Prediction (SIVP). The application of interframe vector prediction to LPC vector encoding (i.e., to a set of LPC parameters extracted from one frame of speech) was proposed in 1984 as part of a three-year project at UCSB for NASA's Mobile Satellite Experiment, administered by the Jet Propulsion Laboratory (JPL). This was one of several techniques used to develop a 4.8 kbps speech coder suitable for 5 kHz fading radio channels between a mobile vehicle and a satellite in geosynchronous orbit. Recently, a hardware prototype speech coder for NASA which incorporates the SIVP technique has been implemented in our laboratory. The prototype will soon be used in field trials to be conducted by JPL. Preliminary results on SIVP were recently reported in [1], and a similar approach to interframe coding of parameters was also given in [2]. In this paper, we extend our preliminary report, give a more comprehensive presentation of the approach, and focus on additional aspects of SIVP.

We first review the basic concepts of interframe vector linear prediction, and then describe the switched-adaptive predictor design algorithm. This algorithm can be applied to both vector-based as well as frame-based adaptation, making it more general than the previously reported method of switched adaptation [3], which was used only for frame-based adaptation. We also show how switched predictors can be used to help reduce the transmission errors which tend to be caused by interframe predictive coding.

Several methods for quantizing the interframe prediction errors are examined, including both vector and scalar quantization. In particular, an adaptive bit allocation scheme incorporating scalar quantization is shown to be particularly effective for very low bit-rates and low-complexity implementations.

Finally, a performance comparison between SIVP and several other LPC parameter encoding methods is given. It is shown that SIVP is indeed superior to more established methods.

2. INTERFRAME VECTOR LINEAR PREDICTION

Calculating LPC coefficients for successive frames of speech samples may be viewed as a process of sampling the speech spectrum as it evolves in time. Conventional LPC-based coders typically use sampling intervals as long as possible, but then each observation of the spectrum is separately coded and transmitted. However, there is considerable redundancy between different speech frames within one phoneme. We hope to retain the precision needed in describing an individual spectrum by coding not the spectrum itself, but the error between the predicted spectrum based on previous frames and the actual current spectrum. Here, we propose to use Vector Linear Prediction (VLP) to remove the redundancy in the sequence of LPC parameter vectors. By applying VLP, each spectral parameter in the current frame is predicted not only from corresponding parameters of previous frames, but also from other parameters of previous frames. In this way, correlation between the different speech frames can be maximally exploited.

Let $\{x_n : n = \ldots, -1, 0, 1, \ldots\}$ represent the LPC vector process, where $x_n = (x_n(1), x_n(2), \ldots, x_n(m))^T$ is a zero-mean, $m$-dimensional LPC vector associated with the $n$-th speech frame. (If the extracted LPC parameters have non-zero means, their mean values are removed before prediction.) The first-order prediction of $x_n$ can be written as

$$\hat{x}_n = A x_{n-1}, \qquad (1)$$

where $A$ is an $m \times m$ prediction matrix. The prediction error vector, $e_n = (e_n(1), e_n(2), \ldots, e_n(m))^T$, is equal to

$$e_n = x_n - A x_{n-1}. \qquad (2)$$

It is easily shown [3] that the optimal prediction matrix, which minimizes the mean squared prediction error, is given by

$$A = C_{01}\, C_{11}^{-1}, \qquad (3)$$

where $C_{01} = E[x_n x_{n-1}^T]$ and $C_{11} = E[x_{n-1} x_{n-1}^T]$. The operator $E(\cdot)$ represents an expectation, estimated in practice by averaging over the $N$ vectors in the speech training set. The vector prediction gain is defined as

$$G_p = \frac{E\left[\|x_n\|^2\right]}{E\left[\|e_n\|^2\right]}, \qquad (4)$$



where $\|\cdot\|$ represents the Euclidean norm. It is important to note that this definition assumes that the vector of parameters, $x_n$, has zero mean.
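To make the design procedure concrete, the short sketch below estimates the covariance matrices from a training set of mean-removed LSP vectors, solves Eq. (3) for the first-order prediction matrix, and evaluates the open-loop prediction gain of Eq. (4). This is a minimal NumPy sketch; the function name, the array layout, and the synthetic training data are illustrative assumptions rather than material from the paper.

```python
import numpy as np

def design_vlp_predictor(X):
    """First-order vector linear prediction of LSP frames (Eqs. 1-4).

    X : (N, m) array of mean-removed LPC/LSP vectors, one row per frame.
    Returns the m x m prediction matrix A and the open-loop prediction gain.
    """
    X_cur, X_prev = X[1:], X[:-1]                 # pairs (x_n, x_{n-1})
    C01 = X_cur.T @ X_prev / len(X_cur)           # sample estimate of E[x_n x_{n-1}^T]
    C11 = X_prev.T @ X_prev / len(X_prev)         # sample estimate of E[x_{n-1} x_{n-1}^T]
    A = C01 @ np.linalg.inv(C11)                  # optimal predictor, Eq. (3)

    E = X_cur - X_prev @ A.T                      # prediction error vectors, Eq. (2)
    gain = np.sum(X_cur ** 2) / np.sum(E ** 2)    # prediction gain, Eq. (4)
    return A, gain

# Hypothetical usage with synthetic, slowly varying "frames" (m = 10):
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(scale=0.01, size=(1000, 10)), axis=0)
X -= X.mean(axis=0)                               # remove the mean before prediction
A, gain = design_vlp_predictor(X)
print("open-loop prediction gain:", gain)
```

For highly correlated frame sequences, such as LSP trajectories within a phoneme, the resulting gain is well above unity, which is what makes interframe prediction worthwhile.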

The choice of parameter set is based on the quantization properties of the LPC speech model. For interframe prediction, it is desirable to choose a set of LPC parameters that is not only robust to quantization noise, but also highly predictable. After extensively evaluating various LPC parameter sets, we selected Line Spectral Pairs (LSP's), also known as Line Spectral Frequencies (LSF's) [4],[5], as the most promising parameter set for SIVP. Our experiments also indicate that the short-term spectral envelope can be efficiently predicted using first-order vector prediction, and that the use of additional frames helps very little. In the following discussion, we present results based on first-order VLP using LSP parameters as the LPC speech model representation.

We chose the switched-adaptive approach and developed a new algorithm for the predictor design. In switched adaptation, the predictor matrix is updated on a vector-by-vector basis. For each input vector to be predicted, a predictor matrix is selected from a fixed set of such matrices using a statistical classification of the input vector. In contrast to forward adaptation, where the entire prediction matrix would be transmitted to the decoder, in switched-adaptive prediction only one index representing the selected predictor matrix is transmitted.

The block diagram of a switched-adaptive predictive coding system is shown in Fig. 1. In the figure, $x_n$ represents an input vector and $\hat{x}_n$ represents its prediction. The prediction error vector $e_n$, obtained by subtracting $\hat{x}_n$ from $x_n$, is quantized and sent to the decoder. The positions of the switches at the encoder and the decoder are synchronized by a flag signal (index). The reconstructed signal is obtained by adding the quantized prediction error to the prediction of the signal.

Figure 1: Switched-Adaptive Predictive Coding System

Most prior work using switched-adaptive prediction has been applied to prediction of speech waveform segments. Under the usual assumption of local stationarity, classification of a given frame is based on the statistical characteristics of that frame, obtained by averaging over the whole frame. The predictor for each class is then designed from data comprising all frames of speech assigned to this class. For the case of interframe prediction, however, each frame contains only one vector, and the statistical changes of vectors from one frame to another can be very large. Furthermore, if one expects to obtain average features for several successive vectors, a very long time delay could result. Therefore, the classification methods used for speech waveforms, which are based on the average properties of each frame, are not suited for this kind of vector process. The method proposed here solves these problems and can be applied to both vector-based as well as frame-based adaptation.

As shown in the last section, the prediction matrix $A$ is uniquely determined by the covariance matrices which, in turn, are determined by the nature of the correlation between adjacent vectors and between the different components of a vector. This suggests that the division of the classes can be executed such that each class preserves certain correlation properties. Let $r_n$ be the instantaneous correlation vector between the current and previous LPC vectors, whose $i$-th component measures the correlation between the $i$-th components of $x_n$ and $x_{n-1}$. In our first classification method, the classification statistic $S_n$ is taken as the average of the components of $r_n$; for each input vector, $S_n$ is compared against thresholds determined by experiments to decide the proper class. The above method gives reasonably good performance for a small number of classes. However, since $S_n$ is taken as an average of all components of the instantaneous correlation vector, it may not represent the local correlation property between two adjacent vectors well. This problem becomes particularly evident when the number of classes is increased. Our second classification method addresses this problem.

Regarding the predictor design procedure, it should be noted that a suboptimality is introduced when all training vectors which belong to a given class are collected first and then used as training data to design the predictor for that class. Instead, for each vector assigned to the $i$-th class, the outer product of the current and previous vector is used to update an estimate of the covariance matrix of the $i$-th class. In the actual encoding process, the predictor is selected by exhaustively searching all predictors to find the one yielding the smallest prediction error.

Fig. 2 plots the open-loop prediction gain versus the number of bits used for specifying a class, using both of the above classification methods. As shown in the figure, the prediction is indeed improved by using this switched-adaptation approach.

Figure 2: Open-Loop Prediction Gain of SIVP vs. Number of Bits Specifying a Class
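The encoder-side selection rule described above can be sketched as follows: the encoder tries each class matrix on the current input vector and the previous reconstructed vector and keeps the index of the matrix yielding the smallest prediction error. This is a minimal sketch; the function and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def select_predictor(x_cur, x_prev, predictors):
    """Exhaustive switched-predictor selection.

    x_cur      : (m,) current LSP vector.
    x_prev     : (m,) previous reconstructed LSP vector.
    predictors : list of (m, m) class prediction matrices.
    Returns (index of the best class, its prediction error vector).
    """
    best_idx, best_err, best_norm = 0, None, np.inf
    for idx, A in enumerate(predictors):
        err = x_cur - A @ x_prev              # prediction error for this class
        norm = float(err @ err)               # squared Euclidean norm
        if norm < best_norm:
            best_idx, best_err, best_norm = idx, err, norm
    return best_idx, best_err
```

Only the few-bit class index and the quantized error vector need to be transmitted; the decoder holds the same table of matrices and applies the indexed one.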

Another advantage of switched prediction is that it reduces the error propagation inherent in DPCM systems. In applying switched prediction, we observed that prediction errors can sometimes be larger than the signals themselves, although the average prediction gain of any class is always greater than unity. We can improve this situation by adding another class which can be considered to have a zero prediction matrix. A vector will be classified into this class if its norm is less than the norm of the smallest prediction error vector obtained using the non-zero prediction matrices. Each time this kind of input vector is detected, the vector is quantized directly and sent to the receiver. According to the class indicator, the receiver will either reconstruct the vector by adding the past value to the just-received prediction error vector, or it will set the reconstructed vector equal to the just-received vector. In the latter instance, the decoder's memory will be set equal to the encoder's memory, and the effects of channel bit errors prior to the current frame will be corrected.
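The zero-predictor class and its error-recovery behaviour can be sketched as follows: the encoder falls back to coding the vector directly whenever that is cheaper than the best prediction error, and the decoder resets its memory whenever it receives such a frame. The sketch builds on the `select_predictor` example above; `quantize` is a stand-in for whichever scalar or vector quantizer is in use, and all names are illustrative assumptions.

```python
import numpy as np

ZERO_CLASS = -1  # illustrative marker for the extra "zero prediction matrix" class

def encode_frame(x_cur, x_prev, predictors, quantize):
    """Choose between switched prediction and direct coding (the zero class)."""
    idx, err = select_predictor(x_cur, x_prev, predictors)
    if x_cur @ x_cur < err @ err:              # input norm < smallest error norm
        return ZERO_CLASS, quantize(x_cur)     # quantize the input vector directly
    return idx, quantize(err)                  # otherwise quantize the prediction error

def decode_frame(class_idx, q_vec, x_prev, predictors):
    """Reconstruct the vector; a zero-class frame resets the decoder memory."""
    if class_idx == ZERO_CLASS:
        return q_vec                           # memory reset: earlier channel errors die out
    return predictors[class_idx] @ x_prev + q_vec
```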

4. QUANTIZATION OF THE PREDICTION ERROR

In SIVP, the dynamic range of the components of the prediction error vector is greatly reduced in comparison with the range of the actual LPC parameters themselves. Furthermore, correlations between the different components of a prediction error vector are generally reduced as well. This property makes it feasible to use scalar quantization on each prediction error component, since the drop in performance will not be too great compared with using vector quantization.

With scalar quantization, each individual prediction error component is quantized by a scalar quantizer, and a nonuniform bit allocation is used for the different components according to the relative perceptual importance of the different line spectral frequencies. We have experimented with both fixed and adaptive bit allocation for the scalar quantization scheme.

Using a fixed bit allocation, the same bit assignment to each LSF is used for all speech frames. Since each prediction error component appears to have a Gamma-like distribution, we initially adopted the pdf-optimized nonuniform scalar quantizer designed for the Gamma distribution [7]. Subsequently, we found that Lloyd-Max optimal scalar quantizers give better performance for the error components.

In the adaptive scheme, the bit allocation is changed to distribute bits more heavily in perceptually-important bands of the speech spectrum. The properties of LSF's provide an easy way to perform the adaptive bit allocation.

In [5], a weighting factor proportional to the spectral sensitivity of each LSF is computed, and a weighted squared error criterion measuring the perceptual difference between the original and quantized LSP vector is defined as:

$$d_w^{(n)} = \sum_{i=1}^{m} \left[ w_n(i)\, q_n(i) \right]^2, \qquad (6)$$

where $q_n(i)$ denotes the quantization error of the $i$-th LSF in the $n$-th frame and $w_n(i)$ denotes the associated weighting factor. Our goal is to find the optimal bit allocation for a given vector such that $d_w^{(n)}$ is minimized. Since it is difficult to establish the exact relation between the instantaneous quantization error and the quantizer input, we use an expression similar to the standard formula for mean-squared quantization error [7] as an estimate of the squared quantization error. This estimate is expressed as:

$$\hat{q}_n^{\,2}(i) = \alpha\,\varepsilon^2\, 2^{-2 R_n(i)}\, \sigma_e^2(i), \qquad i = 1, \ldots, m, \qquad (7)$$

where $R_n(i)$ is the number of bits allocated to the $i$-th component and $\sigma_e(i)$ is the standard deviation of the $i$-th prediction error component (the quantizer input). Both $\alpha$ and $\varepsilon$ are constants depending on the pdf of the quantizer input signal. The value of $\alpha$ is adjusted empirically to find a better match between the estimate and the true value. Substituting Eq. (7) into Eq. (6), and minimizing $d_w^{(n)}$ subject to the constraint that the total number of bits per vector equals $R$, the bit allocation for the current LSP vector can be easily solved:

$$R_n(i) = \frac{R}{m} + \frac{1}{2}\log_2 \frac{w_n^2(i)\,\sigma_e^2(i)}{\left[\prod_{j=1}^{m} w_n^2(j)\,\sigma_e^2(j)\right]^{1/m}}. \qquad (8)$$
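For clarity, the form of Eq. (8) follows from Eqs. (6) and (7) by a standard Lagrange-multiplier argument; the intermediate steps below are a routine reconstruction rather than text from the paper. Writing $c_i = \alpha\,\varepsilon^2\, w_n^2(i)\, \sigma_e^2(i)$, the estimated distortion is $\sum_i c_i\, 2^{-2R_n(i)}$. Setting the derivative of the Lagrangian $\sum_i c_i\, 2^{-2R_n(i)} + \lambda\bigl(\sum_i R_n(i) - R\bigr)$ with respect to each $R_n(i)$ to zero gives $2\ln 2 \cdot c_i\, 2^{-2R_n(i)} = \lambda$, i.e., every component contributes the same distortion. Solving for $R_n(i)$ and eliminating $\lambda$ with the constraint $\sum_i R_n(i) = R$ yields Eq. (8); the constants $\alpha$ and $\varepsilon$ cancel, so only the weighted error variances $w_n^2(i)\,\sigma_e^2(i)$ matter.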

The $\sigma_e(i)$ values can be estimated off-line, and $w_n(i)$ is computed on-line from the given LSP vector. To avoid transmitting any side information describing the bit allocation, the predicted LSP vector (present in both the encoder and decoder) is used to compute the weighting factors.

If SIVP is combined with vector quantization, the total number of bits required for encoding the LPC parameters is reduced even further. To avoid excessively large VQ codebooks, we correspondingly increase the number of bits specifying a predictor and exploit the sparse property of the prediction matrices to keep the computational cost reasonable. The codebook design and search procedure are based on the weighted MSE distortion criterion.
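Returning to the scalar quantization scheme, the sketch below applies the adaptive bit allocation of Eq. (8): perceptually weighted, high-variance components receive more bits, and no side information is needed because the weights come from the predicted LSP vector available at both ends. The rounding step that turns the real-valued allocation into non-negative integers summing to the budget is a common heuristic assumed here, not a detail taken from the paper.

```python
import numpy as np

def adaptive_bit_allocation(weights, sigma_e, total_bits):
    """Per-component bit allocation from Eq. (8), rounded to a valid integer allocation.

    weights    : (m,) perceptual weights w_n(i), computed from the predicted LSP vector.
    sigma_e    : (m,) standard deviations of the prediction error components (off-line).
    total_bits : total bit budget R for the whole vector.
    """
    m = len(weights)
    c = (np.asarray(weights) * np.asarray(sigma_e)) ** 2      # w_n^2(i) * sigma_e^2(i)
    geo_mean = np.exp(np.mean(np.log(c)))                      # (prod_j c_j)^(1/m)
    r = total_bits / m + 0.5 * np.log2(c / geo_mean)           # real-valued allocation, Eq. (8)

    # Assumed rounding heuristic: clip negatives, then move single bits until the
    # integer allocation meets the budget exactly.
    bits = np.maximum(np.round(r), 0).astype(int)
    while bits.sum() > total_bits:
        candidates = np.where(bits > 0, bits - r, -np.inf)     # most over-allocated component
        bits[int(np.argmax(candidates))] -= 1
    while bits.sum() < total_bits:
        bits[int(np.argmin(bits - r))] += 1                    # most under-allocated component
    return bits
```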

Fig. 3 summarizes the objective performance (speech segmental SNR) for the different quantization methods. The correspondence between objective and subjective performance measures is not perfect, and listening tests should preferably be used for further optimization. Synthetic speech segments were produced using the unquantized residual, derived by inverse-filtering the speech waveform, to excite the quantized LPC synthesis filter. In the simulations, 11.3 minutes of sampled speech data were used as the training set for the predictor and quantizer design, and speech data outside of the training set were used for the performance tests. A four-class switched predictor was chosen. In Fig. 3, SQ-Gamma indicates that the Gamma scalar quantizer was used; correspondingly, SQ-Lloyd, SQ-Lloyd-ABA, and VQ designate, respectively, the Lloyd-Max scalar quantizer, the Lloyd-Max scalar quantizer incorporating the adaptive bit allocation, and the vector quantizer. Clearly, vector quantization outperforms all scalar quantizers at the same bit-rate. Adaptive bit allocation appears to be particularly effective for very low bit-rate encoding. Through informal listening tests, we found that when SIVP is combined with scalar quantization (SIVP-SQ), synthetic speech which is negligibly different from the original can be achieved using a total of 26 bits per frame (or 1.3 kbps) to encode the LPC spectral parameters. Combining SIVP with vector quantization (SIVP-VQ), a total of 20 bits per frame (or 1 kbps) is sufficient without introducing any perceivable quantization noise.

Figure 3: Speech SNRSEG vs. Rate (bits/frame) for SIVP-SQ and SIVP-VQ

5. COMPARISON WITH OTHER METHODS

Two different 4.8 kbps speech coder algorithms have been developed in our laboratory [1],[8]. In order to improve coded speech quality, intensive studies have been conducted on many different LPC parameter encoding methods. Among them, Two-Stage LSP VQ [8] and scalar quantization based on formant JND's [9] were found to be more effective at bit-rates appropriate for 4.8 kbps than other more established methods. In this section, we compare SIVP with the above two methods as well as several other LPC parameter encoding techniques. The four-class SIVP system incorporating the Lloyd-Max scalar quantizers with the fixed bit allocation is used for the comparison.

The trajectories of unquantized and quantized LSF's using SIVP and the Two-Stage LSP VQ at 20 bits per frame were shown in our previous paper [1], and the performance advantage of SIVP over the Two-Stage LSP VQ indicates that scalar quantization combined with interframe prediction can outperform memoryless vector quantization. Fig. 4 shows LSP trajectories obtained by SIVP and JND scalar quantization†, both using 24 bits per frame. Again, it is easy to observe that the quantized LSF's are closer to the original with SIVP.

Figure 4: Trajectories of the First Five Unquantized and Quantized LSF's (24 bits per frame): a) SIVP-SQ, b) JND Scalar Quantizer (time axis in frames)

† It should be noted that the original design of the JND quantization table utilized voiced speech only, and also that the training set used for the table design was taken from speech sampled at 10 kHz rather than 8 kHz as in our case. Therefore, the direct use of the table in our environment could lead to inferior quality.

Finally, Fig. 5 presents a plot of segmental SNR versus rate (bits/frame) for several LPC parameter quantization schemes applied to the Vector Excitation Coding (VXC) algorithm [1]. In this example, the LPC residual signal was vector quantized at 0.25 bits/sample and the pitch was quantized at 7 bits/frame. SIVP outperforms scalar quantization of inverse sine transformed reflection coefficients (SQTR), which makes no use of interframe prediction, by a substantial margin, particularly for a 100 Hz frame rate where higher frame-to-frame redundancy is present. As expected, however, the performance advantage diminishes at higher rates as transparent quantization of the parameters is approached. The Itakura-Saito (IS) method [10] is included in the figure as a reference condition.

Figure 5: Speech SNRSEG vs. Rate for VXC

6. CONCLUDING REMARKS

In this paper, we have described an efficient, low-complexity LPC parameter encoding method, SIVP, and demonstrated that it is indeed superior to other more established methods that do not use interframe prediction. The SIVP-SQ scheme has been incorporated into the 4.8 kbps speech coder algorithms developed in our laboratory; these algorithms will soon be tested in field trials for NASA's Mobile Satellite Experiment using a hardware prototype completed in our laboratory. Thus, SIVP has been validated in real-time hardware with regard to performance and complexity. The LSP parameter set has been found to be a particularly attractive choice for use with SIVP.

References

1. G. Davidson, M. Yong, and A. Gersho, "Real-Time Vector Excitation Coding of Speech at 4800 bps," Proc. ICASSP, pp. 2189-2192, Dallas, April 1987.

2. Y. Shoham, "Vector Predictive Quantization of the Spectral Parameters for Low Bit Rate Speech Coding," Proc. ICASSP, pp. 2181-2184, Dallas, April 1987.

3. V. Cuperman and A. Gersho, "Vector Predictive Coding of Speech at 16 kbits/s," IEEE Trans. on Communications, vol. COM-33, pp. 685-696, July 1985.

4. F. Itakura, "Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals," J. Acoust. Soc. Am., vol. 57, S35(A), 1975.

5. G. S. Kang and L. J. Fransen, "Low-Bit-Rate Speech Encoders Based on Line-Spectrum Frequencies (LSFs)," Naval Research Laboratory Report 8857, 1984.

6. …, "… ance and Autocorrelation …," Proc. ICASSP, pp. 1545-1548, Dallas, April 1987.

7. N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1984.

8. …, "… ," Proc. ICASSP, pp. 2185-….

9. O. Ghitza and J. L. Goldstein, "Scalar LPC Quantization Based on Formant JND's," IEEE Trans. on ASSP, vol. ASSP-34, no. 4, August 1986.

10. A. Buzo, A. H. Gray, Jr., R. M. Gray, and J. D. Markel, "Speech Coding Based upon Vector Quantization," IEEE Trans. on ASSP, vol. ASSP-28, no. 5, pp. 562-574, October 1980.