Reduced complexity two stage vector quantization



Digital Signal Processing 19 (2009) 476–490

Contents lists available at ScienceDirect

Digital Signal Processing

www.elsevier.com/locate/dsp

Reduced complexity two stage vector quantization

Saikat Chatterjee ∗, T.V. Sreenivas

Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore 560 012, India


Article history: Available online 3 December 2008

Keywords: Structured vector quantization; LSF parameter quantization; Weighted square Euclidean distance

We address the issue of complexity for vector quantization (VQ) of wide-band speech LSF (line spectrum frequency) parameters. The recently proposed switched split VQ (SSVQ) method provides better rate–distortion (R/D) performance than the traditional split VQ (SVQ) method at lower computational complexity, but at the expense of much higher memory. We develop the two stage SVQ (TsSVQ) method, by which we gain both memory and computational advantages while retaining good R/D performance. The proposed TsSVQ method uses a full dimensional quantizer in its first stage to exploit all the higher dimensional coding advantages, and then uses an SVQ method to quantize the residual vector in the second stage so as to reduce the complexity. We also develop a transform domain residual coding method within this two stage architecture that further reduces the computational complexity. To design an effective residual codebook in the second stage, variance normalization of Voronoi regions is carried out, which leads to the design of two new methods, referred to as normalized two stage SVQ (NTsSVQ) and normalized two stage transform domain SVQ (NTsTrSVQ). These two new methods have complementary strengths and hence are combined in a switched VQ mode, which further improves R/D performance while retaining the low complexity requirement. We evaluate the performance of the new methods for wide-band speech LSF parameter quantization and show their advantages over the established SVQ and SSVQ methods.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

Vector quantization (VQ) is a fundamental, yet powerful, technique for signal compression which holds the promise of approaching the rate–distortion (R/D) bound. In addition, VQ provides the opportunity to use a perceptually relevant distance measure instead of the usual square Euclidean distance (SED) measure. But the use of a full search VQ is limited by its enormous computational and memory complexities. The complexity issues become more serious for high quality (high bit-rate) applications, such as wide-band speech and audio. We have shown that at least 36 bits/vector are required for high quality quantization of wide-band speech LSF parameters [34]. At this bit-rate, the full search VQ incurs prohibitive complexity. Currently, to overcome this complexity limitation, there is much interest in designing efficient structured VQ methods for quantizing the wide-band speech LSF parameters [25,29,31,33].

For quantizing the telephone-band speech LSF parameters at a moderate complexity, the split VQ (SVQ) method was proposed by Paliwal and Atal [6]. This SVQ method has been a de-facto standard in several telephone-band speech coding applications, such as IS-136, G.723.1, etc. [19]. The use of SVQ has been further investigated for quantizing the LSF parameters of wide-band speech and audio signals [8,12,21]. In SVQ, the full LSF vector is split into sub-vectors and then each sub-vector is

* Corresponding author. Fax: +91 80 2360 0683. E-mail addresses: [email protected] (S. Chatterjee), [email protected] (T.V. Sreenivas).

1051-2004/$ – see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.dsp.2008.11.008


quantized independently. The approach of independent quantization leads to a coding loss, as the statistical inter-dependence between the sub-vectors is not exploited. This coding loss is referred to as split loss in the literature [26]. To recover the split loss, So and Paliwal [27,29,31,32] have recently proposed the switched split VQ (SSVQ) method, where multiple SVQs are designed for different regions of the signal pdf. The SSVQ is shown to provide better R/D performance than SVQ, even at reduced computational complexity, but at the expense of a higher memory requirement. In a comparative study of the wide-band speech LSF quantization problem, So and Paliwal [31] have shown the superiority of SSVQ over several existing quantization methods. The improved R/D performance of SSVQ can be attributed to its ability to utilize the code-vectors more efficiently within a local region of the signal pdf. However, the use of multiple SVQs results in a much higher memory requirement for the SSVQ method.

From the VQ literature [5], it is known that multistage VQ (MSVQ) is a powerful product code VQ method that helps to reduce both the computational complexity and the memory requirement. The use of MSVQ has been investigated for telephone-band speech LSF quantization in [7] and in the G.729 codec [19]. For wide-band speech, the AMR-WB codec [36] uses a split MSVQ (S-MSVQ) method to quantize the 16-dimensional immittance spectral frequency (ISF) parameters. In S-MSVQ, the full vector is split into sub-vectors and then each sub-vector is quantized independently using MSVQ. The combination of SVQ and MSVQ in S-MSVQ leads to a larger reduction in complexity, but at the cost of split loss. Unlike the combination of SVQ and MSVQ in S-MSVQ, we develop a two stage split VQ (TsSVQ) method in which the first stage quantizer is a full dimensional quantizer for exploiting the global dependencies existing between the vector components [2] and the second stage quantizer is an SVQ to reduce the complexity. The TsSVQ is further improved by exploiting the properties of the local region of a signal pdf space in a way different from SSVQ. We show that two new schemes can be developed using this approach, viz., normalized two stage split VQ (NTsSVQ) and normalized two stage transform domain split VQ (NTsTrSVQ). These two new schemes have complementary strengths: a low memory requirement in NTsSVQ and low computational complexity in NTsTrSVQ. A combination of both techniques results in the switched two stage SVQ (STsSVQ) method, which provides considerable improvement in R/D performance compared to the SVQ and SSVQ methods. Moreover, the computational complexity of STsSVQ is lower than that of SVQ and comparable to that of SSVQ, while the memory requirement is much lower than that of SSVQ. Hence, comparing overall R/D performance and complexity (both computational and memory), the STsSVQ method stands better than the SVQ and SSVQ methods.

The paper is organized as follows: a brief review of structured VQ methods, along with a literature survey of LSF quantization, is given in Section 2; Sections 3, 4, 5 and 6 describe respectively the proposed TsSVQ, NTsSVQ, NTsTrSVQ and STsSVQ methods. Experimental results for wide-band speech LSF quantization are provided in Section 7, along with a comparison with other established quantization schemes.

2. Structurally constrained VQ

Several structured VQ techniques have been developed [5] which apply various constraints to the VQ codebook and hence result in less complexity, although they show poorer R/D performance than a full search VQ. The important structured VQ methods are: tree-structured VQ, MSVQ, product VQ, classified VQ, etc. In tree-structured VQ, we exploit the inherent structure of the clusters in the vector space. The tree structure permits a parent node in the tree to encompass the vector space spanned by its child nodes, and thus the quantization search complexity is reduced through a vertical search of the tree rather than a horizontal search of all the leaf nodes. The tree structure can be balanced or unbalanced, with binary or ternary splits at each node, etc. Clearly, the price paid for reduced computational complexity is increased memory and poorer R/D performance than a full search VQ.

In MSVQ (also referred to as residual VQ), the quantization is carried out through multiple stages, where the residual vector of each stage is quantized by the next stage. For the same bit-rate, it can be shown that there is a reduction of both memory and computation in MSVQ, unlike the case of tree-structured VQ. Because of the residue operation, the later stages carry less and less redundancy and the residue pdfs tend to be uncorrelated. However, in each stage, the VQ is full dimensional, of the same dimension as the signal vector, exploiting the higher dimensional coding advantages [2]. It is well known that increasing the number of stages leads to a decrease in complexity, but at the expense of introducing more sub-optimality in the sense of degraded R/D performance [5].

In product VQ, the signal vector is decomposed into specific sub-vectors which can later be combined to recover the original vector exactly. The sub-vectors are usually of lower dimension or lower degree of freedom, which helps to reduce the computational complexity and memory requirement. Also, the sub-vectors are quantized independently, exploiting specific perceptual properties, as in shape-gain VQ [5]. The SVQ [6] is also a product VQ, in which the sub-vectors simply concatenate to form the full vector. Note that more partitions in SVQ would certainly solve the complexity problem, but might provide severely degraded R/D performance when there exists substantial statistical inter-dependence between the sub-vectors [5,26].

In classified VQ, we recognize the time-varying nature of the signal pdf, which is first determined using a classification scheme; then an optimum VQ is used to suit the localized signal pdf. This approach can be viewed as an adaptive scheme, where both forward and backward adaptations are possible. Often, forward adaptation is used for better performance. The recently proposed SSVQ [27,29,31,32] is such a forward adaptation scheme, by which better performance than SVQ (product VQ) is achieved, even with lower computation, but at the expense of higher memory. In general, it can be shown that adaptive quantizers can provide better R/D performance than fixed quantizers at the same


bit rate, but at the cost of higher system complexity. However, for SSVQ, the inherent product code structure reduces the search complexity because of the two-level operation.

It is clear that the four structured VQ schemes have complementary advantages and can be combined in different ways, leading to many different structured VQ methods. The SSVQ is such a scheme, combining the classification approach and the product VQ (SVQ) approach. There are other methods of combining, such as tree-structured VQ and MSVQ [5].

We recognize that a two stage VQ (in the multistage approach) can also be viewed as a classified VQ, exploiting the local signal pdf property. This two stage architecture was successfully used by Pan and Fischer [10,16] for achieving better R/D performance. Further, we can easily parameterize the local signal pdf (multi-variate distribution) and achieve significant advantages in R/D performance and computational complexity. While the product VQ (SVQ) approach is the most common complexity reduction approach for certain specific applications, we can expect better R/D performance by introducing transform domain quantization [35], which fully exploits the linear dependency (correlation) of the signal as well as reduces the computational complexity significantly. With these observations and ideas, we develop the new methods using a two stage architecture. The proposed TsSVQ is further improved using variance normalization which is tuned to the localized region of the signal vector space. The second stage quantizer is implemented using either SVQ, in the TsSVQ and NTsSVQ methods, or transform domain SVQ [35], in the NTsTrSVQ method. Further, the NTsSVQ and NTsTrSVQ methods are combined to form another new method, referred to as STsSVQ.

2.1. Literature review of LSF quantization

Various forms of sub-optimal structured VQs [4], such as tree-search VQ [3], classified VQ [14] and MSVQ [7], have been proposed in the past for telephone-band speech LSF coding and then further extended to the case of wide-band speech [9,18]. The most cited, successful and practically used technique of LSF quantization is SVQ [6]. A two sub-vector SVQ is proposed in [6] which showed better R/D performance than a two stage VQ (i.e. the MSVQ method with a two stage implementation), even at lower computational complexity and memory requirement. But this two sub-vector SVQ method still renders huge computational complexity, and hence a three sub-vector SVQ technique is used in different telephone-band speech coders, such as IS-136, G.723.1, etc. [19]. For wide-band speech, Lefebvre et al. [8] and Chen et al. [12] used a seven part SVQ operating at 49 bits/frame to quantize 16 LPC parameters; high quality coding results were reported by Biundo et al. [21] for a four or five sub-vector SVQ at 45 bits/frame. For telephone-band speech, an efficient MSVQ, with M-candidate search and joint design of codebooks, is proposed in [7] which shows better R/D performance than a two sub-vector SVQ method. Further research is still going on to improve the performance and robustness of the MSVQ method, such as in [23]. Among the recent techniques, a split-multistage VQ (S-MSVQ) with an MA predictor is used to quantize the LPC coefficients in the adaptive multi-rate wide-band speech coder (AMR-WB) [36]. A GMM based quantization method [22] has been proposed for LSF quantization in telephone-band speech coding, which shows R/D performance comparable with the two sub-vector SVQ method [6] at a fraction of the computational complexity and memory requirement. This GMM based method has been further extended to the multi-frame GMM based block quantization method for coding the LSF parameters of both telephone-band and wide-band speech [28,30].
The multi-frame GMM based block quantization method provides improved R/D performance, but at the expense of higher coding delay, which may not be acceptable in two-way communication applications. Other reported quantization schemes include the predictive Trellis coded quantizer [25] and the HMM based recursive quantizer [24]. The SSVQ was proposed for telephone-band speech in [27,32] and further extended to wide-band speech in [29,31]. As already mentioned, the SSVQ provides better R/D performance than the traditional SVQ method, even at lower computational complexity, but at the expense of a considerable increment in memory. It is shown in [32] that a three-part SSVQ with 8 switching directions is the best coder, trading off R/D performance, computational complexity and memory requirement for telephone-band speech; the three-part SSVQ with 8 switching directions performs better than a memory-less GMM based block quantizer [32]. However, for wide-band speech coding, it is shown in [31] that the five-part SSVQ with 8 switching directions is slightly inferior to the memory-less GMM based block quantizer in the sense of R/D performance; but the five-part SSVQ requires nearly one-tenth of the complexity of the GMM based block quantizer [31].

3. Two stage split VQ

In a traditional two stage VQ method, both stages use full dimensional VQ. Let us assume that the first stage quantizer (Q1) is allocated b_0 bits, by which M Voronoi regions (i.e. M = 2^{b_0}) are formed in the original vector space, whose centroids are the reconstruction vectors {μ_k}_{k=1}^{M}. Unlike a basic two stage VQ method, in which the residual vector, U = X − μ_k, is coded directly using a full dimensional quantizer in the second stage (Q2), we split the residual vector into smaller dimensions and then use an SVQ technique for quantization; this proposed method is referred to as two stage split VQ (TsSVQ). Thus, all the higher dimensional coding advantages [2] of VQ are exploited in the first stage (Q1) using a full dimensional VQ, and the requirements of lower computational complexity and memory are met by using an SVQ method in the second stage (Q2). In the second stage, the residual vector is statistically less correlated and hence the use of an SVQ technique for quantizing the residual vector will not result in a considerable amount of split loss.

For the MSVQ method, it can be shown (using Lagrange analysis) that the complexity (both computational and memory) becomes minimum when the bits are allocated uniformly across the stages. Also, the overall complexity decreases with an increasing number of stages, but at the cost of degraded R/D performance. Thus, we restrict the TsSVQ method to two stages. In the TsSVQ method, we are able to exploit both the full dimensional signal pdf structure at Q1, using the full dimensional quantizer, and the lower complexity of the SVQ at Q2. The use of a full dimensional quantizer in the first stage allows us to exploit the global dependencies existing between the vector components [32]. Now, considering the strategy of allocating the bits between the two stages, we can see that if a higher number of bits is allocated to the first stage, then the performance of the developed method tends towards that of a full search VQ and hence the method may provide better R/D performance at a particular bit-rate. However, to minimize the computational complexity, we can determine the optimum bit allocation strategy, as shown in Section 3.3.

Fig. 1. Two stage split VQ (TsSVQ) method where the second stage quantizer (Q2) is an SVQ.

A block diagram of the TsSVQ method is shown in Fig. 1, consisting of two parts: a full dimensional quantizer in Q1 and an SVQ quantizer in Q2. Let X be the p-dimensional vector which is quantized to μ_k using Q1 and thus belongs to the kth Voronoi region of the signal vector space. We note that Q1 acts as a Voronoi region selection block in the full dimensional signal space. The residual vector in the second stage is given as

U = X − μ_k,    (1)

where μ_k is the mean vector of the kth Voronoi region. In the second stage, the residual vector U is quantized using an SVQ. Thus, the quantization information consists of μ_k and U.
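The two stage encode/decode flow of Eq. (1) and Fig. 1 can be sketched in a few lines. This is an illustrative NumPy sketch, not the authors' implementation; the function names (`tssvq_encode`, `tssvq_decode`) and the `(start, end)` split convention are our own assumptions, and plain SED is used in both stages for simplicity.

```python
import numpy as np

def tssvq_encode(x, stage1_codebook, stage2_codebooks, splits):
    """Two stage split VQ encode (illustrative sketch).

    stage1_codebook : (M, p) array of Voronoi region means {mu_k}.
    stage2_codebooks: list of S codebooks, one per residual sub-vector.
    splits          : list of S (start, end) index pairs covering 0..p.
    """
    # Stage 1 (Q1): nearest region mean under the plain SED measure.
    k = int(np.argmin(((stage1_codebook - x) ** 2).sum(axis=1)))
    u = x - stage1_codebook[k]                    # residual vector, Eq. (1)
    # Stage 2 (Q2): each residual sub-vector is quantized independently (SVQ).
    idx = [int(np.argmin(((cb - u[a:b]) ** 2).sum(axis=1)))
           for (a, b), cb in zip(splits, stage2_codebooks)]
    return k, idx

def tssvq_decode(k, idx, stage1_codebook, stage2_codebooks, splits):
    """Reconstruct x_hat = mu_k + coded residual."""
    x_hat = stage1_codebook[k].copy()
    for (a, b), cb, i in zip(splits, stage2_codebooks, idx):
        x_hat[a:b] += cb[i]                       # add back the coded residual
    return x_hat
```

The transmitted information is the region index k plus one codebook index per sub-vector, matching the μ_k and U description above.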

3.1. TsSVQ codebook training

The LBG algorithm [1] is first applied on the full training database to produce the M optimum code-vectors of Q1, i.e. {μ_k}_{k=1}^{M}. All the training vectors are then quantized at Q1 using the SED measure and the residual vector training database is created using Eq. (1). The SVQ codebook for the second stage quantizer (Q2) is designed using the LBG algorithm, where the residual vector (U) is split into S sub-vectors.
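The training procedure above can be sketched as follows, using a plain k-means style codebook update loop under the SED measure. This is a simplified stand-in for the full LBG algorithm [1] (the splitting initialization is omitted), and all function names are hypothetical.

```python
import numpy as np

def lbg(train, n_codevectors, n_iter=20, seed=0):
    """Simplified k-means style LBG codebook design under the SED measure."""
    rng = np.random.default_rng(seed)
    cb = train[rng.choice(len(train), n_codevectors, replace=False)].copy()
    for _ in range(n_iter):
        # Nearest-neighbour partition of the training set.
        labels = ((train[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(n_codevectors):
            if np.any(labels == k):               # skip empty cells
                cb[k] = train[labels == k].mean(axis=0)
    return cb

def train_tssvq(train, b0, splits, bits):
    """Design the Q1 codebook, then one residual codebook per sub-vector."""
    cb1 = lbg(train, 2 ** b0)
    labels = ((train[:, None, :] - cb1[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    residuals = train - cb1[labels]               # residual database, Eq. (1)
    cb2 = [lbg(residuals[:, a:b], 2 ** bi)
           for (a, b), bi in zip(splits, bits)]
    return cb1, cb2
```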

3.2. Weighted square Euclidean distance

In this paper, we evaluate the performance of the proposed methods for LSF quantization. In the context of LSF quantization, it is common to use the weighted square Euclidean distance (WSED) measure [6,11]. For the nth speech frame, let us introduce the subscript 'n' to denote the LSF vector as X_n. Now, the WSED between the input LSF vector (X_n) and the coded LSF vector (X̂_n) is defined as

d(X_n, X̂_n) = [X_n − X̂_n]^T W_n [X_n − X̂_n] = Σ_{i=1}^{p} w_{n,i} (X_{n,i} − X̂_{n,i})^2,    (2)

where W_n is a diagonal weighting matrix with diagonal elements {w_{n,i}}_{i=1}^{p}. In the case of LSF quantization, W_n is dependent on the nth LSF vector X_n. Throughout this paper, we use the spectral sensitivity coefficients as the weighting values [11]. Note that the WSED is a separable distance measure and hence can be easily used for quantizing the sub-vectors in the SVQ method.
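The separability of Eq. (2) with a diagonal W_n can be checked numerically; a quick illustrative sketch (the split point and random data are arbitrary, not from the paper):

```python
import numpy as np

def wsed(x, x_hat, w):
    """WSED of Eq. (2), with the diagonal of Wn given as the vector w."""
    return float(np.sum(w * (x - x_hat) ** 2))

rng = np.random.default_rng(0)
p = 6
x, x_hat = rng.normal(size=p), rng.normal(size=p)
w = rng.uniform(0.5, 2.0, size=p)

# Separability: the full-vector WSED equals the sum of sub-vector WSEDs,
# so each SVQ split can be searched independently.
full = wsed(x, x_hat, w)
split = wsed(x[:3], x_hat[:3], w[:3]) + wsed(x[3:], x_hat[3:], w[3:])
assert abs(full - split) < 1e-12
```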

From Eq. (1), we observe that the nth LSF vector can be written as X_n = U_n + μ_k; so the decoded vector at the receiver is realized as X̂_n = Û_n + μ_k. Thus, for the nth LSF vector, we can simplify the WSED measure of Eq. (2) in terms of the original and coded residual vectors (U_n and Û_n) at the second stage as

d(X_n, X̂_n) = [X_n − X̂_n]^T W_n [X_n − X̂_n]
            = [(U_n + μ_k) − (Û_n + μ_k)]^T W_n [(U_n + μ_k) − (Û_n + μ_k)]
            = [U_n − Û_n]^T W_n [U_n − Û_n].    (3)

Thus, the second stage residual vector is quantized using the following WSED measure:

d(U_n, Û_n) = [U_n − Û_n]^T W_n [U_n − Û_n] = Σ_{i=1}^{p} w_{n,i} (U_{n,i} − Û_{n,i})^2.    (4)

From the VQ literature [5], it is well known that the quantization error incurred in the last stage of the MSVQ method is the overall quantization error of the vector to be coded. Thus, for encoding a vector, if the distance measure used is the


WSED measure of Eq. (2), then in the case of the TsSVQ method the Voronoi region is selected using the common SED measure in Q1 and the residual vector (U_n) is coded using the WSED measure of Eq. (4) in Q2.

3.3. Complexity of TsSVQ with optimum bit allocation

For a p-dimensional vector, let the bits allocated to Q1 be b_0 (i.e. M = 2^{b_0}); also, let the sub-vector dimensions in Q2 be {p_i}_{i=1}^{S} (S sub-vectors in Q2), with corresponding bit allocations {b_i}_{i=1}^{S}, such that p = Σ_{i=1}^{S} p_i; if the total allocation is b bits/vector, then b = Σ_{i=0}^{S} b_i. Using the WSED of Eq. (2) as the distance measure to encode a vector, the total required computation (in flops),1 for the Voronoi region search using the SED measure, mean subtraction to produce the residual vector, the SVQ codebook search using the WSED measure of Eq. (4), and mean addition for reproduction, is given as

C_TsSVQ = (3p 2^{b_0} + 2^{b_0}) + p + Σ_{i=1}^{S} (4p_i 2^{b_i} + 2^{b_i}) + p
        = 2p + (3p + 1) 2^{b_0} + Σ_{i=1}^{S} (4p_i + 1) 2^{b_i}.    (5)

The optimum bit allocation is decided by minimizing the total required computational complexity subject to the constraintof fixed bit budget as follows:

min_{b_i} { C_TsSVQ = 2p + (3p + 1) 2^{b_0} + Σ_{i=1}^{S} (4p_i + 1) 2^{b_i} }   subject to   Σ_{i=0}^{S} b_i = b.    (6)

The optimum bit allocation scheme that minimizes the total required computational complexity C_TsSVQ, subject to the constraint of a fixed bit budget, Σ_{i=0}^{S} b_i = b, is given as

b_0 = (1/(S + 1)) [ b + log2( (3p + 1) Π_{j=1}^{S} (4p_j + 1) ) ] − log2(3p + 1),

b_i = (1/(S + 1)) [ b + log2( (3p + 1) Π_{j=1}^{S} (4p_j + 1) ) ] − log2(4p_i + 1),   1 ≤ i ≤ S.    (7)

Proof. Let us consider the Lagrangian

L = C_TsSVQ + λ( Σ_{i=0}^{S} b_i − b ) = 2p + (3p + 1) 2^{b_0} + Σ_{i=1}^{S} (4p_i + 1) 2^{b_i} + λ( Σ_{i=0}^{S} b_i − b ).

Setting the partial derivatives ∂L/∂b_i = 0, we get

b_0 = log2[−λ/ln 2] − log2(3p + 1),

and

b_i = log2[−λ/ln 2] − log2(4p_i + 1),   1 ≤ i ≤ S.

Summing over all i, using Σ_{i=0}^{S} b_i = b and the dummy variable j, we get

Σ_{i=0}^{S} b_i = (S + 1) log2[−λ/ln 2] − log2[ (3p + 1) Π_{j=1}^{S} (4p_j + 1) ]

or

log2[−λ/ln 2] = (1/(S + 1)) [ b + log2[ (3p + 1) Π_{j=1}^{S} (4p_j + 1) ] ].

1 It is assumed that each operation (addition, subtraction, multiplication, division or comparison) needs one floating point operation (flop). Suppose the SED measure is used; then calculating the distance between the vector and a code-vector needs p subtractions, p squaring operations and p additions (it may seem to be p − 1 additions, but in a loop implementation it is p additions). Also, for a b bit quantizer, there are 2^{b} code-vectors and thus 2^{b} comparisons are necessary. Thus the total computation required is 3p 2^{b} + 2^{b} flops. For the WSED measure, the total computation required is 4p 2^{b} + 2^{b} flops.


Substituting log2[−λ/ln 2] appropriately, we get the optimum bit allocation formula shown in Eq. (7).
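Eq. (7) is straightforward to evaluate. The helper below is our own illustrative sketch (not from the paper); it returns the real-valued optimum allocation, which in practice must be rounded to integers.

```python
import math

def optimum_bits(b, p, dims):
    """Real-valued optimum bit allocation of Eq. (7) for TsSVQ.

    b    : total bit budget per vector,
    p    : full vector dimension (the sum of dims),
    dims : second stage sub-vector dimensions {p_i}.
    """
    S = len(dims)
    common = (b + math.log2((3 * p + 1) * math.prod(4 * pi + 1 for pi in dims))) / (S + 1)
    b0 = common - math.log2(3 * p + 1)           # first stage bits
    bi = [common - math.log2(4 * pi + 1) for pi in dims]
    return b0, bi
```

By construction b_0 + Σ b_i recovers the budget b exactly, and since C_TsSVQ is strictly convex in the bit variables, the stationary point of Eq. (7) is the unique minimizer.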

On the other hand, the total required memory (in floats) to store the Voronoi region mean vectors (in Q1) and the SVQ codebook (in Q2) is given as

M_TsSVQ = p 2^{b_0} + Σ_{i=1}^{S} p_i 2^{b_i}.    (8)
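Eqs. (5) and (8) are easy to evaluate numerically; a small illustrative helper (the function name is our own):

```python
def tssvq_complexity(p, b0, dims, bits):
    """Flops of Eq. (5) and memory in floats of Eq. (8) for TsSVQ."""
    flops = 2 * p + (3 * p + 1) * 2 ** b0 \
        + sum((4 * pi + 1) * 2 ** bi for pi, bi in zip(dims, bits))
    floats = p * 2 ** b0 + sum(pi * 2 ** bi for pi, bi in zip(dims, bits))
    return flops, floats
```

For the p = 10, b = 20 configuration worked out below (b_0 = 6, two sub-vectors of dimension 5 with 7 bits each), this gives 7380 flops and 1920 floats, against (4p + 1) 2^{20} flops and p 2^{20} floats for the full search VQ.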

To quantize a p-dimensional input vector with b allocated bits/vector, a full search VQ requires (3p + 1) 2^{b} flops using SED and (4p + 1) 2^{b} flops using WSED for searching the nearest code-vector among 2^{b} code-vectors. Also, the full search VQ needs p 2^{b} floats of memory to store the codebook. For example, if p = 10 and b = 20 (i.e. 2 bits/sample), then, for a full search VQ, the required search complexity using the WSED measure is (4 × 10 + 1) 2^{20} flops and the required memory is 10 × 2^{20} floats; both are prohibitively large. On the other hand, for the TsSVQ method, suppose the residual vector in the second stage is split into two sub-vectors, each of dimension 5. Let the bit allocation to Q1 be 6 and to each of the sub-vectors be 7. Then the required computational complexity using the WSED measure is (2 × 10) + [(3 × 10 + 1) × 2^6] + [(4 × 5 + 1) × 2^7 + (4 × 5 + 1) × 2^7] = 7380 flops (≈ 7.2 × 2^{10} flops). The required memory is 10 × 2^6 + (5 × 2^7 + 5 × 2^7) = 1920 floats. This example shows that the TsSVQ reduces the computational complexity and memory requirement to a small fraction of those of a full search VQ. □

4. Normalized two stage split VQ

In the first stage of the TsSVQ method, the codebook consists of the M mean vectors corresponding to the M Voronoi regions in the signal vector space. Thus, the residual vector database, at the output of the first stage, can be depicted as a union of all the mean removed Voronoi regions having different covariance structures. Naturally, to design an effective residual codebook at the second stage, it is necessary to incorporate a variance normalization procedure so that all the mean removed Voronoi regions have unit variance along all dimensions. For example, Fig. 2(a) shows a data distribution consisting of two Gaussians. Suppose the first stage quantizer is allocated 1 bit; the corresponding codebook then consists of two code-vectors, which are the mean vectors of the two Gaussians. Now, the residual vector database consists of two mean removed Gaussians, as shown in Fig. 2(b), where the denser region depicts the Gaussian with the lower covariance spread. If the second stage quantizer is allocated 5 bits, then 32 code-vectors are designed using the residual vector database; the designed code-vectors are also shown in Fig. 2(b). It is seen that the Gaussian with the smaller covariance spread is unable to use all the code-vectors effectively. Therefore, a variance normalization procedure, specific to each of the Gaussians (i.e. to each of the first stage Voronoi regions), is carried out to get a compact residual vector database such that each of the mean removed Voronoi regions has unit variance along all dimensions, facilitating the design of an effective second stage codebook. The variance normalized residual vector database and the associated code-vectors are shown in Fig. 2(c), which illustrates the importance of the Voronoi region specific variance normalization procedure. Thus, we improve the performance of the TsSVQ method by proposing the normalized two stage split VQ (NTsSVQ) method.

Let X be the p-dimensional vector which is quantized to μ_k using Q1 and thus belongs to the kth Voronoi region of the signal vector space. Then, the variance normalized residual vector in the second stage is given as

V = Λ_k U = Λ_k [X − μ_k],    (9)


Fig. 2. Importance of variance normalization of Voronoi regions in the NTsSVQ method. (a) A pdf consisting of two Voronoi regions characterized by two distinct Gaussians; red colored, thick dots represent the Gaussian with the lower covariance spread. (b) Residual vector database consisting of two mean removed Voronoi regions and the associated 32 code-vectors, represented by '∗' marks. (c) Residual vector database consisting of two mean removed and variance normalized Voronoi regions and the associated 32 code-vectors, represented by '∗' marks. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

482 S. Chatterjee, T.V. Sreenivas / Digital Signal Processing 19 (2009) 476–490

where μk and Λk are respectively the mean vector and variance normalization matrix of the kth Voronoi region; Λk = diag[1/σk,1, ..., 1/σk,p], where σk,1, ..., σk,p are the standard deviations of the data in the kth Voronoi region. Thus, the quantization information consists of μk and V. From Eq. (9), we can write the original vector as X = [Λk]−1 V + μk and thus, the decoded vector is realized at the receiver as X̂ = [Λk]−1 V̂ + μk. At the receiver, the decoder first decodes the index of μk (i.e. the value of k) and then a lookup table is searched to find the diagonal values of the Λk matrix for reconstructing the quantized vector.
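The encode/decode chain of Eq. (9) can be sketched in a few lines; the following is a minimal Python/NumPy illustration with hypothetical function names, assuming the first stage means and per-region standard deviations have already been trained:

```python
import numpy as np

def encode_stage1(x, means):
    """Nearest first stage code-vector (Voronoi region index k) under SED."""
    return int(np.argmin(((means - x) ** 2).sum(axis=1)))

def normalize_residual(x, mu_k, sigma_k):
    """Eq. (9): V = Lambda_k (X - mu_k) with Lambda_k = diag(1/sigma_k,i)."""
    return (x - mu_k) / sigma_k

def reconstruct(v_hat, mu_k, sigma_k):
    """Decoder: X_hat = Lambda_k^{-1} V_hat + mu_k."""
    return v_hat * sigma_k + mu_k
```

With no second stage quantization error (V̂ = V), the chain is lossless, mirroring the identity X = [Λk]−1 V + μk above.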

4.1. NTsSVQ codebook training

The LBG algorithm [1] is first applied on the full training database to produce the M optimum code-vectors of Q1, i.e. {μk}, k = 1, ..., M. All the training vectors are then classified using the SED measure and the Voronoi region specific variance normalization matrices, {Λk}, k = 1, ..., M, are evaluated using the classified data. Then the new training database of normalized residual vectors is created using Eq. (9). For the second stage quantizer, the SVQ codebook is designed using the LBG algorithm, where the normalized residual vector (V) is split into S sub-vectors.
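The classification and normalization steps above can be sketched as follows (Python/NumPy, hypothetical names; the first stage means are assumed to come from an LBG/k-means run):

```python
import numpy as np

def build_normalized_residual_db(train, means, eps=1e-12):
    """Classify training vectors to Voronoi regions under SED, estimate the
    per-region standard deviations, and return the normalized residual
    database together with the decoder's lookup-table statistics."""
    d2 = ((train[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sigmas = np.empty_like(means)
    residuals = np.empty_like(train)
    for k in range(means.shape[0]):
        idx = labels == k
        sigmas[k] = train[idx].std(axis=0) + eps  # guard against zero variance
        residuals[idx] = (train[idx] - means[k]) / sigmas[k]
    return residuals, labels, sigmas
```

The second stage SVQ codebooks are then trained (again with LBG) on splits of `residuals`.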

4.2. Voronoi region specific WSED

Let us use the subscript ‘n’ to denote the nth speech frame LSF vector as Xn. For quantizing the input vector using the WSED measure of Eq. (2), a new distance measure is derived for coding the normalized residual vector Vn in the second stage of the NTsSVQ method. We simplify the WSED measure of Eq. (2) in terms of the original and coded normalized residual vectors (Vn and V̂n) as

d(Xn, X̂n) = [Xn − X̂n]T Wn [Xn − X̂n]
          = [([Λk]−1 Vn + μk) − ([Λk]−1 V̂n + μk)]T Wn [([Λk]−1 Vn + μk) − ([Λk]−1 V̂n + μk)]
          = [Vn − V̂n]T [[Λk]−1]T Wn [Λk]−1 [Vn − V̂n]
          = [Vn − V̂n]T On,k [Vn − V̂n], (10)

where On,k = [[Λk]−1]T Wn [Λk]−1. Thus, the second stage residual vector is required to be quantized using the following WSED measure:

d(Vn, V̂n) = [Vn − V̂n]T On,k [Vn − V̂n]. (11)

In Eq. (11), the new weighting matrix On,k depends on the nth vector weighting matrix (Wn) and the variance normalization matrix of the kth Voronoi region (Λk). Both Wn and Λk are diagonal and hence the matrix On,k is also diagonal. Therefore, the distance measure of Eq. (11) is further simplified as

d(Vn, V̂n) = [Vn − V̂n]T On,k [Vn − V̂n] = Σ_{i=1}^{p} on,k,i (Vn,i − V̂n,i)², (12)

where on,k,i = σ²k,i wn,i, i = 1, ..., p, are the new Voronoi region specific weighting coefficients. For quantizing the nth LSF vector which belongs to the kth Voronoi region, this modified WSED measure is used in the second stage to code the normalized residual vector coefficients. Thus, the index of the Voronoi region mean vector is found using the SED measure in Q1, and the normalized residual vector (Vn) is quantized in the second stage using the Voronoi region specific WSED measure of Eq. (12). An important point is that the WSED measure of Eq. (12) remains a separable distance measure and thus can be used for SVQ in the second stage.
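The equivalence in Eq. (10)–(12) is easy to check numerically; the sketch below (Python/NumPy, hypothetical names) shows that the region-specific weighted distance on the normalized residuals reproduces the original-domain WSED exactly:

```python
import numpy as np

def region_wsed(v, v_hat, w_n, sigma_k):
    """Eq. (12): sum_i o_{n,k,i} (V_i - V_hat_i)^2 with o_{n,k,i} = sigma_{k,i}^2 w_{n,i}."""
    o = (sigma_k ** 2) * w_n
    d = v - v_hat
    return float((o * d * d).sum())
```

Because Vn − V̂n = [Λk](Xn − X̂n), the σ² factors cancel the normalization and the distance equals [Xn − X̂n]T Wn [Xn − X̂n].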

4.3. Complexity of NTsSVQ

The computational steps associated with the NTsSVQ method are: Voronoi region search using the SED measure at the first stage; Voronoi region mean vector subtraction; variance normalization (i.e. dividing by the standard deviation values); finding the Voronoi region specific weights on,k,i from the σk,i and wn,i values; SVQ codebook search using the WSED measure of Eq. (12) at the second stage; multiplication by the standard deviation values to realize [Λk]−1; and Voronoi region mean vector addition for reconstruction.

For a p-dimensional vector, let the bits allocated to Q1 be b0 (i.e. M = 2^b0); the sub-vector dimensions in Q2 are p1, ..., pS (i.e. S sub-vectors at Q2) with corresponding bit allocations b1, ..., bS such that p = Σ_{i=1}^{S} pi. If the total bit allocation per vector is b, then b = b0 + Σ_{i=1}^{S} bi. Using the WSED as the distance measure to encode a vector, the required computation (in flops) is given as



Fig. 3. Importance of transform domain variance normalization of Voronoi regions in the NTsTrSVQ method. (a) A pdf consisting of two Voronoi regions characterized by two distinct correlated Gaussians; red colored, thick dots represent the Gaussian with lower covariance spread. (b) Residual vector database consisting of two mean removed Voronoi regions and the associated 32 code-vectors represented by ‘∗’ marks. (c) Residual vector database consisting of two mean removed, transformed and variance normalized Voronoi regions and the associated 32 code-vectors represented by ‘∗’ marks. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

CNTsSVQ = (3p·2^b0 + 2^b0) + p + p + 2p + Σ_{i=1}^{S} (4pi·2^bi + 2^bi) + p + p
        = 6p + (3p + 1)·2^b0 + Σ_{i=1}^{S} (4pi + 1)·2^bi. (13)

The above computational complexity is nearly the same as CTsSVQ of Eq. (5) and hence the optimum bit allocation for NTsSVQ is similar to that of TsSVQ, as shown in Eq. (7).

It is now necessary to store the {Λk} matrices, k = 1, ..., M, in a lookup table, where M = 2^b0. All the Λk matrices are diagonal and hence it suffices to store M p-dimensional standard deviation vectors. The total memory required for storing the Voronoi region mean vectors, standard deviation vectors and the SVQ codebook (in Q2) is (in floats) given as

MNTsSVQ = p·2^b0 + p·2^b0 + Σ_{i=1}^{S} pi·2^bi = 2p·2^b0 + Σ_{i=1}^{S} pi·2^bi. (14)
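Eqs. (13) and (14) can be evaluated directly; the sketch below (Python, hypothetical function name) reproduces the 46 bits/vector NTsSVQ configuration reported later in Table 2:

```python
def ntssvq_complexity(p, b0, split_dims, split_bits):
    """Eq. (13) flop count and Eq. (14) memory (floats) for NTsSVQ."""
    flops = 6 * p + (3 * p + 1) * 2 ** b0 + sum(
        (4 * pi + 1) * 2 ** bi for pi, bi in zip(split_dims, split_bits))
    floats = 2 * p * 2 ** b0 + sum(
        pi * 2 ** bi for pi, bi in zip(split_dims, split_bits))
    return flops, floats
```

For p = 16, b0 = 6, split dimensions (3,3,3,3,4) and bit allocation (8,8,8,8,8), this gives 20896 flops and 6144 floats per vector, i.e. about 20.89 kflops and 6.14 kfloats.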

5. Normalized two stage transform domain split VQ

In the context of emphasizing the importance of Voronoi region normalization, Fig. 3(a) shows a data distribution consisting of two correlated Gaussians. Suppose the first stage quantizer is allocated 1 bit; the associated codebook thus consists of two code-vectors which are the mean vectors of the two correlated Gaussians. Therefore, the residual vector database, collected at the output of the first stage, consists of two mean removed correlated Gaussians, as shown in Fig. 3(b), where the denser region depicts the Gaussian with lower covariance spread. Suppose the second stage quantizer is allocated 5 bits; the 32 code-vectors designed using the residual vector database are shown in Fig. 3(b). It is seen that the smaller Gaussian is unable to use all the code-vectors effectively. Thus, the use of a de-correlating transform (such as the Karhunen–Loeve transform (KLT)), followed by variance normalization specific to each of the Gaussians (i.e. to each of the Voronoi regions), provides a compact residual vector database for designing an effective second stage codebook. We use a Voronoi region specific KLT for de-correlation and thus expect higher transform coding gain, as each KLT is tuned to the source distribution of the corresponding Voronoi region. In the second stage, the transformed vector can therefore be split into sub-vectors and coded using the SVQ technique. It was recently shown in [35] that a transform domain SVQ (TrSVQ) method reduces computational complexity, because the transform domain vector can be split into a higher number of sub-vectors without incurring the split loss; thus, we use the TrSVQ method for quantizing the residual vector in the second stage. The transformed and variance normalized residual database and the associated code-vectors are shown in Fig. 3(c), which illustrates the importance of applying the Voronoi region specific KLT followed by variance normalization. Thus, we develop a new method which is referred to as normalized two stage transform domain split VQ (NTsTrSVQ).

If X is the p-dimensional vector which is quantized to μk using Q1 (i.e. X belongs to the kth Voronoi region), then the transformed and variance normalized residual vector is given as

Z = Ψk Tk U = Ψk Tk [X − μk], (15)

where μk, Tk and Ψk are respectively the mean vector, KLT matrix and variance normalization matrix associated with the kth Voronoi region; Ψk = diag[1/√λk,1, ..., 1/√λk,p], where λk,1, ..., λk,p are the eigen-values of the covariance matrix for the kth Voronoi region.


Thus, the quantized information consists of μk and Z. From Eq. (15), we can write the original vector as X = [Ψk Tk]−1 Z + μk and thus, the decoded vector at the receiver is realized as X̂ = [Ψk Tk]−1 Ẑ + μk.
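The transform/inverse pair of Eq. (15) can be sketched as follows (Python/NumPy, hypothetical names; T_k is orthogonal with the KLT basis vectors as rows and lam_k holds the region eigen-values):

```python
import numpy as np

def transform_residual(x, mu_k, T_k, lam_k):
    """Eq. (15): Z = Psi_k T_k (X - mu_k), Psi_k = diag(1/sqrt(lambda_k,i))."""
    return (T_k @ (x - mu_k)) / np.sqrt(lam_k)

def inverse_transform(z_hat, mu_k, T_k, lam_k):
    """Decoder: X_hat = (Psi_k T_k)^{-1} Z_hat + mu_k; since T_k is
    orthogonal, the inverse is T_k^T diag(sqrt(lambda_k,i))."""
    return T_k.T @ (np.sqrt(lam_k) * z_hat) + mu_k
```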

5.1. NTsTrSVQ codebook training

The LBG algorithm [1] is first applied on the full training database to produce the M optimum code-vectors of Q1, i.e. {μk}, k = 1, ..., M. All the training vectors are then classified using the SED measure, and the Voronoi region specific KLT matrices, {Tk}, and variance normalization matrices, {Ψk}, k = 1, ..., M, are evaluated using the classified data. Then the training vectors are transformed using Eq. (15) to create a new training database of transform domain, variance normalized residual vectors. For a particular Voronoi region, the KLT is ordered so that the eigen-values are in descending order; hence, for designing the SVQ codebook in the second stage, the transformed vector (Z) is split into S sub-vectors in such a way that the variance based bit allocation results in a nearly uniform bit allocation to minimize the complexity [35].
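Per-region KLT estimation can be sketched as follows (Python/NumPy, hypothetical names):

```python
import numpy as np

def region_klt(region_data):
    """Eigen-decompose a region's covariance; order eigen-values descending
    so that variance-based bit allocation sees the components largest-first."""
    mu = region_data.mean(axis=0)
    cov = np.cov(region_data, rowvar=False)
    lam, E = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(lam)[::-1]       # re-order: descending eigen-values
    lam, E = lam[order], E[:, order]
    return mu, E.T, lam                 # rows of T_k are KLT basis vectors
```

By construction, Tk diagonalizes the region covariance, so the transformed components are uncorrelated with variances λk,i.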

5.2. Voronoi region specific WSED

Let us again use the subscript ‘n’ to denote the nth speech frame LSF vector as Xn. For quantizing the input vector using the WSED measure of Eq. (2), a new distance measure is derived for coding the transformed and variance normalized residual vector Zn in the second stage of the NTsTrSVQ method. We simplify the WSED measure of Eq. (2) in terms of the original and coded transformed and variance normalized residual vectors (Zn and Ẑn) as

d(Xn, X̂n) = [Xn − X̂n]T Wn [Xn − X̂n]
          = [Zn − Ẑn]T [[Ψk Tk]−1]T Wn [Ψk Tk]−1 [Zn − Ẑn]
          = [Zn − Ẑn]T [[Ψk]−1]T [[Tk]−1]T Wn [Tk]−1 [Ψk]−1 [Zn − Ẑn]
          = [Zn − Ẑn]T Rn,k [Zn − Ẑn], (16)

where Rn,k = [[Ψk]−1]T [[Tk]−1]T Wn [Tk]−1 [Ψk]−1. Thus, the second stage residual vector is required to be quantized using the following distance measure:

d(Zn, Ẑn) = [Zn − Ẑn]T Rn,k [Zn − Ẑn]. (17)

In Eq. (17), the new weighting matrix Rn,k depends on the nth vector weighting matrix (Wn) and on the KLT matrix and diagonal variance normalization matrix associated with the kth Voronoi region (Tk and Ψk). The matrix Rn,k is not diagonal (as Tk is a full matrix) and hence the distance measure of Eq. (17) is not separable; it is therefore not amenable to quantizing the transform domain sub-vectors using the SVQ technique. However, if Wn is forced to be an identity matrix (i.e. the distance measure to code Xn is taken as the common SED, so Wn = I), then the new weighting matrix becomes diagonal and simplifies to Rn,k = [[Ψk]−1]T [Ψk]−1 = diag[λk,1, ..., λk,p] (as Tk is orthogonal and Ψk is diagonal). Therefore, the distance measure of Eq. (17) is further simplified as

d(Zn, Ẑn) = [Zn − Ẑn]T Rn,k [Zn − Ẑn] = Σ_{i=1}^{p} rn,k,i (Zn,i − Ẑn,i)², (18)

where rn,k,i = λk,i, i = 1, ..., p, are the new Voronoi region specific weighting coefficients. This modified WSED measure is used by the TrSVQ method in the second stage to quantize the transformed residual vector. Therefore, the index of the Voronoi region mean vector is found using the SED measure at Q1, whereas the transformed and normalized residual vector (Zn) is quantized using the Voronoi region specific WSED measure of Eq. (18) in the second stage. It is noted that, for quantizing Xn, the NTsTrSVQ method is unable to use the WSED measure of Eq. (2) and is forced to use the common SED measure as the underlying distance measure. Thus, even though the NTsTrSVQ method uses a de-correlating transform, its average performance may suffer because of using the sub-optimum SED measure for LSF quantization.
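The Wn = I simplification of Eq. (18) is easy to verify numerically (Python/NumPy sketch, hypothetical names): the eigen-value weighted distance on Z equals the plain SED on X, because Tk is orthogonal and the √λ factors cancel the normalization:

```python
import numpy as np

def transform_wsed(z, z_hat, lam_k):
    """Eq. (18): sum_i lambda_{k,i} (Z_i - Z_hat_i)^2 (the W_n = I case)."""
    d = z - z_hat
    return float((lam_k * d * d).sum())
```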

5.3. Complexity of NTsTrSVQ

The computational steps associated with the NTsTrSVQ method are: Voronoi region search using the SED measure at the first stage; Voronoi region mean subtraction; Voronoi region specific KLT transformation and division by the standard deviation values; finding the Voronoi region specific weights λk,i from the standard deviation values; SVQ codebook search using the WSED measure of Eq. (18) at the second stage; multiplication by the standard deviation values, inverse KLT transformation and mean addition for reproduction. Using the SED measure at the first stage and the WSED measure at the second stage, the required computation (in flops) is given as


Fig. 4. Switched two stage split VQ (STsSVQ) method. Both the NTsSVQ and NTsTrSVQ methods are used to code the input vector and the vector with minimum WSED is chosen; 1 flag bit is transmitted for indexing the chosen method.

CNTsTrSVQ = (3p·2^b0 + 2^b0) + p + 2p² + p + p + Σ_{i=1}^{S} (4qi·2^bi + 2^bi) + p + 2p² + p
          = 5p + 4p² + (3p + 1)·2^b0 + Σ_{i=1}^{S} (4qi + 1)·2^bi, (19)

where the bits allocated to Q1 are b0 (i.e. M = 2^b0) and the sub-vector dimensions in the second stage are q1, ..., qS (S sub-vectors for TrSVQ in the second stage) with corresponding variance based bit allocations b1, ..., bS such that p = Σ_{i=1}^{S} qi; if the total bit allocation per vector is b, then b = b0 + Σ_{i=1}^{S} bi. On the other hand, the memory required to store the mean vectors, standard deviation values and KLT matrices of the M Voronoi regions is (2p + p²)·2^b0 floats, and the memory required to store the TrSVQ codebook in the second stage is Σ_{i=1}^{S} qi·2^bi floats. Therefore, the total required memory (in floats) is given as

MNTsTrSVQ = (2p + p²)·2^b0 + Σ_{i=1}^{S} qi·2^bi. (20)

The use of the transform helps us exploit the linear redundancy between sub-vectors and thus allows the SVQ technique to be used with a larger number of sub-vectors, leading to a further reduction in computational complexity. But the memory complexity increases because of the requirement of storing the Voronoi region specific KLT matrices.
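Analogously to NTsSVQ, Eqs. (19) and (20) evaluate directly; the sketch below (Python, hypothetical function name) reproduces the 46 bits/vector NTsTrSVQ configuration reported later in Table 3:

```python
def ntstrsvq_complexity(p, b0, split_dims, split_bits):
    """Eq. (19) flop count and Eq. (20) memory (floats) for NTsTrSVQ."""
    flops = 5 * p + 4 * p ** 2 + (3 * p + 1) * 2 ** b0 + sum(
        (4 * qi + 1) * 2 ** bi for qi, bi in zip(split_dims, split_bits))
    floats = (2 * p + p ** 2) * 2 ** b0 + sum(
        qi * 2 ** bi for qi, bi in zip(split_dims, split_bits))
    return flops, floats
```

For p = 16, b0 = 6, split dimensions (2,2,2,3,3,4) and bits (7,6,6,8,7,6), this gives 12624 flops and 20352 floats per vector, i.e. roughly 12.62 kflops and 20.35 kfloats.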

6. Switched two stage SVQ

As discussed in the previous sections, the two new schemes have complementary strengths: low memory requirement in NTsSVQ and low computational complexity in NTsTrSVQ. Even though the use of the optimum transform in NTsTrSVQ guarantees a coding gain, the method may not provide considerable improvement in LSF quantization performance because it uses the sub-optimum distance measure. Although we expect comparable average performances of the NTsSVQ and NTsTrSVQ methods, their data dependent performances can be exploited more efficiently by switching between them in a closed loop manner; we refer to this new method as switched two stage SVQ (STsSVQ). In STsSVQ, the input vector is coded using both the NTsSVQ and NTsTrSVQ methods; the method whose reproduction vector best approximates the input vector, in terms of the least distortion, is chosen and a flag bit pointing to the chosen method is transmitted. A block diagram of the STsSVQ method is shown in Fig. 4. The STsSVQ method may be seen as a multiple coding scheme where more than one coding method is applied in switched mode to achieve better R/D performance. Multiple coding methods, such as multiple transform domain compression schemes, have already been proposed in the literature [13,15,17,20]. In the STsSVQ method, we use the WSED measure of Eq. (2) to determine the best quantized vector among the two quantized vectors produced by the NTsSVQ and NTsTrSVQ methods.

We note that both the NTsSVQ and NTsTrSVQ methods use the Voronoi region selection block as their first stage quantizer and thus, Q1 is common to both methods. Hence, the overall complexity (computational and memory) of the STsSVQ method is slightly lower than the sum of the complexities of the NTsSVQ and NTsTrSVQ methods.
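The closed-loop switch itself is a simple selection; the sketch below (plain Python, hypothetical names) codes the input with every candidate method and returns the winning flag bit and reproduction:

```python
def stssvq_encode(x, encoders, distance):
    """Code x with each candidate (here: the NTsSVQ and NTsTrSVQ encode/decode
    chains), measure each reproduction with the common WSED, and keep the
    best; the flag bit indexes the chosen method for the decoder."""
    recons = [enc(x) for enc in encoders]
    dists = [distance(x, r) for r in recons]
    flag = dists.index(min(dists))
    return flag, recons[flag]
```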

7. Quantization results

We evaluate the proposed methods for wide-band speech LSF parameter quantization. To measure the LSF quantization performance, we use the traditional measure of spectral distortion (SD) [6]. For the nth frame, the SD (in dB) is defined as

SDn = { (1/2π) ∫_{−π}^{π} [10 log10 Pn(Ω) − 10 log10 P̂n(Ω)]² dΩ }^{1/2}, (21)

where Pn(Ω) and P̂n(Ω) are the LP filter power spectra using, respectively, the original and quantized LSF parameters. A low average SD along with a minimum number of high SD outliers is considered necessary for good LSF quantization performance [18,29,31,33,34].
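In practice, Eq. (21) is evaluated on a discrete frequency grid. A sketch (Python/NumPy, hypothetical names), assuming LP coefficient vectors a = [1, a1, ..., ap] so that P(Ω) = 1/|A(e^{jΩ})|²:

```python
import numpy as np

def spectral_distortion(a, a_hat, nfft=512):
    """Eq. (21) on a uniform grid: RMS difference (dB) of the LP power
    spectra 10*log10(1/|A|^2) for original and quantized coefficients."""
    P = -20.0 * np.log10(np.abs(np.fft.rfft(a, nfft)))
    P_hat = -20.0 * np.log10(np.abs(np.fft.rfft(a_hat, nfft)))
    return float(np.sqrt(np.mean((P - P_hat) ** 2)))
```

The grid evaluation approximates the integral; identical coefficient vectors give zero distortion.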


The speech data used in the experiments is from the TIMIT database. The specification of the AMR-WB speech codec [36] is used to compute the 16th order LPCs, which are then converted to LSFs. We briefly describe the LPC analysis method in the AMR-WB speech codec [36]. The 16 kHz speech is processed in two sub-bands, 0.05–6.4 kHz and 6.4–7 kHz, to allocate the bits optimally according to the subjective importance of the lower band. In the lower band, speech is down-sampled to 12.8 kHz and then filtered through a high-pass filter, followed by a pre-emphasis filter. The pre-emphasis filter removes the spectral tilt of the speech spectrum and emphasizes the higher frequency formants for better subjective quality. Short-term prediction analysis is performed once per speech frame using the autocorrelation approach with a 30 ms asymmetric window. An overhead of 5 ms is used in the autocorrelation computation with 60 Hz bandwidth expansion. Though immittance spectral frequency (ISF) parameters are used in the AMR-WB speech coder [36], we use LSF parameters in this paper; it was shown in [31] that the LSF parameters are better amenable to vector quantization. We use 361,046 LSF vectors as training data and 87,961 LSF vectors as test data (distinct from the training data).

7.1. Performance of TsSVQ method

In the case of the TsSVQ method, the second stage quantizer (Q2) is designed using the SVQ technique where the 16-dimensional residual vector is split into five parts of (3,3,3,3,4)-dimensional sub-vectors. For an allocated b = 46 bits/vector, we compute² the optimum bit allocation to Q1 and to the split sub-vectors at Q2, using Eq. (7), as:

b0|opt = 6.13, b1|opt = 8.05, b2|opt = 8.05, b3|opt = 8.05, b4|opt = 8.05 and b5|opt = 7.66.

In the sense of minimum computational complexity, we may choose the optimum b0 as 6 using integer bit allocation (i.e. Mopt = 2⁶ = 64).

Table 1 shows the R/D performance, computational complexity and memory requirements of the TsSVQ method using a varying number of Voronoi regions (i.e. for different M, where M = 2^b0) at Q1. The bit allocation for Q1 and the split sub-vectors of the residual vector at Q2 is also shown. It is observed from Table 1 that an increase in the number of Voronoi regions leads to lower distortion at any bits/vector, as illustrated in Fig. 5. From Fig. 5, it is observed that at b0 = 6 (i.e. M = 64), the TsSVQ method provides an optimum trade-off between R/D performance, computational complexity and memory requirement. Hence, we choose the optimum number of Voronoi regions as 64, which is also used further for the NTsSVQ, NTsTrSVQ and STsSVQ methods.

7.2. Performance of NTsSVQ, NTsTrSVQ and STsSVQ methods

In the case of the NTsSVQ method, the second stage quantizer is designed using the SVQ technique where the variance normalized residual vector (V) is split into five parts of (3,3,3,3,4)-dimensional sub-vectors. For the NTsSVQ method, the simple SED measure is used in Q1 and the Voronoi region specific WSED measure of Eq. (12) is used in the second stage. Table 2 shows the R/D performance, computational complexity and memory requirements of the NTsSVQ method using 64 Voronoi regions at Q1. The bit allocation for Q1 and the split sub-vectors of the normalized residual vector in the second stage is also shown. Comparing Table 1 (at M = 64) and Table 2, it is observed that the variance normalization of Voronoi regions in the NTsSVQ method improves the R/D performance and saves nearly 1 bit/vector (in the sense of average SD) compared to the TsSVQ method, but at the expense of a moderate increase in memory.

In the NTsTrSVQ method, the second stage quantizer is designed using the SVQ technique where the KLT transformed and variance normalized residual vector (Z) is split into six parts of (2,2,2,3,3,4)-dimensional sub-vectors. An important point to mention is that further splitting would lead to poorer R/D performance [35]. In the NTsTrSVQ method, the common SED is used in Q1 and the Voronoi region specific WSED of Eq. (18) is used in the second stage. Table 3 shows the R/D performance, computational complexity and memory requirement of the NTsTrSVQ method using 64 Voronoi regions at Q1. The bit allocation for Q1 and the split sub-vectors of the normalized residual vector in the second stage is also shown. Comparing Table 1 and Table 3 (at M = 64), it is observed that the NTsTrSVQ method provides better R/D performance than the TsSVQ method and saves nearly 1 bit/vector (in the sense of average SD), even at considerably lower computational complexity, but at the expense of higher memory. However, the inability of the NTsTrSVQ method to use the WSED measure of Eq. (2) results in an increase of high distortion outliers. In the sense of average SD, we note that the use of the Voronoi region specific KLT followed by variance normalization provides for an efficient implementation of SVQ in the transform domain and thus reduces the computational complexity, but at the expense of higher memory due to the requirement of storing the KLTs.

In the case of the STsSVQ method, both the NTsSVQ and NTsTrSVQ methods are applied together and the best coding scheme is chosen for each input vector. We implement the STsSVQ method where the first stage quantizer (Q1) is common to both the NTsSVQ and NTsTrSVQ methods. In the second stage quantizer, a five part SVQ codebook is designed for the NTsSVQ method and a six part SVQ codebook is designed for the NTsTrSVQ method. The R/D performance, computational complexity and memory

² For wide-band speech LSF quantization, the allocated 46 bits/vector is the maximum number of bits/vector which we have experimented with and hence, we expect the maximum complexity at this bitrate.


Table 1
Performance of the two stage split VQ (TsSVQ) method at different numbers of Voronoi regions in the first stage.

Bits/vector (b0,b1,...,b5)   Avg. SD (dB)   SD outliers 2–4 dB (%)   SD outliers >4 dB (%)   kflops/vector (CPU)   kfloats/vector (ROM)

Number of Voronoi regions, M = 16; b0 = 4
42 (4,7,8,8,8,7)   1.233   3.38   0.001   14.64   3.45
43 (4,8,8,8,8,7)   1.199   2.98   0.001   16.30   3.84
44 (4,8,8,8,8,8)   1.128   1.69   0.000   18.48   4.35
45 (4,8,9,8,8,8)   1.091   1.39   0.000   21.80   5.12
46 (4,8,9,9,8,8)   1.054   1.10   0.000   25.13   5.88

Number of Voronoi regions, M = 32; b0 = 5
42 (5,7,8,8,7,7)   1.211   3.16   0.001   13.76   3.32
43 (5,7,8,8,8,7)   1.171   2.53   0.001   15.42   3.71
44 (5,8,8,8,8,7)   1.138   2.25   0.001   17.08   4.09
45 (5,8,8,8,8,8)   1.070   1.28   0.000   19.26   4.60
46 (5,8,9,8,8,8)   1.036   1.07   0.000   22.59   5.37

Number of Voronoi regions, M = 64; b0 = 6
42 (6,7,8,7,7,7)   1.188   2.80   0.003   13.66   3.45
43 (6,7,8,8,7,7)   1.148   2.28   0.002   15.32   3.84
44 (6,7,8,8,8,7)   1.109   1.79   0.002   16.99   4.22
45 (6,8,8,8,8,7)   1.076   1.58   0.001   18.65   4.60
46 (6,8,8,8,8,8)   1.012   0.87   0.000   20.83   5.12

Number of Voronoi regions, M = 128; b0 = 7
42 (7,7,7,7,7,7)   1.162   2.19   0.003   15.13   4.09
43 (7,7,8,7,7,7)   1.126   1.84   0.003   16.80   4.48
44 (7,7,8,8,7,7)   1.089   1.47   0.003   18.46   4.86
45 (7,7,8,8,8,7)   1.053   1.21   0.002   20.12   5.24
46 (7,8,8,8,8,7)   1.023   1.07   0.002   21.79   5.63

Number of Voronoi regions, M = 256; b0 = 8
42 (8,7,7,7,7,6)   1.171   2.73   0.001   20.32   5.88
43 (8,7,7,7,7,7)   1.102   1.63   0.000   21.40   6.14
44 (8,7,8,7,7,7)   1.069   1.38   0.000   23.07   6.52
45 (8,7,8,8,7,7)   1.034   1.13   0.000   24.73   6.91
46 (8,7,8,8,8,7)   0.999   0.89   0.000   26.40   7.29

Fig. 5. Choice of optimum number of Voronoi regions for the two stage split VQ (TsSVQ) method: increasing the number of bits for the first stage reduces the distortion. At b0 = 6 (M = 64), TsSVQ provides an optimum trade-off between rate–distortion performance, computational complexity and memory requirement.

requirement of the STsSVQ method, for M = 64, are shown in Table 4. We also show the percentages of vectors that are quantized using either the NTsSVQ method or the NTsTrSVQ method. It is noted that both methods are used nearly equally often in evaluating the best quantized vector. Comparing Table 2, Table 3 and Table 4, it is observed that the STsSVQ method provides improved R/D performance compared to the other two methods, but at the expense of higher complexity. The STsSVQ method saves more than 1 bit/vector compared to both the NTsSVQ and NTsTrSVQ methods.


Table 2
Performance of the normalized two stage split VQ (NTsSVQ) method (at M = 64).

Bits/vector (b0,b1,...,b5)   Avg. SD (dB)   SD outliers 2–4 dB (%)   SD outliers >4 dB (%)   kflops/vector (CPU)   kfloats/vector (ROM)
42 (6,7,8,7,7,7)   1.139   2.15   0.004   13.72   4.48
43 (6,7,8,8,7,7)   1.100   1.73   0.004   15.39   4.86
44 (6,7,8,8,8,7)   1.063   1.34   0.003   17.05   5.24
45 (6,8,8,8,8,7)   1.032   1.17   0.003   18.72   5.63
46 (6,8,8,8,8,8)   0.972   0.62   0.002   20.89   6.14

Table 3
Performance of the normalized two stage transform domain split VQ (NTsTrSVQ) method (at M = 64).

Bits/vector (b0,b1,...,b6)   Avg. SD (dB)   SD outliers 2–4 dB (%)   SD outliers >4 dB (%)   kflops/vector (CPU)   kfloats/vector (ROM)
42 (6,6,6,5,7,6,6)   1.159   3.71   0.026   9.26    19.58
43 (6,6,6,6,7,6,6)   1.116   2.95   0.022   9.55    19.64
44 (6,6,6,6,8,6,6)   1.077   2.38   0.019   11.21   20.03
45 (6,6,6,6,8,7,6)   1.036   1.87   0.014   12.04   20.22
46 (6,7,6,6,8,7,6)   0.997   1.50   0.009   12.62   20.35

Table 4
Performance of the switched two stage split VQ (STsSVQ) method (at M = 64).

Bits/vector   Avg. SD (dB)   SD outliers 2–4 dB (%)   SD outliers >4 dB (%)   kflops/vector (CPU)   kfloats/vector (ROM)   % use NTsSVQ   % use NTsTrSVQ
42   1.081   1.26   0.010   17.76   22.52   52.79   47.20
43   1.043   1.04   0.004   19.96   23.04   51.48   48.51
44   1.004   0.76   0.004   21.92   23.48   50.99   49.00
45   0.968   0.58   0.003   25.24   24.25   50.96   49.03
46   0.935   0.44   0.003   27.74   24.83   49.75   50.24

Table 5
Performance of the traditional five-part split VQ (SVQ) method.

Bits/vector (allocation to sub-vectors)   Avg. SD (dB)   SD outliers 2–4 dB (%)   SD outliers >4 dB (%)   kflops/vector (CPU)   kfloats/vector (ROM)
42 (8,9,9,8,8)    1.258   2.63   0.000   24.32   5.63
43 (8,9,9,9,8)    1.214   2.02   0.000   27.64   6.40
44 (9,9,9,9,8)    1.182   1.83   0.000   30.97   7.16
45 (9,9,9,9,9)    1.116   0.97   0.000   35.32   8.19
46 (9,10,9,9,9)   1.074   0.74   0.000   41.98   9.72

7.3. Comparison with established quantization methods

The proposed methods are compared to the traditional SVQ and recently proposed SSVQ methods. For both the SVQ and SSVQ methods, we use the WSED measure of Eq. (2).

In the case of the SVQ method, the 16-dimensional LSF vector is split into five parts of (3,3,3,3,4)-dimensional sub-vectors.³ The performance is shown in Table 5, along with the bit allocation to the sub-vectors.

Five part SSVQ [29,31], with 8 and 16 switch directions, is implemented and the performance is shown in Table 6. The bit allocation to the switching selector and the split sub-vectors is also shown in Table 6, following the SSVQ implementation in [31]. It is observed that there is a steep increase in the memory requirement as the number of switch directions is increased.

We compare the SVQ, SSVQ (with 8 switch directions), TsSVQ and STsSVQ methods in Fig. 6. It is observed that the TsSVQ method provides a considerable improvement in R/D performance over the SVQ method, even at nearly half the computational complexity and memory requirement. The STsSVQ method saves more than 4 bits/vector compared to SVQ, even at lower computational complexity, but at the expense of higher memory. Compared with the SSVQ method, the

³ Five part SVQ is also implemented in [29,31] for comparison with five part SSVQ for the wide-band speech coding application.


Table 6
Performance of the recently proposed five-part switched split VQ (SSVQ) method.

Bits/vector (allocation to switch direction and sub-vectors)   Avg. SD (dB)   SD outliers 2–4 dB (%)   SD outliers >4 dB (%)   kflops/vector (CPU)   kfloats/vector (ROM)

Number of switch directions is 8
42 (3,7,7,8,8,9)   1.123   1.28   0.001   19.08   34.94
43 (3,7,8,8,8,9)   1.071   0.85   0.001   20.74   38.01
44 (3,8,8,8,8,9)   1.036   0.72   0.000   22.40   41.08
45 (3,8,8,8,9,9)   1.003   0.59   0.000   25.73   47.23
46 (3,8,8,9,9,9)   0.967   0.46   0.000   29.06   53.37

Number of switch directions is 16
42 (4,6,7,8,8,9)   1.101   1.06   0.001   18.64   66.81
43 (4,7,7,8,8,9)   1.057   0.83   0.001   19.47   69.88
44 (4,7,8,8,8,9)   1.009   0.54   0.000   21.13   76.03
45 (4,8,8,8,8,9)   0.976   0.46   0.000   22.80   82.17
46 (4,8,8,8,9,9)   0.945   0.37   0.000   26.12   94.46

Fig. 6. Comparison of performance between the split VQ (SVQ), switched split VQ (SSVQ), two stage split VQ (TsSVQ) and switched two stage split VQ (STsSVQ) methods.

STsSVQ method saves nearly 1 bit/vector at a much lower memory requirement, while the computational complexity is nearly the same. Thus, the STsSVQ method may be considered an effective structured VQ method for wide-band speech LSF quantization.

8. Conclusions

In this paper, we discuss the limitations of using direct full search VQ in practical applications. Structurally constrained/product code VQs, such as tree-structured VQ, multistage VQ, split VQ and switched split VQ, alleviate the complexity problem, but introduce sub-optimality in the sense of degraded rate–distortion performance. In order to address the computational complexity and memory requirement, we propose a new product code VQ method referred to as the two stage split vector quantizer (TsSVQ). In this two stage architecture, the first stage quantizer exploits all the higher dimensional coding advantages, and the incorporation of the SVQ technique in the second stage dramatically reduces the computational complexity and memory requirement. For further improvement in rate–distortion performance, a variance normalization procedure is introduced to design an effective second stage codebook. We also investigate the use of a multiple coding based technique by designing a switched two stage split VQ (STsSVQ) method.

Our new methods are evaluated for wide-band speech LSF quantization. It is shown that the proposed methods outperform the traditional split VQ (SVQ) method in all aspects. Also, the new STsSVQ method provides improved rate–distortion performance over the recently proposed switched split VQ (SSVQ) method, even at a much lower memory requirement.

Acknowledgments

We thank the two anonymous reviewers for their detailed comments, which have been extremely helpful in improvingthe clarity and quality of this paper.

References

[1] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantizer design, IEEE Trans. Commun. 28 (1980) 84–95.


[2] J. Makhoul, S. Roucos, H. Gish, Vector quantization in speech coding, Proc. IEEE 73 (11) (1985) 1551–1588.

[3] N. Phamdo, N. Farvardin, Coding of speech LSP parameters using TSVQ with interblock noiseless coding, in: Proc. ICASSP, vol. 1, April 1990, pp. 193–196.

[4] R. Laroia, N. Phamdo, N. Farvardin, Robust and efficient quantization of speech LSP parameters using structured vector quantizers, in: Proc. ICASSP, 1991, pp. 641–644.

[5] A. Gersho, R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic, 1992.

[6] K.K. Paliwal, B.S. Atal, Efficient vector quantization of LPC parameters at 24 bits/frame, IEEE Trans. Speech Audio Process. 1 (1) (1993) 3–14.

[7] W.F. LeBlanc, B. Bhattacharya, S.A. Mahmoud, V. Cuperman, Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding, IEEE Trans. Speech Audio Process. 1 (4) (1993) 373–385.

[8] R. Lefebvre, R. Salami, C. Laflamme, J.P. Adoul, High quality coding of wide-band audio signals using transform coded excitation (TCX), in: Proc. ICASSP, 1994, pp. 193–196.

[9] A. Ubale, A. Gersho, A multi-band CELP wide-band speech coder, in: Proc. ICASSP, 1994, pp. 1367–1370.

[10] J. Pan, T.R. Fischer, Two-stage vector quantization-lattice vector quantization, IEEE Trans. Inform. Theory 41 (1) (1995) 155–163.

[11] W.R. Gardner, B.D. Rao, Theoretical analysis of the high-rate vector quantization of LPC parameters, IEEE Trans. Speech Audio Process. 3 (5) (1995) 367–381.

[12] J.H. Chen, D. Wang, Transform predictive coding of wide-band speech signals, in: Proc. ICASSP, 1996, pp. 275–278.

[13] A. Ramaswamy, W.B. Mikhael, A mixed transform approach for efficient compression of medical images, IEEE Trans. Med. Imaging 15 (3) (1996) 343–352.

[14] D. Chang, S. Ann, C.W. Lee, A classified vector quantization of LSF parameters, Signal Process. 59 (1997) 267–273.

[15] A.P. Berg, W.B. Mikhael, An efficient structure and algorithm for image representation using nonorthogonal basis images, IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process. 44 (10) (1997) 818–828.

[16] J. Pan, T.R. Fischer, Vector quantization of speech line spectrum pair parameters and reflection coefficients, IEEE Trans. Speech Audio Process. 6 (2) (1998) 106–115.

[17] A.P. Berg, W.B. Mikhael, A survey of mixed transform techniques for speech and image coding, in: Proc. IEEE International Symposium on Circuits and Systems, ISCAS'99, vol. 4, 1999, pp. 106–109.

[18] G. Guibe, H.T. How, L. Hanzo, Speech spectral quantizers for wide-band speech coding, Eur. Trans. Telecommun. 12 (6) (2001) 535–545.

[19] L. Hanzo, F.C.A. Somerville, J.P. Woodard, Voice Compression and Communications: Principles and Applications for Fixed and Wireless Channels, IEEE Press, New York, 2001.

[20] W.B. Mikhael, V. Krishnan, Energy-based split vector quantizer employing signal representation in multiple transform domains, Digit. Signal Process. 11 (4) (2001) 359–370.

[21] G. Biundo, S. Grassi, M. Ansorge, F. Pellandini, P.A. Farine, Design techniques for spectral quantization in wide-band speech coding, in: Proc. 3rd COST 276 Workshop on Information and Knowledge Management for Integrated Media Communication, Budapest, Oct. 2002, pp. 114–119.

[22] A.D. Subramaniam, B.D. Rao, PDF optimized parametric vector quantization of speech line spectral frequencies, IEEE Trans. Speech Audio Process. 11 (2) (2003) 130–142.

[23] V. Krishnan, D.V. Anderson, K.K. Truong, Optimal multistage vector quantization of LPC parameters over noisy channels, IEEE Trans. Speech Audio Process. 12 (1) (2004) 1–8.

[24] E.R. Duni, A.D. Subramaniam, B.D. Rao, Improved quantisation structures using generalised HMM modelling with application to wide-band speech coding, in: Proc. ICASSP, May 2004, pp. 161–164.

[25] Y. Shin, S. Kang, T.R. Fischer, C. Son, Y. Lee, Low-complexity predictive trellis coded quantization of wide-band speech LSF parameters, in: Proc. ICASSP, May 2004, pp. 145–148.

[26] F. Norden, T. Eriksson, On split quantization of LSF parameters, in: Proc. ICASSP, vol. 1, May 2004, pp. I-157–160.

[27] S. So, K.K. Paliwal, Efficient vector quantisation of line spectral frequencies using the switched split vector quantiser, in: Proc. Int. Conf. Spoken Language Process., Jeju, Korea, Oct. 2004.

[28] S. So, K.K. Paliwal, Multi-frame GMM-based block quantisation of line spectral frequencies for wide-band speech coding, in: Proc. ICASSP, vol. I, Philadelphia, Mar. 2005, pp. 121–124.

[29] S. So, K.K. Paliwal, Switched split vector quantisation of line spectral frequencies for wide-band speech coding, in: Proc. INTERSPEECH, Lisbon, Portugal, Sept. 2005, pp. 2705–2708.

[30] S. So, K.K. Paliwal, Multi-frame GMM-based block quantisation of line spectral frequencies, Speech Commun. 47 (2005) 265–276.

[31] S. So, K.K. Paliwal, A comparative study of LPC parameter representations and quantisation schemes for wide-band speech coding, Digit. Signal Process. 17 (1) (2007) 114–137.

[32] S. So, K.K. Paliwal, Efficient product code vector quantisation using the switched split vector quantiser, Digit. Signal Process. 17 (1) (2007) 138–171.

[33] S. Chatterjee, T.V. Sreenivas, Normalized two stage SVQ for minimum complexity wide-band LSF quantization, in: Proc. EUROSPEECH (INTERSPEECH), Antwerp, Belgium, Aug. 2007, pp. 1657–1660.

[34] S. Chatterjee, T.V. Sreenivas, Predicting VQ performance bound for LSF coding, IEEE Signal Process. Lett. 15 (2008) 166–169.

[35] S. Chatterjee, T.V. Sreenivas, Optimum transform domain split VQ, IEEE Signal Process. Lett. 15 (2008) 285–288.

[36] AMR wide-band speech codec, transcoding functions (Release 5), 3GPP TS 26.190 V 5.1.0.

Saikat Chatterjee was born in India in 1977. He received his bachelor's and master's degrees in engineering, both from Jadavpur University, India, and then submitted his doctoral thesis at the Indian Institute of Science, India, in 2008. Currently, he is working as a postdoctoral researcher at the Royal Institute of Technology (KTH), Sweden. His research interests include source coding, joint source-channel coding, estimation theory, speech enhancement and auditory motivated signal processing for speech recognition.

Thippur Sreenivas graduated from Bangalore University in 1973, obtained his M.E. from the Indian Institute of Science (IISc), Bangalore, in 1975, and his Ph.D. degree from the Indian Institute of Technology, Bombay, in 1981, working as a Research Scholar at the Tata Institute of Fundamental Research, Bombay.