Adaptive Algorithm for Speech Compression Using Cosine Packet Transform


  • 1-4244-1355-9/07/$25.00 ©2007 IEEE

    International Conference on Intelligent and Advanced Systems 2007


    Adaptive Algorithm for Speech Compression using Cosine Packet Transform

    P.Prakasam and M.Madheswaran Center for Advanced Research, Department of Electronics and Communication Engineering

    Muthayammal Engineering College, Rasipuram 637 408, Tamilnadu, India. Phone: +91 4287 226737, Fax: +91 4287 226537

    email: [email protected], [email protected]

    Abstract – This paper presents a new adaptive algorithm for speech compression using the Cosine Packet Transform. The proposed algorithm uses packet decomposition, which reduces the computational complexity of the system. The paper compares the compression ratios obtained with the Wavelet Transform, the Cosine Transform, the Wavelet Packet Transform and the proposed adaptive algorithm using the Cosine Packet Transform for different speech signal samples. The mean compression ratio is calculated for each method and compared. The results show that the proposed compression algorithm achieves the highest mean compression ratio for the speech signals tested.

    Keywords: Discrete Cosine Transform, Discrete Wavelet Transform, Wavelet Packet Transform, Cosine Packets, adaptive thresholding.

    I. INTRODUCTION

    With the rapid deployment of speech compression technologies, more and more speech content is stored and transmitted in compressed formats. Speech signals have properties that distinguish them from general audio or music signals: speech is more structured, and it is band-limited to around 4 kHz. These two facts can be exploited through different models and approaches and ultimately make speech easier to compress. Today, applications of speech compression involve real-time processing in mobile satellite communications, cellular telephony, internet telephony, and audio for videophones or video teleconferencing systems, among others. Other applications include storage and synthesis systems used, for example, in voice mail systems, voice memo wristwatches, voice logging recorders and interactive PC software [1]. The idea of speech compression is to make the speech signal take up less storage space and less transmission bandwidth. To meet this goal, various researchers have designed and developed different compression methods [2-7]. Speech compression is used in digital telephony, in multimedia and in the security of digital communications.

    Before the introduction of packet-based transform techniques, audio coding techniques used the DFT and DCT with window functions such as rectangular and sine-taper functions. However, these early coding techniques failed to fulfil the contradictory requirements imposed by high-quality audio coding. For example, with a rectangular window the analysis/synthesis system is critically sampled, i.e., the overall number of transform-domain samples equals the number of time-domain samples, but the system suffers from poor frequency resolution and from block effects, which are introduced after quantization or other manipulation in the frequency domain. Overlapped windows allow better frequency response functions but carry the penalty of additional values in the frequency domain, so the system is no longer critically sampled. The Discrete Cosine Packet Transform is currently the best solution, as it satisfactorily resolves this paradox.

    Speech compression is based either on linear prediction or on orthogonal transform methods. Building on the classical papers by Shannon [8] and Kolmogorov [9], a strong connection has recently been highlighted between the systems proposed in many lossy compression standards and harmonic analysis [10]. All these systems use orthogonal transforms. The algorithm described in this paper belongs to the second category. Unfortunately, no fast algorithm exists for computing the optimal orthogonal transform, which is why other orthogonal transforms are used in practice. The quality of a compression system can be assessed with the aid of its rate-distortion function. One compression system is better than another if, at equal distortion, it achieves a higher compression rate. The compression rate can be maximized by a good choice of orthogonal transform.

    This paper is organized as follows. The mathematical model of the speech signal and a description of the Discrete Cosine Transform are presented in Section II. With the necessary mathematical modeling, the proposed adaptive algorithm for speech compression is explained in Section III. In Section IV, the developed algorithm is tested on various speech signal samples and compared with the Wavelet Transform, the Cosine Transform and the Wavelet Packet Transform. Finally, Section V concludes the paper with some discussion.

    II. MATHEMATICAL MODEL

    Mathematical model of speech signal

    Every spoken word is a sequence of tones with different intensities, frequencies and durations. Every tone is a sinusoidal signal with a certain amplitude, frequency and duration. It is therefore possible to represent any speech signal with a sinusoidal model. A mathematical description of this model is given by

    Authorized licensed use limited to: Jawaharlal Nehru Technological University. Downloaded on November 17, 2008 at 06:25 from IEEE Xplore. Restrictions apply.


    $$ x(t) = \sum_{i=1}^{Q} A_i(t) \cos\big(\omega_i(t)\big) \qquad (1) $$

    where $A_i$, $\omega_i$ and $t$ are the amplitude, frequency and time duration of the particular incident, respectively.

    Every term of this sum is a signal with double modulation, so the signal is not stationary. Speech is nevertheless frequently treated as a sequence of stationary signals. Dividing the speech signal into segments, each shorter than 25 ms, yields a sequence of stationary signals. On each segment the speech model takes the form:

    $$ x_s(t) = \sum_{i=1}^{n} A_i \cos(\omega_i t) \qquad (2) $$

    This decomposition is very similar to the decomposition of the signal $x_s(t)$ into a cosine packet. The energy of the signal $x_s(t)$ can be computed using the following relation:

    $$ E_x = \sum_{i=1}^{n} |A_i|^2 \qquad (3) $$
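As a small illustration of the segment model, the sketch below evaluates Eq. (2) at one time instant and the energy relation of Eq. (3). The function names `segment_model` and `model_energy`, and the two-component amplitudes and frequencies, are hypothetical values chosen only for this example; they are not taken from the paper.

```python
import math


def segment_model(amplitudes, omegas, t):
    """Evaluate Eq. (2): x_s(t) = sum_i A_i * cos(omega_i * t)."""
    return sum(a * math.cos(w * t) for a, w in zip(amplitudes, omegas))


def model_energy(amplitudes):
    """Evaluate Eq. (3): E_x = sum_i |A_i|^2."""
    return sum(abs(a) ** 2 for a in amplitudes)


# Hypothetical two-component segment (illustrative values only).
amplitudes = [3.0, 4.0]
omegas = [2 * math.pi * 200.0, 2 * math.pi * 450.0]  # angular frequencies in rad/s

x0 = segment_model(amplitudes, omegas, 0.0)  # every cosine equals 1 at t = 0
energy = model_energy(amplitudes)

print("x_s(0) =", x0)
print("E_x    =", energy)
```

At t = 0 every cosine term equals 1, so the model value is simply the sum of the amplitudes, which gives a quick sanity check on the implementation.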

    The Discrete Cosine Transform

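    The most common DCT [11] definition of a 1-D sequence of length N is

    $$ C_i = \alpha_i \sum_{x=0}^{N-1} f(x) \cos\left[\frac{(2x+1)\, i\, \pi}{2N}\right], \qquad (4) $$

    for i = 0, 1, 2, ..., N-1, where

    $$ \alpha_i = \begin{cases} \sqrt{1/N} & \text{for } i = 0 \\ \sqrt{2/N} & \text{for } i \neq 0 \end{cases} \qquad (5) $$

    It is clear from (4) that for i = 0,

    $$ C_0 = \sqrt{\frac{1}{N}} \sum_{x=0}^{N-1} f(x) $$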

    Thus, the first transform coefficient is proportional to the average value of the sample sequence (it equals $\sqrt{N}$ times the mean). In the literature, this value is referred to as the DC coefficient. All other transform coefficients are called the AC coefficients.
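The DCT definition above can be checked numerically with a direct, unoptimized implementation. This is a minimal sketch using only the Python standard library; the function name `dct_1d` and the four-sample input are illustrative assumptions, and a practical system would use a fast DCT routine instead of this O(N^2) loop.

```python
import math


def dct_1d(f):
    """Direct O(N^2) evaluation of the orthonormal DCT of Eqs. (4) and (5)."""
    n = len(f)
    coeffs = []
    for i in range(n):
        # Scale factor alpha_i from Eq. (5).
        alpha = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        total = sum(
            f[x] * math.cos((2 * x + 1) * i * math.pi / (2 * n))
            for x in range(n)
        )
        coeffs.append(alpha * total)
    return coeffs


samples = [1.0, 2.0, 3.0, 4.0]  # illustrative input, not from the paper
coeffs = dct_1d(samples)

dc = coeffs[0]                                   # DC coefficient
mean = sum(samples) / len(samples)               # average of the samples
energy_time = sum(s * s for s in samples)        # energy in the time domain
energy_dct = sum(c * c for c in coeffs)          # energy in the DCT domain

print("DC coefficient:", dc, " sqrt(N) * mean:", math.sqrt(len(samples)) * mean)
print("Energy (time, DCT):", energy_time, energy_dct)
```

With this scaling the transform is orthonormal, so the energy of the coefficients matches the energy of the samples, which is what makes energy-based thresholding of the coefficients meaningful.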

    III. PROPOSED ALGORITHM

    The proposed adaptive algorithm for speech compression using the Cosine Packet Transform is shown in Fig 1. The speech signal to be compressed is divided into packets of finite duration. The Discrete Cosine Transform is applied to each packet and the transform coefficients are computed. The coefficients are extracted and fed into the adaptive threshold detector, which nullifies the small coefficients to improve compression.

    Selection of best packets

    The main reason for choosing the Cosine Packet Transform is the cost functional used to select the best packet. The transform is adaptive: its result in a given application can be optimized using the best-packet selection procedure, an efficient procedure that can considerably improve the quality of a given signal processing method. Several cost functions can be minimized to select the best cosine packet. The most widely used is the entropy, but it does not maximize the compression rate. The optimal cost functional for compression is the one that minimizes the number of coefficients Ci exceeding a given threshold t. Because the best packet is selected with this cost functional, the number of coefficients above the threshold is minimal, which is why this cost functional maximizes the compression rate. As the threshold t increases, the number of retained coefficients decreases and the compression rate increases. Hence, the threshold detector must be adaptive. Another parameter of the DCPT that must be considered when optimizing the compression is its number of iterations.
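The cost functional described above (the number of coefficients exceeding a threshold t) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses plain non-overlapping block DCTs instead of smoothly windowed cosine packets, compares magnitudes against t, and decides recursively whether splitting a block in half lowers the cost. The names `cost`, `best_packets` and `min_len` are hypothetical.

```python
import math


def dct_1d(f):
    """Direct orthonormal DCT of a sequence (Eqs. (4) and (5))."""
    n = len(f)
    out = []
    for i in range(n):
        alpha = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        out.append(alpha * sum(
            f[x] * math.cos((2 * x + 1) * i * math.pi / (2 * n))
            for x in range(n)
        ))
    return out


def cost(coeffs, t):
    """Cost functional: number of coefficients whose magnitude exceeds t."""
    return sum(1 for c in coeffs if abs(c) > t)


def best_packets(signal, t, min_len=4):
    """Return (cost, packet lengths) for the cheapest dyadic segmentation."""
    n = len(signal)
    whole = cost(dct_1d(signal), t)
    if n < 2 * min_len or n % 2:
        return whole, [n]            # too short (or odd) to split further
    left_cost, left = best_packets(signal[: n // 2], t, min_len)
    right_cost, right = best_packets(signal[n // 2:], t, min_len)
    if left_cost + right_cost < whole:
        return left_cost + right_cost, left + right   # splitting is cheaper
    return whole, [n]                # keep the block as a single packet


# Illustrative signal: 8 samples of silence followed by 8 constant samples.
signal = [0.0] * 8 + [1.0] * 8
t = 0.1
whole_cost = cost(dct_1d(signal), t)
best_cost, packet_lengths = best_packets(signal, t)

print("Cost of one 16-sample packet:", whole_cost)
print("Best segmentation:", packet_lengths, "with cost", best_cost)
```

For this step-like signal a single long packet spreads energy over many coefficients, whereas two packets isolate the silent half and leave only one significant coefficient, which is the behaviour the cost functional is meant to reward.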

    Fig 1. Flow diagram for the proposed adaptive algorithm

    Input Speech signal to be compressed → Packet Decomposition → Computation of DCT → Extracting the coefficients (Ci) → Adaptive Threshold Detector → Compressed Speech Signal


    Adaptive Threshold Detector

    One of the most important processes of the proposed compression algorithm is the threshold detector. Its main role is to nullify all coefficients obtained from the Cosine Packet Transform that are smaller than a threshold value; this is in fact the compression mechanism. The process is adaptive: it automatically chooses the threshold value depending on the transform coefficient values and repeats the process as long as a certain condition is satisfied.

    Let us assume that the distortion parameter of a compression system is a [...]

    [Flow diagram of the adaptive threshold detector. Surviving block labels: Ci = Ci; Ci = 0; Compute the Energy (Ex); Compute the New Threshold t = t + 0.1; YES; YES; STOP]


    new threshold value otherwise the compression process is stopped.
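The adaptive threshold loop can be sketched as below. The surviving text fixes only the general shape (nullify small coefficients, compute the energy, raise the threshold by 0.1, or stop), so the stopping rule used here is an assumption: the threshold keeps increasing while the retained coefficients still hold at least `min_energy_fraction` of the original energy. The function name, that parameter, the starting threshold of 0 and the sample coefficients are all hypothetical.

```python
def adaptive_threshold(coeffs, min_energy_fraction=0.999, step=0.1):
    """Raise the threshold in fixed steps while enough energy is retained.

    Assumed stopping rule (not given in the surviving text): stop as soon as
    the next threshold would leave less than min_energy_fraction of the
    original coefficient energy.
    """
    total_energy = sum(c * c for c in coeffs)
    t = 0.0
    kept = list(coeffs)
    while any(kept):
        next_t = t + step                      # new threshold t = t + 0.1
        # Keep coefficients above the threshold, nullify the others.
        candidate = [c if abs(c) > next_t else 0.0 for c in coeffs]
        energy = sum(c * c for c in candidate)  # energy after thresholding
        if energy < min_energy_fraction * total_energy:
            break                              # too much distortion: stop
        t, kept = next_t, candidate
    return t, kept


# Illustrative transform coefficients (not taken from the paper).
coeffs = [10.0, 5.0, 0.35, 0.25, 0.05]
threshold, kept = adaptive_threshold(coeffs)
nonzero = sum(1 for c in kept if c != 0.0)

print("Final threshold:", round(threshold, 1))
print("Retained coefficients:", kept, "(", nonzero, "non-zero )")
```

Tightening or loosening `min_energy_fraction` trades distortion against the number of surviving coefficients, which mirrors the role the text assigns to the distortion parameter.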

    Fig 3. Speech Signal Sample

    Compression was performed on 20 different speech signals using the Discrete Cosine Transform, the Discrete Wavelet Transform, the Wavelet Packet Transform and the proposed adaptive algorithm. The compression ratios achieved by these methods are tabulated for each speech signal sample.

    TABLE I COMPARISON OF COMPRESSION RATIO

    Speech Signal Sample    DWT       DCT       WPT        Proposed Adaptive algorithm
     1                      6.1229    6.1421    11.8444    11.7985
     2                      6.1462    6.1421    12.1766    12.2632
     3                      6.1462    6.1421    12.4397    11.1433
     4                      6.1473    6.1421    12.3433    12.8633
     5                      6.1452    6.1421    42.1520    45.8856
     6                      6.1482    6.1421    51.5191    55.7188
     7                      6.1473    6.1421    23.8063    23.9820
     8                      6.1482    6.1421    40.9006    43.0466
     9                      6.1482    6.1421    26.5968    28.5052
    10                      6.1482    6.1421    35.1952    36.0922
    11                      6.1479    6.1421    19.9917    20.7104
    12                      6.1337    6.1421    21.5817    21.7237
    13                      6.1477    6.1421    13.8164    13.9029
    14                      6.1272    6.1421    30.9609    31.1392
    15                      6.1461    6.1421    15.5718    15.8481
    16                      6.1468    6.1421    30.4582    31.6369
    17                      6.1461    6.1421    22.4461    23.2086
    18                      6.1482    6.1421    29.2490    30.8083
    19                      6.1452    6.1421    26.8928    27.1709
    20                      6.1443    6.1421    60.5990    68.0507

    Table I compares the compression ratios of the various methods and shows the good performance of the proposed adaptive algorithm. For the proposed algorithm, the smallest compression ratio, 11.1433, was obtained on the 3rd sample and the highest, 68.0507, on the 20th sample. The proposed algorithm gives the best compression ratio for most of the speech samples. For ease of comparison, the compression ratios for speech signal samples 1 to 10 and 11 to 20 are plotted in Fig 4 and Fig 5, respectively.

    [Plot: Comparison of Compression ratio. Horizontal axis: Speech Signal Sample (1 to 10). Vertical axis: Compression Ratio (0 to 60). Series: DWT, DCT, WPT, Proposed Algorithm.]

    Fig 4. Comparison of Compression ratio (Speech signal sample 1-10)

    [Plot: Comparison of Compression Ratio. Horizontal axis: Speech Signal Sample (tick labels 1 to 10). Vertical axis: Compression Ratio (0 to 70). Series: DWT, DCT, WPT, Proposed Algorithm.]

    Fig 5. Comparison of Compression ratio (Speech signal sample 11-20)

    The figures above show that for only 2 of the 20 signal samples (samples 1 and 3) does the proposed algorithm give a lower compression ratio than the WPT method, and even for these its ratio is higher than those of the other two methods. The mean compression ratios of all the methods are computed and tabulated in Table II.

    TABLE II MEAN COMPRESSION RATIO

    DWT      DCT      WPT       Proposed Adaptive algorithm
    6.144    6.142    27.027    28.275
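The means in Table II can be reproduced directly from the per-sample ratios in Table I. The short script below is only a verification aid; the variable names are illustrative, and the values are transcribed from Table I in the column order DWT, DCT, WPT, proposed algorithm.

```python
# Compression ratios transcribed from Table I, one tuple per speech sample:
# (DWT, DCT, WPT, proposed adaptive algorithm).
TABLE_I = [
    (6.1229, 6.1421, 11.8444, 11.7985),
    (6.1462, 6.1421, 12.1766, 12.2632),
    (6.1462, 6.1421, 12.4397, 11.1433),
    (6.1473, 6.1421, 12.3433, 12.8633),
    (6.1452, 6.1421, 42.1520, 45.8856),
    (6.1482, 6.1421, 51.5191, 55.7188),
    (6.1473, 6.1421, 23.8063, 23.9820),
    (6.1482, 6.1421, 40.9006, 43.0466),
    (6.1482, 6.1421, 26.5968, 28.5052),
    (6.1482, 6.1421, 35.1952, 36.0922),
    (6.1479, 6.1421, 19.9917, 20.7104),
    (6.1337, 6.1421, 21.5817, 21.7237),
    (6.1477, 6.1421, 13.8164, 13.9029),
    (6.1272, 6.1421, 30.9609, 31.1392),
    (6.1461, 6.1421, 15.5718, 15.8481),
    (6.1468, 6.1421, 30.4582, 31.6369),
    (6.1461, 6.1421, 22.4461, 23.2086),
    (6.1482, 6.1421, 29.2490, 30.8083),
    (6.1452, 6.1421, 26.8928, 27.1709),
    (6.1443, 6.1421, 60.5990, 68.0507),
]
METHODS = ("DWT", "DCT", "WPT", "Proposed")

# Mean compression ratio per method over the 20 samples.
means = {
    name: sum(row[k] for row in TABLE_I) / len(TABLE_I)
    for k, name in enumerate(METHODS)
}

# 1-based sample numbers where the proposed ratio falls below the WPT ratio.
below_wpt = [n + 1 for n, row in enumerate(TABLE_I) if row[3] < row[2]]

for name in METHODS:
    print(f"{name}: {means[name]:.3f}")
print("Samples where the proposed ratio is below WPT:", below_wpt)
```

The same loop also identifies which samples favour WPT, which makes it easy to confirm the count quoted in the discussion of Fig 4 and Fig 5.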


    The analysis of Table II shows that the mean compression ratio achieved over the 20 samples using the proposed adaptive algorithm is 28.275. This is a sufficiently high value, given that no lossless compression method was used.

    V. CONCLUSION

    A new compression method based on an adaptive threshold detector is proposed and tested. The simulation results show that the proposed algorithm gives a better compression ratio than the other methods. With this method, a mean compression ratio of 28.275 was obtained in simulation, which is higher than the mean compression ratio of the other methods. Using a fast DCT algorithm, the proposed method can be implemented on a Digital Signal Processor. The proposed system is a good alternative to speech compression systems based on linear prediction.

    REFERENCES

    [1]. R. W. Yeung, A First Course in Information Theory, New York: Kluwer Academic/Plenum Publishers, 2002.

    [2]. A. Gersho, "Advances in Speech and Audio Compression," Proceedings of the IEEE, vol. 82, pp. 900-918, June 1994.

    [3]. J. L. Flanagan, M. R. Schroeder, B. S. Atal, R. E. Crochiere, N. S. Jayant and J. M. Tribolet, "Speech Coding," IEEE Transactions on Communications, vol. 27, pp. 710-737, April 1979.

    [4]. P. Noll, "Wideband Speech and Audio Coding," IEEE Communications Magazine, pp. 34-44, Nov. 1993.

    [5]. K. Sayood and J. C. Borkenhagen, "Use of residual redundancy in the design of joint source/channel coders," IEEE Transactions on Communications, vol. 39, no. 6, pp. 838-846, June 1991.

    [6]. B. Edler, "Coding of Audio Signals with Overlapping Block Transform and Adaptive Window Functions," (in German), Frequenz, vol. 43, pp. 252-256, 1989.

    [7]. Q. Memon and T. Kasparis, "Transform Coding of Signals Using Approximate Trigonometric Expansions," Journal of Electronic Imaging, vol. 6, no. 4, pp. 494-503, October 1997.

    [8]. C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.

    [9]. A. N. Kolmogorov, "On the Shannon theory of information transmission in the case of continuous signals," Trans. IRE, vol. IT-2, pp. 102-108, 1956.

    [10]. D. L. Donoho, M. Vetterli, R. A. DeVore and I. Daubechies, "Data compression and harmonic analysis," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2435-2476, 1998.

    [11]. N. Ahmed, T. Natarajan and K. R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. C-23, pp. 90-93, Jan. 1974.
